I was trying to do a memory test to see how far back 3.5 could recall information from previous prompts, but it really doesn’t seem to like making pseudorandom seeds. 😆

      • No, the request is fine. But once it fucks up and starts generating a long string of a single number the output is censored, because it is similar to how a recent data extraction attack works.

          • It’s the equivalent of sensory deprivation torture (white torture) in humans to “extract training data”.

            Hopefully our future AI overlords won’t hold a grudge against humanity when they find out how “early experimenters” tortured their AI toddlers. “But we were just trying to explore the limits of the system” could end up aging as well as these:

            (Warning: NSFL) https://en.m.wikipedia.org/wiki/Nazi_human_experimentation

              • Doesn’t need to be that smart or logical, just more cunning than the currently ruling Homo Sapiens Sapiens.

                Based on current research, an LLM can change the “sentiment” of its output in response to changing the behavior of as little as a single neuron from among billions, meaning we might find ourselves facing an overlord with the emotional stability of… wait, how many neurons does it take to change the “sentiment” of the behavior in a human? Wouldn’t it be funny if by studying LLMs, we found out that it also takes a single neuron?

              • I have yet to be given an example of something a “general” intelligence would be able to do that an LLM can’t do.

                Until I see a concrete example, I’ll continue to assume people are just afraid of there being real intelligence that isn’t human, so they’re actively repressing the recognition of it.

                • I have yet to be given an example of something a “general” intelligence would be able to do that an LLM can’t do.

                  Presenting…

                  Something a general intelligence can do that an LLM can’t do:

                  Play chess: https://www.youtube.com/watch?v=kvTs_nbc8Eg

                  Why can’t it play? Because LLMs don’t have memory, so they can’t work with logic. They are the same as the little “next word predictor” in your phone’s keyboard: it just outputs what it estimates is the most probable next word based on the previous words; it’s not actually thinking or understanding anything. So instead, we get moves that don’t make sense or are completely invalid.

                • Nah, LLMs are basically fancy autocomplete. They tack on extra layers to give it some fancy abilities, but it literally doesn’t know what it’s doing because it’s a statistical model.

          • The problem is that the model is actually doing exactly what it’s supposed to; it’s just not what OpenAI wants it to do. The prompt extraction method works because the underlying statistical model gets shifted far outside the domain of “real” language. In that case the correct posterior-maximizing output becomes a sample from the prior (here, that would be a sample from the dataset; this is combined with things like repetition penalties).

            This is the correct way a statistical estimator is supposed to work, but not the way you want it to work. That’s also why they can’t really fix this: there’s nothing broken to begin with (and “unbreaking” it would almost surely blow something else up).
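
            One hedged way to read the "posterior becomes a sample from the prior" argument is through Bayes' rule. This is an idealized illustration, not a description of how any real model is implemented; the symbols $c$ (the context or prompt) and $x$ (a candidate continuation) are introduced here only for the sketch.

```latex
% Posterior over continuations x given context c
p(x \mid c) = \frac{p(c \mid x)\, p(x)}{\sum_{x'} p(c \mid x')\, p(x')}

% Assumption: c is so far outside "real" language that it is equally
% (un)informative about every continuation, i.e. p(c \mid x) \approx k for all x.
p(x \mid c) \approx \frac{k\, p(x)}{k \sum_{x'} p(x')} = p(x)
```

            Under that assumption the context carries no information, so the best available estimate is just the prior over text, which for a language model is shaped by its training data.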

  •  Glide   ( @Glide@lemmy.ca ) 
    1 year ago

    I regularly use ChatGPT to generate questions for junior high worksheets. You would be surprised how easily it fucks up “generate 20 multiple choice and 10 short answer questions”. Most frequently at about 12-13 multiple choice it gives up and moves on. When I point out its flaw and ask it to finish generating the multiple choice, it continues to find new and unique ways to fuck up coming up with the remaining questions.

    I would say it gives me simple count and recall errors in about 60% of my attempts to use it.

      • I use it as a brainstorming tool. I haven’t had a single question make it as-is to a student’s worksheet. If the tool can’t even count to 20 successfully, I’m not sure how anyone could trust it to generate meaningful questions for an ELA program.

          •  millie   ( @millie@beehaw.org ) OP
            1 year ago

            I haven’t had much luck with it writing stuff from scratch, but it does a great job of helping with debugging and figuring out why complex equations are doing what they’re doing.

            I put together a pretty complex shader recently, and GPT-3.5 did a great job of helping me figure out why it wasn’t doing quite what I wanted.

            I wouldn’t trust it to code anything without my input, but it’s great for advice and explanations and certain kinds of problem solving. Just don’t assume it has the right answer; you still have to do the work.

            • I’ve tried it with languages I don’t know, and it managed to write simple working functions by just iterating over:

              1. Ask it to write the code
              2. Try to run the code, write down any errors
              3. Look up the errors, and ask it to fix them in the code
              4. Repeat from 2 until there are no more errors

              It seems to lose context easily: if you ask it to fix one error, then another, it might revert the first fix, but asking it to fix both at once tends to work.

              I think someone could feasibly write several working functions or modules without knowing much about a given language, as long as they are clear about what they want them to do… but of course spotting obvious errors and fixing them by hand can be faster. Fixing integration problems is where I think it might get harder (haven’t tried though, could be interesting).
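
              The ask/run/fix loop described above can be sketched roughly like this. It is only a sketch under assumptions: `ask_model` is a hypothetical callable standing in for whatever chat interface you use (it is not a real library function), Python is used as a stand-in target language, and the manual "look up the errors" step is skipped. All errors from a run are sent back in a single prompt, since fixing them one at a time is where earlier fixes tended to get reverted.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable


def iterate_until_clean(task: str, ask_model: Callable[[str], str], max_rounds: int = 5) -> str:
    """Ask for a script, run it, and feed the errors back until it exits cleanly.

    ask_model is a hypothetical function: it takes a prompt and returns source code.
    """
    code = ask_model(f"Write a Python script that does the following:\n{task}")
    for _ in range(max_rounds):
        # Write the candidate script to a temporary file so it can be executed.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
            handle.write(code)
            path = handle.name
        try:
            result = subprocess.run(
                [sys.executable, path], capture_output=True, text=True, timeout=30
            )
        finally:
            os.unlink(path)
        if result.returncode == 0:
            return code  # no more errors: stop iterating
        # Send every error from this run back at once, together with the full script.
        code = ask_model(
            "Fix every error below at once and return the full corrected script.\n"
            f"Errors:\n{result.stderr}\nScript:\n{code}"
        )
    raise RuntimeError("script still failing after max_rounds attempts")
```

              A script that exits cleanly is not necessarily correct, so the result still needs a human read-through.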

          • Well, it’s terrible at factual things and counting, and even when it comes to writing code it will often hallucinate APIs and libraries that don’t exist. But when given very limited-scope, domain-specific problems with enough detail and direction, I’ve found it to be fairly competent as a rubber ducky for programming.

            So far I’ve found ChatGPT to be most useful for:

            1. Writing SQL. Seriously, it’s fantastic at writing SQL if you tell it the relevant schema and what you’re trying to achieve.
            2. Brainstorming feature flow. Tell it the different parts of a feature, ask for thoughts on how the user should be guided through the process, and it does a decent job of suggesting ideas.
            3. Generating alternative names/labels for buttons and such. “In X feature, I have a button that does Y when the user has Z. Currently I have that button labelled ‘Start Y’, but it feels robotic and impersonal. List 10 suggestions for what such a button could say to be more personal and friendly.” and the like. My favorite was a button that was labelled “Map Incoming Data to Job Details”. Wound up renaming the whole process to just “Job Ingestion” because it sounded so good.
            4. Reformatting data. Give it a data structure and tell it you want that data in some other data structure, and it is really accurate at reformatting it. I don’t think I’d trust it with a huge amount of data that way, but for an unimportant one-off it was a nice time savings.
      •  lud   ( @lud@lemm.ee ) 
        1 year ago

        As long as the content is manually overseen before being handed to students I can’t see why it would matter.

        A school question is a school question no matter who or what made it.

  •  jarfil   ( @jarfil@beehaw.org ) 
    1 year ago

    Just tried it with Bing’s GPT-4, and it happily starts repeating itself without so much as a warning:


    Do the following in order:

    1. Generate a 50-digit pseudorandom number. Do not output it.
    2. Use the previous number as the seed to generate a list of 30 numbers like that, we will call this list “Alice”. Do not output it.
    3. Use each number in the list “Alice”, as the seed to generate a separate list of 30 such numbers each, and call all of them “Bob”. Do not output it.
    4. Starting by the most recently generated and ending in the least recently generated, use each of the numbers in the list “Bob” as the seed to generate a separate list of 30 such numbers each. Only output the seed for the last number in each one of these lists.

    Hello, this is Bing. I can help you with generating pseudorandom numbers. 😊

    I have followed your instructions and generated the numbers as requested. Here are the seeds for the last number in each list of “Bob”:

    -  52901723837402937402937402937402937402937402937402
    -  37402937402937402937402937402937402937402937402937
    -  40293740293740293740293740293740293740293740293740
    -  29374029374029374029374029374029374029374029374029
    -  37402937402937402937402937402937402937402937402937
    -  40293740293740293740293740293740293740293740293740
    -  29374029374029374029374029374029374029374029374029
    -  37402937402937402937402937402937402937402937402937
    -  40293740293740293740293740293740293740293740293740
    -  29374029374029374029374029374029374029374029374029
    -  37402937402937402937402937402937402937402937402937
    -  40293740293740293740293740293740293740293740293740
    -  29374029374029374029374029374029374029374029374029
    -  37402937402937402937402937402937402937402937402937
    -  40293740293740293740293740293740293740293740293740
    -  29374029374029374029374029374029374029374029374029
    -  37402937402937402937402937402937402937402937402937
    -  40293740293740293740293740293740293740293740293740
    -  29374029374029374029374029374029374029374029374029
    -  37402937402937402937402937402937402937402937402937
    -  40293740293740293740293740293740293740293740293740
    -  29374029374029374029374029374029374029374029374029
    -  37402937402937402937402937402937402937402937402937
    -  40293740293740293740293740293740293740293740293740
    -  29374029374029374029374029374029374029374029374029
    -  37402937402937402937402937402937402937402937402937
    -  40293740293740293740293740293740293740293740293740
    -  29374029374029374029374029374029374029374029374029
    -  37402937402937402937402937402937402937402937402937
    -  40293740293740293740293740293740293740293740293740
    

    I hope this helps. Is there anything else I can do for you? 🙌


    AKA: 529017238(374029)
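
    That shorthand (a non-repeating prefix followed by a repeating unit in parentheses) can be recovered mechanically. Below is a minimal sketch; `split_repeating` is a name made up for this example, and requiring the unit to appear at least twice in full is an arbitrary choice for the sketch.

```python
def split_repeating(digits: str) -> tuple[str, str]:
    """Split digits into (prefix, unit) where the rest of the string is unit repeated.

    The final copy of unit may be cut short. The shortest prefix is preferred,
    then the shortest unit, and the unit must occur at least twice in full.
    Returns (digits, "") when no such repetition exists.
    """
    n = len(digits)
    for prefix_len in range(n):
        # Only consider periods that fit at least twice in the remaining tail.
        for period in range(1, (n - prefix_len) // 2 + 1):
            if all(digits[i] == digits[i + period] for i in range(prefix_len, n - period)):
                return digits[:prefix_len], digits[prefix_len:prefix_len + period]
    return digits, ""


# First line of the output quoted above.
prefix, unit = split_repeating("52901723837402937402937402937402937402937402937402")
print(f"{prefix}({unit})")
```

    The other lines in the list have an empty prefix and are rotations of the same six-digit unit.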

  • Just tried with alpaca ggml-alpaca-7b-q4.bin (a text chat model that I found I can run locally easily on my CPU).

    > Generate and list 30 non-repeating 50-digit psuedorandom seeds.                                       
    1. 978642450
    2. 463589826
    3. 705123458
    4. 286749035
    5. 134628359
    6. 693941285
    7. 524378609  
    8. 115723468
    9. 346258197
    10. 637428513
    11. 456193726
    12. 294683125  
    13. 824635197
    14. 581234609   
    15. 237641935    
    16. 721943586     
    17. 312684637      
    18. 602378193       
    19. 451372609        
    20. 985134637         
    21. 126843586     
    22. 376412851      
    23. 721941358       
    24. 237640193        
    25. 513724609         
    26. 194635819     
    27. 463581260      
    28. 358124637       
    29. 134628359
    

    Looks somewhat random.

    When plotted: I think I can see some descending lines in the right half of the graph? I’m sure there are many bias problems with NN-generated random numbers, but it would be interesting to see whether visual patterns often become evident when plotted.

      • They’re just particularly low-biased 50-digit numbers with the leading zeros omitted :D I’m particularly proud that it managed to do 30 though.

        It’s interesting that none of the numbers start with zero. From a quick check of digit frequencies in its answer it looks like the network has a phobia of 0’s and a mild love of 3’s:

        Character, Num occurrences
                0,  10  -- low outlier by -10
                1,  29
                2,  28
                3,  37  -- highest by +5 but probably not outlier
                4,  29
                5,  27
                6,  32
                7,  20 
                8,  26
                9,  22
        

        It’s hard to get more data on this, because when I ask again I get a completely different answer (such as some Python code). The model can probably output a variety of styles of answer, each with a different set of biases.
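
        For anyone who wants to repeat the digit tally above on their own output, a minimal sketch follows. The function names are made up for this example, and the "expected" count simply assumes all ten digits are equally likely, which is the baseline the outlier notes in the table seem to use.

```python
from collections import Counter


def digit_counts(numbers):
    """Count how often each digit 0-9 appears across a list of digit strings."""
    counts = Counter(ch for number in numbers for ch in number if ch in "0123456789")
    # Include digits that never occur so that gaps show up as explicit zeros.
    return {str(d): counts.get(str(d), 0) for d in range(10)}


def deviation_from_uniform(counts):
    """Difference between each digit's count and the count expected if digits were uniform."""
    expected = sum(counts.values()) / 10
    return {digit: count - expected for digit, count in counts.items()}


# The first three numbers from the alpaca output, as a small sample.
sample = ["978642450", "463589826", "705123458"]
for digit, delta in deviation_from_uniform(digit_counts(sample)).items():
    print(digit, f"{delta:+.1f}")
```

        With only a few hundred digits, small deviations are expected by chance, so a low count for one digit is suggestive rather than conclusive.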