You can make top LLMs break their own rules with gibberish

Elephant0991 ( @Elephant0991@lemmy.bleh.au ) · 2 years ago

itsgallus ( @itsgallus@beehaw.org ) · 2 years ago

Oh, I’m not saying there aren’t innate risks. You’re bringing up great points, and I agree we mustn’t throw caution to the wind. This is slightly beside the point of my initial comment, though, where I was merely stating my belief that the “hack” described in the OP might be a non-issue in a couple of years. But you are right. Again, I’m sorry about my ignorance. I didn’t mean to start an argument. It’s great hearing other points of view, though.