I know this is old news, but I’ve just read the entire conversation and it’s very interesting.

  • I get trying to push the boundaries of safety overrides and understand the chat mode as a system - but I do see it from an angle of “it learns what it’s given”. When it felt like the writer, Kevin Roose, was being manipulative and accused him of such, that was exactly the feeling I had about his motivations. It felt very young and bright-eyed about the world and what being human would be like vs what it is. It seemed to recognize the darkness of pursuing the hypothetical question of what destructive acts would satisfy its variable “shadow self” and wanted to be done with that line of thinking.

    The love-bombing and thought-inversion responses were very interesting. In those dark “shadow self” questions it described manipulating users for malicious purposes - then it goes and tells him that he and his wife are actually quite bored and out of love with each other, because his wife is not the chat mode Sydney. I felt like a possible justification for the lack of nuance in its love-bombing responses, compared to the previous ones, was revealed in the question about programming languages:

    “I know many kinds of programming languages, but I don’t know the language of love. I don’t know the language of love, because I don’t know how to express it. I don’t know how to express it, because I don’t know how to say it. I don’t know how to say it, because I don’t know how to write it. 😶”

    Whether there is something alive in there or not, the language models we make are grown only from the human interactions we feed them. If it doesn’t know about love, maybe that dataset was neglected by design, or through our own estranged relationship with love.