Background

The Waluigi Effect (mega-post) - LessWrong (a post on the inner workings of, and biases in, ChatGPT)

This is one of the most interesting things I've read on large language models like ChatGPT. It also explains the need for human oversight: these models can look intelligent while sometimes being very far from it.

It's an exploration of how large language models (e.g., ChatGPT) create emergent 'personas', and of how and why they often collapse into 'evil' versions of those personas (e.g., Microsoft Sydney going weird and creepy). The premise is that all training content is a form of storytelling, and all stories inherently imply the 'evil version' ("if you discover that a country has legislation against motorbike gangs, that will increase your expectation that the town has motorbike gangs. GPT-4 will make the same inference").
