Background

The Waluigi Effect (mega-post) - LessWrong (a post on the inner workings of, and biases in, ChatGPT)

This is one of the most interesting things I've read on large language models like ChatGPT. It also explains the need for human oversight: these models can look intelligent while sometimes being very far from it.

It's an exploration of how large language models (e.g., ChatGPT) create emergent 'personas', and of how and why they often collapse into 'evil' versions of those personas (e.g., Microsoft Sydney going weird and creepy). The premise is that all training content is a form of storytelling, and all stories inherently imply the 'evil version' ("if you discover that a country has legislation against motorbike gangs, that will increase your expectation that the town has motorbike gangs. GPT-4 will make the same inference").
