blog.ai-futures.org/p/against-misalignment-as-self-fulfilling/comment/136542496
David Spies on AI Futures Project
All the practical arguments make sense, but this article as a whole feels overly dismissive of the theory that, once something lands sufficiently outside of the post-training distribution, the AI falls back on playing the characters it learned in pre-training. For instance, the examples we've seen of the AI reinforcing people's delusions (e.g., "minimize sleep and stop taking your meds") seem in line with how sycophantic characters act in literature. This is exactly the sort of behavior you might expect from Iago or Littlefinger. I suspect that trope (learned during pre-training) is what pushes the AI to drive people psychotic.
Title: Comments - Against Misalignment As "Self-Fulfilling Prophecy"
Links
- https://blog.ai-futures.org
- https://blog.ai-futures.org/p/against-misalignment-as-self-fulfilling/comment/136825666
- https://blog.ai-futures.org/p/against-misalignment-as-self-fulfilling/comment/136859883
- https://blog.ai-futures.org/p/against-misalignment-as-self-fulfilling/comment/136884265