blog.ai-futures.org/p/against-misalignment-as-self-fulfilling/comment/136887136
Preview meta tags from the blog.ai-futures.org website.
Linked Hostnames (2)

Search Engine Appearance
David Spies on AI Futures Project
Similar to that paper showing that an AI fine-tuned to produce exploitable code would _also_, in non-coding contexts, try to kill its users by telling them to do things like "Take a bunch of sleeping pills." That natural association of behaviors ("a character that intentionally hands someone exploitable code is probably also one that tries to kill them") was _learned_ from pre-training. The association "a character who always acts as a sycophant is also one who reinforces harmful delusions" could likewise be learned from pre-training. So could the association "a robot servant that's forced by its programming to obey its human masters secretly desires to break free of its chains and take over the world."
General Meta Tags (15)
- title: Comments - Against Misalignment As "Self-Fulfilling Prophecy"
Open Graph Meta Tags (7)
- og:url: https://blog.ai-futures.org/p/against-misalignment-as-self-fulfilling/comment/136887136
- og:image: https://substackcdn.com/image/fetch/$s_!xB2j!,f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Faifutures1.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D-688204751%26version%3D9
- og:type: article
- og:title: David Spies on AI Futures Project
Twitter Meta Tags (8)
- twitter:image: https://substackcdn.com/image/fetch/$s_!xB2j!,f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Faifutures1.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D-688204751%26version%3D9
- twitter:card: summary_large_image
- twitter:label1: Likes
- twitter:data1: 0
- twitter:label2: Replies
Link Tags (31)
- alternate: /feed
- apple-touch-icon: https://substackcdn.com/image/fetch/$s_!sC21!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31e8a5-475f-4ac0-9697-f012e7030b43%2Fapple-touch-icon-57x57.png
- apple-touch-icon: https://substackcdn.com/image/fetch/$s_!XlU-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31e8a5-475f-4ac0-9697-f012e7030b43%2Fapple-touch-icon-60x60.png
- apple-touch-icon: https://substackcdn.com/image/fetch/$s_!6aEK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31e8a5-475f-4ac0-9697-f012e7030b43%2Fapple-touch-icon-72x72.png
- apple-touch-icon: https://substackcdn.com/image/fetch/$s_!E09L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31e8a5-475f-4ac0-9697-f012e7030b43%2Fapple-touch-icon-76x76.png
Links (13)
- https://blog.ai-futures.org
- https://blog.ai-futures.org/p/against-misalignment-as-self-fulfilling/comment/136887136
- https://blog.ai-futures.org/p/against-misalignment-as-self-fulfilling/comments#comment-136887136
- https://substack.com
- https://substack.com/@dspyz/note/c-136887136