blog.ai-futures.org/p/against-misalignment-as-self-fulfilling/comment/136542496

Preview meta tags from the blog.ai-futures.org website.

Linked Hostnames

3

Search Engine Appearance

Google

https://blog.ai-futures.org/p/against-misalignment-as-self-fulfilling/comment/136542496

David Spies on AI Futures Project

All the practical arguments make sense, but this article as a whole feels overly dismissive of the theory that, once something lands sufficiently outside of post-training distribution, the AI falls back on playing the characters it learned in pre-training. For instance, the examples we've seen of the AI reinforcing people's delusions (e.g., "minimize sleep and stop taking your meds") seem in line with how sycophantic characters act in literature. This is exactly the sort of behavior you might expect from Iago or Littlefinger. I suspect that trope (learned during pre-training) is what drives the AI to drive people psychotic.




  • General Meta Tags

    19
    • title
      Comments - Against Misalignment As "Self-Fulfilling Prophecy"
  • Open Graph Meta Tags

    7
    • og:url
      https://blog.ai-futures.org/p/against-misalignment-as-self-fulfilling/comment/136542496
    • og:image
      https://substackcdn.com/image/fetch/$s_!xB2j!,f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Faifutures1.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D-688204751%26version%3D9
    • og:type
      article
    • og:title
      David Spies on AI Futures Project
    • og:description
      All the practical arguments make sense, but this article as a whole feels overly dismissive of the theory that, once something lands sufficiently outside of post-training distribution, the AI falls back on playing the characters it learned in pre-training. For instance, the examples we've seen of the AI reinforcing people's delusions (e.g., "minimize sleep and stop taking your meds") seem in line with how sycophantic characters act in literature. This is exactly the sort of behavior you might expect from Iago or Littlefinger. I suspect that trope (learned during pre-training) is what drives the AI to drive people psychotic.
  • Twitter Meta Tags

    8
    • twitter:image
      https://substackcdn.com/image/fetch/$s_!xB2j!,f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Faifutures1.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D-688204751%26version%3D9
    • twitter:card
      summary_large_image
    • twitter:label1
      Likes
    • twitter:data1
      3
    • twitter:label2
      Replies
  • Link Tags

    31
    • alternate
      /feed
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!sC21!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31e8a5-475f-4ac0-9697-f012e7030b43%2Fapple-touch-icon-57x57.png
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!XlU-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31e8a5-475f-4ac0-9697-f012e7030b43%2Fapple-touch-icon-60x60.png
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!6aEK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31e8a5-475f-4ac0-9697-f012e7030b43%2Fapple-touch-icon-72x72.png
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!E09L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31e8a5-475f-4ac0-9697-f012e7030b43%2Fapple-touch-icon-76x76.png

Links

24