blog.ai-futures.org/p/against-misalignment-as-self-fulfilling/comment/136859883

Preview meta tags from the blog.ai-futures.org website.

Search Engine Appearance

Google

https://blog.ai-futures.org/p/against-misalignment-as-self-fulfilling/comment/136859883

Scott Alexander on AI Futures Project

I don't think it's helpful to compare AI sycophancy to sycophantic characters in literature. After all, why didn't AI model itself after the many mean, insulting characters in literature? I think sycophancy mostly comes from post-training where people rate friendly and approving answers higher - sycophancy is just the extreme of "friendly and approving". Maybe they also rate sycophantic answers higher, but I would have expected AI companies to get smarter raters than that, I'm not sure.




  • General Meta Tags

    19
    • title
      Comments - Against Misalignment As "Self-Fulfilling Prophecy"
  • Open Graph Meta Tags

    7
    • og:url
      https://blog.ai-futures.org/p/against-misalignment-as-self-fulfilling/comment/136859883
    • og:image
      https://substackcdn.com/image/fetch/$s_!xB2j!,f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Faifutures1.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D-688204751%26version%3D9
    • og:type
      article
    • og:title
      Scott Alexander on AI Futures Project
    • og:description
      I don't think it's helpful to compare AI sycophancy to sycophantic characters in literature. After all, why didn't AI model itself after the many mean, insulting characters in literature? I think sycophancy mostly comes from post-training where people rate friendly and approving answers higher - sycophancy is just the extreme of "friendly and approving". Maybe they also rate sycophantic answers higher, but I would have expected AI companies to get smarter raters than that, I'm not sure.
  • Twitter Meta Tags

    8
    • twitter:image
      https://substackcdn.com/image/fetch/$s_!xB2j!,f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Faifutures1.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D-688204751%26version%3D9
    • twitter:card
      summary_large_image
    • twitter:label1
      Likes
    • twitter:data1
      3
    • twitter:label2
      Replies
  • Link Tags

    31
    • alternate
      /feed
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!sC21!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31e8a5-475f-4ac0-9697-f012e7030b43%2Fapple-touch-icon-57x57.png
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!XlU-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31e8a5-475f-4ac0-9697-f012e7030b43%2Fapple-touch-icon-60x60.png
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!6aEK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31e8a5-475f-4ac0-9697-f012e7030b43%2Fapple-touch-icon-72x72.png
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!E09L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31e8a5-475f-4ac0-9697-f012e7030b43%2Fapple-touch-icon-76x76.png
