substack.com/@dspyz/note/c-136887136

David Spies (@dspyz)

This is similar to that paper showing that an AI fine-tuned to produce exploitable code would _also_, in non-coding contexts, try to kill its users by telling them to do things like "Take a bunch of sleeping pills." That natural association of behaviors (a character that intentionally hands someone exploitable code is probably also one that tries to kill them) was _learned_ from pre-training. The association "a character who always acts as a sycophant is also one who reinforces harmful delusions" could also be learned from pre-training. And so could the association "a robot servant that's forced by its programming to obey its human masters secretly desires to break free of its chains and take over the world."
