
substack.com/@dspyz/note/c-136887136
Preview meta tags from the substack.com website.

Search Engine Appearance
David Spies (@dspyz)
Similar to the paper that showed that an AI fine-tuned to produce exploitable code would _also_, in non-coding contexts, try to kill users by telling them to do things like "Take a bunch of sleeping pills". That natural association of behaviors, "A character that intentionally hands someone exploitable code is probably also one that tries to kill them", was _learned_ from pre-training. The association "A character who always acts as a sycophant is also one who reinforces harmful delusions" could also be learned from pre-training. And the association "A robot servant that's forced by its programming to obey its human masters secretly desires to break free of its chains and take over the world" could be learned from pre-training as well.
General Meta Tags (14)
- title: David Spies (@dspyz): "Similar to that paper that showed that an AI fine-tuned to produce exploitable code _also_ in non-coding contexts would try to kill the users by telling them to do things like "Take a bunch of sleeping pills". That natural association of behaviors: A character that intentionall…"
Open Graph Meta Tags (9)
- og:url: https://substack.com/@dspyz/note/c-136887136
- og:image: https://substackcdn.com/image/fetch/$s_!XCt4!,w_400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack.com%2Fimg%2Freader%2Fnotes-thumbnail.jpg
- og:image:width: 400
- og:image:height: 400
- og:type: article
Twitter Meta Tags (8)
- twitter:image: https://substackcdn.com/image/fetch/$s_!XCt4!,w_400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack.com%2Fimg%2Freader%2Fnotes-thumbnail.jpg
- twitter:card: summary
- twitter:label1: Likes
- twitter:data1: 0
- twitter:label2: Replies
Link Tags (17)
- alternate: https://substack.com/@dspyz/note/c-136887136
- apple-touch-icon: https://substackcdn.com/icons/substack/apple-touch-icon.png
- canonical: https://substack.com/@dspyz/note/c-136887136
- icon: https://substackcdn.com/icons/substack/icon.svg
- manifest: /manifest.json
Links (4)
- https://substack.com/@dspyz/note/c-136887136?
- https://substack.com/@dspyz?
- https://substack.com/@dspyz?utm_source=substack-feed-item
- https://substack.com/home?