redwoodresearch.substack.com/p/comparing-risk-from-internally-deployed/comment/128911804

Preview meta tags from the redwoodresearch.substack.com website.

Search Engine Appearance

Google

https://redwoodresearch.substack.com/p/comparing-risk-from-internally-deployed/comment/128911804

Amelia Frank on Redwood Research blog

When it comes to "insider threats," I think there is a lack of oversight of automated TEVV (test, evaluation, verification, and validation) and of post-training safety fine-tuning carried out by task-specific AI models or agents. A hypothetical scenario in which unaligned AI agents recursively sabotage the monitoring schemes meant to oversee them could be catastrophic. In addition, emergent behaviors and increased situational awareness in models could create further incentives for deception and the pursuit of hidden objectives. I find it hard to see how existing cybersecurity measures or traditional monitoring could be carried over to these problems.

General Meta Tags
- title
  Comments - Comparing risk from internally-deployed AI to insider and outsider threats from humans
Open Graph Meta Tags
- og:url
  https://redwoodresearch.substack.com/p/comparing-risk-from-internally-deployed/comment/128911804
- og:image
  https://substackcdn.com/image/fetch/$s_!0h0E!,f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Fredwoodresearch.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D1467347670%26version%3D9
- og:type
  article
- og:title
  Amelia Frank on Redwood Research blog
Twitter Meta Tags
- twitter:image
  https://substackcdn.com/image/fetch/$s_!0h0E!,f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Fredwoodresearch.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D1467347670%26version%3D9
- twitter:card
  summary_large_image
- twitter:label1
  Likes
- twitter:data1
  0
- twitter:label2
  Replies
Link Tags
- alternate
  /feed
- apple-touch-icon
  https://substackcdn.com/image/fetch/$s_!dXu3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d382275-365e-4d62-bf76-f59fd0592028%2Fapple-touch-icon-57x57.png
- apple-touch-icon
  https://substackcdn.com/image/fetch/$s_!yqWx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d382275-365e-4d62-bf76-f59fd0592028%2Fapple-touch-icon-60x60.png
- apple-touch-icon
  https://substackcdn.com/image/fetch/$s_!hPZ0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d382275-365e-4d62-bf76-f59fd0592028%2Fapple-touch-icon-72x72.png
- apple-touch-icon
  https://substackcdn.com/image/fetch/$s_!U-0e!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d382275-365e-4d62-bf76-f59fd0592028%2Fapple-touch-icon-76x76.png
