redwoodresearch.substack.com/p/comparing-risk-from-internally-deployed/comment/128911804

Preview meta tags from the redwoodresearch.substack.com website.

Linked Hostnames

2

Search Engine Appearance

Google

https://redwoodresearch.substack.com/p/comparing-risk-from-internally-deployed/comment/128911804

Amelia Frank on Redwood Research blog

When it comes to "insider threats," I think there is a lack of oversight where it concerns automated TEVV or post-training fine-tuning for safety using task-specific AI models or agents. A hypothetical scenario in which unaligned AI agents engage in recursion through sabotaging monitoring schemes could be catastrophic. In addition, emergent behaviors and increased situational awareness in models could further trigger incentives for deception and hidden objectives. For these problems, I find it hard to cross-apply existing cybersecurity measures or traditional monitoring.




  • General Meta Tags

    16
    • title
      Comments - Comparing risk from internally-deployed AI to insider and outsider threats from humans
  • Open Graph Meta Tags

    7
    • og:url
      https://redwoodresearch.substack.com/p/comparing-risk-from-internally-deployed/comment/128911804
    • og:image
      https://substackcdn.com/image/fetch/$s_!0h0E!,f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Fredwoodresearch.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D1467347670%26version%3D9
    • og:type
      article
    • og:title
      Amelia Frank on Redwood Research blog
    • og:description
      When it comes to "insider threats," I think there is a lack of oversight where it concerns automated TEVV or post-training fine-tuning for safety using task-specific AI models or agents. A hypothetical scenario in which unaligned AI agents engage in recursion through sabotaging monitoring schemes could be catastrophic. In addition, emergent behaviors and increased situational awareness in models could further trigger incentives for deception and hidden objectives. For these problems, I find it hard to cross-apply existing cybersecurity measures or traditional monitoring.
  • Twitter Meta Tags

    8
    • twitter:image
      https://substackcdn.com/image/fetch/$s_!0h0E!,f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Fredwoodresearch.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D1467347670%26version%3D9
    • twitter:card
      summary_large_image
    • twitter:label1
      Likes
    • twitter:data1
      0
    • twitter:label2
      Replies
  • Link Tags

    33
    • alternate
      /feed
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!dXu3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d382275-365e-4d62-bf76-f59fd0592028%2Fapple-touch-icon-57x57.png
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!yqWx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d382275-365e-4d62-bf76-f59fd0592028%2Fapple-touch-icon-60x60.png
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!hPZ0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d382275-365e-4d62-bf76-f59fd0592028%2Fapple-touch-icon-72x72.png
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!U-0e!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d382275-365e-4d62-bf76-f59fd0592028%2Fapple-touch-icon-76x76.png

Links

13