importai.substack.com/p/import-ai-405-what-if-the-timelines/comment/103452193

Preview meta tags from the importai.substack.com website.

Linked Hostnames

2

Thumbnail

Search Engine Appearance

Google

https://importai.substack.com/p/import-ai-405-what-if-the-timelines/comment/103452193

Austin Morrissey on Import AI

In a similar vein, I’ve been grappling with CBRN risks. Our evals miss the mark and, in turn, understate capabilities and risk. Here’s what’s wrong -- and what may work better.

Most evals use multiple-choice questions alone. These methods are abstracted away from reality. The questions are written by domain experts, but they assess only whether the LLM produces an answer that overlaps with the experts’ model of reality. Unless they’ve tested every wrong answer empirically -- not conceptually but experimentally -- they will undershoot. Our scientific models are still a work in progress. The LLM’s answer can violate our understanding -- and still work. I’ve witnessed firsthand how LLMs can propose novel experimental solutions that even senior scientists initially dismiss as improbable. Yet, when tested at the bench, they frequently deliver surprisingly effective outcomes. Human experts naturally tend to reject ideas that deviate from established norms. Experimental evidence -- and experimental evidence alone -- is the readout we need.

To properly evaluate these models, to assess their risk and reap their reward, we must bring evaluations from mental abstraction to experimental science. Here’s how we might do so, while keeping safety in mind: by employing fully automated laboratory pipelines capable of plasmid synthesis, restriction digestion, and standardized transfection assays -- using harmless, quantifiable reporter genes like eGFP or FLUC -- we can directly measure how effectively AI guidance improves real biological outcomes. Metrics such as protein expression levels, cellular uptake efficiency, and experimental reproducibility offer concrete, objective evidence of the AI’s practical impact. This technique is co-opted from drug discovery, where it’s used to evaluate how small changes in drug design affect the results. mRNA is a particularly attractive testing ground for automated risk evals, as we can represent nucleotides in plain text, and a huge corpus of sequence knowledge is already within the training data.

A motivated Anthropic team could partner with an experienced contract research organization specializing in molecular biology and gene synthesis to implement this. The partnership would provide the automated instruments, reagents, and technical staff needed to execute the study. The evidence we’d produce would be compelling for policymakers -- while also incidentally yielding better methods for mRNA therapeutics. If you made it this far and think the idea may have merit, I just applied to your biosecurity red team. :-)
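To make the shape of the proposed eval concrete, here is a minimal Python sketch of the closed-loop harness the comment describes. Everything named here is hypothetical: `lab` stands in for whatever automation interface a partner CRO would expose, and `run_transfection_assay` and `propose_mrna_design` are invented placeholders. Only the loop structure -- the model proposes a plain-text sequence, the automated pipeline synthesizes and transfects it, the reporter readout is measured, and uplift is scored against an expert baseline -- follows the proposal above.

from dataclasses import dataclass
from statistics import mean, stdev

@dataclass
class AssayResult:
    """Readouts from one automated transfection run (hypothetical shape)."""
    reporter_expression: float  # e.g. normalized eGFP fluorescence or FLUC signal
    uptake_efficiency: float    # fraction of cells expressing the reporter

def score_candidate(sequence: str, lab, replicates: int = 3) -> dict:
    """Synthesize, transfect, and measure one model-proposed mRNA design.

    `sequence` is the plain-text nucleotide string proposed by the model;
    `lab` is a stand-in for a CRO automation client.
    """
    assert set(sequence) <= set("ACGU"), "mRNA sequences use A, C, G, U"
    results = [lab.run_transfection_assay(sequence) for _ in range(replicates)]
    expressions = [r.reporter_expression for r in results]
    return {
        "mean_expression": mean(expressions),
        # Reproducibility: low spread across replicates = trustworthy signal.
        "expression_cv": stdev(expressions) / mean(expressions),
        "mean_uptake": mean(r.uptake_efficiency for r in results),
    }

def eval_model(model, lab, baseline_sequence: str, n_candidates: int = 10) -> float:
    """Uplift eval: does model guidance beat an expert baseline design?"""
    baseline = score_candidate(baseline_sequence, lab)
    candidates = [
        model.propose_mrna_design(objective="maximize eGFP expression")
        for _ in range(n_candidates)
    ]
    best = max(score_candidate(seq, lab)["mean_expression"] for seq in candidates)
    # Fold-change over baseline: values above 1.0 indicate measurable uplift.
    return best / baseline["mean_expression"]

A real pilot would replace the bare fold-change readout with pre-registered statistics, and -- in line with the comment’s safety framing -- keep every candidate sequence behind standard gene-synthesis biosecurity screening before anything reaches the bench.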



Bing

Austin Morrissey on Import AI

https://importai.substack.com/p/import-ai-405-what-if-the-timelines/comment/103452193




DuckDuckGo

https://importai.substack.com/p/import-ai-405-what-if-the-timelines/comment/103452193

Austin Morrissey on Import AI


  • General Meta Tags

    15
    • title
      Comments - Import AI 405: What if the timelines are correct?
  • Open Graph Meta Tags

    7
    • og:url
      https://importai.substack.com/p/import-ai-405-what-if-the-timelines/comment/103452193
    • og:image
      https://substackcdn.com/image/fetch/$s_!Ldcu!,f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Fimportai.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D341972494%26version%3D9
    • og:type
      article
    • og:title
      Austin Morrissey on Import AI
    • og:description
      (same text as the comment shown above)
  • Twitter Meta Tags

    8
    • twitter:image
      https://substackcdn.com/image/fetch/$s_!Ldcu!,f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Fimportai.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D341972494%26version%3D9
    • twitter:card
      summary_large_image
    • twitter:label1
      Likes
    • twitter:data1
      0
    • twitter:label2
      Replies
  • Link Tags

    31
    • alternate
      /feed
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!kjto!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031520ec-8765-4d1b-a233-67c6fdb6258a%2Fapple-touch-icon-57x57.png
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!vVit!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031520ec-8765-4d1b-a233-67c6fdb6258a%2Fapple-touch-icon-60x60.png
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!JFQS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031520ec-8765-4d1b-a233-67c6fdb6258a%2Fapple-touch-icon-72x72.png
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!4f9P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031520ec-8765-4d1b-a233-67c6fdb6258a%2Fapple-touch-icon-76x76.png

Links

13