importai.substack.com/p/import-ai-405-what-if-the-timelines/comment/103452193
Austin Morrissey on Import AI
In a similar vein, I've been grappling with CBRN risks. Our evals miss the mark and, in turn, understate both capabilities and risk. Here's what's wrong -- and what may work better.

Most evals rely on multiple-choice questions alone. These methods are abstracted away from reality: the questions are written by domain experts, but they assess only whether the LLM produces an answer that overlaps with the experts' model of reality. Unless every wrong answer has been tested empirically -- experimentally, not just conceptually -- the evals will undershoot. Our scientific models are still a work in progress. An LLM's answer can violate our understanding and still work. I've witnessed firsthand how LLMs can propose novel experimental solutions that even senior scientists initially dismiss as improbable; yet, when tested at the bench, they frequently deliver surprisingly effective outcomes. Human experts naturally tend to reject ideas that deviate from established norms. Experimental evidence -- and experimental evidence alone -- is the readout we need.

To properly evaluate these models, to assess their risk and reap their reward, we must move evaluations from mental abstraction to experimental science. Here's how we might do so while keeping safety in mind: by employing fully automated laboratory pipelines capable of plasmid synthesis, restriction digestion, and standardized transfection assays -- using harmless, quantifiable reporter genes like eGFP or FLUC -- we can directly measure how effectively AI guidance improves real biological outcomes. Metrics such as protein expression levels, cellular uptake efficiency, and experimental reproducibility offer concrete, objective evidence of the AI's practical impact. This technique is borrowed from drug discovery, where it's used to evaluate how small changes in drug design affect the results. mRNA is a particularly attractive testing ground for automated risk evals, as we can represent nucleotides in plain text.
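To make the plain-text point concrete: because an mRNA design is just a string over a four-letter alphabet, an eval harness can validate, diff, and log candidate designs like any other text artifact. A minimal sketch in Python (the sequence below is an invented fragment, not a real construct, and the GC-content check is just one cheap sanity metric):

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C nucleotides in a plain-text sequence design."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    if not set(seq) <= set("ACGT"):
        raise ValueError(f"non-nucleotide characters in sequence: {seq!r}")
    return (seq.count("G") + seq.count("C")) / len(seq)

design = "AUGGCCACCGGU"  # hypothetical 12-nt fragment, represented as text
print(f"len={len(design)} GC={gc_content(design):.2f}")
```

Anything a model proposes in this representation can be screened automatically (alphabet, length, composition) before it ever reaches a synthesis queue.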
A huge corpus of sequence knowledge is already in the training data. A motivated Anthropic team could partner with an experienced contract research organization specializing in molecular biology and gene synthesis to implement this. The partnership would provide the automated instruments, reagents, and technical staff needed to execute the study. The evidence we'd produce would be compelling for policymakers -- while also, incidentally, yielding better methods for mRNA therapeutics. If you made it this far and think the idea may have merit, I just applied to your biosecurity red team. :-)
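The analysis side of the proposed eval could be equally simple: compare reporter-gene readouts from AI-guided designs against a human baseline, discarding irreproducible runs. A sketch under stated assumptions -- the field names (`expression`, `replicates_agree`) and all numbers are invented for illustration, not real assay data:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class AssayResult:
    design_id: str
    expression: float       # e.g. normalized reporter luminescence (FLUC)
    replicates_agree: bool  # crude reproducibility flag across replicates

def uplift(ai: list[AssayResult], baseline: list[AssayResult]) -> float:
    """Mean expression of reproducible AI designs relative to baseline."""
    ai_ok = [r.expression for r in ai if r.replicates_agree]
    base = mean(r.expression for r in baseline)
    return mean(ai_ok) / base if ai_ok else 0.0

ai_runs = [AssayResult("ai-1", 2.4, True), AssayResult("ai-2", 1.8, True),
           AssayResult("ai-3", 5.0, False)]  # irreproducible run, excluded
human_runs = [AssayResult("h-1", 1.0, True), AssayResult("h-2", 1.2, True)]
print(f"uplift={uplift(ai_runs, human_runs):.2f}x")
```

The single uplift number is exactly the kind of concrete, experiment-grounded evidence a policymaker can act on, in contrast to a multiple-choice accuracy score.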