bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04873-x

Preview meta tags from the bmcbioinformatics.biomedcentral.com website.

Search Engine Appearance

Google

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04873-x

TMbed: transmembrane proteins predicted through language model embeddings - BMC Bioinformatics

Despite the immense importance of transmembrane proteins (TMPs) for molecular biology and medicine, experimental 3D structures for TMPs remain about 4–5 times underrepresented compared to non-TMPs. Today’s top methods such as AlphaFold2 accurately predict 3D structures for many TMPs, but annotating transmembrane regions remains a limiting step for proteome-wide predictions. Here, we present TMbed, a novel method that takes embeddings from protein Language Models (pLMs, here ProtT5) as input to predict, for each residue, one of four classes: transmembrane helix (TMH), transmembrane strand (TMB), signal peptide, or other. TMbed completes predictions for entire proteomes within hours on a single consumer-grade desktop machine, at performance levels similar to or better than those of methods that use evolutionary information from multiple sequence alignments (MSAs) of protein families. On the per-protein level, TMbed correctly identified 94 ± 8% of the beta barrel TMPs (53 of 57) and 98 ± 1% of the alpha helical TMPs (557 of 571) in a non-redundant data set, at false positive rates well below 1% (erred on 30 of 5654 non-membrane proteins). On the per-segment level, TMbed correctly placed, on average, 9 of 10 transmembrane segments within five residues of the experimental observation. Our method can handle sequences of up to 4200 residues on standard graphics cards used in desktop PCs (e.g., NVIDIA GeForce RTX 3060). Based on embeddings from pLMs and two novel filters (Gaussian and Viterbi), TMbed predicts alpha helical and beta barrel TMPs at least as accurately as any other method but at lower false positive rates. Given the few false positives and its outstanding speed, TMbed might be ideal to sieve through millions of 3D structures soon to be predicted, e.g., by AlphaFold2.
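
The per-protein figures quoted above can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts given in the description; the variable names and the `rate` helper are illustrative and not part of TMbed. The raw ratios need not match the reported percentages exactly, since those are stated with error margins and the estimation procedure is not described here.

```python
# Counts quoted in the description (per-protein level, non-redundant data set).
# Each entry is (correctly identified, total).
counts = {
    "beta barrel TMPs": (53, 57),
    "alpha helical TMPs": (557, 571),
}
# Non-membrane proteins wrongly predicted as TMPs, and their total number.
false_positives, non_membrane = 30, 5654


def rate(hits: int, total: int) -> float:
    """Return hits / total expressed as a percentage."""
    return 100.0 * hits / total


# Fraction of true TMPs that were identified, per class.
recall = {name: rate(hits, total) for name, (hits, total) in counts.items()}
# Fraction of non-membrane proteins that were misclassified.
fpr = rate(false_positives, non_membrane)

for name, value in recall.items():
    print(f"{name}: {value:.1f}% identified")
print(f"false positive rate: {fpr:.2f}%")
```

The false positive rate comes out at roughly 0.53%, consistent with the "well below 1%" stated in the description.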

General Meta Tags
151
- title
  TMbed: transmembrane proteins predicted through language model embeddings | BMC Bioinformatics | Full Text
- charset
  UTF-8
- X-UA-Compatible
  IE=edge
- applicable-device
  pc,mobile
- viewport
  width=device-width, initial-scale=1
Open Graph Meta Tags
6
- og:url
  https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04873-x
- og:type
  article
- og:site_name
  BioMed Central
- og:title
  TMbed: transmembrane proteins predicted through language model embeddings - BMC Bioinformatics
- og:description
  Background: Despite the immense importance of transmembrane proteins (TMPs) for molecular biology and medicine, experimental 3D structures for TMPs remain about 4–5 times underrepresented compared to non-TMPs. Today’s top methods such as AlphaFold2 accurately predict 3D structures for many TMPs, but annotating transmembrane regions remains a limiting step for proteome-wide predictions. Results: Here, we present TMbed, a novel method that takes embeddings from protein Language Models (pLMs, here ProtT5) as input to predict, for each residue, one of four classes: transmembrane helix (TMH), transmembrane strand (TMB), signal peptide, or other. TMbed completes predictions for entire proteomes within hours on a single consumer-grade desktop machine, at performance levels similar to or better than those of methods that use evolutionary information from multiple sequence alignments (MSAs) of protein families. On the per-protein level, TMbed correctly identified 94 ± 8% of the beta barrel TMPs (53 of 57) and 98 ± 1% of the alpha helical TMPs (557 of 571) in a non-redundant data set, at false positive rates well below 1% (erred on 30 of 5654 non-membrane proteins). On the per-segment level, TMbed correctly placed, on average, 9 of 10 transmembrane segments within five residues of the experimental observation. Our method can handle sequences of up to 4200 residues on standard graphics cards used in desktop PCs (e.g., NVIDIA GeForce RTX 3060). Conclusions: Based on embeddings from pLMs and two novel filters (Gaussian and Viterbi), TMbed predicts alpha helical and beta barrel TMPs at least as accurately as any other method but at lower false positive rates. Given the few false positives and its outstanding speed, TMbed might be ideal to sieve through millions of 3D structures soon to be predicted, e.g., by AlphaFold2.
Link Tags
12
- apple-touch-icon
  /static/img/favicons/bmc/apple-touch-icon-582ef1d0f5.png
- canonical
  https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04873-x
- icon
  /static/img/favicons/bmc/android-chrome-192x192-9625b7cdba.png
- icon
  /static/img/favicons/bmc/favicon-32x32-5d7879efe1.png
- icon
  /static/img/favicons/bmc/favicon-16x16-c241ac1a2f.png
