generativehistory.substack.com/p/introducing-archive-studio/comment/144981227

Preview meta tags from the generativehistory.substack.com website.

Linked Hostnames

3

Search Engine Appearance

Google

https://generativehistory.substack.com/p/introducing-archive-studio/comment/144981227

Mark Humphries on Generative History

Thanks for the comment. We did this in an article in Historical Methods using an earlier version of the software: https://www.tandfonline.com/doi/full/10.1080/01615440.2025.2500309. Transkribus is a great tool but it is also expensive, and our goal is not to create a competitor but an open-source alternative. For context, on a 10,000-word, 50-page English-language 18th- and 19th-century test set using dozens of different hands, out of the box (i.e. without fine-tuning or training), we found Gemini-2.5-pro achieved a WER of 4.89% and a CER of 2.63% (excluding punctuation and capitalization, as both can be ambiguous). On the same test set, the latest Transkribus Titan model achieves 13.2% WER and 6.6% CER. Transkribus also costs around 24 cents per page versus 0.8 cents per page with Gemini-2.5-pro. Transkribus would probably approach and perhaps exceed Gemini’s performance if you fine-tuned it on each hand, but that requires around 50 transcribed pages per hand. So on large datasets, Transkribus might be the best choice (and it might also be much better on non-English sets, we don’t know). But for sets of mixed documents or small sets of documents (or where cost is an issue), Gemini-2.5-pro in the API via a program like Archive Studio offers an alternative.
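For readers unfamiliar with the two metrics, here is a minimal sketch of how WER and CER are commonly computed with punctuation and capitalization excluded. It is not the code used in the article or in Archive Studio: the function names, the normalization rule (lowercasing and stripping ASCII punctuation only) and the cost helper are assumptions made for illustration.

```python
import string


def normalize(text: str) -> str:
    """Lowercase and strip ASCII punctuation (an assumed normalization;
    typographic characters such as curly quotes are left untouched)."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)
    between two sequences, using a rolling row of the DP table."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = normalize(reference).split()
    hyp_words = normalize(hypothesis).split()
    if not ref_words:
        raise ValueError("reference contains no words")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    ref_chars = normalize(reference)
    hyp_chars = normalize(hypothesis)
    if not ref_chars:
        raise ValueError("reference contains no characters")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def total_cost_dollars(pages: int, cents_per_page: float) -> float:
    """Hypothetical helper: total cost of transcribing `pages` pages."""
    return pages * cents_per_page / 100


if __name__ == "__main__":
    truth = "The quick, brown fox."
    guess = "the quick brown box"
    print(f"WER = {wer(truth, guess):.2%}, CER = {cer(truth, guess):.2%}")
    # Per-page prices quoted in the comment, applied to the 50-page test set.
    print(f"Transkribus: ${total_cost_dollars(50, 24):.2f}")
    print(f"Gemini-2.5-pro: ${total_cost_dollars(50, 0.8):.2f}")
```

With the per-page prices quoted above, the 50-page test set works out to roughly $12.00 with Transkribus versus $0.40 with Gemini-2.5-pro, a ratio of about 30 to 1.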




  • General Meta Tags

    16
    • title
      Comments - Introducing Archive Studio
  • Open Graph Meta Tags

    7
    • og:url
      https://generativehistory.substack.com/p/introducing-archive-studio/comment/144981227
    • og:image
      https://substackcdn.com/image/fetch/$s_!hyXt!,f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Fgenerativehistory.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D-967819538%26version%3D9
    • og:type
      article
    • og:title
      Mark Humphries on Generative History
  • Twitter Meta Tags

    8
    • twitter:image
      https://substackcdn.com/image/fetch/$s_!hyXt!,f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Fgenerativehistory.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D-967819538%26version%3D9
    • twitter:card
      summary_large_image
    • twitter:label1
      Likes
    • twitter:data1
      0
    • twitter:label2
      Replies
  • Link Tags

    31
    • alternate
      /feed
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!mYs8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41206e71-78cf-4d63-9de9-69664c2049a2%2Fapple-touch-icon-57x57.png
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!3giA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41206e71-78cf-4d63-9de9-69664c2049a2%2Fapple-touch-icon-60x60.png
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!SHNk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41206e71-78cf-4d63-9de9-69664c2049a2%2Fapple-touch-icon-72x72.png
    • apple-touch-icon
      https://substackcdn.com/image/fetch/$s_!YqRn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41206e71-78cf-4d63-9de9-69664c2049a2%2Fapple-touch-icon-76x76.png

Links

14