doi.org/10.5281/zenodo.10148636
Preview meta tags from the doi.org website.
Linked Hostnames (17)
- 22 links to doi.org
- 10 links to about.zenodo.org
- 4 links to orcid.org
- 3 links to github.com
- 3 links to help.zenodo.org
- 3 links to ror.org
- 2 links to developers.zenodo.org
- 2 links to home.cern
Search Engine Appearance
Tibetan for Spacy 1.1
Tibetan for SpaCy is a language model for Tibetan designed for use in the SpaCy environment. The model was trained using SpaCy. It uses an external tokenizer, Botok, to segment the Tibetan and replaces the Tibetan syllable-separator (tseg) with white spaces where it occurs as a word separator. SpaCy was then told to interpret the input as English. This produces good results with standard vocabulary, but fails with unrecognised words. The project is currently working on a more sophisticated version of Tibetan for SpaCy. The package includes a list of stop words. Tibetan for SpaCy was developed by James Engels as part of Divergent Discourses, a joint project between SOAS University of London and Leipzig University, funded by the AHRC in the UK and the DFG in Germany. The project developed Tibetan for SpaCy particularly for users who want to use Tibetan texts within the Leipzig Corpus Miner (iLCM), an advanced text-mining interface designed for social scientists. The instructions for using Tibetan for SpaCy within the iLCM are in the readme file below. These instructions assume the user has already downloaded and installed the iLCM, which can be found here. Please acknowledge the Divergent Discourses project if using this material (note that ultimately copyright belongs to the two participating universities). Contact: [email protected]
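The preprocessing step described above can be sketched in plain Python. Botok segments the text, then the tseg that ends each word is replaced with a space, so a whitespace-splitting English tokenizer sees one token per word. This is a minimal sketch, not the project's code. It assumes the segmenter has already returned a list of word strings, each possibly ending in a tseg (U+0F0B), and `tseg_to_spaces` is a hypothetical name.

```python
TSEG = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG (་)


def tseg_to_spaces(words: list[str]) -> str:
    """Join pre-segmented Tibetan words with single spaces.

    The trailing tseg of each word (the one acting as a word separator)
    is dropped and a space is used instead; tsegs inside a word, which
    separate its syllables, are left untouched.
    """
    cleaned = [w.strip().rstrip(TSEG) for w in words]
    return " ".join(w for w in cleaned if w)


# Hypothetical segmenter output for a two-word phrase.
words = ["བཀྲ་ཤིས་", "བདེ་ལེགས་"]
print(tseg_to_spaces(words))
```

The resulting string could then be passed to an English spaCy pipeline, for example one created with `spacy.blank("en")`. Its tokenizer splits on whitespace first, which is why this trick yields word-level tokens for vocabulary the segmenter recognises.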
Bing and DuckDuckGo display the same title and description.
General Meta Tags (21)
- title: Tibetan for Spacy 1.1
- charset: utf-8
- X-UA-Compatible: IE=edge
- viewport: width=device-width, initial-scale=1
- google-site-verification: 5fPGCLllnWrvFxH9QWI0l1TadV7byeEvfPcyK2VkS_s
Open Graph Meta Tags (4)
- og:title: Tibetan for Spacy 1.1
- og:description: same text as the description shown under Search Engine Appearance
- og:url: https://zenodo.org/records/10148636
- og:site_name: Zenodo
Twitter Meta Tags (4)
- twitter:card: summary
- twitter:site: @zenodo_org
- twitter:title: Tibetan for Spacy 1.1
- twitter:description: same text as the description shown under Search Engine Appearance
Link Tags (13)
- alternate: https://zenodo.org/records/10148636/files/revised_stopwords.txt
- alternate: https://zenodo.org/records/10148636/files/readme.docx
- alternate: https://zenodo.org/records/10148636/files/tibetan_tib_en_ver1-0.0.1.tar.gz
- apple-touch-icon: /static/apple-touch-icon-120.png
- apple-touch-icon: /static/apple-touch-icon-152.png
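The record's `revised_stopwords.txt` file could be applied as a filter after the tseg-to-space step described in the package description. This is a minimal sketch. It assumes, without confirmation here, that the file is UTF-8 with one stop word per line. `load_stopwords` and `remove_stopwords` are hypothetical helper names. Each entry has any trailing tseg stripped so it matches the space-separated words.

```python
import io
from typing import Iterable, List, Set

TSEG = "\u0f0b"  # Tibetan intersyllabic tseg (་)


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build a stop-word set from lines (e.g. an open text file).

    Assumes one entry per line. Surrounding whitespace and a trailing
    tseg are removed, and entries that end up empty are skipped.
    """
    entries = (line.strip().rstrip(TSEG) for line in lines)
    return {e for e in entries if e}


def remove_stopwords(text: str, stopwords: Set[str]) -> List[str]:
    """Split space-separated Tibetan text and drop any stop words."""
    return [w for w in text.split() if w not in stopwords]


# In-memory stand-in for revised_stopwords.txt (illustrative entries only).
# With the real file: open("revised_stopwords.txt", encoding="utf-8").
sample_file = io.StringIO("ནི་\nདང་\n\n")
stops = load_stopwords(sample_file)
print(remove_stopwords("བཀྲ་ཤིས ནི བདེ་ལེགས", stops))
```

Normalising the trailing tseg on both sides keeps the comparison independent of whether the list's entries were written with or without it.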
Links (59)
- https://about.zenodo.org
- https://about.zenodo.org/contact
- https://about.zenodo.org/cookie-policy
- https://about.zenodo.org/infrastructure
- https://about.zenodo.org/policies