gensim
{{Short description|Vector space modeling and topic modeling toolkit}}
{{Confuse|Genshin Impact}}
{{Infobox software
| name = Gensim
| logo = Gensim logo.png
| screenshot =
| caption =
| collapsible =
| author = Radim Řehůřek
| developer = RARE Technologies Ltd.
| repo = {{URL|https://github.com/RaRe-Technologies/gensim}}
| released = 2009
| latest release version = {{wikidata|property|reference|P348}}
| latest release date = {{start date and age|{{wikidata|qualifier|P348|P577}}}}
| latest preview version =
| latest preview date =
| programming language = Python
| operating system = Linux, Windows, macOS
| genre = Information retrieval
| license = LGPL
| website = {{URL|https://radimrehurek.com/gensim/}}
}}
Gensim is an open-source library for unsupervised topic modeling, document indexing, retrieval by similarity, and other natural language processing tasks, based on statistical machine learning.
Gensim is implemented in Python, with Cython used for performance. It is designed to handle large text collections using data streaming and incremental online algorithms, which differentiates it from most other machine learning software packages, which target only in-memory processing.
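The streaming and incremental design described above can be illustrated with a minimal sketch in plain Python. It does not use Gensim's API: `stream_documents` and `OnlineDocumentFrequency` are hypothetical names chosen for this example, and the three-line corpus is made up. The point is only that each document is read, used to update running statistics, and then discarded, so memory use does not grow with the number of documents.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def stream_documents(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one tokenized document at a time instead of loading them all."""
    for line in lines:
        tokens = line.lower().split()
        if tokens:
            yield tokens


class OnlineDocumentFrequency:
    """Hypothetical helper: document frequencies updated one document at a time."""

    def __init__(self) -> None:
        self.num_docs = 0
        self.doc_freq: Counter = Counter()

    def update(self, document: List[str]) -> None:
        # Each document is seen once and can be discarded afterwards.
        self.num_docs += 1
        self.doc_freq.update(set(document))


# Illustrative data; `lines` could equally be an open file handle,
# which Python reads lazily line by line.
lines = [
    "human machine interface",
    "graph of trees",
    "graph minors survey",
]

stats = OnlineDocumentFrequency()
for doc in stream_documents(lines):
    stats.update(doc)

print(stats.num_docs, stats.doc_freq["graph"])
```

Because `stream_documents` is a generator, the same loop works unchanged whether the input holds three lines or does not fit in memory.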
Main features
Gensim includes streamed parallelized implementations of fastText,[https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/Any2Vec_Filebased.ipynb Scalable *2vec training] word2vec and doc2vec algorithms,[http://radimrehurek.com/2013/09/deep-learning-with-word2vec-and-gensim/ Deep learning with word2vec and Gensim] as well as latent semantic analysis (LSA, LSI, SVD), non-negative matrix factorization (NMF), latent Dirichlet allocation (LDA), tf-idf and random projections. Radim Řehůřek and Petr Sojka (2010). [http://is.muni.cz/publication/884893/en Software framework for topic modelling with large corpora]. Proc. LREC Workshop on New Challenges for NLP Frameworks.
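As an illustration of one of the techniques listed above, the following sketch computes tf-idf weights and ranks documents by cosine similarity in plain Python. It is not Gensim's implementation and does not call its API. The weighting used here (term count multiplied by the base-2 logarithm of N/df, followed by scaling to unit length) is one common variant and is an assumption of this example, as are the function names and the toy corpus.

```python
import math
from collections import Counter
from typing import Dict, List

# Toy corpus, already tokenized (illustrative data only).
documents = [
    ["graph", "of", "trees"],
    ["graph", "minors", "survey"],
    ["human", "machine", "interface"],
]

num_docs = len(documents)
# Number of documents in which each term occurs.
doc_freq = Counter(term for doc in documents for term in set(doc))


def tfidf(doc: List[str]) -> Dict[str, float]:
    """Weight each known term by count * log2(N / df), then scale to unit length."""
    counts = Counter(doc)
    weights = {
        term: count * math.log2(num_docs / doc_freq[term])
        for term, count in counts.items()
        if term in doc_freq  # terms unseen in the corpus are ignored
    }
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()} if norm else {}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Dot product of two unit-length sparse vectors."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


vectors = [tfidf(doc) for doc in documents]
query = tfidf(["graph", "trees"])
scores = [cosine(query, vec) for vec in vectors]
best = max(range(num_docs), key=scores.__getitem__)

print(best, [round(s, 3) for s in scores])
```

The query shares two terms with the first document and only the common term "graph" with the second, so the first document ranks highest and the third, which shares no terms, scores zero.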
Some of the novel online algorithms in Gensim were also published in the 2011 PhD dissertation Scalability of Semantic Analysis in Natural Language Processing by Radim Řehůřek, the creator of Gensim.{{cite web |url=http://radimrehurek.com/phd_rehurek.pdf |title=Scalability of Semantic Analysis in Natural Language Processing |last1=Řehůřek |first1=Radim |date=2011 |publisher= |accessdate=27 January 2015 |quote= my open-source gensim software package that accompanies this thesis }}
Uses of Gensim
The Gensim library had been used and cited in over 1,400 commercial and academic applications as of 2018,[https://scholar.google.com/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:NaGl4SEjCO4C Gensim academic citations] in disciplines ranging from medicine to insurance claim analysis to patent search.[https://github.com/RaRe-Technologies/gensim#adopters Commercial adopters of Gensim] The software has been covered in several news articles, podcasts and interviews.[https://www.podcastinit.com/episode-71-gensim-with-radim-rehurek/ Podcast.__init__ episode #71 on Gensim][http://williamjohnbert.com/2012/04/interview-with-radim-rehurek-creator-of-gensim/ Interview with Radim Řehůřek, creator of Gensim]{{Cite web|url=http://decisionstats.com/2015/12/07/decisionstats-interview-radim-rehurek-gensim-python/|title = DecisionStats Interview Radim Řehůřek Gensim #python|date = 8 December 2015}}
Free and commercial support
The open-source code is developed and hosted on GitHub,[https://github.com/rare-technologies/gensim Gensim source code on GitHub] and public support forums are maintained on Google Groups[https://groups.google.com/group/gensim Gensim mailing list on Google Groups] and Gitter.[https://gitter.im/RaRe-Technologies/gensim Gensim chat room on Gitter]
Gensim is commercially supported by RARE Technologies Ltd., which also provides student mentorships and academic thesis projects for Gensim through its Student Incubator programme.[https://rare-technologies.com/incubator/ Gensim open source Incubator]
References
{{Reflist}}
External links
- {{Official website|https://radimrehurek.com/gensim/}}
Category:Free science software
Category:Natural language processing toolkits
Category:Python (programming language) libraries
{{science-software-stub}}