Update tutorial.rst
maximtrp authored Jul 24, 2021
1 parent 644585c commit 5206207
Showing 1 changed file with 6 additions and 2 deletions.
8 changes: 6 additions & 2 deletions docs/source/tutorial.rst
@@ -20,7 +20,11 @@ stage: cleaned, lemmatized or stemmed your documents, and removed stop words.
texts = df['texts'].str.strip().tolist()
# Vectorizing documents, obtaining full vocabulary and biterms
X, vocabulary, vocab_dict = btm.get_words_freqs(texts)
# Internally, btm.get_words_freqs uses CountVectorizer from sklearn,
# so you can pass any CountVectorizer argument to btm.get_words_freqs.
# For example, to remove stop words:
stop_words = ["word1", "word2", "word3"]
X, vocabulary, vocab_dict = btm.get_words_freqs(texts, stop_words=stop_words)
docs_vec = btm.get_vectorized_docs(texts, vocabulary)
biterms = btm.get_biterms(docs_vec)
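
To make the effect of `stop_words` on the later steps concrete, here is a minimal standard-library sketch of the same pipeline shape: build a vocabulary, map documents to word ids, and list biterms (unordered word pairs that co-occur in one document). It is an illustration only, not bitermplus's or scikit-learn's implementation; the helper names `tokenize`, `build_vocabulary`, `vectorize`, and `doc_biterms` are hypothetical, and the whitespace tokenizer is an assumption that is simpler than what `CountVectorizer` does.

```python
from itertools import combinations


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_vocabulary(texts, stop_words=()):
    # Sorted unique tokens across all documents, with stop words excluded.
    stop = set(stop_words)
    return sorted({tok for text in texts for tok in tokenize(text) if tok not in stop})


def vectorize(texts, vocabulary):
    # Replace each in-vocabulary token with its vocabulary index;
    # tokens outside the vocabulary (e.g. stop words) are dropped.
    index = {word: i for i, word in enumerate(vocabulary)}
    return [[index[tok] for tok in tokenize(text) if tok in index] for text in texts]


def doc_biterms(doc_ids):
    # All unordered pairs of word ids within one document.
    return list(combinations(doc_ids, 2))


texts = ["the cat sat", "the dog sat down"]
vocab = build_vocabulary(texts, stop_words=["the"])
docs = vectorize(texts, vocab)
pairs = [doc_biterms(doc) for doc in docs]

print(vocab)
print(docs)
print(pairs)
```

Because "the" is removed before the vocabulary is built, it never receives an id and therefore never appears in any biterm, which is why stop-word removal belongs at the vectorizing step rather than after it.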
@@ -149,4 +153,4 @@ References
.. [3] Greene, D., O’Callaghan, D., & Cunningham, P. (2014, September). How many
topics? stability analysis for topic models. In Joint European conference on
machine learning and knowledge discovery in databases (pp. 498-513). Springer,
Berlin, Heidelberg.
