Releases: mesolitica/malaya
Version 1.5
- Added a way to list the deep learning models available for stemming; simply call malaya.stem.available_deep_model().
- Added a way to load a deep learning model for stemming; simply call malaya.stem.deep_model(), https://malaya.readthedocs.io/en/latest/Stemmer.html
- Improved dependency parsing documentation, https://malaya.readthedocs.io/en/latest/Dependency.html#dependency-graph-object
Version 1.4
- Retrained Entities recognition models.
- Retrained POS recognition models.
- Able to print important features from deep entities recognition models; simply call model.print_features().
- Able to print important transitions from deep entities recognition models; simply call model.print_transitions().
- Able to print important features from deep POS recognition models; simply call model.print_features().
- Able to print important transitions from deep POS recognition models; simply call model.print_transitions().
- Released Dependency Parsing features, https://malaya.readthedocs.io/en/latest/Dependency.html
Version 1.3
- Released pretrained Bahasa Malaysia word2vec trained on a Wikipedia dataset; simply call malaya.word2vec.load_wiki(), https://malaya.readthedocs.io/en/latest/Word2vec.html
- Retrained summarization model based on a news dataset; simply call malaya.summarize.deep_model_news()
- Released pretrained summarization model based on a Wikipedia dataset; simply call malaya.summarize.deep_model_wiki()
- Provided an interface to train word2vec on a custom dataset; simply call malaya.word2vec.train(), https://malaya.readthedocs.io/en/latest/Word2vec.html#train-on-custom-corpus
- Provided an interface to train skip-thought on a custom dataset for the summarization agent; simply call malaya.summarize.train_skip_thought(), https://malaya.readthedocs.io/en/latest/Summarization.html#train-skip-thought-summarization-deep-learning-model
Version 1.2
- Released emotion analysis, https://malaya.readthedocs.io/en/latest/Emotion.html
- Added sparse fast-text-char deep learning model for sentiment, emotion, and subjectivity analysis.
Sparse deep learning models
What happens if a word is not included in a model's dictionary? Take setan: what if setan appears in text we want to classify? We found this problem when classifying social media texts and posts, where the words used often fall outside a fixed vocabulary.
Malaya treats unknown words as <UNK>, so to solve this problem we need character-based n-grams. Malaya uses trigrams up to 5-grams. For example, the trigrams of setan are:
setan = ['set', 'eta', 'tan']
Sklearn provides an easy interface for n-grams. The problem is that the result is very sparse, with a lot of zeros, and is not memory efficient. Sklearn returns a sparse matrix for the result; luckily, Tensorflow already provides some sparse functions.
Simply call malaya.sentiment.sparse_deep_model(), malaya.subjective.sparse_deep_model(), or malaya.emotion.sparse_deep_model().
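The character n-gram idea described in this section can be sketched in plain Python. This is only an illustration under stated assumptions, not Malaya's actual implementation: the helper names char_ngrams and to_sparse are hypothetical, and the (indices, values, shape) layout is just one common way to hold a sparse count matrix without storing the zeros.

```python
from collections import Counter


def char_ngrams(word, n_min=3, n_max=5):
    """Return every character n-gram of `word` for n in [n_min, n_max]."""
    grams = []
    for n in range(n_min, n_max + 1):
        # A word shorter than n yields no n-grams of that size.
        for start in range(len(word) - n + 1):
            grams.append(word[start:start + n])
    return grams


def to_sparse(texts, n_min=3, n_max=5):
    """Encode texts as coordinate-style (indices, values, shape) n-gram counts.

    Hypothetical helper: only non-zero counts are stored, so unseen
    n-grams cost no memory.
    """
    vocab = {}  # n-gram -> column id, grown as new n-grams appear
    indices, values = [], []
    for row, text in enumerate(texts):
        counts = Counter(
            gram
            for word in text.split()
            for gram in char_ngrams(word, n_min, n_max)
        )
        for gram, count in sorted(counts.items()):
            col = vocab.setdefault(gram, len(vocab))
            indices.append((row, col))
            values.append(count)
    return indices, values, (len(texts), len(vocab)), vocab


print(char_ngrams("setan", 3, 3))  # ['set', 'eta', 'tan']
print(char_ngrams("setan"))        # trigrams, then 4-grams, then the 5-gram
```

Because setan is broken into pieces such as set, eta, and tan, a model can still see familiar sub-word features even when the whole word was never in its dictionary.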
Version 1.1
- Added deep learning model for language detection; simply call malaya.language_detection.deep_model().
- Retrained language detection models.
Version 1.0
Malaya released its first beta version, V1.0!
- Major housekeeping, old APIs totally replaced by new APIs.
- Added subjectivity analysis, https://malaya.readthedocs.io/en/latest/Subjective.html.
- Added stacking module, https://malaya.readthedocs.io/en/latest/Stack.html.
- Added clustering module, https://malaya.readthedocs.io/en/latest/Cluster.html
- Added visualization for word2vec, https://malaya.readthedocs.io/en/latest/Word2vec.html
- Built a systematic caching system, https://malaya.readthedocs.io/en/latest/Cache.html
Version 0.9
- Added LDA2Vec model for topic modelling.
- Topic-modelling models can now be visualized using pyLDAvis; simply call model.visualize_topics()
- No longer depends on NLTK.
- Added stochastic gradient descent model for language detection; simply call malaya.sgd_detect_languages()
- Retrained language detection models.
Version 0.8
- Sentiment and Toxicity analysis now use naive_stemmer to classify.
- Toxicity analysis now supports ['bahdanau', 'hierarchical', 'luong', 'fast-text', 'entity-network']
- No longer depends on Keras.
- No longer has any CNN-based models because CuDNN is unstable.
- Added entity-network for sentiment and toxicity analysis.
- Added bert for sentiment analysis.
- Generated readthedocs documentation, https://malaya.readthedocs.io/en/latest/
- Housekeeping.
Version 0.7
- Added deep learning summarization using skip-thought vectors; simply call malaya.summarize_deep_learning.
- Added TF-IDF string matching for Topics and Influencers Analysis; simply call malaya.fast_get_topics, malaya.fast_get_influencers
- Added deep learning string matching using skip-thought vectors for Topics and Influencers Analysis; simply call malaya.deep_get_topics, malaya.deep_get_influencers
- Major housekeeping for text_functions.
- Deep learning Part-of-Speech is now case sensitive.
- Retrained malaya word2vec.
- Normalizer now ignores Proper Noun.
- Spelling correction and Normalizer will ignore location.
Version 0.6.0.2
Stable version for 0.6
- Fixed some bugs related to str_idx