Releases: mesolitica/malaya

Version 1.5

01 Feb 17:32
  1. Added a way to check which deep learning models are available for stemming, simply malaya.stem.available_deep_model().
  2. Added a way to load a deep learning model for stemming, simply malaya.stem.deep_model(), https://malaya.readthedocs.io/en/latest/Stemmer.html
  3. Improved dependency parsing documentation, https://malaya.readthedocs.io/en/latest/Dependency.html#dependency-graph-object

Version 1.4

20 Jan 12:57
  1. Retrained entity recognition models.
  2. Retrained POS recognition models.
  3. Deep entity recognition models can now print important features, simply model.print_features().
  4. Deep entity recognition models can now print important transitions, simply model.print_transitions().
  5. Deep POS recognition models can now print important features, simply model.print_features().
  6. Deep POS recognition models can now print important transitions, simply model.print_transitions().
  7. Released dependency parsing features, https://malaya.readthedocs.io/en/latest/Dependency.html.

Version 1.3

16 Jan 08:13
  1. Released pretrained Bahasa Malaysia word2vec trained on a Wikipedia dataset, simply malaya.word2vec.load_wiki(), https://malaya.readthedocs.io/en/latest/Word2vec.html
  2. Retrained the summarization model based on a news dataset, simply malaya.summarize.deep_model_news()
  3. Released a pretrained summarization model based on a Wikipedia dataset, simply malaya.summarize.deep_model_wiki()
  4. Provided an interface to train word2vec on a custom dataset, simply malaya.word2vec.train(), https://malaya.readthedocs.io/en/latest/Word2vec.html#train-on-custom-corpus
  5. Provided an interface to train skip-thought on a custom dataset for the summarization agent, simply malaya.summarize.train_skip_thought(), https://malaya.readthedocs.io/en/latest/Summarization.html#train-skip-thought-summarization-deep-learning-model

Version 1.2

06 Jan 10:08
  1. Released emotion analysis, https://malaya.readthedocs.io/en/latest/Emotion.html
  2. Added sparse fast-text-char deep learning model for sentiment, emotion, and subjectivity analysis.

Sparse deep learning models

What happens if a word is not included in the models' dictionary? For example, what if setan appears in text we want to classify? We found this problem when classifying social media texts and posts, where the words used often fall outside any fixed vocabulary.

Malaya treats unknown words as <UNK>. To solve this problem, we need to use character-based n-grams. Malaya uses trigrams up to 5-grams.

For example, the trigrams of setan are:

setan = ['set', 'eta', 'tan']

Sklearn provides an easy interface for n-grams. The problem is that the output is very sparse: it is mostly zeros, which is not memory efficient if stored densely. Sklearn returns the result as a sparse matrix, and luckily TensorFlow already provides some sparse functions.

Simply call malaya.sentiment.sparse_deep_model(), malaya.subjective.sparse_deep_model(), or malaya.emotion.sparse_deep_model().
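
The character n-gram idea described above can be sketched in plain Python. This is only an illustration, not Malaya's implementation: the helper names (char_ngrams, build_vocabulary, sparse_vector) and the three training words are hypothetical, and the sparse vector is modelled as a dictionary that stores only the non-zero columns.

```python
from collections import Counter


def char_ngrams(word, n_min=3, n_max=5):
    """Return all character n-grams of `word` for n_min <= n <= n_max."""
    return [
        word[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(word) - n + 1)
    ]


def build_vocabulary(words, n_min=3, n_max=5):
    """Map every n-gram seen in `words` to a column index."""
    grams = sorted({g for w in words for g in char_ngrams(w, n_min, n_max)})
    return {g: i for i, g in enumerate(grams)}


def sparse_vector(word, vocabulary, n_min=3, n_max=5):
    """Encode `word` as {column index: count}, keeping only non-zero entries."""
    counts = Counter(
        g for g in char_ngrams(word, n_min, n_max) if g in vocabulary
    )
    return {vocabulary[g]: c for g, c in counts.items()}


# Trigrams only, matching the example in the text.
trigrams = char_ngrams("setan", 3, 3)
print(trigrams)  # ['set', 'eta', 'tan']

# Hypothetical training words; "setan" itself is not among them.
vocabulary = build_vocabulary(["syaitan", "setia", "makan"])

# The unseen word still activates the n-grams it shares with known words,
# instead of collapsing to a single <UNK> token.
vector = sparse_vector("setan", vocabulary)
print(len(vector), "non-zero columns out of", len(vocabulary))
```

Because only a handful of columns are non-zero for any given word, storing index and count pairs is far cheaper than a dense row, which is the same reason a sparse matrix is used for the real features.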

Version 1.1

30 Dec 04:27
  1. Added a deep learning model for language detection, simply call malaya.language_detection.deep_model().
  2. Retrained language detection models.

Version 1.0

25 Dec 15:29

Malaya released its first beta version, V1.0!

  1. Major housekeeping, old APIs are completely replaced by new APIs.
  2. Added subjectivity analysis, https://malaya.readthedocs.io/en/latest/Subjective.html.
  3. Added stacking module, https://malaya.readthedocs.io/en/latest/Stack.html.
  4. Added clustering module, https://malaya.readthedocs.io/en/latest/Cluster.html
  5. Added visualization for word2vec, https://malaya.readthedocs.io/en/latest/Word2vec.html
  6. Built a systematic caching system, https://malaya.readthedocs.io/en/latest/Cache.html

Version 0.9

19 Dec 06:01
  1. Added LDA2Vec model for topic modelling.
  2. Topic modelling models can now be visualized using pyLDAvis, simply model.visualize_topics()
  3. No longer depends on NLTK.
  4. Added stochastic gradient descent model for language detection, simply malaya.sgd_detect_languages()
  5. Retrained language detection models.

Version 0.8

09 Dec 15:13
  1. Sentiment and toxicity analysis now use naive_stemmer when classifying.
  2. Toxicity analysis now supports ['bahdanau', 'hierarchical', 'luong', 'fast-text', 'entity-network']
  3. No longer depends on Keras.
  4. Removed all CNN-based models because CuDNN is unstable.
  5. Added entity-network for sentiment and toxicity analysis.
  6. Added BERT for sentiment analysis.
  7. Generated readthedocs documentation, https://malaya.readthedocs.io/en/latest/
  8. Housekeeping.

Version 0.7

27 Nov 10:28
  1. Added deep learning summarization using skip-thought vectors, simply call malaya.summarize_deep_learning.
  2. Added TF-IDF string matching for Topics and Influencers Analysis, simply call malaya.fast_get_topics, malaya.fast_get_influencers
  3. Added deep learning string matching using skip-thought vectors for Topics and Influencers Analysis, simply call malaya.deep_get_topics, malaya.deep_get_influencers
  4. Major housekeeping for text_functions.
  5. Deep learning Part-of-Speech is now case sensitive.
  6. Retrained malaya word2vec.
  7. Normalizer now ignores proper nouns.
  8. Spelling correction and Normalizer now ignore locations.

Version 0.6.0.2

07 Oct 08:57

Stable version for 0.6

  1. Fixed some bugs related to str_idx