Rust Keyword Extraction

Introduction

This is a simple NLP library with a list of unsupervised keyword extraction algorithms:

- Tokenizer for tokenizing text;
- TF-IDF for calculating the importance of a word in one or more documents;
- Co-occurrence for calculating relationships between words within a specific window size;
- RAKE for extracting key phrases from a document;
- TextRank for extracting keywords and key phrases from a document;
- YAKE for extracting keywords with an n-gram size (defaults to 3) from a document.
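To make the Tokenizer's job concrete, here is a minimal, std-only sketch of tokenizing with stop-word and punctuation filtering. It is not the crate's Tokenizer. The function name `tokenize` and its rules (split on whitespace, trim the given punctuation from word edges, lowercase, drop stop words) are assumptions chosen for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical tokenizer sketch (not the crate's `Tokenizer`): splits on
/// whitespace, trims punctuation characters from both ends of each word,
/// lowercases it, and drops empty tokens and stop words.
pub fn tokenize(text: &str, stop_words: &[String], punctuation: &[String]) -> Vec<String> {
    let stops: HashSet<String> = stop_words.iter().map(|w| w.to_lowercase()).collect();
    // Every character of every punctuation entry is treated as punctuation.
    let punct: HashSet<char> = punctuation.iter().flat_map(|p| p.chars()).collect();

    text.split_whitespace()
        .map(|word| word.trim_matches(|c: char| punct.contains(&c)).to_lowercase())
        .filter(|word| !word.is_empty() && !stops.contains(word))
        .collect()
}
```

For example, `tokenize("This is a test document.", ...)` with the stop words "this", "is", "a" and the punctuation "." yields `["test", "document"]`.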

Algorithms

The full list of the algorithms in this library:

Helper algorithms:
- Tokenizer
- Co-occurrence
Keyword extraction algorithms:
- TF-IDF
- RAKE
- TextRank
- YAKE
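The Co-occurrence helper relates words that appear within a given window of each other. Below is a minimal sketch of window-based counting over already tokenized text. The function name, the sorted-pair `HashMap` key, and counting each pair of positions once are assumptions, not the crate's API.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: counts how often two distinct words appear within
/// `window_size` positions of each other. Pairs are stored in sorted order
/// so that ("a", "b") and ("b", "a") share one counter.
pub fn co_occurrence_counts(tokens: &[String], window_size: usize) -> HashMap<(String, String), usize> {
    let mut counts = HashMap::new();
    for (i, word) in tokens.iter().enumerate() {
        // Look only forward so each pair of positions is counted once.
        let end = (i + window_size + 1).min(tokens.len());
        for other in &tokens[i + 1..end] {
            if other == word {
                continue; // skip self-pairs
            }
            let key = if word < other {
                (word.clone(), other.clone())
            } else {
                (other.clone(), word.clone())
            };
            *counts.entry(key).or_insert(0) += 1;
        }
    }
    counts
}
```

With the tokens `rust keyword extraction rust` and a window of 2, the pair (keyword, rust) is counted twice: once for positions 0 and 1, and once for positions 1 and 3.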

Usage

Add the library to your Cargo.toml:

[dependencies]
keyword_extraction = "1.5.0"

Or use cargo add:

cargo add keyword_extraction

Features

You can enable or disable the following features:

"tf_idf": TF-IDF algorithm;
"rake": RAKE algorithm;
"text_rank": TextRank algorithm;
"yake": YAKE algorithm;
"all": all algorithms and helpers;
"parallel": parallelization of the algorithms with Rayon;
"co_occurrence": Co-occurrence algorithm;

Default features: ["tf_idf", "rake", "text_rank"]. In other words, every keyword extraction algorithm except YAKE is enabled by default, while "yake", "co_occurrence" and "parallel" must be enabled explicitly.

NOTE: the "parallel" feature is only recommended for large documents, as it trades additional memory usage for computation speed.

Examples

For the stop words, you can use the stop-words crate:

[dependencies]
stop-words = "0.8.0"

For example, for English:

use stop_words::{get, LANGUAGE};

fn main() {
    let stop_words = get(LANGUAGE::English);
    let punctuation: Vec<String> = [
        ".", ",", ":", ";", "!", "?", "(", ")", "[", "]", "{", "}", "\"", "'",
    ].iter().map(|s| s.to_string()).collect();
    // ...
}

TF-IDF

Create a TfIdfParams enum value, which can be one of the following variants:

Unprocessed Documents: TfIdfParams::UnprocessedDocuments;
Processed Documents: TfIdfParams::ProcessedDocuments;
Single Unprocessed Document/Text block: TfIdfParams::TextBlock;

use keyword_extraction::tf_idf::{TfIdf, TfIdfParams};

fn main() {
    // ... stop_words & punctuation
    let documents: Vec<String> = vec![
        "This is a test document.".to_string(),
        "This is another test document.".to_string(),
        "This is a third test document.".to_string(),
    ];

    let params = TfIdfParams::UnprocessedDocuments(&documents, &stop_words, Some(&punctuation));

    let tf_idf = TfIdf::new(params);
    let ranked_keywords: Vec<String> = tf_idf.get_ranked_words(10);
    let ranked_keywords_scores: Vec<(String, f32)> = tf_idf.get_ranked_word_scores(10);

    // ...
}
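As a rough guide to what the scores represent, TF-IDF weighs how often a word occurs in a document against how many documents contain it. The sketch below assumes the textbook definitions tf(t, d) = count(t, d) / |d| and idf(t) = ln(N / df(t)), and sums each word's tf-idf across documents. The crate's exact weighting, smoothing, and aggregation are not described here and may differ; the function name `tf_idf_scores` is hypothetical and it takes pre-tokenized documents.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical TF-IDF sketch over pre-tokenized documents.
/// tf(t, d) = count(t in d) / |d|, idf(t) = ln(N / df(t)); a word's final
/// score is the sum of its tf-idf over all documents (an assumed aggregation).
pub fn tf_idf_scores(documents: &[Vec<String>]) -> Vec<(String, f32)> {
    let n_docs = documents.len() as f32;

    // Document frequency: in how many documents each word appears.
    let mut doc_freq: HashMap<&str, f32> = HashMap::new();
    for doc in documents {
        let unique: HashSet<&str> = doc.iter().map(|w| w.as_str()).collect();
        for word in unique {
            *doc_freq.entry(word).or_insert(0.0) += 1.0;
        }
    }

    let mut scores: HashMap<&str, f32> = HashMap::new();
    for doc in documents {
        if doc.is_empty() {
            continue;
        }
        let len = doc.len() as f32;
        // Raw term counts for this document.
        let mut counts: HashMap<&str, f32> = HashMap::new();
        for word in doc {
            *counts.entry(word.as_str()).or_insert(0.0) += 1.0;
        }
        for (word, count) in counts {
            let idf = (n_docs / doc_freq[word]).ln();
            *scores.entry(word).or_insert(0.0) += (count / len) * idf;
        }
    }

    // Highest score first; ties broken alphabetically for a stable order.
    let mut ranked: Vec<(String, f32)> = scores.into_iter().map(|(w, s)| (w.to_string(), s)).collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    ranked
}
```

Under this unsmoothed variant, a word that occurs in every document gets idf = ln(1) = 0. For the three sample documents above, that would zero out every word they all share, such as "test" and "document".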

RAKE

Create a RakeParams enum value, which can be one of the following variants:

With defaults: RakeParams::WithDefaults;
With defaults and phrase length (phrase window size limit): RakeParams::WithDefaultsAndPhraseLength;
All: RakeParams::All;

use keyword_extraction::rake::{Rake, RakeParams};

fn main() {
    // ... stop_words
    let text = r#"
        This is a test document.
        This is another test document.
        This is a third test document.
    "#;

    let rake = Rake::new(RakeParams::WithDefaults(text, &stop_words));
    let ranked_keywords: Vec<String> = rake.get_ranked_words(10);
    let ranked_keywords_scores: Vec<(String, f32)> = rake.get_ranked_word_scores(10);

    // ...
}
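RAKE (Rapid Automatic Keyword Extraction) splits the text into candidate phrases at stop words and punctuation, scores each word as degree(w) / frequency(w) over those candidates, and scores each phrase as the sum of its word scores. The std-only sketch below follows those textbook definitions. Its `max_phrase_len` parameter mirrors the idea behind `WithDefaultsAndPhraseLength`, but dropping over-long phrases is an assumption about how such a limit might work, and the function name `rake_phrases` is hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical RAKE sketch: returns (phrase, score) pairs, best first.
/// Candidate phrases are runs of non-stop words, split at stop words and at
/// the characters `,.;:!?`; phrases longer than `max_phrase_len` are dropped.
pub fn rake_phrases(text: &str, stop_words: &[String], max_phrase_len: usize) -> Vec<(String, f32)> {
    let stops: HashSet<String> = stop_words.iter().map(|w| w.to_lowercase()).collect();

    // 1. Split the text into candidate phrases.
    let mut phrases: Vec<Vec<String>> = Vec::new();
    for fragment in text.split(|c: char| ",.;:!?".contains(c)) {
        let mut current: Vec<String> = Vec::new();
        for word in fragment.split_whitespace().map(str::to_lowercase) {
            if stops.contains(&word) {
                if !current.is_empty() {
                    phrases.push(std::mem::take(&mut current));
                }
            } else {
                current.push(word);
            }
        }
        if !current.is_empty() {
            phrases.push(current);
        }
    }
    phrases.retain(|p| p.len() <= max_phrase_len);

    // 2. Word score = degree(w) / freq(w), where degree(w) sums the lengths
    //    of the candidate phrases w occurs in (w co-occurs with itself too).
    let mut freq: HashMap<&str, f32> = HashMap::new();
    let mut degree: HashMap<&str, f32> = HashMap::new();
    for phrase in &phrases {
        for word in phrase {
            *freq.entry(word.as_str()).or_insert(0.0) += 1.0;
            *degree.entry(word.as_str()).or_insert(0.0) += phrase.len() as f32;
        }
    }

    // 3. Phrase score = sum of its word scores; repeated phrases appear once.
    let mut scored: HashMap<String, f32> = HashMap::new();
    for phrase in &phrases {
        let score: f32 = phrase.iter().map(|w| degree[w.as_str()] / freq[w.as_str()]).sum();
        scored.insert(phrase.join(" "), score);
    }

    let mut ranked: Vec<(String, f32)> = scored.into_iter().collect();
    // Highest score first; ties broken alphabetically for a stable order.
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    ranked
}
```

For instance, for "keyword extraction is useful. keyword extraction with rake is fast" with the stop words "is" and "with", "keyword extraction" scores 4.0 (2.0 per word) and each single-word phrase scores 1.0.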

TextRank

Create a TextRankParams enum value, which can be one of the following variants:

With defaults: TextRankParams::WithDefaults;
With defaults and phrase length (phrase window size limit): TextRankParams::WithDefaultsAndPhraseLength;
All: TextRankParams::All;

use keyword_extraction::text_rank::{TextRank, TextRankParams};

fn main() {
    // ... stop_words
    let text = r#"
        This is a test document.
        This is another test document.
        This is a third test document.
    "#;

    let text_rank = TextRank::new(TextRankParams::WithDefaults(text, &stop_words));
    let ranked_keywords: Vec<String> = text_rank.get_ranked_words(10);
    let ranked_keywords_scores: Vec<(String, f32)> = text_rank.get_ranked_word_scores(10);
}
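TextRank builds an undirected graph whose nodes are words and whose edges link words that co-occur within a window, then iterates a PageRank-style update S(v) = (1 - d) + d * sum over neighbours u of S(u) / deg(u), with d = 0.85 as the commonly used damping factor. The sketch below covers word scores only, over tokens that already had stop words removed. The unweighted edges, the fixed iteration count, and the name `text_rank` are assumptions; the crate may weight edges, test for convergence, or build phrases differently.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical TextRank sketch over tokens with stop words already removed.
/// Words within `window` positions of each other share an unweighted,
/// undirected edge; scores follow S(v) = (1 - d) + d * sum(S(u) / deg(u)).
pub fn text_rank(tokens: &[String], window: usize, damping: f32, iterations: usize) -> Vec<(String, f32)> {
    // Build the adjacency sets of the co-occurrence graph.
    let mut graph: HashMap<&str, HashSet<&str>> = HashMap::new();
    for (i, word) in tokens.iter().enumerate() {
        graph.entry(word.as_str()).or_default();
        let end = (i + window + 1).min(tokens.len());
        for other in &tokens[i + 1..end] {
            if other != word {
                graph.entry(word.as_str()).or_default().insert(other.as_str());
                graph.entry(other.as_str()).or_default().insert(word.as_str());
            }
        }
    }

    // Start every node at 1.0 and apply the update rule a fixed number of times.
    let mut scores: HashMap<&str, f32> = graph.keys().map(|&w| (w, 1.0)).collect();
    for _ in 0..iterations {
        let mut next = HashMap::with_capacity(scores.len());
        for (&node, neighbours) in &graph {
            // Each neighbour u shares its score equally among its deg(u) edges.
            let incoming: f32 = neighbours
                .iter()
                .map(|&u| scores[u] / graph[u].len() as f32)
                .sum();
            next.insert(node, (1.0 - damping) + damping * incoming);
        }
        scores = next;
    }

    // Highest score first; ties broken alphabetically for a stable order.
    let mut ranked: Vec<(String, f32)> = scores.into_iter().map(|(w, s)| (w.to_string(), s)).collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    ranked
}
```

On a three-word chain such as `rust keyword extraction` with a window of 1, the middle word has two neighbours and converges to the highest score (about 1.46, versus about 0.77 for each end).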

YAKE

Create a YakeParams enum value, which can be one of the following variants:

With defaults: YakeParams::WithDefaults;
All: YakeParams::All;

use keyword_extraction::yake::{Yake, YakeParams};

fn main() {
    // ... stop_words
    let text = r#"
        This is a test document.
        This is another test document.
        This is a third test document.
    "#;

    let yake = Yake::new(YakeParams::WithDefaults(text, &stop_words));
    let ranked_keywords: Vec<String> = yake.get_ranked_keywords(10);
    let ranked_keywords_scores: Vec<(String, f32)> = yake.get_ranked_keyword_scores(10);
    // ...
}
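YAKE scores terms from statistical features of the single document, and in YAKE's scoring a lower value means a more relevant term. The sketch below computes only the single-word score H = (T_pos * T_rel) / (T_case + T_freq / T_rel + T_spread / T_rel) using the feature definitions published by the YAKE authors, with simplifications that are assumptions: sentences are split on `.`, `!` and `?`, relatedness looks only at immediate left and right neighbours, and the step that combines word scores into n-gram keywords is omitted. It is not the crate's implementation, and the names `yake_words` and `TermStats` are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Per-term counts gathered in a single pass over the text.
#[derive(Default)]
struct TermStats {
    tf: f32,                     // total occurrences
    tf_upper: f32,               // capitalized occurrences that do not start a sentence
    tf_acronym: f32,             // all-uppercase occurrences (e.g. "NLP")
    sentences: HashSet<usize>,   // 0-based indices of sentences containing the term
    left: HashMap<String, f32>,  // immediate left neighbours and their counts
    right: HashMap<String, f32>, // immediate right neighbours and their counts
}

/// Hypothetical single-word YAKE sketch: (term, score) pairs sorted
/// ascending, because in YAKE a LOWER score means a MORE relevant term.
pub fn yake_words(text: &str, stop_words: &[String]) -> Vec<(String, f32)> {
    let stops: HashSet<String> = stop_words.iter().map(|w| w.to_lowercase()).collect();
    // Naive sentence split; words are trimmed of non-alphanumeric edges.
    let sentences: Vec<Vec<&str>> = text
        .split(|c: char| ".!?".contains(c))
        .map(|s| {
            s.split_whitespace()
                .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()))
                .filter(|w| !w.is_empty())
                .collect::<Vec<&str>>()
        })
        .filter(|s| !s.is_empty())
        .collect();

    let mut stats: HashMap<String, TermStats> = HashMap::new();
    for (s_idx, sentence) in sentences.iter().enumerate() {
        for (i, raw) in sentence.iter().enumerate() {
            let term = raw.to_lowercase();
            if stops.contains(&term) {
                continue; // stop words serve only as context, never as candidates
            }
            let st = stats.entry(term).or_default();
            st.tf += 1.0;
            if raw.len() > 1 && raw.chars().all(char::is_uppercase) {
                st.tf_acronym += 1.0;
            } else if i > 0 && raw.chars().next().map_or(false, char::is_uppercase) {
                st.tf_upper += 1.0;
            }
            st.sentences.insert(s_idx);
            if i > 0 {
                *st.left.entry(sentence[i - 1].to_lowercase()).or_insert(0.0) += 1.0;
            }
            if i + 1 < sentence.len() {
                *st.right.entry(sentence[i + 1].to_lowercase()).or_insert(0.0) += 1.0;
            }
        }
    }
    if stats.is_empty() {
        return Vec::new();
    }

    // Document-level statistics used to normalize the features.
    let n = stats.len() as f32;
    let mean = stats.values().map(|s| s.tf).sum::<f32>() / n;
    let std = (stats.values().map(|s| (s.tf - mean).powi(2)).sum::<f32>() / n).sqrt();
    let max_tf = stats.values().map(|s| s.tf).fold(0.0f32, f32::max);
    let n_sentences = sentences.len() as f32;
    // Distinct neighbours divided by total neighbour occurrences (0 if none).
    let dispersion = |m: &HashMap<String, f32>| {
        let total: f32 = m.values().sum();
        if total == 0.0 { 0.0 } else { m.len() as f32 / total }
    };

    let mut ranked: Vec<(String, f32)> = stats
        .iter()
        .map(|(term, s)| {
            let mut idx: Vec<usize> = s.sentences.iter().copied().collect();
            idx.sort_unstable();
            let mid = idx.len() / 2;
            let median = if idx.len() % 2 == 1 { idx[mid] as f32 } else { (idx[mid - 1] + idx[mid]) as f32 / 2.0 };
            let t_case = s.tf_upper.max(s.tf_acronym) / (1.0 + s.tf.ln());
            let t_pos = (3.0 + median).ln().ln();
            let t_freq = s.tf / (mean + std);
            let t_rel = 1.0 + (dispersion(&s.left) + dispersion(&s.right)) * s.tf / max_tf;
            let t_spread = s.sentences.len() as f32 / n_sentences;
            (term.clone(), (t_pos * t_rel) / (t_case + t_freq / t_rel + t_spread / t_rel))
        })
        .collect();
    ranked.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    ranked
}
```

For "Rust is fast. Rust is safe and Rust is fun." with the stop words "is" and "and", this sketch ranks fast, rust, fun, safe, in that order, mostly because the position feature favours terms from the first sentence.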

Contributing

I would love your input! I want to make contributing to this project as easy and transparent as possible; please read the CONTRIBUTING.md file for details.

License

This project is licensed under the GNU Lesser General Public License v3.0. See the COPYING and COPYING.LESSER files for details.

Repository: tugascript/keyword-extraction-rs
