NLP functions for amplifying negations, managing elisions, creating n-grams, stemming, generating phonetic codes for tokens, and more.
Prepare raw text for Natural Language Processing (NLP) using wink-nlp-utils. It offers a set of APIs that work on strings such as names, sentences and paragraphs, and on tokens represented as an array of strings/words. These APIs perform the pre-processing required by many ML tasks, such as semantic search and classification.
Use npm to install:
npm install wink-nlp-utils --save
wink-nlp-utils provides over 36 utility functions for Natural Language Processing tasks. Some representative examples are extracting a person's name from a string, composing a training corpus for a chat bot, sentence boundary detection, tokenization and stop word removal:
// Load wink-nlp-utils
var nlp = require( 'wink-nlp-utils' );
// Extract person's name from a string:
var name = nlp.string.extractPersonsName( 'Dr. Sarah Connor M. Tech., PhD. - AI' );
console.log( name );
// -> 'Sarah Connor'
// Compose all possible sentences from a string:
var str = '[I] [am having|have] [a] [problem|question]';
console.log( nlp.string.composeCorpus( str ) );
// -> [ 'I am having a problem',
//      'I am having a question',
//      'I have a problem',
//      'I have a question' ]
// Sentence Boundary Detection.
var para = 'AI Inc. is focussing on AI. I work for AI Inc. My mail is [email protected]';
console.log( nlp.string.sentences( para ) );
// -> [ 'AI Inc. is focussing on AI.',
// 'I work for AI Inc.',
// 'My mail is [email protected]' ]
// Tokenize a sentence.
var s = 'For details on wink, check out http://winkjs.org/ URL!';
console.log( nlp.string.tokenize( s, true ) );
// -> [ { value: 'For', tag: 'word' },
// { value: 'details', tag: 'word' },
// { value: 'on', tag: 'word' },
// { value: 'wink', tag: 'word' },
// { value: ',', tag: 'punctuation' },
// { value: 'check', tag: 'word' },
// { value: 'out', tag: 'word' },
// { value: 'http://winkjs.org/', tag: 'url' },
// { value: 'URL', tag: 'word' },
// { value: '!', tag: 'punctuation' } ]
// Remove stop words:
var t = nlp.tokens.removeWords( [ 'mary', 'had', 'a', 'little', 'lamb' ] );
console.log( t );
// -> [ 'mary', 'little', 'lamb' ]
Try experimenting with these examples on Runkit in the browser.
Check out the wink NLP utilities API documentation to learn more.
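To make the bracketed template from the composeCorpus example more concrete, here is a minimal sketch of how such an expansion could work. It is not the library's implementation: `expandTemplate` is a hypothetical helper written only for illustration, and it assumes every part of the template sits inside `[...]` groups with alternatives separated by `|`.

```javascript
// Hypothetical helper, NOT part of wink-nlp-utils: expands a bracketed
// template into every sentence it can produce.
function expandTemplate( template ) {
  // Collect the alternatives inside each [ ... ] group, split on '|'.
  var groups = [];
  var re = /\[([^\]]*)\]/g;
  var match;
  while ( ( match = re.exec( template ) ) !== null ) {
    groups.push( match[ 1 ].split( '|' ) );
  }
  // Build the Cartesian product one group at a time, left to right.
  return groups.reduce( function ( sentences, options ) {
    var next = [];
    sentences.forEach( function ( prefix ) {
      options.forEach( function ( option ) {
        next.push( prefix === '' ? option : prefix + ' ' + option );
      } );
    } );
    return next;
  }, [ '' ] );
}

var expanded = expandTemplate( '[I] [am having|have] [a] [problem|question]' );
console.log( expanded );
```

The number of sentences is the product of the number of alternatives in each group (here 1 x 2 x 1 x 2 = 4), so templates with many multi-option groups grow quickly.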
If you spot a bug that has not yet been reported, raise a new issue, or consider fixing it and sending a pull request.
Wink is a family of open source packages for Statistical Analysis, Natural Language Processing and Machine Learning in Node.js. The code is thoroughly documented for easy human comprehension and has ~100% test coverage, making it reliable enough for building production-grade solutions.
wink-nlp-utils is copyright 2017-19 GRAYPE Systems Private Limited.
It is licensed under the terms of the MIT License.