forked from st937072012/butter

tf-idf library in javascript, which can be used to count top keywords on a page. To be even more effective, wrap this library into a chrome extension / firefox greasemonkey script

mizterp/butter

Butter

A tf-idf JavaScript library

Purpose

This is a JavaScript library that uses tf-idf to find the most frequently used words on a webpage. It was initially made for recognizing cooking ingredients on recipe websites; modify it as needed for use in other domains.
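
To illustrate the idea behind tf-idf, here is a minimal sketch in plain JavaScript. It is not the code in lib/tfidf.js: the function names (`tokenize`, `termFrequency`, `inverseDocumentFrequency`, `topKeywords`) and the sample documents are assumptions made for this example, and it uses the common `tf * log(N / df)` weighting, which may differ from what the library does.

```javascript
// Hypothetical helper names for illustration; this is not the API of lib/tfidf.js.

// Split text into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z]+/g) || [];
}

// Term frequency: occurrences of each term divided by the document length.
function termFrequency(tokens) {
  const tf = new Map();
  for (const t of tokens) tf.set(t, (tf.get(t) || 0) + 1);
  for (const [t, n] of tf) tf.set(t, n / tokens.length);
  return tf;
}

// Inverse document frequency: log(N / number of documents containing the term).
function inverseDocumentFrequency(tokenizedDocs) {
  const df = new Map();
  for (const tokens of tokenizedDocs) {
    for (const t of new Set(tokens)) df.set(t, (df.get(t) || 0) + 1);
  }
  const idf = new Map();
  for (const [t, n] of df) idf.set(t, Math.log(tokenizedDocs.length / n));
  return idf;
}

// Rank the terms of one document by tf * idf, highest first.
function topKeywords(docIndex, docs, limit) {
  const tokenized = docs.map(tokenize);
  const tf = termFrequency(tokenized[docIndex]);
  const idf = inverseDocumentFrequency(tokenized);
  return [...tf.entries()]
    .map(([term, freq]) => ({ term, score: freq * idf.get(term) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Made-up sample corpus: three short recipe-like sentences.
const docs = [
  "Melt the butter and stir the butter into the flour",
  "Whisk the eggs and add the sugar",
  "Heat the oven and grease the pan",
];
console.log(topKeywords(0, docs, 3));
```

In this sample, "butter" ranks first for the first document, while "the" scores zero because it appears in every document. That is the property that makes tf-idf more useful than a raw word count for picking out keywords.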

Requirements

Thanks

Add this to the head section of your webpage (change the library paths accordingly) to see how it works.

To Test this

  <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/underscore.js/1.3.3/underscore-min.js"></script>
  <script src="../lib/stopwords.js"></script>
  <script src="../lib/tfidf.js"></script>
  <script src="../lib/tokenize.js"></script>
  <script src="../lib/corpus_tools.js"></script>
  <script src="../lib/collections_tools.js"></script>
  <script src="../lib/stemmer-min.js"></script>
  <script src="../test_data/test_data.js"></script>
  <script>
    // Run once the DOM is ready.
    $(function() {
      var corpus = "";
      // if($('li.ingredient.type').length>0){
      //  alert(getTextNodesIn('.ingredient.type').text());
      // }
      if ($('li.ingredient').length > 0) {
        // The page uses the recipes microformat: read the ingredient items directly.
        var items = getTextNodesIn('li.ingredient').text();
        alert(items);
      } else {
        // No recipes microformat: scan the whole page text and analyze it.
        corpus = getTextNodesIn('div').text();
        alert(analyze_web_text(corpus));
      }
    });
  </script>
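
The script list above loads a stop-word list, a tokenizer, and a stemmer, which suggests that the page text is cleaned up before it is scored. The following is a rough sketch of what such a preprocessing step could look like. It is an assumption, not the library's actual code: `STOP_WORDS`, `naiveStem`, and `preprocess` are hypothetical names, the stop-word list is a tiny made-up sample, and the plural stripping is a crude stand-in for a real stemmer.

```javascript
// Hypothetical stand-ins; the real stop words and stemmer live in
// lib/stopwords.js and lib/stemmer-min.js and are not reproduced here.
const STOP_WORDS = new Set(["a", "an", "and", "of", "the", "to", "with"]);

// Split text into lowercase word tokens, dropping digits and punctuation.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z]+/g) || [];
}

// Very rough plural stripping, NOT a real stemming algorithm.
function naiveStem(word) {
  if (word.length > 3 && word.endsWith("s") && !word.endsWith("ss")) {
    return word.slice(0, -1);
  }
  return word;
}

// Tokenize, drop stop words, then reduce each remaining word to its rough stem.
function preprocess(text) {
  return tokenize(text)
    .filter((word) => !STOP_WORDS.has(word))
    .map(naiveStem);
}

console.log(preprocess("2 eggs, 1 cup of sugar, and an egg yolk"));
```

With this sample input, "eggs" and "egg" collapse to the same term, so a later counting or tf-idf step treats them as one ingredient.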

TODO

  • create a GreaseMonkey / Chrome Extension

Resources

more info about recipes

This library doesn't need the recipes microformat to work, but if you would like more information, see Recipes Microformat.

Licence

MIT

Changelog

2013-03-01 Joyce Chan

  • initial release
