Processing Pipeline

anasrferreira edited this page Mar 5, 2015 · 12 revisions

All commands assume they are run from the BIDMach root directory.

  1. Process the raw XML using /var/local/destress/scripts/xmltweet.exe
  • This generates a sparse matrix of tokens
  • xmltweet.exe has specialized handling for dates, emoticons, and XML tags (need pointer to docs)
  • The updated version lets the user specify the output directory for the generated files
    • e.g.: '/path/to/parser/xmltweet.exe' -i '/path/inputfile.xml' -o '/path/desired/xmlIMat/output/' -d '/path/desired/dictionary/dictname'
  2. BIDMach proper (./bidmach or ./bidmach notebook) can load and unify the token dictionaries
  • For now, we then do basic sentiment analysis based on "ground truth" emotions
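
The xmltweet.exe invocation from step 1 can be scripted when many XML files need to be parsed. The following is a minimal sketch, not part of the pipeline itself: it only uses the -i, -o, and -d flags shown in the example above, and the helper names (build_xmltweet_cmd, run_xmltweet) are hypothetical. Appending a trailing slash to the output directory and creating that directory beforehand are assumptions based on the example path, not documented behavior of xmltweet.exe.

```python
import subprocess
from pathlib import Path

# Parser location as given in step 1; adjust if xmltweet.exe lives elsewhere.
XMLTWEET = "/var/local/destress/scripts/xmltweet.exe"


def build_xmltweet_cmd(input_xml, output_dir, dict_path, exe=XMLTWEET):
    """Assemble the argument list for a single xmltweet.exe run.

    Only the -i (input file), -o (output directory) and -d (dictionary)
    flags from the wiki example are used.
    """
    out = str(output_dir)
    # Assumption: mirror the trailing slash used in the wiki example.
    if not out.endswith("/"):
        out += "/"
    return [str(exe), "-i", str(input_xml), "-o", out, "-d", str(dict_path)]


def run_xmltweet(input_xml, output_dir, dict_path, exe=XMLTWEET):
    """Run the parser on one XML file and raise if it exits with an error."""
    # Assumption: the parser does not create the output directory itself.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        build_xmltweet_cmd(input_xml, output_dir, dict_path, exe),
        check=True,
    )
```

Calling run_xmltweet once per input file (for example, over the results of Path(...).glob("*.xml")) keeps each parser run independent, so one failing file stops the loop with a clear error instead of silently producing partial output.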