
Processing Pipeline

anasrferreira edited this page Mar 5, 2015 · 12 revisions

All commands assume they are run from the BIDMach root directory.

  1. Process the raw XML using /var/local/destress/scripts/xmltweet.exe
  • This currently writes its output file to the same location as the input file, which is problematic.
  • The output is a sparse matrix of tokens.
  • xmltweet.exe has specialized handling for dates, emoticons, and XML tags (need pointer to docs).
  • The updated version lets the user specify the output directory, e.g.: /path/to/parser/xmltweet.exe -i /path/inputfile.xml -o /path/desired/xmlIMat/output/ -d /path/desired/dictionary/dictname
  2. BIDMach proper (./bidmach or ./bidmach notebook) can load and unify the token dictionaries.
  • For now, we then do basic sentiment analysis based on "ground truth" emotions.
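
As a rough illustration of step 1, the updated xmltweet.exe invocation can be assembled programmatically so that the output and dictionary locations are always given explicitly. This is only a sketch: the helper name build_xmltweet_command is hypothetical, the paths are the placeholder paths from the example above, and the only flags used are the -i, -o, and -d flags shown there.

```python
import shlex


def build_xmltweet_command(parser, input_xml, output_dir, dict_prefix):
    """Return the argument list for one xmltweet.exe run (hypothetical helper).

    Only the -i, -o and -d flags from the example above are used.
    """
    # The example passes the output directory with a trailing slash,
    # so normalize to exactly one.
    out = str(output_dir).rstrip("/") + "/"
    return [str(parser), "-i", str(input_xml), "-o", out, "-d", str(dict_prefix)]


# Placeholder paths copied from the example command above.
cmd = build_xmltweet_command(
    "/path/to/parser/xmltweet.exe",
    "/path/inputfile.xml",
    "/path/desired/xmlIMat/output",
    "/path/desired/dictionary/dictname",
)

# Print the command as a single shell-safe string for inspection.
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) to actually run the parser. Whether xmltweet.exe creates missing output directories is not stated here, so creating them beforehand is the safer assumption.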