Word2Vec
# Overview
We are pursuing two approaches for our word2vec implementation. Both follow the same general steps:
- Convert the word2vec model from .txt format to .fmat for BIDMach compatibility
- Generate our Data Matrix (more info below)
- Align the indices of the Data Matrix and the word2vec matrix (by aligning each with our MasterDictionary matrix)
- Multiply the transpose of the Data Matrix by the word2vec matrix (the result is our "magic matrix")
- Run queries on the magic matrix
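The align and multiply steps above can be sketched on toy data. This is only an illustration, not our BIDMach code: it assumes the Data Matrix holds word counts with one row per word and one column per sentence, and every name in it (`align_rows`, `transpose_times`, the three-word dictionary) is made up for the example.

```python
# Toy sketch of the align and multiply steps; all names and data are hypothetical.

def align_rows(words, rows, master_index):
    """Reorder rows so that row i belongs to word i of the master dictionary.

    Master-dictionary words that are missing from `words` get an all-zero row.
    """
    width = len(rows[0])
    aligned = [[0.0] * width for _ in master_index]
    for word, row in zip(words, rows):
        if word in master_index:
            aligned[master_index[word]] = list(row)
    return aligned

def transpose_times(data, vectors):
    """Compute data^T * vectors for row-major lists of lists."""
    n_sentences, dim = len(data[0]), len(vectors[0])
    out = [[0.0] * dim for _ in range(n_sentences)]
    for w, counts in enumerate(data):
        for s, count in enumerate(counts):
            if count:
                for d in range(dim):
                    out[s][d] += count * vectors[w][d]
    return out

master = ["cold", "flu", "happy"]
master_index = {word: i for i, word in enumerate(master)}

# word2vec rows arrive in their own order (2 dimensions here, 300 in the real model).
w2v = align_rows(["happy", "cold", "flu"],
                 [[0.0, 1.0], [1.0, 0.0], [0.75, 0.25]],
                 master_index)

# Assumed Data Matrix layout: one row per word, one column per sentence, word counts.
data = align_rows(["flu", "happy", "cold"],
                  [[1, 0], [0, 2], [1, 0]],
                  master_index)

magic = transpose_times(data, w2v)  # one row per sentence
print(magic)  # [[1.75, 0.25], [0.0, 2.0]]
```

Under this assumption each row of the magic matrix is the count-weighted sum of the word vectors in one sentence, which is why both matrices have to share the master dictionary's row order before multiplying.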
In the first approach, we simply use the provided word2vec model that was trained on the Google News corpus: https://code.google.com/p/word2vec/
The model originally contains 3,000,000 words, but we trimmed it down to 994,949 words so that it matches the words in our Master Dict (essentially the intersection of the two dictionaries).
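A minimal sketch of that trimming step, assuming the model is in the word2vec text format (a `<vocab_size> <dim>` header followed by one `<word> <v1> ... <vdim>` line per word). The function name and the in-memory toy model are hypothetical, and this is not the script we actually ran.

```python
import io

def trim_word2vec_text(src, dst, master_words):
    """Copy only the vectors whose word appears in `master_words`.

    `src` and `dst` are text file objects in the word2vec text format.
    Returns the number of words kept.
    """
    _, dim = src.readline().split()
    # The header needs the final count, so the kept lines are buffered here;
    # for the full model a two-pass version would use less memory.
    kept = [line for line in src if line.split(" ", 1)[0] in master_words]
    dst.write(f"{len(kept)} {dim}\n")
    dst.writelines(kept)
    return len(kept)

# Toy model with three words; only two are in the (hypothetical) master dictionary.
model = io.StringIO("3 2\ncold 1.0 0.0\nGoogle_News 0.5 0.5\nflu 0.75 0.25\n")
trimmed = io.StringIO()
n_kept = trim_word2vec_text(model, trimmed, {"cold", "flu", "fever"})
print(n_kept)  # 2
```

Words that are only in the master dictionary ("fever" here) are simply absent from the trimmed model, so the result is the intersection of the two vocabularies.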
In the second approach, we trained our own word2vec model with Gensim (a Python word2vec library), using the scripts in the repo: /destress/word2vec/word2vecModel2.py and /destress/word2vec/word2vecConvert.py. The model was trained with the following parameters:
- 300 dimension vectors
- skip-gram with max distance=10
- minimum word freq=5
- sped up with 8 cores (on mercury).
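For reference, the parameters above map onto Gensim's `Word2Vec` arguments roughly as follows. This is a sketch using the Gensim 4.x keyword names, not a copy of word2vecModel2.py; `train_and_save`, its two path arguments, and the one-sentence-per-line corpus layout are assumptions made for the example.

```python
# Parameters from the list above, using Gensim 4.x keyword names
# (Gensim 3.x calls `vector_size` `size`).
W2V_PARAMS = {
    "vector_size": 300,  # 300-dimensional vectors
    "sg": 1,             # skip-gram rather than CBOW
    "window": 10,        # max distance between the current and predicted word
    "min_count": 5,      # ignore words seen fewer than 5 times
    "workers": 8,        # training threads
}

def train_and_save(corpus_path, out_path):
    """Train on a one-sentence-per-line corpus and save the model as text.

    Both paths are hypothetical arguments, and the whitespace-tokenized,
    one-sentence-per-line layout is an assumption about the corpus file.
    """
    # Imported here so the parameter dict can be inspected without Gensim installed.
    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    model = Word2Vec(sentences=LineSentence(corpus_path), **W2V_PARAMS)
    model.wv.save_word2vec_format(out_path, binary=False)
    return model
```

`LineSentence` streams the corpus from disk instead of loading it into memory, which matters for a corpus of this size.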
The model is saved in text format at /var/local/destress/LJ_training/word2vecTEXT.txt. Its vocabulary is:
- originally 1,375,836 words
- trimmed to 994,949 words (to match the Master Dict, as we did for the Google News word2vec model)
The corpus we used to train our word2vec model was:
- 9,727,733,625 words
- 565,156,186 lines (sentences)
- can be found at: /var/local/destress/text_sent_idsCAT
# Running the word2vec queries
On Mercury, go to /home/gyoo/ and run either of the following:
$ bash query.sh
$ bash queryLJ.sh
In both cases, wait for BIDMach to load, then run the following at the Scala interpreter:
> query("sentence or word to query", X, "filter");
where X is the number of results desired and "filter" is a word you do NOT want to see in your results.
Results are printed as score -- sentence, highest score first:
8.938 -- it s been the worst and best couples days ... my dad s in the hospital because he has some sort of strep or staph infection in his leg and a very high fever , one of my closet friends completely bitched me out and then i did the same to him and he started crying his eyes out and feels worthless ... i ve had some long days at work and not much sleep along with an infection .
8.869 -- salmonella , infection in arm and a fungal infection in his lungs and they still don t know what to do about finding the fungal infection and today they thought about the arm and maybe there is leukemia in his arm causing the issues at hand .
8.846 -- my father was diagnosed with cancer in early june lung cancer that has spread to the bones .
8.794 -- my mom is in the hospital with a kidney infection .
8.755 -- i went to the doctors , and i ended up having double ear infections and really bad strep throat .
8.718 -- he passed away last thursday due to complications with bronchitis in connection to a rare lung disease .
8.697 -- what is worse , being diagnosed with a terminal illness and dieing a slow painfull death within a matter of years or having an illness undiagnosed and living a long painfull life ?
8.645 -- i have an acute chronic sinus infection month anniversary saturday that s flaring up because of the pollen count , and my cold .
8.621 -- , and then got dehydration sickness from the medication for the surgery .
8.509 -- i love how in he past few weeks , i have had a cold that turned to the flu which turned into an ear infection with ruptured my eardrum and as soon as that starts going away , i get pink eye in both my eyes .
8.422 -- i m not really sick , i just have like tonsillitis minus the sore throat , fever and headache .
8.421 -- go to the ucf clinic ... i have a ear infection ... and it was so bad it ruptured my eardrum .
8.417 -- because i had cervical cancer and was only when i was diagnosed ?
8.344 -- went to the doctor and he just had one look at my tonsils and said strep .
8.325 -- i am in no mood to be in the close vacinity of a vomiting person suffering from other symptoms resembling a head cold flu type deal .
8.227 -- oh yes , me and my mother found out that she has lung cancer a couple of weeks ago .
8.218 -- i think my father has the flu and he s miserable and well ... yucky .
8.213 -- and i have to get blood work to see if anything is wrong with my thyroids coz half the women in my family have thyroid problems .
8.204 -- ear infection sick soo sick my throat feels like it s on fire coughing up shit .
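To make the ranking idea concrete, here is a toy sketch of a query over sentence vectors. It is not the Scala `query` function: `top_matches` is a hypothetical name, it takes an already-computed query vector rather than a sentence, and both the plain dot-product score and the "skip sentences containing the filter word" behaviour are assumptions.

```python
import heapq

def top_matches(query_vec, sentences, sentence_vecs, x, filter_word):
    """Return the x highest-scoring (score, sentence) pairs.

    Scores are plain dot products; sentences that contain `filter_word`
    as a whitespace-separated token are skipped.
    """
    scored = (
        (sum(q * v for q, v in zip(query_vec, vec)), sentence)
        for sentence, vec in zip(sentences, sentence_vecs)
        if filter_word not in sentence.split()
    )
    return heapq.nlargest(x, scored)

# Hypothetical sentences and 2-dimensional sentence vectors.
sentences = ["i have the flu", "my mom has a cold", "what a happy day"]
sentence_vecs = [[1.75, 0.25], [1.0, 0.0], [0.0, 2.0]]

results = top_matches([1.0, 0.0], sentences, sentence_vecs, 2, "flu")
for score, sentence in results:
    print(f"{score:.3f} -- {sentence}")
```

With the toy data the "flu" sentence is filtered out even though it would score highest, and the remaining two sentences come back in descending score order.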