Big Data course project
- Please download the database and model files from our Google Drive (link below):
https://drive.google.com/drive/folders/0ByehYI2Lxf1nQWZTQ05LbzV2SGM
1.) database/reviews.txt
2.) database/train_data.json
3.) doc2vec/Archive.zip
-
Put 1.) and 2.) in the database folder, and unzip 3.) into the doc2vec folder of the repository you cloned from GitHub.
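If you'd rather script this step, here is a minimal Python sketch. It assumes all three files were saved directly into one download folder. `place_downloads` is a hypothetical helper, not part of this repository, and we haven't checked the inner layout of Archive.zip: if it unpacks into a subfolder, move that subfolder's contents up so that d2v_model sits directly in doc2vec.

```python
import shutil
import zipfile
from pathlib import Path


def place_downloads(download_dir, repo_dir):
    """Copy the Google Drive files into the layout the scripts expect.

    Hypothetical helper: assumes reviews.txt, train_data.json and
    Archive.zip were all saved directly in `download_dir`.
    Returns the sorted names extracted from Archive.zip.
    """
    download_dir = Path(download_dir).expanduser()
    repo_dir = Path(repo_dir).expanduser()

    # 1.) and 2.) are copied into the database folder.
    database_dir = repo_dir / "database"
    database_dir.mkdir(parents=True, exist_ok=True)
    for name in ("reviews.txt", "train_data.json"):
        shutil.copy2(download_dir / name, database_dir / name)

    # 3.) is unzipped into the doc2vec folder.
    doc2vec_dir = repo_dir / "doc2vec"
    doc2vec_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(download_dir / "Archive.zip") as archive:
        names = archive.namelist()
        archive.extractall(doc2vec_dir)
    return sorted(names)
```

For example, `place_downloads("~/Downloads", "bigData")` copies the two database files and lists what was extracted into bigData/doc2vec. Copying rather than moving leaves your downloads untouched in case something goes wrong.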
-
Run tf_train.py. (You can change "learning_rate", "loops" and "mode". You can also change the comparison sets by tweaking "CLASSES = [[1], [5]]" in myconstants.py.)
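How myconstants.py and tf_train.py actually use CLASSES isn't shown here. One plausible reading, which is our assumption: each inner list groups star ratings into one class, so [[1], [5]] compares 1-star against 5-star reviews, and the tags in filenames such as b1_1234_5.npy and mean_123_45.txt encode the chosen groups. The helpers `label_for` and `classes_tag` below are hypothetical illustrations of that reading, not code from this repository.

```python
# Hypothetical reading of CLASSES from myconstants.py:
# each inner list is a group of star ratings treated as one class.
CLASSES = [[1], [5]]


def label_for(stars, classes=CLASSES):
    """Return the index of the group containing `stars`, or None."""
    for index, group in enumerate(classes):
        if stars in group:
            return index
    return None  # rating is not part of this comparison


def classes_tag(classes=CLASSES):
    """Build a filename tag such as "1234_5" from [[1, 2, 3, 4], [5]]."""
    return "_".join("".join(str(stars) for stars in group) for group in classes)
```

For example, `classes_tag([[1, 2, 3, 4], [5]])` gives "1234_5", and with the default `CLASSES = [[1], [5]]` a 3-star review gets no label, so under this reading it would simply be left out of the comparison.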
-
You no longer need to run preprocess.py.
Note: Please keep the folder structure the same as below; otherwise you might get unexpected errors. (View this structure in a plain-text editor.)
Folder structure:

    bigData
        myconstants.py
        preprocess.py
        tf_train.py
        README.md
        LICENSE
        .gitignore
        --database
            train_data.json
            validate_data.json
            reviews.txt
            --test
                reviews_Electronics_5.json
                train_data.json
                validate_data.json
                reviews.txt
        --matrix
            b1_1234_5.npy
            w1_1234_5.npy
            ......
        --ml
            __init__.py
            functions.py
            tf_training.py
            w2v.py
            d2v.py
        --word2vec
            w2v_model
            --test
                reviews.txt
                w2v_model
                w2v_model.txt
        --doc2vec
            d2v_model
            d2v_model.docvecs.doctag_syn0.npy
            d2v_model.syn0.npy
            d2v_model.syn1.npy
            --test
                d2v_model
        --results
            d2v_1234_5.txt
            mean_123_45.txt
            ......
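Before running tf_train.py, a quick check like the following can catch a misplaced file. It's a sketch: `missing_paths` is a hypothetical helper, the path list is copied from the folder structure above (entries elided with "......" are skipped), and it only checks that each path exists, not what it contains.

```python
from pathlib import Path

# Paths relative to bigData/, copied from the README's folder structure.
# Files elided with "......" in the README are not checked.
EXPECTED_PATHS = (
    "myconstants.py",
    "tf_train.py",
    "database/train_data.json",
    "database/validate_data.json",
    "database/reviews.txt",
    "ml/functions.py",
    "ml/tf_training.py",
    "ml/w2v.py",
    "ml/d2v.py",
    "word2vec/w2v_model",
    "doc2vec/d2v_model",
    "doc2vec/d2v_model.docvecs.doctag_syn0.npy",
    "doc2vec/d2v_model.syn0.npy",
    "doc2vec/d2v_model.syn1.npy",
    "matrix",
    "results",
)


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Running `print(missing_paths("bigData"))` from the folder that contains bigData should print an empty list once everything is in place.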