A Flask app for analyzing the ZhihuRec dataset.
Install the dependencies:
pip install -r requirements.txt
- [Dataset] Put the ZhihuRec dataset in the root directory.
- [Work Path] Set the working directory to the project root.
- [Preprocess] Run io.py to convert answer_infos.txt into .csv files.
1. First, run this command to generate the answers' csv files:
   python tools/io.py
   Or download them from Baidu NetDisk:
   Link: https://pan.baidu.com/s/1Ey-R9yo6_HNuoZuhEJivjg
   Code: 8rc7
   Unzip the archive and put the answer_csv folder into source/.
2. Then run the Flask app with this command:
   python app.py
   The Flask app will run at 127.0.0.1:5000.
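Step 1 turns answer_infos.txt into sorted csv chunks. The code of io.py is not shown in this README, so the following is only a minimal sketch of that kind of conversion, assuming each line of answer_infos.txt is tab-separated with an integer answer index in the first field. The function name convert_to_sorted_csv and the rows_per_file parameter are hypothetical.

```python
import csv
import os


def convert_to_sorted_csv(txt_path, out_dir, rows_per_file=100000):
    """Hypothetical sketch: split a tab-separated text file into sorted CSV chunks.

    Assumes every line is tab-separated and its first field is an integer
    answer index. Each output file is named after the smallest index it
    contains (e.g. 0.csv, 100000.csv), matching the xxxx.csv convention.
    """
    with open(txt_path, encoding="utf-8") as f:
        rows = [line.rstrip("\n").split("\t") for line in f if line.strip()]
    # Sort all rows by their integer answer index.
    rows.sort(key=lambda r: int(r[0]))
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        # The first row of a sorted chunk holds its minimum index.
        name = f"{int(chunk[0][0])}.csv"
        with open(os.path.join(out_dir, name), "w", newline="", encoding="utf-8") as out:
            csv.writer(out).writerows(chunk)
        written.append(name)
    return written
```

Sorting everything in memory keeps the sketch short; for the full dataset a streaming or external sort may be needed.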
[model]
The TF-IDF model will be saved here.
[source]
Processed files.
  [answer_csv]
  The answers' csv files. All files are sorted.
    [xxxx.csv]
    xxxx is the start (minimum) answer index in this file.
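Because every file is named after its smallest answer index and the files are sorted, the file that holds a given answer can be located with a binary search over the start indices. A minimal sketch, assuming all chunks sit directly in one directory and are named <number>.csv; find_csv_for_answer is a hypothetical helper, not part of io.py.

```python
import bisect
import os


def find_csv_for_answer(csv_dir, answer_index):
    """Return the path of the CSV file whose index range covers answer_index.

    Relies on the naming convention: each file is <start>.csv, where <start>
    is the smallest answer index it holds. It does not check that the index
    actually exists inside the file.
    """
    starts = sorted(
        int(name[:-4]) for name in os.listdir(csv_dir) if name.endswith(".csv")
    )
    # bisect_right returns the position of the first start greater than
    # answer_index; the file just before it is the one covering the index.
    pos = bisect.bisect_right(starts, answer_index) - 1
    if pos < 0:
        raise KeyError(f"answer index {answer_index} is below the first file")
    return os.path.join(csv_dir, f"{starts[pos]}.csv")
```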
[tools]
Tools to help you analyze the dataset.
  [io.py]
  Used to read/write/convert the dataset.
  [tfidf.py]
  The TF-IDF algorithm. Its main functions are train(), load_tfidf(), save_tfidf() and compare_similarity().
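The signatures of these functions are not documented here. As a rough illustration of a TF-IDF train/save/load/compare cycle, here is a stdlib-only sketch that works on already tokenized documents, uses a smoothed IDF and compares documents by cosine similarity. The signatures, the vectorize helper and the pickle storage format are all assumptions, not the real tfidf.py API.

```python
import math
import pickle
from collections import Counter


def train(docs):
    """Fit IDF weights on tokenized documents (a list of token lists)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    # Smoothed IDF: log((1 + n) / (1 + df)) + 1, so no seen term gets weight 0.
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def vectorize(doc, idf):
    """Hypothetical helper: turn one tokenized document into a sparse TF-IDF dict."""
    tf = Counter(doc)
    total = len(doc) or 1
    # Terms never seen during training are dropped.
    return {t: (c / total) * idf[t] for t, c in tf.items() if t in idf}


def compare_similarity(doc_a, doc_b, idf):
    """Cosine similarity between the TF-IDF vectors of two documents."""
    va, vb = vectorize(doc_a, idf), vectorize(doc_b, idf)
    dot = sum(w * vb.get(t, 0.0) for t, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def save_tfidf(idf, path):
    """Persist the IDF weights with pickle."""
    with open(path, "wb") as f:
        pickle.dump(idf, f)


def load_tfidf(path):
    """Load IDF weights saved by save_tfidf."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Identical documents score 1.0 and documents with no shared known terms score 0.0.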
[zhihuRec]
The dataset. Put the txt files here.
[app.py]
The entry point of the Flask app.
[preprocess.py]
Uses the code in tools to create the TF-IDF matrix and saves the result into model.
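A preprocessing pass of this kind typically streams over every csv file, counts how many answers each token appears in, and writes the resulting weights into model. The sketch below assumes the text sits in one column as space-separated tokens (text_column is a hypothetical parameter, as is build_idf_from_csv) and stores only IDF weights with pickle; the real preprocess.py may build a full TF-IDF matrix instead.

```python
import csv
import glob
import math
import os
import pickle
from collections import Counter


def build_idf_from_csv(csv_dir, model_path, text_column=1):
    """Hypothetical preprocess step: compute IDF weights over every CSV in
    csv_dir and pickle them to model_path.

    Assumes column `text_column` holds space-separated tokens; adjust this to
    the real layout of the answer CSV files.
    """
    df = Counter()
    n_docs = 0
    for path in sorted(glob.glob(os.path.join(csv_dir, "*.csv"))):
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                if len(row) <= text_column:
                    continue  # skip rows without the text column
                df.update(set(row[text_column].split()))
                n_docs += 1
    # Same smoothed IDF formula as a typical TF-IDF implementation.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    os.makedirs(os.path.dirname(model_path) or ".", exist_ok=True)
    with open(model_path, "wb") as f:
        pickle.dump(idf, f)
    return n_docs, idf
```

For example, build_idf_from_csv("source/answer_csv", "model/idf.pkl") would read the processed answers and write the weights into model/.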