- All commands in these instructions should be run in the following directory.
/data_pipeline
- Run the following commands in your terminal.
$ pip install -r requirements.txt
$ python3 crawler.py
$ python3 qa_crawler.py
$ python3 generate_gpt.py
$ python3 parse.py
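The four scripts above run in a fixed order, and generate_gpt.py needs an OpenAI API key. The sketch below is one possible way to chain them from Python and stop at the first failure. The helper names (`build_commands`, `run_pipeline`) are hypothetical, and reading the key from the `OPENAI_API_KEY` environment variable is an assumption; the scripts in this repository may load the key differently.

```python
import os
import subprocess
import sys

# Script order taken from the commands listed above.
PIPELINE_STEPS = ["crawler.py", "qa_crawler.py", "generate_gpt.py", "parse.py"]


def build_commands(steps, python=sys.executable):
    """Return one argv list per pipeline step."""
    return [[python, step] for step in steps]


def run_pipeline(steps=PIPELINE_STEPS, workdir="."):
    """Run each step in order and stop at the first failing script."""
    # Assumption: the API key is provided via OPENAI_API_KEY.
    # The actual scripts may expect it somewhere else.
    if "generate_gpt.py" in steps and not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set")
    for cmd in build_commands(steps):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, cwd=workdir, check=True)
```

Calling `run_pipeline()` from inside /data_pipeline would then be roughly equivalent to typing the four `python3` commands by hand, with an early error if the key is missing.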
- To generate data using the GPT model, you first need to obtain an API key from OpenAI.
Depending on the model used, usage fees may be charged.
- If you want to modify the prompts, follow these steps.
- Add the prompts to the backup_prompts.py file.
- Run the following command in your terminal.
$ python3 backup_prompts.py
- The new pickle file will overwrite the existing prompts.pkl file.
- After that, you can proceed with the steps mentioned earlier.
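To illustrate the prompt update step, here is a minimal sketch of how a script such as backup_prompts.py might write prompts to prompts.pkl. The `PROMPTS` dictionary, its keys, and the `save_prompts` / `load_prompts` helpers are hypothetical; the real file may store the prompts in a different structure.

```python
import pickle

# Hypothetical prompt data; the structure used by backup_prompts.py may differ.
PROMPTS = {
    "qa": "Answer the question using the passage below.",
    "summary": "Summarize the passage in one sentence.",
}


def save_prompts(prompts, path="prompts.pkl"):
    """Serialize prompts to a pickle file, replacing any existing file."""
    # Mode "wb" truncates the file, so an old prompts.pkl is overwritten.
    with open(path, "wb") as f:
        pickle.dump(prompts, f)


def load_prompts(path="prompts.pkl"):
    """Read the prompts back, e.g. from the generation script."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Because the file is opened in write mode, the previous contents are not merged with the new prompts; keep a copy of the old prompts.pkl if you may need it later.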
- Run the following commands in your terminal.
$ python3 spellchecker.py
$ python3 preprocessor_v2.py