- This is a paper translator (into Korean) built with LangChain.
- It automatically translates a paper given either as a URL or as a local PDF file.
- Select the embedding model
  - text-embedding-3-small
  - text-embedding-3-large
- Add ChromaDB
- Vector store using Pinecone
- Add GPT-4 Vision API
- Add YouTube script translator (using youtube-dl)
version history
- use langchain schema
- URL -> markdown
- requires libmagic: brew install libmagic
- ChatGPT API update: gpt-3.5-turbo-16k
- context window 4k -> 16k tokens (one request covers about 3 pages)
- ConstitutionalChain (test): fixes the output when its format is wrong
- paper translator using Langchain
- preprocessing for papers (e.g., splitting off the References section)
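
The preprocessing and per-request size notes above can be illustrated with a minimal sketch. It is not the project's actual implementation: `split_references` and `chunk_by_budget` are hypothetical helper names, the heading pattern is an assumption about how a References section is labelled, and the four-characters-per-token ratio is only a rough estimate for English text.

```python
import re

# Assumption: the reference list starts at a line containing only
# "References" or "Bibliography", optionally preceded by a section number.
_REF_HEADING = re.compile(
    r"^[ \t]*(?:\d+\.?[ \t]*)?(?:references|bibliography)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_references(text):
    """Return (body, references), splitting at the last matching heading."""
    matches = list(_REF_HEADING.finditer(text))
    if not matches:
        return text, ""
    start = matches[-1].start()
    return text[:start].rstrip(), text[start:].strip()


def chunk_by_budget(paragraphs, max_tokens, chars_per_token=4):
    """Greedily pack paragraphs into chunks under an estimated token budget.

    The estimate is len(text) / chars_per_token; separators between
    paragraphs are not counted, so leave some headroom in max_tokens.
    """
    budget = max_tokens * chars_per_token
    chunks, current, size = [], [], 0
    for paragraph in paragraphs:
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and size + len(paragraph) > budget:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(paragraph)
        size += len(paragraph)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

With these helpers, only the body would be sent for translation, one chunk per request, while the reference list is kept untranslated.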
The LLM used through LangChain is an OpenAI model, so an OpenAI API key is required.
# OPENAI API key
OPENAI_API_KEY="..."
# Pinecone API key
PINECONE_API_KEY="..."
PINECONE_ENVIRONEMENT="..."
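
Before running the translator, it can help to confirm that the keys are actually set. The following is a small sketch under assumptions: `missing_keys` is a hypothetical helper, and treating only `OPENAI_API_KEY` as mandatory (with the Pinecone keys needed only when the Pinecone vector store is used) is a guess, not something the project states.

```python
import os

# Assumption: only the OpenAI key is always required; the Pinecone keys
# matter only when the Pinecone vector store is enabled.
REQUIRED_KEYS = ("OPENAI_API_KEY",)


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


def check_environment():
    """Raise a readable error if a required key is missing from os.environ."""
    missing = missing_keys(os.environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `check_environment()` at startup fails fast with a clear message instead of an authentication error partway through a translation.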
git clone https://github.com/seohyunjun/paper-translator
cd paper-translator
python -m pip install -r ./requirements.txt
python main.py --pdf https://arxiv.org/pdf/2304.06035v1.pdf --verbose 1 --outputfile ChooseYourWeapon.md
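
Since the input may be either a web address or a local PDF, the value passed to `--pdf` has to be classified somehow. The sketch below shows one way this could be done; `is_url` is a hypothetical helper and is not taken from the project's `main.py`.

```python
from urllib.parse import urlparse


def is_url(value):
    """Return True when `value` looks like an http(s) URL, else False."""
    parsed = urlparse(value)
    # Require both a web scheme and a host so that Windows paths such as
    # "C:\\papers\\a.pdf" (parsed scheme "c") are treated as local files.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

For example, `is_url("https://arxiv.org/pdf/2304.06035v1.pdf")` is true, so the file would be downloaded first, while `is_url("paper.pdf")` is false and the path would be opened directly.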