GPT File Translator

This project leverages OpenAI’s API to translate documents while preserving the original formatting, including images, tables, fonts, and styles. It supports .pdf, .docx, and .pptx files, converting documents as necessary for consistent, high-quality translation results.

Features

•	Format Preservation: Translates documents while keeping original formatting, including images and tables.
•	Supports Multiple File Types: Handles .pdf, .docx, and .pptx files.
•	Batch Processing: Processes multiple files from the input directory.
•	Customizable Translation Prompts: Allows the user to specify custom translation prompts via prompt.txt.
•	Pricing Calculation: Estimates the cost of each translation based on OpenAI model pricing.

Requirements

•	Python 3.8+
•	OpenAI API Key (stored in .env file)
•	LibreOffice (for DOCX to PDF conversion)
•	Required Python packages (see Installation for instructions)

Installation

1.	Clone the repository:

git clone https://github.com/your-username/gpt-file-translator.git
cd gpt-file-translator

2.	Set up the virtual environment:

python3 -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

3.	Install dependencies:

pip install -r requirements.txt

4.	Configure OpenAI API Key:
•	Create a .env file in the root directory and add your OpenAI API key:

echo "OPENAI_API_KEY=your_openai_api_key" > .env

5.	Install LibreOffice:
•	LibreOffice is required for PDF conversions back to DOCX. Install it and ensure it’s accessible in your PATH.

Project Structure

GPT_file_translator/
├── input/                   # Directory for input files
├── output/                  # Directory for output files
├── main.py                  # Main script to run translations
├── requirements.txt         # Project dependencies
├── prompt.txt               # Custom prompt file for translation instructions
├── src/                     # Source folder for processing modules
│   ├── pdf_processor.py     # PDF handling and conversion to DOCX
│   ├── docx_processor.py    # DOCX translation processing
│   ├── pptx_processor.py    # PPTX translation processing
│   ├── openai_api.py        # OpenAI API wrapper for handling requests
│   ├── utils.py             # Utility functions for chunking text
├── tests/                   # Test files for conversion and functionality
└── README.md                # Project documentation

Usage

Prepare Input Files

• Place all documents for translation in the input directory. • Supported formats include .pdf, .docx, and .pptx.
Set Custom Prompt

• Edit prompt.txt in the input directory to customize translation instructions. • Example prompt: "Translate this document to Slovak. Maintain technical terms and formatting."
Run the Translator

• Use the following command to start translation:

python main.py

•	Arguments (optional):
•	process_all_files: Set to True to process all files in the input folder.
•	file_extension_filter: Filter by file type, e.g., only process .pdf files.

Review Output Files

• Translated files will appear in the output directory, preserving original formatting.

Supported Models

Specify the desired OpenAI model in main.py by setting:

openai_api = OpenAIAPI(model="gpt-4o")

Available Models

•	gpt-4o: Optimal for high-accuracy translations with a balanced cost.
•	gpt-4-turbo: Faster, with a slight reduction in translation accuracy.
•	gpt-4: More expensive, ideal for highly accurate translations.

Refer to OpenAI’s pricing page for the latest costs.

Pricing Rates

These are the per-token costs used in estimating prices:

•	GPT-4o:
•	Input: $0.00250 / 1K tokens
•	Output: $0.01000 / 1K tokens
•	GPT-4 Turbo:
•	Input: $0.0100 / 1K tokens
•	Output: $0.0300 / 1K tokens

Example

1.	Input Files:
•	Place sample.pdf and sample.docx in the input directory.
2.	Prompt:
•	Edit prompt.txt with your desired instructions:

Translate this document to English. Ensure technical terminology is accurately preserved.

3.	Run Translation:

python main.py

4.	Output:
•	The translated files sample.pdf and sample.docx will appear in the output directory.

Troubleshooting

•	LibreOffice Not Found:
•	Ensure LibreOffice is installed and accessible from your PATH for PDF to DOCX conversions.
•	Formatting Issues:
•	Minor formatting discrepancies may occur, especially with complex layouts. For best results, use DOCX files.
•	Translation Errors:
•	Ensure prompt clarity and adjust chunk sizes in utils.py if context loss occurs.

Additional Information

This project is a work in progress. For support, please reach out or submit an issue.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
src		src
tests		tests
.env.sample		.env.sample
.gitignore		.gitignore
README.md		README.md
main.py		main.py
prompt.txt.sample		prompt.txt.sample
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GPT File Translator

Table of Contents

Features

Requirements

Installation

Project Structure

Usage

Supported Models

Available Models

Pricing Rates

Troubleshooting

Additional Information

About

Releases

Packages

Languages

kasyan1337/GPT_file_translator

Folders and files

Latest commit

History

Repository files navigation

GPT File Translator

Table of Contents

Features

Requirements

Installation

Project Structure

Usage

Supported Models

Available Models

Pricing Rates

Troubleshooting

Additional Information

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages