The goal of this hackathon round is to extract information from PDF invoices using the Adobe PDF Services Extract API.
This project provides a web frontend where users upload PDF invoices. The uploaded files are sent to the backend, which calls the Adobe PDF Services Extract API to extract the relevant information from each invoice.
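As a rough illustration of the upload step, the frontend could collect the selected files into a multipart form before posting it to the backend. This is only a sketch, not the project's actual code: the field name `invoices` and the helper `buildUploadForm` are hypothetical, and the backend's multer setup would have to expect the same field name.

```javascript
// Hypothetical field name; it must match whatever the backend's multer config expects.
const UPLOAD_FIELD = "invoices";

// Build a multipart form from File-like objects (Blobs with a `name`),
// keeping only files whose name ends in ".pdf" (case-insensitive).
function buildUploadForm(files) {
  const form = new FormData();
  for (const file of files) {
    if (file.name.toLowerCase().endsWith(".pdf")) {
      form.append(UPLOAD_FIELD, file, file.name);
    }
  }
  return form;
}
```

The resulting form could then be sent with axios (already listed in the client dependencies) to whichever upload route the server exposes.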
The Extract API processes each PDF document and returns a response containing the extracted data. The backend reads this response and parses out the invoice fields it needs.
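The parsing step might look roughly like the sketch below. It assumes the Extract result has already been unzipped and its `structuredData.json` parsed into an object with an `elements` array whose entries may carry a `Text` field. The helper names `collectText` and `valueAfterLabel` and the label `Invoice#` are hypothetical, and the real parsing rules in this project may differ.

```javascript
// Collect the trimmed text of every element in a parsed structuredData.json object.
// Assumed shape: { elements: [{ Path, Text, ... }] }. Elements without a Text
// field (for example figures) and whitespace-only texts are skipped.
function collectText(structuredData) {
  const elements = Array.isArray(structuredData?.elements)
    ? structuredData.elements
    : [];
  return elements
    .filter((el) => typeof el?.Text === "string")
    .map((el) => el.Text.trim())
    .filter((text) => text.length > 0);
}

// Return the text following `label` on the first line that contains it,
// or null when no line contains the label.
function valueAfterLabel(lines, label) {
  for (const line of lines) {
    const idx = line.indexOf(label);
    if (idx !== -1) {
      return line.slice(idx + label.length).trim();
    }
  }
  return null;
}
```

Returning `null` for a missing label lets the caller decide how to handle fields that the Extract output does not contain, rather than failing partway through a batch.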
Once the relevant information is extracted, it is written into a CSV (Comma-Separated Values) file. The CSV file follows the same format as the provided ExtractedData.csv file, ensuring consistency and compatibility.
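To show what matching the CSV format involves, here is a minimal hand-rolled serializer. The project itself lists csv-writer as a dependency, so this is a stand-in that only illustrates the quoting rules, not the project's implementation. The helpers `csvField` and `toCsv` are hypothetical, and the real column names should be taken from the provided ExtractedData.csv.

```javascript
// Quote one CSV field in the RFC 4180 style: wrap it in double quotes when it
// contains a comma, a double quote, or a line break, and double embedded quotes.
function csvField(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialize an array of row objects into CSV text with a header line.
// `columns` fixes both the header names and the column order.
function toCsv(columns, rows) {
  const lines = [columns.map(csvField).join(",")];
  for (const row of rows) {
    lines.push(columns.map((col) => csvField(row[col])).join(","));
  }
  return lines.join("\n") + "\n";
}
```

Fixing the column order in one place makes it easier to keep the output aligned with the reference file when fields are missing from a particular invoice.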
- User-friendly interface for uploading PDF invoices.
- Seamless integration with the Adobe PDF Services Extract API.
- Parsing of the API response to extract relevant invoice information.
- Writing the extracted data into a CSV file in the specified format.
Technologies Used: React, Express

To set up the project locally, follow these steps:
- Clone the repository:
    git clone https://github.com/nipunarora098/Adobe-Project.git
- Go to the client directory:
    cd client
- Install dependencies:
    npm init
    npm i react-router-dom axios
- Start the frontend:
    npm start
- Go to the server directory:
    cd server
- Install dependencies:
    npm init
    npm i express multer cors fs @adobe/pdfservices-node-sdk csv-writer adm-zip
- Start the backend:
    npm run dev
- After the invoices are submitted, the Extracted_data.csv file is downloaded.
Test cases: https://drive.google.com/drive/folders/1WHnGtmzHbEI_cy44k9bRi2sfUr1ucrya
Extracted_Data.csv: https://drive.google.com/file/d/1xgPINQeuj-eMOX-wT_RR2PIuYXqfN1XZ/view?usp=sharing
Failing test case: output81.pdf. The Extract API returns incorrect data for this file: the "Tax" keyword is present in the PDF but missing from the extracted output.