Add extract pdf content function (#3)
Maclenn77 authored Dec 8, 2023
1 parent f681f38 commit 7a95605
Showing 4 changed files with 18 additions and 4 deletions.
8 changes: 7 additions & 1 deletion README.md
@@ -14,9 +14,15 @@ An Intelligent Assistant that explains you the content of a PDF file

 ## Deployment

-Deploy in HF with Streamlit
+Deploy in HF with Streamlit-
+
+## Local
+
+Run streamlit run app.py

 ## Stack

 - Streamlit
 - HuggingFace
+- Tika: For extracting pdf text
+- Java Runtime
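
The Python `tika` package talks to a Tika server that runs on Java, which is presumably why the Java Runtime is listed in the stack. A minimal sketch of a pre-flight check, assuming the runtime is exposed as a `java` executable on `PATH`; the helper name `java_available` is hypothetical and not part of this commit:

```python
import shutil


def java_available(executable: str = "java") -> bool:
    """Return True if `executable` can be found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if java_available():
        print("Java runtime found")
    else:
        print("Java runtime not found; Tika extraction will likely fail")
```

Running this before `streamlit run app.py` gives a clearer message than a failure inside the first extraction call.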
11 changes: 9 additions & 2 deletions app.py
@@ -1,5 +1,12 @@
 """ A simple example of Streamlit. """
 import streamlit as st
+from tika import parser

-x = st.slider("Select a value")
-st.write(x, "squared is", x * x)
+pdf = st.file_uploader("Upload a file", type="pdf")
+
+if st.button("Extract text"):
+    if pdf is not None:
+        extracted_text = parser.from_file(pdf)
+        st.write(extracted_text["content"])
+    else:
+        st.write("Please upload a file of type: pdf")
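
`parser.from_file` returns a dictionary, and to my understanding its `"content"` entry can be `None` when Tika finds no extractable text (for example, a scanned PDF with no text layer). A small sketch of a guard that could sit between the parse call and `st.write`; `display_text` and its fallback message are hypothetical and not part of this commit:

```python
from typing import Any, Mapping, Optional


def display_text(
    parsed: Optional[Mapping[str, Any]],
    fallback: str = "No text could be extracted from this file.",
) -> str:
    """Return the stripped extracted text, or `fallback` if there is none."""
    # Treat a missing result, a missing key, and a None value the same way.
    content = (parsed or {}).get("content")
    if not isinstance(content, str) or not content.strip():
        return fallback
    return content.strip()
```

In `app.py` this would be used as `st.write(display_text(extracted_text))`, so an empty result shows a message instead of `None`.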
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,6 +1,6 @@
 openai
 langchain
-pdfminer
+tika
 chromadb
 sentence_transformers
 streamlit
1 change: 1 addition & 0 deletions wk_flow_requirements.txt
@@ -1,2 +1,3 @@
 streamlit
+tika
 pylint
