Commit
1 parent e3d9bd8, commit f6a49c3
Showing 3 changed files with 34 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,8 @@ | ||
{ | ||
"cSpell.words": [ | ||
"imshow", | ||
"pytesseract" | ||
] | ||
], | ||
|
||
"python.pythonPath": "/Users/jasongellis/miniconda3/envs/table_reader/bin/python" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,29 @@ | ||
# table_reader | ||
table_reader | ||
# Table Reader | ||
|
||
Table Reader is a Python command-line interface (CLI) application that extracts data values from tables in research publications and field notes. It uses image processing and optical character recognition (OCR) to pull tabular data out of images, so researchers can digitize and analyze information from printed or handwritten sources. | ||
|
||
## Key Features | ||
|
||
- Image Import: Table Reader allows users to import images containing tables from a specified directory. | ||
- OCR Processing: Table Reader uses the Tesseract OCR engine to extract text from images, including the contents of tables. | ||
- Data Extraction: The application processes extracted text to identify and extract tabular data, preserving the structure of tables found in the input images. | ||
|
||
- Data Cleaning: Table Reader includes functionality to clean and pre-process extracted data, removing special characters and ensuring consistent formatting. | ||
|
||
- Data Export: Once the data is extracted and cleaned, Table Reader enables users to export the data to a structured format, such as CSV files, for further analysis in statistical software or spreadsheet applications. | ||
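
The cleaning and export steps listed above can be sketched as follows. This is a minimal illustration, not Table Reader's actual implementation: the function names `clean_cell`, `rows_from_ocr_text`, and `to_csv` are hypothetical, and the sketch assumes the OCR step has already returned plain text in which table columns are separated by tabs or runs of two or more spaces.

```python
import csv
import io
import re


def clean_cell(text):
    """Drop stray symbols and collapse whitespace in one table cell (hypothetical helper)."""
    # Keep word characters, whitespace, and punctuation common in numeric data.
    text = re.sub(r"[^\w\s.,%+()-]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def rows_from_ocr_text(ocr_text):
    """Split OCR text into rows (one per line) and cells (assumed column gaps)."""
    rows = []
    for line in ocr_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between table rows
        # Assumption: columns are separated by a tab or by 2+ spaces.
        cells = re.split(r"\s{2,}|\t", line.strip())
        rows.append([clean_cell(cell) for cell in cells])
    return rows


def to_csv(rows):
    """Serialize rows to CSV text; cells containing commas are quoted by the csv module."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Stand-in for text returned by an OCR engine.
sample = "Site   Depth (m)   Temp |\n\nA1   12.5   4.2°\nB2   7,0   3.9\n"
print(to_csv(rows_from_ocr_text(sample)))
```

Real OCR output rarely has such regular column gaps, so a production pipeline would likely need position-aware cell detection rather than whitespace splitting.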
|
||
## Why Use Table Reader? | ||
|
||
- Efficiency: Table Reader streamlines the process of extracting tabular data from imported images, saving researchers valuable time compared to manual transcription. | ||
- Accuracy: By using OCR instead of manual data entry, Table Reader reduces the risk of transcription errors in extracted data values. | ||
- Versatility: Researchers across various fields, including science, engineering, and social sciences, can benefit from Table Reader's ability to digitize and analyze tabular data from diverse sources, such as research publications and field notes. | ||
- Automation: With its command-line interface, Table Reader supports automation and integration into existing data processing pipelines, facilitating seamless data extraction and analysis workflows. | ||
|
||
## Future updates | ||
|
||
- Webapp interface | ||
- Upload multiple images | ||
- Ability to select/deselect image and OCR processing | ||
|
||
## How to cite | ||
|