From 97a4f7e05a0b2af7b69b3acd729c879f9504aa6d Mon Sep 17 00:00:00 2001 From: Anthony Stirling <77850077+Frooodle@users.noreply.github.com> Date: Mon, 16 Dec 2024 13:56:53 +0000 Subject: [PATCH] Cleanups! (#36) --- .gitignore | 4 ++ docs/Advanced Configuration/OCR.md | 5 +- docs/Contribute/Code.md | 3 -- docs/Contribute/Language.md | 3 -- .../{Convert/Overview.md => Convert.md} | 0 docs/Functionality/Convert/_category_.json | 3 -- .../Overview.md => Miscellaneous.md} | 2 +- .../Miscellaneous/_category_.json | 3 -- .../Overview.md => Page operations.md} | 0 .../Page operations/_category_.json | 3 -- .../{Security/Overview.md => Security.md} | 0 docs/Functionality/Security/_category_.json | 3 -- docs/Functionality/The Technologies.md | 28 +++++++++++ .../Docker => Installation}/Docker Install.md | 2 +- .../Local => Installation}/Unix.md | 31 ++++--------- .../Versions.md} | 45 ++++++++++++------ .../Local => Installation}/Windows.md | 2 +- .../_category_.json | 0 docs/Overview/Getting Started.md | 46 +++++++++++++++++++ docs/Overview/The Technologies.md | 26 ----------- docs/Overview/What is Stirling-PDF.md | 2 +- 21 files changed, 124 insertions(+), 87 deletions(-) rename docs/Functionality/{Convert/Overview.md => Convert.md} (100%) delete mode 100644 docs/Functionality/Convert/_category_.json rename docs/Functionality/{Miscellaneous/Overview.md => Miscellaneous.md} (95%) delete mode 100644 docs/Functionality/Miscellaneous/_category_.json rename docs/Functionality/{Page operations/Overview.md => Page operations.md} (100%) delete mode 100644 docs/Functionality/Page operations/_category_.json rename docs/Functionality/{Security/Overview.md => Security.md} (100%) delete mode 100644 docs/Functionality/Security/_category_.json create mode 100644 docs/Functionality/The Technologies.md rename docs/{Getting started/Installation/Docker => Installation}/Docker Install.md (98%) rename docs/{Getting started/Installation/Local => Installation}/Unix.md (94%) rename docs/{Getting 
started/Installation/Docker/Docker Versions.md => Installation/Versions.md} (69%) rename docs/{Getting started/Installation/Local => Installation}/Windows.md (99%) rename docs/{Getting started => Installation}/_category_.json (100%) create mode 100644 docs/Overview/Getting Started.md delete mode 100644 docs/Overview/The Technologies.md diff --git a/.gitignore b/.gitignore index b2d6de3..9473a53 100644 --- a/.gitignore +++ b/.gitignore @@ -18,3 +18,7 @@ npm-debug.log* yarn-debug.log* yarn-error.log* + + + +/.idea \ No newline at end of file diff --git a/docs/Advanced Configuration/OCR.md b/docs/Advanced Configuration/OCR.md index 7fac0cd..cca6c7e 100644 --- a/docs/Advanced Configuration/OCR.md +++ b/docs/Advanced Configuration/OCR.md @@ -11,7 +11,7 @@ This document provides instructions on how to add additional language packs for The paths have changed for the tessadata locations on new docker images, please use ``/usr/share/tessdata`` (Others should still work for backwards compatibility but might not) ## How does the OCR Work -Stirling-PDF uses [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF) which in turn uses tesseract for its text recognition. +Stirling-PDF uses tesseract for its text recognition. All credit goes to them for this awesome work! ## Language Packs @@ -53,8 +53,7 @@ Add the following to your existing docker run command ``` #### Non-Docker -If you are not using Docker, you need to install the OCR components, including the ocrmypdf app. -You can see [OCRmyPDF install guide](https://ocrmypdf.readthedocs.io/en/latest/installation.html) +If you are not using Docker, you need to install the OCR components, including the tesseract app. Debian based systems, install languages with this command: diff --git a/docs/Contribute/Code.md b/docs/Contribute/Code.md index 3f0befe..42909f0 100644 --- a/docs/Contribute/Code.md +++ b/docs/Contribute/Code.md @@ -2,9 +2,6 @@ sidebar_position: 7 id: Code title: Code -description: Create a doc page with rich content. 
-tags: - - Code --- See our [CONTRIBUTING guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md) diff --git a/docs/Contribute/Language.md b/docs/Contribute/Language.md index e29d229..8dcb02c 100644 --- a/docs/Contribute/Language.md +++ b/docs/Contribute/Language.md @@ -2,9 +2,6 @@ sidebar_position: 7 id: Language title: Add a New Language -description: Create a doc page with rich content. -tags: - - Language --- See [HowToAddNewLanguage](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToAddNewLanguage.md) diff --git a/docs/Functionality/Convert/Overview.md b/docs/Functionality/Convert.md similarity index 100% rename from docs/Functionality/Convert/Overview.md rename to docs/Functionality/Convert.md diff --git a/docs/Functionality/Convert/_category_.json b/docs/Functionality/Convert/_category_.json deleted file mode 100644 index d3f8e62..0000000 --- a/docs/Functionality/Convert/_category_.json +++ /dev/null @@ -1,3 +0,0 @@ -{ - "position": 2, -} diff --git a/docs/Functionality/Miscellaneous/Overview.md b/docs/Functionality/Miscellaneous.md similarity index 95% rename from docs/Functionality/Miscellaneous/Overview.md rename to docs/Functionality/Miscellaneous.md index 7c75048..1cd0a34 100644 --- a/docs/Functionality/Miscellaneous/Overview.md +++ b/docs/Functionality/Miscellaneous.md @@ -15,7 +15,7 @@ sidebar_position: 4 - `extract-image-scans`: This feature enables users to extract scanned images from PDF files. -- `sign`: This feature allows users to add their writen signature to PDF documents. For digitally signing PDFs see Features - Security --> `cert_sign` +- `sign`: This feature allows users to add their writen signature to PDF documents. For cert signing see [Features-Security](/Functionality/Security/) - `flatten`: This functionality enables users to flatten a PDF, merging interactive form fields with the document. 
diff --git a/docs/Functionality/Miscellaneous/_category_.json b/docs/Functionality/Miscellaneous/_category_.json deleted file mode 100644 index fc33f71..0000000 --- a/docs/Functionality/Miscellaneous/_category_.json +++ /dev/null @@ -1,3 +0,0 @@ -{ - "position": 4, -} diff --git a/docs/Functionality/Page operations/Overview.md b/docs/Functionality/Page operations.md similarity index 100% rename from docs/Functionality/Page operations/Overview.md rename to docs/Functionality/Page operations.md diff --git a/docs/Functionality/Page operations/_category_.json b/docs/Functionality/Page operations/_category_.json deleted file mode 100644 index 3207ca4..0000000 --- a/docs/Functionality/Page operations/_category_.json +++ /dev/null @@ -1,3 +0,0 @@ -{ - "position": 1, -} diff --git a/docs/Functionality/Security/Overview.md b/docs/Functionality/Security.md similarity index 100% rename from docs/Functionality/Security/Overview.md rename to docs/Functionality/Security.md diff --git a/docs/Functionality/Security/_category_.json b/docs/Functionality/Security/_category_.json deleted file mode 100644 index 5cddd96..0000000 --- a/docs/Functionality/Security/_category_.json +++ /dev/null @@ -1,3 +0,0 @@ -{ - "position": 3, -} diff --git a/docs/Functionality/The Technologies.md b/docs/Functionality/The Technologies.md new file mode 100644 index 0000000..20aaf35 --- /dev/null +++ b/docs/Functionality/The Technologies.md @@ -0,0 +1,28 @@ +--- +sidebar_position: 0 +--- +# The Technologies Behind Stirling PDF +Stirling PDF harnesses several technologies throughout its implementation. + +# Java +As part of the JAVA framework to host the WebUI itself we use Spring Boot and Thymeleaf. +Apache PDFBox is the core of the PDF functionality within Stirling-PDF. +They offer a variety of methods to edit PDFs which we have then built Stirling-PDF on. +We also show all licenses used within our Java application [here](https://stirlingpdf.io/licenses). 
+ +# JavaScript +- [PDF.js](https://github.com/mozilla/pdf.js) +- [PDF-LIB.js](https://github.com/Hopding/pdf-lib) + +# Core Components +- [Spring Boot + Thymeleaf](https://spring.io/projects/spring-boot) for the web framework +- [PDFBox](https://pdfbox.apache.org/) for majority of PDF manipulation +- [qpdf](https://qpdf.sourceforge.io/) for some PDF operations +- [LibreOffice](https://www.libreoffice.org/discover/libreoffice/) for advanced file conversions + +# Additional Technologies +- HTML, CSS, JavaScript for the frontend +- Docker for containerization +- jcefmaven (specifically for portable non-server version) + +For a comprehensive list of all technologies within the java application and their licenses, please visit our [licenses page](https://stirlingpdf.io/licenses). \ No newline at end of file diff --git a/docs/Getting started/Installation/Docker/Docker Install.md b/docs/Installation/Docker Install.md similarity index 98% rename from docs/Getting started/Installation/Docker/Docker Install.md rename to docs/Installation/Docker Install.md index 9bb4d92..b3e115a 100644 --- a/docs/Getting started/Installation/Docker/Docker Install.md +++ b/docs/Installation/Docker Install.md @@ -1,7 +1,7 @@ --- sidebar_position: 2 id: Docker Install -title: Installation Guide +title: Docker Guide --- # Docker Images for Stirling-PDF diff --git a/docs/Getting started/Installation/Local/Unix.md b/docs/Installation/Unix.md similarity index 94% rename from docs/Getting started/Installation/Local/Unix.md rename to docs/Installation/Unix.md index 104d39f..5674f35 100644 --- a/docs/Getting started/Installation/Local/Unix.md +++ b/docs/Installation/Unix.md @@ -35,7 +35,7 @@ Install the following software, if not already installed: - Autoconf -- libtool +- libtool[Windows.md](Windows.md) - pkg-config @@ -83,50 +83,35 @@ nix-env -iA nixpkgs.jbig2enc ``` ### Step 3: Install Additional Software -Next we need to install LibreOffice for conversions, ocrmypdf for OCR, and opencv for pattern 
recognition functionality. +Next we need to install LibreOffice for conversions, tesseract for OCR, and opencv for pattern recognition functionality. Install the following software: -- libreoffice-core - -- libreoffice-common - -- libreoffice-writer - -- libreoffice-calc - -- libreoffice-impress - +- libreoffice (libreoffice-core libreoffice-common libreoffice-writer libreoffice-calc libreoffice-impress) - python3-uno - - unoconv - - pngquant - -- unpaper - -- ocrmypdf - +- tesseract - opencv-python-headless For Debian-based systems, you can use the following command: ```bash -sudo apt-get install -y libreoffice-writer libreoffice-calc libreoffice-impress unpaper ocrmypdf +sudo apt-get install -y libreoffice-writer libreoffice-calc libreoffice-impress tesseract pip3 install uno opencv-python-headless unoconv pngquant WeasyPrint --break-system-packages ``` For Fedora: ```bash -sudo dnf install -y libreoffice-writer libreoffice-calc libreoffice-impress unpaper ocrmypdf +sudo dnf install -y libreoffice-writer libreoffice-calc libreoffice-impress tesseract pip3 install uno opencv-python-headless unoconv pngquant WeasyPrint ``` For Nix: ```bash -nix-env -iA nixpkgs.unpaper nixpkgs.libreoffice nixpkgs.ocrmypdf nixpkgs.poppler_utils +nix-env -iA nixpkgs.libreoffice nixpkgs.tesseract nixpkgs.poppler_utils pip3 install uno opencv-python-headless unoconv pngquant WeasyPrint ``` @@ -170,7 +155,7 @@ Manual: 1. Download the desired language pack(s) by selecting the `.traineddata` file(s) for the language(s) you need. 2. Place the `.traineddata` files in the Tesseract tessdata directory: `/usr/share/tessdata` -3. Please view [OCRmyPDF install guide](https://ocrmypdf.readthedocs.io/en/latest/installation.html) for more info. +3. Please view [tesseract install guide](https://tesseract.readthedocs.io/en/latest/installation.html) for more info. **IMPORTANT:** DO NOT REMOVE EXISTING `eng.traineddata`, IT'S REQUIRED. 
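As a quick check after the manual language-pack step in the Unix guide above, the following minimal Python sketch lists the `.traineddata` files in a tessdata directory and reports whether `eng.traineddata` is still present. The helper name `installed_languages` is hypothetical and is not part of Stirling-PDF or Tesseract; the default path is the one the guide gives.

```python
from pathlib import Path


def installed_languages(tessdata_dir="/usr/share/tessdata"):
    """Return the sorted language codes found as *.traineddata files.

    Hypothetical helper for illustration; not part of Stirling-PDF or Tesseract.
    """
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        # A missing directory simply means no language packs were found.
        return []
    # "eng.traineddata" -> "eng"; only the top level of the directory is scanned.
    return sorted(path.stem for path in directory.glob("*.traineddata"))


if __name__ == "__main__":
    languages = installed_languages()
    print("Installed language packs:", ", ".join(languages) or "none found")
    if "eng" not in languages:
        print("Warning: eng.traineddata is missing; the guide states it is required.")
```

Pass a different directory as the argument if your distribution keeps tessdata somewhere other than `/usr/share/tessdata`.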
diff --git a/docs/Getting started/Installation/Docker/Docker Versions.md b/docs/Installation/Versions.md similarity index 69% rename from docs/Getting started/Installation/Docker/Docker Versions.md rename to docs/Installation/Versions.md index 28ef985..1b08ce0 100644 --- a/docs/Getting started/Installation/Docker/Docker Versions.md +++ b/docs/Installation/Versions.md @@ -1,23 +1,42 @@ --- sidebar_position: 1 -id: Docker Versions -title: Docker Versions +id: Versions +title: Versions --- -# Docker Versions of Stirling PDF +# Versions of Stirling PDF -Stirling PDF is avaiable in three distinct docker images: -- ![Docker Image Size (tag)](https://img.shields.io/docker/image-size/frooodle/s-pdf/latest-fat?label=Stirling-PDF%20Fat) -- ![Docker Image Size (tag)](https://img.shields.io/docker/image-size/frooodle/s-pdf/latest?label=Stirling-PDF%20Full) -- ![Docker Image Size (tag)](https://img.shields.io/docker/image-size/frooodle/s-pdf/latest-ultra-lite?label=Stirling-PDF%20Ultra-Lite) +Stirling PDF is available in several formats, each catering to different needs and use cases: -Each version caters to different needs based on the specific features required and the storage space available. +## Docker Versions +For server deployments, we offer three pre-configured Docker images: +- ![Docker Image Size (tag)](https://img.shields.io/docker/image-size/stirlingtools/stirling-pdf/latest-fat?label=Stirling-PDF%20Fat) +- ![Docker Image Size (tag)](https://img.shields.io/docker/image-size/stirlingtools/stirling-pdf/latest?label=Stirling-PDF%20Full) +- ![Docker Image Size (tag)](https://img.shields.io/docker/image-size/stirlingtools/stirling-pdf/latest-ultra-lite?label=Stirling-PDF%20Ultra-Lite) -The Fat version contains the same from Full but with additional fonts for conversion and the Security jar pre-bundled. 
It is the recommended version for those unconcerned about storage +- **Fat**: Includes all Full features plus additional fonts and pre-bundled jar security version +- **Full**: All features pre-configured and ready to use +- **Ultra-Lite**: Minimal installation with core features only -For an in-depth comparison of what each version offers, please refer to the graph below. -If storage optimization is not a concern, we recommend using the latest tag for the most complete set of features. -Here are the different technologies each version uses. +## Desktop Versions (Windows & Unix) +The desktop versions of Stirling PDF use a dynamic feature system. They start with Ultra-Lite features as the base and automatically enable additional functionality based on installed dependencies: + +Base Features (Ultra-Lite): +- Core PDF operations (merge, split, rotate, etc.) +- Basic conversions +- Password protection +- All features marked with ✔️ in the Ultra-Lite column below + +Additional features become available automatically when you install: +- LibreOffice: Enables document format conversions (PDF to Word, Excel, etc.) +- Tesseract: Enables OCR functionality +- QPDF: Enables compression and repair features +- Other dependencies: Enable their respective features + + +## Feature Comparison + +Here are the different technologies each version uses: | Technology | Ultra-Lite | Full | |----------------|:----------:|:----:| @@ -26,7 +45,7 @@ Here are the different technologies each version uses. | Libre | | ✔️ | | Python | | ✔️ | | OpenCV | | ✔️ | -| OCRmyPDF | | ✔️ | +| Tesseract | | ✔️ | And here you see what functions are offered as part of each. 
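The dynamic feature system described for the desktop versions can be pictured with the minimal Python sketch below. It is not Stirling-PDF's actual detection code: the executable names (`soffice`, `tesseract`, `qpdf`) and the function `available_features` are assumptions chosen for illustration, and the feature descriptions are taken from the list in the Versions page.

```python
import shutil

# Assumed executable names for each optional dependency (illustrative only).
OPTIONAL_DEPENDENCIES = {
    "soffice": "document format conversions",
    "tesseract": "OCR",
    "qpdf": "compression and repair",
}


def available_features(which=shutil.which):
    """Return the Ultra-Lite base plus features whose dependency is on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    features = ["core PDF operations"]
    for executable, feature in OPTIONAL_DEPENDENCIES.items():
        # shutil.which returns None when the executable cannot be found.
        if which(executable) is not None:
            features.append(feature)
    return features


if __name__ == "__main__":
    for feature in available_features():
        print("enabled:", feature)
```

Looking each dependency up on `PATH` at start-up mirrors the idea that installing a tool is enough to unlock its features, with no separate configuration step.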
diff --git a/docs/Getting started/Installation/Local/Windows.md b/docs/Installation/Windows.md similarity index 99% rename from docs/Getting started/Installation/Local/Windows.md rename to docs/Installation/Windows.md index df82c92..a7b0320 100644 --- a/docs/Getting started/Installation/Local/Windows.md +++ b/docs/Installation/Windows.md @@ -1,7 +1,7 @@ --- sidebar_position: 2 id: Windows Installation -title: Windows installation Guide +title: Windows Guide --- # Windows Installation Guide for Stirling PDF diff --git a/docs/Getting started/_category_.json b/docs/Installation/_category_.json similarity index 100% rename from docs/Getting started/_category_.json rename to docs/Installation/_category_.json diff --git a/docs/Overview/Getting Started.md b/docs/Overview/Getting Started.md new file mode 100644 index 0000000..2c5f83a --- /dev/null +++ b/docs/Overview/Getting Started.md @@ -0,0 +1,46 @@ +--- +sidebar_position: 1 +--- +# Getting Started with Stirling PDF + +Welcome to Stirling PDF! This guide will help you choose the right installation method based on your needs. +We prioritise and focus on our Server deployment options however we also offer a [Ultra-Lite model](/Installation/Docker/Docker%20Versions) for desktop users + +## Choose Your Installation Type + +### For Desktop Users +If you want to run Stirling PDF on your personal computer: + +1. **Windows Users** + - Download our portable executable (Stirling-PDF.exe) for a simple, standalone experience + - Refer to our [Windows Installation Guide](/Installation/Windows%20Installation) for detailed setup instructions + - Note: A UI installer version is coming in the next release! + +2. **Linux/Unix Users** + - Follow our comprehensive [Unix Installation Guide](/Installation/Unix%20Installation) for a native installation + +### For Server Deployments +If you're looking to host Stirling PDF as a service: + +1. 
**Docker Users** + - We recommend using our Docker images for the easiest deployment + - Check our [Docker Installation Guide](/Installation/Docker/Docker%20Install) for setup instructions + - Choose from three versions: + - Fat (latest-fat): Includes additional fonts and security features + - Standard (latest): Balanced features and size + - Ultra-Lite (latest-ultra-lite): Minimal size with core features + +2. **Manual Server Setup** + - For bare metal server installations + - Use Stirling-PDF-Server package + - Follow our [Unix Installation Guide](/Installation/Unix%20Installation) for setup steps + +## Quick Reference Table + +| Installation Type | Best For | Documentation Link | +|------------------|----------|-------------------| +| Stirling-PDF.exe | Windows desktop users | [Windows Guide](/Installation/Windows%20Installation) | +| Stirling-PDF-Server | Server deployments without Docker | [Unix Guide](/Installation/Unix%20Installation) | +| Docker Images | Server deployments with Docker | [Docker Guide](/Installation/Docker/Docker%20Install) | + +Choose the installation method that best suits your needs and environment. Each guide provides detailed instructions for getting Stirling PDF up and running on your system. \ No newline at end of file diff --git a/docs/Overview/The Technologies.md b/docs/Overview/The Technologies.md deleted file mode 100644 index c97ddb0..0000000 --- a/docs/Overview/The Technologies.md +++ /dev/null @@ -1,26 +0,0 @@ ---- -sidebar_position: 2 ---- - -# The Technologies Behind Stirling PDF - -Stirling PDF harnesses several technologies throughout. - -# Java -As part of the JAVA framework to host the WebUI itself we use -Spring Boot, Thymeleaf and PDFBox. -Apache PDFBox is the core of the PDF functionality within Stirling-PDF. -They offer a variety of methods to edit PDFs which we have then build Stirling-PDF on. 
-We also show all licenses used within our Java application [here](https://stirlingpdf.io/licenses) - -# JavaScript -- [PDF.js](https://github.com/mozilla/pdf.js) -- [PDF-LIB.js](https://github.com/Hopding/pdf-lib) - - -# Others -We also use other open source applications along side ours to offer additional functionality. -- [LibreOffice](https://www.libreoffice.org/discover/libreoffice/) for advanced conversions -- [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF) is used for OCR (Optical CHaracter recognition) to change PDF images into text. -- GhostScript, Bundled with OCRmyPDF, this is used to compress PDF documents. - diff --git a/docs/Overview/What is Stirling-PDF.md b/docs/Overview/What is Stirling-PDF.md index c106829..01749ee 100644 --- a/docs/Overview/What is Stirling-PDF.md +++ b/docs/Overview/What is Stirling-PDF.md @@ -1,5 +1,5 @@ --- -sidebar_position: 1 +sidebar_position: 0 slug: / ---