Update! much needed #21

Merged: 2 commits, May 29, 2024
5 changes: 3 additions & 2 deletions docs/API.md
@@ -11,9 +11,10 @@ tags:

Stirling PDF exposes a simple API for easy integration with external scripts. For an exhaustive list of all available API endpoints and their functions, please refer to the [Swagger Documentation](https://app.swaggerhub.com/apis-docs/Frooodle/Stirling-PDF/).

Stirling-PDF's feature set is not entirely confined to the backend, hence not all functionalities are accessible via the API. Certain operations, such as document signing and flattening, are executed exclusively on the front-end, and as such, they are only available through the Web-UI. If you encounter a situation where some API endpoints appear to be absent, it is likely attributable to these front-end exclusive features.
Not all of Stirling-PDF's features run on the backend, so not every feature is available through the API. Some operations, such as "view-pdf" and "visually sign", run entirely in the front-end and are therefore only available through the Web UI. If an API endpoint you expect appears to be missing, it is most likely one of these front-end-only features.

Stirling-PDF also has statistic and health endpoints to integrate with monitoring/dashboard applications such as [Heimdall](https://TODOAAAAAAAAAAAAAAAAAAAAAAAAA) and [Fenrus](https://TODO)
Stirling-PDF also provides statistics and health endpoints for integration with monitoring and dashboard applications; see the [Stats API docs](https://app.swaggerhub.com/apis-docs/Frooodle/Stirling-PDF/0.24.6#/Info).
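A monitoring integration usually only needs to poll one endpoint and check the HTTP status. The shell sketch below shows that pattern. It assumes the instance is reachable at `http://localhost:8080` and that the status endpoint is `/api/v1/info/status`; both are assumptions, so confirm the path in the Stats API docs for your version. The function names `status_url` and `check_stirling` are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal health probe for a monitoring/dashboard script.
# ASSUMPTIONS: the instance runs at http://localhost:8080 (override with
# STIRLING_URL) and exposes GET /api/v1/info/status; verify the exact path
# in the Stats API docs for your version.
STIRLING_URL="${STIRLING_URL:-http://localhost:8080}"

# Build the status URL from a base URL, dropping one trailing slash.
status_url() {
  local base="${1:-$STIRLING_URL}"
  printf '%s/api/v1/info/status\n' "${base%/}"
}

# Print the endpoint's response; return non-zero on HTTP errors or timeouts.
check_stirling() {
  curl --fail --silent --show-error --max-time 5 "$(status_url "${1:-}")"
}
```

For example, `check_stirling http://localhost:8080 || echo 'Stirling-PDF is down'` could be run from cron or from a dashboard's custom check.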


# Example CURL Commands
78 changes: 61 additions & 17 deletions docs/Advanced Configuration/How to add configurations.md
@@ -5,52 +5,96 @@ sidebar_position: 1

Stirling PDF allows easy customization of the app.
This includes things like:
- Custom application name
- Custom slogans, icons, images, and even custom HTML (via file overrides)

- Custom application name
- Custom slogans, icons, HTML, images, CSS, etc. (via file overrides)

For customization via variables there are two options for this, either using the settings file ``settings.yml``
This file is located in the ``/configs`` directory and follows standard YAML formatting or directly via environment variables.
There are two options for this. The first is the generated settings file ``settings.yml``,
which is located in the ``/configs`` directory and follows standard YAML formatting.

Environment variables override their settings file equivalents
Environment variables are also supported and override the equivalent entries in the settings file.
For example, in ``settings.yml`` you have:

```
system:
  defaultLocale: 'en-US'
  enableLogin: 'true'
```

To have this via an environment variable you would add each sub section together to form the parameter.
In this case adding ``system`` to ``defaultLocale`` with all caps creating the variable ``SYSTEM_DEFAULTLOCALE`` or ``SYSTEM_DEFAULT_LOCALE``
To set this via an environment variable, join the section names with an underscore and write them in all caps; for ``enableLogin`` this gives ``SYSTEM_ENABLELOGIN``.
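The naming rule can be expressed as a small shell helper. `yaml_path_to_env` is a hypothetical function, not part of Stirling-PDF; it simply replaces each dot in a settings path with an underscore and upper-cases the result.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the environment-variable name for a setting
# from its dotted settings.yml path (sections joined with "_", all caps).
yaml_path_to_env() {
  printf '%s\n' "$1" | tr '.' '_' | tr '[:lower:]' '[:upper:]'
}

# Examples:
#   yaml_path_to_env system.defaultLocale   -> SYSTEM_DEFAULTLOCALE
#   yaml_path_to_env security.enableLogin   -> SECURITY_ENABLELOGIN
```

For example, `export "$(yaml_path_to_env system.defaultLocale)=de-DE"` exports `SYSTEM_DEFAULTLOCALE=de-DE`. This only covers simple scalar settings; the endpoint lists use their own variables, described under Extra notes.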

The current list of settings is:

```
security:
  enableLogin: false # set to 'true' to enable login
  csrfDisabled: true
  csrfDisabled: true # Set to 'true' to disable CSRF protection (not recommended for production)
  loginAttemptCount: 5 # lock user account after 5 tries
  loginResetTimeMinutes: 120 # lock account for 2 hours after x attempts
  # initialLogin:
  #   username: "admin" # Initial username for the first login
  #   password: "stirling" # Initial password for the first login
  # oauth2:
  #   enabled: false # set to 'true' to enable login (Note: enableLogin must also be 'true' for this to work)
  #   issuer: "" # set to any provider that supports OpenID Connect Discovery (/.well-known/openid-configuration) end-point
  #   clientId: "" # Client ID from your provider
  #   clientSecret: "" # Client Secret from your provider
  #   autoCreateUser: false # set to 'true' to allow auto-creation of non-existing users
  #   useAsUsername: "email" # Default is 'email'; custom fields can be used as the username
  #   scopes: "openid, profile, email" # Specify the scopes for which the application will request permissions
  #   provider: "google" # Set this to your OAuth provider's name, e.g., 'google' or 'keycloak'
  #   client:
  #     google:
  #       clientId: "" # Client ID for Google OAuth2
  #       clientSecret: "" # Client Secret for Google OAuth2
  #       scopes: "https://www.googleapis.com/auth/userinfo.email, https://www.googleapis.com/auth/userinfo.profile" # Scopes for Google OAuth2
  #       useAsUsername: "email" # Field to use as the username for Google OAuth2
  #     github:
  #       clientId: "" # Client ID for GitHub OAuth2
  #       clientSecret: "" # Client Secret for GitHub OAuth2
  #       scopes: "read:user" # Scope for GitHub OAuth2
  #       useAsUsername: "login" # Field to use as the username for GitHub OAuth2
  #     keycloak:
  #       issuer: "http://192.168.0.123:8888/realms/stirling-pdf" # URL of the Keycloak realm's OpenID Connect Discovery endpoint
  #       clientId: "stirling-pdf" # Client ID for Keycloak OAuth2
  #       clientSecret: "" # Client Secret for Keycloak OAuth2
  #       scopes: "openid, profile, email" # Scopes for Keycloak OAuth2
  #       useAsUsername: "email" # Field to use as the username for Keycloak OAuth2

system:
  defaultLocale: 'en-US' # Set the default language (e.g. 'de-DE', 'fr-FR', etc)
  googlevisibility: false # 'true' to allow Google visibility (via robots.txt), 'false' to disallow
  enableAlphaFunctionality: false # Set to enable functionality which might need more testing before it fully goes live (This feature might make no changes)
  showUpdate: true # see when a new update is available
  showUpdateOnlyAdmin: false # Only admins can see when a new update is available, depending on showUpdate it must be set to 'true'
  customHTMLFiles: false # enable to have files placed in /customFiles/templates override the existing template html files

#ui:
#  appName: exampleAppName # Application's visible name
#  homeDescription: I am a description # Short description or tagline shown on homepage.
#  appNameNavbar: navbarName # Name displayed on the navigation bar
ui:
  appName: null # Application's visible name
  homeDescription: null # Short description or tagline shown on homepage.
  appNameNavbar: null # Name displayed on the navigation bar

endpoints:
  toRemove: [] # List endpoints to disable (e.g. ['img-to-pdf', 'remove-pages'])
  groupsToRemove: [] # List groups to disable (e.g. ['LibreOffice'])

metrics:
  enabled: true # 'true' to enable Info APIs endpoints (view http://localhost:8080/swagger-ui/index.html#/API to learn more), 'false' to disable
  enabled: true # 'true' to enable Info APIs (`/api/*`) endpoints, 'false' to disable
```

For more info on the individual entries please see their separate pages
There is an additional config file, ``/configs/custom_settings.yml``, where users familiar with Java and Spring ``application.properties`` can add their own settings on top of Stirling-PDF's existing ones.


#### Extra notes
- Endpoints. The ``ENDPOINTS_TO_REMOVE`` and ``GROUPS_TO_REMOVE`` variables accept comma-separated lists of endpoints and groups to disable. For example, ``ENDPOINTS_TO_REMOVE=img-to-pdf,remove-pages`` disables both image-to-PDF and remove-pages, and ``GROUPS_TO_REMOVE=LibreOffice`` disables everything that uses LibreOffice. A list of all endpoints and groups is available [here](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/Endpoint-groups.md).
- customStaticFilePath. Customise static files such as the app logo by placing files in the ``/customFiles/static/`` directory. For example, placing a file at ``/customFiles/static/favicon.svg`` overrides the current logo SVG. This can be used to change any images, icons, CSS, fonts, JS, etc. in Stirling-PDF.
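Overriding the logo then comes down to copying a file into place. The sketch below assumes `./customFiles` is the host directory you mount into the container as `/customFiles` (for example with `-v "$PWD/customFiles:/customFiles"`); `install_custom_logo` is a hypothetical helper, not part of Stirling-PDF.

```shell
#!/usr/bin/env bash
# Sketch: stage a replacement logo so it overrides the bundled favicon.svg.
# ASSUMPTION: ./customFiles is the host directory mounted as /customFiles.
install_custom_logo() {
  local src="$1"                    # your replacement SVG
  local root="${2:-./customFiles}"  # host-side customFiles directory
  mkdir -p "$root/static" || return
  # Use the same file name as the file being overridden.
  cp "$src" "$root/static/favicon.svg"
}
```

For example, `install_custom_logo ./my-logo.svg` followed by a restart of the container (or a hard refresh in the browser, since favicons are cached aggressively).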

### Environment only parameters
- ``SYSTEM_ROOT_URI_PATH`` changes the websites root path, ie if set to ``pdf-app`` to application will be viewable at address ``localhost:8080/pdf-app`` instead of ``localhost:8080/``

- ``SYSTEM_ROOTURIPATH`` sets the application's root URI path; for example, set it to ``/pdf-app`` to serve the application at ``localhost:8080/pdf-app``
- ``SYSTEM_CONNECTIONTIMEOUTMINUTES`` sets a custom connection timeout, in minutes
- ``DOCKER_ENABLE_SECURITY`` to tell docker to download security jar (required as true for authentication and login functionality)
- ``DOCKER_ENABLE_SECURITY`` tells Docker to download the security JAR (must be ``true`` for authentication and login)
- ``INSTALL_BOOK_AND_ADVANCED_HTML_OPS`` downloads Calibre into Stirling-PDF, enabling PDF to/from book conversion and advanced HTML conversion
- ``LANGS`` defines custom font libraries to install for use in document conversions

### Local
If running Java directly outside of docker, you can set these environment variables before starting the app
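A minimal sketch of this for a POSIX shell follows. `./Stirling-PDF.jar` is a placeholder name for whichever jar you built or downloaded, and `start_stirling_local` is a hypothetical helper; it only sets `SYSTEM_DEFAULTLOCALE` if you have not already exported a value.

```shell
#!/usr/bin/env bash
# Sketch: run Stirling-PDF with Java directly, configured through environment
# variables (these override the equivalent settings.yml entries).
# ASSUMPTIONS: java is on PATH; ./Stirling-PDF.jar is a placeholder file name.
start_stirling_local() {
  local jar="${1:-./Stirling-PDF.jar}"
  # Keep any value the caller already exported; otherwise default to en-US.
  export SYSTEM_DEFAULTLOCALE="${SYSTEM_DEFAULTLOCALE:-en-US}"
  java -jar "$jar"
}
```

Environment-only parameters work the same way, e.g. run `export SYSTEM_ROOTURIPATH=/pdf-app` before calling `start_stirling_local ./your-build.jar`.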
80 changes: 78 additions & 2 deletions docs/Advanced Configuration/OCR.md
@@ -3,6 +3,82 @@ sidebar_position: 1
id: OCR
title: OCR (Optical Character Recognition)
---
# OCR (Optical Character Recognition)
# OCR Language Packs and Setup

TODO OCR HERE
This document provides instructions on how to add additional language packs for the OCR tab in Stirling-PDF, both inside and outside of Docker.

## My OCR used to work and now doesn't!
The tessdata location has changed in newer Docker images; please use ``/usr/share/tessdata``. (Other paths may still work for backwards compatibility, but this is not guaranteed.)

## How does the OCR work?
Stirling-PDF uses [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF), which in turn uses Tesseract for its text recognition.
All credit goes to them for this awesome work!

## Language Packs

Tesseract OCR supports a variety of languages. You can find additional language packs in the Tesseract GitHub repositories:

- [tessdata_fast](https://github.com/tesseract-ocr/tessdata_fast): These language packs are smaller and faster to load, but may provide lower recognition accuracy.
- [tessdata](https://github.com/tesseract-ocr/tessdata): These language packs are larger and provide better recognition accuracy, but may take longer to load.

Depending on your requirements, you can choose the appropriate language pack for your use case. By default, Stirling-PDF uses the tessdata_fast English (`eng`) pack, but this can be replaced.

### Installing Language Packs

1. Download the desired language pack(s) by selecting the `.traineddata` file(s) for the language(s) you need.
2. Place the `.traineddata` files in the Tesseract tessdata directory: `/usr/share/tessdata`
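Step 1 can be scripted. The sketch below downloads packs from the tessdata_fast repository; it assumes raw.githubusercontent.com resolves the `HEAD` ref to the repository's default branch, so open one URL in a browser first if unsure. `tessdata_url` and `fetch_lang_pack` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: download tessdata_fast language packs into a tessdata directory.
# ASSUMPTION: raw.githubusercontent.com serves the default branch under "HEAD".
tessdata_url() {
  # $1 = Tesseract language code, e.g. eng, deu, fra
  printf 'https://raw.githubusercontent.com/tesseract-ocr/tessdata_fast/HEAD/%s.traineddata\n' "$1"
}

fetch_lang_pack() {
  local lang="$1"
  local dest="${2:-/usr/share/tessdata}"   # host dir you mount, or the local tessdata dir
  mkdir -p "$dest" || return
  # --fail: non-zero exit on HTTP errors; --location: follow redirects
  curl --fail --location --silent --show-error \
       --output "$dest/$lang.traineddata" "$(tessdata_url "$lang")"
}
```

For a Docker setup you might run `fetch_lang_pack eng /location/of/trainingData` and `fetch_lang_pack deu /location/of/trainingData`, then mount that directory as shown in the Docker sections. A bind mount hides the image's own `/usr/share/tessdata` contents, which is why `eng` is fetched as well (see the warning below).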

# DO NOT REMOVE EXISTING ENG.TRAINEDDATA, IT'S REQUIRED.

#### Docker

If you are using Docker, you need to expose the Tesseract tessdata directory as a volume in order to use the additional language packs.
#### Docker Compose
Modify your `docker-compose.yml` file to include the following volume configuration:


```
services:
  your_service_name:
    image: your_docker_image_name
    volumes:
      - /location/of/trainingData:/usr/share/tessdata
```


#### Docker run
Add the following to your existing docker run command
```
-v /location/of/trainingData:/usr/share/tessdata
```

#### Non-Docker
If you are not using Docker, you need to install the OCR components yourself, including the OCRmyPDF app.
See the [OCRmyPDF install guide](https://ocrmypdf.readthedocs.io/en/latest/installation.html) for details.

On Debian-based systems, you can install languages with these commands:

```
sudo apt update

# All languages
# sudo apt install -y 'tesseract-ocr-*'

# A single language, e.g. German
# sudo apt install -y tesseract-ocr-deu

# Find languages:
apt search tesseract-ocr-

# View installed languages:
dpkg-query -W 'tesseract-ocr-*' | sed 's/tesseract-ocr-//g'
```

On Fedora:

```
# All languages
# sudo dnf install -y 'tesseract-langpack-*'

# A single language, e.g. German
# sudo dnf install -y tesseract-langpack-deu

# Find languages:
dnf search -C tesseract-langpack-

# View installed languages:
rpm -qa | grep tesseract-langpack | sed 's/tesseract-langpack-//g'
```
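After installing packs, it can help to confirm that Tesseract actually sees them. `tesseract --list-langs` prints a header line followed by one language code per line; the hypothetical helper below checks for a code as a whole line. Output is merged from stderr as well, since some older Tesseract versions print the list there.

```shell
#!/usr/bin/env bash
# Sketch: return 0 if Tesseract reports the given language code as installed.
has_tesseract_lang() {
  # Match the code as an entire line so the header line never matches.
  tesseract --list-langs 2>&1 | grep -qx -- "$1"
}
```

For example, `has_tesseract_lang deu && echo 'German is available'`. When calling OCRmyPDF directly, the language is then selected with `-l`, e.g. `ocrmypdf -l deu input.pdf output.pdf`.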
17 changes: 5 additions & 12 deletions docs/FAQ.md
@@ -6,25 +6,18 @@ sidebar_position: 8
### Q1: Can I add authentication to Stirling PDF?
Yes. Stirling PDF has optional built-in login: set `DOCKER_ENABLE_SECURITY=true` (Docker) and `enableLogin: true` in the `security` section of `settings.yml`. Alternatively, you can place it behind an external, trusted authentication solution such as Authentik or Authelia.

### Q2: What new features are planned for Stirling PDF?
Some of the upcoming features include:
- Progress bar/Tracking: To give you a real-time update on ongoing operations.
- Custom logic pipelines: To combine multiple operations.
- Folder support: To automate operations on the contents of a folder.
- Auto rename: To rename files based on their title text.

### Q3: Why are .htm files being downloaded when I use the application?
### Q2: Why are .htm files being downloaded when I use the application?
This is often caused by your NGINX configuration. NGINX's default maximum request body size (`client_max_body_size`) is 1 MB, and uploading a larger file causes an .htm file to be downloaded instead. To fix this, increase `client_max_body_size` in your NGINX configuration.
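One way to raise the limit is a small drop-in config file. The sketch assumes your `nginx.conf` includes `/etc/nginx/conf.d/*.conf` inside its `http` block, which is the default on many distributions; if not, put the same directive in the `server` or `location` block that proxies Stirling-PDF. `write_upload_limit_conf` and the file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write an NGINX drop-in that raises the request-body limit.
# ASSUMPTION: conf.d/*.conf is included in the http context.
write_upload_limit_conf() {
  local size="${1:-100M}"                                   # NGINX size syntax, e.g. 100M, 1G
  local out="${2:-/etc/nginx/conf.d/stirling-upload.conf}"  # drop-in file to create
  printf 'client_max_body_size %s;\n' "$size" > "$out"
}
```

Run as root, for example `write_upload_limit_conf 200M && nginx -t && nginx -s reload`, so the configuration is validated before it is reloaded.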

### Q4: Can I customize the appearance and language of the Stirling PDF application?
### Q3: Can I customize the appearance and language of the Stirling PDF application?
Yes, Stirling PDF provides several environment variables to allow customization of the application, including the name, description, default language, and visibility to search engines. Please refer to the "Customization" section for more details.

### Q5: I want to add a new feature to Stirling PDF. How can I contribute?
### Q4: I want to add a new feature to Stirling PDF. How can I contribute?
We welcome contributions from the community! Please open an issue on our GitHub page to discuss any large features before making any changes. Any small changes are fully welcome without discussion! After the feature has been discussed and approved, you can make the changes and submit a pull request.

### Q6: I have a cool idea can you add it?
### Q5: I have a cool idea, can you add it?
All feedback and suggestions are appreciated. It is best to submit them as a GitHub issue with [Feature Request] in the title.
You can also reach out on Discord, but without a ticket to track it, the request can easily get lost!

### Q7: I found a bug in Stirling PDF. Where can I report it?
### Q6: I found a bug in Stirling PDF. Where can I report it?
Please report any bugs or issues you encounter through our GitHub Issues page. Be sure to include as much detail as possible so we can diagnose and resolve the issue quickly.
2 changes: 1 addition & 1 deletion docs/Functionality/Convert/Overview.md
@@ -30,4 +30,4 @@ sidebar_position: 2

- `pdf-to-pdfa`: This feature transforms PDF files into PDF/A format for long-term archiving.

- `pdf-to-csv`:
- `pdf-to-csv`: This feature tries to detect tables within a PDF so they can be exported. It only works with digital (not scanned) PDFs and, due to its complexity, is still a work in progress.
4 changes: 3 additions & 1 deletion docs/Functionality/Miscellaneous/Overview.md
@@ -35,4 +35,6 @@ sidebar_position: 4

- `show-javascript`: Shows any embedded javascript within a PDF

- `stamp`: Adds a user-defined text or image to the corner of certain or all PDF pages
- `auto-split-pdf`: Automatically splits documents into separate files based on the QR codes detected between them. The intent is that a "separator" page is placed between scans, so that bulk scanning can be done with splitting as a post-processing step.

- `add-stamp`: Adds a user-defined text or image to the corner of certain or all PDF pages
4 changes: 2 additions & 2 deletions docs/Functionality/Page operations/Overview.md
@@ -29,6 +29,6 @@ sidebar_position: 1

- `split-by-size-or-count`: Splits one or multiple PDF files into parts with a maximum file size or page count defined by the user.

- `overlay-pdf`:
- `overlay-pdf`: Overlays multiple PDFs onto one another (on top, behind, etc.) in various ways.

- `split-pdf-by-sections`:
- `split-pdf-by-sections`: Splits a page into multiple sections vertically, horizontally, or both; for example, to split a page in half.
1 change: 1 addition & 0 deletions docs/Functionality/Security/Overview.md
@@ -17,3 +17,4 @@ sidebar_position: 3

- `auto-redact`: This feature lets the user input text (or a regex) to be redacted (blacked out) from the PDF document.

- `get-info-on-pdf`: Gathers all the information it can find about a PDF, such as version, font types, width and height, and outputs it as a formatted JSON document (or as visual tables within the UI).
19 changes: 11 additions & 8 deletions docs/Getting started/Installation/Docker/Docker Install.md
@@ -13,26 +13,27 @@ Please note that Stirling PDF offers three distinct versions tailored for variou
| Version | Latest Tag |
| ---------- | ------------------- |
| Standard | `latest` |
| Lite | `latest-lite` |
| Ultra Lite | `latest-ultra-lite` |

### Run docker container with `docker run`

```
docker run -d \
-p 8080:8080 \
-v /location/of/trainingData:/usr/share/tesseract-ocr/4.00/tessdata \
-v /location/of/trainingData:/usr/share/tessdata \
-v /location/of/extraConfigs:/configs \
-v /location/of/logs:/logs \
-e DOCKER_ENABLE_SECURITY=false \
-e INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false \
-e LANGS=en_GB \
--name stirling-pdf \
frooodle/s-pdf:latest


Can also add these for customization but are not required

# Optional, for customisation (not required); add before the image name:
#   -v /location/of/customFiles:/customFiles
```


### Run docker container with `docker compose`

- `docker-compose.yml`
@@ -44,14 +45,16 @@ services:
    ports:
      - '8080:8080'
    volumes:
      - /location/of/trainingData:/usr/share/tesseract-ocr/4.00/tessdata #Required for extra OCR languages
      - /location/of/trainingData:/usr/share/tessdata #Required for extra OCR languages
      - /location/of/extraConfigs:/configs
      # - /location/of/customFiles:/customFiles/
      # - /location/of/logs:/logs/
    environment:
      - DOCKER_ENABLE_SECURITY=false

      - INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false
      - LANGS=en_GB
```

### Extras

For extra parameters and customization please check the [advanced configuration](http://todo) page!
For extra parameters and customization please check the [advanced configuration](https://stirlingtools.com/docs/Advanced%20Configuration/How%20to%20add%20configurations) page!