Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat!: add LayoutLMv3 model and restructure project architecture
Major changes:
- Integrate LayoutLMv3 model for document classification
- Add new predictor container for model inference
- Create dedicated data models and configurations using dataclasses
- Implement proper model versioning and persistence
- Optimize Docker builds with UV package manager
- Set up volume bindings for models and logs persistence
- Reorganize code for better maintainability and testing
- Add proper error handling and logging
- Implement state management for processing pipeline
- Add proper type hints and documentation

Infrastructure improvements:
- Replace pip with UV for faster package installation
- Add bind mounts for logs and model artifacts
- Implement multi-stage Docker builds
- Configure proper networking between services
- Set up development environment with PDM
- Add start_services.sh script for easy deployment and initialization

BREAKING CHANGE: Complete architecture redesign with new model integration and container structure.
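The commit message mentions configurations built with dataclasses and model versioning. The sketch below is a hypothetical illustration of that pattern and is not the project's actual `configs/model_config.py`. The class name `PredictorConfig`, its fields, the `PREDICTOR_*` environment-variable names and the `models/<name>/<version>` layout are all assumptions. `microsoft/layoutlmv3-base` is a public Hugging Face checkpoint used only as a placeholder default. Port 7070 matches the predictor Dockerfile.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PredictorConfig:
    """Hypothetical predictor settings; every name here is illustrative."""

    # Placeholder checkpoint; the project's real model name is not shown in the commit.
    model_name: str = "microsoft/layoutlmv3-base"
    model_version: str = "v1"
    # Root of the (assumed) bind-mounted models/ directory.
    models_root: Path = Path("models")
    # Matches the port in the predictor Dockerfile's CMD.
    port: int = 7070
    labels: tuple[str, ...] = ()

    @property
    def artifact_dir(self) -> Path:
        """Versioned location for saved weights: models/<name>/<version>."""
        # Replace "/" so a Hub-style name becomes a single directory component.
        safe_name = self.model_name.replace("/", "--")
        return self.models_root / safe_name / self.model_version

    @classmethod
    def from_env(cls, env: dict[str, str] | None = None) -> PredictorConfig:
        """Build a config from hypothetical PREDICTOR_* variables, falling back to defaults."""
        env = dict(os.environ if env is None else env)
        defaults = cls()
        raw_labels = env.get("PREDICTOR_LABELS", "")
        return cls(
            model_name=env.get("PREDICTOR_MODEL_NAME", defaults.model_name),
            model_version=env.get("PREDICTOR_MODEL_VERSION", defaults.model_version),
            models_root=Path(env.get("PREDICTOR_MODELS_ROOT", str(defaults.models_root))),
            port=int(env.get("PREDICTOR_PORT", defaults.port)),
            # Comma-separated list; blank entries are dropped.
            labels=tuple(label.strip() for label in raw_labels.split(",") if label.strip()),
        )


if __name__ == "__main__":
    cfg = PredictorConfig.from_env({"PREDICTOR_MODEL_VERSION": "v2"})
    print(cfg.artifact_dir.as_posix())  # models/microsoft--layoutlmv3-base/v2
```

Freezing the dataclass keeps a loaded configuration immutable for the lifetime of the service. Putting the version in the artifact path lets several trained models coexist in the same persisted directory.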
1 parent 09e1066, commit 8859503
Showing 24 changed files with 1,804 additions and 817 deletions.
```diff
@@ -162,5 +162,7 @@ cython_debug/
 #.idea/
 
 # Other
 .DS_Store
+logs/
-__*.py
+__*.py
+models/
```
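The `__*.py` rule has a side effect worth checking. The pattern contains no slash, so Git applies it at every directory level. Because `*` also matches `init__`, package files such as `__init__.py` and `__main__.py` are ignored too, and `git add` skips new ones unless forced. The same any-depth rule applies to `logs/` and `models/`, so a nested directory like `src/predictor/models/` would also be ignored. The sketch below approximates these checks with Python's `fnmatch`, which behaves like gitignore globbing only for these simple slash-free patterns. It is not a full gitignore implementation; `git check-ignore -v <path>` is the authoritative test.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# File patterns from this hunk of the ignore file (no slash: match the basename at any depth).
FILE_PATTERNS = [".DS_Store", "__*.py"]
# Directory patterns (trailing slash: directories only, still at any depth).
DIR_PATTERNS = ["logs/", "models/"]


def is_ignored(path: str) -> bool:
    """Rough approximation of gitignore matching for the patterns above."""
    p = PurePosixPath(path)
    # A slash-free pattern is compared against the final path component.
    if any(fnmatchcase(p.name, pat) for pat in FILE_PATTERNS):
        return True
    # A file is ignored if any parent directory matches a directory pattern.
    dir_names = {pat.rstrip("/") for pat in DIR_PATTERNS}
    return any(part in dir_names for part in p.parts[:-1])


if __name__ == "__main__":
    for candidate in ["src/ocr/__init__.py", "src/ocr/ocr.py", "src/predictor/models/w.bin"]:
        print(candidate, is_ignored(candidate))
```

If `__init__.py` files are meant to be tracked, a negation rule such as `!__init__.py` placed after `__*.py` is the usual way to re-include them.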
```diff
@@ -1,20 +1,36 @@
-FROM python:3.12-slim
+FROM ghcr.io/astral-sh/uv:latest AS uv
+FROM python:3.12-slim AS python
 
 LABEL authors="codeplayer"
 
-WORKDIR /code
+ENV VIRTUAL_ENV=/opt/venv
 
+WORKDIR /app/data/ocr
+
 # Update and upgrade the system
 RUN apt update -y && \
     apt upgrade -y \
     # Install required packages
     && apt install poppler-utils -y \
     # cleanup
     && apt autoremove -y \
     && apt clean -y \
     && rm -rf /var/lib/apt/lists
 
-COPY ./requirements.txt /code/requirements.txt
+RUN \
+    # we use a cache --mount to reuse the uv cache across builds
+    --mount=type=cache,target=/root/.cache/uv \
+    # we use a bind --mount to use the uv binary from the uv stage
+    --mount=type=bind,from=uv,source=/uv,target=/uv \
+    # we use a bind --mount to use the requirements.txt from the host instead of adding a COPY layer
+    --mount=type=bind,source=requirements.txt,target=requirements.txt \
+    /uv venv /opt/venv && \
+    /uv pip install -r requirements.txt
+
-RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
+WORKDIR /app/code/ocr
+
-COPY ./src/documentclassification/ocr/ /code/documentclassification/ocr/
+COPY src/ocr/ .
+COPY src/configs/ocr_config.py configs/ocr_config.py
+COPY src/payload/ocr_models.py payload/ocr_models.py
+
-CMD ["uvicorn", "documentclassification.ocr.ocr:app", "--host", "0.0.0.0", "--port", "8080"]
+CMD ["/opt/venv/bin/python", "-m", "uvicorn", "ocr:app", "--host", "0.0.0.0", "--port", "8080"]
```
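The new `CMD` points uvicorn at `ocr:app`, meaning the attribute `app` of a top-level module `ocr`. This presumably resolves because `COPY src/ocr/ .` places `ocr.py` in `/app/code/ocr`, which is the final `WORKDIR`, and `python -m` puts the current directory at the front of `sys.path`. The same mechanism should make the copied `configs/ocr_config.py` and `payload/ocr_models.py` importable as `configs.ocr_config` and `payload.ocr_models`. If those directories lack `__init__.py`, they import as implicit namespace packages. The helper below is a simplified, stdlib-only sketch of how a `"module:attribute"` string can be resolved. uvicorn ships its own importer with extra error handling, and this is not that code.

```python
import importlib
from typing import Any


def import_from_string(target: str) -> Any:
    """Resolve a "package.module:attr.subattr" string to the object it names."""
    module_name, sep, attr_path = target.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f'expected "<module>:<attribute>", got {target!r}')
    # import_module searches sys.path; under `python -m` that starts with the working directory.
    obj: Any = importlib.import_module(module_name)
    # Walk dotted attributes; getattr raises AttributeError if a name is missing.
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj


if __name__ == "__main__":
    import json

    print(import_from_string("json:dumps") is json.dumps)  # True
```

Under this resolution rule, the old target `documentclassification.ocr.ocr:app` required the package path to be importable from `/code`. The new layout flattens it to a single module next to the working directory.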
```diff
@@ -0,0 +1,32 @@
+FROM ghcr.io/astral-sh/uv:latest AS uv
+FROM python:3.12-slim AS python
+
+ENV VIRTUAL_ENV=/opt/venv
+
+WORKDIR /app/data/predictor
+
+# Update and upgrade the system
+RUN apt update -y && \
+    apt upgrade -y \
+    # cleanup
+    && apt autoremove -y \
+    && apt clean -y \
+    && rm -rf /var/lib/apt/lists
+
+RUN \
+    # we use a cache --mount to reuse the uv cache across builds
+    --mount=type=cache,target=/root/.cache/uv \
+    # we use a bind --mount to use the uv binary from the uv stage
+    --mount=type=bind,from=uv,source=/uv,target=/uv \
+    # we use a bind --mount to use the requirements.txt from the host instead of adding a COPY layer
+    --mount=type=bind,source=requirements.txt,target=requirements.txt \
+    /uv venv /opt/venv && \
+    /uv pip install -r requirements.txt
+
+WORKDIR /app/code/predictor
+
+COPY src/predictor/ .
+COPY src/configs/model_config.py configs/model_config.py
+COPY src/payload/model_models.py payload/model_models.py
+
+CMD ["/opt/venv/bin/python","-m", "uvicorn", "model:app", "--host", "0.0.0.0", "--port", "7070"]
```
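The OCR service listens on port 8080 and the predictor on 7070. A startup script therefore usually has to wait until both ports accept connections before sending work. The commit mentions `start_services.sh`, but its contents are not shown. The sketch below is a hypothetical, stdlib-only readiness check. The host names and timeouts are assumptions. A successful TCP connect only shows that something is listening, not that the LayoutLMv3 model has finished loading.

```python
from __future__ import annotations

import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing listens.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def wait_for_services(services: dict[str, tuple[str, int]], timeout: float = 30.0) -> dict[str, bool]:
    """Check services one after another; each gets its own timeout window."""
    return {name: wait_for_port(host, port, timeout) for name, (host, port) in services.items()}


if __name__ == "__main__":
    # Hypothetical hosts; on a shared container network, the services' network aliases would be used.
    status = wait_for_services({"ocr": ("localhost", 8080), "predictor": ("localhost", 7070)})
    print(status)
```

Polling with a monotonic deadline keeps the wait bounded even if the system clock changes. A health endpoint in each app would give a stronger signal than a bare port check.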