This job uses Document AI to process data from human-readable invoices in a variety of file formats stored in a Cloud Storage bucket, and saves that data in a Cloud Firestore database.
The job being executed is in processor/. That program calls code from the processor/ module to work with the Document AI and Cloud Firestore client libraries.
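As a rough illustration of the hand-off between the two libraries, the sketch below flattens a parsed document's entities into a dictionary that could be written to Firestore. It is not the repository's code: `entities_to_record` is a hypothetical helper, the sample values are made up, and the only assumption about Document AI is that each entity exposes `type_`, `mention_text`, and `confidence`, as entities do in the Python client library. A stand-in object is used so the example runs without Google Cloud credentials.

```python
from types import SimpleNamespace


def entities_to_record(document):
    """Flatten a Document AI-style document into a dict suitable for Firestore.

    Hypothetical helper: assumes each entity has `type_`, `mention_text`,
    and `confidence` attributes.
    """
    record = {}
    for entity in document.entities:
        # When an entity type repeats, keep the highest-confidence mention.
        previous = record.get(entity.type_)
        if previous is None or entity.confidence > previous["confidence"]:
            record[entity.type_] = {
                "value": entity.mention_text,
                "confidence": entity.confidence,
            }
    return record


# Stand-in for a parsed invoice; a real run would get this from the processor.
sample = SimpleNamespace(entities=[
    SimpleNamespace(type_="supplier_name", mention_text="Acme Corp", confidence=0.97),
    SimpleNamespace(type_="total_amount", mention_text="120.00", confidence=0.62),
    SimpleNamespace(type_="total_amount", mention_text="1,120.00", confidence=0.91),
])
print(entities_to_record(sample))
```

Keeping the confidence next to each value is one possible design choice; it would let a reviewer see which fields are least certain.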
The Dockerfile manifest defines a minimal container using the official Python image to run a single Python script.
Create a Google Cloud project using the console or command line.
Define the project region you'll create components in:
Enable the Cloud Run API, Firestore API, and Cloud Document AI API:
gcloud services enable \ \ \
Create the Firestore database:
gcloud app create --region=$GOOGLE_CLOUD_REGION
gcloud firestore databases create --project $GOOGLE_CLOUD_PROJECT --region $GOOGLE_CLOUD_REGION
Navigate to the Document AI section of the console and create a new Invoice Parser processor. Learn how to create a Document AI processor in the console.
Note the bucket name and the Document AI processor ID, which will be used in the command to create the job.
Create a bucket from the command line or the console to hold the invoices to process:
gsutil mb -l $GOOGLE_CLOUD_REGION gs://${BUCKET}
New invoices should be placed in a bucket folder called incoming, and the file names should start with a lower-case hex digit (one of 0123456789abcdef). Naming them with UUID4 values works well.
# Copy provided example invoices to bucket
gsutil cp -r incoming/*.pdf gs://${BUCKET}/incoming
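The naming rule above can be sketched in a few lines of Python. This is only an illustration: `new_invoice_name` and `has_valid_prefix` are hypothetical helpers, and the `incoming/` prefix is taken from the copy command. It relies on the fact that the string form of a UUID4 is lower-case hexadecimal, so its first character always satisfies the rule.

```python
import string
import uuid

# The sixteen characters a file name is allowed to start with.
HEX_DIGITS = set(string.digits + "abcdef")


def new_invoice_name(extension="pdf"):
    """Return an object name such as 'incoming/<uuid4>.pdf'."""
    return f"incoming/{uuid.uuid4()}.{extension}"


def has_valid_prefix(object_name):
    """Check that the file name (after any folder) starts with a lower-case hex digit."""
    file_name = object_name.rsplit("/", 1)[-1]
    return bool(file_name) and file_name[0] in HEX_DIGITS


print(has_valid_prefix(new_invoice_name()))
print(has_valid_prefix("incoming/Invoice-2023.pdf"))
```

A check like `has_valid_prefix` could run before upload to reject names the job would not recognise.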
Cloud Run Jobs can create a job from a container. The container can be built with a variety of tools, including Google Cloud Build with the command:
gcloud builds submit$GOOGLE_CLOUD_PROJECT/invoice-processor
Once a container is available in a container repository, create the job with the command:
gcloud run jobs create invoice-processing \
  --image$GOOGLE_CLOUD_PROJECT/invoice-processor \
  --region $GOOGLE_CLOUD_REGION \
  --set-env-vars BUCKET=$BUCKET \
  --set-env-vars PROCESSOR_ID=$PROCESSOR_ID
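The `BUCKET` and `PROCESSOR_ID` values set on the job reach the container as environment variables. A minimal sketch of how a Python script might read and validate them follows; `load_settings` is a hypothetical helper and not necessarily how the job in this repo does it.

```python
import os


def load_settings(env=None):
    """Read the job's configuration from environment variables.

    Hypothetical helper: fails early with a clear message if a value is missing.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("BUCKET", "PROCESSOR_ID") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {"bucket": env["BUCKET"], "processor_id": env["PROCESSOR_ID"]}


# Passing a dict instead of os.environ makes the function easy to try locally.
print(load_settings({"BUCKET": "my-invoices", "PROCESSOR_ID": "abc123"}))
```

Failing at startup keeps a misconfigured execution from running partway before hitting an error.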
Execute the job from the command line with the command:
gcloud run jobs execute invoice-processing
Run your job nightly with a cron job.
Create a new service account:
gcloud iam service-accounts create process-identity
Give the service account access to invoke the job:
gcloud run jobs add-iam-policy-binding invoice-processing \
  --member serviceAccount:process-identity@$ \
  --role roles/run.invoker
Note: The job does not have a publicly available endpoint; therefore, the Cloud Scheduler job must have permission to invoke it.
Create a Cloud Scheduler job that runs every day at midnight:
gcloud scheduler jobs create http my-job \
  --schedule="0 0 * * *" \
  --uri="https://${GOOGLE_CLOUD_REGION}${GOOGLE_CLOUD_PROJECT}/jobs/invoice-processing:run" \
  --http-method=POST \
  --oauth-service-account-email=process-identity@${GOOGLE_CLOUD_PROJECT}
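The `--schedule` value uses the five-field unix-cron format. As a small aid to reading it, the sketch below splits an expression into named fields; `describe_schedule` is a hypothetical helper that only labels the fields and does not validate ranges or compute run times.

```python
def describe_schedule(expression):
    """Label the five fields of a unix-cron expression."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five space-separated fields")
    names = ("minute", "hour", "day_of_month", "month", "day_of_week")
    return dict(zip(names, fields))


# Minute 0 of hour 0, on every day of every month: once a day at midnight.
print(describe_schedule("0 0 * * *"))
```

Changing the second field to `2`, for instance, would move the nightly run to 02:00.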
This repo also includes services for uploading and reviewing the processed invoices.
Deploy the Uploader service:
gcloud run deploy uploader \
  --source uploader/ \
  --set-env-vars BUCKET=$BUCKET \
  --allow-unauthenticated
Deploy the Reviewer service:
gcloud run deploy reviewer \
  --source reviewer/ \
  --set-env-vars BUCKET=$BUCKET \
  --allow-unauthenticated