emmanuelmoraGL/terraform-genai-doc-summarization

 
 

Generative AI Document Summarization

Description

Tagline

Create summaries of a large corpus of documents using Generative AI.

Detailed

This solution showcases how to summarize a large corpus of documents using Generative AI. It provides an end-to-end demonstration of document summarization: starting from raw documents, it detects the text in each document with Cloud Vision Optical Character Recognition (OCR), summarizes the text on demand with the Vertex AI LLM APIs, and stores the results in BigQuery.

PreDeploy

To deploy this blueprint, you must have an active billing account and billing permissions.

Architecture

Document Summarization using Generative AI

  1. The developer follows a tutorial in a Jupyter Notebook, where they upload a PDF through either Vertex AI Workbench or Colaboratory.
  2. The uploaded PDF file is sent to a function running on Cloud Functions. This function handles PDF file processing.
  3. The Cloud Functions function uses Cloud Vision to extract all text from the PDF file.
  4. The Cloud Functions function stores the extracted text inside a Cloud Storage bucket.
  5. The Cloud Functions function uses Vertex AI’s LLM API to summarize the extracted text.
  6. The Cloud Functions function stores the text summaries of PDFs in BigQuery tables.
  7. As an alternative to uploading PDF files through the Jupyter Notebook, the developer can upload a PDF file directly to a Cloud Storage bucket, for instance through the Console UI or gcloud.
  8. The direct upload to Cloud Storage causes Eventarc to trigger the Document Processing phase, which is handled by Cloud Functions.
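
The flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the repository's code: `Pipeline`, `extract_text`, `save_text`, `summarize`, and `save_summary` are hypothetical names standing in for Cloud Vision, Cloud Storage, the Vertex AI LLM API, and BigQuery, and the event payload is assumed to carry `bucket` and `name` fields for the uploaded object.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical wiring of the document-processing steps (not the repo's code)."""
    extract_text: Callable[[str, str], str]   # stand-in for Cloud Vision OCR
    save_text: Callable[[str, str], None]     # stand-in for a Cloud Storage write
    summarize: Callable[[str], str]           # stand-in for the Vertex AI LLM API
    save_summary: Callable[[str, str], None]  # stand-in for a BigQuery insert

    def handle_upload(self, event_data: dict) -> str:
        """Process one uploaded PDF described by a storage event payload."""
        bucket = event_data["bucket"]
        name = event_data["name"]
        if not name.lower().endswith(".pdf"):
            raise ValueError(f"expected a PDF, got {name!r}")
        text = self.extract_text(bucket, name)  # step 3: extract the text
        self.save_text(f"{name}.txt", text)     # step 4: store the extracted text
        summary = self.summarize(text)          # step 5: summarize the text
        self.save_summary(name, summary)        # step 6: store the summary
        return summary


# In-memory stand-ins so the sketch runs without any cloud credentials.
stored_text: dict = {}
stored_summaries: dict = {}

pipeline = Pipeline(
    extract_text=lambda bucket, name: f"text of gs://{bucket}/{name}",
    save_text=stored_text.__setitem__,
    summarize=lambda text: text[:20],
    save_summary=stored_summaries.__setitem__,
)

result = pipeline.handle_upload({"bucket": "genai-webhook", "name": "paper.pdf"})
```

Passing the four services in as callables keeps the ordering of the steps visible and lets the sketch run locally; a real function would replace the lambdas with client-library calls.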

Documentation

Deployment Duration

Configuration: 1 min
Deployment: 10 mins

Cost

Cost Details

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| bucket_name | The name of the bucket to create | string | "genai-webhook" | no |
| gcf_timeout_seconds | GCF execution timeout | number | 900 | no |
| project_id | The Google Cloud project ID to deploy to | string | n/a | yes |
| region | Google Cloud region | string | "us-central1" | no |
| time_to_enable_apis | Wait time to enable APIs in new projects | string | "180s" | no |
| webhook_name | Name of the webhook | string | "webhook" | no |
| webhook_path | Path to the webhook directory | string | "webhook" | no |
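
As a rough illustration of how these inputs combine, the sketch below builds a variable set in Python and serializes it as JSON, which Terraform can load from a `terraform.tfvars.json` file. The helper `build_tfvars` and the project ID `my-project` are hypothetical; the defaults are copied from the table above.

```python
import json

# Defaults copied from the Inputs table; project_id has no default and is required.
DEFAULTS = {
    "bucket_name": "genai-webhook",
    "gcf_timeout_seconds": 900,
    "region": "us-central1",
    "time_to_enable_apis": "180s",
    "webhook_name": "webhook",
    "webhook_path": "webhook",
}


def build_tfvars(project_id: str, **overrides) -> dict:
    """Return module variables, rejecting names the Inputs table does not list."""
    if not project_id:
        raise ValueError("project_id is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown inputs: {sorted(unknown)}")
    return {"project_id": project_id, **DEFAULTS, **overrides}


# "my-project" is a placeholder project ID; 540 is an arbitrary example override.
tfvars = build_tfvars("my-project", gcf_timeout_seconds=540)
tfvars_json = json.dumps(tfvars, indent=2)
```

Saving `tfvars_json` as `terraform.tfvars.json` next to the module call is one way to supply the values; passing them as arguments in a `module` block works as well.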

Outputs

| Name | Description |
|------|-------------|
| genai_doc_summary_colab_url | The URL to launch the notebook tutorial for the Generative AI Document Summarization solution |
| neos_walkthrough_url | The URL to launch the in-console tutorial for the Generative AI Document Summarization solution |
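
After deployment, both URLs can be read from the text printed by `terraform output -json`. The sketch below parses that text in Python; `read_outputs` is a hypothetical helper, and the sample uses placeholder URLs rather than real output values.

```python
import json


def read_outputs(terraform_output_json: str) -> dict:
    """Map each output name to its value from `terraform output -json` text."""
    raw = json.loads(terraform_output_json)
    return {name: entry["value"] for name, entry in raw.items()}


# Sample text with placeholder URLs; real values come from `terraform output -json`.
sample = json.dumps({
    "genai_doc_summary_colab_url": {
        "sensitive": False, "type": "string", "value": "https://example.com/colab",
    },
    "neos_walkthrough_url": {
        "sensitive": False, "type": "string", "value": "https://example.com/walkthrough",
    },
})

outputs = read_outputs(sample)
```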

Requirements

These sections describe requirements for using this module.

Software

The following dependencies must be available:

Service Account

A service account with the following roles must be used to provision the resources of this module:

  • Storage Admin: roles/storage.admin

APIs

A project with the following APIs enabled must be used to host the resources of this module:

  • Google Cloud Storage JSON API: storage-api.googleapis.com
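
One possible way to meet the role and API requirements above is with the `gcloud` CLI. The Python sketch below only builds the command lines and does not run them; `setup_commands`, the project ID, and the service account email are hypothetical placeholders.

```python
import shlex


def setup_commands(project_id: str, service_account_email: str) -> list:
    """Build (but do not run) gcloud commands for the role and API listed above."""
    return [
        # Grant the Storage Admin role to the provisioning service account.
        ["gcloud", "projects", "add-iam-policy-binding", project_id,
         f"--member=serviceAccount:{service_account_email}",
         "--role=roles/storage.admin"],
        # Enable the Google Cloud Storage JSON API in the project.
        ["gcloud", "services", "enable", "storage-api.googleapis.com",
         f"--project={project_id}"],
    ]


# Placeholder project and service account, for illustration only.
commands = setup_commands("my-project", "deployer@my-project.iam.gserviceaccount.com")
printable = [shlex.join(cmd) for cmd in commands]
```

Each entry in `printable` is a shell-ready string that can be reviewed before it is run by hand or passed to `subprocess.run`.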

Contributing

Refer to the contribution guidelines for information on contributing to this module.

Security Disclosures

Please see our security disclosure process.
