docs[patch]: Adds PDF ingestion and QA tutorial (#5692)
* Adds PDF ingestion and QA tutorial

* Typo

* typo
jacoblee93 authored Jun 6, 2024
1 parent 423da6a commit e1c2856
Showing 9 changed files with 423 additions and 14 deletions.
Binary file added docs/core_docs/data/nke-10k-2023.pdf
Binary file not shown.
10 changes: 5 additions & 5 deletions docs/core_docs/docs/how_to/document_loader_pdf.mdx
@@ -15,9 +15,9 @@ npm install pdf-parse
## Usage, one document per page

```typescript
-import { PDFLoader } from "langchain/document_loaders/fs/pdf";
+import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
// Or, in web environments:
-// import { WebPDFLoader } from "langchain/document_loaders/web/pdf";
+// import { WebPDFLoader } from "@langchain/community/document_loaders/web/pdf";
// const blob = new Blob(); // e.g. from a file input
// const loader = new WebPDFLoader(blob);

@@ -29,7 +29,7 @@ const docs = await loader.load();
## Usage, one document per file

```typescript
-import { PDFLoader } from "langchain/document_loaders/fs/pdf";
+import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";

const loader = new PDFLoader("src/document_loaders/example_data/example.pdf", {
splitPages: false,
@@ -49,7 +49,7 @@ npm install pdfjs-dist
```

```typescript
-import { PDFLoader } from "langchain/document_loaders/fs/pdf";
+import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";

const loader = new PDFLoader("src/document_loaders/example_data/example.pdf", {
// you may need to add `.then(m => m.default)` to the end of the import
@@ -63,7 +63,7 @@ PDFs come in many varieties, which makes reading them a challenge. The loader pa
if you are seeing excessive spaces, this may not be the desired behavior. In that case, you can override the separator with an empty string like this:

```typescript
-import { PDFLoader } from "langchain/document_loaders/fs/pdf";
+import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";

const loader = new PDFLoader("src/document_loaders/example_data/example.pdf", {
parsedItemSeparator: "",

@@ -11,7 +11,7 @@ npm install pdf-parse
## Usage, one document per page

```typescript
-import { PDFLoader } from "langchain/document_loaders/fs/pdf";
+import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";

const loader = new PDFLoader("src/document_loaders/example_data/example.pdf");

@@ -21,7 +21,7 @@ const docs = await loader.load();
## Usage, one document per file

```typescript
-import { PDFLoader } from "langchain/document_loaders/fs/pdf";
+import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";

const loader = new PDFLoader("src/document_loaders/example_data/example.pdf", {
splitPages: false,
@@ -41,7 +41,7 @@ npm install pdfjs-dist
```

```typescript
-import { PDFLoader } from "langchain/document_loaders/fs/pdf";
+import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";

const loader = new PDFLoader("src/document_loaders/example_data/example.pdf", {
// you may need to add `.then(m => m.default)` to the end of the import
@@ -55,7 +55,7 @@ PDFs come in many varieties, which makes reading them a challenge. The loader pa
if you are seeing excessive spaces, this may not be the desired behavior. In that case, you can override the separator with an empty string like this:

```typescript
-import { PDFLoader } from "langchain/document_loaders/fs/pdf";
+import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";

const loader = new PDFLoader("src/document_loaders/example_data/example.pdf", {
parsedItemSeparator: "",

@@ -27,7 +27,7 @@ npm install pdfjs-dist
```

```typescript
-import { WebPDFLoader } from "langchain/document_loaders/web/pdf";
+import { WebPDFLoader } from "@langchain/community/document_loaders/web/pdf";

const blob = new Blob(); // e.g. from a file input

@@ -43,7 +43,7 @@ PDFs come in many varieties, which makes reading them a challenge. The loader pa
if you are seeing excessive spaces, this may not be the desired behavior. In that case, you can override the separator with an empty string like this:

```typescript
-import { WebPDFLoader } from "langchain/document_loaders/web/pdf";
+import { WebPDFLoader } from "@langchain/community/document_loaders/web/pdf";

const blob = new Blob(); // e.g. from a file input


1 change: 1 addition & 0 deletions docs/core_docs/docs/tutorials/index.mdx
@@ -21,6 +21,7 @@ New to LangChain or to LLM app development in general? Read this material to qui
- [Build a Query Analysis System](/docs/tutorials/query_analysis)
- [Build a local RAG application](/docs/tutorials/local_rag)
- [Build a Question Answering application over a Graph Database](/docs/tutorials/graph)
+- [Build a PDF ingestion and Question/Answering system](/docs/tutorials/pdf_qa/)

### Specialized tasks

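Note: the loader doc hunks above all make the same substitution, moving the `fs/pdf` and `web/pdf` entrypoints from `langchain/document_loaders/` to `@langchain/community/document_loaders/`. For anyone applying the same change to their own code, a minimal sketch of that rewrite follows. The helper name `migratePdfLoaderImports` and the regular expression are illustrative assumptions, not part of this commit or of any LangChain tooling, and the sketch only handles the two PDF entrypoints seen in this diff.

```typescript
// Hypothetical helper (not part of the commit): rewrites the legacy PDF loader
// import paths shown in this diff to their @langchain/community equivalents.

// Matches a quoted module specifier of the form
// "langchain/document_loaders/fs/pdf" or "langchain/document_loaders/web/pdf".
// The opening quote must be followed directly by "langchain", so specifiers
// that already start with "@langchain/community" are left untouched.
const LEGACY_PDF_IMPORT = /(["'])langchain\/document_loaders\/(fs|web)\/pdf\1/g;

function migratePdfLoaderImports(source: string): string {
  return source.replace(
    LEGACY_PDF_IMPORT,
    (_match, quote: string, env: string) =>
      `${quote}@langchain/community/document_loaders/${env}/pdf${quote}`
  );
}

// Usage: apply the rewrite to one of the removed lines from the diff.
const legacyLine =
  'import { PDFLoader } from "langchain/document_loaders/fs/pdf";';
console.log(migratePdfLoaderImports(legacyLine));
```

Because the pattern is anchored to the opening quote, running the helper a second time on already-migrated source is a no-op.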