From bb1b49929b6f2f477df5b2b8b472620160036825 Mon Sep 17 00:00:00 2001
From: Fibi
Date: Thu, 26 Dec 2024 04:11:58 +0100
Subject: [PATCH] better documentation

---
 .../document_loaders/file_loaders/docx.mdx    | 47 ++++++++++++-------
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/docs/core_docs/docs/integrations/document_loaders/file_loaders/docx.mdx b/docs/core_docs/docs/integrations/document_loaders/file_loaders/docx.mdx
index 86675b3da726..8e46cde7a1b8 100644
--- a/docs/core_docs/docs/integrations/document_loaders/file_loaders/docx.mdx
+++ b/docs/core_docs/docs/integrations/document_loaders/file_loaders/docx.mdx
@@ -4,17 +4,38 @@ hide_table_of_contents: true
 
 # Docx files
 
-This example goes over how to load data from docx files.
+The `DocxLoader` allows you to extract text data from Microsoft Word documents. It supports both the modern `.docx` format and the legacy `.doc` format. Depending on the file type, additional dependencies are required.
 
-# Setup
+---
+
+## Setup
+
+To use `DocxLoader`, you'll need the `@langchain/community` integration along with either `mammoth` or `word-extractor` package:
+
+- **`mammoth`**: For processing `.docx` files.
+- **`word-extractor`**: For handling `.doc` files.
+
+### Installation
+
+#### For `.docx` Files
 
 ```bash npm2yarn
 npm install @langchain/community @langchain/core mammoth
 ```
 
-# Usage
+#### For `.doc` Files
+
+```bash npm2yarn
+npm install @langchain/community @langchain/core word-extractor
+```
+
+## Usage
 
-```typescript
+### Loading `.docx` Files
+
+For `.docx` files, there is no need to explicitly specify any parameters when initializing the loader:
+
+```javascript
 import { DocxLoader } from "@langchain/community/document_loaders/fs/docx";
 
 const loader = new DocxLoader(
@@ -24,27 +45,19 @@ const loader = new DocxLoader(
 const docs = await loader.load();
 ```
 
+### Loading `.doc` Files
 
-# Doc files
-
-This example goes over how to load data from doc files.
-
-# Setup
-
-```bash npm2yarn
-npm install @langchain/community @langchain/core word-extractor
-```
+For `.doc` files, you must explicitly specify the `type` as `doc` when initializing the loader:
 
-# Usage
-
-```typescript
+```javascript
 import { DocxLoader } from "@langchain/community/document_loaders/fs/docx";
 
 const loader = new DocxLoader(
-  "src/document_loaders/tests/example_data/attentio.docx",
+  "src/document_loaders/tests/example_data/attention.doc",
   {
     type: "doc",
   }
 );
 
 const docs = await loader.load();
+```
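
The rule this patch documents (no options for `.docx`, `type: "doc"` for `.doc`) could be captured in a small helper when a caller handles both formats. The following is only a sketch: `docxLoaderOptionsFor` is a hypothetical name and not part of `@langchain/community`, and it assumes that passing `undefined` as the second constructor argument behaves the same as omitting it.

```javascript
// Hypothetical helper (not part of @langchain/community): pick the
// DocxLoader options from the file extension, following the rule above.
function docxLoaderOptionsFor(filePath) {
  const lower = filePath.toLowerCase();
  if (lower.endsWith(".docx")) {
    // `.docx` needs no explicit options, so nothing is returned.
    return undefined;
  }
  if (lower.endsWith(".doc")) {
    // Legacy `.doc` files must be flagged explicitly.
    return { type: "doc" };
  }
  // Anything else is outside what the documentation describes.
  throw new Error(`Unsupported Word file extension: ${filePath}`);
}

// Intended use, assuming the packages from the Setup section are installed
// and that `undefined` options are treated like an omitted argument:
//   import { DocxLoader } from "@langchain/community/document_loaders/fs/docx";
//   const loader = new DocxLoader(filePath, docxLoaderOptionsFor(filePath));
//   const docs = await loader.load();
```

Keeping the extension check in one place means call sites do not need to remember which format requires the explicit `type`.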