forked from ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

yogitha2023/docetl

DocETL: Powering Complex Document Processing Pipelines

DocETL is a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks. It offers a low-code, declarative YAML interface to define LLM-powered operations on complex data.

When to Use DocETL

DocETL is the ideal choice when you're looking to maximize correctness and output quality for complex tasks over a collection of documents or unstructured datasets. You should consider using DocETL if:

  • You want to perform semantic processing on a collection of data
  • You have complex tasks that you want to represent via map-reduce
  • You're unsure how to best express your task to maximize LLM accuracy
  • You're working with long documents that don't fit into a single prompt
  • You have validation criteria and want tasks to automatically retry when validation fails
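Several of the points above (map-reduce, documents too long for one prompt, retry on failed validation) can be illustrated in plain Python. This is only a conceptual sketch, not DocETL's implementation or API: every function name here is hypothetical, chunking is by character count for simplicity, and the map function stands in for an LLM call.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Split text into consecutive pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def call_with_retry(
    fn: Callable[[str], str],
    arg: str,
    validate: Callable[[str], bool],
    max_attempts: int = 3,
) -> str:
    """Call fn(arg) until validate(result) passes or attempts run out."""
    for _ in range(max_attempts):
        result = fn(arg)
        if validate(result):
            return result
    raise ValueError(f"validation failed after {max_attempts} attempts")


def map_reduce(
    text: str,
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[List[str]], str],
    validate: Callable[[str], bool],
    max_chars: int = 1000,
) -> str:
    """Map over chunks of a long text (with retries), then reduce the results."""
    chunks = split_into_chunks(text, max_chars)
    partials = [call_with_retry(map_fn, chunk, validate) for chunk in chunks]
    return reduce_fn(partials)


if __name__ == "__main__":
    # str.upper is a stand-in for an LLM call (an assumption for illustration).
    result = map_reduce(
        "abcdefghij",
        map_fn=str.upper,
        reduce_fn="|".join,
        validate=lambda r: len(r) > 0,
        max_chars=4,
    )
    print(result)
```

In a real pipeline the map step would be a model call and the validator would check the model's output against your criteria; the retry loop is what keeps a single bad response from failing the whole run.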

Community Projects

Educational Resources

Installation

Prerequisites

  • Python 3.10 or later
  • OpenAI API key
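A quick way to confirm both prerequisites before installing is a small check script. The sketch below is an assumption-laden helper, not part of DocETL: the function name `check_prerequisites` is hypothetical, and it assumes the key is supplied through the `OPENAI_API_KEY` environment variable, as in the `.env` example later in this README.

```python
import os
import sys
from typing import List, Mapping, Tuple


def check_prerequisites(version: Tuple[int, int], env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if version < (3, 10):
        problems.append(
            f"Python 3.10 or later is required, found {version[0]}.{version[1]}"
        )
    # Treat an unset or empty variable as missing.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites(sys.version_info[:2], os.environ):
        print(problem)
```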

Quick Start

  1. Install from PyPI:
       pip install docetl

To see examples of how to use DocETL, check out the tutorial.

Running the UI Locally

We offer a simple UI for building pipelines. We recommend building up complex pipelines one operation at a time, so you can see the results of each operation as you go and iterate on your pipeline. To run it locally, follow these steps:

  1. Clone the repository:
       git clone https://github.com/ucbepic/docetl.git
       cd docetl
  2. Install dependencies:
       make install      # Install Python package
       make install-ui   # Install UI dependencies
  3. Set up environment variables in .env:
       OPENAI_API_KEY=your_api_key_here
       BACKEND_ALLOW_ORIGINS=
       BACKEND_HOST=localhost
       BACKEND_PORT=8000
       BACKEND_RELOAD=True
       FRONTEND_HOST=0.0.0.0
       FRONTEND_PORT=3000
  4. Start the development server:
       make run-ui-dev
  5. Visit http://localhost:3000/playground
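
If the UI fails to start, a missing entry in .env is a common thing to rule out first. The sketch below checks a .env file for the variable names listed in the steps above. It is a simplified, hypothetical helper: the parser handles only plain `KEY=value` lines and `#` comments (no quoting or `export`), and it is not how DocETL itself loads the file. A key with an empty value, such as `BACKEND_ALLOW_ORIGINS=`, counts as present.

```python
from typing import Dict, List

# Variable names taken from the .env example in the setup steps.
EXPECTED_KEYS = [
    "OPENAI_API_KEY",
    "BACKEND_ALLOW_ORIGINS",
    "BACKEND_HOST",
    "BACKEND_PORT",
    "BACKEND_RELOAD",
    "FRONTEND_HOST",
    "FRONTEND_PORT",
]


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: Dict[str, str], expected: List[str] = EXPECTED_KEYS) -> List[str]:
    """Return the expected keys that do not appear in the parsed file."""
    return [key for key in expected if key not in values]


if __name__ == "__main__":
    try:
        with open(".env", encoding="utf-8") as handle:
            parsed = parse_env(handle.read())
    except FileNotFoundError:
        parsed = {}
    for key in missing_keys(parsed):
        print(f"missing from .env: {key}")
```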

Development Setup

If you're planning to contribute or modify DocETL, you can verify your setup by running the test suite:

make tests-basic  # Runs basic test suite (costs < $0.01 with OpenAI)

For detailed documentation and tutorials, visit our documentation.
