scripts[minor]: Add CLI for document loader integration docs
bracesproul committed Aug 1, 2024
1 parent 64cdc86 commit 9fa1c40
Showing 8 changed files with 685 additions and 14 deletions.
@@ -0,0 +1,304 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"sidebar_label: CheerioWebBaseLoader\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cheerio\n",
"\n",
"This notebook provides a quick overview for getting started with [CheerioWebBaseLoader](/docs/integrations/document_loaders/). For detailed documentation of all CheerioWebBaseLoader features and configurations, head to the [API reference](https://api.js.langchain.com/classes/langchain_community_document_loaders_web_cheerio.CheerioWebBaseLoader.html).\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"This example shows how to load data from web pages using Cheerio. One document is created for each web page.\n",
"\n",
"Cheerio is a fast and lightweight library that allows you to parse and traverse HTML documents using a jQuery-like syntax. You can use Cheerio to extract data from web pages, without having to render them in a browser.\n",
"\n",
"However, Cheerio does not simulate a web browser, so it cannot execute JavaScript code on the page. This means that it cannot extract data from dynamic web pages that require JavaScript to render. To do that, you can use the [`PlaywrightWebBaseLoader`](/docs/integrations/document_loaders/web_loaders/web_playwright) or [`PuppeteerWebBaseLoader`](/docs/integrations/document_loaders/web_loaders/web_puppeteer) instead.\n",
"\n",
"| Class | Package | Local | Serializable | PY support |\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [CheerioWebBaseLoader](https://api.js.langchain.com/classes/langchain_community_document_loaders_web_cheerio.CheerioWebBaseLoader.html) | @langchain/community | ✅ | ✅ | ❌ |\n",
"\n",
"### Loader features\n",
"\n",
"| Source | Web Support | Node Support |\n",
"| :---: | :---: | :---: |\n",
"| CheerioWebBaseLoader | ✅ | ✅ |\n",
"\n",
"## Setup\n",
"\n",
"To access the `CheerioWebBaseLoader` document loader, you'll need to install the `@langchain/community` integration package, along with the `cheerio` peer dependency.\n",
"\n",
"### Credentials\n",
"\n",
"No credentials are required to use `CheerioWebBaseLoader`. If you want automated tracing of your model calls, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting the lines below:\n",
"\n",
"```bash\n",
"# export LANGCHAIN_TRACING_V2=\"true\"\n",
"# export LANGCHAIN_API_KEY=\"your-api-key\"\n",
"```\n",
"\n",
"### Installation\n",
"\n",
"The LangChain CheerioWebBaseLoader integration lives in the `@langchain/community` package:\n",
"\n",
"```{=mdx}\n",
"import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n",
"import Npm2Yarn from \"@theme/Npm2Yarn\";\n",
"\n",
"<IntegrationInstallTooltip></IntegrationInstallTooltip>\n",
"\n",
"<Npm2Yarn>\n",
" @langchain/community cheerio\n",
"</Npm2Yarn>\n",
"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Now we can instantiate our loader object and load documents:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import { CheerioWebBaseLoader } from \"@langchain/community/document_loaders/web/cheerio\";\n",
"\n",
"const loader = new CheerioWebBaseLoader(\"https://news.ycombinator.com/item?id=34817881\", {\n",
"  // optional params: ...\n",
"});"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Document {\n",
" pageContent: '\\n' +\n",
" ' \\n' +\n",
" ' Hacker News\\n' +\n",
" ' new | past | comments | ask | show | jobs | submit \\n' +\n",
" ' login\\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" '\\n' +\n",
" ' \\n' +\n",
" ' What Lights the Universe’s Standard Candles? (quantamagazine.org)\\n' +\n",
" ' 75 points by Amorymeltzer on Feb 17, 2023 | hide | past | favorite | 6 comments \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' delta_p_delta_x on Feb 17, 2023 \\n' +\n",
" ' | next [–] \\n' +\n",
" ' \\n' +\n",
" \" Astrophysical and cosmological simulations are often insightful. They're also very cross-disciplinary; besides the obvious astrophysics, there's networking and sysadmin, parallel computing and algorithm theory (so that the simulation programs are actually fast but still accurate), systems design, and even a bit of graphic design for the visualisations.Some of my favourite simulation projects:- IllustrisTNG: https://www.tng-project.org/- SWIFT: https://swift.dur.ac.uk/- CO5BOLD: https://www.astro.uu.se/~bf/co5bold_main.html (which produced these animations of a red-giant star: https://www.astro.uu.se/~bf/movie/AGBmovie.html)- AbacusSummit: https://abacussummit.readthedocs.io/en/latest/And I can add the simulations in the article, too.\\n\" +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' froeb on Feb 18, 2023 \\n' +\n",
" ' | parent | next [–] \\n' +\n",
" ' \\n' +\n",
" \" Supernova simulations are especially interesting too. I have heard them described as the only time in physics when all 4 of the fundamental forces are important. The explosion can be quite finicky too. If I remember right, you can't get supernova to explode properly in 1D simulations, only in higher dimensions. This was a mystery until the realization that turbulence is necessary for supernova to trigger--there is no turbulent flow in 1D.\\n\" +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' andrewflnr on Feb 17, 2023 \\n' +\n",
" ' | prev | next [–] \\n' +\n",
" ' \\n' +\n",
" \" Whoa. I didn't know the accretion theory of Ia supernovae was dead, much less that it had been since 2011.\\n\" +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' andreareina on Feb 17, 2023 \\n' +\n",
" ' | prev | next [–] \\n' +\n",
" ' \\n' +\n",
" ' This seems to be the paper https://academic.oup.com/mnras/article/517/4/5260/6779709\\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' andreareina on Feb 17, 2023 \\n' +\n",
" ' | prev [–] \\n' +\n",
" ' \\n' +\n",
" \" Wouldn't double detonation show up as variance in the brightness?\\n\" +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' yencabulator on Feb 18, 2023 \\n' +\n",
" ' | parent [–] \\n' +\n",
" ' \\n' +\n",
" ' Or widening of the peak. If one type Ia supernova goes 1,2,3,2,1, the sum of two could go 1+0=1\\n' +\n",
" ' 2+1=3\\n' +\n",
" ' 3+2=5\\n' +\n",
" ' 2+3=5\\n' +\n",
" ' 1+2=3\\n' +\n",
" ' 0+1=1\\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" ' \\n' +\n",
" '\\n' +\n",
" '\\n' +\n",
" 'Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact\\n' +\n",
" 'Search: \\n' +\n",
" ' \\n' +\n",
" ' \\n',\n",
" metadata: { source: 'https://news.ycombinator.com/item?id=34817881' },\n",
" id: undefined\n",
"}\n"
]
}
],
"source": [
"const docs = await loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{ source: 'https://news.ycombinator.com/item?id=34817881' }\n"
]
}
],
"source": [
"console.log(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional configurations\n",
"\n",
"`CheerioWebBaseLoader` supports additional configuration when instantiating the loader. For example, you can pass the `selector` field so that only content matching the given Cheerio (CSS) selector is loaded. The example below uses `\"p\"`, which keeps only the text inside `<p>` elements:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Some of my favourite simulation projects:- IllustrisTNG: https://www.tng-project.org/- SWIFT: https://swift.dur.ac.uk/- CO5BOLD: https://www.astro.uu.se/~bf/co5bold_main.html (which produced these animations of a red-giant star: https://www.astro.uu.se/~bf/movie/AGBmovie.html)- AbacusSummit: https://abacussummit.readthedocs.io/en/latest/And I can add the simulations in the article, too.\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n"
]
}
],
"source": [
"import { CheerioWebBaseLoader } from \"@langchain/community/document_loaders/web/cheerio\"\n",
"\n",
"const loaderWithSelector = new CheerioWebBaseLoader(\"https://news.ycombinator.com/item?id=34817881\", {\n",
" selector: \"p\",\n",
"});\n",
"\n",
"const docsWithSelector = await loaderWithSelector.load();\n",
"docsWithSelector[0].pageContent;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all `CheerioWebBaseLoader` features and configurations, head to the [API reference](https://api.js.langchain.com/classes/langchain_community_document_loaders_web_cheerio.CheerioWebBaseLoader.html)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "TypeScript",
"language": "typescript",
"name": "tslab"
},
"language_info": {
"codemirror_mode": {
"mode": "typescript",
"name": "javascript",
"typescript": true
},
"file_extension": ".ts",
"mimetype": "text/typescript",
"name": "typescript",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
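The `selector` option from the notebook's "Additional configurations" section can be illustrated with Cheerio alone. This sketch assumes that `CheerioWebBaseLoader` builds `pageContent` from `$(selector).text()` on the fetched page, with `"body"` as the default selector. That is our reading of the loader and has not been checked against its source here. The HTML fragment is made up for illustration and needs the `cheerio` package installed. Text placed directly inside a `<div>` rather than a `<p>` is dropped when the selector is `"p"`. This is consistent with the opening sentence of the first Hacker News comment being absent from the selector output above.

```typescript
import * as cheerio from "cheerio";

// Made-up HTML standing in for a fetched page (assumption: not real HN markup).
const html = `<html><body>
  <div class="comment">First paragraph<p>Second paragraph</p></div>
  <p>Third paragraph</p>
</body></html>`;

// Parse the static HTML into a jQuery-like query function; no JavaScript runs.
const $ = cheerio.load(html);

// Selecting "body" keeps all text, including text that is not inside a <p>.
const bodyText = $("body").text();

// Selecting "p" keeps only the concatenated text of the matched <p> elements.
const pText = $("p").text();

console.log(bodyText.includes("First paragraph")); // true
console.log(pText); // Second paragraphThird paragraph
```

`.text()` on a multi-element selection joins the matched elements' text with no separator, which is why the two paragraphs run together.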
2 changes: 2 additions & 0 deletions libs/langchain-scripts/package.json
@@ -44,6 +44,7 @@
     "axios": "^1.6.7",
     "commander": "^11.1.0",
     "glob": "^10.3.10",
+    "lodash": "^4.17.21",
     "readline": "^1.3.0",
     "rimraf": "^5.0.1",
     "rollup": "^4.5.2",
@@ -55,6 +56,7 @@
     "@swc/core": "^1.3.90",
     "@swc/jest": "^0.2.29",
     "@tsconfig/recommended": "^1.0.3",
+    "@types/lodash": "^4",
     "@typescript-eslint/eslint-plugin": "^6.12.0",
     "@typescript-eslint/parser": "^6.12.0",
     "dotenv": "^16.3.1",
22 changes: 11 additions & 11 deletions libs/langchain-scripts/src/cli/docs/chat.ts
@@ -69,57 +69,57 @@ type ExtraFields = {

 async function promptExtraFields(): Promise<ExtraFields> {
   const hasToolCalling = await getUserInput(
-    "Does the tool support tool calling? (y/n) ",
+    "Does this integration support tool calling? (y/n) ",
     undefined,
     true
   );
   const hasJsonMode = await getUserInput(
-    "Does the tool support JSON mode? (y/n) ",
+    "Does this integration support JSON mode? (y/n) ",
     undefined,
     true
   );
   const hasImageInput = await getUserInput(
-    "Does the tool support image input? (y/n) ",
+    "Does this integration support image input? (y/n) ",
     undefined,
     true
   );
   const hasAudioInput = await getUserInput(
-    "Does the tool support audio input? (y/n) ",
+    "Does this integration support audio input? (y/n) ",
     undefined,
     true
   );
   const hasVideoInput = await getUserInput(
-    "Does the tool support video input? (y/n) ",
+    "Does this integration support video input? (y/n) ",
     undefined,
     true
   );
   const hasTokenLevelStreaming = await getUserInput(
-    "Does the tool support token level streaming? (y/n) ",
+    "Does this integration support token level streaming? (y/n) ",
     undefined,
     true
   );
   const hasTokenUsage = await getUserInput(
-    "Does the tool support token usage? (y/n) ",
+    "Does this integration support token usage? (y/n) ",
     undefined,
     true
   );
   const hasLogprobs = await getUserInput(
-    "Does the tool support logprobs? (y/n) ",
+    "Does this integration support logprobs? (y/n) ",
     undefined,
     true
   );
   const hasLocal = await getUserInput(
-    "Does the tool support local usage? (y/n) ",
+    "Does this integration support local usage? (y/n) ",
     undefined,
     true
   );
   const hasSerializable = await getUserInput(
-    "Does the tool support serializable output? (y/n) ",
+    "Does this integration support serializable output? (y/n) ",
     undefined,
     true
   );
   const hasPySupport = await getUserInput(
-    "Does the tool support Python support? (y/n) ",
+    "Does this integration have Python support? (y/n) ",
     undefined,
     true
   );
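The prompts above collect yes/no answers about an integration's capabilities. Assume those answers eventually become the ✅/❌ cells of generated tables, such as the `| Class | Package | Local | Serializable | PY support |` table in the Cheerio notebook. The mapping could then be sketched as follows. `parseYesNo`, `toEmoji`, `FeatureAnswers`, and `renderIntegrationRow` are hypothetical names for illustration only. They are not taken from `libs/langchain-scripts`, and the real CLI may interpret answers differently.

```typescript
// Hypothetical helpers (illustration only, not the langchain-scripts implementation).

/** Interprets a raw y/n answer; anything other than "y" or "yes" counts as "no". */
function parseYesNo(answer: string): boolean {
  const normalized = answer.trim().toLowerCase();
  return normalized === "y" || normalized === "yes";
}

/** Maps a boolean feature flag to the emoji used in the docs tables. */
function toEmoji(supported: boolean): string {
  return supported ? "✅" : "❌";
}

/** Raw answers to the local / serializable / Python-support prompts. */
interface FeatureAnswers {
  local: string;
  serializable: string;
  pySupport: string;
}

/** Builds one row of a "| Class | Package | Local | Serializable | PY support |" table. */
function renderIntegrationRow(
  className: string,
  packageName: string,
  answers: FeatureAnswers
): string {
  const cells = [
    className,
    packageName,
    toEmoji(parseYesNo(answers.local)),
    toEmoji(parseYesNo(answers.serializable)),
    toEmoji(parseYesNo(answers.pySupport)),
  ];
  return `| ${cells.join(" | ")} |`;
}

const row = renderIntegrationRow("CheerioWebBaseLoader", "@langchain/community", {
  local: "y",
  serializable: "Y ",
  pySupport: "n",
});
console.log(row); // | CheerioWebBaseLoader | @langchain/community | ✅ | ✅ | ❌ |
```

This sketch treats any answer other than `y`/`yes` as "no", which is only one possible design. A stricter version could re-prompt on unrecognized input instead.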
