Commit

Added the bs4 import inside split_file_function and removed it from the top level. The class can be imported without the bs4 dependency, and bs4 is imported only when the split function is called. This makes bs4 an optional dependency.
AhmedTammaa committed Dec 19, 2024
1 parent c2107b1 commit 4261885
Showing 1 changed file with 9 additions and 3 deletions.
12 changes: 9 additions & 3 deletions libs/text-splitters/langchain_text_splitters/html.py
@@ -19,10 +19,8 @@
 )

 import requests
-from bs4 import BeautifulSoup
-from langchain.docstore.document import Document as DocstoreDocument
 from langchain_core._api import beta
-from langchain_core.documents import BaseDocumentTransformer, Document as CoreDocument
+from langchain_core.documents import BaseDocumentTransformer, Document

 from langchain_text_splitters.character import RecursiveCharacterTextSplitter

@@ -297,6 +295,14 @@ def split_text_from_file(self, file: Any) -> List[Document]:
         Returns:
             A list of split Document objects.
         """
+        try:
+            from bs4 import BeautifulSoup  # type: ignore[import-untyped]
+        except ImportError as e:
+            raise ImportError(
+                "Unable to import BeautifulSoup/PageElement, \
+                please install with `pip install \
+                bs4`."
+            ) from e
         if isinstance(file, str):
             with open(file, 'r', encoding='utf-8') as f:
                 html_content = f.read()
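
The deferred-import pattern this commit introduces can be sketched in isolation. The snippet below is a minimal illustration, not the code in html.py: `import_optional` and `LazySplitter` are hypothetical names invented for this example, and the `parser_module` parameter exists only so the failure path can be exercised without uninstalling bs4. It builds the error message from adjacent string literals, which keeps source indentation out of the message text, whereas a backslash continuation inside a string literal carries the next line's leading whitespace into the string.

```python
import importlib
from typing import Any, List


def import_optional(module_name: str, pip_name: str) -> Any:
    """Import a module at call time and add an install hint on failure.

    Hypothetical helper for illustration; not part of langchain_text_splitters.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # Adjacent literals are concatenated without any embedded indentation.
        raise ImportError(
            f"Unable to import {module_name}, "
            f"please install with `pip install {pip_name}`."
        ) from e


class LazySplitter:
    """Toy stand-in for a splitter whose HTML parser is an optional dependency."""

    def __init__(self, parser_module: str = "bs4", pip_name: str = "bs4") -> None:
        # Nothing is imported here, so constructing the class never requires bs4.
        self.parser_module = parser_module
        self.pip_name = pip_name

    def split(self, html: str) -> List[str]:
        # The dependency is resolved only when the method is actually called.
        parser = import_optional(self.parser_module, self.pip_name)
        soup = parser.BeautifulSoup(html, "html.parser")
        return [p.get_text() for p in soup.find_all("p")]
```

With this shape, importing the module and instantiating `LazySplitter()` both succeed in an environment without bs4; only `split()` raises, and the original `ImportError` stays attached as `__cause__` because of `from e`.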