diff --git a/docs/docs/guides/privacy/presidio_data_anonymization/index.ipynb b/docs/docs/guides/privacy/presidio_data_anonymization/index.ipynb
index 3021ad2c5864e..1d64d64ec55e0 100644
--- a/docs/docs/guides/privacy/presidio_data_anonymization/index.ipynb
+++ b/docs/docs/guides/privacy/presidio_data_anonymization/index.ipynb
@@ -8,6 +8,8 @@
     "\n",
     "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/guides/privacy/presidio_data_anonymization/index.ipynb)\n",
     "\n",
+    ">[Presidio](https://microsoft.github.io/presidio/) (Origin from Latin praesidium ‘protection, garrison’) helps to ensure sensitive data is properly managed and governed. It provides fast identification and anonymization modules for private entities in text and images such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, financial data and more.\n",
+    "\n",
     "## Use case\n",
     "\n",
     "Data anonymization is crucial before passing information to a language model like GPT-4 because it helps protect privacy and maintain confidentiality. If data is not anonymized, sensitive information such as names, addresses, contact numbers, or other identifiers linked to specific individuals could potentially be learned and misused. Hence, by obscuring or removing this personally identifiable information (PII), data can be used freely without compromising individuals' privacy rights or breaching data protection laws and regulations.\n",
@@ -530,7 +532,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.4"
+   "version": "3.10.12"
   }
  },
  "nbformat": 4,
diff --git a/docs/docs/integrations/platforms/microsoft.mdx b/docs/docs/integrations/platforms/microsoft.mdx
index 5a91175962e8b..b528301f6d371 100644
--- a/docs/docs/integrations/platforms/microsoft.mdx
+++ b/docs/docs/integrations/platforms/microsoft.mdx
@@ -151,6 +151,20 @@ See a [usage example](/docs/integrations/document_loaders/microsoft_powerpoint).
 from langchain.document_loaders import UnstructuredPowerPointLoader
 ```
 
+### Microsoft OneNote
+
+First, let's install dependencies:
+
+```bash
+pip install bs4 msal
+```
+
+See a [usage example](/docs/integrations/document_loaders/onenote).
+
+```python
+from langchain.document_loaders.onenote import OneNoteLoader
+```
+
 ## Vector stores
 
 
@@ -259,4 +273,25 @@ from langchain.agents.agent_toolkits import PowerBIToolkit
 from langchain.utilities.powerbi import PowerBIDataset
 ```
 
+## More
+
+### Microsoft Presidio
+
+>[Presidio](https://microsoft.github.io/presidio/) (Origin from Latin praesidium ‘protection, garrison’)
+> helps to ensure sensitive data is properly managed and governed. It provides fast identification and
+> anonymization modules for private entities in text and images such as credit card numbers, names,
+> locations, social security numbers, bitcoin wallets, US phone numbers, financial data and more.
+
+First, you need to install several python packages and download a `SpaCy` model.
+
+```bash
+pip install langchain-experimental openai presidio-analyzer presidio-anonymizer spacy Faker
+python -m spacy download en_core_web_lg
+```
+
+See [usage examples](/docs/guides/privacy/presidio_data_anonymization/).
+
+```python
+from langchain_experimental.data_anonymizer import PresidioAnonymizer, PresidioReversibleAnonymizer
+```
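
Note on the "Use case" text in this change: the idea of masking PII before text reaches a language model, and restoring it afterwards, can be illustrated with a minimal standard-library sketch. This is not Presidio and not the `PresidioReversibleAnonymizer` API; it swaps in plain regular expressions, and the names `PATTERNS`, `anonymize`, and `deanonymize` are hypothetical helpers introduced only for illustration.

```python
import re

# Hypothetical patterns for illustration only; these two regexes are
# assumptions of this sketch and far narrower than real PII recognizers.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def anonymize(text):
    """Replace each match with a numbered placeholder; return text and mapping."""
    mapping = {}
    for label, pattern in PATTERNS.items():
        def substitute(match, label=label):
            # Number placeholders in the order the values are found.
            placeholder = f"<{label}_{len(mapping) + 1}>"
            mapping[placeholder] = match.group(0)
            return placeholder

        text = pattern.sub(substitute, text)
    return text, mapping


def deanonymize(text, mapping):
    """Put the original values back in place of their placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


original = "Contact Jane at jane.doe@example.com or 313-666-7440."
masked, mapping = anonymize(original)
restored = deanonymize(masked, mapping)
print(masked)
print(restored)
```

The name "Jane" is left untouched here because regular expressions cannot reliably detect names; that kind of entity generally needs a named-entity recognition model, which is presumably why the install step above downloads a spaCy model.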