Cloud Data Connect (#2)

* Add function for cloud connect
* Add API for CSV MatchReport
* Add custom user-agent to API request headers
* Add repository url to package.json
* Update publish script
* Update npmignore
* Update package version
* Update JSDoc
* Update imports/exports
* Update README.md
* CSV & TSV MatchKey reports
  - Add support for TSV.
  - Add request validation.
  - Add support for 'responseFormat' of json, html, or text.
  - Add unit tests.
* CloudDatabaseMatchKeyReports
  - Implement CloudDatabaseMatchKeyReport API
  - Request validation.
  - Rename "CSV..." classes to "DelimitedFile" to make it clearer that they support both CSV and TSV
  - Refactoring.

---------

Co-authored-by: Interzoid <[email protected]>
interzoid · Oct 16, 2023 · 437c750
1 parent 3ca94fb
Showing 53 changed files with 3,063 additions and 730 deletions.
diff --git a/README.md b/README.md
@@ -1,6 +1,8 @@
 # Interzoid Data Matching Node.js SDK
 
-This is a Node.js SDK for Interzoid's Generative-AI powered data matching, data quality, data cleansing, and data normalization for organization and individual name data. Functions include the generation of similarity keys for identifying and matching inconsistent name data, as well as comparing and scoring data for matching purposes.
+**Version: 1.1.0**
+
+This is a Node.js SDK for Interzoid's Generative-AI powered data matching, data quality, data cleansing, and data normalization for organization and individual name data. Functions include the generation of similarity keys (also called match keys) for identifying and matching inconsistent name data, as well as comparing and scoring data for matching purposes. The concept is that the same similarity key will be algorithmically generated for different permutations of the same data content, such as GE, Gen Elec, General Electric all generating the same similarity key. Then, these similarity keys can be used as the basis of matching data, identifying duplicates, and resolving inconsistencies that can otherwise degrade the usefulness and value of data-driven applications, processes, or anything else that makes use of data. These similarity keys form the basis of many of the different functions available in the SDK that make use of Generative AI, Machine Learning, specialized algorithms, and extensive knowledge bases - all in the Cloud - to provide its results. These include functions that generate similarity keys for custom use, functions that score matches for certain use cases, and functions that process and perform matching functions with entire database tables and datasets.
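The idea that different spellings of the same name share one similarity key can be illustrated with a small grouping sketch. This is only an illustration: the `KeyedRecord` type, the `groupBySimKey` helper, and the `key-A`/`key-B` values are hypothetical placeholders, not real Interzoid output or part of the SDK. In practice the keys would come from the match key functions documented in this README.

```typescript
// Hypothetical record shape: a name paired with an already-generated similarity key.
interface KeyedRecord {
  name: string;
  simKey: string;
}

// Bucket names by similarity key; names sharing a key are match candidates.
function groupBySimKey(records: KeyedRecord[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const record of records) {
    const names = groups.get(record.simKey) ?? [];
    names.push(record.name);
    groups.set(record.simKey, names);
  }
  return groups;
}

// Placeholder keys for illustration only.
const sample: KeyedRecord[] = [
  { name: 'GE', simKey: 'key-A' },
  { name: 'Gen Elec', simKey: 'key-A' },
  { name: 'General Electric', simKey: 'key-A' },
  { name: 'Microsoft', simKey: 'key-B' },
];

// Groups with more than one member are the duplicate candidates.
const duplicates = Array.from(groupBySimKey(sample).values()).filter(
  (names) => names.length > 1,
);
console.log(duplicates);
```

Grouping on the key rather than on the raw name is what lets inconsistent spellings land in the same bucket without any pairwise comparison.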
 
 #### Table of Contents
 1. [API Key](#api-key)
@@ -13,8 +15,16 @@ This is a Node.js SDK for Interzoid's Generative-AI powered data matching, data
    2. [Match Score Functions](#match-score-functions)
       1. [Full Name Match Score](#full-name-match-score)
       2. [Organization Name Match Score](#organization-name-match-score)
-   3. [Interzoid Account Information (Remaining Credits)](#account-information)
-
+4. [Interzoid Cloud Data Connect](#cloud-data-connect)
+   1. [Introduction](#introduction)
+   2. [Matching Process](#matching-process)
+   3. [Sources](#source)
+   4. [Processing Categories](#category)
+   5. [Connection Strings](#connection-strings)
+   6. [Match and write keys to a new cloud database table](#match-and-write-results-to-a-new-table)
+   7. [Match Key Report for a cloud database table](#match-key-report-for-a-cloud-database-table)
+   8. [Text File Match Key Report](#text-file-match-key-report)
+5. [Interzoid Account Information (Remaining Credits)](#account-information)
 --- 
 
 ## API Key
@@ -33,6 +43,7 @@ npm install @interzoid/data-matching
 ---
 
 ## Data Matching APIs
+
 Interzoid uses algorithmically generated similarity keys leveraging Generative AI, Large Language Models (LLMs), Machine Learning, specialized algorithms, and extensive knowledge bases to intelligently match data within or across data sources. Match rates can increase significantly when similarity keys are used with important data.
 
 To learn more about the technology behind these APIs and to better understand how to make use of similarity keys, please visit https://docs.interzoid.com/entries/understanding-data-matching
@@ -43,7 +54,7 @@ To learn more about the technology behind these APIs and to better understand ho
 This API provides a hashed similarity key from the input data used to match with other similar full name data. Use the generated similarity key, rather than the actual data itself, to match and/or sort individual name data by similarity as similar individual names will generate the same similarity key. This avoids the problems of data inconsistency, misspellings, and name variations when matching within a single dataset, and can also help matching across datasets or for more advanced searching. 
 
 ```typescript
-import { getFullNameMatchKey } from 'interzoid';
+import { getFullNameMatchKey } from '@interzoid/data-matching';
 
 async function fullNameMatch() {
     const result = await getFullNameMatchKey({apiKey: 'your-interzoid-api-key', fullName: 'John Smith'});
@@ -75,7 +86,7 @@ The optional `algorithm` parameter provides multiple matching algorithms:
 - The default value for the optional `algorithm` parameter is `wide`. 
 
 ```typescript
-import { getCompanyNameMatchKey } from 'interzoid';
+import { getCompanyNameMatchKey } from '@interzoid/data-matching';
 
 async function companyNameMatch() {
     const result = await getCompanyNameMatchKey({apiKey: 'your-interzoid-api-key', company: 'Microsoft', algorithm: 'medium'});
@@ -94,15 +105,16 @@ async function companyNameMatch() {
 ---
 
 #### Address Match Key
+
 This API provides a hashed similarity key from the input data used to match with other similar address data. Use the generated similarity key, rather than the actual data itself, to match and/or sort address data by similarity, as similar addresses will generate the same similarity key. This avoids the problems of data inconsistency, misspellings, and address element variations when matching either within a single dataset or across datasets. It also provides for broader searching capabilities.
 
-You can choose from two matching algorithms, `wide` and `narrow`. 
+You can choose from two matching algorithms, `wide` and `narrow`.
 - `narrow` considers a unit number (suite, apartment, unit, etc.) when generating similarity keys. This ensures individual units are identified separately when comparing generated keys.
 - `wide` parameter will not consider the unit numbers, generating matching similarity keys based on the primary address only.
 - The default value for the optional `algorithm` parameter is `narrow`. 
 
 ```typescript
-import { getAddressMatchKey } from 'interzoid';
+import { getAddressMatchKey } from '@interzoid/data-matching';
 
 async function addressMatch() {
   const result = await getAddressMatchKey({apiKey: 'your-interzoid-api-key', address: '500 main street', algorithm: 'narrow'});
@@ -130,7 +142,7 @@ We provide two operations for match scoring: Organization name and Full name. Th
 This API provides a match score (likelihood of matching) between two individual names on a scale of 0-100, where 100 is the highest possible match.
 
 ```typescript
-import { getFullNameMatchScore } from 'interzoid';
+import { getFullNameMatchScore } from '@interzoid/data-matching';
 
 async function fullNameMatchScore() {
   const result = await getFullNameMatchScore({apiKey: 'your-interzoid-api-key', value1: 'John Smith', value2: 'John Smyth'});
@@ -150,10 +162,10 @@ async function fullNameMatchScore() {
 ---
 
 #### Organization Name Match Score
-This API provides a match score (likelihood of matching) from 0-100 between two organization names.
+This API provides a match score (likelihood of matching) ranging from 0 to 100 between two organization names.
 
 ```typescript
-import { getOrganizationMatchScore } from 'interzoid';
+import { getOrganizationMatchScore } from '@interzoid/data-matching';
 
 async function organizationNameMatchScore() {
   const result = await getOrganizationNameMatchScore({apiKey: 'your-interzoid-api-key', value1: 'Apple', value2: 'Apple Inc.'});
@@ -172,26 +184,233 @@ async function organizationNameMatchScore() {
 
 ---
 
-#### Account Information
+## Cloud Data Connect
+
+### Introduction
+
+Interzoid's Cloud Data Connect is a set of functions that allow you to match data in your cloud database or delimited text file such as CSV and TSV with Interzoid's data matching algorithms.
+
+
+### Matching Process
+
+The `process` parameter determines the type of matching process to run. The package provides an `enum` called [`Process`](src/interfaces/Process.ts) that contains the available options.
+
+| Process                | Description |
+|------------------------|-------------|
+| `Process.MATCH_REPORT` | Generate a report of all clusters of similar data found that share the same generated similarity key. |
+| `Process.CREATE_TABLE` | Create a new table in the source database with the similarity key for each record in the source table, so the keys can be used in additional queries. |
+| `Process.GEN_SQL`      | Generate the SQL INSERT statements that store the similarity keys in a database, so they can be reviewed before execution. |
+| `Process.KEYS_ONLY`    | Output a generated similarity key for every record in the dataset. |
+
+
+### Source
+
+The `source` parameter determines the type of data source containing the data you are performing matching functions with. The package provides an `enum` called [`Source`](src/interfaces/Source.ts) that contains the available options. Some commonly used examples are:
+
+| Source              | Description                          |
+|---------------------|--------------------------------------|
+| `Source.MYSQL`      | Match data in a MySQL database.      |
+| `Source.POSTGRES`   | Match data in a PostgreSQL database. |
+| `Source.MARIADB`    | Match data in a MariaDB database.    |
+| `Source.DATABRICKS` | Match data in a Databricks table.    |
+| `Source.CSV`        | Match data in a CSV file.            |
+
+Please see the [source code](src/interfaces/Source.ts) for a complete list of available options.
+
+
+### Category
+
+The `category` parameter determines the type of data you're matching. The package provides an `enum` called [`Category`](src/interfaces/Category.ts) that contains the available options.
+
+| Category              | Description             |
+|-----------------------|-------------------------|
+| `Category.COMPANY`    | Match company names.    |
+| `Category.INDIVIDUAL` | Match individual names. |
+| `Category.ADDRESS`    | Match addresses.        |
+
+### Connection Strings
+
+The `connection` parameter is a connection string for your database. The format of the connection string depends on the database you're connecting to. 
+
+Please see [this page](https://connect.interzoid.com/connection-strings) for examples of connection strings for various databases.
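A minimal sketch of assembling a connection string from separate parts, so credentials do not have to be hard-coded in one literal. It assumes only the MySQL pattern that appears in this README's examples (`db_user:db_password@tcp(db_host)/database`). The `MySqlTarget` type and `mysqlConnectionString` helper are hypothetical and not part of `@interzoid/data-matching`. Other databases use other formats, and no escaping of special characters is attempted here.

```typescript
// Hypothetical helper type: the four parts of the MySQL pattern used in this README.
interface MySqlTarget {
  user: string;
  password: string;
  host: string;
  database: string;
}

// Build "user:password@tcp(host)/database", rejecting blank parts early.
function mysqlConnectionString(target: MySqlTarget): string {
  const fields: Array<keyof MySqlTarget> = ['user', 'password', 'host', 'database'];
  for (const field of fields) {
    if (target[field].trim() === '') {
      throw new Error(`Connection string part "${field}" is empty`);
    }
  }
  return `${target.user}:${target.password}@tcp(${target.host})/${target.database}`;
}

const connection = mysqlConnectionString({
  user: 'db_user',
  password: 'db_password',
  host: 'db_host',
  database: 'database',
});
console.log(connection);
```

The resulting string would be passed as the `connection` parameter. Failing early on a blank part avoids sending a malformed string to the API.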
+
+### Match and write results to a new table
+
+Set the `process` parameter to `CREATE_TABLE` to create a new table in your database with the match keys. The `newTable` parameter is the name of the new table to create. This table will be created by the process, and will contain the original data and the similarity key. 
+
+**Do not create the table manually; the process will handle the creation.**
+
+You'll have to grant the user you're connecting with the ability to create a new table in the database in addition to the ability to read from the table you're matching.
+
+```typescript
+import { getCloudDatabaseMatchKeyReport, Process, Category, Source } from '@interzoid/data-matching';
+
+async function databaseMatchKeyReport() {
+   const result = await getCloudDatabaseMatchKeyReport({
+      apiKey: 'your-interzoid-api-key',
+      process: Process.CREATE_TABLE,
+      category: Category.COMPANY,
+      source: Source.MYSQL,
+      connection: 'db_user:db_password@tcp(db_host)/database',
+      table: 'companies',                 // table to match
+      column: 'companyname',              // column to match
+      reference: 'id',                    // optional reference column
+      newTable: 'companies_match_keys'    // new table to create
+   });
+   console.log(result);
+}
+```
+
+#### Response
+```
+"Creating new table...Table companies_match_keys created successfully."
+```
+
+---
+
+### Match Key Report for a cloud database table
+
+#### Response options
+
+* Set `json` to `true` to return a JSON object with arrays of match clusters.
+* Set `html` to `true` to return results in plain text with clusters separated by HTML `<br>` tags.
+* Leave both unset to return results in plain text with clusters separated by newlines.
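The three response options above could be wrapped in a small helper that turns a format name into the matching flags. This is a sketch under assumptions: the `ResponseFormat` type and `responseFlags` function are hypothetical and not exported by the SDK. Only the `json` and `html` flags themselves come from this README.

```typescript
// Hypothetical format names mapped onto the README's `json` / `html` flags.
type ResponseFormat = 'json' | 'html' | 'text';

interface ResponseFlags {
  json?: boolean;
  html?: boolean;
}

function responseFlags(format: ResponseFormat): ResponseFlags {
  switch (format) {
    case 'json':
      return { json: true };
    case 'html':
      return { html: true };
    default:
      // 'text': leave both flags unset for newline-separated plain text.
      return {};
  }
}

// Spread the flags into whatever request object is being built.
const request = { table: 'companies', column: 'companyname', ...responseFlags('json') };
console.log(request);
```

Keeping the choice in one place makes it harder to set both flags at once by accident.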
+
+```typescript
+import { getCloudDatabaseMatchKeyReport, Source, Process, Category } from '@interzoid/data-matching';
+
+async function databaseMatchKeyReport() {
+   const result = await getCloudDatabaseMatchKeyReport({
+      apiKey: 'your-interzoid-api-key',
+      process: Process.MATCH_REPORT,
+      category: Category.COMPANY,
+      source: Source.MYSQL,
+      connection: 'db_user:db_password@tcp(db_host)/database',
+      table: 'companies',
+      column: 'companyname',
+      reference: 'id',
+      json: true,
+   });
+   console.log(JSON.stringify(result, null, 2));
+}
+```
+
+#### Sample Response
+
+```json
+{
+  "Status": "success",
+  "Message": "",
+  "MatchClusters": [
+    [
+      {
+        "Data": "Cisco",
+        "Reference": "",
+        "SimKey": "3AmCGk2yvEJ7XUxUmB3dFHxRiVzy4Squ89J-4_lDrxQ"
+      },
+      {
+        "Data": "Cisco Systems",
+        "Reference": "30",
+        "SimKey": "3AmCGk2yvEJ7XUxUmB3dFHxRiVzy4Squ89J-4_lDrxQ"
+      }
+    ],
+    [
+      {
+        "Data": "Netflix",
+        "Reference": "15",
+        "SimKey": "8c6BY0KP9MYiDezQaKL3bH3iHfDU2wCMMTD9v0EeZJ8"
+      },
+      {
+        "Data": "\"Netflix, Inc.\"",
+        "Reference": "34",
+        "SimKey": "8c6BY0KP9MYiDezQaKL3bH3iHfDU2wCMMTD9v0EeZJ8"
+      }
+    ]
+  ]
+}
+```
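A sketch of how the JSON report above might be consumed. The `MatchRecord` and `MatchReport` interfaces are inferred from this sample response only and are not taken from the SDK's type definitions. Treating any `Status` other than `"success"` as a failure is likewise an assumption.

```typescript
// Shapes inferred from the sample response; the SDK's own types may differ.
interface MatchRecord {
  Data: string;
  Reference: string;
  SimKey: string;
}

interface MatchReport {
  Status: string;
  Message: string;
  MatchClusters: MatchRecord[][];
}

// Produce one human-readable line per cluster: "<SimKey>: name (ref), name (ref)".
function summarizeClusters(report: MatchReport): string[] {
  if (report.Status !== 'success') {
    // Assumption: a non-"success" status carries its explanation in Message.
    throw new Error(`Match report failed: ${report.Message}`);
  }
  return report.MatchClusters.filter((cluster) => cluster.length > 0).map((cluster) => {
    const members = cluster
      .map((r) => (r.Reference === '' ? r.Data : `${r.Data} (${r.Reference})`))
      .join(', ');
    return `${cluster[0].SimKey}: ${members}`;
  });
}

const sampleReport: MatchReport = {
  Status: 'success',
  Message: '',
  MatchClusters: [
    [
      { Data: 'Cisco', Reference: '', SimKey: '3AmCGk2yvEJ7XUxUmB3dFHxRiVzy4Squ89J-4_lDrxQ' },
      { Data: 'Cisco Systems', Reference: '30', SimKey: '3AmCGk2yvEJ7XUxUmB3dFHxRiVzy4Squ89J-4_lDrxQ' },
    ],
  ],
};

const summary = summarizeClusters(sampleReport);
console.log(summary.join('\n'));
```

Every record in a cluster shares one `SimKey`, so the first record's key is enough to label the whole cluster.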
+
+---
+
+### Text File Match Key Report
+
+Provide a URL to a delimited file (CSV or TSV) and the API will return a match key report for the data in the file.
+
+```typescript
+import { getDelimitedFileMatchKeyReport, Process, Source, Category } from '@interzoid/data-matching';
+
+async function csvFileMatchReport() {
+   const result = await getDelimitedFileMatchKeyReport({
+      apiKey: 'your-interzoid-api-key',
+      process: Process.MATCH_REPORT,
+      category: Category.COMPANY,
+      source: Source.CSV,
+      table: Source.CSV,
+      connection: 'https://dl.interzoid.com/csv/companies.csv',
+      column: '1',          // column number to match
+      json: true,
+   });
+   console.log(JSON.stringify(result, null, 2));
+}
+
+```
+
+#### Result
+
+```json
+{
+  "Status": "success",
+  "Message": "",
+  "MatchClusters": [
+    [
+      {
+        "Data": "Good Year Tire & Rubber",
+        "Reference": "",
+        "SimKey": "140xAiUxvDysV56LZzogzDwLuYLd2U7E5sVAXd1nKd8"
+      },
+      {
+        "Data": "Goodyear Tire Inc",
+        "Reference": "Transportaions",
+        "SimKey": "140xAiUxvDysV56LZzogzDwLuYLd2U7E5sVAXd1nKd8"
+      }
+    ],
+    [
+      {
+        "Data": "Pederson Tooling Inc.",
+        "Reference": "Transportaions",
+        "SimKey": "7oOMieCdoyxjt7_oKbE2xGngnZGdG75CFU5pEfhU5z8"
+      },
+      {
+        "Data": "Peterson Tools",
+        "Reference": "Services",
+        "SimKey": "7oOMieCdoyxjt7_oKbE2xGngnZGdG75CFU5pEfhU5z8"
+      }
+    ]
+  ]
+}
+```
+
+---
+
+## Account Information
 
 This API retrieves the current amount of remaining purchased (or trial) credits for a license key.
 
 Using this function does **not** deduct credits from your account.
 
 ```typescript
-import { getRemainingCredits } from 'interzoid';
+import { getRemainingCredits } from '@interzoid/data-matching';
 
 async function remainingCredits() {
   const result = await getRemainingCredits({apiKey: 'your-interzoid-api-key'});
   console.log(result);
 }
 ```
 
-##### Result
+#### Result
 ```json
 {
   "credits": "9998",
   "code": "Success"
 }
 ```
-
diff --git a/docs/assets/highlight.css b/docs/assets/highlight.css
@@ -15,6 +15,12 @@
     --dark-hl-6: #4FC1FF;
     --light-hl-7: #0451A5;
     --dark-hl-7: #9CDCFE;
+    --light-hl-8: #008000;
+    --dark-hl-8: #6A9955;
+    --light-hl-9: #098658;
+    --dark-hl-9: #B5CEA8;
+    --light-hl-10: #EE0000;
+    --dark-hl-10: #D7BA7D;
     --light-code-background: #FFFFFF;
     --dark-code-background: #1E1E1E;
 }
@@ -28,6 +34,9 @@
     --hl-5: var(--light-hl-5);
     --hl-6: var(--light-hl-6);
     --hl-7: var(--light-hl-7);
+    --hl-8: var(--light-hl-8);
+    --hl-9: var(--light-hl-9);
+    --hl-10: var(--light-hl-10);
     --code-background: var(--light-code-background);
 } }
 
@@ -40,6 +49,9 @@
     --hl-5: var(--dark-hl-5);
     --hl-6: var(--dark-hl-6);
     --hl-7: var(--dark-hl-7);
+    --hl-8: var(--dark-hl-8);
+    --hl-9: var(--dark-hl-9);
+    --hl-10: var(--dark-hl-10);
     --code-background: var(--dark-code-background);
 } }
 
@@ -52,6 +64,9 @@
     --hl-5: var(--light-hl-5);
     --hl-6: var(--light-hl-6);
     --hl-7: var(--light-hl-7);
+    --hl-8: var(--light-hl-8);
+    --hl-9: var(--light-hl-9);
+    --hl-10: var(--light-hl-10);
     --code-background: var(--light-code-background);
 }
 
@@ -64,6 +79,9 @@
     --hl-5: var(--dark-hl-5);
     --hl-6: var(--dark-hl-6);
     --hl-7: var(--dark-hl-7);
+    --hl-8: var(--dark-hl-8);
+    --hl-9: var(--dark-hl-9);
+    --hl-10: var(--dark-hl-10);
     --code-background: var(--dark-code-background);
 }
 
@@ -75,4 +93,7 @@
 .hl-5 { color: var(--hl-5); }
 .hl-6 { color: var(--hl-6); }
 .hl-7 { color: var(--hl-7); }
+.hl-8 { color: var(--hl-8); }
+.hl-9 { color: var(--hl-9); }
+.hl-10 { color: var(--hl-10); }
 pre, code { background: var(--code-background); }
diff --git a/docs/assets/navigation.js b/docs/assets/navigation.js