Group similar items together.
Runtime complexity is O(N^2 * (M + α(N)))
, where N
is the number of elements in items
, M
is the runtime complexity of the similarityFunction
, and α(N)
is the inverse Ackermann function (amortized constant time for all practical purposes).
Space complexity is O(N)
.
npm i group-similar
Group similar strings
import { groupSimilar } from "group-similar";
import { distance } from "fastest-levenshtein";
function levenshteinSimilarityFunction(a: string, b: string): number {
return a.length === 0 && b.length === 0
? 1
: 1 - distance(a, b) / Math.max(a.length, b.length);
}
groupSimilar({
items: ["cat", "bat", "kitten", "dog", "sitting"],
mapper: (i) => i,
similarityFunction: levenshteinSimilarityFunction,
similarityThreshold: 0.5,
});
// [ [ 'cat', 'bat' ], [ 'kitten', 'sitting' ], [ 'dog' ] ]
Group similar numbers
import { groupSimilar } from "group-similar";
function evenOddSimilarityFunction(a: number, b: number): number {
return Number(a % 2 === b % 2);
}
groupSimilar({
items: [1, 5, 10, 0, 2, 123],
mapper: (i) => i,
similarityFunction: evenOddSimilarityFunction,
similarityThreshold: 1,
});
// [ [ 1, 5, 123 ], [ 10, 0, 2 ] ]
Group similar objects
import { groupSimilar } from "group-similar";
import { distance } from "fastest-levenshtein";
function nestedMapper(object: { a: { b: { value: string } } }): string {
return object.a.b.value;
}
function levenshteinSimilarityFunction(a: string, b: string): number {
return a.length === 0 && b.length === 0
? 1
: 1 - distance(a, b) / Math.max(a.length, b.length);
}
groupSimilar({
items: [
{ a: { b: { value: "sitting" } } },
{ a: { b: { value: "dog" } } },
{ a: { b: { value: "kitten" } } },
{ a: { b: { value: "bat" } } },
{ a: { b: { value: "cat" } } },
],
mapper: nestedMapper,
similarityFunction: levenshteinSimilarityFunction,
similarityThreshold: 0.5,
});
// [
// [{ a: { b: { value: "sitting" } } }, { a: { b: { value: "kitten" } } }],
// [{ a: { b: { value: "dog" } } }],
// [{ a: { b: { value: "bat" } } }, { a: { b: { value: "cat" } } }],
// ]
groupSimilar(options);
groupSimilar({ items, mapper, similarityFunction, similarityThreshold });
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
options | Object |
Yes | none | Arguments to pass to the function |
Property | Type | Required | Default | Description |
---|---|---|---|---|
items | T[] |
Yes | none | Array of items to group |
mapper | (t: T) => K |
Yes | none | Function to apply to each element in items prior to measuring similarity |
similarityFunction | (a: K, b: K) => number |
Yes | none | Function to measure similarity between mapped items |
similarityThreshold | number |
Yes | none | Threshold at which items whose similarity value is greater than or equal to it are grouped together |
The return value is a new nested array of type T[][]
containing elements of items
grouped by similarity. If there are no elements in items
, an empty array will be returned.
Benchmark test results where N
is the number of items being grouped, higher ops/sec
is better.
Library | N=16 | N=32 | N=64 | N=128 | N=256 | N=512 | N=1024 | N=2048 |
---|---|---|---|---|---|---|---|---|
group-similar | 86867 | 17538 | 6067 | 1594 | 444 | 171 | 75 | 27 |
set-clustering | 28506 | 6258 | 1831 | 455 | 121 | 30 | 6 | 1 |
Benchmark configuration details can be found here.