hask-sim-str

Finds the two "most similar" strings in a list of strings.

Reads from standard input, expecting one string per line, e.g.:

  Foo bar
  Foos bars
  Dogs
  Cats

For that input, it outputs the two most similar strings found, one per line:

  Foo bar
  Foos bars

Usage

Build using:

  stack build

Then run using:

  stack exec hask-sim-str-exe

Test using:

  stack test

Logic

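This cycles through each pair of distinct strings. For n strings there are n(n − 1)/2 such pairs (the (n − 1)th triangular number), so O(n²) pairs. For each pair it calculates the Levenshtein distance, which is O(n₁n₂), where n₁ and n₂ are the lengths of the two strings being compared.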
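The approach above (read one string per line, compare every distinct pair, keep the pair with the smallest Levenshtein distance) can be sketched as a single self-contained Haskell program. This is a minimal illustration, not the repository's actual code: the names `levenshtein`, `pairs` and `mostSimilar`, and the behaviour when fewer than two lines are given, are assumptions of this sketch.

```haskell
import Data.List (minimumBy, tails)
import Data.Ord (comparing)
import System.IO (hPutStrLn, stderr)

-- Levenshtein distance using the standard dynamic-programming recurrence.
-- Only one row of the (length a + 1) x (length b + 1) table is kept at a
-- time, so it runs in O(length a * length b) time.
levenshtein :: String -> String -> Int
levenshtein a b = last (foldl step [0 .. length a] b)
  where
    -- 'prev' holds the distances from each prefix of a to the part of b
    -- processed so far; it always has (length a + 1) entries, never empty.
    step prev@(p : ps) c = scanl cell (p + 1) (zip3 a prev ps)
      where
        -- left / up: insertion or deletion; diag: match or substitution.
        cell left (x, diag, up) =
          minimum [left + 1, up + 1, diag + (if x == c then 0 else 1)]

-- Every unordered pair of distinct list positions: n * (n - 1) / 2 pairs.
pairs :: [a] -> [(a, a)]
pairs xs = [(x, y) | (x : ys) <- tails xs, y <- ys]

-- The pair with the smallest distance (first such pair on ties),
-- or Nothing when there are fewer than two strings.
mostSimilar :: [String] -> Maybe (String, String)
mostSimilar xs = case pairs xs of
  [] -> Nothing
  ps -> Just (minimumBy (comparing (uncurry levenshtein)) ps)

main :: IO ()
main = do
  input <- getContents
  case mostSimilar (lines input) of
    -- Hypothetical error handling chosen for this sketch.
    Nothing -> hPutStrLn stderr "Expected at least two lines of input."
    Just (a, b) -> putStr (unlines [a, b])
```

Fed the example input from the top of this README, this sketch prints `Foo bar` and `Foos bars`, whose distance is 2 (two inserted `s` characters). Because `minimumBy` keeps the earlier of two equal candidates, ties are broken by input order.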

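TODO: for a "good enough" match, sort the collection by string length and split it into batches. The Levenshtein distance between two strings is at least the difference in their lengths, so in a large dataset the shortest string is unlikely to be the closest match for the longest one, and there is little point comparing them. Then take the best match from each batch and keep the closest of those. This would miss matches that straddle the boundary between two batches, but would cut the work from n(n − 1)/2 comparisons to roughly n(k − 1)/2 for batches of k strings.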
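The batching idea in the TODO might look something like the following. This is only a sketch of a feature the project lists as not yet done: the batch size `k`, the name `approxMostSimilar`, passing the distance function as a parameter, and returning the closest of the per-batch winners are all assumptions. A compact Levenshtein is repeated so the block compiles on its own.

```haskell
import Data.List (minimumBy, sortOn, tails)
import Data.Ord (comparing)
import System.IO (hPutStrLn, stderr)

-- Split a list into consecutive chunks of at most k elements (requires k > 0).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf k xs = let (h, t) = splitAt k xs in h : chunksOf k t

-- Every unordered pair of distinct list positions.
pairs :: [a] -> [(a, a)]
pairs xs = [(x, y) | (x : ys) <- tails xs, y <- ys]

-- Compact one-row Levenshtein distance, O(length a * length b).
levenshtein :: String -> String -> Int
levenshtein a b = last (foldl step [0 .. length a] b)
  where
    step prev@(p : ps) c = scanl cell (p + 1) (zip3 a prev ps)
      where
        cell left (x, diag, up) =
          minimum [left + 1, up + 1, diag + (if x == c then 0 else 1)]

-- Approximate most-similar pair: sort by length, compare only within
-- consecutive batches of k strings, then keep the closest batch winner.
-- k is clamped to at least 2 so that every full batch contains a pair.
approxMostSimilar
  :: Int -> (String -> String -> Int) -> [String] -> Maybe (String, String)
approxMostSimilar k dist xs
  | null winners = Nothing
  | otherwise = Just (minimumBy (comparing score) winners)
  where
    score = uncurry dist
    batches = chunksOf (max 2 k) (sortOn length xs)
    -- A batch holding a single string (e.g. a short final batch) yields no pair.
    winners =
      [ minimumBy (comparing score) ps
      | batch <- batches
      , let ps = pairs batch
      , not (null ps)
      ]

main :: IO ()
main = do
  input <- getContents
  -- 100 is a hypothetical batch size: larger is slower but misses fewer matches.
  case approxMostSimilar 100 levenshtein (lines input) of
    Nothing -> hPutStrLn stderr "Expected at least two strings in one batch."
    Just (a, b) -> putStr (unlines [a, b])
```

With `k = 2`, the input `aa`, `xx`, `yyy`, `xxx` is split into [`aa`, `xx`] and [`yyy`, `xxx`], so the sketch returns `aa` / `xx` (distance 2) and misses `xx` / `xxx` (distance 1), a pair that straddles the batch boundary. That is the trade-off the TODO accepts.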

eraf2135/hask-sim-str

Folders and files

Latest commit

History

Repository files navigation

hask-sim-str

Usage

Logic

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages