Update the README to include a section on the phonetic algorithms
MrPowers committed Sep 12, 2017
1 parent 542457e commit 9dfae11
Showing 1 changed file with 47 additions and 2 deletions.
49 changes: 47 additions & 2 deletions README.md
@@ -4,7 +4,7 @@ Making similarity functions and phonetic algorithms readily available for fuzzy

## Project Setup

Update your `build.sbt` files to import the libraries.
Update your `build.sbt` file to import the libraries.

```
libraryDependencies += "org.apache.commons" % "commons-text" % "1.1"
@@ -19,7 +19,7 @@ libraryDependencies += "mrpowers" % "spark-stringmetric" % "2.2.0_0.1.0"
* `jaccard_similarity`
* `jaro_winkler`

Import the functions.
How to import the functions.

```scala
import com.github.mrpowers.spark.stringmetric.SimilarityFunctions._
@@ -63,3 +63,48 @@ We can run `actualDF.show()` to view the `w1_w2_jaccard` column that's been appe
```

## PhoneticAlgorithms

* `double_metaphone`
* `nysiis`
* `refined_soundex`

How to import the functions.

```scala
import com.github.mrpowers.spark.stringmetric.PhoneticAlgorithms._
```

Here's an example of how to use the `refined_soundex` function.

Suppose we have the following `sourceDF`:

```
+-----+
|word1|
+-----+
|night|
|  cat|
| null|
+-----+
```
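
For readers who want to reproduce the example, here is one way the `sourceDF` above could be built. This is only a sketch: the local `SparkSession` settings and the `SourceDFSketch` object name are assumptions made for illustration and are not part of the library.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SourceDFSketch {
  // Local session for experimentation only; the master URL and app name
  // are assumptions, not something the README prescribes.
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("phonetic-algorithms-example")
    .getOrCreate()

  import spark.implicits._

  // A null element in the Seq becomes a null value in the string column.
  val sourceDF: DataFrame = Seq("night", "cat", null).toDF("word1")

  def main(args: Array[String]): Unit = {
    sourceDF.show()
    spark.stop()
  }
}
```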

Let's run the `refined_soundex` function.

```scala
val actualDF = sourceDF.withColumn(
  "word1_refined_soundex",
  refined_soundex(col("word1"))
)
```

We can run `actualDF.show()` to view the `word1_refined_soundex` column that's been appended to the DataFrame.

```
+-----+---------------------+
|word1|word1_refined_soundex|
+-----+---------------------+
|night|               N80406|
|  cat|                 C306|
| null|                 null|
+-----+---------------------+
```
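
To see where codes such as `N80406` come from, the following is a simplified, plain-Scala sketch of the Refined Soundex encoding. It is not the library's implementation. The `RefinedSoundexSketch` object and its `encode` method are hypothetical names, and the mapping string is the US English table used by Apache Commons Codec's `RefinedSoundex`. The sketch keeps the first letter, maps every letter (including the first) to a digit, and skips a digit when it repeats the one immediately before it.

```scala
import java.util.Locale

object RefinedSoundexSketch {
  // One digit per letter A-Z (US English Refined Soundex table).
  private val Mapping = "01360240043788015936020505"

  // Hypothetical helper for illustration; returns null for null input,
  // mirroring the null row in the DataFrame example.
  def encode(word: String): String =
    if (word == null) null
    else {
      // Simplification: keep only ASCII letters, upper-cased.
      val letters = word.toUpperCase(Locale.ENGLISH).filter(c => c >= 'A' && c <= 'Z')
      if (letters.isEmpty) ""
      else {
        // Map each letter to its digit.
        val codes = letters.map(c => Mapping(c - 'A'))
        // Drop a digit when it equals the digit directly before it.
        val collapsed = codes.zipWithIndex.collect {
          case (code, i) if i == 0 || codes(i - 1) != code => code
        }
        letters.head.toString + collapsed.mkString
      }
    }

  def main(args: Array[String]): Unit = {
    println(encode("night")) // N80406
    println(encode("cat"))   // C306
  }
}
```

For "night", the letters N, I, G, H, T map to 8, 0, 4, 0, 6, which gives `N80406` once the leading letter is prepended, matching the table above.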

2 comments on commit 9dfae11

@pabinvaz

Hi,
I am new to PySpark. How can I use this function in PySpark?

@MrPowers
Owner Author

@pabinvaz - No, we'd unfortunately have to create another library to get these functions to work with PySpark... That's not a bad idea for a library to create. Maybe I'll do that as my next side project 🤓
