Outline

Reuters Data setup

Scala to the rescue

$ scala -cp "lucene*.jar"
scala> import org.apache.lucene.benchmark.utils.ExtractReuters
import org.apache.lucene.benchmark.utils.ExtractReuters

scala> val reutersIn = "/home/saleem/projects/mine/ruby-ml/rubyml/clustering/reuters/reuters-sgm"
scala> val reutersOut = "/home/saleem/projects/mine/ruby-ml/rubyml/clustering/reuters/reuters-out"
scala> ExtractReuters.main(Array(reutersIn, reutersOut))
Deleting all files in /home/saleem/projects/mine/ruby-ml/rubyml/clustering/reuters/reuters-out-tmp

Recommendation via clustering

"You might also like": How would we go about that? To clarify the idea, let's look at a naive solution:

Split the current post title into its individual words
Get all other posts
Sort all other posts by those with the most words in their body in common with our title

Processing images

i = Magick::Image.read(filename).first
  out_filename = File.join(cropped_dir, File.split(filename).last)
 i.quantize(2, Magick::GRAYColorspace) \
.contrast(sharpen=true) \
.negate(grayscale=true) \
.enhance \
.adaptive_blur(radius=0.0, sigma=1.0) \
.despeckle \
.resize_to_fill(28, 28) \
.write(out_filename)

Classification

Clustering

Recommendation

Ruby Toolbox: Recommendation Engines

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

notes.md

notes.md

Outline

Reuters Data setup

Classification

Clustering

Recommendation

Files

notes.md

Latest commit

History

notes.md

File metadata and controls

Outline

Reuters Data setup

Classification

Clustering

Recommendation