Parallel document identification

A simple hapax-based method to identify parallel documents
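
As a rough illustration only (this is not the code from this repository), a hapax-based matcher could look like the sketch below. It assumes that a hapax is a word occurring exactly once in a document, and that two documents are likely translations of each other when they share many hapaxes, such as proper nouns that survive translation. The names `HapaxSketch`, `tokens`, `hapaxes`, `sharedHapaxes` and `bestMatches` are hypothetical, and the tokenization is deliberately naive.

```scala
object HapaxSketch {
  // Lowercased word tokens. Splitting on every non-letter character is a
  // simplifying assumption; it also discards numbers.
  def tokens(text: String): Seq[String] =
    text.toLowerCase.split("[^\\p{L}]+").filter(_.nonEmpty).toSeq

  // Hapaxes: the words that occur exactly once in the document.
  def hapaxes(text: String): Set[String] =
    tokens(text)
      .groupBy(identity)
      .collect { case (word, occurrences) if occurrences.size == 1 => word }
      .toSet

  // Score a document pair by the number of hapaxes they have in common.
  def sharedHapaxes(a: String, b: String): Int =
    (hapaxes(a) intersect hapaxes(b)).size

  // For each source document id, pick the target document id with the most
  // shared hapaxes. Assumes `tgt` is non-empty; ties are broken arbitrarily.
  def bestMatches(src: Map[String, String],
                  tgt: Map[String, String]): Map[String, String] = {
    val tgtHapaxes = tgt.map { case (id, text) => id -> hapaxes(text) }
    src.map { case (srcId, text) =>
      val h = hapaxes(text)
      srcId -> tgtHapaxes.maxBy { case (_, th) => (h intersect th).size }._1
    }
  }
}
```

Calling `HapaxSketch.bestMatches(englishDocs, frenchDocs)` with two maps from document id to text returns, for each English id, the French id that scores highest. Comparing every pair this way is quadratic in the number of documents, so a real corpus would likely need an index from hapax to documents instead.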

Report

The report (in French) can be found here.

The data used consisted of Wikipedia articles in French and English.
I was not allowed to publish it here (obviously), and it was too big anyway.
