Parallel document identification

A simple hapax-based method to identify parallel documents.

Report

The report (in French) can be found here.

Where is the data?

The data used was Wikipedia articles in French and English. I was not allowed to publish it here (obviously), and it was too big anyway.
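
Sketch of the idea

For readers who cannot read the report, the hapax-based idea named in the title might look roughly like the sketch below. It is not the code from this project. It assumes that a hapax is a token occurring exactly once in a document, and that parallel documents tend to share such tokens across languages (proper names, dates, numbers). The function names `hapaxes` and `best_matches` and the overlap-count score are hypothetical choices made for illustration.

```python
import re
from collections import Counter


def hapaxes(text):
    """Return the set of tokens that occur exactly once in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    return {tok for tok, n in counts.items() if n == 1}


def best_matches(source_docs, target_docs):
    """Pair each source document with the target sharing the most hapaxes.

    Both arguments are dicts mapping a document id to its raw text.
    Returns a dict: source id -> (best target id, shared hapax count).
    Assumption: raw overlap size is used as the score; the report may
    use a different scoring or filtering scheme.
    """
    target_hapaxes = {tid: hapaxes(txt) for tid, txt in target_docs.items()}
    matches = {}
    for sid, txt in source_docs.items():
        src = hapaxes(txt)
        # Score every candidate by the size of the hapax overlap.
        scores = {tid: len(src & th) for tid, th in target_hapaxes.items()}
        best = max(scores, key=scores.get)
        matches[sid] = (best, scores[best])
    return matches


if __name__ == "__main__":
    french = {
        "fr1": "Marie Curie est née en 1867 à Varsovie. Marie a reçu le prix Nobel.",
        "fr2": "La tour Eiffel a été construite en 1889 à Paris.",
    }
    english = {
        "en1": "The Eiffel Tower was built in 1889 in Paris.",
        "en2": "Marie Curie was born in 1867 in Warsaw. Marie won the Nobel prize.",
    }
    print(best_matches(french, english))
```

Comparing every pair is quadratic in the number of documents, so on a corpus the size of Wikipedia an inverted index from hapax to document would probably be needed instead of the nested loop.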