
Lower performance in alignment compared to another preprocessing script. #5

Open
haorannlp opened this issue Jun 28, 2021 · 0 comments

Comments

@haorannlp

Hi Sanxing, thank you for sharing this script!

I ran your preprocess.py (only the empty-line cleaning; I did not run the whole prepare.sh) and then used fast_align to learn an alignment model on the parallel corpus.
I found that the perplexity of the alignments produced by this model is higher than what I get when the parallel corpus is preprocessed by another script, wmt.py.
I guess this is because wmt.py merges the blank lines.
Could you possibly add this blank-line merging step to your script in the future? Thanks a lot!
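For reference, here is a minimal sketch (not from either repository) of one way the blank lines could be handled before building fast_align's input. It assumes the standard `source ||| target` one-pair-per-line format that fast_align reads, and it simply drops any pair in which either side is empty; the file names and the drop-versus-merge choice are assumptions, since the exact behavior of wmt.py is not shown here.

```python
# Sketch only: pair up source/target lines and skip pairs where either side is
# blank, so fast_align never receives an empty sentence. File names are placeholders.

def write_fast_align_input(src_path, tgt_path, out_path):
    with open(src_path, encoding="utf-8") as fs, \
         open(tgt_path, encoding="utf-8") as ft, \
         open(out_path, "w", encoding="utf-8") as fo:
        for src, tgt in zip(fs, ft):
            src, tgt = src.strip(), tgt.strip()
            if not src or not tgt:
                continue  # drop the pair entirely if either side is empty
            fo.write(f"{src} ||| {tgt}\n")

if __name__ == "__main__":
    write_fast_align_input("train.de", "train.en", "train.de-en.fast_align")
```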
