Around 50 French files have broken encodings #12

maxencealluin · 2019-09-16T08:52:37Z

Hello,

For those of you who intend to use the French documents in this corpus: of the 2,647 French books included, 49 have broken encoding and all their accented letters are missing.
A quick way to find the culprits is to search each book for the letter 'é'.
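A minimal sketch of that check, assuming the corpus is a directory of UTF-8 `.txt` files (the directory layout, the `*.txt` pattern, and the names `looks_broken` and `find_broken_files` are hypothetical, not part of the corpus). It flags every file whose bytes never contain the UTF-8 encoding of 'é'. This is only a heuristic: a very short but healthy text could also lack 'é'.

```python
from pathlib import Path

# UTF-8 encoding of "é" (U+00E9); a healthy French book almost always contains it.
E_ACUTE = "é".encode("utf-8")  # b"\xc3\xa9"


def looks_broken(data: bytes) -> bool:
    """Heuristic from the thread: a French book with no 'é' is probably mis-encoded."""
    return E_ACUTE not in data


def find_broken_files(corpus_dir: str, pattern: str = "*.txt") -> list[Path]:
    """Return the files under corpus_dir (hypothetical layout) that contain no 'é'."""
    return sorted(
        path
        for path in Path(corpus_dir).rglob(pattern)
        if looks_broken(path.read_bytes())
    )
```

Usage: `for p in find_broken_files("path/to/french_books"): print(p)`, where the path is a placeholder for wherever the French books live.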

DBarthe · 2020-02-02T22:20:19Z

Yes, this is annoying. It seems possible to fix it by replacing the following byte sequences (in hex):

c282 => é
c28a => è
c288 => ê
c285 => à
c287 => ç
c293 => ô
c28c => î
c296 => û
...
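A sketch of the replacement above, operating on the raw bytes of a file. `fix_with_table` applies exactly the pairs listed in the comment (which is explicitly incomplete, hence the "..."). Every pair also matches one pattern: the byte after `c2` is that letter's code in the DOS code pages CP437/CP850 (for example 0x82 is 'é'), which suggests the text was once decoded as Latin-1 and saved as UTF-8. That diagnosis is an assumption inferred from the table, not something confirmed in this thread. `fix_c1_as_cp850` (a hypothetical name) relies on it and re-reads every C1 control character U+0080..U+009F as a CP850 byte, which also restores letters missing from the list such as 'â' and 'ù'. It leaves characters at U+00A0 and above alone, so uppercase accented letters that CP850 stores in that range are not recovered.

```python
# Pairs quoted in the comment: UTF-8 bytes found in the broken files -> intended letter.
# The original list ends with "...", so this table is known to be incomplete.
REPLACEMENTS = {
    b"\xc2\x82": "é",
    b"\xc2\x8a": "è",
    b"\xc2\x88": "ê",
    b"\xc2\x85": "à",
    b"\xc2\x87": "ç",
    b"\xc2\x93": "ô",
    b"\xc2\x8c": "î",
    b"\xc2\x96": "û",
}


def fix_with_table(data: bytes) -> bytes:
    """Apply the byte-level replacements from the thread; returns UTF-8 bytes."""
    for broken, letter in REPLACEMENTS.items():
        data = data.replace(broken, letter.encode("utf-8"))
    return data


def fix_c1_as_cp850(text: str) -> str:
    """Assumption: each C1 control char (U+0080..U+009F) is really a CP850 byte.

    Reproduces every pair in REPLACEMENTS; characters outside that range,
    including correctly encoded accents, are left unchanged.
    """
    return "".join(
        bytes([ord(ch)]).decode("cp850") if "\x80" <= ch <= "\x9f" else ch
        for ch in text
    )
```

The table version is the conservative choice because it only touches the sequences someone has verified; the code-page version is shorter and more complete, but is only as good as the CP437/CP850 guess.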
