Encoding issue #7

Open
simonkaspersen opened this issue Oct 5, 2016 · 3 comments

Comments

@simonkaspersen

My feeds are encoded in UTF-8, but when they are printed on the command line, letters such as Æ, Ø, and Å are replaced with escape sequences like \xc3\xb8 and \xc3\xa5.

Does this happen when you encode it again to UTF-8?

@simonkaspersen
Author

0/2 SOURCE>> http://www.nrk.no/nyheter/toppsaker.rss

[0] b'Eksperter om presset H\xc3\xb8gmo: \xe2\x80\x93 Tenker som man gj\xc3\xb8r i et klubblag'
[1] b'12 \xc3\xa5rs forvaring for overgrep mot eget spedbarn'
[2] b'R\xc3\xb8ysta mot eige parti \xe2\x80\x93 blir skvisa ut'
[3] b'Ny rapport om verdenshavene: WWF-Jensen dypt bekymret'
[4] b'St\xc3\xb8re til Erna: \xe2\x80\x93 Velkommen tilbake fra bokseringen'
[5] b'Fikk Stortinget til \xc3\xa5 le'
[6] b'Si nei til voksne som vil ta en Skamtale'
[7] b'Ruud fekk bank'
[8] b'Turg\xc3\xa5ar fall om \xe2\x80\x93 mangla mobildekning'
[9] b'S\xc3\xa5 vidt over 2.500 asyls\xc3\xb8kere hittil i \xc3\xa5r'
[10] b'Fikk Nobelprisen i kjemi for verdens minste maskiner'
[11] b'\xe2\x80\x93 Feriene blir billigere'
[12] b'D\xc3\xb8mt til fengsel for kakekasting'
[13] b'Gikk 635 h\xc3\xb8ydemeter for det perfekte bildet'
[14] b'\xe2\x80\x93 Jeg forst\xc3\xa5r ikke kunst'
[15] b'Lover millionst\xc3\xb8tte til s\xc3\xb8rsamisk senter'
[16] b'Cuba svekket orkanen Matthew'
[17] b'Klimaekspert: \xe2\x80\x93 Det henger ikke p\xc3\xa5 greip'
[18] b'Helleland p\xc3\xa5 utenlandsreise:\xc2\xa0 \xe2\x80\x93 Borte den viktigste dagen i \xc3\xa5ret'
[19] b'Gjekk fr\xc3\xa5 ein million'
[20] b'\xe2\x80\x93 Det er en sterk september'
[21] b'NRKs partibarometer for oktober: Frp faller stygt \xe2\x80\x93 skylder p\xc3\xa5 Ap'
[22] b'Her har skuleborn blitt sjuke i mange \xc3\xa5r'
[23] b'M\xc3\xa5tte sjekke Pok\xc3\xa9mon'
[24] b'Har levd lenge med ADHD uten \xc3\xa5 vite det'
[25] b'\xc2\xabNorge er dopet p\xc3\xa5 olje\xc2\xbb'
[26] b'Minst 11 d\xc3\xb8de'
[27] b'Hele \xc2\xabHakkebakkeskogen\xc2\xbb dukket opp'
[28] b'Visepresident-duellen: Dette l\xc3\xb8y de om'
[29] b'Hegerberg vil ha Nordlie'
[30] b'Han tjener mest av fylkesordf\xc3\xb8rerne'
[31] b'Vil gi 11 mill. mer for \xc3\xa5 lokke filmbransjen'
[32] b'Vil gi smilefjes til norske sykehus'
[33] b'Visepresident-duellen: Beskyldte Trump for manglende patriotisme'
[34] b'\xe2\x80\x93 Trump har betalt millioner i skatt'
[35] b'\xe2\x80\x93 Pence vant p\xc3\xa5 stil, Kaine p\xc3\xa5 innhold'
[36] b'Tvitret heftig under debatten'
[37] b'\xe2\x80\x93 Det var d\xc3\xa5 det skjedde noko med namnet mitt'
[38] b'Har du peiling p\xc3\xa5 stadnamn?'
[39] b'Kristen skole fjerner sider\xc2\xa0om pubertet'
[40] b'\xe2\x80\x93 Intervjuet\xc2\xa0blir bare absurd'
[41] b'I utlandet n\xc3\xa5r budsjettet blir lagt fram'
[42] b'Idrettsstyrets mindretall vil legge kortene p\xc3\xa5 bordet: \xe2\x80\x93 Skaper mistanke om mislighold'
[43] b'\xe2\x80\x93 Er eg skyldig i klimaendringane?'
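Note on the output above: the b'...' prefix is how Python 3 displays a bytes object, so the titles appear to reach print() as UTF-8 encoded bytes rather than as text. A minimal sketch, assuming Python 3 and using one headline copied from the list (the variable names are only illustrative):

```python
# One headline copied verbatim from the output above.
raw = b'Eksperter om presset H\xc3\xb8gmo: \xe2\x80\x93 Tenker som man gj\xc3\xb8r i et klubblag'

# Python 3 shows a bytes object as its repr, with every non-ASCII byte escaped.
shown = repr(raw)
print(shown)

# Decoding the same bytes as UTF-8 gives an ordinary str with the real letters.
title = raw.decode('utf-8')
print(len(raw), len(title))  # the byte count is larger than the character count
```

Printing `title` instead of `raw` should show the readable headline, provided the terminal itself can display UTF-8.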

@iamaziz
Owner

iamaziz commented Oct 6, 2016

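Hmm, not sure why this happens. Initially each feed is encoded as utf-8 (https://github.com/iamaziz/TermFeed/blob/master/termfeed/feed.py#L79). I tried that link (on macOS and Linux) with no issues. What do you mean by encoding it again?

[Screenshot: https://cloud.githubusercontent.com/assets/3298308/19165000/1ca31c6c-8bd0-11e6-959f-c2d9d6087c69.png]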
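One plausible reading, which this thread does not confirm, is a Python 2 versus Python 3 difference: in Python 3, calling .encode('utf-8') on a title returns bytes, and printing bytes produces the escaped b'...' form reported above. The sketch below assumes Python 3; `to_text` is a hypothetical helper and not part of TermFeed.

```python
def to_text(value, encoding='utf-8'):
    """Return a str whether the input is bytes or already a str.

    Hypothetical helper for illustration; not taken from TermFeed.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Encoding a title yields bytes in Python 3 ...
encoded = 'Fikk Stortinget til å le'.encode('utf-8')
print(repr(encoded))

# ... so it has to be turned back into text before it is shown to the user.
text = to_text(encoded)
print(type(encoded).__name__, '->', type(text).__name__)
```

Passing titles through a step like this before printing would give the same result whether they arrive as bytes or as str.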

@simonkaspersen
Author

Weird. I googled a little to check if it was some encoding issue with Python or with feedparser, but all I found was that if the RSS was already encoded as UTF-8, a .decode('utf8') would decode it to ISO-something before decoding it to UTF-8 again. I don't know :P

I will try some more, because I like the concept :D
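For reference, in Python 3 bytes.decode('utf-8') converts UTF-8 bytes straight to text, with no intermediate ISO step. Garbled letters of the "Ã¸" kind appear when UTF-8 bytes are decoded with the wrong codec, such as ISO-8859-1 (Latin-1). A small sketch, assuming Python 3 and using a word taken from the feed output:

```python
# "Røysta" as UTF-8 bytes, as it appears in the feed output.
raw = b'R\xc3\xb8ysta'

# Correct codec: the two bytes 0xC3 0xB8 become the single letter "ø".
good = raw.decode('utf-8')

# Wrong codec: Latin-1 maps each byte to its own character, giving two
# unrelated characters in place of "ø".
garbled = raw.decode('latin-1')

# Text garbled this way can be repaired by reversing the wrong step.
repaired = garbled.encode('latin-1').decode('utf-8')

print(len(good), len(garbled), repaired == good)
```

Neither case matches the b'...' output in this issue, which shows bytes that were never decoded at all.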

