-
Technically, what you want is to treat the hash as bytes, not as Latin-1. Latin-1 is only relevant in that each Latin-1 character encodes to the single byte whose value equals its Unicode code point. Since all you're doing is constructing a URL, you can build it directly rather than going through the "form" generator. Try something like this:
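The snippet that originally followed is not reproduced here. As a rough sketch of the same idea in core Perl (not the original code), you can escape the 20 raw digest bytes yourself and append them to the URL string, so no character encoding step ever touches them. The tracker address `http://tracker.example:6969/scrape` and the helper name `escape_bytes` are placeholders, and the hex digest is simply decoded from the expected escaped string quoted in the question.

```perl
use strict;
use warnings;

# Hex form of the infohash, decoded from the expected escaped string in the question.
my $bytes = pack 'H*', '1bd088ee9166a062cf4af09cf99720fa6e1a3133';

# Percent-escape every byte outside the RFC 3986 unreserved set.
# No character encoding is involved: each byte becomes %XX as it is.
sub escape_bytes {
    my ($str) = @_;
    $str =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $str;
}

# Placeholder tracker address (an assumption, not the real one).
my $url = 'http://tracker.example:6969/scrape?info_hash=' . escape_bytes($bytes);
print "$url\n";
```

The space byte comes out as `%20` here rather than the `+` seen in the question; that is the only difference from the expected string. Passing the finished string as the request URL, instead of passing the hash as form parameters, should sidestep the query merge step, though that is worth confirming against the Mojolicious version in use.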
-
I'm at the very beginning of rewriting the first module I ever pushed to CPAN, to try and fall in love with Perl again, and of course I'm bumping my head on something.
To scrape a BitTorrent tracker, you must include a SHA-1 hash as a query parameter. Trackers expect a properly escaped, ISO-8859-1/Latin-1 encoded string. Of course, Mojo::UserAgent, by default, encodes query params to UTF-8, which causes my escaped infohash to look like `%1B%C3%90%C2%88%C3%AE%C2%91f%C2%A0b%C3%8FJ%C3%B0%C2%9C%C3%B9%C2%97+%C3%BAn%1A13` rather than the expected `%1B%D0%88%EE%91f%A0b%CFJ%F0%9C%F9%97+%FAn%1A13`.

This should be overridable but, for `HEAD` and `GET`, `Mojo::UserAgent::Transactor::_form(...)` silently ignores the user-defined `charset` when merging query parameters. I hesitate to report this as an issue, so I'm posting it here; I'm sure there's a solid reason it functions this way (I'm not an HTTP standards expert), but I can't find a git blame or any other discussion around it, and no other HTTP client is making this choice by default. Am I missing something? Is setting the charset manually with `$tx->req->url->query->charset(undef);` my only/best option?

Code Example

Here's a minimal example if such a thing is needed. This scrapes the Debian project's tracker for debian-12.7.0-amd64-netinst.iso.torrent, so it shouldn't set off alarms at most rational ISPs, but you could always just comment out the `start(...)` call.
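Independent of the scraper script, the two escaped strings quoted above can be reproduced offline, which helps confirm where the extra bytes come from. This is a hedged sketch in core Perl, not the original example: it assumes the hex digest decoded from the expected string, treats each digest byte as a character, UTF-8 encodes it, and escapes the result form-style (space as `+`).

```perl
use strict;
use warnings;

my $digest = pack 'H*', '1bd088ee9166a062cf4af09cf99720fa6e1a3133';    # 20 raw bytes

# Form-style escaping of a byte string: unreserved bytes stay, space becomes "+".
my $escape = sub {
    my ($str) = @_;
    $str =~ s/([^A-Za-z0-9\-._~ ])/sprintf '%%%02X', ord $1/ge;
    $str =~ tr/ /+/;
    return $str;
};

# What the tracker expects: the bytes escaped exactly as they are.
my $expected = $escape->($digest);

# What was observed: every byte above 0x7F is read as a character and
# UTF-8 encoded first, turning one byte into two (0xD0 becomes 0xC3 0x90).
my $as_utf8 = $digest;
utf8::encode($as_utf8);
my $observed = $escape->($as_utf8);

print "expected: $expected\n";
print "observed: $observed\n";
printf "lengths:  %d vs %d bytes\n", length $digest, length $as_utf8;
```

The `observed` line matches the string Mojo::UserAgent produced, which is consistent with the query being encoded as UTF-8 before it is escaped.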