Incorrect regex used to match Content-Type #54

slicingmelon · 2024-10-29T20:26:34Z

Hello,

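I've noticed that the regex currently used to extract the Content-Type does not cover all content types: \w+/\w+ stops at characters such as - and ., and \s+ requires at least one whitespace character after the colon.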

REGEX_CONTENT_TYPE = re.compile(r"Content-Type:\s+(\w+/\w+)", re.IGNORECASE)

This can be reproduced on regex101.com as well.

This leads to empty (or truncated) Content-Type values in the output.
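
To illustrate, here is a minimal sketch that runs the current pattern against a few header lines. The sample headers and the extract helper are hypothetical and only serve the demonstration; the pattern itself is the one quoted above.

```python
import re

# Pattern quoted in this issue (the regex currently in use).
REGEX_CONTENT_TYPE = re.compile(r"Content-Type:\s+(\w+/\w+)", re.IGNORECASE)

# Hypothetical header lines, chosen for illustration only.
SAMPLES = [
    "Content-Type: text/html; charset=utf-8",
    "Content-Type: application/x-www-form-urlencoded",
    "Content-Type:application/json",
]


def extract(header_line):
    """Return the captured content type, or an empty string if nothing matches."""
    match = REGEX_CONTENT_TYPE.search(header_line)
    return match.group(1) if match else ""


for line in SAMPLES:
    print(repr(line), "->", repr(extract(line)))
```

With these samples, the first header yields text/html, the second is cut off at application/x because \w does not match the hyphen, and the third yields nothing because there is no whitespace after the colon.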

Also, curl sometimes returns header names in a different case (for example, all lowercase), so to fix all of this, I suggest the following regex:

(?i)content-type:\s*([-\w.]+/[-\w.]+(?:\s*;\s*[\w-]+=(?:\"[^\"]*\"|[^\s;]*))*)

So, the code would be:

REGEX_CONTENT_TYPE = re.compile(r'(?i)content-type:\s*([-\w.]+/[-\w.]+(?:\s*;\s*[\w-]+=(?:\"[^\"]*\"|[^\s;]*))*)')

Tested on regex101 as well.
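
As a quick check outside regex101, the proposed pattern can be exercised with a short script. This is only a sketch: the extract_content_type helper and the sample header lines are assumptions made for illustration, not part of the project.

```python
import re

# Proposed pattern from this issue, copied as-is.
REGEX_CONTENT_TYPE = re.compile(
    r'(?i)content-type:\s*([-\w.]+/[-\w.]+(?:\s*;\s*[\w-]+=(?:\"[^\"]*\"|[^\s;]*))*)'
)


def extract_content_type(header_line):
    """Hypothetical helper: return the captured value, or "" when there is no match."""
    match = REGEX_CONTENT_TYPE.search(header_line)
    return match.group(1) if match else ""


# Hypothetical header lines, chosen for illustration only.
SAMPLES = [
    "content-type: text/html; charset=utf-8",
    "Content-Type:application/x-www-form-urlencoded",
    "Content-Type: application/ld+json",
]

for line in SAMPLES:
    print(repr(line), "->", repr(extract_content_type(line)))
```

The first two samples are captured in full, including the charset parameter. The third comes back as application/ld, since the character class [-\w.] does not include +; adding + to the class might be worth considering if structured suffixes such as +json need to be captured.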

laluka · 2024-10-29T23:01:33Z

Nice catch, I'll have a look with @jtof-fap when we find some free time! 🌹
