Unexpected Error due to Duplicate Content-Type Headers in cwltool #747

Open
suecharo opened this issue Sep 29, 2023 · 7 comments

@suecharo
Contributor

suecharo commented Sep 29, 2023

When I run the following command with the latest version of cwltool:

$ cwltool https://sandbox.zenodo.org/record/1016630/files/trimming_and_qc.cwl --help

I encounter the error below:

While fetching https://sandbox.zenodo.org/record/1016630/files/trimming_and_qc.cwl, got content-type of 'application/octet-stream, application/octet-stream'. Expected one of ['text/plain', 'application/json', 'text/vnd.yaml', 'text/yaml', 'text/x-yaml', 'application/x-yaml', 'application/octet-stream'].

I thought this might be related to the fix I provided in the past at:

common-workflow-language/cwltool#1622

However, upon closer inspection, I noticed that the content-type is duplicated: application/octet-stream, application/octet-stream.

I fetched the actual headers using curl, and observed:

$ curl -D - https://sandbox.zenodo.org/record/1016630/files/trimming_and_qc.cwl
...
Content-Type: application/octet-stream
...
Content-Type: application/octet-stream
...

It seems there are two Content-Type lines.
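A minimal sketch of how two identical header lines can end up as the single string in the error message. HTTP (RFC 9110) allows a recipient to combine repeated field lines with the same name into one comma-separated value, and that is presumably what happens on the client side here. The raw header bytes below are a hand-written stand-in for the Zenodo response, not a capture of it; only the Python standard library is used.

```python
import http.client
import io

# Hand-written stand-in for a response whose headers repeat Content-Type.
raw_headers = (
    b"Server: nginx\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Length: 1151\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n"
)

# parse_headers keeps every field line, so repeated names are preserved.
message = http.client.parse_headers(io.BytesIO(raw_headers))
values = message.get_all("Content-Type")

# Joining the repeated values with ", " yields the string from the error.
combined = ", ".join(values)
print(values)
print(combined)
```

A check that compares `combined` against a list of single media types with plain string equality will fail, even though each individual value is in the list.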

I suspect the code around:

resp = self.session.get(url, headers=headers)

might be related to this issue, but I'm not sure about the exact solution. Could you please look into this?

@mr-c
Member

mr-c commented Sep 29, 2023

Thanks for the report!

It seems that repeating HTTP header fields is valid.

I would rename content_type to received_content_types and also add a .split(",") to make it a list.

if content_types and "content-type" in resp.headers:

Then we can check whether the two sets/lists have no intersection (content_types.isdisjoint(received_content_types)) and, if so, throw the error as before.
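The suggestion above could look roughly like the sketch below. It is not the code from schema_salad: the function names `parse_content_types` and `content_type_is_acceptable` are hypothetical, and stripping parameters such as `; charset=utf-8` is an added assumption that the thread does not discuss.

```python
from typing import Iterable, List


def parse_content_types(header_value: str) -> List[str]:
    """Split a possibly comma-joined Content-Type value into bare media types."""
    media_types = []
    for part in header_value.split(","):
        # Assumption: drop parameters such as "; charset=utf-8" and lowercase.
        media_type = part.split(";")[0].strip().lower()
        if media_type:
            media_types.append(media_type)
    return media_types


def content_type_is_acceptable(header_value: str, expected: Iterable[str]) -> bool:
    """Return True if at least one received media type is in the expected ones."""
    received_content_types = set(parse_content_types(header_value))
    # isdisjoint is True when the two collections share no element.
    return not received_content_types.isdisjoint(expected)


expected_types = ["text/plain", "application/json", "application/octet-stream"]
duplicated = "application/octet-stream, application/octet-stream"
print(content_type_is_acceptable(duplicated, expected_types))
print(content_type_is_acceptable("text/html", expected_types))
```

Converting the received values to a set first means the duplicated value is checked only once, and `isdisjoint` accepts the expected types as a plain list.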

@suecharo
Contributor Author

suecharo commented Oct 3, 2023

Thank you, Mr. @mr-c.
Should I create the PR?
(Tazro seems to want this fix done sooner rather than later.)

@mr-c
Member

mr-c commented Oct 3, 2023

@suecharo yes, that would be great. Thank you

@suecharo
Contributor Author

suecharo commented Oct 5, 2023

My Environment

  • OS: Ubuntu 22.04.3 LTS (Jammy Jellyfish)
  • Python: 3.10.12

Steps to Reproduce

Run the following command:

$ docker run -it --rm -v "$PWD":"$PWD" -w="$PWD" quay.io/commonwl/cwltool:3.1.20220628170238 https://zenodo.org/api/files/2422dda0-1bd9-4109-aa44-53d55fd934de/download-sra.cwl --help
INFO /usr/local/bin/cwltool 3.1
While fetching https://zenodo.org/api/files/2422dda0-1bd9-4109-aa44-53d55fd934de/download-sra.cwl, got content-type of 'application/octet-stream, application/octet-stream'. Expected one of ['text/plain', 'application/json', 'text/vnd.yaml', 'text/yaml', 'text/x-yaml', 'application/x-yaml', 'application/octet-stream']

Development and Testing

Setting up schema_salad in a virtual environment:

# === build and install ===
$ git clone --depth 1 https://github.com/suecharo/schema_salad && cd schema_salad
$ which python3
/usr/bin/python3
$ python3 -m venv .
$ source ./bin/activate
(schema_salad) $ which python3
/home/suecharo/git/github.com/suecharo/schema_salad/bin/python3
(schema_salad) $ readlink $(which python3)
/usr/bin/python3
(schema_salad) $ which pip
/home/suecharo/git/github.com/suecharo/schema_salad/bin/pip
(schema_salad) $ readlink $(which pip)

(schema_salad) $ pip install -e .
...
Successfully installed CacheControl-0.13.1 certifi-2023.7.22 charset-normalizer-3.3.0 filelock-3.12.4 idna-3.4 importlib-resources-6.1.0 isodate-0.6.1 mistune-2.0.5 msgpack-1.0.7 mypy-extensions-1.0.0 pyparsing-3.1.1 rdflib-7.0.0 requests-2.31.0 ruamel.yaml-0.17.33 ruamel.yaml.clib-0.2.7 schema-salad-0.1.dev1258+ge16612a six-1.16.0 urllib3-2.0.6

(schema_salad) $ pip list
Package             Version              Editable project location
------------------- -------------------- ---------------------------------------------------
CacheControl        0.13.1
certifi             2023.7.22
charset-normalizer  3.3.0
filelock            3.12.4
idna                3.4
importlib-resources 6.1.0
isodate             0.6.1
mistune             2.0.5
msgpack             1.0.7
mypy-extensions     1.0.0
pip                 22.0.2
pyparsing           3.1.1
rdflib              7.0.0
requests            2.31.0
ruamel.yaml         0.17.33
ruamel.yaml.clib    0.2.7
schema-salad        0.1.dev1258+ge16612a /home/suecharo/git/github.com/suecharo/schema_salad
setuptools          59.6.0
six                 1.16.0
urllib3             2.0.6

(schema_salad) $ ls ./bin/
activate      activate.fish  csv2rdf      normalizer  pip3     python   python3.10  rdfgraphisomorphism  rdfs2dot          schema-salad-tool
activate.csh  Activate.ps1   doesitcache  pip         pip3.10  python3  rdf2dot     rdfpipe              schema-salad-doc

Installing cwltool in the same virtual environment, using the editable schema_salad install:

# cwl-utils
(schema_salad) $ git clone --depth 1 https://github.com/common-workflow-language/cwl-utils.git
(schema_salad) $ cd cwl-utils
(schema_salad) $ vim ./requirements.txt
# Edit schema_salad version to the editable one.
(schema_salad) $ pip install -e .
...
Successfully installed cwl-upgrader-1.2.9 cwl-utils-0.29 packaging-23.2

# cwltool
(schema_salad) $ git clone --depth 1 https://github.com/common-workflow-language/cwltool.git
(schema_salad) $ cd cwltool
(schema_salad) $ vim ./setup.py
# Edit schema_salad and cwl-utils version to the editable one.
(schema_salad) $ pip install -e .
...
Successfully installed argcomplete-3.1.2 coloredlogs-15.0.1 cwltool-3.1 humanfriendly-10.0 lxml-4.9.3 networkx-3.1 prov-1.5.1 psutil-5.9.5 pydot-1.4.2 python-dateutil-2.8.2 shellescape-3.8.1

(schema_salad) $ pip list
Package             Version              Editable project location
------------------- -------------------- -------------------------------------------------------------
argcomplete         3.1.2
CacheControl        0.13.1
certifi             2023.7.22
charset-normalizer  3.3.0
coloredlogs         15.0.1
cwl-upgrader        1.2.9
cwl-utils           0.29                 /home/suecharo/git/github.com/suecharo/schema_salad/cwl-utils
cwltool             3.1                  /home/suecharo/git/github.com/suecharo/schema_salad/cwltool
filelock            3.12.4
humanfriendly       10.0
idna                3.4
importlib-resources 6.1.0
isodate             0.6.1
lxml                4.9.3
mistune             2.0.5
msgpack             1.0.7
mypy-extensions     1.0.0
networkx            3.1
packaging           23.2
pip                 22.0.2
prov                1.5.1
psutil              5.9.5
pydot               1.4.2
pyparsing           3.1.1
python-dateutil     2.8.2
rdflib              7.0.0
requests            2.31.0
ruamel.yaml         0.17.33
ruamel.yaml.clib    0.2.7
schema-salad        0.1.dev1258+ge16612a /home/suecharo/git/github.com/suecharo/schema_salad
setuptools          59.6.0
shellescape         3.8.1
six                 1.16.0
urllib3             2.0.6

Before attempting to fix the issue, I ran the following command to confirm that the issue is reproducible.

(schema_salad) $ cwltool https://sandbox.zenodo.org/record/1016630/files/trimming_and_qc.cwl --help
INFO /home/suecharo/git/github.com/suecharo/schema_salad/bin/cwltool 3.1
usage: https://sandbox.zenodo.org/record/1016630/files/trimming_and_qc.cwl [-h] --fastq_1 FASTQ_1
                                                                           --fastq_2 FASTQ_2
                                                                           [--nthreads NTHREADS]
                                                                           [job_order]

The error did not reproduce. 🤔
This made me suspect that the error is specific to the container quay.io/commonwl/cwltool:3.1.20220628170238, which is probably using an older version of schema_salad.

I added print statements to fetcher.py to investigate further:

try:
    headers = {}
    if content_types:
        headers["Accept"] = ", ".join(content_types) + ", */*;q=0.8"
    resp = self.session.get(url, headers=headers)
    resp.raise_for_status()
except Exception as e:
    raise ValidationException(f"Error fetching {url}: {e}") from e

# === added ===
print("=== resp.headers ===")
print(resp.headers)
print("=== resp.headers['content-type'] ===")
print(resp.headers["content-type"])

Then I ran the following command again:

(schema_salad) $ cwltool https://sandbox.zenodo.org/record/1016630/files/trimming_and_qc.cwl --help
INFO /home/suecharo/git/github.com/suecharo/schema_salad/bin/cwltool 3.1
=== resp.headers ===
{'Server': 'nginx', 'Date': 'Thu, 05 Oct 2023 02:33:36 GMT', 'Content-Length': '1151', 'Content-Disposition': 'attachment; filename=trimming_and_qc.cwl', 'Accept-Ranges': 'none, bytes', 'Set-Cookie': 'session=9779c6ebbc5f63d_651e2080.LCWpzVkPaLmiYEY9UkKCpqimCS8; Expires=Sun, 05-Nov-2023 02:33:36 GMT; Secure; HttpOnly; Path=/', 'OC-Checksum': 'MD5:415878c78ed8265bd7367099cf2254f7', 'Content-Security-Policy': "default-src 'none';", 'X-Content-Type-Options': 'nosniff', 'X-Download-Options': 'noopen', 'X-Permitted-Cross-Domain-Policies': 'none', 'X-Frame-Options': 'sameorigin', 'X-XSS-Protection': '1; mode=block', 'ETag': '"md5:415878c78ed8265bd7367099cf2254f7"', 'X-RateLimit-Limit': '60', 'X-RateLimit-Remaining': '59', 'X-RateLimit-Reset': '1696473276', 'Retry-After': '59', 'Strict-Transport-Security': 'max-age=0', 'Referrer-Policy': 'strict-origin-when-cross-origin'}
=== resp.headers['content-type'] ===
ERROR I'm sorry, I couldn't load this CWL file, try again with --debug for more information.
The error was: 'content-type'

The output shows that resp.headers contains no Content-Type entry at all, so the lookup resp.headers["content-type"] failed with the error 'content-type'. 🤔
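For debugging output like the above, a lookup that tolerates a missing header avoids the crash. The sketch below uses a plain dict as a stand-in for `resp.headers`, with a few keys copied from the printed output; `header_or_none` is a hypothetical helper, not part of requests or schema_salad.

```python
from typing import Mapping, Optional


def header_or_none(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup that returns None instead of raising."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


# Stand-in for the headers printed above, which have no Content-Type entry.
printed_headers = {
    "Server": "nginx",
    "Content-Length": "1151",
    "Accept-Ranges": "none, bytes",
}

content_type = header_or_none(printed_headers, "content-type")
if content_type is None:
    print("no Content-Type header in this response")
else:
    print(content_type)
```

The condition quoted earlier in the thread, `"content-type" in resp.headers`, guards the real check in the same spirit; only the added print statement indexed the header unconditionally.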

@suecharo
Contributor Author

suecharo commented Oct 5, 2023

@mr-c,
In summary, the error I first reported will likely be resolved by updating the cwltool container. However, while debugging I noticed that in this case the response returned by the requests library has no content-type header at all. I just wanted to report this to you.

@mr-c
Member

mr-c commented Oct 5, 2023

@suecharo I tested your example with the latest cwltool and schema_salad dev branches, and I get the original error that you reported. Then I tried again in a clean virtualenv and I received the new error about the missing content-type header!

Looking into it, I think that when we get a cached response, the content-type header is missing. Try deleting the ~/.cache/salad directory and running it again; when I did that, I got the original error again.
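For repeated testing, the cache can also be cleared from Python. This is only a convenience sketch: `clear_cache` is a hypothetical helper, and the `~/.cache/salad` path is taken from the comment above rather than read from any schema_salad setting.

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir: Path) -> bool:
    """Remove cache_dir and its contents; return True if something was removed."""
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        return True
    return False


# Example call (commented out so that importing this sketch deletes nothing):
# clear_cache(Path.home() / ".cache" / "salad")
```

Running cwltool again after clearing the directory forces a fresh request, so the headers seen by the fetcher come from the server rather than from the cache.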

@suecharo
Contributor Author

suecharo commented Oct 6, 2023

#754 created.
