Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not assume input to be a URL if the local path doesn't exist #1595

Open
mre opened this issue Dec 18, 2024 · 0 comments
Open

Do not assume input to be a URL if the local path doesn't exist #1595

mre opened this issue Dec 18, 2024 · 0 comments
Labels
breaking-change help wanted Extra attention is needed
Milestone

Comments

@mre
Copy link
Member

mre commented Dec 18, 2024

Context

At the moment, lychee uses the http:// scheme as a fallback for local input paths that don't exist:

> lychee foo
Error: Network error

Caused by:
    0: error sending request for url (http://foo/)
    1: client error (Connect)
    2: dns error: failed to lookup address information: nodename nor servname provided, or not known
    3: failed to lookup address information: nodename nor servname provided, or not known

The idea was to model the behavior after curl, which does the same:
https://github.com/curl/curl/blob/70ac27604a2abfa809a7b2736506af0da8c3c8a9/lib/urlapi.c#L1104-L1124

Issue

Getting the scheme guessing correct is quite tricky.

Furthermore, it can be misleading if the user assumes a local path exists, but doesn't. In this case, we make an unexpected network request.
This can even go unnoticed in CI/CD in case there happens to be a URL with the same name.

For example, assume we want to check a ZIP archive (which isn't supported, but might be in the future).
Furthermore, assume the file doesn't exist.
If we run:

lychee --dump-inputs url.zip
http://url.zip/

This would assume http://url.zip/ instead! 💥
(Yes, .zip is a TLD.)

Proposal

I propose to remove the fallback to http. This is a breaking change, which we should make before 1.0.

Necessary changes

Remove the condition here and return an error instead:

// Invalid path; check if a valid URL can be constructed from the input
// by prefixing it with a `http://` scheme.
// Curl also uses http (i.e. not https), see
// https://github.com/curl/curl/blob/70ac27604a2abfa809a7b2736506af0da8c3c8a9/lib/urlapi.c#L1104-L1124
let url = Url::parse(&format!("http://{value}")).map_err(|e| {
ErrorKind::ParseUrl(e, "Input is not a valid URL".to_string())
})?;
InputSource::RemoteUrl(Box::new(url))

Add a test to prove correct behavior.

Pull requests greatly appreciated.

@mre mre added this to the v1.0 milestone Dec 18, 2024
@mre mre added help wanted Extra attention is needed breaking-change labels Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

1 participant