sniff: add --just-mime option #1372

Merged 3 commits on Oct 18, 2023
5 changes: 5 additions & 0 deletions README.md
@@ -228,6 +228,11 @@ The `to` command converts CSVs to `.xlsx`, [Parquet](https://parquet.apache.org)

The `sqlp` command returns query results in CSV, JSON, Parquet & [Arrow IPC](https://arrow.apache.org/docs/format/Columnar.html#ipc-file-format) formats. Polars SQL also supports reading external files directly in various formats with its `read_ndjson`, `read_csv`, `read_parquet` & `read_ipc` [table functions](https://github.com/pola-rs/polars/blob/c7fa66a1340418789ec66bdedad6654281afa0ab/polars/polars-sql/src/table_functions.rs#L9-L36).

The `sniff` command can also detect the mime type of any file, local or remote (http and https schemes supported), with the `--no-infer` or `--just-mime` options.
It can detect more than 120 file formats, including MS Office/Open Document files, JSON, XML,
PDF, PNG, JPEG and specialized formats such as the geospatial GPX, GML and KML, plus TML, TMX, TSX and TTML.
See https://docs.rs/file-format/latest/file_format/#reader-features for a complete list.
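
As a rough illustration of what mime detection from file content involves, here is a minimal sketch using only the Rust standard library. It is not how the `file-format` crate or `sniff` is implemented; `guess_mime` is a hypothetical helper that checks a few well-known signatures (PNG, JPEG, PDF) and falls back to `text/plain` for valid UTF-8. CSV has no signature of its own, which is consistent with CSV and TSV files being reported only as plain text when schema inference is skipped.

```rust
/// Hypothetical helper (not part of qsv): guess a mime type from leading bytes.
fn guess_mime(bytes: &[u8]) -> &'static str {
    // Well-known file signatures ("magic numbers").
    const PNG: &[u8] = &[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    const JPEG: &[u8] = &[0xFF, 0xD8, 0xFF];
    const PDF: &[u8] = b"%PDF-";

    if bytes.starts_with(PNG) {
        "image/png"
    } else if bytes.starts_with(JPEG) {
        "image/jpeg"
    } else if bytes.starts_with(PDF) {
        "application/pdf"
    } else if std::str::from_utf8(bytes).is_ok() {
        // No signature matched, but the content is valid text.
        // A CSV file ends up here because it has no magic number.
        "text/plain"
    } else {
        "application/octet-stream"
    }
}

fn main() {
    println!("{}", guess_mime(b"%PDF-1.7\n"));
    println!("{}", guess_mime(b"name,age\nAda,36\n"));
    println!("{}", guess_mime(&[0xFF, 0xD8, 0xFF, 0xE0]));
}
```

A real detector needs far more signatures and, for container formats such as MS Office files, has to look beyond the first few bytes.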

### Snappy Compression/Decompression

qsv supports *automatic compression/decompression* using the [Snappy frame format](https://github.com/google/snappy/blob/main/framing_format.txt). Snappy was chosen instead of more popular compression formats like gzip because it was designed for [high-performance streaming compression & decompression](https://github.com/google/snappy/tree/main/docs#readme) (up to 2.58 gb/sec compression, 0.89 gb/sec decompression).
9 changes: 9 additions & 0 deletions src/cmd/sniff.rs
@@ -84,6 +84,10 @@ sniff options:
                           (Unsigned, Signed => Integer, Text => String, everything else the same)
    --no-infer             Do not infer the schema. Only return the file's mime type, size and
                           last modified date. Use this to use sniff as a general mime type detector.
                           Note that CSV and TSV files will only be detected as mime type text/plain
                           in this mode.
    --just-mime            Only return the file's mime type. Use this to use sniff as a general
                           mime type detector. Synonym for --no-infer.
    --quick                When sniffing a non-CSV remote file, only download the first chunk of the file
                           before attempting to detect the mime type. This is faster but less accurate as
                           some mime types cannot be detected with just the first downloaded chunk.
@@ -139,6 +143,7 @@ struct Args {
    flag_user_agent: Option<String>,
    flag_stats_types: bool,
    flag_no_infer: bool,
    flag_just_mime: bool,
    flag_quick: bool,
    flag_harvest_mode: bool,
}
@@ -714,6 +719,10 @@ async fn sniff_main(mut args: Args) -> CliResult<()> {
            Some("CKAN-harvest/$QSV_VERSION ($QSV_TARGET; $QSV_BIN_NAME)".to_string());
    }

    if args.flag_just_mime {
        args.flag_no_infer = true;
    }

    let mut sample_size = args.flag_sample;
    if sample_size < 0.0 {
        if args.flag_json || args.flag_pretty_json {
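
The `flag_just_mime` check in this hunk folds the new option into the existing `flag_no_infer` flag, so the rest of `sniff_main` only has to test one field. The pattern can be sketched in isolation as follows. This is a simplified stand-in: the `Args` struct here keeps only the two relevant fields, and `normalize` is a hypothetical name that does not appear in the PR.

```rust
/// Simplified stand-in for the real `Args` struct (only the two relevant flags).
#[derive(Debug, Default, PartialEq)]
struct Args {
    flag_no_infer: bool,
    flag_just_mime: bool,
}

/// Hypothetical helper: treat `--just-mime` as an alias of `--no-infer`
/// by setting the flag that downstream code already checks.
fn normalize(mut args: Args) -> Args {
    if args.flag_just_mime {
        args.flag_no_infer = true;
    }
    args
}

fn main() {
    let args = normalize(Args {
        flag_just_mime: true,
        ..Default::default()
    });
    // Downstream code only needs to look at `flag_no_infer`.
    println!("no_infer = {}", args.flag_no_infer);
}
```

Normalizing once, right after argument parsing, avoids scattering `flag_no_infer || flag_just_mime` checks across the command.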
29 changes: 29 additions & 0 deletions tests/test_sniff.rs
@@ -100,6 +100,35 @@ fn sniff_notcsv() {
    assert!(got_error.starts_with(expected));
}

#[test]
fn sniff_justmime() {
    let wrk = Workdir::new("sniff_justmime");

    let test_file = wrk.load_test_file("excel-xls.xls");

    let mut cmd = wrk.command("sniff");
    cmd.arg("--just-mime").arg(test_file);

    let got: String = wrk.stdout(&mut cmd);

    let expected = "Detected mime type: application/vnd.ms-excel";
    assert!(got.starts_with(expected));
}

#[test]
fn sniff_justmime_remote() {
    let wrk = Workdir::new("sniff_justmime_remote");

    let mut cmd = wrk.command("sniff");
    cmd.arg("--just-mime")
        .arg("https://github.com/jqnatividad/qsv/raw/master/resources/test/excel-xls.xls");

    let got: String = wrk.stdout(&mut cmd);

    let expected = "Detected mime type: application/vnd.ms-excel";
    assert!(got.starts_with(expected));
}

#[test]
fn sniff_url_snappy() {
    let wrk = Workdir::new("sniff_url_snappy");
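
The two `--just-mime` tests above assert only that the output starts with `Detected mime type: ` followed by the expected type. A caller that wants the mime type itself could extract it along these lines. This is a sketch under assumptions: `parse_mime` is a hypothetical helper, and it assumes the mime type is the entire remainder of the first output line, which the tests do not confirm.

```rust
/// Hypothetical helper: pull the mime type out of sniff's `--just-mime` output.
/// Assumes the first line is "Detected mime type: <mime>" with nothing after it.
fn parse_mime(output: &str) -> Option<&str> {
    output
        .lines()
        .next()? // first line only; None for empty output
        .strip_prefix("Detected mime type: ")
        .map(str::trim)
}

fn main() {
    let got = "Detected mime type: application/vnd.ms-excel\n";
    match parse_mime(got) {
        Some(mime) => println!("mime = {mime}"),
        None => println!("unexpected output"),
    }
}
```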