Skip to content

Commit

Permalink
Merge pull request #1924 from jqnatividad/json
Browse files Browse the repository at this point in the history
`json`: change `jsonp` to `json` using new implementation
  • Loading branch information
jqnatividad authored Jun 27, 2024
2 parents cd44da1 + 29d610f commit 57342c2
Show file tree
Hide file tree
Showing 14 changed files with 224 additions and 225 deletions.
24 changes: 24 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -226,6 +226,7 @@ url = "2.5"
vader_sentiment = { version = "0.1", optional = true }
whatlang = { version = "0.16", optional = true }
xxhash-rust = { version = "0.8", features = ["xxh3"] }
json-objects-to-csv = "0.1.3"

[target.'cfg(not(target_arch = "aarch64"))'.dependencies]
simdutf8 = "0.1"
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@
| [join](/src/cmd/join.rs#L2)<br>👆 | Inner, outer, right, cross, anti & semi joins. Automatically creates a simple, in-memory hash index to make it fast. |
| [joinp](/src/cmd/joinp.rs#L2)<br>✨🚀🐻‍❄️ | Inner, outer, cross, anti, semi & asof joins using the [Pola.rs](https://www.pola.rs) engine. Unlike the `join` command, `joinp` can process files larger than RAM, is multithreaded, has join key validation, pre-join filtering, supports [asof joins](https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/api/polars.DataFrame.join_asof.html) (which is [particularly useful for time series data](https://github.com/jqnatividad/qsv/blob/30cc920d0812a854fcbfedc5db81788a0600c92b/tests/test_joinp.rs#L509-L983)) & its output doesn't have duplicate columns. However, `joinp` doesn't have an --ignore-case option & it doesn't support right outer joins. |
| [jsonl](/src/cmd/jsonl.rs#L2)<br>🚀🔣 | Convert newline-delimited JSON ([JSONL](https://jsonlines.org/)/[NDJSON](http://ndjson.org/)) to CSV. See `tojsonl` command to convert CSV to JSONL.
| [jsonp](/src/cmd/jsonp.rs#L2)<br>✨🐻‍❄️ | Convert non-nested JSON to CSV. Only available with the polars feature enabled.
| [json](/src/cmd/json.rs#L2)<br> | Convert non-nested JSON to CSV.
| <a name="luau_deeplink"></a><br>[luau](/src/cmd/luau.rs#L2) 👑<br>✨📇🌐🔣 ![CKAN](docs/images/ckan.png) | Create multiple new computed columns, filter rows, compute aggregations and build complex data pipelines by executing a [Luau](https://luau-lang.org) [0.630](https://github.com/Roblox/luau/releases/tag/0.630) expression/script for every row of a CSV file ([sequential mode](https://github.com/jqnatividad/qsv/blob/bb72c4ef369d192d85d8b7cc6e972c1b7df77635/tests/test_luau.rs#L254-L298)), or using [random access](https://www.webopedia.com/definitions/random-access/) with an index ([random access mode](https://github.com/jqnatividad/qsv/blob/bb72c4ef369d192d85d8b7cc6e972c1b7df77635/tests/test_luau.rs#L367-L415)).<br>Can process a single Luau expression or [full-fledged data-wrangling scripts using lookup tables](https://github.com/dathere/qsv-lookup-tables#example) with discrete BEGIN, MAIN and END sections.<br> It is not just another qsv command, it is qsv's [Domain-specific Language](https://en.wikipedia.org/wiki/Domain-specific_language) (DSL) with [numerous qsv-specific helper functions](https://github.com/jqnatividad/qsv/blob/113eee17b97882dc368b2e65fec52b86df09f78b/src/cmd/luau.rs#L1356-L2290) to build production data pipelines. |
| [partition](/src/cmd/partition.rs#L2)<br>👆 | Partition a CSV based on a column value. |
| [prompt](/src/cmd/prompt.rs#L2) | Open a file dialog to pick a file. |
Expand Down
50 changes: 25 additions & 25 deletions contrib/bashly/completions.bash
Original file line number Diff line number Diff line change
Expand Up @@ -108,10 +108,6 @@ _qsv_completions() {
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -W "$(_qsv_completions_filter "--help --k_weight -h")" -- "$cur" )
;;

*'geocode'*'suggest'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -W "$(_qsv_completions_filter "--admin1 --help --min-score -h")" -- "$cur" )
;;

*'snappy'*'compress'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--help -h")" -- "$cur" )
;;
Expand All @@ -120,6 +116,10 @@ _qsv_completions() {
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--help -h")" -- "$cur" )
;;

*'geocode'*'suggest'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -W "$(_qsv_completions_filter "--admin1 --help --min-score -h")" -- "$cur" )
;;

*'fetch'*'--jqlfile')
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -- "$cur" )
;;
Expand Down Expand Up @@ -340,8 +340,8 @@ _qsv_completions() {
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--delimiter --end --help --index --json --len --no-headers --output --start -h")" -- "$cur" )
;;

*'jsonl'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--batch --delimiter --help --ignore-errors --jobs --output -h")" -- "$cur" )
*'table'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--align --condense --delimiter --help --memcheck --output --pad --width -h")" -- "$cur" )
;;

*'count'*)
Expand All @@ -364,22 +364,18 @@ _qsv_completions() {
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--delimiter --harvest-mode --help --json --just-mime --no-infer --prefer-dmy --pretty-json --progressbar --quick --quote --sample --save-urlsample --stats-types --timeout --user-agent -h")" -- "$cur" )
;;

*'table'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--align --condense --delimiter --help --memcheck --output --pad --width -h")" -- "$cur" )
*'stats'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--cache-threshold --cardinality --dates-whitelist --delimiter --everything --force --help --infer-boolean --infer-dates --jobs --mad --median --memcheck --mode --no-headers --nulls --output --prefer-dmy --quartiles --round --select --stats-binout --typesonly -h")" -- "$cur" )
;;

*'jsonp'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--date-format --datetime-format --float-precision --help --output --time-format --wnull-value -h")" -- "$cur" )
*'jsonl'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--batch --delimiter --help --ignore-errors --jobs --output -h")" -- "$cur" )
;;

*'split'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--chunks --delimiter --filename --help --jobs --kb-size --no-headers --pad --quiet --size -h")" -- "$cur" )
;;

*'stats'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--cache-threshold --cardinality --dates-whitelist --delimiter --everything --force --help --infer-boolean --infer-dates --jobs --mad --median --memcheck --mode --no-headers --nulls --output --prefer-dmy --quartiles --round --select --stats-binout --typesonly -h")" -- "$cur" )
;;

*'sqlp'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--compress-level --compression --date-format --datetime-format --decimal-comma --delimiter --float-precision --format --help --ignore-errors --infer-len --low-memory --memcheck --no-headers --no-optimizations --output --quiet --rnull-values --statistics --time-format --truncate-ragged-lines --try-parsedates --wnull-value -h")" -- "$cur" )
;;
Expand All @@ -388,24 +384,28 @@ _qsv_completions() {
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--delimiter --faster --help --ignore-case --jobs --memcheck --no-headers --numeric --output --random --reverse --rng --seed --select --unique -h")" -- "$cur" )
;;

*'fill'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--backfill --default --delimiter --first --groupby --help --no-headers --output -h")" -- "$cur" )
;;

*'diff'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--delimiter right --delimiter-left --delimiter-output --help --jobs --key --no-headers-left --no-headers-output --no-headers-right --output --sort-columns -h")" -- "$cur" )
;;

*'enum'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--constant --copy --delimiter --hash --help --increment --new-column --no-headers --output --start --uuid4 --uuid7 -h")" -- "$cur" )
;;

*'luau'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -W "$(_qsv_completions_filter "--begin --cache-dir --ckan-api --ckan-token --colindex --delimiter --end --help --luau-path --max-errors --no-globals --no-headers --output --progressbar --remap --timeout -h filter map")" -- "$cur" )
;;

*'json'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--help --output -h")" -- "$cur" )
;;

*'join'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--cross --delimiter --full --help --ignore-case --left --left-anti --left-semi --no-headers --nulls --output --right -h")" -- "$cur" )
;;

*'enum'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--constant --copy --delimiter --hash --help --increment --new-column --no-headers --output --start --uuid4 --uuid7 -h")" -- "$cur" )
*'fill'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--backfill --default --delimiter --first --groupby --help --no-headers --output -h")" -- "$cur" )
;;

*'cat'*)
Expand All @@ -416,16 +416,16 @@ _qsv_completions() {
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -A file -W "$(_qsv_completions_filter "--ascii --crlf --delimiter --escape --help --no-final-newline --out-delimiter --output --quote --quote-always --quote-never -h")" -- "$cur" )
;;

*'to'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -W "$(_qsv_completions_filter "--delimiter --drop --dump --evolve --help --jobs --pipe --print-package --quiet --schema --separator --stats --stats-csv -h datapackage parquet postgres sqlite xlsx")" -- "$cur" )
;;

*'py'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -W "$(_qsv_completions_filter "--batch --delimiter --help --helper --no-headers --output --progressbar -h filter map")" -- "$cur" )
;;

*'to'*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -W "$(_qsv_completions_filter "--delimiter --drop --dump --evolve --help --jobs --pipe --print-package --quiet --schema --separator --stats --stats-csv -h datapackage parquet postgres sqlite xlsx")" -- "$cur" )
;;

*)
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -W "$(_qsv_completions_filter "--envlist --help --list --update --updatenow --version -h -v apply applydp behead cat count datefmt dedup describegpt diff enum excel exclude explode extdudup extsort fetch fetchpost fill fixlengths flatten fmt foreach frequency generate geocode headers index input join joinp jsonl jsonp luau partition prompt pseudo py rename reverse safenames sample schema search searchset select slice snappy sniff sort sortcheck split sqlp stats table to tojsonl transpose validate")" -- "$cur" )
while read -r; do COMPREPLY+=( "$REPLY" ); done < <( compgen -W "$(_qsv_completions_filter "--envlist --help --list --update --updatenow --version -h -v apply applydp behead cat count datefmt dedup describegpt diff enum excel exclude explode extdudup extsort fetch fetchpost fill fixlengths flatten fmt foreach frequency generate geocode headers index input join joinp json jsonl luau partition prompt pseudo py rename reverse safenames sample schema search searchset select slice snappy sniff sort sortcheck split sqlp stats table to tojsonl transpose validate")" -- "$cur" )
;;

esac
Expand Down
12 changes: 1 addition & 11 deletions contrib/bashly/src/bashly.yml
Original file line number Diff line number Diff line change
Expand Up @@ -814,20 +814,10 @@ commands:
- *output
- *delimiter

- name: jsonp
- name: json
completions:
- <file>
flags:
- long: --datetime-format
arg: <fmt>
- long: --date-format
arg: <fmt>
- long: --time-format
arg: <fmt>
- long: --float-precision
arg: <arg>
- long: --wnull-value
arg: <arg>
- *output

- name: luau
Expand Down
2 changes: 1 addition & 1 deletion contrib/fish/qsv.fish
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# Set all available subcommands for qsv
set -l qsv_commands apply applydp behead cat count datefmt dedup describegpt diff enum excel exclude explode extdedup extsort fetch fetchpost fill fixlengths flatten fmt frequency geocode headers index input join joinp jsonl jsonp luau partition prompt pseudo py rename replace reverse safenames sample schema search searchset select slice snappy sniff sort sortcheck split sqlp stats table to tojsonl transpose validate
set -l qsv_commands apply applydp behead cat count datefmt dedup describegpt diff enum excel exclude explode extdedup extsort fetch fetchpost fill fixlengths flatten fmt frequency geocode headers index input join joinp jsonl json luau partition prompt pseudo py rename replace reverse safenames sample schema search searchset select slice snappy sniff sort sortcheck split sqlp stats table to tojsonl transpose validate

# Enable completions for qsv
complete -c qsv
Expand Down
2 changes: 1 addition & 1 deletion docs/FEATURES.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
* `foreach` - enable `foreach` command (not valid for Windows).
* `geocode` - enable `geocode` command.
* `luau` - enable `luau` command. Embeds a [Luau](https://luau-lang.org) interpreter into qsv. [Luau has type-checking, sandboxing, additional language operators, increased performance & other improvements](https://luau-lang.org/2022/11/04/luau-origins-and-evolution.html) over Lua.
* `polars` - enables all [Polars](https://pola.rs)-powered commands (currently, `joinp`, `jsonp` and `sqlp`. Also enables polars mode in `count`). Note that Polars is a very powerful library, but it has a lot of dependencies that drastically increases both compile time and binary size.
* `polars` - enables all [Polars](https://pola.rs)-powered commands (currently, `joinp` and `sqlp`. Also enables polars mode in `count`). Note that Polars is a very powerful library, but it has a lot of dependencies that drastically increases both compile time and binary size.
* `python` - enable `py` command. Note that qsv will look for the shared library for the Python version (Python 3.7 & above supported) it was compiled against & will abort on startup if the library is not found, even if you're NOT using the `py` command. Check [Python](#python) section for more info.
* `to` - enables the `to` command except the parquet option.
* `to_parquet` - enables the `parquet` option of the `to` command. This is a separate feature as it brings in the `duckdb` dependency, which markedly increases binary size and compile time.
Expand Down
2 changes: 1 addition & 1 deletion scripts/benchmarks.sh
Original file line number Diff line number Diff line change
Expand Up @@ -514,7 +514,7 @@ run geocode_suggest "$qsv_bin" geocode suggest City --new-column geocoded_city "
run geocode_reverse "$qsv_bin" geocode reverse Location --new-column geocoded_location "$data"
run index "$qsv_bin" index "$data"
run input "$qsv_bin" input "$data"
run jsonp "$qsv_bin" jsonp benchmark_data.json
run json "$qsv_bin" json benchmark_data.json
run join "$qsv_bin" join \'Community Board\' "$data" community_board communityboards.csv
run join_casei "$qsv_bin" join \'Community Board\' "$data" community_board --ignore-case communityboards.csv
run joinp "$qsv_bin" joinp \'Community Board\' "$data" community_board communityboards.csv
Expand Down
Loading

0 comments on commit 57342c2

Please sign in to comment.