Releases: dathere/qsv
0.111.0
This is the first in a series of "Giddy-up" 🏇🏽 releases.
As Quicksilver matures, we will continue to tweak it in pursuit of our goal to be the fastest general-purpose CSV data-wrangling CLI toolkit available.
"Giddy-up" 🏇🏽 releases increase performance by:
- taking advantage of new Rust features as they become available
- using new libraries that are faster than the ones we currently use
- optimizing our code to take advantage of new features in the libraries we use
- using new algorithms that are faster than the ones we currently use
- taking advantage of more hardware features (SIMD, multi-core, etc.)
- adding reproducible benchmarks that are automatically updated on release to track our progress
As it is, Quicksilver has an aggressive release tempo - with more than 160 releases since its initial release in December 2020. This was made possible by the solid foundation of Rust and the xsv project from which qsv was forked. We will continue to build on this foundation by adding more CI tests and starting to track code coverage so we can continue to iterate aggressively with confidence.
Apart from "giddy-up" releases, Quicksilver will also have "carousel" 🎠 releases that will focus on making the toolkit more accessible to non-technical users.
"Carousel" 🎠 releases will include:
- more documentation
- more examples
- more tutorials
- more recipes in the Cookbook
- multiple GUI wrappers around the CLI
- integrations with common desktop tools like Excel, Google Sheets, OpenOffice, etc.
- tighter integration with the CKAN ecosystem, with a focus on helping data publishers & data coordinators maintain a high quality data/metadata catalog
Hopefully, this will make qsv more accessible to non-technical users, and help them get more value out of their data. Special attention will be given to "open data" use cases, enabling non-profits, governments and regular citizens to tap raw open data and convert it to actionable insight, making open data useful, usable and used.
Every now and then, we'll also have "Unicorn" 🦄 releases that will add MAJOR new features to the toolkit (e.g. 10x-type features like the integration of Pola.rs into qsv).
We will also add a new Technical Documentation section to the wiki to document qsv's architecture and how each command works. The hope is that doing so will lower the barrier to contributions and help us grow the community of qsv contributors.
Added
Changed
- stats: refactor init_date_inference #1187
- join: cache has_headers result in hot loop e53edaf
- search & searchset: amortize allocs #1188
- stats: use fast-float to convert string to float #1191
- sqlp: more examples, apply clippy::needless_borrow lint ff37a04 and b8e1f77
- use fast-float project-wide (apply, applydp, schema, sort, validate) #1192
- fine-tune publishing workflows to enable universally available CPU features a1dccc7
- build(deps): bump serde from 1.0.179 to 1.0.180 by @dependabot in #1176
- build(deps): bump pyo3 from 0.19.1 to 0.19.2 by @dependabot in #1177
- build(deps): bump qsv-dateparser from 0.9.0 to 0.10.0 by @dependabot in #1178
- build(deps): bump qsv-sniffer from 0.9.4 to 0.10.0 by @dependabot in #1180
- build(deps): bump indicatif from 0.17.5 to 0.17.6 by @dependabot in #1182
- Bump to qsv stats 0.11 #1184
- build(deps): bump serde from 1.0.180 to 1.0.181 by @dependabot in #1185
- build(deps): bump qsv_docopt from 1.3.0 to 1.4.0 by @dependabot in #1186
- build(deps): bump filetime from 0.2.21 to 0.2.22 by @dependabot in #1193
- build(deps): bump regex from 1.9.1 to 1.9.2 by @dependabot in #1194
- build(deps): bump regex from 1.9.2 to 1.9.3 by @dependabot in #1195
- build(deps): bump serde from 1.0.181 to 1.0.182 by @dependabot in #1196
- build(deps): bump tempfile from 3.7.0 to 3.7.1 by @dependabot in #1199
- build(deps): bump strum_macros from 0.25.1 to 0.25.2 by @dependabot in #1200
- build(deps): bump serde from 1.0.182 to 1.0.183 by @dependabot in #1201
- cargo update bump several indirect dependencies
- apply select clippy lint suggestions
- pin Rust nightly to 2023-08-07
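The join and search/searchset items above refer to two common hot-loop optimizations: deciding a loop-invariant condition (such as whether the input has headers) once, outside the loop, and amortizing allocations by reusing one buffer. A minimal std-only sketch of both, assuming simple line-oriented input rather than real CSV parsing; `count_matching_lines` is a hypothetical helper, not qsv code:

```rust
use std::io::BufRead;

/// Count data lines that contain `needle`.
/// `has_headers` is decided once before the loop instead of being re-queried
/// per row, and a single `String` buffer is cleared and reused for every line
/// instead of allocating a new one each time.
pub fn count_matching_lines<R: BufRead>(
    mut reader: R,
    needle: &str,
    has_headers: bool,
) -> std::io::Result<usize> {
    let mut buf = String::new(); // allocated once, reused on every iteration
    let mut skip_header = has_headers; // loop-invariant decision made up front
    let mut count = 0;
    loop {
        buf.clear(); // empties the buffer but keeps its capacity
        if reader.read_line(&mut buf)? == 0 {
            break; // EOF
        }
        if skip_header {
            skip_header = false; // skip only the first line
            continue;
        }
        if buf.contains(needle) {
            count += 1;
        }
    }
    Ok(count)
}
```

Reusing the buffer means the allocator is hit only when a line is longer than any seen so far, rather than once per line.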
Removed
- temporarily remove rand/simd_support feature when building nightly as it's causing the nightly build to fail 0a66fdb
Fixed
New Contributors
Full Changelog: 0.110.0...0.111.0
0.110.0
Added
- describegpt: Add jsonl to prompt file doc section & more clarification by @rzmk in #1149
- luau: add --no-jit option #1170
- sqlp: add CTE examples 33f0218
Changed
- frequency: minor optimizations ecac0be
- join: performance optimizations 4cb5937 and 788360a
- sqlp: reduce allocs in loop ae164b5
- Apple Silicon build now uses mimalloc allocator by default bfab24a
- build(deps): bump jql-runner from 7.0.1 to 7.0.2 by @dependabot in #1151
- build(deps): bump serde from 1.0.171 to 1.0.173 by @dependabot in #1154
- build(deps): bump tempfile from 3.6.0 to 3.7.0 by @dependabot in #1155
- build(deps): bump serde from 1.0.174 to 1.0.175 by @dependabot in #1157
- build(deps): bump redis from 0.23.0 to 0.23.1 by @dependabot in #1164
- build(deps): bump serde from 1.0.175 to 1.0.177 by @dependabot in #1163
- build(deps): bump serde_json from 1.0.103 to 1.0.104 by @dependabot in #1160
- build(deps): bump grex from 1.4.1 to 1.4.2 by @dependabot in #1159
- build(deps): bump sysinfo from 0.29.6 to 0.29.7 by @dependabot in #1158
- build(deps): bump mlua from 0.9.0-rc.1 to 0.9.0-rc.3 by @dependabot in #1169
- build(deps): bump flexi_logger from 0.25.5 to 0.25.6 by @dependabot in #1168
- build(deps): bump jemallocator from 0.5.0 to 0.5.4 by @dependabot in #1167
- build(deps): bump serde from 1.0.177 to 1.0.178 by @dependabot in #1166
- build(deps): bump rust_decimal from 1.30.0 to 1.31.0 by @dependabot in #1172
- build(deps): bump csvs_convert from 0.8.6 to 0.8.7 by @dependabot in #1174
- apply clippy::needless_pass_by_ref_mut lint in select and frequency ba6566e and 83add7b
- cargo update bump indirect dependencies
- pin Rust nightly to 2023-07-29
Removed
- excel: remove defunct dates-whitelist comments 2a24d2d
Fixed
- join: fix left-semi join. Fixes #1150. #1153
- foreach: fix command argument token splitter pattern. Fixes #1171 #1173
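For readers unfamiliar with the semantics fixed in #1153: a left-semi join returns the left rows that have a match in the right input, without duplicating them and without adding right-side columns. A minimal in-memory sketch of those semantics only (not qsv's implementation); rows are simplified to hypothetical (key, value) pairs:

```rust
use std::collections::HashSet;

/// Left-semi join: keep each left row whose key appears in the right input.
/// Only left columns are returned, and a left row is emitted at most once,
/// no matter how many right rows share its key.
pub fn left_semi_join<'a>(
    left: &[(&'a str, &'a str)], // hypothetical (key, value) rows
    right_keys: &[&str],
) -> Vec<(&'a str, &'a str)> {
    // Build a set of right keys once; duplicates collapse automatically.
    let keys: HashSet<&str> = right_keys.iter().copied().collect();
    left.iter()
        .filter(|(key, _)| keys.contains(key))
        .copied()
        .collect()
}
```

Because the right side is reduced to a key set, a right key that appears twice still yields each matching left row exactly once, which is what distinguishes a semi join from an inner join.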
Full Changelog: 0.109.0...0.110.0
0.109.0
This is a monstrous release with lots of new features and improvements!
The biggest new feature is the describegpt command, which allows you to use OpenAI's Large Language Models to generate extended metadata from a CSV. We created this command primarily for CKAN and Datapusher+ so we can infer descriptions and tags, and automatically create annotated data dictionaries, using the CSV's summary statistics and frequency tables. Because describegpt works from these summaries rather than the raw rows, it works even for very large CSV files without consuming too many OpenAI tokens. This is a very powerful feature and we are looking forward to seeing what people do with it. Thanks @rzmk for all the work on this!
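To make the token-budget argument concrete: if the prompt is assembled from per-column summaries, its size is bounded by the number of columns, not the number of rows. The sketch below is hypothetical; the struct, function names and prompt wording are illustrative assumptions, not describegpt's actual prompt or code:

```rust
/// Hypothetical per-column summary, standing in for a row of `stats`
/// output plus the top entries of a `frequency` table.
pub struct ColumnSummary {
    pub name: String,
    pub dtype: String,
    pub cardinality: u64,
    pub top_values: Vec<(String, u64)>, // (value, count), most frequent first
}

/// Build an LLM prompt from column summaries. Its length depends on the
/// number of columns and `max_top`, not on how many rows the CSV has.
pub fn build_prompt(summaries: &[ColumnSummary], max_top: usize) -> String {
    let mut prompt = String::from(
        "Infer a description, tags and a data dictionary for a CSV with these columns:\n",
    );
    for s in summaries {
        // Keep only the most frequent values to cap the per-column cost.
        let top: Vec<String> = s
            .top_values
            .iter()
            .take(max_top)
            .map(|(value, count)| format!("{value} ({count})"))
            .collect();
        prompt.push_str(&format!(
            "- {}: type={}, cardinality={}, top values: {}\n",
            s.name,
            s.dtype,
            s.cardinality,
            top.join(", ")
        ));
    }
    prompt
}
```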
This release also features major improvements in the sqlp and joinp commands thanks to all the new capabilities of Polars 0.31.1.
Polars SQL's capabilities have been vastly improved in 0.31.1 with numerous new SQL functions and operators, and they're all available with the sqlp command.
The joinp command has several new options for CSV parsing, for pre-join filtering (--filter-left and --filter-right), and for pre-join validation (the --validate option). Two new asof join variants (--left_by and --right_by) were also added.
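Pre-join validation amounts to checking key uniqueness on one or both sides before joining. A minimal sketch of that check; the enum mirrors the one-to-one / one-to-many / many-to-one / many-to-many vocabulary used by Polars join validation, but the names and error strings here are illustrative assumptions, not joinp's --validate values or messages:

```rust
use std::collections::HashSet;

/// Which side(s) of the join must have unique keys.
#[derive(Debug, Clone, Copy)]
pub enum Validation {
    ManyToMany, // no uniqueness requirement
    OneToMany,  // left keys must be unique
    ManyToOne,  // right keys must be unique
    OneToOne,   // both sides must be unique
}

fn keys_are_unique(keys: &[&str]) -> bool {
    let mut seen = HashSet::new();
    // `insert` returns false on the first duplicate, which stops `all` early.
    keys.iter().all(|k| seen.insert(*k))
}

/// Check join keys before joining; returns an error describing the violation.
pub fn validate_join(left: &[&str], right: &[&str], v: Validation) -> Result<(), String> {
    let (left_unique, right_unique) = match v {
        Validation::ManyToMany => (false, false),
        Validation::OneToMany => (true, false),
        Validation::ManyToOne => (false, true),
        Validation::OneToOne => (true, true),
    };
    if left_unique && !keys_are_unique(left) {
        return Err("join keys in the left input are not unique".to_string());
    }
    if right_unique && !keys_are_unique(right) {
        return Err("join keys in the right input are not unique".to_string());
    }
    Ok(())
}
```

Validating first turns a silently exploding many-to-many result into an explicit error before any join work is done.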
Added
- describegpt command by @rzmk in #1036
- describegpt: minor refactoring in #1104
- describegpt: --key & QSV_OPENAI_API_KEY by @rzmk in #1105
- describegpt: add --user-agent in help message by @rzmk in #1095
- describegpt: json output format for redirection by @rzmk in #1107
- describegpt: add testing (resolves #1114) by @rzmk in #1115
- describegpt: add --model option (resolves #1101) by @rzmk in #1117
- describegpt: polishing #1122
- describegpt: add --jsonl option (resolves #1086) by @rzmk in #1127
- describegpt: add --prompt-file option (resolves #1085) by @rzmk in #1120
- joinp: added asof_by join variant; added CSV formatting options consistent with sqlp CSV format options #1090
- joinp: add --filter-left and --filter-right options #1146
- joinp: add --validate option #1147
- fetch & fetchpost: add --no-cache option #1112
- sniff: detect file kind along with mime type #1137
- user-agent metadata now contains the current command's name #1093
Changed
- fetch & fetchpost: --redis and --no-cache are mutually exclusive #1113
- luau: adapt to mlua 0.9.0-rc.1 API changes #1129
- upgrade to Polars 0.31.1 #1139
- Bump MSRV to latest Rust stable (1.71.0)
- pin Rust nightly to 2023-07-15
- Bump uuid from 1.3.4 to 1.4.0 by @dependabot in #1073
- Bump tokio from 1.28.2 to 1.29.0 by @dependabot in #1077
- Bump tokio from 1.29.0 to 1.29.1 by @dependabot in #1087
- Bump sysinfo from 0.29.2 to 0.29.3 by @dependabot in #1088
- build(deps): bump sysinfo from 0.29.4 to 0.29.5 by @dependabot in #1148
- Bump jql-runner from 6.0.9 to 7.0.0 by @dependabot in #1092
- build(deps): bump jql-runner from 7.0.0 to 7.0.1 by @dependabot in #1132
- Bump itoa from 1.0.6 to 1.0.7 by @dependabot in #1091
- Bump itoa from 1.0.7 to 1.0.8 by @dependabot in #1098
- build(deps): bump itoa from 1.0.8 to 1.0.9 by @dependabot in #1142
- Bump serde from 1.0.164 to 1.0.165 by @dependabot in #1094
- Bump serde from 1.0.165 to 1.0.166 by @dependabot in #1100
- Bump serde from 1.0.166 to 1.0.167 by @dependabot in #1116
- build(deps): bump serde from 1.0.167 to 1.0.171 by @dependabot in #1118
- Bump pyo3 from 0.19.0 to 0.19.1 by @dependabot in #1099
- Bump ryu from 1.0.13 to 1.0.14 by @dependabot in #1096
- build(deps): bump ryu from 1.0.14 to 1.0.15 by @dependabot in #1144
- Bump strum_macros from 0.25.0 to 0.25.1 by @dependabot in #1097
- Bump serde_json from 1.0.99 to 1.0.100 by @dependabot in #1103
- build(deps): bump serde_json from 1.0.100 to 1.0.101 by @dependabot in #1123
- build(deps): bump serde_json from 1.0.101 to 1.0.102 by @dependabot in #1125
- build(deps): bump serde_json from 1.0.102 to 1.0.103 by @dependabot in #1143
- Bump serde_stacker from 0.1.8 to 0.1.9 by @dependabot in #1110
- Bump regex from 1.8.4 to 1.9.0 by @dependabot in #1109
- build(deps): bump regex from 1.9.0 to 1.9.1 by @dependabot in #1119
- Bump jsonschema from 0.17.0 to 0.17.1 by @dependabot in #1108
- build(deps): bump cpc from 1.9.1 to 1.9.2 by @dependabot in #1121
- build(deps): bump governor from 0.5.1 to 0.6.0 by @dependabot in #1128
- build(deps): bump actions/setup-python from 4.6.1 to 4.7.0 by @dependabot in #1134
- build(deps): bump file-format from 0.17.3 to 0.18.0 by @dependabot in #1136
- build(deps): bump serde_stacker from 0.1.9 to 0.1.10 by @dependabot in #1141
- build(deps): bump semver from 1.0.17 to 1.0.18 by @dependabot in #1140
- cargo update bump several indirect dependencies
Fixed
- fmt: Quote ASCII format differently by @LemmingAvalanche in #1075
- apply: make dynfmt subcommand case-sensitive. Fixes #1126 #1130
- applydp: make dynfmt case-sensitive #1131
- describegpt: docs/Describegpt.md: typo 'a' --> 'an' by @rzmk in #1135
- tojsonl: support snappy-compressed input. Fixes #1133 #1145
- security.md: fix mailto text by @rzmk in #1079
New Contributors
- @LemmingAvalanche made their first contribution in #1075
Full Changelog: 0.108.0...0.109.0
0.108.0
Another big Quicksilver release with lots of new features and improvements!
The two Polars-powered commands, joinp and sqlp, have received significant attention. joinp now supports asof joins and the --try-parsedates option. sqlp now has several Parquet format options, along with a --low-memory option.
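An asof join matches each left row to the nearest right row by key instead of requiring an exact match. A minimal sketch of the "backward" strategy (the Polars default: take the last right key less than or equal to the left key), assuming the right input is already sorted by an integer key; `asof_backward` is a hypothetical helper, not joinp code:

```rust
/// Backward asof join on integer keys: each left key is paired with the value
/// of the last right row whose key is <= the left key, or None if there is none.
/// Assumes `right` is sorted by key in ascending order.
pub fn asof_backward<'a>(left: &[i64], right: &[(i64, &'a str)]) -> Vec<(i64, Option<&'a str>)> {
    left.iter()
        .map(|&lk| {
            // Number of right rows with key <= lk (binary search on sorted keys).
            let n = right.partition_point(|&(rk, _)| rk <= lk);
            let matched = if n == 0 { None } else { Some(right[n - 1].1) };
            (lk, matched)
        })
        .collect()
}
```

The sorted-input requirement is what allows a binary search per left row instead of a scan of the whole right side.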
Other new features include:
- A new cat rowskey --group option that emulates csvkit's csvstack command.
- SIMD-accelerated UTF-8 validation for the input command.
- A --field-separator option for the flatten command.
- The sniff command now uses the excellent file-format crate for mime-type detection on ALL platforms, not just Linux, as was the case when we were using the libmagic library.
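A csvstack-style concatenation can be sketched as: take the union of all input headers (in first-seen order), align every row to that union by column name, and prepend a column naming the source file. This is a simplified in-memory sketch that assumes rowskey aligns columns by header name; the `file` grouping column in the usage and the `stack_by_header` helper are hypothetical, not qsv's actual defaults or code:

```rust
/// One already-parsed input: (source name, header row, data rows).
pub type Input<'a> = (&'a str, Vec<&'a str>, Vec<Vec<&'a str>>);

/// Stack inputs whose headers may differ in order or membership.
/// Output headers are `group_col` followed by the union of all input headers.
pub fn stack_by_header(inputs: &[Input], group_col: &str) -> (Vec<String>, Vec<Vec<String>>) {
    // Union of headers, keeping first-seen order.
    let mut all_headers: Vec<&str> = Vec::new();
    for (_, headers, _) in inputs {
        for h in headers {
            if !all_headers.contains(h) {
                all_headers.push(*h);
            }
        }
    }
    let mut rows = Vec::new();
    for (source, headers, data) in inputs {
        for row in data {
            let mut out = vec![source.to_string()]; // group column first
            for col in &all_headers {
                // Look the column up by name; missing columns become empty fields.
                let value = headers
                    .iter()
                    .position(|h| h == col)
                    .and_then(|i| row.get(i))
                    .copied()
                    .unwrap_or("");
                out.push(value.to_string());
            }
            rows.push(out);
        }
    }
    let mut out_headers = vec![group_col.to_string()];
    out_headers.extend(all_headers.iter().map(|h| h.to_string()));
    (out_headers, rows)
}
```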
Also, QuickSilver now has optimized builds for Apple Silicon. These builds are created using native Apple Silicon self-hosted Action Runners, which means we can enable all qsv features without being constrained by cross-compilation limitations and GitHub's Action Runner's disk/memory constraints. Additionally, we compile Apple Silicon builds with M1/M2 chip optimizations enabled to maximize performance.
Finally, qsv startup should be noticeably faster, thanks to @vi's PR to avoid sysinfo::System::new_all.
Added
- joinp: added asof join & --try-parsedates option #1059
- cat: emulate csvkit's csvstack #1067
- input: SIMD-accelerated utf8 validation 88e1df2
- sniff: replace magic with file-format crate, enabling mime-type detection on all platforms #1069
- sqlp: add --low-memory option d95048e
- sqlp: added parquet format options c179cf4 a861ebf
- flatten: add --field-separator option #1068
- Apple Silicon binaries built on native Apple Silicon self-hosted Action Runners, enabling all features and optimized for M1/M2 chips
Changed
- input: minor improvements 62cff74
- joinp: align option names with join command #1058
- sqlp: minor improvements
- changed all GitHub action workflows to account for the new Apple Silicon builds
- Bump rust_decimal from 1.29.1 to 1.30.0 by @dependabot in #1049
- Bump serde_json from 1.0.96 to 1.0.97 by @dependabot in #1051
- Bump calamine from 0.21.0 to 0.21.1 by @dependabot in #1052
- Bump strum from 0.24.1 to 0.25.0 by @dependabot in #1055
- Bump actix-governor from 0.4.0 to 0.4.1 by @dependabot in #1060
- Bump csvs_convert from 0.8.5 to 0.8.6 by @dependabot in #1061
- Bump itertools from 0.10.5 to 0.11.0 by @dependabot in #1062
- Bump serde_json from 1.0.97 to 1.0.99 by @dependabot in #1065
- Bump indexmap from 1.9.3 to 2.0.0 by @dependabot in #1066
- Bump calamine from 0.21.1 to 0.21.2 by @dependabot in #1071
- cargo update bump various indirect dependencies
- pin Rust nightly to 2023-06-23
Fixed
Removed
- removed libmagic dependency from all GitHub action workflows
New Contributors
Full Changelog: 0.107.0...0.108.0
0.107.0
We continue to improve the new sqlp command. It now supports SQL scripts and additional options to fine-tune Polars CSV parsing and formatting behavior.
We also added an _all_generic special value for the rename command, which allows you to rename all columns in a CSV with generic names (e.g. _col_1, _col_2, _col_N). This was done to make it easier to prepare CSVs with no headers for use with sqlp.
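The naming scheme itself is simple: one 1-based positional name per column. A minimal sketch of just that scheme (a hypothetical helper, not rename's code):

```rust
/// Replace every header with a generic, 1-based positional name: _col_1 ... _col_N.
/// The original names are ignored; only the column count matters.
pub fn rename_all_generic(headers: &[&str]) -> Vec<String> {
    (1..=headers.len()).map(|i| format!("_col_{i}")).collect()
}
```

A side benefit of positional names is that they are always unique and contain only letters, digits and underscores, so they can be used as SQL identifiers without quoting even when the original header row was missing, empty or duplicated.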
This release also features a Windows MSI installer. This is a big step forward for qsv and we hope to make it easier for Windows users to install and use qsv. Thanks @minhajuddin2510 for all the work on pulling this together!
Added
- sqlp: added script support #1037
- sqlp: added CSV format options #1048
- rename: add "_all_generic" special value for headers #1031
Changed
- excel: now supports Duration type with calamine upgrade to 0.21.0 #1045
- Update publish-wix-installer.yml by @minhajuddin2510 in #1032
- Bump mlua from 0.9.0-beta.2 to 0.9.0-beta.3 by @dependabot in #1030
- Bump serde from 1.0.163 to 1.0.164 by @dependabot in #1029
- Bump csvs_convert from 0.8.4 to 0.8.5 by @dependabot in #1028
- Bump sysinfo from 0.29.1 to 0.29.2 by @dependabot in #1027
- Bump log from 0.4.18 to 0.4.19 by @dependabot in #1039
- Bump uuid from 1.3.3 to 1.3.4 by @dependabot in #1041
- Bump jql-runner from 6.0.8 to 6.0.9 by @dependabot in #1043
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-06-13
Fixed
- Remove redundant registries protocol by @icp1994 in #1034
- fix typo in tojsonl.rs (optionns -> options) by @rzmk in #1035
- Fix eula by @minhajuddin2510 in #1046
New Contributors
Full Changelog: 0.106.0...0.107.0
0.106.0
This release features the new Polars-powered sqlp command, which allows you to run SQL queries against CSVs.
Initial tests show that its performance is competitive with DuckDB and faster than DataFusion on identical SQL queries, and it just runs rings around pandas sql.
It converts Polars SQL (a subset of ANSI SQL) queries to multi-threaded LazyFrame expressions and then executes them. This is a very powerful feature and allows you to do things like joins, aggregations, group bys, etc. on larger-than-memory CSVs. The sqlp command is still experimental and we are looking for feedback on it. Please try it out and let us know what you think.
Added
- sqlp: new command to allow Polars SQL queries against CSVs #1015
Changed
- Bump csv from 1.2.1 to 1.2.2 by @dependabot in #1008
- Bump pyo3 from 0.18.3 to 0.19.0 by @dependabot in #1007
- workflow for creating msi for qsv by @minhajuddin2510 in #1009
- migrate from once_cell to std::sync::OnceLock #1010
- Bump qsv_docopt from 1.2.2 to 1.3.0 by @dependabot in #1011
- Bump self_update from 0.36.0 to 0.37.0 by @dependabot in #1014
- Bump indicatif from 0.17.4 to 0.17.5 by @dependabot in #1013
- Bump cached from 0.43.0 to 0.44.0 by @dependabot in #1012
- Bump url from 2.3.1 to 2.4.0 by @dependabot in #1016
- Wix changes by @minhajuddin2510 in #1017
- Bump actions/github-script from 5 to 6 by @dependabot in #1018
- Bump regex from 1.8.3 to 1.8.4 by @dependabot in #1019
- Bump hashbrown from 0.13.2 to 0.14.0 by @dependabot in #1020
- Bump tempfile from 3.5.0 to 3.6.0 by @dependabot in #1021
- Bump sysinfo from 0.29.0 to 0.29.1 by @dependabot in #1023
- Bump qsv-dateparser from 0.8.2 to 0.9.0 by @dependabot in #1022
- Bump qsv-sniffer from 0.9.3 to 0.9.4 by @dependabot in #1024
- Bump qsv-stats from 0.9.0 to 0.10.0 3803579
- Bump embedded luau from 0.577 to 0.579
- Bump data-encoding from 2.3.3 to 2.4.0 2285a12
- cargo update bump several indirect dependencies
- change MSRV to 1.70.0
- pin Rust nightly to 2023-06-06
Full Changelog: 0.105.1...0.106.0
0.105.1
All "unsafe" code has been removed. By selectively using asserts, we obviate the need to use explicit unchecked logic to skip unnecessary bounds checking.
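The technique can be illustrated with a small, hypothetical function (not code from stats, fetch or validate): one upfront assert establishes the bounds, after which ordinary safe indexing is used and the optimizer can often remove the now-redundant per-element checks. Whether a given check is actually elided depends on the compiler version and optimization level, so treat this as a pattern to verify with benchmarks or generated assembly, not a guarantee:

```rust
/// Sum of a[i] * b[i] over the first `n` elements of both slices.
pub fn dot_prefix(a: &[f64], b: &[f64], n: usize) -> f64 {
    // Fail fast with a clear panic if `n` is out of range. Having checked this
    // once, the optimizer can usually prove `i < a.len()` and `i < b.len()`
    // inside the loop and drop the per-iteration bounds checks, so no
    // `unsafe { get_unchecked(..) }` is needed.
    assert!(n <= a.len() && n <= b.len(), "n exceeds slice length");
    let mut sum = 0.0;
    for i in 0..n {
        sum += a[i] * b[i];
    }
    sum
}
```

The design benefit over unchecked indexing is that correctness never depends on the optimization: if a check is not elided, the code is slightly slower but still memory-safe.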
Changed
- stats: remove all unsafes 4a4c010
- fetch & fetchpost: remove unsafe 1826bb3
- validate: remove unsafe 742ccb3
- normalize --user-agent option across all of qsv feff90b & 839b3b7
- bump qsv-dateparser from 0.8.1 to 0.8.2 which also uses chrono 0.4.26
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-05-29
Fixed
- remove chrono pin to 0.4.24 and upgrade to 0.4.26 which fixed 0.4.25 CI test failures 7636d82
Full Changelog: 0.105.0...0.105.1
0.105.0
Added
- sniff: added --harvest-mode convenience option #997
- sniff: added --quick option on Linux e16df6f
- qsv (pronounced "Quicksilver") now has a tagline: "Hi ho, QuickSilver! Away!" d32aeb1
Changed
- sniff: if --no-infer is enabled when sniffing a snappy file, just return the snappy mime type #996
- sniff: now returns filesize and last-modified date in errors. 2162659
- stats: minor performance tweaks in hot compute loop f61198c
- qsv binary variants built using older glibc/musl libraries are now published with their respective glibc/musl version suffixes (glibc-2.31/musl-1.1.24) in the filename, instead of just the "older" suffix.
- pin chrono to 0.4.24 as the new 0.4.25 is breaking CI tests cde3623
- Bump calamine from 0.19.1 to 0.20.0 ec7e2df
- Bump actions/setup-python from 4.6.0 to 4.6.1 by @dependabot in #991
- Bump flexi_logger from 0.25.4 to 0.25.5 by @dependabot in #992
- Bump regex from 1.8.2 to 1.8.3 by @dependabot in #993
- Bump csvs_convert from 0.8.3 to 0.8.4 by @dependabot in #994
- Bump log from 0.4.17 to 0.4.18 by @dependabot in #998
- Bump polars from 0.29.0 to 0.30.0 by @dependabot in #999
- Bump tokio from 1.28.1 to 1.28.2 by @dependabot in #1000
- Bump once_cell from 1.17.1 to 1.17.2 by @dependabot in #1003
- Bump indicatif from 0.17.3 to 0.17.4 by @dependabot in #1001
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-05-28
Removed
- excel: removed kludgy --dates-whitelist option #1005
Fixed
- sniff: fix inconsistent mime type detection #995
Full Changelog: 0.104.1...0.105.0
0.104.1
Added
- added new publishing workflow to build binary variants using older glibc 2.31 instead of glibc 2.35 and musl 1.1.24 instead of musl 1.2.2. This will allow users running on older Linux distros (e.g. Debian, Ubuntu 20.04) to run qsv prebuilt binaries with "older" glibc/musl versions. 1a08b92
Changed
- sniff: improved usage text d2b32ac
- sniff: if sniffing a URL, and server does not return content-length or last-modified headers, set filesize and last-modified to "Unknown" d4a64ac
- frequency: use SIMD accelerated utf8 validation in hot loop 33406a1
- foreach: use simdutf8 validation df6b4f8
- apply: use simdutf8 validation in decode operation; also tweak it to avoid panics (however unlikely) adf7052
- update install & build instructions with magic
- Bump regex from 1.8.1 to 1.8.2 by @dependabot in #990
- Bump bumpalo from 3.12.2 to 3.13.0
- pin Rust nightly to 2023-05-22
Removed
- sniff: disabled --progressbar option on qsvdp binary variant 1a20edb
Fixed
- updated publishing workflows to properly enable magic feature (for sniff mime type detection) 136211f
Full Changelog: 0.104.0...0.104.1
0.104.0
Added
- sniff: add --no-infer option, only available on Linux. Using this option makes sniff work as a general mime type detector, retrieving detected mime type, file size (content-length when sniffing a URL), and last modified date. When sniffing a URL with --no-infer, it only sniffs the first downloaded chunk, making it very fast even for very large remote files. This option was designed to facilitate accelerated harvesting and broken/stale link checking on CKAN. #987
- excel: add canonical_filename to metadata #985
- snappy: now accepts url input #986
- sample: support url input #989
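Sniffing only the first downloaded chunk works because many formats identify themselves in their leading bytes. A minimal sketch under that assumption, using the stream identifier chunk defined by the Snappy framing format (chunk type 0xff, 3-byte little-endian length 6, then "sNaPpY"); the function name and the returned MIME strings are illustrative assumptions, not sniff's actual output:

```rust
/// Snappy framing format: a stream starts with a stream identifier chunk
/// (type 0xff, 3-byte little-endian length 6, then the ASCII bytes "sNaPpY").
const SNAPPY_STREAM_ID: [u8; 10] = [0xff, 0x06, 0x00, 0x00, b's', b'N', b'a', b'P', b'p', b'Y'];

/// Hypothetical sketch: classify a file from only its first downloaded chunk.
pub fn sniff_first_chunk(chunk: &[u8]) -> &'static str {
    if chunk.starts_with(&SNAPPY_STREAM_ID) {
        return "application/x-snappy-framed";
    }
    match std::str::from_utf8(chunk) {
        Ok(_) => "text/plain",
        // error_len() == None means the chunk merely ends mid-character,
        // which is expected when a download is cut at an arbitrary byte.
        Err(e) if e.error_len().is_none() => "text/plain",
        Err(_) => "application/octet-stream",
    }
}
```

Only a fixed-size prefix is inspected, so the cost is independent of the remote file's size.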
Changed
- Bump qsv-sniffer from 0.9.2 to 0.9.3 by @dependabot in #979
- Bump console from 0.15.5 to 0.15.6 by @dependabot in #980
- Bump jql-runner from 6.0.7 to 6.0.8 by @dependabot in #981
- Bump console from 0.15.6 to 0.15.7 by @dependabot in #988
- Bump embedded Luau from 0.576 to 0.577
- apply select clippy recommendations
- tweaked emojis used in the Available Commands legend: 🤯 now denotes memory-intensive commands that load the entire CSV into memory; 😣 denotes commands that need additional memory proportional to the cardinality of the columns being processed; 🌐 denotes commands that have web-aware options
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-05-21
Fixed
- excel: Handle ranges larger than the sheet by @bluepython508 in #984
Full Changelog: 0.103.1...0.104.0