0.118.0

@jqnatividad jqnatividad released this 27 Oct 13:24
bd23d3f

Highlights:

  • With the Polars upgrade to 0.34.2, the sqlp and joinp commands gain expanded capabilities and a noticeable performance boost. 🦄🏇
  • We now publish the 500, 1000, 5000 and 15000 Geonames cities indices for the geocode command, with users able to easily switch indices with the index-load subcommand. As the name implies, the 500 index contains cities with populations of 500 or more, the 1000 index contains cities with populations of 1000 or more, and so on.
    The 15000 index (default) is the smallest (13 MB) and fastest, with ~26k cities. The 500 index is the largest (56 MB) and slowest, with ~200k cities. The 5000 index is 21 MB with ~53k cities. The 1000 index is 44 MB with ~140k cities. 🎠
  • The geocode command now returns US Census FIPS codes for US places with the %json and %pretty-json formats, returning both US State and US County FIPS codes, with upcoming support for Cities and other US Census geographies (School Districts, Voting Districts, Congressional Districts, etc.) 🎠
  • Improved performance for stats, schema and tojsonl commands with the stats cache bincode refactor. This is especially noticeable for large CSV files as stats previously created large bincode cache files by default.
    The bincode cache allows other commands (currently, only schema and tojsonl) to skip recomputing statistics and deserialize the saved stats data structures directly into memory. Now, stats only creates a bincode file if the --stats-binout option is specified (typically, before using the schema and tojsonl commands). stats still creates a stats CSV cache file by default, but it is much smaller than the bincode file and, unlike the bincode cache, is universally applicable. 🏇
  • self-update will now verify updates. This is done by verifying the zipsign signature of the release zip archive before applying it. This should make it harder for malicious actors to compromise the self-update process. Version 0.118.0 has the verification code, and future releases will use this new verification process.
    Regardless, we will zipsign all zip archives starting with this release.
    Users can manually verify the signatures by downloading the zipsign public key and running the zipsign command line tool. See Verifying the Integrity of the Prebuilt Binaries Zip Archive for more info. 🦄
  • The frequency command now supports the --ignore-case option for case-insensitive frequency counts. 🦄🎠
  • The schema command can now compile case-insensitive enum constraints. 🦄
  • Improved performance for apply and applydp commands with faster compile-time perfect hash functions for operations lookups. 🏇
  • Several minor performance improvements and bug fixes with snappy, sniff & cat commands. 🏇
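
To make the size/coverage trade-off between the geocode indices concrete, here is a minimal Python sketch. The figures are the ones quoted in the highlights above; the helper name smallest_index_covering is hypothetical and is not part of qsv.

```python
# Index figures as quoted in these release notes (sizes in MB, city counts approximate).
GEONAMES_INDICES = {
    500: {"size_mb": 56, "approx_cities": 200_000},
    1000: {"size_mb": 44, "approx_cities": 140_000},
    5000: {"size_mb": 21, "approx_cities": 53_000},
    15000: {"size_mb": 13, "approx_cities": 26_000},  # default index
}


def smallest_index_covering(min_population: int) -> int:
    """Hypothetical helper: pick the smallest (fastest) index that still
    contains cities with `min_population` residents.

    An index with threshold T holds cities with populations of T or more,
    so any threshold <= min_population covers the city. The highest such
    threshold corresponds to the smallest index file.
    """
    candidates = [t for t in GEONAMES_INDICES if t <= min_population]
    if not candidates:
        raise ValueError("no published index includes cities that small")
    return max(candidates)


# A town of about 2,000 people needs the 1000 index; the default 15000 index would miss it.
needed = smallest_index_covering(2_000)
print(needed, GEONAMES_INDICES[needed])
```

The chosen threshold would then be the value handed to the index-load subcommand described above.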

Added

  • frequency: added --ignore-case option #1386
  • geocode: added 500, 1000, 5000, 15000 Geonames cities convenience shortcuts to index subcommands bd9f4c3
  • schema: added --ignore-case option when compiling enum constraints; replaced HashSet with faster AHashSet a16a1ca
  • snappy: added buf_size param to compress helper fn e0c0d1f
  • sniff: added --just-mime option #1372
  • added zipsign signature verification to self-update #1389
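
As a rough illustration of what a case-insensitive frequency count means, here is a minimal Python sketch. It is not qsv's implementation: the frequency function below is hypothetical, and folding values to lowercase before counting is an assumption about how the case-insensitive grouping is reported.

```python
from collections import Counter


def frequency(values, ignore_case=False):
    """Hypothetical stand-in for a frequency table over one column.

    With ignore_case=True, values are folded to lowercase before counting
    (an assumption), so "Yes", "YES" and "yes" land in the same bucket.
    """
    if ignore_case:
        values = (v.lower() for v in values)
    # most_common() sorts by descending count.
    return Counter(values).most_common()


column = ["Yes", "yes", "NO", "no", "YES"]
print(frequency(column))                    # five distinct values, one occurrence each
print(frequency(column, ignore_case=True))  # two buckets after case folding
```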

Changed

  • apply & applydp: replaced binary_search with faster compile-time perfect hash functions for operations lookups #1371
  • stats, schema and tojsonl: stats cache bincode refactor #1377
  • luau: replaced sanitise-file-name with more popular sanitize-filename crate 8927cb7
  • cat: minor optimization by preallocating with capacity c13c341
  • sqlp & joinp: expanded speed/functionality with upgrade to Polars 0.34.2 #1385
  • tojsonl: improved boolean inference. It now correctly infers booleans even when the enum domain has more than 2 values, as long as its case-insensitive cardinality is 2 6345f2d
  • build(deps): bump strum_macros from 0.25.2 to 0.25.3 by @dependabot in #1368
  • build(deps): bump regex from 1.10.1 to 1.10.2 by @dependabot in #1369
  • build(deps): bump uuid from 1.4.1 to 1.5.0 by @dependabot in #1373
  • build(deps): bump hashbrown from 0.14.1 to 0.14.2 by @dependabot in #1376
  • build(deps): bump self_update from 0.38.0 to 0.39.0 by @dependabot in #1378
  • build(deps): bump ahash from 0.8.5 to 0.8.6 by @dependabot in #1383
  • build(deps): bump serde from 1.0.189 to 1.0.190 by @dependabot in #1388
  • build(deps): bump futures from 0.3.28 to 0.3.29 by @dependabot in #1390
  • build(deps): bump futures-util from 0.3.28 to 0.3.29 by @dependabot in #1391
  • build(deps): bump tempfile from 3.8.0 to 3.8.1 by @dependabot in 4f6200c
  • apply select clippy suggestions
  • update several indirect dependencies
  • pin Rust nightly to 2023-10-26
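
The tojsonl change above hinges on counting distinct values case-insensitively. The Python sketch below shows only that cardinality step, under the assumption that values are compared after lowercasing; the function name has_boolean_cardinality is hypothetical, and any further checks the real command applies to the values themselves are not shown.

```python
def has_boolean_cardinality(distinct_values):
    """Hypothetical check: does a column's enum domain collapse to exactly
    two values once case is ignored?"""
    folded = {v.lower() for v in distinct_values}
    return len(folded) == 2


# The raw domain has four entries, but only two once case is ignored.
print(has_boolean_cardinality(["True", "true", "FALSE", "false"]))
# Three genuinely different values remain three after folding.
print(has_boolean_cardinality(["yes", "no", "maybe"]))
```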

Fixed

  • dedup: fixed --ignore-case option not being honored during the internal sort #1387
  • applydp: fixed usage text that referred to apply instead of applydp c47ba86
  • geocode: fixed index-update not honoring --timeout parameter 3272a9e
  • geocode: fixed index-load to work properly with convenience shortcuts 5097326
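
To see why the dedup fix matters, consider a deduplicator that sorts rows and then drops adjacent duplicates. The Python sketch below is a conceptual model only, not qsv's code: dedup_adjacent is a hypothetical helper, and the sort-then-compare-neighbours design is an assumption based on the "internal sort" wording. If the sort is case-sensitive while the comparison ignores case, matching rows may never end up next to each other.

```python
def dedup_adjacent(rows, ignore_case):
    """Hypothetical helper: drop a row when it equals the previously kept
    row, optionally ignoring case."""
    def norm(s):
        return s.lower() if ignore_case else s

    kept = []
    for row in rows:
        if not kept or norm(kept[-1]) != norm(row):
            kept.append(row)
    return kept


rows = ["b", "A", "B", "a"]

# Case-sensitive sort: uppercase sorts before lowercase, so "A" and "a"
# are separated by "B" and are never compared with each other.
buggy = dedup_adjacent(sorted(rows), ignore_case=True)

# Case-insensitive sort: equal-ignoring-case rows become neighbours.
fixed = dedup_adjacent(sorted(rows, key=str.lower), ignore_case=True)

print(buggy)  # ['A', 'B', 'a', 'b']
print(fixed)  # ['A', 'b']
```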

Full Changelog: 0.117.0...0.118.0