Skip to content

Commit

Permalink
readme: wordsmith MSRV policy and Goals/Non-Goals
Browse files Browse the repository at this point in the history
MSRV
- qualify that we use the latest Rust _stable_ version supported by HomeBrew

Goals/Non-goals
- add Avro to list
- wordsmith reason for not using crypto-secure RNGs in certain commands
- change CSVs to Input Files for multi-lingual non-goal, as qsv does more than CSVs

[skip ci]
  • Loading branch information
jqnatividad committed Jan 5, 2024
1 parent 095b814 commit c35bbc5
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -337,7 +337,7 @@ See [Features](docs/FEATURES.md) for more info.

## Minimum Supported Rust Version

qsv's MSRV policy is to require the latest [Rust version](https://github.com/rust-lang/rust/blob/master/RELEASES.md) that is [supported by Homebrew](https://formulae.brew.sh/formula/rust#default), currently [![HomeBrew](https://img.shields.io/homebrew/v/rust?logo=homebrew)](https://formulae.brew.sh/formula/rust).
qsv's MSRV policy is to require the latest stable [Rust version](https://github.com/rust-lang/rust/blob/master/RELEASES.md) that is [supported by Homebrew](https://formulae.brew.sh/formula/rust#default), currently [![HomeBrew](https://img.shields.io/homebrew/v/rust?logo=homebrew)](https://formulae.brew.sh/formula/rust).
However, if the latest Rust stable has been released for more than a week and Homebrew has not yet updated its Rust formula, qsv will go ahead and require the latest Rust stable version.

## Goals / Non-Goals
Expand All @@ -347,22 +347,22 @@ QuickSilver's goals, in priority order, are to be:
* **Able to Process Very Large Files** - Most qsv commands are streaming, using constant memory, and can process arbitrarily large CSV files. For those commands that require loading the entire CSV into memory (denoted by 🤯), qsv has Out-of-Memory prevention, batch processing strategies and "ext"ernal commands that use the disk to process larger than memory files. See [Memory Management](docs/PERFORMANCE.md#memory-management) for more info.
* **A Complete Data-Wrangling Toolkit** - qsv aims to be a comprehensive data-wrangling toolkit that you can use for quick analysis and investigations, but is also robust enough for production data pipelines. Its many commands are targeted towards common data-wrangling tasks and can be combined/composed into complex data-wrangling scripts with its Luau-based DSL.
Luau will also serve as the backbone of a whole library of **qsv recipes** - reusable scripts for common tasks (e.g. street-level geocoding, removing PII, data enrichment, etc.) that prompt for easily modifiable parameters.
* **Composable/Interoperable** - qsv is designed to be composable, with a focus on interoperability with other common CLI tools like 'awk', 'xargs', 'ripgrep', 'sed', etc., and with well known ETL/ELT tools like Airbyte, Airflow, Pentaho Kettle, etc. Its commands can be combined with other tools via pipes, and it supports other common file formats like JSONL, Parquet, Arrow IPC, Excel, ODS, PostgreSQL, SQLite, etc. See [File Formats](#file-formats) for more info.
* **Composable/Interoperable** - qsv is designed to be composable, with a focus on interoperability with other common CLI tools like 'awk', 'xargs', 'ripgrep', 'sed', etc., and with well known ETL/ELT tools like Airbyte, Airflow, Pentaho Kettle, etc. Its commands can be combined with other tools via pipes, and it supports other common file formats like JSON/JSONL, Parquet, Arrow IPC, Avro, Excel, ODS, PostgreSQL, SQLite, etc. See [File Formats](#file-formats) for more info.
* **As Portable as Possible** - qsv is designed to be portable, with installers on several platforms with an integrated self-update mechanism. In preference order, it supports Linux, macOS and Windows. See [Installation Options](#installation-options) for more info.
* **As Easy to Use as Possible** - qsv is designed to be easy to use. As easy-to-use that is,
as command line interfaces go :shrug:. Its commands have numerous options but have sensible defaults if a user does not want to use options. The usage text is written for a data analyst audience, not developers; and there are numerous examples in the usage text, with the tests doubling as examples as well. In the future, it will also have a Terminal User Interface (TUI).
* **As Secure as Possible** - qsv is designed to be secure. It has no external runtime dependencies, is [written](https://aws.amazon.com/blogs/opensource/why-aws-loves-rust-and-how-wed-like-to-help/) [in](https://msrc.microsoft.com/blog/2019/07/why-rust-for-safe-systems-programming/) [Rust](https://opensource.googleblog.com/2023/06/rust-fact-vs-fiction-5-insights-from-googles-rust-journey-2022.html), and it's codebase is automatically audited for security vulnerabilities with automated [DevSkim](https://github.com/microsoft/DevSkim#devskim), ["cargo audit"](https://rustsec.org) and [Codacy](https://app.codacy.com/gh/jqnatividad/qsv/dashboard) Github Actions workflows.
It uses the latest stable Rust version, with an aggressive MSRV policy and the latest version of all its dependencies.
It has an extensive test suite with more than 1,250 tests, including several [property tests](https://medium.com/criteo-engineering/introduction-to-property-based-testing-f5236229d237) which [randomly generate](https://github.com/BurntSushi/quickcheck#quickcheck) parameters for oft-used commands. It also has a [Security Policy](SECURITY.md).
Its prebuilt binary archives are [zipsigned](https://github.com/Kijewski/zipsign#zipsign), so you can [verify their integrity](#verifying-the-integrity-of-the-prebuilt-binaries-zip-archives). Its self-update mechanism automatically verifies the integrity of the prebuilt binaries archive before applying an update.
However, it does not use cryptographically secure random number generators as the performance penalty is too high and qsv's `sort` & `sample` use cases do not require it.
(search for the codebase for _"[//DevSkim: ignore DS148264](https://github.com/search?q=repo%3Ajqnatividad%2Fqsv+%2F%2Fdevskim&type=code)"_ to find instances where qsv uses a non-cryptographically secure random number generator)
However, for performance reasons, it does not use cryptographically secure random number generators (RNGs) for the `sort` & `sample` commands, as their use cases do not require it.
(search for the codebase for _"[//DevSkim: ignore DS148264](https://github.com/search?q=repo%3Ajqnatividad%2Fqsv+%2F%2Fdevskim&type=code)"_ to find instances where qsv uses a non-cryptographically secure RNG)
* **As Easy to Contribute to as Possible** - qsv is designed to be easy to contribute to, with a focus on maintainability. It's architecture allows the easy addition of self-contained commands gated by feature flags, the source code is heavily commented, the usage text is embedded, and there are helper functions that make it easy to create tests. See [Features](docs/FEATURES.md) and [Contributing](CONTRIBUTING.md) for more info.

QuickSilver's non-goals are to be:
* **As Small as Possible** - qsv is designed to be small, but not at the expense of performance, features, composability, portability, usability, security or maintainability. However, we do have a `qsvlite` variant that is ~13% of the size of `qsv` and a `qsvdp` variant that is ~12% of the size of `qsv`. Those variants, however, have reduced functionality.
Further, several commands are gated behind feature flags, so you can compile qsv with only the features you need.
* **Multi-lingual** - qsv's _usage text_ and _messages_ are English-only. There are no plans to support other languages. This does not mean it can only process English CSVs.
* **Multi-lingual** - qsv's _usage text_ and _messages_ are English-only. There are no plans to support other languages. This does not mean it can only process English input files.
It can process well-formed CSVs in _any_ language so long as its UTF-8 encoded. Further, it supports alternate delimiters/separators other than comma; the `apply whatlang` operation detects 69 languages; and its `apply thousands, currency and eudex` operations supports different languages and country conventions for number, currency and date parsing/formatting.
Finally, though the default Geonames index of the `geocode` command is English-only, the index can be rebuilt with the `geocode index-update` subcommand with the `--languages` option to return place names in multiple languages ([with support for 253 languages](http://download.geonames.org/export/dump/alternatenames/)).

Expand Down

0 comments on commit c35bbc5

Please sign in to comment.