Commit

Merge pull request #1857 from jqnatividad/stats-add_sem_cv
`stats`: add Standard Error of the Mean (SEM) & Coefficient of Variation (CV)
jqnatividad authored Jun 2, 2024
2 parents 740ef96 + dc7ad31 commit 24c060e
Showing 8 changed files with 187 additions and 168 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -77,7 +77,7 @@
| [sortcheck](/src/cmd/sortcheck.rs#L2)<br>📇 | Check if a CSV is sorted. With the --json options, also retrieve record count, sort breaks & duplicate count. |
| [split](/src/cmd/split.rs#L2)<br>📇🏎️ | Split one CSV file into many CSV files. It can split by number of rows, number of chunks or file size. Uses multithreading to go faster if an index is present when splitting by rows or chunks. |
| [sqlp](/src/cmd/sqlp.rs#L2)<br>✨🚀🐻‍❄️🗄️ | Run [Polars](https://pola.rs) SQL queries against several CSVs - converting queries to blazing-fast [LazyFrame](https://docs.pola.rs/user-guide/lazy/using/) expressions, processing larger than memory CSV files. |
| [stats](/src/cmd/stats.rs#L2)<br>📇🤯🏎️ | Compute [summary statistics](https://en.wikipedia.org/wiki/Summary_statistics) (sum, min/max/range, min/max length, mean, stddev, variance, nullcount, max precision, sparsity, quartiles, IQR, lower/upper fences, skewness, median, mode/s, antimode/s & cardinality) & make GUARANTEED data type inferences (Null, String, Float, Integer, Date, DateTime, Boolean) for each column in a CSV.<br>Uses multithreading to go faster if an index is present (with an index, can compile "streaming" stats on NYC's 311 data (15gb, 28m rows) in less than 7.3 seconds!). |
| [stats](/src/cmd/stats.rs#L2)<br>📇🤯🏎️ | Compute [summary statistics](https://en.wikipedia.org/wiki/Summary_statistics) (sum, min/max/range, min/max length, mean, SEM, stddev, variance, CV, nullcount, max precision, sparsity, quartiles, IQR, lower/upper fences, skewness, median, mode/s, antimode/s & cardinality) & make GUARANTEED data type inferences (Null, String, Float, Integer, Date, DateTime, Boolean) for each column in a CSV.<br>Uses multithreading to go faster if an index is present (with an index, can compile "streaming" stats on NYC's 311 data (15gb, 28m rows) in less than 7.3 seconds!). |
| [table](/src/cmd/table.rs#L2)<br>🤯 | Show aligned output of a CSV using [elastic tabstops](https://github.com/BurntSushi/tabwriter). To interactively view CSV files, qsv pairs well with [csvlens](https://github.com/YS-L/csvlens#csvlens). |
| [to](/src/cmd/to.rs#L2)<br>✨🚀🗄️ | Convert CSV files to [PostgreSQL](https://www.postgresql.org), [SQLite](https://www.sqlite.org/index.html), XLSX, [Parquet](https://parquet.apache.org) and [Data Package](https://datahub.io/docs/data-packages/tabular). |
| [tojsonl](/src/cmd/tojsonl.rs#L3)<br>📇😣🚀🔣 | Smartly converts CSV to a newline-delimited JSON ([JSONL](https://jsonlines.org/)/[NDJSON](http://ndjson.org/)). By scanning the CSV first, it "smartly" infers the appropriate JSON data type for each column. See `jsonl` command to convert JSONL to CSV. |
62 changes: 31 additions & 31 deletions resources/test/boston311-10-boolean-1or0-stats.csv
@@ -1,31 +1,31 @@
field,type,is_ascii,sum,min,max,range,min_length,max_length,mean,stddev,variance,nullcount,max_precision,sparsity,cardinality
case_enquiry_id,Integer,,1010041354742,101004113298,101004155594,42296,12,12,101004135474.2,14747.2697,217481962.3498,0,,0,10
open_dt,String,true,,2022-01-01 00:16:00,2022-01-31 11:46:00,,19,19,,,,0,,0,10
target_dt,String,true,,2022-01-11 08:30:00,2022-05-20 13:03:21,,0,19,,,,4,,0.4,6
closed_dt,String,true,,2022-01-09 06:43:06,2022-01-20 08:45:12,,0,19,,,,5,,0.5,6
ontime,String,true,,ONTIME,OVERDUE,,6,7,,,,0,,0,2
case_status,String,true,,Closed,Open,,4,6,,,,0,,0,2
case_status_boolean,Boolean,,5,0,1,1,1,1,0.5,0.5,0.25,0,,0,2
closure_reason,String,true,, ,Case Closed. Closed date : Wed Jan 19 11:42:16 EST 2022 Resolved Removed df ,,1,82,,,,0,,0,6
case_title,String,true,,BTDT: Complaint,Sidewalk Cover / Manhole,,13,57,,,,0,,0,8
subject,String,true,,Boston Police Department,Public Works Department,,21,31,,,,0,,0,5
reason,String,true,,Administrative & General Requests,Street Cleaning,,7,33,,,,0,,0,7
type,String,true,,CE Collection,Unsatisfactory Utilities - Electrical Plumbing,,13,47,,,,0,,0,8
queue,String,true,,BTDT_Parking Enforcement,PWDx_Snow Cases,,15,46,,,,0,,0,7
department,String,true,,BTDT,PWDx,,3,4,,,,0,,0,5
submittedphoto,NULL,,,,,,0,0,,,,10,,1,1
closedphoto,NULL,,,,,,0,0,,,,10,,1,1
location,String,true,, ,850 South St Roslindale MA 02131,,1,40,,,,0,,0,10
fire_district,String,true,, ,9,,1,1,,,,0,,0,4
pwd_district,String,true,, ,1C,,1,2,,,,0,,0,6
city_council_district,String,true,, ,8,,1,1,,,,0,,0,6
police_district,String,true,, ,E5,,1,3,,,,0,,0,6
neighborhood,String,true,, ,South End,,1,13,,,,0,,0,8
neighborhood_services_district,String,true,, ,6,,1,2,,,,0,,0,7
ward,String,true,, ,Ward 9,,1,7,,,,0,,0,8
precinct,String,true,, ,2004,,1,4,,,,0,,0,9
location_street_name,String,true,,12 Derne St,850 South St,,0,20,,,,1,,0.1,10
location_zipcode,String,true,,02113,02131,,0,5,,,,1,,0.1,8
latitude,Float,,423.4656,42.2884,42.3735,0.0851,7,7,42.3466,0.0252,0.0006,0,4,0,9
longitude,Float,,-710.782,-71.133,-71.0566,0.0764,6,8,-71.0782,0.0246,0.0006,0,4,0,10
source,String,true,,City Worker App,Constituent Call,,15,16,,,,0,,0,2
field,type,is_ascii,sum,min,max,range,min_length,max_length,mean,sem,stddev,variance,cv,nullcount,max_precision,sparsity,cardinality
case_enquiry_id,Integer,,1010041354742,101004113298,101004155594,42296,12,12,101004135474.2,4663.4961,14747.2697,217481962.3498,0,0,,0,10
open_dt,String,true,,2022-01-01 00:16:00,2022-01-31 11:46:00,,19,19,,,,,,0,,0,10
target_dt,String,true,,2022-01-11 08:30:00,2022-05-20 13:03:21,,0,19,,,,,,4,,0.4,6
closed_dt,String,true,,2022-01-09 06:43:06,2022-01-20 08:45:12,,0,19,,,,,,5,,0.5,6
ontime,String,true,,ONTIME,OVERDUE,,6,7,,,,,,0,,0,2
case_status,String,true,,Closed,Open,,4,6,,,,,,0,,0,2
case_status_boolean,Boolean,,5,0,1,1,1,1,0.5,0.1581,0.5,0.25,100,0,,0,2
closure_reason,String,true,, ,Case Closed. Closed date : Wed Jan 19 11:42:16 EST 2022 Resolved Removed df ,,1,82,,,,,,0,,0,6
case_title,String,true,,BTDT: Complaint,Sidewalk Cover / Manhole,,13,57,,,,,,0,,0,8
subject,String,true,,Boston Police Department,Public Works Department,,21,31,,,,,,0,,0,5
reason,String,true,,Administrative & General Requests,Street Cleaning,,7,33,,,,,,0,,0,7
type,String,true,,CE Collection,Unsatisfactory Utilities - Electrical Plumbing,,13,47,,,,,,0,,0,8
queue,String,true,,BTDT_Parking Enforcement,PWDx_Snow Cases,,15,46,,,,,,0,,0,7
department,String,true,,BTDT,PWDx,,3,4,,,,,,0,,0,5
submittedphoto,NULL,,,,,,0,0,,,,,,10,,1,1
closedphoto,NULL,,,,,,0,0,,,,,,10,,1,1
location,String,true,, ,850 South St Roslindale MA 02131,,1,40,,,,,,0,,0,10
fire_district,String,true,, ,9,,1,1,,,,,,0,,0,4
pwd_district,String,true,, ,1C,,1,2,,,,,,0,,0,6
city_council_district,String,true,, ,8,,1,1,,,,,,0,,0,6
police_district,String,true,, ,E5,,1,3,,,,,,0,,0,6
neighborhood,String,true,, ,South End,,1,13,,,,,,0,,0,8
neighborhood_services_district,String,true,, ,6,,1,2,,,,,,0,,0,7
ward,String,true,, ,Ward 9,,1,7,,,,,,0,,0,8
precinct,String,true,, ,2004,,1,4,,,,,,0,,0,9
location_street_name,String,true,,12 Derne St,850 South St,,0,20,,,,,,1,,0.1,10
location_zipcode,String,true,,02113,02131,,0,5,,,,,,1,,0.1,8
latitude,Float,,423.4656,42.2884,42.3735,0.0851,7,7,42.3466,0.008,0.0252,0.0006,0.0595,0,4,0,9
longitude,Float,,-710.782,-71.133,-71.0566,0.0764,6,8,-71.0782,0.0078,0.0246,0.0006,-0.0346,0,4,0,10
source,String,true,,City Worker App,Constituent Call,,15,16,,,,,,0,,0,2
62 changes: 31 additions & 31 deletions resources/test/boston311-10-boolean-tf-stats.csv
@@ -1,31 +1,31 @@
field,type,is_ascii,sum,min,max,range,min_length,max_length,mean,stddev,variance,nullcount,max_precision,sparsity,cardinality
case_enquiry_id,Integer,,1010041354742,101004113298,101004155594,42296,12,12,101004135474.2,14747.2697,217481962.3498,0,,0,10
open_dt,String,true,,2022-01-01 00:16:00,2022-01-31 11:46:00,,19,19,,,,0,,0,10
target_dt,String,true,,2022-01-11 08:30:00,2022-05-20 13:03:21,,0,19,,,,4,,0.4,6
closed_dt,String,true,,2022-01-09 06:43:06,2022-01-20 08:45:12,,0,19,,,,5,,0.5,6
ontime,String,true,,ONTIME,OVERDUE,,6,7,,,,0,,0,2
case_status,String,true,,Closed,Open,,4,6,,,,0,,0,2
case_status_boolean,Boolean,true,,False,True,,4,5,,,,0,,0,2
closure_reason,String,true,, ,Case Closed. Closed date : Wed Jan 19 11:42:16 EST 2022 Resolved Removed df ,,1,82,,,,0,,0,6
case_title,String,true,,BTDT: Complaint,Sidewalk Cover / Manhole,,13,57,,,,0,,0,8
subject,String,true,,Boston Police Department,Public Works Department,,21,31,,,,0,,0,5
reason,String,true,,Administrative & General Requests,Street Cleaning,,7,33,,,,0,,0,7
type,String,true,,CE Collection,Unsatisfactory Utilities - Electrical Plumbing,,13,47,,,,0,,0,8
queue,String,true,,BTDT_Parking Enforcement,PWDx_Snow Cases,,15,46,,,,0,,0,7
department,String,true,,BTDT,PWDx,,3,4,,,,0,,0,5
submittedphoto,NULL,,,,,,0,0,,,,10,,1,1
closedphoto,NULL,,,,,,0,0,,,,10,,1,1
location,String,true,, ,850 South St Roslindale MA 02131,,1,40,,,,0,,0,10
fire_district,String,true,, ,9,,1,1,,,,0,,0,4
pwd_district,String,true,, ,1C,,1,2,,,,0,,0,6
city_council_district,String,true,, ,8,,1,1,,,,0,,0,6
police_district,String,true,, ,E5,,1,3,,,,0,,0,6
neighborhood,String,true,, ,South End,,1,13,,,,0,,0,8
neighborhood_services_district,String,true,, ,6,,1,2,,,,0,,0,7
ward,String,true,, ,Ward 9,,1,7,,,,0,,0,8
precinct,String,true,, ,2004,,1,4,,,,0,,0,9
location_street_name,String,true,,12 Derne St,850 South St,,0,20,,,,1,,0.1,10
location_zipcode,String,true,,02113,02131,,0,5,,,,1,,0.1,8
latitude,Float,,423.4656,42.2884,42.3735,0.0851,7,7,42.3466,0.0252,0.0006,0,4,0,9
longitude,Float,,-710.782,-71.133,-71.0566,0.0764,6,8,-71.0782,0.0246,0.0006,0,4,0,10
source,String,true,,City Worker App,Constituent Call,,15,16,,,,0,,0,2
field,type,is_ascii,sum,min,max,range,min_length,max_length,mean,sem,stddev,variance,cv,nullcount,max_precision,sparsity,cardinality
case_enquiry_id,Integer,,1010041354742,101004113298,101004155594,42296,12,12,101004135474.2,4663.4961,14747.2697,217481962.3498,0,0,,0,10
open_dt,String,true,,2022-01-01 00:16:00,2022-01-31 11:46:00,,19,19,,,,,,0,,0,10
target_dt,String,true,,2022-01-11 08:30:00,2022-05-20 13:03:21,,0,19,,,,,,4,,0.4,6
closed_dt,String,true,,2022-01-09 06:43:06,2022-01-20 08:45:12,,0,19,,,,,,5,,0.5,6
ontime,String,true,,ONTIME,OVERDUE,,6,7,,,,,,0,,0,2
case_status,String,true,,Closed,Open,,4,6,,,,,,0,,0,2
case_status_boolean,Boolean,true,,False,True,,4,5,,,,,,0,,0,2
closure_reason,String,true,, ,Case Closed. Closed date : Wed Jan 19 11:42:16 EST 2022 Resolved Removed df ,,1,82,,,,,,0,,0,6
case_title,String,true,,BTDT: Complaint,Sidewalk Cover / Manhole,,13,57,,,,,,0,,0,8
subject,String,true,,Boston Police Department,Public Works Department,,21,31,,,,,,0,,0,5
reason,String,true,,Administrative & General Requests,Street Cleaning,,7,33,,,,,,0,,0,7
type,String,true,,CE Collection,Unsatisfactory Utilities - Electrical Plumbing,,13,47,,,,,,0,,0,8
queue,String,true,,BTDT_Parking Enforcement,PWDx_Snow Cases,,15,46,,,,,,0,,0,7
department,String,true,,BTDT,PWDx,,3,4,,,,,,0,,0,5
submittedphoto,NULL,,,,,,0,0,,,,,,10,,1,1
closedphoto,NULL,,,,,,0,0,,,,,,10,,1,1
location,String,true,, ,850 South St Roslindale MA 02131,,1,40,,,,,,0,,0,10
fire_district,String,true,, ,9,,1,1,,,,,,0,,0,4
pwd_district,String,true,, ,1C,,1,2,,,,,,0,,0,6
city_council_district,String,true,, ,8,,1,1,,,,,,0,,0,6
police_district,String,true,, ,E5,,1,3,,,,,,0,,0,6
neighborhood,String,true,, ,South End,,1,13,,,,,,0,,0,8
neighborhood_services_district,String,true,, ,6,,1,2,,,,,,0,,0,7
ward,String,true,, ,Ward 9,,1,7,,,,,,0,,0,8
precinct,String,true,, ,2004,,1,4,,,,,,0,,0,9
location_street_name,String,true,,12 Derne St,850 South St,,0,20,,,,,,1,,0.1,10
location_zipcode,String,true,,02113,02131,,0,5,,,,,,1,,0.1,8
latitude,Float,,423.4656,42.2884,42.3735,0.0851,7,7,42.3466,0.008,0.0252,0.0006,0.0595,0,4,0,9
longitude,Float,,-710.782,-71.133,-71.0566,0.0764,6,8,-71.0782,0.0078,0.0246,0.0006,-0.0346,0,4,0,10
source,String,true,,City Worker App,Constituent Call,,15,16,,,,,,0,,0,2
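
The new `sem` and `cv` columns in the fixtures above appear consistent with the usual definitions: SEM = stddev / sqrt(n), and CV = stddev / mean expressed as a percentage. The sketch below is a minimal Python illustration of those formulas, not the code in `src/cmd/stats.rs`. The function names, the `None` return for undefined cases, and the choice of n = 10 (the row count suggested by the fixture name) are assumptions made for this example.

```python
import math
from typing import Optional


def sem(stddev: float, n: int) -> Optional[float]:
    """Standard error of the mean: stddev / sqrt(n).

    Returns None when there are no observations (an assumption of this sketch).
    """
    if n <= 0:
        return None
    return stddev / math.sqrt(n)


def cv(stddev: float, mean: float) -> Optional[float]:
    """Coefficient of variation as a percentage: (stddev / mean) * 100.

    Returns None when the mean is zero, where the ratio is undefined
    (an assumption of this sketch). A negative mean gives a negative CV.
    """
    if mean == 0:
        return None
    return stddev / mean * 100


if __name__ == "__main__":
    # Inputs are the rounded mean/stddev values shown in the fixture rows.
    print(round(sem(0.5, 10), 4))          # case_status_boolean sem -> 0.1581
    print(round(cv(0.5, 0.5), 4))          # case_status_boolean cv  -> 100.0
    print(round(cv(0.0246, -71.0782), 4))  # longitude cv            -> -0.0346
```

Because the inputs here are the already-rounded fixture values, results for other rows may differ from the fixture in the last decimal place. The negative `cv` for `longitude` follows directly from its negative mean.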