ChangeLog

Ben Murray edited this page Apr 30, 2021 · 3 revisions

v0.5.1 -> v0.5.2

BugFix: DataStore.get_spans returning None when passed Readers in legacy scripts. Functionality has been restored from v0.4

v0.5.0 -> v0.5.1

Feature: DataFrame.rename function added; allows renaming of one or more fields within a dataframe
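As a rough illustration of what renaming one or more fields involves, here is a minimal sketch that uses a plain dict as a stand-in for a dataframe. The `rename_fields` helper and its `mapping` argument are hypothetical and are not the ExeTera signature; the sketch only shows the two checks such an operation plausibly needs (unknown source names and clashing target names).

```python
from typing import Dict, Mapping


def rename_fields(fields: Dict[str, list], mapping: Mapping[str, str]) -> Dict[str, list]:
    """Return a new dict with keys renamed per `mapping`, keeping field order.

    Hypothetical stand-in: `fields` plays the role of a dataframe.
    """
    # Refuse to rename fields that do not exist.
    missing = [old for old in mapping if old not in fields]
    if missing:
        raise KeyError(f"unknown field(s): {missing}")

    renamed: Dict[str, list] = {}
    for name, values in fields.items():
        new_name = mapping.get(name, name)
        # Refuse renames that would leave two fields with the same name.
        if new_name in renamed:
            raise ValueError(f"rename would create duplicate field '{new_name}'")
        renamed[new_name] = values
    return renamed


frame = {"id": [1, 2], "wt": [70, 80], "ht": [1.7, 1.8]}
frame = rename_fields(frame, {"wt": "weight", "ht": "height"})
```

Building the result in a single pass means a swap such as `{"a": "b", "b": "a"}` works without a temporary name.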

v0.4 -> v0.5

Major changes to API
- Datasets & DataFrames introduced
- Rich API on Fields introduced
- Much functionality previously accessed through Session can now be accessed through Datasets, DataFrames and Fields
- See Basic Examples and Intermediate Examples for more details
Import improvements
- You can now specify include and exclude lists for fields in a table during import
  - This allows you to improve import performance and dataset size by excluding or only including the fields that you are interested in
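The idea behind include and exclude lists can be sketched with the standard `csv` module. This is not the ExeTera importer: the `select_columns` and `read_selected` helpers are hypothetical, and treating the two lists as mutually exclusive is an assumption made for the sketch.

```python
import csv
import io
from typing import Iterable, List, Optional


def select_columns(header: Iterable[str],
                   include: Optional[Iterable[str]] = None,
                   exclude: Optional[Iterable[str]] = None) -> List[str]:
    """Pick the columns to keep, preserving the order of the header."""
    if include is not None and exclude is not None:
        # Assumption for this sketch: only one of the two lists is given.
        raise ValueError("specify include or exclude, not both")
    if include is not None:
        wanted = set(include)
        return [name for name in header if name in wanted]
    if exclude is not None:
        unwanted = set(exclude)
        return [name for name in header if name not in unwanted]
    return list(header)


def read_selected(stream, include=None, exclude=None) -> List[dict]:
    """Read a csv stream, keeping only the selected columns of each row."""
    reader = csv.DictReader(stream)
    keep = select_columns(reader.fieldnames or [], include, exclude)
    return [{name: row[name] for name in keep} for row in reader]


data = io.StringIO("id,age,notes\n1,34,x\n2,51,y\n")
rows = read_selected(data, exclude=["notes"])
```

Dropping unwanted columns while reading, rather than after import, is what saves both time and space: the excluded values are never stored.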

v0.3.2 -> v0.4

Separation of all covid-specific functionality out to https://github.com/KCL-BMEIS/ExeTeraCovid.git
Removal of legacy csv pipeline code
Renaming of some of the ordered_merge_* functionality parameters for clarity
Addition of open/close/list/get_dataset functionality to Session
Made Session 'withable'
Improved performance of Session.get_spans
Bug fixes for Session API
- apply_spans / aggregation issues
Bug fixes for Field API
- provided __bool__ so that if field: works as expected
- provided single element read for IndexedStringField

v0.3.1 -> v0.3.2

Fixing issues with use of test_type_from_mechanism_v1
Adding ability to optionally import lsoa-based fields through add_imd script
Import now appends by default; to overwrite an existing dataset use -w / --overwrite
Moved schema files to resources
Adding separate lsoa schema for import

v0.3.0 -> v0.3.1

Major performance improvement to Session.get_spans

v0.2.7 -> v0.3.0

Renaming of hystore to ExeTera, the project's new name!
Renaming of the hystorex command to exetera
Removal of scripts that now belong in https://github.com/KCL-BMEIS/ExeTeraCovid.git
Addition of snapshot journaling and extremely large sort functionality
Removal of the legacy csv script functionality

v0.2.7 -> v0.2.7.3

Fix to covid_schema.json for numeric diet fields marked 'float' instead of 'float32'
Addition of --daily flag to enable / disable generation of daily assessments
Addition of

v0.2.6 -> v0.2.7

Addition of diet questionnaire schema
Reworking of arguments for hystorex import to support arbitrary numbers and names of csvs
Provision of highly-scalable merge functionality through ordered merge functions
- Fix for filtering of indexed string fields
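For readers unfamiliar with the term, an ordered merge joins two tables whose keys are already sorted by walking both once, which avoids building a lookup table over either side. The sketch below is a generic two-pointer inner merge in pure Python, not the ExeTera ordered merge functions; the name `ordered_inner_merge` and its return format are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def ordered_inner_merge(left_keys: Sequence, right_keys: Sequence) -> List[Tuple[int, int]]:
    """Return (left_index, right_index) pairs with equal keys.

    Both inputs must be sorted in ascending order. Duplicate keys on
    either side are handled.
    """
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < len(left_keys) and j < len(right_keys):
        if left_keys[i] < right_keys[j]:
            i += 1
        elif left_keys[i] > right_keys[j]:
            j += 1
        else:
            # Pair this left row with the whole run of equal right keys,
            # then advance only the left side so that a repeated left key
            # is matched against the same run again.
            k = j
            while k < len(right_keys) and right_keys[k] == left_keys[i]:
                pairs.append((i, k))
                k += 1
            i += 1
    return pairs


matches = ordered_inner_merge([1, 2, 2, 4], [2, 3, 4, 4])
```

Because each side is only ever read forwards, the same approach can be applied chunk by chunk to data that does not fit in memory, which is where the scalability comes from.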

v0.2.5 -> v0.2.6

Moving from DataSet to Session class offering cleaner syntax
Moving from Readers/Writers to Fields for cleaner syntax
Introduction of schema for import command
Consolidating commands
- h5import -> hystorex import
- h5process -> hystorex process

v0.2.3 -> v0.2.5

Please note: there was no version v0.2.4, due to a numbering error when updating the version number
Simplifications to the API

v0.2.2 -> v0.2.3

Data schema updated for 1.5.1

v0.2.1 -> v0.2.2

Fix: Split functionality had not been moved to bin/csvsplit as documented
Fix: Missing license headers added

v0.2.0 -> v0.2.1 - tag

Refactor: Created the DataStore class and moved processor api methods onto it as member functions
Refactor: Simplified the creation of Writers. This can now be done through get_writer on a DataStore instance
Fix: Writes to a hdf5 store can no longer be interrupted by interrupts, resulting in more stable hdf5 files
Fix: Fixed critical bug in process method that resulted in exceptions when running on fields with a length that isn't an exact multiple of the chunksize
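The chunk-size bug described above is a common one, so a small sketch of the safe pattern may help. This is not the ExeTera `process` method; `iter_chunks` and `process_in_chunks` are hypothetical helpers showing how the final, shorter chunk can be handled by clamping the end index to the field length.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def iter_chunks(length: int, chunksize: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) ranges covering [0, length).

    The last range is shorter when length is not a multiple of chunksize.
    """
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    for start in range(0, length, chunksize):
        # Clamp the end so the final chunk never runs past the data.
        yield start, min(start + chunksize, length)


def process_in_chunks(values: Sequence, chunksize: int,
                      fn: Callable[[Sequence], Sequence]) -> List:
    """Apply fn to each chunk in turn and concatenate the results."""
    out: List = []
    for start, end in iter_chunks(len(values), chunksize):
        out.extend(fn(values[start:end]))
    return out


# Ten values in chunks of four: the last chunk holds two values.
doubled = process_in_chunks(list(range(10)), 4, lambda chunk: [v * 2 for v in chunk])
```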

v0.1.9 -> v0.2.0

Added hdf5 import and process functionality

v0.1.8 -> v0.1.9

Feature: provision of the split.py script to split the dataset up into subsets of patients and their associated assessments
Fix: added treatments and other_symptoms to cleaned assessment file. These fields are concatenated during the merge step using csv-style delimiters and escapes
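To show what concatenating with csv-style delimiters and escapes can look like, here is a minimal sketch built on the standard `csv` module. It is not the code used in the merge step; the helper names and the choice of the default comma delimiter and double-quote escaping are assumptions.

```python
import csv
import io
from typing import List, Sequence


def join_csv_style(values: Sequence[str]) -> str:
    """Join several free-text values into one string using csv quoting."""
    buffer = io.StringIO()
    # lineterminator="" so the result is a single cell without a newline.
    csv.writer(buffer, lineterminator="").writerow(values)
    return buffer.getvalue()


def split_csv_style(text: str) -> List[str]:
    """Recover the original values from a string made by join_csv_style."""
    return next(csv.reader([text]))


merged = join_csv_style(["fever", "cough, dry", 'said "fine"'])
original = split_csv_style(merged)
```

Using csv quoting rather than a bare separator means that values containing commas or quotes survive the round trip unchanged.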

v0.1.7 -> v0.1.8

Fix: had_covid_test was not being patched up along with tested_covid_positive
Breaking change: output fields renamed
- The fixed-up had_covid_test is output as had_covid_test_clean
- The fixed-up tested_covid_positive is output as tested_covid_positive_clean
- had_covid_test and tested_covid_positive contain the un-fixed-up data (although rows may still be modified as a result of quantising assessments by day)

v0.1.6 -> v0.1.7

Fix: height_clean contained weight data and weight_clean contained height data. This had been the case since they were introduced in v0.1.5

v0.1.5 -> v0.1.6

Performance: reduced memory usage
Addition: provision of -ps flag for setting parsing schema

v0.1.4 -> v0.1.5

Fix: health_status was not being accumulated during the assessment compression phase of cleanup

v0.1.3 -> v0.1.4

Fix: added missing value rarely_left_the_house_but_visit_lots to level_of_isolation
Fix: added missing fields weight_clean, height_clean and bmi_clean

v0.1.2 -> v0.1.3

Fix: -po and -ao options now properly export patient and assessment csvs respectively

v0.1.1 -> v0.1.2

Fix: day no longer overwriting tested_covid_positive on assessment export
Fix: tested_covid_positive output as a label instead of a number

v0.1 -> v0.1.1

Change: Converted 'NA' to '' for csv export