Ruby, elasticsearch upgrades, and documentation #211

Merged
merged 13 commits into from
Mar 23, 2023
27 changes: 26 additions & 1 deletion CHANGELOG.md
@@ -25,21 +25,46 @@ Versioning](https://semver.org/spec/v2.0.0.html).
### Security
-->

## [Unreleased](https://github.com/CDRH/datura/compare/v0.2.0-beta...dev)
## [1.0.0](https://github.com/CDRH/datura/compare/v0.2.0-beta...dev)

### Added
- minor test for Datura::Helpers.date_standardize
- documentation for web scraping
- documentation for CsvToEs (transforming CSV files and posting to elasticsearch)
- documentation for adding new ingest formats to Datura
- byebug gem for debugging
- instructions for installing Javascript Runtime files for Saxon
- API schema version can be either 1.0 or 2.0 (which includes nested fields); 1.0 is used by default unless 2.0 is specified. Add the following to `public.yml` or `private.yml` in the data repo:
```
api_version: '2.0'
```
- schema validation with API version 2.0; invalidly constructed documents will not post
- authentication with Elasticsearch 8.5; add the following to `public.yml` or `private.yml` in the data repo:
```
es_user: username
es_password: ********
```
Member

Instructions like these to add / edit config files should be in the Migration section. Otherwise the Migration section looks very thorough 👍 I'll have to edit this file for merge conflicts after I updated the changelog file to fix up the last release though. I'll take care of it during review of the release/v1.0.0 branch which will have the merge conflict.

- field overrides for new fields in the new API schema
- functionality to transform EAD files and post them to Elasticsearch
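
The default described in the `api_version` entry above can be pictured with a minimal sketch. The `api_version` helper below is hypothetical and is not Datura's actual code; it only assumes that the config file parses to a hash with string keys.

```ruby
require "yaml"

# Hypothetical helper, not part of Datura: returns the API schema version
# from a parsed config hash, falling back to "1.0" when the key is absent.
def api_version(config)
  config.fetch("api_version", "1.0").to_s
end

# Config snippets in the same shape as the changelog's examples
with_version = YAML.safe_load("api_version: '2.0'\nes_user: username\n")
without_version = YAML.safe_load("es_user: username\n")

puts api_version(with_version)
puts api_version(without_version)
```

Quoting the version in YAML (`'2.0'`) keeps it a string rather than a float, which is why the sketch compares and returns strings.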

### Changed
- update ruby to 3.1.2
- date_standardize now relies on strftime instead of manual zero padding for month, day
- minor corrections to documentation
- XPath: "text" is now ingested as an array and will be displayed delimited by spaces
- refactored command line methods into elasticsearch library
- refactored and moved date_standardize and date_display helper methods
- Nokogiri methods `get_text` and `get_list` on TEI now return nil rather than empty strings or arrays if there are no matches
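
As a rough illustration of the `date_standardize` change, the sketch below contrasts manual zero padding with `strftime`. Both method names are hypothetical stand-ins, and the real helper's signature and edge-case handling may differ.

```ruby
require "date"

# Hypothetical stand-ins, not Datura's date_standardize.

# Manual approach: pad month and day by hand.
def manual_pad(year, month, day)
  "#{year}-#{month.to_s.rjust(2, '0')}-#{day.to_s.rjust(2, '0')}"
end

# strftime approach: build a Date and let the format string do the padding.
def strftime_pad(year, month, day)
  Date.new(year, month, day).strftime("%Y-%m-%d")
end

puts manual_pad(2023, 3, 5)
puts strftime_pad(2023, 3, 5)
```

One difference worth noting: `Date.new` raises on an impossible date such as February 30, whereas the manual version would format it without complaint.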

### Migration
- check that the "text" XPath still produces the desired output now that it is ingested as an array
- use Elasticsearch 8.5 or higher and add authentication as described above if security is enabled
- upgrade data repos to Ruby 3.1.2
- add API version to config as described above
- make sure fields are consistent with the API schema; many have been renamed or changed in format
- add nil checks where the `get_text` and `get_list` methods are called
- add EadToES overrides if ingesting EAD files
- if overriding the `read_csv` method in `lib/datura/file_type.rb`, the hash must be prefixed with ** (`**{}`).
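
Two of the migration steps above, the nil checks and the `**` prefix, might look something like the following. This is a sketch under stated assumptions: `display_title`, `list_values`, and `parse_rows` are hypothetical names that stand in for override code and for a keyword-accepting reader, and none of them is Datura's API.

```ruby
# Hypothetical override helpers, not Datura code.

# get_text may now return nil, so use safe navigation before calling
# String methods. The fallback label is an assumption for illustration.
def display_title(text)
  text&.strip || "(no title)"
end

# get_list may now return nil; Array(nil) is [], so iteration stays safe.
def list_values(list)
  Array(list).map(&:strip)
end

# Stand-in for a reader that accepts keyword options.
def parse_rows(text, col_sep: ",", headers: false)
  rows = text.lines.map { |line| line.chomp.split(col_sep) }
  headers ? rows.drop(1) : rows
end

# Ruby 3 separates positional and keyword arguments, so an options hash
# has to be splatted with ** to be received as keywords.
options = { col_sep: ";", headers: true }
rows = parse_rows("id;title\n1;Letter\n", **options)

puts display_title(nil)
puts list_values(nil).inspect
puts rows.inspect
```

Under Ruby 3, passing `options` without `**` to a method like `parse_rows` raises an `ArgumentError`, which is the failure the last migration item guards against.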

## [v0.2.0-beta](https://github.com/CDRH/datura/compare/v0.1.6...v0.2.0-beta) - 2020-08-17 - Altering field and xpath behavior, adds get_elements
