
Merge pull request #211 from CDRH/ruby3upgrade
Ruby, elasticsearch upgrades, and documentation
techgique authored Mar 23, 2023
2 parents 8a99138 + 5b0f54f commit f286ab4
Showing 14 changed files with 214 additions and 15 deletions.
2 changes: 1 addition & 1 deletion .ruby-version
@@ -1 +1 @@
-2.7.1
+3.1.2
28 changes: 27 additions & 1 deletion CHANGELOG.md
@@ -25,21 +25,47 @@ Versioning](https://semver.org/spec/v2.0.0.html).
### Security
-->

-## [Unreleased](https://github.com/CDRH/datura/compare/v0.2.0-beta...dev)
+## [1.0.0](https://github.com/CDRH/datura/compare/v0.2.0-beta...dev)

### Added
- minor test for Datura::Helpers.date_standardize
- documentation for web scraping
- documentation for CsvToEs (transforming CSV files and posting to elasticsearch)
- documentation for adding new ingest formats to Datura
- byebug gem for debugging
- instructions for installing Javascript Runtime files for Saxon
- API schema can either be 1.0 or 2.0 (which includes nested fields); 1.0 is used by default unless 2.0 is specified. Add the following to `public.yml` or `private.yml` in the data repo:
```
api_version: '2.0'
```
See new schema (2.0) documentation [here](https://github.com/CDRH/datura/docs/schema_v2.md)
- schema validation with API version 2.0, invalidly constructed documents will not post
- authentication with Elasticsearch 8.5; add the following to `public.yml` or `private.yml` in the data repo:
```
es_user: username
es_password: ********
```
- field overrides for new fields in the new API schema
- functionality to transform EAD files and post them to elasticsearch

### Changed
- update ruby to 3.1.2
- date_standardize now relies on strftime instead of manual zero padding for month, day
- minor corrections to documentation
- XPath: "text" is now ingested as an array and will be displayed delimited by spaces
- refactored command line methods into elasticsearch library
- refactored and moved date_standardize and date_display helper methods
- Nokogiri methods `get_text` and `get_list` on TEI now return nil rather than empty strings or arrays if there are no matches
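The nil-returning behavior described above can be sketched as follows (an illustration only, using stdlib REXML rather than Nokogiri, with simplified signatures — not Datura's actual implementation):

```ruby
require "rexml/document"

# Sketch of the new behavior: return nil rather than an empty
# string or array when an XPath expression has no matches
def get_text(doc, xpath)
  matches = REXML::XPath.match(doc, xpath).map(&:text)
  matches.empty? ? nil : matches.join(" ")
end

def get_list(doc, xpath)
  matches = REXML::XPath.match(doc, xpath).map(&:text)
  matches.empty? ? nil : matches
end

doc = REXML::Document.new("<TEI><title>Example Title</title></TEI>")
get_text(doc, "//title")    # => "Example Title"
get_text(doc, "//missing")  # => nil (previously an empty string)
```

Because a miss now yields `nil` instead of `""` or `[]`, downstream code must guard before calling string or array methods on the result.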

### Migration
- check to make sure "text" xpath is doing desired behavior
- use Elasticsearch 8.5 or higher and add authentication as described above if security is enabled. See [dev docs instructions](https://github.com/CDRH/cdrh_dev_docs/blob/update_elasticsearch_documentation/publishing/2_basic_requirements.md#downloading-elasticsearch).
- upgrade data repos to Ruby 3.1.2
- add api version to config as described above
- make sure fields are consistent with the api schema, many have been renamed or changed in format
- add nil checks with get_text and get_list methods
- add EadToES overrides if ingesting EAD files
- if overriding the `read_csv` method in `lib/datura/file_type.rb`, the hash must be prefixed with ** (`**{}`).
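The `**{}` requirement stems from Ruby 3's separation of positional and keyword arguments; a minimal sketch (method name and options are illustrative, not Datura's actual signature):

```ruby
# Under Ruby 3, a bare hash argument is no longer auto-converted to
# keyword arguments, so option hashes must be double-splatted
def read_csv(path, **options)
  { headers: true }.merge(options)
end

opts = { headers: false }
read_csv("file.csv", **opts)   # works: opts is splatted into keywords
# read_csv("file.csv", opts)   # ArgumentError under Ruby 3
```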

## [v0.2.0-beta](https://github.com/CDRH/datura/compare/v0.1.6...v0.2.0-beta) - 2020-08-17 - Altering field and xpath behavior, adds get_elements

9 changes: 7 additions & 2 deletions Gemfile.lock
@@ -19,9 +19,13 @@ GEM
mime-types (3.4.1)
mime-types-data (~> 3.2015)
mime-types-data (3.2022.0105)
mini_portile2 (2.8.0)
minitest (5.16.3)
netrc (0.11.0)
nokogiri (1.13.8-x86_64-darwin)
nokogiri (1.13.9)
mini_portile2 (~> 2.8.0)
racc (~> 1.4)
nokogiri (1.13.9-x86_64-darwin)
racc (~> 1.4)
racc (1.6.0)
rake (13.0.6)
@@ -35,6 +39,7 @@ GEM
unf_ext (0.0.8.2)

PLATFORMS
ruby
x86_64-darwin-20

DEPENDENCIES
@@ -45,4 +50,4 @@ DEPENDENCIES
rake (~> 13.0)

BUNDLED WITH
-   2.2.26
+   2.2.33
2 changes: 1 addition & 1 deletion README.md
@@ -7,7 +7,7 @@ Looking for information about how to post documents? Check out the

## Install / Set Up Data Repo

-Check that Ruby is installed, preferably 2.7.x or up. If you are using RVM, see the RVM section below.
+Check that Ruby is installed, preferably 3.1.2 or up. If you are using RVM, see the RVM section below.

If your project already has a Gemfile, add the `gem "datura"` line. If not, create a new directory and add a file named `Gemfile` (no extension).
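A minimal Gemfile for a new data repo might look like the following (a sketch; pin a version constraint as needed):

```ruby
# Gemfile — minimal sketch for a data repo using Datura
source "https://rubygems.org"

gem "datura"
```

Then run `bundle install` to install the gem and its dependencies.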

2 changes: 1 addition & 1 deletion datura.gemspec
@@ -53,7 +53,7 @@ Gem::Specification.new do |spec|
]
spec.require_paths = ["lib"]

-  spec.required_ruby_version = "~> 2.5"
+  spec.required_ruby_version = "~> 3.1"
spec.add_runtime_dependency "colorize", "~> 0.8.1"
spec.add_runtime_dependency "nokogiri", "~> 1.10"
spec.add_runtime_dependency "rest-client", "~> 2.1"
5 changes: 5 additions & 0 deletions docs/1_setup/config.md
@@ -9,7 +9,10 @@ default:
collection:
es_index
es_path
es_user
es_password
```
(The options `es_user` and `es_password` are needed if you are using a secured Elasticsearch index.)

If there are any settings which must be different based on the local environment (your computer vs the server), place these in `config/private.yml`.

Expand Down Expand Up @@ -118,6 +121,8 @@ Some stuff commonly in `private.yml`:
- `threads: 5` (5 recommended for PC, 50 for powerful servers)
- `es_path: http://localhost:9200`
- `es_index: some_index`
- `es_user: elastic` (if you want to use security on your local elasticsearch instance)
- `es_password: ******`
- `solr_path: http://localhost:8983/solr`
- `solr_core: collection_name`
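Putting those options together, a local `config/private.yml` might look like the following (structure and values are illustrative placeholders for a hypothetical local setup):

```yaml
# config/private.yml — example for a local, secured Elasticsearch instance
default:
  threads: 5
  es_path: http://localhost:9200
  es_index: some_index
  es_user: elastic
  es_password: changeme
```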

2 changes: 1 addition & 1 deletion docs/1_setup/prepare_index.md
@@ -13,7 +13,7 @@ You will need to make sure that somewhere, the following are being set in your p

### Step 2: Prepare Elasticsearch Index

-Make sure elasticsearch is installed and running in the location you wish to push to. If there is already an index you will be using, take note of its name and skip this step. If you want to add an index, run this command with a specified environment:
+Make sure elasticsearch is installed and running in the location you wish to push to. If there is already an index you will be using, take note of its name and skip this step. (Note that each index must be dedicated to data on one version of the API schema.) If you want to add an index, run this command with a specified environment:

```
admin_es_create_index -e development
2 changes: 1 addition & 1 deletion docs/4_developers/installation.md
@@ -6,7 +6,7 @@ TODO

### Elasticsearch

-TODO
+See installation instructions [here](https://github.com/CDRH/cdrh_dev_docs/blob/update_elasticsearch_documentation/publishing/2_basic_requirements.md#downloading-elasticsearch).

### Apache Permissions

