Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter data by draft status #124

Merged
merged 5 commits into from
Apr 21, 2022
Merged

Filter data by draft status #124

merged 5 commits into from
Apr 21, 2022

Conversation

movermeyer
Copy link
Collaborator

@movermeyer movermeyer commented Apr 19, 2022

What are you trying to accomplish?

Fixes #73.

Throughout ruby-cldr, there were draft? calls in some places, but not others.

What approach did you choose and why?

I added a new --draft-status CLI option that allows users to specify the minimum draft status that they want all exported data to have.

Instead of each area of the codebase needing to know to check draft? all the time, all access to the data is done through a new DataFile class, which filters the data by draft status transparently. This ensures that we are not missing places where we need to be doing the filtering.

The default minimum draft status of contributed matches the status needed for inclusion into Unicode's ICU (and consequently what cldr-json exports).

What should reviewers focus on?

  • DraftStatus is effectively an enum. Is there a better way to define these in pure Ruby?
  • I use a new class variable to store the "global" minimum_draft_status. It was a lot nicer than passing a new minimum_draft_status parameter around everywhere. I feel mostly OK about it.

There is a similar issue with the alt attribute, but that will be handled as part of #125.

The impact of these changes

Users can now control what the minimum draft status to accept data from.

Less data exported as you increase your minimum draft status:

v34 exported data size (bytes) v37
Before this PR (main) 28903804 (27.5648 MiB) 30827560 (29.3995 MiB)
--draft-status=unconfirmed 29095943 (27.7481 MiB) 31018722 (29.5818 MiB)
--draft-status=provisional 28741819 (27.4103 MiB) 30671275 (29.2504 MiB)
--draft-status=contributed 28556418 (27.2335 MiB) 30539281 (29.1245 MiB)
--draft-status=approved 21065654 (20.0898 MiB) 22348295 (21.313 MiB)

(Computed using find . -type f -exec ls -l {} \; | awk '{sum += $5} END {print sum}', since MacOS' du command doesn't have a bytes flag)

Testing

thor cldr:download --version=37

Find an entry in the vendor directory that has draft=provisional, for example:

vendor/cldr/common/main/el.xml:

<currency type="ALK">
  <displayName count="one" draft="provisional">Παλαιό λεκ Αλβανίας</displayName>
  <displayName count="other" draft="provisional">Παλαιά λεκ Αλβανίας</displayName>
</currency>

Export with different minimum draft statuses and compare the results:

thor cldr:export --draft-status=provisional --target=provisional --components=Currencies --locales=el
thor cldr:export --draft-status=contributed --target=contributed --components=Currencies --locales=el
diff -r provisional contributed

And see that the diffs are the draft=provisional entries.

@movermeyer movermeyer force-pushed the movermeyer/draft_level branch 3 times, most recently from 8576039 to ac267d7 Compare April 20, 2022 14:39
@movermeyer movermeyer changed the title Filter data by draft level Filter data by draft status Apr 20, 2022
@movermeyer movermeyer force-pushed the movermeyer/draft_level branch from ac267d7 to c24c606 Compare April 20, 2022 14:56
Comment on lines -71 to -81
# Some parts (`ldml`, `ldmlBCP47` amd `supplementalData`) of CLDR data require that you merge all the
# files with the same root element before doing lookups.
# Ref: https://www.unicode.org/reports/tr35/tr35.html#XML_Format
#
# The return of this method is a merged XML Nokogiri document.
# Note that it technically is no longer compliant with the CLDR `ldml.dtd`, since:
# * it has repeated elements
# * the <identity> elements no longer refer to the filename
#
# However, this is not an issue, since #select will find all of the matches from each of the repeated elements,
# and the <identity> elements are not important to us / make no sense when combined together.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this comment has been moved to DataFile#merge

The method itself has been rewritten to use reduce, which is cleaner IMO.

@movermeyer movermeyer force-pushed the movermeyer/draft_level branch from c24c606 to 4dae577 Compare April 20, 2022 17:57
@movermeyer movermeyer marked this pull request as ready for review April 20, 2022 18:26
@movermeyer movermeyer requested review from trishrempel and removed request for trishrempel April 20, 2022 18:28
@movermeyer movermeyer force-pushed the movermeyer/draft_level branch 2 times, most recently from 28f9756 to 31812e5 Compare April 20, 2022 19:29
Copy link
Collaborator

@trishrempel trishrempel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

lib/cldr/export/data/plurals.rb Show resolved Hide resolved
def export(options = {}, &block)
locales = options[:locales] || Data.locales
components = options[:components] || Data.components
self.minimum_draft_status = options[:minimum_draft_status] if options[:minimum_draft_status]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this should set a default if not provided, since it will throw an exception if not set.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer for it to explode, since it indicates that something is not working as expected. 🤷

@@ -25,6 +25,10 @@ def self.test(name, &block)
end
end
end

def setup
Cldr::Export.minimum_draft_status = Cldr::DraftStatus::CONTRIBUTED
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to not set the class variable explicitly by default, so you can see when an exception is thrown in tests when it's being unexpectedly used before it's been set.

You could set it explicitly for tests that expect it to already be set, or mock it and explicitly assert that it was used.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I want it to default for everything except for the tests that exercise the draft status filtering code. I've broken the tests in data_file_test.rb into two TestCases, and I override the defaulting in the one that is testing the draft status filtering. 👍

@movermeyer movermeyer force-pushed the movermeyer/draft_level branch from 31812e5 to 19f78a7 Compare April 21, 2022 12:12
Provides a view into the data that is filtered by draft level
It is already handled by `DataFile`
@movermeyer movermeyer force-pushed the movermeyer/draft_level branch from ba983a6 to 92defe0 Compare April 21, 2022 14:03
@movermeyer movermeyer merged commit 1dbf467 into main Apr 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose "draft status" as a CLI option; clean up
2 participants