Feat: add composition information #27

Draft
wants to merge 6 commits into
base: main

klown
Contributor

@klown klown commented Dec 19, 2024

WIP

While this does capture and store the composition information for every Bliss symbol in the authorized vocabulary, it's still a little rough. However, the actual composition information is available in ./data/bliss_symbol_explanations.json in case someone wants access to it.

  1. This uses a webapp as the main tool to launch the program, but a script would probably be better. The webapp code is included in the PR. The webapp is located in ./apps/composition-creator/ and is launched by the npm run serveAppsDemos task.
  2. The code removes the composingIds arrays from bliss_symbol_explanations.json, since they often differ from the new composition array.
  3. Three of the BCI AV IDs have no composition information defined for them. The webapp lists them, and the information is also available in BlissSymbolComposition.md in the ./docs folder. Is there a better place for this information?
  4. Also, slightly over 1000 BCI AV IDs have composition information that is identical to the symbol's own BCI AV ID. In that case, no composition array is stored with the symbol. These are also listed by the webapp and in BlissSymbolComposition.md.
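
For readers who want to see the storage rule from points 3 and 4 in code form, here is a minimal sketch. It is not the code in this PR: the `SymbolEntry` shape, the `composition` field name, the `recordCompositions` function, and the `lookup` callback are all hypothetical stand-ins, and a composition is assumed to be a plain array of numeric BCI AV IDs.

```typescript
// Hypothetical shapes for illustration only; these are assumptions, not the
// actual schema of bliss_symbol_explanations.json.
type BciAvId = number;

interface SymbolEntry {
  id: BciAvId;
  description: string;
  composition?: BciAvId[]; // left out when nothing useful can be stored
}

interface CompositionReport {
  missing: BciAvId[];      // IDs with no composition information at all
  selfComposed: BciAvId[]; // IDs whose composition is just their own ID
}

// `lookup` stands in for whatever source supplies the composition of a symbol.
function recordCompositions(
  entries: SymbolEntry[],
  lookup: (id: BciAvId) => BciAvId[] | undefined
): CompositionReport {
  const report: CompositionReport = { missing: [], selfComposed: [] };
  for (const entry of entries) {
    const composition = lookup(entry.id);
    if (composition === undefined || composition.length === 0) {
      // No information available: report the ID and store nothing.
      report.missing.push(entry.id);
      delete entry.composition;
    } else if (composition.length === 1 && composition[0] === entry.id) {
      // The composition is identical to the symbol's own ID: store nothing.
      report.selfComposed.push(entry.id);
      delete entry.composition;
    } else {
      // A genuine composition: store a copy with the symbol.
      entry.composition = [...composition];
    }
  }
  return report;
}
```

Under these assumptions, the two lists in the returned report correspond to the two groups that the webapp and BlissSymbolComposition.md enumerate.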

@klown
Contributor Author

klown commented Dec 19, 2024

Hi @cindyli : This is almost ready for review, but see my comments in the main description above. I've left it as a draft PR. If you have any use for the composition information, however, it's available as described above.

@cindyli
Contributor

cindyli commented Dec 20, 2024

@klown, I agree using a script to update ./data/bliss_symbol_explanations.json is a good idea.

Since this script is likely to be run only once to update the file, and the composition information is not expected to change frequently, manual updates might be enough for future changes. When the composition information for the three currently missing symbols becomes available, a manual update might be an easy route too. With that in mind, here are some thoughts:

  1. Change this webapp script to a node.js script and save it in the ./scripts folder.
  2. Write documentation explaining the purpose of the script, its usage, and its one-time nature. It should only be used again if there is a big update to the composition information.
  3. Remove ./data/BciAvCompositionAnalysis.tsv from the repository. Instead, include instructions in the documentation on how to generate this intermediate file from its source spreadsheet, for example, saving the spreadsheet as a .tsv file.

Let me know what you think. Thanks.

@klown
Contributor Author

klown commented Jan 8, 2025

> @klown, I agree using a script to update ./data/bliss_symbol_explanations.json is a good idea.
> ...

The latest version of the code uses npx vite-node to run the command line script ./scripts/createAndRecordCompositions.ts. There is documentation near the top of the script that explains what it is for, how to run it, and that it is meant to be a one-time execution.

> 3. Remove ./data/BciAvCompositionAnalysis.tsv from the repository. Instead, include instructions in the documentation on how to generate this intermediate file from its source spreadsheet, for example, saving the spreadsheet as a .tsv file.

I agree with the removal but for different reasons. With respect to the composition information in the spreadsheet, it sometimes differs from the Blissary. Like the palette rendering code, the script here uses only the Blissary for determining the composition of the symbols. Note that Russell has updated the original spreadsheet on the shared drive following a discussion with him and Hannes. That means that BciAvCompositionAnalysis.tsv is out of date and that is another potential reason to get rid of it.

In fact, I'm thinking about deleting all of the tsv material at this point -- the spreadsheet and the associated scripts. The only thing stopping me is that the spreadsheet was used to categorize the symbols as Bliss-characters vs. Bliss-words. That information is available from the Blissary, but I need to talk with Hannes to see if we can access it programmatically.

With respect to words vs. characters, I have discovered additional inconsistencies between the spreadsheet and the Blissary. One set is small and could be handled by editing bliss_symbol_explanations.json by hand. The other set is larger. In both cases, it looks like the Blissary is correct.

The small set of inconsistencies involves four indicators. These are listed as Bliss-words in the spreadsheet, but as Bliss-characters by the Blissary:

  • 28043 indicator_(continuous_form)
  • 28044 indicator_(plural,definite)
  • 28045 indicator_(thing,definite)
  • 28046 indicator_(thing,plural,definite)

There are a lot of other indicators that have the same general form, but are listed as Bliss-characters in both the spreadsheet and by the Blissary. For example, 24677 indicator_(present_action) is a character in both and it's as complicated as 28045 indicator_(thing,definite). I think the Blissary is correct here.

The larger set consists of symbols that are marked as characters in the spreadsheet, but as words in the Blissary. I'm still looking through them, but I think the Blissary is right. For example, "12355 aid" is composed of "help+indicator_thing", but is marked as a character in the spreadsheet. Compare that with "14705 help(to)", which is classified as a word in both places but is constructed in a way that is very similar to "12355 aid". Here they are, one above the other:

  • 12355 aid = help+indicator_thing (character in the spreadsheet; word in the Blissary)
  • 14705 help(to) = help+indicator_action (word in both cases)
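
A check like the one described above could be automated roughly as follows. This is only a sketch under stated assumptions: `SymbolKind`, `Mismatch`, and `findKindMismatches` are hypothetical names, each source is assumed to be already loaded into a map from BCI AV ID to classification, and how to read the classification from the Blissary programmatically is still an open question. The sample data uses only the classifications quoted in this comment.

```typescript
// Hypothetical types and names for illustration; not part of the PR.
type SymbolKind = "character" | "word";

interface Mismatch {
  id: number;
  spreadsheet: SymbolKind;
  blissary: SymbolKind;
}

// Report every ID that both sources classify, but classify differently.
function findKindMismatches(
  spreadsheet: Map<number, SymbolKind>,
  blissary: Map<number, SymbolKind>
): Mismatch[] {
  const mismatches: Mismatch[] = [];
  spreadsheet.forEach((sheetKind, id) => {
    const blissaryKind = blissary.get(id);
    if (blissaryKind !== undefined && blissaryKind !== sheetKind) {
      mismatches.push({ id, spreadsheet: sheetKind, blissary: blissaryKind });
    }
  });
  // Sort by ID so the report is stable regardless of input order.
  return mismatches.sort((a, b) => a.id - b.id);
}

// Sample data taken from the examples in this comment.
const sampleSpreadsheet = new Map<number, SymbolKind>([
  [28045, "word"],      // indicator_(thing,definite)
  [24677, "character"], // indicator_(present_action)
  [12355, "character"], // aid
  [14705, "word"],      // help(to)
]);

const sampleBlissary = new Map<number, SymbolKind>([
  [28045, "character"],
  [24677, "character"],
  [12355, "word"],
  [14705, "word"],
]);

// Reports 12355 (aid) and 28045 (indicator_(thing,definite)).
console.log(findKindMismatches(sampleSpreadsheet, sampleBlissary));
```

IDs that appear in only one source are skipped here; whether those should also be reported is a design choice the sketch leaves open.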
