allow non-translation tasks #25

lannelin · 2024-11-20T17:14:52Z

move seeding to utils for reuse
allow specification of list of languages with no concern for source/target terminology
creation of specific fn for translation loading
refactor of existing script to use new translation-specific fn
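
For readers following along, here is a minimal sketch of what a reusable seeding helper in a utils module might look like. The name `seed_everything` and its exact behaviour are assumptions for illustration, not the function added in this PR; NumPy and PyTorch are seeded only if they are installed.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Hypothetical helper: seed every RNG that is available in this environment."""
    # Only affects Python subprocesses started after this point,
    # not hashing in the current interpreter.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed, nothing to seed
    try:
        import torch

        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch not installed, nothing to seed


# Re-seeding with the same value reproduces the same draw.
seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
```

Keeping this in one place means scripts and tests can call the same helper rather than each repeating their own seeding lines.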

lannelin · 2024-11-21T20:02:10Z

ok @J-Dymond should be good to go, sorry for the false start!

lannelin · 2024-11-22T14:08:22Z

tests/test_multieurlex_utils.py

+
+from arc_spice.data import multieurlex_utils
+
+# def extract_articles(


to remove, sorry

The whole file, or just `extract_articles`?

just the commented out sig

J-Dymond · 2024-11-22T14:59:12Z

scripts/variational_RTC_example.py

        data_dir="data", level=1, lang_pair=lang_pair
    )
+    train = dataset_dict["train"]
    multi_onehot = MultiHot(metadata_params["n_classes"])
    test_row = get_test_row(train)
    class_labels = multi_onehot(test_row["class_labels"])
    return test_row, class_labels, metadata_params


 def get_test_row(train_data):


Would it be appropriate to split this functionality into two functions, or to pass a `debug_flag` argument?

I've simply removed the manually entered data here. I assume this script will be superseded in time by something that goes over more than 1 sample
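
For reference, the "two functions" option raised above could look roughly like the sketch below. This is an illustration only: the `index` parameter and the `make_debug_row` helper are hypothetical and not part of the script, and a plain list of dicts stands in for the real training split. Only the `class_labels` field name is taken from the diff.

```python
from typing import Any, Mapping, Sequence


def get_test_row(
    train_data: Sequence[Mapping[str, Any]], index: int = 0
) -> Mapping[str, Any]:
    """Return one row from the training split (index is a hypothetical parameter)."""
    if not 0 <= index < len(train_data):
        raise IndexError(f"index {index} out of range for {len(train_data)} rows")
    return train_data[index]


def make_debug_row(**fields: Any) -> dict:
    """Hypothetical helper: build a hand-written row for debugging."""
    if "class_labels" not in fields:
        raise ValueError("a debug row needs at least 'class_labels'")
    return dict(fields)


# Stand-in data: a list of dicts in place of the real dataset split.
train = [
    {"class_labels": [0, 3]},
    {"class_labels": [2]},
]
row_from_data = get_test_row(train, index=1)
row_by_hand = make_debug_row(class_labels=[1], note="manually entered")
```

Separating the two keeps the data-backed path free of hardcoded samples while still leaving a route for hand-written rows if they are needed later.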

J-Dymond · 2024-11-22T15:16:41Z

tests/test_multieurlex_utils.py

+
+
+def test_extract_articles_single_lang():
+    langs = ["en"]


Should we loop over all languages here?

We only have single-language test data for English (as the loader works a bit differently). I can create it for other languages if we want to expand the tests, but it should be the same functionality. Thoughts?

Ah ok, is this addressed by the other comment re. testing all languages?
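
If the tests are later expanded, looping over languages might be sketched as below. Everything here is an assumption for illustration: `extract_articles_stub` is a stand-in and not the real `extract_articles`, and the row layout and language list are made up. With pytest, the same check could be driven by `pytest.mark.parametrize` instead of a plain loop.

```python
LANGS = ["en", "fr", "de"]  # illustrative language list, not the project's


def extract_articles_stub(row: dict, languages: list) -> dict:
    """Hypothetical stand-in: keep only the requested languages of a row."""
    return {"text": {lang: row["text"][lang] for lang in languages}}


def check_single_lang(lang: str, row: dict) -> None:
    """Check that a single-language extraction returns only that language."""
    out = extract_articles_stub(row, [lang])
    assert list(out["text"]) == [lang]
    assert out["text"][lang], f"empty text for {lang}"


# Made-up test row with one entry per language.
row = {"text": {"en": "Article 1", "fr": "Article premier", "de": "Artikel 1"}}

checked = []
for lang in LANGS:
    check_single_lang(lang, row)
    checked.append(lang)
```

As noted in the thread, the real loader handles English slightly differently, so per-language test data would still be needed before a loop like this is meaningful.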

J-Dymond

All looks good to merge! I just had one question in there about get_test_row. Would appreciate a chat on how to use the test multieurlex, I think I'll need to use/adapt that for the inference tests.

allow non-translation tasks

546925a

lannelin requested a review from J-Dymond November 20, 2024 17:14

lannelin added 7 commits November 20, 2024 19:13

handle single lang case

7cfbc09

rm reqs file

176b6b6

drop empty rows

a5d9395

multieurlex tests

5dd7e24

rm cache files

03dd56a

ignore test caches

42b7a1d

fix quote mismatch

ae88310

lannelin commented Nov 22, 2024

J-Dymond reviewed Nov 22, 2024

J-Dymond approved these changes Nov 22, 2024

lannelin added 2 commits November 22, 2024 15:20

rm hardcoded sample

5d35c90

rm commented code

ac6a292

J-Dymond merged commit 92b39c2 into main Nov 22, 2024
5 checks passed

J-Dymond deleted the refactor_data_preproc branch November 22, 2024 16:01
