allow non-translation tasks #25
lannelin commented on Nov 20, 2024
- move seeding to utils for reuse
- allow specifying a list of languages, with no concern for source/target terminology
- create a specific fn for translation loading
- refactor the existing script to use the new translation-specific fn
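A minimal sketch of how these pieces could fit together. It assumes nothing about arc_spice's actual code: `seed_everything`, `load_multilingual`, `load_translation` and the toy corpus are hypothetical names, and the `{"source": ..., "target": ...}` shape for `lang_pair` is an assumption. The point is the split: the generic loader only takes a plain list of languages, and the translation wrapper is the one place that knows about source/target.

```python
import random
from typing import Dict, List


# Hypothetical shared seeding helper, as it might live in a utils module.
def seed_everything(seed: int) -> None:
    """Seed whichever RNGs are importable in this environment."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed: nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch not installed: nothing to seed


# Toy in-memory corpus standing in for the real MultiEURLEX files (hypothetical data).
_TOY_DOCS = [
    {"text": {"en": "Regulation on fisheries.", "fr": "Règlement sur la pêche."}, "class_labels": [3]},
    {"text": {"en": "Directive on energy.", "fr": "Directive sur l'énergie."}, "class_labels": [1, 4]},
]


def load_multilingual(languages: List[str]) -> List[Dict]:
    """Generic loader: keep only the requested languages, with no source/target notion."""
    return [
        {"text": {lang: doc["text"][lang] for lang in languages}, "class_labels": doc["class_labels"]}
        for doc in _TOY_DOCS
    ]


def load_translation(lang_pair: Dict[str, str]) -> List[Dict]:
    """Translation-specific wrapper: maps a source/target pair onto the generic loader."""
    src, tgt = lang_pair["source"], lang_pair["target"]
    return [
        {"source_text": row["text"][src], "target_text": row["text"][tgt], "class_labels": row["class_labels"]}
        for row in load_multilingual([src, tgt])
    ]
```

For example, `seed_everything(42)` followed by `load_translation({"source": "en", "target": "fr"})` yields rows whose `target_text` is the French text, while a non-translation task would call `load_multilingual(["en"])` directly.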
OK @J-Dymond, should be good to go. Sorry for the false start!
tests/test_multieurlex_utils.py
```python
from arc_spice.data import multieurlex_utils

# def extract_articles(
```
To remove, sorry.
The whole file? Or the `extract_articles`?
Just the commented-out signature.
scripts/variational_RTC_example.py
```python
        data_dir="data", level=1, lang_pair=lang_pair
    )
    train = dataset_dict["train"]
    multi_onehot = MultiHot(metadata_params["n_classes"])
    test_row = get_test_row(train)
    class_labels = multi_onehot(test_row["class_labels"])
    return test_row, class_labels, metadata_params


def get_test_row(train_data):
```
Would it be appropriate to split these functionalities into two functions, or to pass a `debug_flag` argument?
I've simply removed the manually entered data here. I assume this script will be superseded in time by something that goes over more than one sample.
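For reference, the two options raised in this thread (separate functions vs. a `debug_flag` argument) could look roughly like this. This is a sketch, not the script's code: `get_debug_row`, `prepare_test_row` and the placeholder row contents are hypothetical, and `MultiHotSketch` only assumes that `MultiHot` performs standard multi-hot encoding (class indices to a 0/1 vector of length `n_classes`); the real class may return an array or tensor rather than a list.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


class MultiHotSketch:
    """Hypothetical stand-in for MultiHot: class indices -> 0/1 list of length n_classes."""

    def __init__(self, n_classes: int) -> None:
        self.n_classes = n_classes

    def __call__(self, class_labels: Sequence[int]) -> List[int]:
        vec = [0] * self.n_classes
        for idx in class_labels:
            if not 0 <= idx < self.n_classes:
                raise ValueError(f"label {idx} out of range for {self.n_classes} classes")
            vec[idx] = 1
        return vec


# Option 1: two single-purpose functions.
def get_test_row(train_data: Sequence[Row], index: int = 0) -> Row:
    """Return a real row from the training split."""
    return train_data[index]


def get_debug_row() -> Row:
    """Return a tiny hand-written row (placeholder values) for quick debugging."""
    return {"source_text": "debug text", "class_labels": [0]}


# Option 2: one entry point switched by a flag, delegating to the functions above.
def prepare_test_row(
    train_data: Sequence[Row], n_classes: int, debug_flag: bool = False
) -> Tuple[Row, List[int]]:
    row = get_debug_row() if debug_flag else get_test_row(train_data)
    return row, MultiHotSketch(n_classes)(row["class_labels"])
```

With a training row whose `class_labels` are `[1, 3]` and `n_classes=5`, `prepare_test_row` returns that row together with `[0, 1, 0, 1, 0]`. Keeping option 1's functions separate means option 2's flag stays a thin dispatch rather than mixing real and hand-entered data in one body.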
```python
def test_extract_articles_single_lang():
    langs = ["en"]
```
Should we loop over all languages here?
We only have single-language test data for English (the loader works a bit differently in that case). I can create it for other languages if we want to expand the tests, but it should exercise the same functionality. Thoughts?
Ah OK, is this addressed by the other comment re: testing all languages?
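If the tests are expanded to other languages, one simple shape is a single test that loops over a language list (with pytest, `@pytest.mark.parametrize` over the same list would report each language as its own test case). Everything here is hypothetical: `LANGS`, `load_rows` and its fake rows stand in for per-language fixture data that, as noted above, currently exists only for English.

```python
from typing import Dict, List

# Hypothetical language list; real tests would need fixture data for each entry.
LANGS = ["en", "fr", "de"]


def load_rows(langs: List[str]) -> Dict[str, Dict]:
    """Toy loader returning one fake row per requested language."""
    return {lang: {"text": f"sample text ({lang})", "class_labels": [0]} for lang in langs}


def check_single_lang(lang: str) -> None:
    """The single-language checks, factored out so they can run for any language."""
    rows = load_rows([lang])
    assert list(rows) == [lang]
    assert rows[lang]["text"]
    assert rows[lang]["class_labels"]


def test_extract_articles_all_langs() -> None:
    # Run the same checks for every language in the list.
    for lang in LANGS:
        check_single_lang(lang)
```

The loop version stops at the first failing language; the parametrized variant is preferable once real fixtures exist for several languages, since each one then passes or fails independently.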
All looks good to merge! I just had one question in there about `get_test_row`. Would appreciate a chat on how to use the test multieurlex data; I think I'll need to use/adapt it for the inference tests.