Add example scripts for training CDS models #33

Open
martiansideofthemoon opened this issue Oct 27, 2021 · 11 comments

Comments

@martiansideofthemoon
Owner

martiansideofthemoon commented Oct 27, 2021

Additionally, add dict.txt files for these dataset folders

martiansideofthemoon changed the title from "Add details on running code using schedule.py, which launches evaluation as well" to "Add example scripts for training CDS models" on Oct 27, 2021
@TufailAhmadSiddiq

Hello @martiansideofthemoon,
I hope you are doing well. I am trying to set up a custom dataset for training. I have converted the text file into BPE format, but I am now getting a "dict.txt not found" error. I have attached a screenshot. Please have a look and let me know. Thank you.
[Screenshot: dict.txt file not found]

@martiansideofthemoon
Owner Author

Hi @TufailAhmadSiddiq, could you share the output of ls datasets/new_dataset/*/*?

@TufailAhmadSiddiq

Hi, I have executed that command, and here is the result:
[Screenshot: ls datasets new_dataset]

@martiansideofthemoon
Owner Author

Please follow instructions here, especially the first paragraph: https://github.com/martiansideofthemoon/style-transfer-paraphrase#custom-datasets

You need .txt, .label files to get it started. The first script will create the input0.bpe files for you.
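Before running the first script, it can help to confirm that the folder really contains a .txt and a .label file per split. The following is only a minimal sketch: the function name check_dataset_dir is hypothetical, and the split names train, dev, and test, along with the one-label-per-line layout, are assumptions that should be checked against the README linked above.

```python
from pathlib import Path


def check_dataset_dir(dataset_dir, splits=("train", "dev", "test")):
    """Return a list of human-readable problems found in dataset_dir.

    Assumption: each split is stored as <split>.txt and <split>.label,
    with one text example per line and one label per line.
    """
    root = Path(dataset_dir)
    problems = []
    for split in splits:
        text_path = root / f"{split}.txt"
        label_path = root / f"{split}.label"
        # Report missing files first; line counts cannot be compared without both.
        missing = [p.name for p in (text_path, label_path) if not p.is_file()]
        if missing:
            problems.append(f"{split}: missing {', '.join(missing)}")
            continue
        n_text = len(text_path.read_text(encoding="utf-8").splitlines())
        n_label = len(label_path.read_text(encoding="utf-8").splitlines())
        if n_text != n_label:
            problems.append(f"{split}: {n_text} text lines but {n_label} labels")
    return problems


if __name__ == "__main__":
    # The path is taken from this thread; adjust it to your own dataset folder.
    for problem in check_dataset_dir("datasets/new_dataset"):
        print(problem)
```

An empty result means every assumed split has both files with matching line counts. It does not say anything about whether the later BPE or fairseq steps will succeed.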

@TufailAhmadSiddiq

I have created .txt and .label files for training, validation, and testing and placed them inside new_dataset. Here is the screenshot.

@TufailAhmadSiddiq

[Screenshot: inside new_dataset]

@TufailAhmadSiddiq

Hi, @martiansideofthemoon
I have sent you the screenshot of my directory. Can you please tell me why this problem is occurring?

@martiansideofthemoon
Owner Author

What's the error you get with this directory in place?

@HassanBinAli

The following error is occurring:
[Screenshot]
However, the file dict.txt is present in datasets/new_dataset-bin.

@martiansideofthemoon
Owner Author

I suspect this error is coming from fairseq preprocessing. I think it creates the dict files for you (the entire bin folder, in fact). Maybe try running the code after temporarily renaming the bin folder to something else?

@HassanBinAli

The error still occurs. However, it creates a folder named "new_dataset-bin". I am attaching a screenshot of what is inside the datasets/new_dataset-bin folder below.
[Screenshot]
There are two folders and one file. The input0 folder is empty; however, the label folder has the following:
[Screenshot]
I am also attaching a screenshot of what I have in dict.txt below.
[Screenshot]
I have some articles on which I am trying to fine-tune this model so that it can learn the writing style used in my articles. I named this style "custom_style". 15720 is the number of entries in my train set. I think these files seem fine. So my question is: can I proceed to the fine-tuning step with input0 having nothing in it?
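Since screenshots are hard to search and quote, the state of the bin folder described above could also be reported as text. This is only a sketch: summarize_bin_dir and empty_subfolders are hypothetical helper names, and the folder layout (subfolders such as input0 and label under datasets/new_dataset-bin) is taken from the description in this thread. It does not answer whether fine-tuning can proceed; it only makes an empty subfolder easy to spot and share.

```python
from pathlib import Path


def summarize_bin_dir(bin_dir):
    """Map each immediate subfolder of bin_dir to the sorted file names inside it."""
    root = Path(bin_dir)
    return {
        sub.name: sorted(p.name for p in sub.iterdir() if p.is_file())
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }


def empty_subfolders(bin_dir):
    """Return the names of immediate subfolders that contain no files."""
    return [name for name, files in summarize_bin_dir(bin_dir).items() if not files]


if __name__ == "__main__":
    # The path is taken from this thread; adjust it to your own setup.
    bin_dir = Path("datasets/new_dataset-bin")
    if bin_dir.is_dir():
        for name, files in summarize_bin_dir(bin_dir).items():
            print(f"{name}: {files if files else 'EMPTY'}")
    else:
        print(f"{bin_dir} does not exist")
```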
