Train / Test dirs #40

Open
JordanHolland opened this issue Jan 13, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@JordanHolland
Collaborator

It occurs to me that many public datasets have predefined training and testing splits for comparison purposes. We need the ability to supply a --train_dir and --test_dir, or a train.pcap and test.pcap, for this purpose; right now we randomly split whatever data we get.
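A minimal sketch of how such options might be wired into an argparse-based CLI. `--train_dir`/`--test_dir` are the names proposed above; `--train_pcap`/`--test_pcap`, `build_parser`, `split_mode`, and the rule that supplying nothing keeps the current random split are hypothetical choices for illustration, not the project's actual interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI with optional predefined train/test inputs."""
    parser = argparse.ArgumentParser(description="Sketch of predefined train/test split options")
    # Directory-based split (flag names as proposed in the issue).
    parser.add_argument("--train_dir", help="directory of training pcaps")
    parser.add_argument("--test_dir", help="directory of testing pcaps")
    # File-based split (hypothetical flag names).
    parser.add_argument("--train_pcap", help="single training pcap")
    parser.add_argument("--test_pcap", help="single testing pcap")
    return parser


def split_mode(args: argparse.Namespace) -> str:
    """Return 'dirs', 'pcaps', or 'random' (the current behavior)."""
    dirs = (args.train_dir, args.test_dir)
    pcaps = (args.train_pcap, args.test_pcap)
    if any(dirs) and any(pcaps):
        raise ValueError("use either train/test dirs or train/test pcaps, not both")
    for name, pair in (("dirs", dirs), ("pcaps", pcaps)):
        if any(pair):
            # A predefined split only makes sense if both halves are given.
            if not all(pair):
                raise ValueError(f"predefined split via {name} needs both train and test")
            return name
    # Nothing supplied: fall back to splitting the data randomly.
    return "random"
```

Requiring both halves (rather than, say, carving a test set out of whatever was passed as training data) keeps a predefined split unambiguous, which is the point of using a dataset's published split.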

JordanHolland added the enhancement (New feature or request) label Jan 13, 2021
@jesteria
Collaborator

Sure, that could work (and I've done it that way in the past).

Alternatively, though this might overload things slightly, it might be easier (for both the user and the implementation) to identify test files via the labeling file…

@JordanHolland
Collaborator Author

That's a good idea! I didn't think of that; it's likely easier in the end. Add a column to the label file?

@jesteria
Collaborator

Right. We can make it optional, even.

And perhaps we make it smart-ish (though we needn't): if only some rows are marked "test" and the rest are left blank, we know the blank rows are train, and likewise if only some are marked "train". But if some rows are marked "test", some "train", and any are unmarked, we raise an error. (Alternatively, we make it less smart, and/or make this column boolean.)
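The "smart-ish" rule could be sketched roughly as below. This assumes the optional label-file column holds "train", "test", or a blank per row; the function name `resolve_split`, the case-insensitive matching, and returning `None` when the column is absent or entirely blank (so the caller keeps the current random split) are all hypothetical choices.

```python
from typing import List, Optional, Sequence


def resolve_split(marks: Sequence[Optional[str]]) -> Optional[List[str]]:
    """Resolve a per-row train/test column into explicit assignments.

    - All rows blank (or column missing): return None, i.e. keep the random split.
    - Only "test" marks present: blank rows become "train".
    - Only "train" marks present: blank rows become "test".
    - Both marks present: every row must be marked, otherwise raise an error.
    """
    cleaned = [(m or "").strip().lower() for m in marks]
    unknown = {m for m in cleaned if m not in ("", "train", "test")}
    if unknown:
        raise ValueError(f"unrecognized split values: {sorted(unknown)}")
    present = set(cleaned) - {""}
    if not present:
        return None
    if present == {"train", "test"}:
        if "" in cleaned:
            raise ValueError("rows are marked both 'train' and 'test', but some rows are unmarked")
        return cleaned
    # Exactly one kind of mark is used; blank rows get the other one.
    other = "train" if present == {"test"} else "test"
    return [m or other for m in cleaned]
```

For example, `resolve_split(["test", "", ""])` yields `["test", "train", "train"]`, while a column mixing both marks with a blank row is rejected. The "less smart" boolean variant would reduce to mapping each row's flag directly, with no inference step.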
