Train / Test dirs #40

JordanHolland · 2021-01-13T18:51:05Z

It occurs to me that many public datasets have predefined training and testing splits for comparison purposes. We need the ability to supply a --train_dir and --test_dir, or a train.pcap and test.pcap, for this purpose, since right now we randomly split whatever data we are given.

jesteria · 2021-01-13T20:10:32Z

Sure, that could work (and I've done it that way in the past).

Alternatively, though this might overload things slightly, it might be easier (for both the user and the implementation) to identify test files via the labeling file….

JordanHolland · 2021-01-13T21:19:04Z

That's a good idea! I did not think of that. It's likely easier in the end. Add a column to the label file?

jesteria · 2021-01-13T21:34:00Z

Right. We can make it optional, even.

And perhaps make it smart-ish (though we needn't): say, if only some rows are marked "test", then we treat those as test and the rest as train (and likewise if only some are marked "train" and the rest are left blank). But if some are marked "test", some "train", and any are unmarked, we error. (Alternatively, we make it less smart and/or make this column boolean.)
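
To make that rule concrete, here is a rough sketch of how it might look in Python. This is an illustration only, not the project's implementation: the function name `resolve_splits` and the column names `file`, `label`, and `split` are all hypothetical, and what happens when the column is absent or empty (returning `None` so the caller can fall back to a random split) is an assumption.

```python
import csv
import io

SPLIT_VALUES = {"train", "test"}


def resolve_splits(rows, column="split"):
    """Partition label-file rows into train and test lists.

    `column` is a hypothetical name for the optional split column.
    Returns None when no row is marked, so the caller can split randomly.
    """
    # Normalize each row's mark; a missing or empty cell counts as unmarked.
    marks = [(row.get(column) or "").strip().lower() for row in rows]

    unknown = {m for m in marks if m and m not in SPLIT_VALUES}
    if unknown:
        raise ValueError(f"unrecognized split values: {sorted(unknown)}")

    used = {m for m in marks if m}
    if not used:
        return None

    has_blank = any(not m for m in marks)
    if used == SPLIT_VALUES and has_blank:
        raise ValueError(
            "rows are marked both 'train' and 'test', but some are unmarked"
        )

    # If only one value is used, unmarked rows take the other value.
    default = None
    if len(used) == 1:
        default = "test" if used == {"train"} else "train"

    splits = {"train": [], "test": []}
    for row, mark in zip(rows, marks):
        splits[mark or default].append(row)
    return splits


# Hypothetical label file in which only the test rows are marked.
sample = """file,label,split
a.pcap,benign,
b.pcap,malware,test
c.pcap,benign,
"""
rows = list(csv.DictReader(io.StringIO(sample)))
splits = resolve_splits(rows)
print([r["file"] for r in splits["train"]])
print([r["file"] for r in splits["test"]])
```

The boolean variant mentioned above would be simpler still: every unmarked row is train, and the error case disappears.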

JordanHolland added the enhancement label (New feature or request) Jan 13, 2021

JordanHolland assigned jesteria and JordanHolland Jan 13, 2021

Chasexj mentioned this issue Jul 22, 2022

Support for specifying training and testing files #84

Open

1 task
