diff --git a/.gitignore b/.gitignore
index 7f73e11..19976c2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,6 +4,9 @@
 networks/
 checkpoints/
 
+# Image used for quick smoke tests
+82148729_p0.jpg
+
 # Bunch of files that have no place in version control
 2020_0000_0599/encoded_tags_test.npy
 2020_0000_0599/stats.txt
diff --git a/README.md b/README.md
index 88d6bd9..2b0b2e0 100644
--- a/README.md
+++ b/README.md
@@ -6,3 +6,43 @@ Repo for my Tensorflow/Keras CV experiments. Mostly revolving around the Danboor
 Framework: TF/Keras 2.7
 
 Training SQLite DB built using fire-egg's tools: https://github.com/fire-eggs/Danbooru2019
+
+Currently training on the Danbooru2021 512px SFW subset (sans the rating:q images that were included in the 2022-01-21 release of the dataset).
+
+## Reference:
+Anonymous, The Danbooru Community, & Gwern Branwen; “Danbooru2021: A Large-Scale Crowdsourced and Tagged Anime Illustration Dataset”, 2022-01-21. Web. Accessed 2022-01-28. https://www.gwern.net/Danbooru2021
+
+----
+
+## Journal
+06/02/2022: great news, crew! TRC allowed me to use a bunch of TPUs!
+
+To make better use of this amount of compute I had to overhaul a number of components, so a bunch of things have likely fallen to bit rot in the process.
+I can only guarantee that NFNet works pretty much as before with the right arguments.
+The NFResNet changes *should* have left it backward compatible with the previous version.
+ResNet has been streamlined to be mostly in line with the Bag-of-Tricks paper ([arXiv:1812.01187](https://arxiv.org/abs/1812.01187)), with the exception of the stem. It is not compatible with the previous version of the code.
+
+The training labels have been included in the 2021_0000_0899 folder for convenience.
+The list of files used for training will be uploaded as a GitHub Release.
+
+Now for some numbers:
+compared to my previous best run, the one that resulted in [NFNetL1V1-100-0.57141](https://github.com/SmilingWolf/SW-CV-ModelZoo/releases/tag/NFNetL1V1-100-0.57141):
+- I'm using 1.86x the number of images: 2.8M vs 1.5M
+- I'm training bigger models: 61M vs 45M params
+- ... in less time: 232 vs 700 hours of processor time
+- don't get me started on actual wall clock time
+- with a few amenities thrown in: ECA for channel attention, SiLU activation
+
+And it's all thanks to the folks at TRC, so shout out to them!
+
+I currently have a few runs in progress across a couple of dimensions:
+- effect of model size, with NFNet L0/L1/L2, all three using SiLU and ECA
+- effect of activation function, with NFNet L0 and SiLU/HSwish/ReLU, no ECA
+
+Once the experiments are over, the plan is to select the network definitions that lie on the Pareto curve between throughput and F1 score and release the trained weights.
+
+One last thing.
+I'd like to call your attention to the tools/cleanlab_stuff.py script.
+It reads two files: one with the binarized labels from the database, the other with the predicted probabilities.
+It then uses the [cleanlab](https://github.com/cleanlab/cleanlab) package to estimate whether an image in a set could be missing a given label. At the end it stores its conclusions in a JSON file.
+This file could, potentially, be used in some tool to assist human intervention in adding the missing tags.
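The journal entry describes tools/cleanlab_stuff.py only in broad strokes, so here is a minimal sketch of that workflow, assuming the cleanlab 1.x API the script's get_noise_indices call comes from. The file names, the per-tag loop, and the JSON layout are illustrative assumptions, not the script's actual structure:

```python
import json

import numpy as np
from cleanlab.pruning import get_noise_indices

# Assumed inputs: binarized labels and sigmoid outputs, both shaped
# (n_images, n_tags). These file names are placeholders.
full_labels = np.load("encoded_tags_test.npy")
full_psx = np.load("tags_probs.npy")

suspected_missing = {}
for tag_idx in range(full_labels.shape[1]):
    labels = full_labels[:, tag_idx].astype(np.int64)
    probs = full_psx[:, tag_idx]

    # cleanlab wants one column per class, so the single sigmoid output
    # is expanded into [P(tag absent), P(tag present)], mirroring the
    # np.stack([recip, psx], axis=1) line visible in the diff below.
    psx = np.stack([1.0 - probs, probs], axis=1)

    noisy = get_noise_indices(
        s=labels,
        psx=psx,
        sorted_index_method="normalized_margin",
        n_jobs=1,
    )

    # On Danbooru a tag is more likely to be missing than to be wrong,
    # so keep only the images where the label is 0 but the model disagrees.
    missing = [int(i) for i in noisy if labels[i] == 0]
    if missing:
        suspected_missing[tag_idx] = missing

with open("missing_tags.json", "w") as handle:
    json.dump(suspected_missing, handle)
```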
diff --git a/tools/cleanlab_stuff.py b/tools/cleanlab_stuff.py
index 09f1b08..90e56ec 100644
--- a/tools/cleanlab_stuff.py
+++ b/tools/cleanlab_stuff.py
@@ -16,7 +16,7 @@
 samples = [x.rstrip() for x in f.readlines()]
 
 full_labels = np.load("2021_0000_0899/encoded_tags_test.npy")
-full_psx = np.load("tags_probs_NFNetL0V1_01_29_2022_08h29m32s.npy")
+full_psx = np.load("tags_probs_NFNetL1V1_01_29_2022_08h20m44s.npy")
 
 tags_actions = {}
 
@@ -32,7 +32,10 @@
 psx = np.stack([recip, psx], axis=1)
 
 ordered_label_errors = get_noise_indices(
-    s=train_labels_with_errors, psx=psx, sorted_index_method="normalized_margin"
+    s=train_labels_with_errors,
+    psx=psx,
+    sorted_index_method="normalized_margin",
+    n_jobs=1,
 )
 
 # On Danbooru a tag is more likely to be missing than to be wrong,
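A note on the last hunk: the only behavioral change is the new n_jobs=1 argument, which in cleanlab 1.x disables the pruning step's multiprocessing (by default it spawns one worker per CPU). To follow up on the README's closing idea of a tool that assists human tagging, a consumer of the resulting JSON file might look roughly like the sketch below; the file name and the tag-index-to-image-rows layout follow the assumed sketch above, not any confirmed output format:

```python
import json

# Assumed layout, matching the sketch above:
# {"<tag index>": [<image row indices that may be missing that tag>], ...}
with open("missing_tags.json") as handle:
    suspects = json.load(handle)

# Invert to a per-image view so a reviewer can confirm tags image by image.
per_image = {}
for tag_idx, rows in suspects.items():
    for row in rows:
        per_image.setdefault(row, []).append(int(tag_idx))

for row, tag_idxs in sorted(per_image.items()):
    print(f"image row {row}: candidate missing tags {sorted(tag_idxs)}")
```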