Private Detector

This is the repo for Bumble's Private Detector™ model - an image classifier that can detect lewd images.

The internal repo has been heavily refactored and released as a fully open-source project so that the wider community can use and finetune a Private Detector model of their own. You can download the pretrained SavedModel and checkpoint here

Model

The SavedModel can be found in saved_model/ within the private_detector.zip download referenced above.

The model is based on EfficientNetV2 and trained on our internal dataset of lewd images. More information can be found in the whitepaper here or here

Inference

Inference is straightforward, and an example is provided in inference.py:

python3 inference.py \
    --model saved_model/ \
    --image_paths \
        Yes_samples/1.jpg \
        Yes_samples/2.jpg \
        Yes_samples/3.jpg \
        Yes_samples/4.jpg \
        Yes_samples/5.jpg \
        No_samples/1.jpg \
        No_samples/2.jpg \
        No_samples/3.jpg \
        No_samples/4.jpg \
        No_samples/5.jpg

Sample Output


Probability: 93.71% - Yes_samples/1.jpg
Probability: 93.43% - Yes_samples/2.jpg
Probability: 94.06% - Yes_samples/3.jpg
Probability: 94.08% - Yes_samples/4.jpg
Probability: 91.01% - Yes_samples/5.jpg
Probability: 9.76% - No_samples/1.jpg
Probability: 7.14% - No_samples/2.jpg
Probability: 8.83% - No_samples/3.jpg
Probability: 4.87% - No_samples/4.jpg
Probability: 5.29% - No_samples/5.jpg
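
If you want to act on these scores in a script, the printed lines can be parsed and compared against a cut-off. The snippet below is a minimal sketch, not part of the repo: the helper names `parse_predictions` and `flag_images` are hypothetical, the line format is taken from the sample output above, and the 0.5 cut-off is an assumption that simply mirrors the default of `eval_threshold` in the training script. It also assumes the printed probability is the probability that the image is lewd, which is what the sample output suggests.

```python
import re

# Matches lines such as "Probability: 93.71% - Yes_samples/1.jpg".
LINE_RE = re.compile(r"^Probability:\s*([0-9.]+)%\s*-\s*(.+)$")


def parse_predictions(text):
    """Return a list of (path, probability in [0, 1]) pairs."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results.append((match.group(2), float(match.group(1)) / 100.0))
    return results


def flag_images(predictions, threshold=0.5):
    """Return the paths whose probability exceeds the threshold (assumed cut-off)."""
    return [path for path, prob in predictions if prob > threshold]


sample = """Probability: 93.71% - Yes_samples/1.jpg
Probability: 4.87% - No_samples/4.jpg"""
predictions = parse_predictions(sample)
print(flag_images(predictions))  # ['Yes_samples/1.jpg']
```

A stricter or looser threshold trades false positives against false negatives, so it is worth choosing one on your own validation data rather than relying on 0.5.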

Additional Training

You can finetune the model on your own data. Doing so is fairly simple, though you will need the checkpoint files found in saved_checkpoint/ within private_detector.zip.

Set up a JSON file with links to your image path lists for each class:

{
    "Yes": {
        "path": "/home/sofarrell/private_detector/Yes.txt",
        "label": 0
    },
    "No": {
        "path": "/home/sofarrell/private_detector/No.txt",
        "label": 1
    }
}

Each .txt file lists the paths to the images for that class, one per line:

/home/sofarrell/private_detector_images/Yes/1093840880_309463828.jpg
/home/sofarrell/private_detector_images/Yes/657954182_3459624.jpg
/home/sofarrell/private_detector_images/Yes/1503714421_3048734.jpg
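
Writing the path lists and the JSON file by hand gets tedious for large datasets. The following is a rough sketch of how they could be generated from one image directory per class. It is not part of the repo: `write_class_lists`, its arguments, the accepted file suffixes, and the output file name `classes.json` are all assumptions for illustration. The JSON layout follows the example above.

```python
import json
from pathlib import Path

# Assumed set of image suffixes; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def write_class_lists(class_dirs, labels, out_dir, json_name="classes.json"):
    """Write one <class>.txt per class plus a JSON file pointing at them.

    class_dirs: dict mapping class name -> directory containing its images
    labels:     dict mapping class name -> integer label
    Returns the path of the JSON file that was written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    config = {}
    for name, image_dir in class_dirs.items():
        # Collect absolute paths of image files directly inside the directory.
        paths = sorted(
            str(p.resolve())
            for p in Path(image_dir).iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        list_file = out_dir / f"{name}.txt"
        list_file.write_text("\n".join(paths) + "\n")
        config[name] = {"path": str(list_file.resolve()), "label": labels[name]}
    json_path = out_dir / json_name
    json_path.write_text(json.dumps(config, indent=4))
    return json_path
```

For example, `write_class_lists({"Yes": "images/Yes", "No": "images/No"}, {"Yes": 0, "No": 1}, "lists")` would write `lists/Yes.txt`, `lists/No.txt` and `lists/classes.json`, using the same label assignment as the JSON example above.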

You can create the training environment with conda:

conda env create -f environment.yaml
conda activate private_detector

And then retrain like so:

python3 ./train.py \
    --train_json /home/sofarrell/private_detector/train_classes.json \
    --eval_json /home/sofarrell/private_detector/eval_classes.json \
    --checkpoint_dir saved_checkpoint/ \
    --train_id retrained_private_detector
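
The command above takes separate train and eval JSON files, so the image paths for each class need to be divided between the two before writing the path lists. One way to do that is a seeded random split, sketched below. The function name `split_paths`, the 10% eval fraction, and the seed are assumptions for illustration and are not taken from the repo.

```python
import random


def split_paths(paths, eval_fraction=0.1, seed=0):
    """Shuffle paths reproducibly and return (train_paths, eval_paths)."""
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be between 0 and 1")
    shuffled = list(paths)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    # Hold out at least one image for evaluation.
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]
```

Splitting each class separately keeps the class balance roughly the same in both sets. Using a fixed seed means the same images end up in the eval set on every run, which keeps evaluation metrics comparable between training runs.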

The training script has several parameters that can be tweaked:

| Argument | Description | Type | Default |
| --- | --- | --- | --- |
| `train_id` | ID for this particular training run | str | |
| `train_json` | JSON file(s) which describes classes and contains lists of filenames of data files | List[str] | |
| `eval_json` | Validation json file which describes classes and contains lists of filenames of data files | str | |
| `num_epochs` | Number of epochs to train for | int | |
| `batch_size` | Number of images to process in a batch | int | `64` |
| `checkpoint_dir` | Directory to store checkpoints in | str | |
| `model_dir` | Directory to store graph in | str | `.` |
| `data_format` | Data format: [channels_first, channels_last] | str | `channels_last` |
| `initial_learning_rate` | Initial learning rate | float | `1e-4` |
| `min_learning_rate` | Minimal learning rate | float | `1e-6` |
| `min_eval_metric` | Minimal evaluation metric to start saving models | float | `0.01` |
| `float_dtype` | Float Dtype to use in image tensors: [16, 32] | int | `16` |
| `steps_per_train_epoch` | Number of steps per train epoch | int | `800` |
| `steps_per_eval_epoch` | Number of steps per evaluation epoch | int | `1` |
| `reset_on_lr_update` | Whether to reset to the best model after learning rate update | bool | `False` |
| `rotation_augmentation` | Rotation augmentation angle, value <= 0 disables it | float | `0` |
| `use_augmentation` | Add speckle, v0, random or color distortion augmentation | str | |
| `scale_crop_augmentation` | Resize image to the model's size times this scale and then randomly crop needed size | float | `1.4` |
| `reg_loss_weight` | L2 regularization weight | float | `0` |
| `skip_saving_epochs` | Do not save good checkpoint and update best metric for this number of the first epochs | int | `0` |
| `sequential` | Use sequential run over randomly shuffled filenames vs equal sampling from each class | bool | `False` |
| `eval_threshold` | Threshold above which to consider a prediction positive for evaluation | float | `0.5` |
| `epochs_lr_update` | Maximum number of epochs without improvement used to reset/decrease learning rate | int | `20` |
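
Because an epoch is defined by a step count rather than by the size of the dataset, it helps to work out how many images each epoch actually covers. The arithmetic below is a small sketch under two assumptions that the table does not confirm: each step consumes exactly one batch of `batch_size` images, and evaluation uses the same batch size as training. The helper names are hypothetical.

```python
import math


def images_per_epoch(steps, batch_size):
    """Images consumed in one epoch, assuming one batch per step."""
    return steps * batch_size


def steps_for_full_pass(num_images, batch_size):
    """Steps needed to see every image once (the last batch may be partial)."""
    return math.ceil(num_images / batch_size)


print(images_per_epoch(800, 64))         # 51200 (default training epoch)
print(images_per_epoch(1, 64))           # 64 (default evaluation epoch)
print(steps_for_full_pass(10_000, 64))   # 157
```

Under these assumptions the default evaluation epoch covers only 64 images, so for an eval set of 10,000 images you would likely want to raise `steps_per_eval_epoch` to around 157.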
