Great, I'm glad the data can be of use. Let's break these down into smaller questions; I'll need to review the code to make sure I remember correctly.
I believe the data was sampled at random. Looking here https://github.com/weecology/BirdDetector/blob/b2c50d0c840e4a589056d3cfdc923f02969a6672/generalization.py#L181, each fine-tune dataset draws from a train or test CSV. Those CSVs are generated in utils/prepare.py, and that function splits them randomly. I just checked our HPC and I still have those train/test files, one pair per dataset. Would you like them? Which datasets are you using? I think the larger issue here is that we need better reproducibility; I can put the train/test splits up on Zenodo with the rest of the records: https://zenodo.org/records/5033174
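For context, the split looks something like the sketch below, assuming an image-level random split; the file names, split fraction, and seed are placeholders rather than the exact values used in utils/prepare.py.

```python
import pandas as pd

# Hypothetical annotation file in the DeepForest CSV format
# (image_path, xmin, ymin, xmax, ymax, label)
annotations = pd.read_csv("hayes_annotations.csv")

# Split at the image level so boxes from one image never land in both sets
images = pd.Series(annotations.image_path.unique())
train_images = images.sample(frac=0.85, random_state=42)

train = annotations[annotations.image_path.isin(train_images)]
test = annotations[~annotations.image_path.isin(train_images)]

train.to_csv("hayes_train.csv", index=False)
test.to_csv("hayes_test.csv", index=False)
```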
I'd need a lot more information to comment here. Can I see your models? Your learning rates? Any preprocessing steps? Have you compared your models to the ones in the BirdDetector repo? Let's start by getting those train/test splits up on Zenodo. As shown in Figure 4 (the one pasted above), each fine-tuned model released on Zenodo was trained with the maximum number of train annotations available for its dataset, so they are all different sizes. Notice, for example, that the 'Seabirds - Indian Ocean' curve doesn't go beyond 5,000, because there are not more than 10,000 annotations in that dataset.
Is there something in the manuscript that made this confusing? That would be good to know for other readers and for the future. Re-reading the paper, I'm guessing it's because Table 3 reports an F1 score from a fine-tuned model trained on 1,000 annotations? That was just for comparison to the Local-only models of similar size. Perhaps we should have named that model something like 'Finetune - small' to make it clearer that 'Finetuned' in other parts of the paper refers to models trained with all available training data.
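For anyone trying to reproduce the fine-tuned setting, a minimal sketch with the DeepForest API would look roughly like this; the paths, epoch count, and the commented-out 1,000-annotation subsample (the 'Finetune - small' style comparison) are placeholders for illustration, not the exact training configuration from the paper.

```python
from deepforest import main

m = main.deepforest()
m.use_bird_release()  # start from the released global bird detector weights

# Hypothetical train split and image directory
m.config["train"]["csv_file"] = "hayes_train.csv"
m.config["train"]["root_dir"] = "images/"
m.config["train"]["epochs"] = 10

# For a 'Finetune - small' style run, subsample ~1,000 annotations first, e.g.:
# pd.read_csv("hayes_train.csv").sample(n=1000, random_state=42).to_csv("hayes_train_small.csv", index=False)

m.create_trainer()
m.trainer.fit(m)
m.trainer.save_checkpoint("hayes_finetuned.ckpt")
```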
We have been rigorously studying your BirdDetector model developed with the DeepForest package and are quite impressed with its performance. We do, however, have some specific questions about the model's implementation and training results, and we hope you can shed some light on these aspects.
Sampling Process: Upon reviewing the code repository on GitHub (the BirdDetector repo), it seems the model relies on randomly selected image samples for training. Could you confirm whether the fine-tuned models available online were trained on optimally sampled images or whether the sampling was done randomly? Also, were the provided fine-tuned models trained with 1,000 samples?
Fine-Tuned Model Performance: We noticed that the online fine-tuned models perform exceptionally well on various datasets. However, our attempts to train a local-only model have yielded results that only come close to matching them. Could you provide details on the number of training samples used in these fine-tuned models? In the paper (Figure 2), it's mentioned that fine-tuning with just 1,000 annotations reaches only 50% of the local-only model's performance, which seems to contradict the high performance of the online models.
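To make the comparison concrete, the kind of evaluation we have in mind looks roughly like the sketch below; the checkpoint paths, test CSV, and IoU threshold are placeholders, and the result keys reflect our reading of DeepForest's evaluate output rather than your exact pipeline.

```python
from deepforest import main

# Hypothetical checkpoints: the released fine-tuned model and our local-only model
models = {
    "finetuned": main.deepforest.load_from_checkpoint("released_finetuned.ckpt"),
    "local-only": main.deepforest.load_from_checkpoint("local_only.ckpt"),
}

for name, model in models.items():
    # evaluate() scores predictions against the ground-truth boxes in the test CSV
    results = model.evaluate(csv_file="test.csv", root_dir="images/", iou_threshold=0.4)
    print(name, results["box_precision"], results["box_recall"])
```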
We would greatly appreciate any clarification or insights you could offer on these matters. If there is something we have misunderstood, please do let us know. We value your contributions to this field and look forward to your response.