Home
Welcome to the orcagsoc wiki!
- Joint proposal
- Trello board
- Slack channel
- AI for Orcas project, including GSoC 2020 blog posts
- Report progress on goals from last week
- Discuss any blocking issues or strategic decisions (e.g. upcoming scheduled events, code reviews, etc.)
- Set new goals for next week
Kunal's update
- Working on Valentina’s guidance
- ROC (0.83, 0.2)
- Precision-recall plot
- Jesse: Prepare to automate the active learning
- Use argparse, or newer libraries like Click (more ergonomic than argparse) or Typer (requires newer Python versions), to create a command-line interface (a minimal Click sketch follows this list)
- Blocking:
- Error/exception in TensorBoard when training with more than 1 batch
- Abhishek will help troubleshoot via DM
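If Click is chosen, the command-line wrapper could look roughly like the sketch below; the command name, options, and training call are illustrative placeholders, not the project's actual interface.

```python
# Minimal sketch of a Click-based CLI for kicking off an active-learning round.
# The option names and the training call are hypothetical placeholders.
import click

@click.command()
@click.option("--data-dir", type=click.Path(exists=True), required=True,
              help="Directory containing labeled audio samples.")
@click.option("--batch-size", default=32, show_default=True, help="Samples per training batch.")
@click.option("--rounds", default=1, show_default=True, help="Number of active-learning rounds.")
def train(data_dir, batch_size, rounds):
    """Run the (hypothetical) active-learning training loop from the command line."""
    click.echo(f"Training for {rounds} round(s), batch size {batch_size}, data in {data_dir}")
    # ...call into the existing training code here...

if __name__ == "__main__":
    train()
```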
Diego's update
- Documenting API (orcagsoc/tree/feature/statistics/api)
- Added date in order to plot # sounds validated
- Idea of tracking speed of labeler (future feature when/if gamification is used to motivate citizen scientists?)
- Idea of tracking evolution of model performance along with # of sounds validated (possibly on same time-series graph?)
- Valentina: TensorBoard has examples of how to plot increases in performance (a minimal logging sketch follows this list)
- Here is an example of the sort of 3D plot that I referred to: https://jhui.github.io/2017/03/12/TensorBoard-visualize-your-learning/
- Valentina: consider a version-control service to store model parameter evolution
- Jesse: save to Drive or an S3 bucket (otherwise Colab resources will be exceeded)
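As a rough illustration of the TensorBoard suggestion above (assuming TensorFlow 2; the log directory and metric values are placeholders), one scalar per epoch is enough to get a performance-over-time curve:

```python
# Minimal sketch: write one validation metric per epoch so TensorBoard can plot
# model performance over time. The log directory and the values are placeholders.
import tensorflow as tf

writer = tf.summary.create_file_writer("logs/active_learning_round_1")
val_accuracy_per_epoch = [0.62, 0.68, 0.71, 0.74]  # placeholder metric values

with writer.as_default():
    for epoch, acc in enumerate(val_accuracy_per_epoch):
        tf.summary.scalar("val_accuracy", acc, step=epoch)
writer.flush()
# View with: tensorboard --logdir logs
```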
General discussion
- Kunal: looking ahead, after a few rounds of active learning, could we use a much larger non-validated set of predictions to train in subsequent rounds?
- Jesse: that is done, but it is preferable to validate at least some of the predictions
- Valentina: Ming used an algorithm to get predictions of beluga signals, then fed that training data to a deep learning model; other examples of good practice may be found in the click-detection literature.
- Valentina: Plot idea for machine learning scientists: the spread or distribution of prediction probabilities or scores across samples (e.g. lots of 0s and 1s with nothing in the middle) to show over-fitting vs confidence (a plotting sketch follows this list)...
- Ideas for other (domain expert or data owner) user options:
- Scott: Maybe specify what portion of your data set you want to validate during each active learning iteration?
- Jesse: Maybe good to indicate confidence for each prediction, or whether a threshold is met or not
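A rough sketch of Valentina's plot idea; the scores below are random placeholders standing in for model prediction output:

```python
# Minimal sketch: histogram of prediction scores. A pile-up at 0 and 1 with little
# in the middle indicates very confident (or over-fit) predictions.
import numpy as np
import matplotlib.pyplot as plt

scores = np.random.beta(0.3, 0.3, size=1000)  # placeholder for model prediction scores

plt.hist(scores, bins=50, range=(0, 1))
plt.xlabel("Predicted probability of an orca call")
plt.ylabel("Number of samples")
plt.title("Distribution of prediction scores")
plt.savefig("score_distribution.png")
```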
Scott's updates & questions:
- When/if to utilize Dan/Hannah data this summer, as well as how to iteratively improve Abhishek’s model and training/test data?
- Do either of you need more feedback from me, e.g. user feature specification and priority in the Trello board?
- How did the Pod.Cast team choose the format of the TSV files and the organization of the tarballs?
- How different are the Pod.Cast label format and metadata from other training data sets (in bioacoustics, generally)?
- DFO meeting #1 synopsis
- Oliver would like to join call on 2nd Fri in July
- DFO wants differentiation between ecotypes (SRKWs and Bigg’s)
- Are there any/many Bigg’s signals in the OrcaCNN data set?
- Abhishek: maybe, but if so very few
- Jesse/Abhishek: general stats/format of OrcaCNN data/labels?
- About 2000 KW labels (Abhishek generated samples; Dan provided small test set)
- Humpback train/test data from Monterey Bay (GPL-like usage, so not fully open)
Diego report
- Pytest implemented; tabling test of click for later (via Praful)
- enabled extension on backend
- tested on the Edge browser (fixed a bug); now works on Firefox & Chrome
- added code snippet to handle expertise tag
- deployed backend on Heroku, front end to GitHub Pages
- GitHub admin needs to publish
- Using Postman and pgAdmin
Diego goals
- Valentina: add documentation showing the Heroku set-up and how to deploy the Flask app on Linux and Docker (a minimal entry-point sketch follows this list)
- Jesse: document the API, including endpoints
- Charting libraries (8 default charts; will work with dummy data first)
- Valentina: look at TensorBoard charts (are ML measures useful for expert users, i.e. scientists like Hannah, or could they be simplified for a general audience?)
- Grids to analyze/verify confusion matrix results (e.g. true vs false positives)
- Valentina: plot model performance over time (choose 1 score to track during internal validation, e.g. for each epoch)
- Table JavaScript testing for 2 weeks. Scott suggestion: ask Praful for the Thursday hack group invite & timing (to jumpstart JS testing next week and/or the following week)
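For the Heroku/Linux/Docker documentation goal above, a minimal Flask entry point along these lines could serve as the running example in the docs; the file name and route are assumptions, not the actual orcagsoc code:

```python
# app.py -- minimal sketch of a Flask entry point that runs locally and on Heroku.
# Heroku injects the listening port via the PORT environment variable.
import os
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    # Hypothetical endpoint the deployment docs can use to verify the app is up.
    return jsonify(status="ok")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 5000)))
```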
Kunal report
- working on documentation
- Resnet 412 models, VGG16, Inception
- Has not used WHOI data, only Pod.Cast rounds 2 & 3
- Discussed pre-training on WHOI data vs other orca labeled data
Scott: in anticipation of experiments with different combinations of orca training data, add to orcadata wiki the size of related data sets (with links to them)?
- OrcaCNN (Alaskan residents)
- OrcaSPOT (NRKWs with data from Orchive)
- OBI Lime Kiln data (SRKWs)
Kunal goals
- Valentina: start looking into and documenting formats for importing/exporting models and performance comparisons (open-source formats? HDF5?)
- Valentina: plot model performance over time (choose 1 score to track during internal validation, e.g. for each epoch)
- Jesse: create a callback for checkpoints, but also an accuracy threshold (stop training if accuracy > 0.95); a minimal sketch follows this list
- Valentina: do a little more tuning, but the main reason for over-fitting is that we need more data…
- Jesse: ~70% accuracy is a good place to start, then try to improve through the active learning process
- Valentina: do you have more negatives that you haven’t used for training? If so, does the model suggest that some are “interesting” -- possibly ones that are near your decision boundary?
- Kunal: All Orcasound negatives have been used in training, but Ketos background sounds (from NRKWs) might be a possibility
- Scott: Let me know if Google Cloud services help with Colab logistics. This week Beam Reach (my social purpose corp) was granted credits that need to be used within the next year...
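A minimal sketch of the checkpoint-plus-threshold callbacks Jesse suggested; the tiny model and random data exist only to make the example runnable and are not the project's actual training code:

```python
# Minimal sketch: checkpoint weights each epoch (HDF5) and stop once accuracy > 0.95.
import numpy as np
import tensorflow as tf

class StopAtAccuracy(tf.keras.callbacks.Callback):
    """Stop training once the monitored accuracy passes a threshold."""
    def __init__(self, threshold=0.95, monitor="accuracy"):
        super().__init__()
        self.threshold = threshold
        self.monitor = monitor

    def on_epoch_end(self, epoch, logs=None):
        if (logs or {}).get(self.monitor, 0.0) > self.threshold:
            self.model.stop_training = True

# Placeholder model and data, only so the sketch runs end to end.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
x, y = np.random.rand(256, 20), np.random.randint(0, 2, size=256)

model.fit(
    x, y, epochs=10,
    callbacks=[
        tf.keras.callbacks.ModelCheckpoint("checkpoint.h5", save_weights_only=True),
        StopAtAccuracy(threshold=0.95),
    ],
)
```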
Mentor thoughts on process for weekly Friday meetings?
- Jesse: report on progress and blocking issues
- Scott: include goals for next week
- Valentina: also schedule (code) review events
Kunal report:
Scott chat links:
How to visualize the model performance?
- ROC curves
- Confusion matrices
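For reference, both visualizations above can be computed with scikit-learn; the labels and scores below are random placeholders standing in for validation-set outputs:

```python
# Minimal sketch: ROC curve points and a confusion matrix from scikit-learn.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

# Placeholder ground truth and model scores for a binary call / no-call task.
y_true = np.random.randint(0, 2, size=200)
y_score = np.clip(0.6 * y_true + 0.5 * np.random.rand(200), 0, 1)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", roc_auc_score(y_true, y_score))

# Hard predictions at a 0.5 threshold for the confusion matrix
# (rows = true class, columns = predicted class).
y_pred = (y_score >= 0.5).astype(int)
print(confusion_matrix(y_true, y_pred))
```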
Diego Q for Kunal: What is the difference in performance if MP3 is used instead of WAV?
Scott thought: Two experiment ideas to seek an answer --
- Stream both HLS and FLAC when SRKWs are next calling, &
- Go back to WAV files in training (e.g. Pod.Cast rounds) and convert the WAV samples to MP3, then re-run the model (see the conversion sketch below)... Ask Val for ideas, too...
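A small sketch of the second experiment, converting existing WAV samples to MP3 for a re-run; it uses pydub (which needs ffmpeg installed), and the directory names are placeholders rather than the actual Pod.Cast layout:

```python
# Minimal sketch: convert WAV training samples to MP3 so the model can be re-run on
# compressed audio. Paths are illustrative only.
from pathlib import Path
from pydub import AudioSegment

src_dir = Path("pod_cast_round3_wav")
dst_dir = Path("pod_cast_round3_mp3")
dst_dir.mkdir(parents=True, exist_ok=True)

for wav_path in src_dir.glob("*.wav"):
    clip = AudioSegment.from_wav(str(wav_path))
    clip.export(str(dst_dir / (wav_path.stem + ".mp3")), format="mp3", bitrate="192k")
```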
Kunal goals:
- VGG may be best, but also trying ResNet and a 2D convolutional model
- Jesse: look at how to make code reusable (e.g. Orca); Kunal will convert from Colab notebooks to Python scripts…
- Valentina: Add Markdown cells to document code (including organizing packages), and even images
- Abhishek: For next notebook, add subsections in notebook and a top-level README
Diego report:
- Added Bigg’s KWs to classification UI
- Added option to indicate experience level of labeler
- Table of labels, including mp3 filename, label, and user experience level
Diego goals:
- Test GUI with Kunal’s processed data (e.g. put it in S3 bucket)
- Jesse: include tests (for Flask you can use libraries to mock a POST and ensure something is returned) and embed them in continuous integration (a pytest sketch follows this list)
- Abhishek goal: did you have a chance to look at the JS library?
- Valentina: look into each cloud environment’s app service…
- Diego: Heroku is easier (GitHub integration vs SSH into an Ubuntu instance), but is more expensive
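A minimal pytest sketch of the kind of test Jesse described, using Flask's built-in test client to mock a POST; the /labels endpoint and payload are hypothetical, not the actual orcagsoc API:

```python
# Minimal sketch: post JSON to a (hypothetical) endpoint and assert on the response.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/labels", methods=["POST"])
def add_label():
    # Echo the payload back; the real endpoint would persist the label instead.
    return jsonify(received=request.get_json()), 201

def test_post_label_returns_created():
    client = app.test_client()
    response = client.post("/labels", json={"filename": "call.mp3", "label": "SRKW"})
    assert response.status_code == 201
    assert response.get_json()["received"]["label"] == "SRKW"
```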
**Diego updates:**
- UI branch w/ J, K, L and non-orca categories (bird, ship…)
- SV: send goals for “expert user (SRKW, orca)” to Diego
- Error testing Pod.Cast
- Goal: will compare w/ Valentina
- SV: share Akash/Prakruti emails?
Kunal updates:
- Will share notebook with the Ketos error
- Goals: figure out why the error occurs in Ketos (share w/ Jesse to document for Fabio/Oliver)
Abhishek: keep documenting in the orcagsoc repo README!
Val: experimenting with edge computing (with Fabio!)
Kunal, Val, and Scott discussed Kunal's initial call-modeling efforts, training with the Pod.Cast round 3 set, and Val's latest pre-processing approaches.
Scott's list of insights from the discussion: new open-source bioacoustic labeling tools should provide guidance about decisions made by domain experts (e.g. when validating predictions in a tool like Pod.Cast) and move towards standardization of annotation metadata:
- time bounds (fixed duration or variable procedure, start bound time vs signal start time, how much background noise included before/after...) and resolution
- frequency bounds and resolution
- Whether to exclude calls with clicks, or whistles, or snaps?
- What is a sufficient signal to noise ratio to qualify as a call vs a faint call vs a possible call?
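To make the standardization idea concrete, one hypothetical annotation row might carry fields like the ones below; these column names are illustrative only and are not the actual Pod.Cast format:

```python
# Minimal sketch: write one hypothetical annotation record to a TSV file.
import csv

fields = ["wav_filename", "start_time_s", "duration_s", "low_freq_hz", "high_freq_hz",
          "label", "signal_quality", "annotator_expertise"]
row = {
    "wav_filename": "live_feed_chunk_001.wav",
    "start_time_s": 12.4,            # start of the bounding box, not necessarily of the signal
    "duration_s": 2.1,               # fixed vs variable duration is one of the open questions
    "low_freq_hz": 500,
    "high_freq_hz": 8000,
    "label": "SRKW_call",
    "signal_quality": "faint",       # e.g. call / faint call / possible call
    "annotator_expertise": "expert",
}

with open("annotations.tsv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fields, delimiter="\t")
    writer.writeheader()
    writer.writerow(row)
```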
Kunal ended with a good question about what to do next to improve his model performance...