
fix/remove outlier predictions #1

Open
jmlondon opened this issue Oct 8, 2024 · 3 comments
jmlondon commented Oct 8, 2024

For both ribbon and spotted seals, there are 'outlier' predictions showing up in the final movement dataset that need to be addressed.

For example, the spotted seal predictions (pl_predict_pts) show points well into the southern hemisphere.

[image: map of spotted seal predicted locations, with outlier points extending into the southern hemisphere]

And the ribbon seals, while remaining in the northern hemisphere, have some relatively extreme smoothed predictions.

[image: map of ribbon seal smoothed predictions showing extreme excursions]

The likely culprits worth investigating:

  1. observations in the raw data that occur outside the deployment start date or the specified end date
  2. erroneous observations/location estimates
  3. very long time gaps between observed locations -- this is the most likely scenario, and it's worth looking at some recent code from Devin Johnson that removes these time gaps by splitting the track into separate segments
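Culprit 1 is straightforward to screen for. As a minimal sketch (illustrative Python, not project code; the function name and the (timestamp, lat, lon) field layout are assumptions, not the actual dataset schema), observations could be clipped to the deployment window like this:

```python
from datetime import datetime

def filter_to_deployment(obs, deploy_start, deploy_end):
    """Keep only observations whose timestamp falls inside the deployment window.

    `obs` is a list of (timestamp, lat, lon) tuples; this layout is
    illustrative, not the project's actual schema.
    """
    return [o for o in obs if deploy_start <= o[0] <= deploy_end]
```

Anything transmitted before deployment or after the specified end date would be dropped before model fitting.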
jmlondon self-assigned this Oct 8, 2024

jmlondon commented Oct 12, 2024

@emchuron and I discussed two approaches for handling long time gaps between observed locations:

  1. Split the sequence of observed locations into separate segments a priori. This would be done based on a specified maximum gap (e.g., 7 days) between observed locations, after which a new segment is designated. Each segment would be fit and predicted (and pseudo-tracks generated) independently before merging back as needed.
  2. Fit the complete track and rely on post hoc identification of time gaps. After fitting to the complete set of observations, we can identify gaps as before; predictions and pseudo-tracks are generated only for the periods outside the identified gaps.
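The a priori splitting rule in approach 1 could be sketched as follows (illustrative Python, not project code; `split_on_gaps` is hypothetical, and the 7-day default simply mirrors the example threshold above):

```python
from datetime import datetime, timedelta

def split_on_gaps(times, max_gap=timedelta(days=7)):
    """Assign a segment id to each observation timestamp.

    A new segment starts whenever the gap since the previous observation
    exceeds max_gap -- the a priori splitting rule of approach 1.
    """
    seg_ids, seg = [], 0
    for i, t in enumerate(times):
        if i > 0 and (t - times[i - 1]) > max_gap:
            seg += 1
        seg_ids.append(seg)
    return seg_ids
```

Each resulting segment id would then be fit and predicted independently.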

The initial consideration was to focus on the first approach because it seemed easier to implement and might result in better predictions/pseudo-tracks, since the gap periods wouldn't influence the model fit. After some experimentation, though, this approach leads to short segments that may not converge during the model fit. Imagine a stretch of 8 days with no observations, followed by 7-10 locations, and then another 8-day gap: fitting a model to just those 7-10 locations can be unreliable.

In most cases, the large time gaps are not resulting in poor model fits or convergence issues. Instead, the problem arises on the prediction side, when large, unrealistic correlated loops are generated across the gaps.

So, I think the second approach is the path worth pursuing, and here's what's needed to accomplish that.
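The post hoc approach could be sketched as follows (again illustrative Python, not project code; both function names are hypothetical): identify the gaps after fitting the complete track, then drop any predictions that fall inside them.

```python
from datetime import datetime, timedelta

def gap_intervals(obs_times, max_gap=timedelta(days=7)):
    """Find (start, end) pairs bounding observation gaps longer than max_gap."""
    return [(a, b) for a, b in zip(obs_times, obs_times[1:]) if (b - a) > max_gap]

def mask_predictions(pred_times, gaps):
    """Drop prediction timestamps that fall strictly inside any identified gap."""
    return [t for t in pred_times if not any(a < t < b for a, b in gaps)]
```

The full track still informs the fit, but the unrealistic correlated loops inside the gaps never reach the final prediction set.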

@emchuron

Sounds good to me! Let me know if I can help at all.

@jmlondon

There are still some 'outlier' data within the spotted seal deployments; some additional investigation is needed on this.
