Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

seaexplorer.raw_to_timeseries() outputs questionable navigation data #188

Open
clayton33 opened this issue Aug 29, 2024 · 15 comments
Open

Comments

@clayton33
Copy link

We're hoping to include heading, pitch, and roll in our data files. Below is an example figure comparing the processed data of these variables with the raw data.

processdVsRaw_navVars_731

I've deduced that the navigation data gets interpolated to the chosen timestamp using _interp_gli_to_pld. Is it attributed to the more coarse sampled navigation data being interpolated to the finer science data time ?

Also, to be honest, I'm not much of a python person, so I might be a bit useless to help if this is indeed an issue.

@jklymak
Copy link
Member

jklymak commented Aug 29, 2024

That's certainly "interesting". Can you share the raw files that made those, and maybe your deployment.yaml file?

@jklymak
Copy link
Member

jklymak commented Aug 29, 2024

Ping @callumrollo too in case this twigs anything for him. At some point we started using polars for the sea explorer and I didn't follow all the changes super closely.

@callumrollo
Copy link
Collaborator

Definitely not the behavior we would want! I remember some small changes in interpolated values when we switched from pandas to polars, but they were of the order 0.001 in the tests, nothing of this magnitude. I'll investigate this afternoon. The deployment.yaml file would be really useful, particularly the timebase

@clayton33
Copy link
Author

Just sent both of you a subset of the raw files, as well as the deployment.yaml in an email. Thanks again for having a look.

@callumrollo
Copy link
Collaborator

Hi @clayton33 thanks for sending the raw files. I think I've found the solution. As you described in your email, this mission did not sample science sensors on every row, so some data were discarded during processing. In the deployment yaml, you have:

  keep_variables:
  - conductivity
  - FREQUENCY_DOXY

so pyglider only keeps lines that have ctd or oxygen data. If you want it to keep data from the dives where no science was sampled, you can add one of the glider attitude parameters like roll

  keep_variables:
  - GLIDER_ROLL
  - conductivity
  - FREQUENCY_DOXY

When I made this change and reprocessed, it fixed the gaps in pitch/roll/heading, increading the timeseries file size by ~ 10 %

Does this solution work for you?

PS thanks for providing a fully reproducible example! It really sped up the troubleshooting process.

@callumrollo
Copy link
Collaborator

Using the following code to compare raw vs pyglider data:

import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt

ds = xr.open_dataset('L0-timeseries/dfo-Mersey022-20210531_delayed.nc')
df = pd.read_csv('raw/sea022.66.gli.sub.445', sep=';')
df['datetime'] = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %H:%M:%S')

fig, ax = plt.subplots(3, 1, figsize=(8, 12), sharex='col')
ax = ax.ravel()
ax[0].plot(df.datetime, df.Heading,  lw=5, label='Raw')
ax[0].plot(ds.time, ds.GLIDER_HEADING,label='Processed')
ax[0].legend()
ax[0].set(title='Heading', xlim=[df.datetime.min(), df.datetime.max()])

ax[1].plot(df.datetime, df.Roll,  lw=5, label='Raw')
ax[1].plot(ds.time, ds.GLIDER_ROLL,label='Processed')
ax[1].legend()
ax[1].set(title='Roll', xlim=[df.datetime.min(), df.datetime.max()])

ax[2].plot(df.datetime, df.Pitch,  lw=5, label='Raw')
ax[2].plot(ds.time, ds.GLIDER_PITCH,label='Processed')
ax[2].legend()
ax[2].set(title='Pitch', xlim=[df.datetime.min(), df.datetime.max()])

With original deplyment.yaml

old

With GLIDER_ROLL added to keep_variables:

new

@jklymak
Copy link
Member

jklymak commented Sep 2, 2024

@callumrollo Can this be better documented? For instance, I don't see any mention of keep_variables in any of the test yaml files, nor in the docs. I'll admit I'm somewhat flummoxed what this does - I think it is perhaps similar to what was being discussed for slocums, where folks want multiple time bases?

@callumrollo
Copy link
Collaborator

Yes it's not the most intuitive control. keep_variables works as an accessory to timebase, to give more fine-grained control over what lines pyglider keeps. I'm off on vacation today, but I'll make a note to update the docs to describe it when I get back

@clayton33
Copy link
Author

I now realize in my initial issue I should have stated the expected behavior (sorry ! I should know better).

I was expecting that the pitch, roll, and heading variables would be interpolated to the time of the data retained by the supplied keep_variables since . For my example data set (which I initially thought wasn't a good case, but I actually think it's the ideal one!), the goal is to keep as much science data (i.e. from the ctd and oxygen sensors). Is this possible, or is the only way to get sensible navigation data is to supply a variable to keep_variables (which is completely fine I think for now), and maybe do some post-processing (e.g. omit any data where there isn't science payload data).

@jklymak
Copy link
Member

jklymak commented Sep 3, 2024

@clayton33 Your intituition about how it is supposed to work was the original intention - I'm not entirely sure why that has changed.

We had a similar discussion about the slocum gliders, and my recommendation to folks is rather than keeping two timebases, for instance if they want all the heading data even for times when science is off, is that they make two separate netcdf files.

@callumrollo I'd call this a bug the way it is behaving now. I'm happy to try and fix it, or wait until you are back fro vacation.

@clayton33
Copy link
Author

deployment.zip

@callumrollo I tried your suggestion (adding one of the nav variables to keep_variables) and I'm still getting non-sensical pitch, roll, and heading values. I made sure to update all packages, and incorporate all recent changes in my fork... but still no luck. I've attached the updated deployment.yml if it helps.

@richardsc
Copy link
Collaborator

Hi folks -- just jumping in here that I tried running @clayton33's files through the latest version of pyglider, installed by following the instructions at https://pyglider.readthedocs.io/en/latest/Install.html, i.e. I did:

conda create -n gliderwork
conda activate gliderwork
conda install -c conda-forge pyglider

Followed by:

git clone [email protected]:richardsc/c-proof/pyglider.git

and then:

pip install -e .

When I tried running her process_delayed.py script, it errored with a "no module named polars" error. Indeed, polars had not been installed to the environment. After adding it to the environment manually (conda install polars), everything ran:

Downloads/GLD2021SEA022M66/$ python process_delayed.py
Created./rawnc/
Created./delayed_rawnc/
Created./L0-timeseries/
Created./L0-profiles/
Created./L0-grid/

and after running @callumrollo's plot script I get the expected result:
image

So, I think the issue here is not with pyglider but with @clayton33's environment or installed version of pyglider.

I'll open another issue related to the install of polars (I thought that it would have been taken care of by the environment.yml file).

@clayton33
Copy link
Author

@richardsc was right, it was an issue with my environment, disregard my comment above. It's working perfectly fine for me now.

@jklymak
Copy link
Member

jklymak commented Sep 12, 2024

Ok sorry for the packaging issues. I think the folks who use pyglider mostly use from source rather than relying on package managers because changes are still happening fast. I should update the docs and maybe try to make automated releases easier

@richardsc
Copy link
Collaborator

No worries Jody -- @clayton33 and I are both learning as we go, as we're not primarily "python people". We definitely are using the source versions of pyglider, but also relying on conda to install all the other dependencies, and sometimes things can get weird. The lesson we learned from this is "don't try and do everything in your base environment" 😄 .

I won't close the issue, because the last request for work was to update the docs and @callumrollo is currently on vacation. Or we could close this and open a new issue that better reflects what still needs to be done (as the original issue is completed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants