Translating CSIRO tests #130
Hi folks, Per @s-good's request in #146, here's another go at pulling these CSIRO translations together. A few (brief) comments:
The statistical test described here remains to be implemented. This test examines whether a profile falls too far outside a historical range for the region; the paper describes a multi-step automatic and manual QC process performed to build up sufficient statistics to construct these historical ranges. The semi-manual approach doesn't quite fit with what we're building here, but I was wondering if it would be possible to identify some static datafile describing the ranges found in that study, which we could apply without repeating their construction - perhaps @BecCowley knows if this is feasible? In any case, this statistical test can come in a separate PR - these tests will be ready for merge once the bullets above are addressed. |
The probe codes look similar to those on https://www.nodc.noaa.gov/GTSPP/document/codetbls/gtsppcodes/gtspp_type.html, although there is no UN in that table. If that is the case, BA means BATHY message, which refers to the format the data is transmitted in rather than the probe type. The crude assumption is that these are XBTs, although in practice data from other probe types might be sent using this format.
Alright well - |
Yes, it is a problem when only these two letter codes are available to work out the type of instrument. I think we can assume that if they have BA or XB in the Fortran they meant to select XBTs, so it sounds like what you have done is fine. |
@BillMills and @s-good apologies for not paying attention to this. BA is the GTS (global transmission system) version of XBTs. It is a 'bathy' message and is a low-resolution, non-QCd version of the full profile. Sometimes they are replaced by the high-resolution version, and sometimes not. There are possibly a lot of duplicates of low and high resolution. Here is a list of the codes:
Also, I attach a description of the MQuest netcdf format used for the Quota database (although I think the plan is for Tim to convert it to WOD format). I will look at your questions on the fortran in a bit more detail and get back to you with comments. |
Thanks, @BecCowley! Apologies for the very long comment above - I think it's all sorted out, with the exception of the final section regarding the spike test, implemented as *edit - also, I dropped the bottle temperature check, since in our case, these profiles will definitely be flagged by |
@BillMills. My interpretation of the logic of the short gradient/spike test: A 'short' gradient is a single spike or immediate gradient change over a small depth range. The code only looks for spikes greater than or equal to 0.2 and less than 0.5 deg C when the depth is less than or equal to 30m. The test only looks for a positive increase. However, in the Fortran code it is a loop, running down the temperature/depth profile, so it will pick up the spike/gradient whichever direction it goes: if the rogue point is at depth m and is positive, temp2s-temp1s will be positive when temp2s = m and temp1s = m-1. Translating this to python will probably need an absolute value. Moving down the profile, once the depth is greater than 30m, the gradients/spikes checked are less than 0.4 deg C.
Next, the code checks for the 'long' gradients, which are gradual inversions of positive temperature changes. These shouldn't be applied to the Southern Ocean because such inversions are normal there and we don't want to check them. However, I can't see the switch to check the latitude in the code. It might be somewhere else; I can hunt that down.
Fixing spikes at this stage is not a good idea. We need to check how efficient the code is at finding the spikes. Ultimately, we will want to interpolate over them, I think, but that might be up for discussion.
This reverts commit e7ac6ee.
Thanks so much, @BecCowley! I hadn't quite picked up on the long inversion, thanks for pointing it out. Two (hopefully last) questions:
Seems like this is checking for a depth difference of 30m between this and the next point, not an absolute depth of 30m - as always, you're the boss, happy to implement anything you like - just want to make sure I'm implementing things as intended.
Thanks again for all your help! |
@BillMills, yes, you are correct. I missed that subtlety. The check is looking for gradients between points with <30m depth differences. That is so we avoid falsely labelling gradients where the depth resolution is very low. The flag 'GL' gets added at lines 396-397 and is set at depth 'dgradl', which is at depth i. The test is for depth i+1 being higher in temperature than depth i. Note the flag is only applied if the 'gradlong' value is less than 4.
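For what it's worth, the reading agreed on here (compare each level with the next one down, and only when the two are less than 30m apart) could be sketched roughly as below. This is only an illustration, not the FORTRAN or the code in this PR: the function name, the list-of-booleans return value and the min_rise parameter are all hypothetical, the threshold is left to the caller because the thread quotes several values, and the 'gradlong' condition is left out because it isn't defined in this thread.

```python
def positive_gradient_flags(depths, temps, min_rise, max_gap=30.0):
    """Flag level i when level i+1 is warmer by at least min_rise (deg C)
    and the two levels are less than max_gap metres apart.

    Hypothetical helper: assumes depths are in metres and sorted from
    shallowest to deepest. min_rise is an assumption supplied by the caller.
    """
    flags = [False] * len(depths)
    for i in range(len(depths) - 1):
        gap = depths[i + 1] - depths[i]   # depth difference, not absolute depth
        rise = temps[i + 1] - temps[i]    # positive means warmer with depth
        if gap < max_gap and rise >= min_rise:
            flags[i] = True               # flag is set at depth i
    return flags


flags = positive_gradient_flags(
    depths=[0.0, 10.0, 20.0, 100.0],
    temps=[20.0, 19.0, 19.6, 21.0],
    min_rise=0.5,
)
```

In the example call, only the second level gets flagged: the jump between the last two levels is larger, but they are 80m apart, so the low-resolution pair is skipped.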
Thanks @BecCowley, looks good to me now - I think that should do it for v1! |
While #129 finished implementing the gradient, wire break and shallow XBT depth tests from CSIRO's paper, examining the underlying FORTRAN revealed a number of QC tests that weren't obvious from the text; this PR begins implementing them. Before we merge, comments from @BecCowley on the questions inline below would be much appreciated.
Constant Bottom Test
Constant bottom test checks if the two deepest temperatures in a profile are the same, and flags the bottom one only if they are from a latitude above -40, and only if they are more than 30m apart. Original FORTRAN:
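For comparison, the rule as worded above can be sketched in a few lines of Python. This is just an illustration of the description, not the contents of CSIRO_constant_bottom.py: the function name, the boolean-list return value and the is_xbt argument (standing in for the XB/BA probe check) are hypothetical, and levels are assumed to be sorted from shallowest to deepest.

```python
def constant_bottom_flags(depths, temps, latitude, is_xbt=True):
    """Flag the deepest level if it repeats the temperature above it.

    Hypothetical helper following the description in this PR:
    - the two deepest temperatures are identical,
    - the two levels are more than 30 m apart,
    - the profile latitude is above -40.
    is_xbt is an assumed stand-in for the probe-type (XB/BA) check.
    """
    flags = [False] * len(depths)
    if len(depths) < 2 or not is_xbt:
        return flags
    same_temperature = temps[-1] == temps[-2]
    far_apart = depths[-1] - depths[-2] > 30.0
    if same_temperature and far_apart and latitude > -40.0:
        flags[-1] = True  # only the bottom level is flagged
    return flags


flags = constant_bottom_flags([0.0, 10.0, 50.0], [20.0, 15.0, 15.0], latitude=-30.0)
```

With these inputs only the last level is flagged; moving the profile south of -40 or bringing the two deepest levels within 30m of each other clears the flag.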
Reimplemented in CSIRO_constant_bottom.py. Problem: what are these data type codes XB and BA, in terms of the WOD encoding described on table 2.13 page 90 of this document? I assume XB is XBT == 2, but I'm not sure what BA is.
Surface Spikes Test
Surface spikes test flags any level shallower than 4m, which is followed by another level shallower than 8m. Original FORTRAN:
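For comparison, my reading of the rule, taken literally, looks like the sketch below. It is an illustration only, not the contents of CSIRO_surface_spikes.py: the function name and return value are hypothetical, depths are assumed to be in metres and sorted from shallowest to deepest, and the temperature < 90 check is not included.

```python
def surface_spike_flags(depths):
    """Flag any level shallower than 4 m that is followed by a level
    shallower than 8 m.

    Hypothetical helper: a literal reading of the description in this PR,
    with no temperature check applied.
    """
    flags = [False] * len(depths)
    for i in range(len(depths) - 1):
        if depths[i] < 4.0 and depths[i + 1] < 8.0:
            flags[i] = True
    return flags


flags = surface_spike_flags([1.0, 3.0, 7.0, 20.0])
```

Here the first two levels are flagged, which shows the behaviour in question: read this way, the test picks out clusters of shallow levels rather than anything that looks like a spike in temperature.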
Reimplemented as CSIRO_surface_spikes.py. I'm a bit concerned about my reading of this test; as described, it seems to flag clusters of shallow levels, which doesn't much seem like a spike - I'm concerned I may have just misunderstood this test altogether. Our implementation omits the temperature check; checking temperature < 90 seems to be a flag check rather than an actual physical value check, since the FORTRAN uses temperature = 99.99 to flag suspicious temperatures.
Bottle Temperature Check
Bottle temperature check flags any level with temperature < -20 C, if it came from one of several detector types. Original FORTRAN:
Reimplemented in CSIRO_bottle.py. Like the first test, I'm not sure what the detector codes in the FORTRAN correspond to. I'm currently guessing:
BO == bottle, rosette or net == code 7 from the WOD table linked above
UN == underway == code 8
CT == CTD == code 4; note that there are several other types of CTD (XCTD and towed CTD) available in the WOD spec, not included in code 4.
Spike Test
Finally, the paper linked above briefly describes a spike test which I can't quite tease out of the FORTRAN. The paper describes:
Which isn't quite enough information for me to reconstruct this test. There are some lines of FORTRAN that roughly correspond to a spike check:
The first part looks to be flagging a positive jump of at least 0.5 C (not 1 C) in less than 30 m - which I assume is the high depth resolution criterion. Thereafter, fixspikes is applied to neighboring temperatures at least 0.4 C apart from each other (in either direction), but only as a subclause of the original condition that there be a positive 0.5 degree increase moving down the profile. Which is all well and good if it's what we want - but seems much more sophisticated than what the text is proposing. Any thoughts on how to proceed would be much appreciated, bearing in mind that at this stage we won't be proposing any modifications to the data (we're raising flags only for now), and that we want to be fairly loose in our criteria for suspicious data, so that the auto QC catches as close to all suspicious profiles as possible.
Many thanks for sharing these ideas, @BecCowley, and for any clarifications you can offer!