Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Translating CSIRO tests #130

Merged
merged 9 commits into from
Apr 7, 2016
Merged

Conversation

bkatiemills
Copy link
Member

While #129 finished implementing the gradient, wire break and shallow XBT depth test from CSIRO's paper, examining the underlying FORTRAN revealed a number of qc tests that weren't obvious from the text; this PR begins implementing them. Before we merge, comments from @BecCowley on the questions inline below would be much appreciated.

Constant Bottom Test

Constant bottom test checks if the two deepest temperatures in a profile are the same, and flags the bottom one only if they are from a latitude above -40, and only if they are more than 30m apart. Original FORTRAN:

c  check for constant temps at bottom of trace:

      if(data_type.eq.'XB'.or.data_type.eq.'BA')then
         if(indep.gt.1)then
            if(temp(indep).eq.temp(indep-1).and.latitude.gt.-40.)then
               if((depth(indep)-depth(indep-1).gt.30.)
     &  .and.(Qflags(indep).ne.'3'.and.Qflags(indep).ne.'4'))then
                  grad=.true.
                  ctr=.true.
                  Qflags(indep)='3'
               endif
            endif
         endif
      endif

Reimplemented in CSIRO_constant_bottom.py. Problem: what are these data type codes XB and BA, in terms of the WOD encoding described on table 2.13 page 90 of this document? I assume XB is XBT == 2, but I'm not sure what BA is.

Surface Spikes Test

Surface spikes test flags any level shallower than 4m, which is followed by another level shallower than 8m. Original FORTRAN:

c  check for surface spikes and remove:

      if(data_type.eq.'XB' .or. data_type.eq.'BA')then
         i=1
         do while(depth(i).lt.4. .and.temp(i).lt.90.)
            if(Qflags(i).ne.'3'.and.Qflags(i).ne.'4')then

               if(i.lt.indep-1)then
                  if(depth(i+1).le.8.)then
                     write (str10, '(f10.2)') temp(i)
             call add_hist ('CS', 'CSCB', 'grad', Up_Date, 
     &          'CS', 'TEMP', depth(i), str10, 0)
                     temp(i)=99.99
                     Qflags(i)='5'
                     grad=.true.
                     csa=.true.
                     ddcs=i
                  endif
               endif
            endif
            i=i+1
         enddo
      endif

 7788 continue

Reimplemented as CSIRO_surface_spikes.py. I'm a bit concerned about my reading of this test; as described, it seems to flag clusters of shallow levels, which doesn't much seem like a spike - I'm concerned I may have just misunderstood this test altogether. Our implementation omits the temperature check; checking temperature < 90 seems to be a flag check rather than an actual physical value check, since the FORTRAN uses temperature = 99.99 to flag suspicious temperatures.

Bottle Temperature Check

Bottle temperature check flags any level with temperature < -20 C, if it came from one of several detector types. Original FORTRAN:

c  check for bad bottle values:

      if(data_type.eq.'BO' .or. data_type.eq.'UN')then

         do i=1,indep
            if(temp(i).le.-20.)then
               temp(i)=99.99
               grad=.true.
               bbd=.true.
               dbbd=depth(i)
        call add_hist ('CS', 'AUTO', 'grad', Up_Date, 
     &      'BB', 'TEMP', dbbd, '   9999.99', 0)
c            print *,'added bb flag to BO at ',dbbd,' m'
            endif
         enddo
      endif
c     do same for CTDs but don't flag with BB
      if(data_type.eq.'CT') then
         do i=1,indep
            if(temp(i).le.-20.)then
c               print *,'temp le -20=',temp(i)
               temp(i)=99.99
               grad=.true.
               bbd=.true.
               dbbd=depth(i)
            endif
c               print *,'temp ge -20=',temp(i)
         enddo


      endif

Reimplemented in CSIRO_bottle.py. Like the first test, I'm not sure what the detector codes in the FORTRAN correspond to. I'm currently guessing:

  • BO == bottle, rossette or net == code 7 from the WOD table linked above
  • UN == underway == code 8
  • CT == CTD == code 4; note that there are several other types of CTD (XCTD and towed CTD) available in the WOD spec, not included in code 4.

Spike Test

Finally, the paper linked above briefly describes a spike test which I can't quite tease out of the FORTRAN. The paper describes:

Large single point spikes (> 1°C) are found and removed from the profile using interpolation for high-resolution profiles (e.g., Fig. 1) and replacement with a missing value if the profile is of low vertical resolution (e.g., Fig. 2).

Which isn't quite enough information for me to reconstruct this test. There are some lines of FORTRAN that roughly correspond to a spike check:

         if(temp2s-temp1s.eq.0.)goto 90
         if(abs(temp2s-temp1s).lt.0.20)goto 90  !ignore small spikes...
         gradshort=(depth2s-depth1s)/(temp2s-temp1s)
         if((temp2s-temp1s).ge.0.5 .and. (depth2s-depth1s).le. 30.)then
            if(tpq(i+1).eq.'3'.or.tpq(i+1).eq.'4')then

            else
               grad=.true.
               dgrads=depth1s
               grads=.true.
               if(abs(temp2s-temp1s).ge.0.4)then
                  if(data_type.eq.'BO')then !do this to just flag the BO spikes, not chop them

                     write (str102, '(f10.2)') temp2s

                     call add_hist ('CS', 'AUTO', 'grad', Up_Date, 
     &          'SP', 'TEMP', depth2s, str102, 0)

                  else
c                     print *,'calling fixspikes for non-bottles'
c                     print *,'t1,t2,d1,d2',temp1s,temp1s,depth1s,depth2s
                     call fixspikes(j,indep)
                  endif
               endif

The first part looks to be flagging a positive jump of at least 0.5 C (not 1 C) in less than 30 m - which I assume is the high depth resolution criteria. Thereafter, fixspikes is applied on neighboring temperatures at least 0.4 C apart from each other (in either direction), but only as a subclause of the original condition that there be a positive 0.5 degree increase moving down the profile. Which is all well and good if it's what we want - but seems much more sophisticated than what the text is proposing. Any thoughts on how to proceed, baring in mind that at this stage we won't be proposing any modifications to the data (we're raising flags only for now), and we want to be fairly loose in our criteria for suspicious data, so that the auto QC catches as close to all suspicious profiles as possible, would be much appreciated.

Many thanks for sharing these ideas, @BecCowley, and for any clarifications you can offer!

@bkatiemills
Copy link
Member Author

Hi folks,

Per @s-good's request in #146, here's another go at pulling these CSIRO translations together. A few (brief) comments:

  • From above, the only outstanding problem is the probe codes. I am currently guessing:
    • BA == ???
    • XB == XBT == code 2 from table 2.13 page 90 here
    • BO == bottle, rossette or net == code 7
    • UN == underway == code 8
    • CT == CTD == code 4; note that there are several other types of CTD (XCTD and towed CTD) available in the WOD spec, not included in code 4.
  • These tests extract the logic of the CSIRO tests as closely as possible, while simplifying them to just raise a flag or not - no cleaning, no spectrum of different flags. I tried to match the FORTRAN faithfully, but @BecCowley or others closer to the original work are best able to judge if we caught the key points.

The statistical test described here remains to be implemented. This test examines whether a profile falls too far outside a historical range for the region; the paper describes a multi-step automatic and manual QC process performed to build up sufficient statistics to construct these historical ranges. The semi-manual approach doesn't quite fit with what we're building here, but I was wondering if it would be possible to identify some static datafile describing the ranges found in that study, which we could apply without repeating their construction - perhaps @BecCowley knows if this is feasible?

In any case, this statistical test can come in a separate PR - these tests will be ready for merge once the bullets above are addressed.

@s-good
Copy link
Contributor

s-good commented Mar 20, 2016

The probe codes look similar to those on https://www.nodc.noaa.gov/GTSPP/document/codetbls/gtsppcodes/gtspp_type.html although there is no UN on that table. If that is the case BA means BATHY message which refers to the format the data is transmitted in rather than the probe type. The crude assumption is that these are XBTs, although in practise data from other probe types might be sent using this format.

@bkatiemills
Copy link
Member Author

Alright well - BA only ever appears or'ed together with XB in the original FORTRAN, so that would be consistent with wanting to consider data from an XBT; only having explicit XBT is a bit stricter, but perhaps still in the spirit of what was intended; let me know if any other detector types should be admitted.

@bkatiemills bkatiemills mentioned this pull request Mar 20, 2016
@s-good
Copy link
Contributor

s-good commented Mar 20, 2016

Yes, it is a problem when only these two letter codes are available to work out the type of instrument. I think we can assume that if they have BA or XB in the Fortran they meant to select XBTs, so it sounds like what you have done is fine.

@BecCowley
Copy link
Member

@BillMills and @s-good apologies for not paying attention to this.

BA is the GTS (global transmission system) version of XBTs. It is a 'bathy' message and is a low-resolution, non-QCd version of the full profile. Sometimes they are replaced by the high-resolution version, and sometimes not. There are possibly a lot of duplicates of low and high resolution.

Here is a list of the codes:
Data_type codes ('data_t' field in keys file)

BA - bathy
BO - bottle
BT - micro or digital bathythermograph (early xbt)?
CT - CTD
CU – CTD upcast
DB - drifting buoy
DT – digital thermograph
MB – MBT (mechanical bathythermograph)
MR - moored buoy (moorings)
ST – surface temperature
TE - tesac
UN - Unknown
UO - Undulating oceanic recorder
XB - XBT
XC – XCTD

Also, I attach a description of the MQuest netcdf format used for the Quota database (although I think the plan is for Tim to convert it to WOD format).
MQNC_format.txt

I will look at your questions on the fortran in a bit more detail and get back to you with comments.

@bkatiemills
Copy link
Member Author

Thanks, @BecCowley! Apologies for the very long comment above - I think it's all sorted out, with the exception of the final section regarding the spike test, implemented as CSIRO_short_gradient.py; hopefully I've captured the spirit of what was intended here, but I'm just concerned with whether I've made the right connection between the text and code.

*edit - also, I dropped the bottle temperature check, since in our case, these profiles will definitely be flagged by EN_range.

@BecCowley
Copy link
Member

@BillMills. My logic interpretation of the short gradient/spike test:

A 'short' gradient is a single spike or immediate gradient change over a small depth range.

The code only looks for spikes greater than or equal to 0.2 and less than 0.5. deg C when the depth is less than/equal to 30m. The test only looks for a positive increase. However, in the Fortran code it is a loop, running down the temperature/depth profile. So it will pick up the spike/gradient no matter which way the direction is:

If the rogue point is at depth m and is positive, temp2s-temp1s will be positive when temp2s = m and temp1s = m-1.
If the rogue point is at depth m and is negative, temp2s-temp1s will be positive when temp2s = m+1 and temp1s = m.

Translating this to python will probably utilise an absolute statement.

Moving down the profile, once the depth is greater than 30m, the gradients/spikes checked are less than 0.4degC.

Next, the code checks for the 'long' gradients which are gradual inversions of positive temperature changes. These shouldn't be applied to the Southern Ocean because such inversions are normal there and we don't want to check them. However, I can't see the switch to check the latitude in the code. It might be somewhere else, I can hunt that down.

Fixing spikes at this stage is not a good idea. We need to check how efficient the code is at finding the spikes. Ultimately, we will want to interpolate over them, I think, but that might be up for discussion.

@bkatiemills
Copy link
Member Author

Thanks so much, @BecCowley! I hadn't quite picked up on the long inversion, thanks for pointing it out. Two (hopefully last) questions:

  • short inversion: sorry if I'm just being very dense, but I'm looking at the line
if((temp2s-temp1s).ge.0.5 .and. (depth2s-depth1s).le. 30.)then

Seems like this is checking for a depth difference of 30m between this and the next point, not an absolute depth of 30m - as always, you're the boss, happy to implement anything you like - just want to make sure I'm implementing things as intended.

  • long inversion: given that all the conditions for flagging a long inversion level are satisfied, which level exactly is flagged? It looks like the one at the start of the inversion, but I may be wrong.

Thanks again for all your help!

@BecCowley
Copy link
Member

@BillMills, yes, you are correct. I missed that subtlety. The check is looking for gradients between points with <30m depth differences. That is so we avoid falsely labelling gradients where the depth resolution is very low.

The flag 'GL' gets added at line 396-397 and is set at depth 'dgradl' which is at depth i. The test is for depth i+1 being higher in temperature than depth i.

Note the flag is only applied if the 'gradlong' value is less than 4.

@bkatiemills bkatiemills merged commit 9281efc into IQuOD:master Apr 7, 2016
@bkatiemills
Copy link
Member Author

Thanks @BecCowley, looks good to me now - I think that should do it for v1!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants