Translating CSIRO tests #130

bkatiemills · 2015-12-24T21:18:11Z

While #129 finished implementing the gradient, wire break and shallow XBT depth test from CSIRO's paper, examining the underlying FORTRAN revealed a number of qc tests that weren't obvious from the text; this PR begins implementing them. Before we merge, comments from @BecCowley on the questions inline below would be much appreciated.

Constant Bottom Test

Constant bottom test checks if the two deepest temperatures in a profile are the same, and flags the bottom one only if they are from a latitude above -40, and only if they are more than 30m apart. Original FORTRAN:

c  check for constant temps at bottom of trace:

      if(data_type.eq.'XB'.or.data_type.eq.'BA')then
         if(indep.gt.1)then
            if(temp(indep).eq.temp(indep-1).and.latitude.gt.-40.)then
               if((depth(indep)-depth(indep-1).gt.30.)
     &  .and.(Qflags(indep).ne.'3'.and.Qflags(indep).ne.'4'))then
                  grad=.true.
                  ctr=.true.
                  Qflags(indep)='3'
               endif
            endif
         endif
      endif

Reimplemented in CSIRO_constant_bottom.py. Problem: what are these data type codes XB and BA, in terms of the WOD encoding described on table 2.13 page 90 of this document? I assume XB is XBT == 2, but I'm not sure what BA is.

Surface Spikes Test

Surface spikes test flags any level shallower than 4m, which is followed by another level shallower than 8m. Original FORTRAN:

c  check for surface spikes and remove:

      if(data_type.eq.'XB' .or. data_type.eq.'BA')then
         i=1
         do while(depth(i).lt.4. .and.temp(i).lt.90.)
            if(Qflags(i).ne.'3'.and.Qflags(i).ne.'4')then

               if(i.lt.indep-1)then
                  if(depth(i+1).le.8.)then
                     write (str10, '(f10.2)') temp(i)
             call add_hist ('CS', 'CSCB', 'grad', Up_Date, 
     &          'CS', 'TEMP', depth(i), str10, 0)
                     temp(i)=99.99
                     Qflags(i)='5'
                     grad=.true.
                     csa=.true.
                     ddcs=i
                  endif
               endif
            endif
            i=i+1
         enddo
      endif

 7788 continue

Reimplemented as CSIRO_surface_spikes.py. I'm a bit concerned about my reading of this test; as described, it seems to flag clusters of shallow levels, which doesn't much seem like a spike - I'm concerned I may have just misunderstood this test altogether. Our implementation omits the temperature check; checking temperature < 90 seems to be a flag check rather than an actual physical value check, since the FORTRAN uses temperature = 99.99 to flag suspicious temperatures.

Bottle Temperature Check

Bottle temperature check flags any level with temperature < -20 C, if it came from one of several detector types. Original FORTRAN:

c  check for bad bottle values:

      if(data_type.eq.'BO' .or. data_type.eq.'UN')then

         do i=1,indep
            if(temp(i).le.-20.)then
               temp(i)=99.99
               grad=.true.
               bbd=.true.
               dbbd=depth(i)
        call add_hist ('CS', 'AUTO', 'grad', Up_Date, 
     &      'BB', 'TEMP', dbbd, '   9999.99', 0)
c            print *,'added bb flag to BO at ',dbbd,' m'
            endif
         enddo
      endif
c     do same for CTDs but don't flag with BB
      if(data_type.eq.'CT') then
         do i=1,indep
            if(temp(i).le.-20.)then
c               print *,'temp le -20=',temp(i)
               temp(i)=99.99
               grad=.true.
               bbd=.true.
               dbbd=depth(i)
            endif
c               print *,'temp ge -20=',temp(i)
         enddo


      endif

Reimplemented in CSIRO_bottle.py. Like the first test, I'm not sure what the detector codes in the FORTRAN correspond to. I'm currently guessing:

BO == bottle, rossette or net == code 7 from the WOD table linked above
UN == underway == code 8
CT == CTD == code 4; note that there are several other types of CTD (XCTD and towed CTD) available in the WOD spec, not included in code 4.

Spike Test

Finally, the paper linked above briefly describes a spike test which I can't quite tease out of the FORTRAN. The paper describes:

Large single point spikes (> 1°C) are found and removed from the profile using interpolation for high-resolution profiles (e.g., Fig. 1) and replacement with a missing value if the profile is of low vertical resolution (e.g., Fig. 2).

Which isn't quite enough information for me to reconstruct this test. There are some lines of FORTRAN that roughly correspond to a spike check:

         if(temp2s-temp1s.eq.0.)goto 90
         if(abs(temp2s-temp1s).lt.0.20)goto 90  !ignore small spikes...
         gradshort=(depth2s-depth1s)/(temp2s-temp1s)
         if((temp2s-temp1s).ge.0.5 .and. (depth2s-depth1s).le. 30.)then
            if(tpq(i+1).eq.'3'.or.tpq(i+1).eq.'4')then

            else
               grad=.true.
               dgrads=depth1s
               grads=.true.
               if(abs(temp2s-temp1s).ge.0.4)then
                  if(data_type.eq.'BO')then !do this to just flag the BO spikes, not chop them

                     write (str102, '(f10.2)') temp2s

                     call add_hist ('CS', 'AUTO', 'grad', Up_Date, 
     &          'SP', 'TEMP', depth2s, str102, 0)

                  else
c                     print *,'calling fixspikes for non-bottles'
c                     print *,'t1,t2,d1,d2',temp1s,temp1s,depth1s,depth2s
                     call fixspikes(j,indep)
                  endif
               endif

The first part looks to be flagging a positive jump of at least 0.5 C (not 1 C) in less than 30 m - which I assume is the high depth resolution criteria. Thereafter, fixspikes is applied on neighboring temperatures at least 0.4 C apart from each other (in either direction), but only as a subclause of the original condition that there be a positive 0.5 degree increase moving down the profile. Which is all well and good if it's what we want - but seems much more sophisticated than what the text is proposing. Any thoughts on how to proceed, baring in mind that at this stage we won't be proposing any modifications to the data (we're raising flags only for now), and we want to be fairly loose in our criteria for suspicious data, so that the auto QC catches as close to all suspicious profiles as possible, would be much appreciated.

Many thanks for sharing these ideas, @BecCowley, and for any clarifications you can offer!

…ranslation

bkatiemills · 2016-03-20T20:07:50Z

Hi folks,

Per @s-good's request in #146, here's another go at pulling these CSIRO translations together. A few (brief) comments:

From above, the only outstanding problem is the probe codes. I am currently guessing:
- BA == ???
- XB == XBT == code 2 from table 2.13 page 90 here
- BO == bottle, rossette or net == code 7
- UN == underway == code 8
- CT == CTD == code 4; note that there are several other types of CTD (XCTD and towed CTD) available in the WOD spec, not included in code 4.
These tests extract the logic of the CSIRO tests as closely as possible, while simplifying them to just raise a flag or not - no cleaning, no spectrum of different flags. I tried to match the FORTRAN faithfully, but @BecCowley or others closer to the original work are best able to judge if we caught the key points.

The statistical test described here remains to be implemented. This test examines whether a profile falls too far outside a historical range for the region; the paper describes a multi-step automatic and manual QC process performed to build up sufficient statistics to construct these historical ranges. The semi-manual approach doesn't quite fit with what we're building here, but I was wondering if it would be possible to identify some static datafile describing the ranges found in that study, which we could apply without repeating their construction - perhaps @BecCowley knows if this is feasible?

In any case, this statistical test can come in a separate PR - these tests will be ready for merge once the bullets above are addressed.

s-good · 2016-03-20T20:56:54Z

The probe codes look similar to those on https://www.nodc.noaa.gov/GTSPP/document/codetbls/gtsppcodes/gtspp_type.html although there is no UN on that table. If that is the case BA means BATHY message which refers to the format the data is transmitted in rather than the probe type. The crude assumption is that these are XBTs, although in practise data from other probe types might be sent using this format.

bkatiemills · 2016-03-20T21:52:21Z

Alright well - BA only ever appears or'ed together with XB in the original FORTRAN, so that would be consistent with wanting to consider data from an XBT; only having explicit XBT is a bit stricter, but perhaps still in the spirit of what was intended; let me know if any other detector types should be admitted.

s-good · 2016-03-20T23:06:12Z

Yes, it is a problem when only these two letter codes are available to work out the type of instrument. I think we can assume that if they have BA or XB in the Fortran they meant to select XBTs, so it sounds like what you have done is fine.

BecCowley · 2016-03-29T22:50:55Z

@BillMills and @s-good apologies for not paying attention to this.

BA is the GTS (global transmission system) version of XBTs. It is a 'bathy' message and is a low-resolution, non-QCd version of the full profile. Sometimes they are replaced by the high-resolution version, and sometimes not. There are possibly a lot of duplicates of low and high resolution.

Here is a list of the codes:
Data_type codes ('data_t' field in keys file)

BA - bathy
BO - bottle
BT - micro or digital bathythermograph (early xbt)?
CT - CTD
CU – CTD upcast
DB - drifting buoy
DT – digital thermograph
MB – MBT (mechanical bathythermograph)
MR - moored buoy (moorings)
ST – surface temperature
TE - tesac
UN - Unknown
UO - Undulating oceanic recorder
XB - XBT
XC – XCTD

Also, I attach a description of the MQuest netcdf format used for the Quota database (although I think the plan is for Tim to convert it to WOD format).
MQNC_format.txt

I will look at your questions on the fortran in a bit more detail and get back to you with comments.

bkatiemills · 2016-03-29T23:27:12Z

Thanks, @BecCowley! Apologies for the very long comment above - I think it's all sorted out, with the exception of the final section regarding the spike test, implemented as CSIRO_short_gradient.py; hopefully I've captured the spirit of what was intended here, but I'm just concerned with whether I've made the right connection between the text and code.

*edit - also, I dropped the bottle temperature check, since in our case, these profiles will definitely be flagged by EN_range.

BecCowley · 2016-04-01T00:26:48Z

@BillMills. My logic interpretation of the short gradient/spike test:

A 'short' gradient is a single spike or immediate gradient change over a small depth range.

The code only looks for spikes greater than or equal to 0.2 and less than 0.5. deg C when the depth is less than/equal to 30m. The test only looks for a positive increase. However, in the Fortran code it is a loop, running down the temperature/depth profile. So it will pick up the spike/gradient no matter which way the direction is:

If the rogue point is at depth m and is positive, temp2s-temp1s will be positive when temp2s = m and temp1s = m-1.
If the rogue point is at depth m and is negative, temp2s-temp1s will be positive when temp2s = m+1 and temp1s = m.

Translating this to python will probably utilise an absolute statement.

Moving down the profile, once the depth is greater than 30m, the gradients/spikes checked are less than 0.4degC.

Next, the code checks for the 'long' gradients which are gradual inversions of positive temperature changes. These shouldn't be applied to the Southern Ocean because such inversions are normal there and we don't want to check them. However, I can't see the switch to check the latitude in the code. It might be somewhere else, I can hunt that down.

Fixing spikes at this stage is not a good idea. We need to check how efficient the code is at finding the spikes. Ultimately, we will want to interpolate over them, I think, but that might be up for discussion.

This reverts commit e7ac6ee.

bkatiemills · 2016-04-02T01:25:06Z

Thanks so much, @BecCowley! I hadn't quite picked up on the long inversion, thanks for pointing it out. Two (hopefully last) questions:

short inversion: sorry if I'm just being very dense, but I'm looking at the line

if((temp2s-temp1s).ge.0.5 .and. (depth2s-depth1s).le. 30.)then

Seems like this is checking for a depth difference of 30m between this and the next point, not an absolute depth of 30m - as always, you're the boss, happy to implement anything you like - just want to make sure I'm implementing things as intended.

long inversion: given that all the conditions for flagging a long inversion level are satisfied, which level exactly is flagged? It looks like the one at the start of the inversion, but I may be wrong.

Thanks again for all your help!

BecCowley · 2016-04-04T06:20:37Z

@BillMills, yes, you are correct. I missed that subtlety. The check is looking for gradients between points with <30m depth differences. That is so we avoid falsely labelling gradients where the depth resolution is very low.

The flag 'GL' gets added at line 396-397 and is set at depth 'dgradl' which is at depth i. The test is for depth i+1 being higher in temperature than depth i.

Note the flag is only applied if the 'gradlong' value is less than 4.

bkatiemills · 2016-04-07T02:10:10Z

Thanks @BecCowley, looks good to me now - I think that should do it for v1!

bkatiemills added 5 commits December 24, 2015 12:08

began some translations of qc tests from CSIRO work

cad74c0

Merge branch 'master' of https://github.com/IQuOD/AutoQC into CSIRO-t…

6fa76ef

…ranslation

surface spikes

91f3500

Merge branch 'master' of https://github.com/IQuOD/AutoQC into CSIRO-t…

ed1749a

…ranslation

polishing CSIRO tests

cd4145a

bkatiemills mentioned this pull request Mar 20, 2016

How are we doing? #146

Open

bkatiemills added 3 commits April 1, 2016 17:22

exploring new qc tests

e7ac6ee

added long inversoin

cd95a29

Revert "exploring new qc tests"

75b1b92

This reverts commit e7ac6ee.

completes CSIRO implementation

95343df

bkatiemills merged commit 9281efc into IQuOD:master Apr 7, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Translating CSIRO tests #130

Translating CSIRO tests #130

bkatiemills commented Dec 24, 2015

bkatiemills commented Mar 20, 2016

s-good commented Mar 20, 2016

bkatiemills commented Mar 20, 2016

s-good commented Mar 20, 2016

BecCowley commented Mar 29, 2016

bkatiemills commented Mar 29, 2016

BecCowley commented Apr 1, 2016

bkatiemills commented Apr 2, 2016

BecCowley commented Apr 4, 2016

bkatiemills commented Apr 7, 2016

Translating CSIRO tests #130

Translating CSIRO tests #130

Conversation

bkatiemills commented Dec 24, 2015

Constant Bottom Test

Surface Spikes Test

Bottle Temperature Check

Spike Test

bkatiemills commented Mar 20, 2016

s-good commented Mar 20, 2016

bkatiemills commented Mar 20, 2016

s-good commented Mar 20, 2016

BecCowley commented Mar 29, 2016

bkatiemills commented Mar 29, 2016

BecCowley commented Apr 1, 2016

bkatiemills commented Apr 2, 2016

BecCowley commented Apr 4, 2016

bkatiemills commented Apr 7, 2016