diff --git a/deployment.html b/deployment.html index c9e2bdd..361c334 100644 --- a/deployment.html +++ b/deployment.html @@ -2181,8 +2181,8 @@
01 Feb, 2024 | 13.5 | -78.0 | -OCEANOLOG | +🇨🇦 OCEANOLOG | 300534062474370 | 4902609 | AI2600-22CA007 | @@ -2350,7 +2350,7 @@20 Apr, 2024 | 43.5 | -58.0 | -TELEOST | +🇨🇦 TELEOST | 300125061376360 | 4902622 | P41305-22CA003 | @@ -2362,7 +2362,7 @@25 Apr, 2024 | 42.5 | -61.5 | -TELEOST | +🇨🇦 TELEOST | 300125010910310 | 4902674 | P41305-23CA001 | @@ -2374,7 +2374,7 @@30 Apr, 2024 | 43.5 | -57.0 | -TELEOST | +🇨🇦 TELEOST | 300125061376210 | 4902621 | P41305-22CA002 | @@ -2386,7 +2386,7 @@14 May, 2024 | 56.0 | -52.0 | -CELTIC EXPLORER | +🇮🇪 CELTIC EXPLORER | 300125062423120 | 4902686 | P53865-23CA003 | @@ -2398,7 +2398,7 @@16 May, 2024 | 56.5 | -51.5 | -CELTIC EXPLORER | +🇮🇪 CELTIC EXPLORER | 300125062031400 | 4902687 | P53865-23CA004 | @@ -2410,7 +2410,7 @@18 May, 2024 | 57.0 | -51.0 | -CELTIC EXPLORER | +🇮🇪 CELTIC EXPLORER | 300125062035430 | 4902688 | P53865-23CA005 | @@ -2422,7 +2422,7 @@26 May, 2024 | 50.0 | -145.0 | -JOHN P. TULLY | +🇨🇦 JOHN P. TULLY | 300125062909910 | 4902690 | P53875-23CA001 | @@ -2434,7 +2434,7 @@08 Jun, 2024 | 60.0 | -54.0 | -CAPT. JACQUES CARTIER | +🇨🇦 CAPT. JACQUES CARTIER | 300125010915310 | 4902675 | P41305-23CA002 | @@ -2446,7 +2446,7 @@11 Jun, 2024 | 60.0 | -54.5 | -CAPT. JACQUES CARTIER | +🇨🇦 CAPT. JACQUES CARTIER | 300125010915300 | 4902676 | P41305-23CA003 | @@ -2458,7 +2458,7 @@14 Jun, 2024 | 60.0 | -55.0 | -CAPT. JACQUES CARTIER | +🇨🇦 CAPT. JACQUES CARTIER | 300125010917310 | 4902677 | P41305-23CA004 | @@ -2470,7 +2470,7 @@14 Jun, 2024 | 60.0 | -55.5 | -CAPT. JACQUES CARTIER | +🇨🇦 CAPT. JACQUES CARTIER | 300125010912280 | 4902678 | P41305-23CA005 | @@ -2524,7 +2524,7 @@4 | 2 | 0 | -10 | +8 |
Argo Canada | @@ -2591,7 +2591,7 @@IOS | 7 | 0 | -10 | +8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Argo Canada | diff --git a/deployment/inventory.csv b/deployment/inventory.csv index 6ddad15..19e5113 100644 --- a/deployment/inventory.csv +++ b/deployment/inventory.csv @@ -1,6 +1,6 @@ Program,Institute,ARVOR,ARVOR+DO,ARVOR-RBR,Deep ARVOR,PROVOR Argo Canada,BIO,3,1,0,0,8 -Argo Canada,IOS,1,4,2,0,10 +Argo Canada,IOS,1,4,2,0,8 Argo Canada,SAEON,5,1,0,0,0 Argo ONC,ONC,0,0,0,11,0 Argo Dalhousie,Dal,0,0,0,0,3 diff --git a/deployment/ship_info.csv b/deployment/ship_info.csv new file mode 100644 index 0000000..aa3e712 --- /dev/null +++ b/deployment/ship_info.csv @@ -0,0 +1,8 @@ +Ship,Country +LAURA BASSI,Italy +OCEANOLOG,Canada +HESPERIDES,Spain +TELEOST,Canada +CELTIC EXPLORER,Ireland +JOHN P. TULLY,Canada +CAPT. JACQUES CARTIER,Canada \ No newline at end of file diff --git a/notes/notes.json b/notes/notes.json index 1d01c13..04e73cc 100644 --- a/notes/notes.json +++ b/notes/notes.json @@ -13,7 +13,7 @@ "categories": [], "contents": "\n\n\n\nUpdate/Takeaways from ADMT\nCHLA_ADJUSTED: Raphaelle Suazede presented on adjusting chlorophyll using a factor from a lookup table (date, location) or calculated based on first 5 cycles. Current process is to divide by 2 for all data (Roesler 2017). Simple to implement in python RTQC once we have LUT.\nRT adjustment of DOXY is a high priority. Can provide a gain value after 5 cycles, and populate DOXY_ADJUSTED in RT at python RTQC step. Chris needs to talk to Anh about this - new VMS variable and populating SCIENTIFIC_CALIB_COMMENT.\nCycle Timing\nCanadian floats are almost all compliant with time of day sampling. Only floats that appeared in Drucker/Riser audit list were NOVA, deep ARVOR floats and 2 very recently deployed ARVORs that have been reprogrammed to 235hr cycles already.\n\n\n\n\n\n\nDiscussion on indicating time of day sampling regime in the meta file by John Gilson via CONFIG_MISSION_COMMENT. Not totally clear to me but I think we are meant to set it to -1 (i.e. TOD sampling off?).\nPresented our method for actively reprogramming PROVOR floats to effectively result in 235 hour cycles.\n\n\n\nPROVOR CTS5 floats: the new floats will require new syntax to reprogram, and also carry radiometry sensors. Plan will be to cycle through 24hr cycle plus extra local noon surfacings through the year.\n\n\n\nRTQC Of PROVOR CTS5 Floats\nNeed to add radiometry RTQC to python setup - just range test, very simple\nDMQC Updates\n4 floats ready for submission, going to do a couple more and submit as a batch\nMaking an effort to do 1-2/week right now\nWill need to spend some time validating in-air method in code\nUpon DMQC, updating DOXY_QC=1 to DOXY_QC=3. 
Should we do that for all files more urgently?\nDeployment Summary\nAll completed and planned deployments can be seen on the development blog planning page.\nSo far in 2023, 24 floats have been deployed:\n5 PROVOR / 14 ARVOR / 5 deep ARVOR\n11 BIO / 8 IOS / 5 ONC\n3 RBR / 21 SBE\nIn the remainder of 2023 and into early 2024, 13 more deployments are planned:\n2 PROVOR CTS5 / 9 ARVOR / 3 deep ARVOR\n7 BIO / 2 IOS / 2 ONC / 2 Dalhousie\n4 RBR / 9 SBE\nNotable deployments:\nThe crew of the DISCOVERY will be deploying 6 floats on their way to Cape Town, South Africa.\nAn additional 6 floats (not in table) will be delivered to a vessel in Cape Town via the DISCOVERY for deployment in the Western Indian Ocean, date TBD.\nThe HMCS VANCOUVER will deploy 2 floats near Hawaii\nThe 2 PROVOR CTS5 floats will be deployed in the Lab Sea from the MERIAN, being loaded on November 21.\nThe sailing vessel OCEANOLOG will deploy an ARVOR float as part of Sail for Science\n2 deep ARVOR floats will be deployed from the HESPERIDES in the Southern Ocean (more floats to avoid ice!)\nAn additional 5 floats (not in the table) will be sent on a joint Italian/NZ voyage to the Ross Sea\nBeaufort Floats\nFloat Programming\nThe two floats deployed in the Beaufort Sea (4902610 and 4902611) were updated with new ice sensing algorithm (ISA) parameters. The parameters were provided by Steve Jayne based on data analysis of the Beaufort Gyre.\nISA works by detecting a temperature threshold over a certain depth range. The suggested range was between 20-10dbar with a threshold of -1.21deg C (90% success rate of avoiding ice when present) or -1.03 deg C (95% success rate).\nI ran the parameter change by Jerome at NKE. He was cautious about it as the depth range Steve provided was shallower than their default setting (-1.79 deg C between 50 and 20dbar), but said that the changes looked fine in terms of reprogramming the float:\n!MC 2 235 # cycle timing\n!MC 3 235 # cycle timing phase 2, inaccessible\n!MC 17 475 # surface/mid threshold \n!MC 18 1000 # mid/bottom threshold\n!MC 20 5 # mid resolution\n!MC 21 10 # bottom resolution\n!IC 2 3 # number of detections to confirm ice, default\n!IC 3 20 # start pressure detection\n!IC 4 10 # stop pressure detection\n!IC 5 -1210 # temperature threshold, millidegrees\nBoth floats successfully received the parameter change, but I cannot see ice parameter values in the messages. May consult NKE on this.\nPreliminary Data\nFloat 4902610 collected 2 profiles before it stopped reporting (last report Oct 1, 2023). Float 4902611 has collected 3 (last reported Oct 25, 2023).\n\n\n\n\n\n\n\n\n\n\n\n\n", "preview": {}, - "last_modified": "2024-02-13T18:11:56+00:00", + "last_modified": "2024-02-21T18:35:47+00:00", "input_file": {} } ] diff --git a/posts/posts.json b/posts/posts.json index 92bbbe5..2b8fe98 100644 --- a/posts/posts.json +++ b/posts/posts.json @@ -13,7 +13,7 @@ "categories": [], "contents": "\nIn this post I will examine the updated backscatter testing described in Dall’Olmo et al. 2023 and its implementation in medsrtqc, a python implementation of Argo Real Time Quality Control (RTQC). In its current state, the package contains the RTQC tests for chlorophyll (CHLA), backscatter (BBP) and pH, as well as many of the standard tests listed in the official vocabulary.\nAnalysis Structure\nWe will use the same floats and cycles shown as examples in Dall’Olmo et al., discuss the expected results, and then examine the results given by the package.\nFor each test, code for that test is shown in python. 
This code is incomplete, as it is just snippets of the tests written in medsrtqc, however the code structure and variable names should be instructive should anyone want to re-purpose these functions.\nA custom plotting function to show the flags of each profile is used throughout the post. The definition of that function can be found at the bottom of the post.\nThe floats we will be analyzing will have already been QC’ed (in fact, most have been DMQC’ed), so most data will already be flagged appropriately, or in some cases the QC flags may have been elevated by the DMQC operator. For this reason, we will fully reset all flags to 0 to simulate the actual RTQC result.\nMissing Data Test\nThe missing data test bins the profile and checks how many of those bins are populated. Good data will have all bins populated, probably bad data will have multiple bins with no data, and bad data will have only one bin with data.\nThe test is written as follows in python:\n\n# missing data test\nself.log('Performing missing data test')\n# find amount of data in each bin\nbins = [0, 50, 156, 261, 367, 472, 578, 683, 789, 894, 1000]\nhist, bins = np.histogram(bbp.pres, bins=bins)\n# 1 or more bin empty, flag as probably bad (3)\nnew_flag = Flag.PROBABLY_BAD if sum(hist == 0) > 1 else Flag.GOOD\n# all but 1 empty, flag as bad (4)\nnew_flag = Flag.BAD if sum(hist != 0) == 1 else new_flag\n# all empty, flag as missing (9)\nnew_flag = Flag.MISSING if all(hist == 0) else new_flag\n# update flags and log\nFlag.update_safely(bbp.qc, new_flag)\nself.log(f'Missing data test results: flags set to {new_flag}')\n\nThe examples shown in the paper are on floats 1901339 cycle 1, 4900561 cycle 8, and 6901004 cycle 41. We expect to see the full profile flagged as 3 (probably bad), 3, and 4 (bad) respectively.\n\nfrom medsrtqc.qc.util import ResetQCOperation\nfrom medsrtqc.qc.bbp import bbpTest\nfrom medsrtqc.nc import read_nc_profile\nfrom medsrtqc.qc.check import preTestCheck\n\n# example files\nfiles = [\"BD1901339_001.nc\", \"BR7900561_008.nc\", \"BD6901004_041.nc\"]\n# fig/axes to plot results\nfig, axes = plt.subplots(1, 3, sharey=True, constrained_layout=True)\n\ncheck = preTestCheck()\nbbp = bbpTest()\n\n# loop through each profile\nfor fn, ax in zip(files, axes):\n nc = read_nc_profile(\"data/\" + fn, mode=\"r+\")\n ResetQCOperation().run(nc)\n tests = check.run(nc)\n \n print(\"before: \", nc[\"BBP700\"])\n nc.prepare(tests)\n \n bbp.run(nc)\n print(\"after: \", nc[\"BBP700\"])\n \n g = plot_profile_flags(nc, ax=ax, ylim=(1200, 0), xlim=(-0.002, 0.015))\n nc.close()\nbefore: Trace(\n value=[0.0025575757, 0.0022508001, 0.0028643121, [25 values], 0.00084325275, 0.00070981274, 0.00081648194],\n qc=[b'0', b'0', b'0', [25 values], b'0', b'0', b'0'],\n adjusted=[masked, masked, masked, [25 values], masked, masked, masked],\n adjusted_qc=[masked, masked, masked, [25 values], masked, masked, masked],\n adjusted_error=[masked, masked, masked, [25 values], masked, masked, masked],\n pres=[7.61, 11.83, 21.68, [25 values], 281.02, 291.23, 300.71],\n mtime=[masked, masked, masked, [25 values], masked, masked, masked]\n)\nafter: Trace(\n value=[0.0025575757, 0.0022508001, 0.0028643121, [25 values], 0.00084325275, 0.00070981274, 0.00081648194],\n qc=[b'3', b'3', b'3', [25 values], b'3', b'3', b'3'],\n adjusted=[masked, masked, masked, [25 values], masked, masked, masked],\n adjusted_qc=[masked, masked, masked, [25 values], masked, masked, masked],\n adjusted_error=[masked, masked, masked, [25 values], masked, masked, masked],\n 
pres=[7.61, 11.83, 21.68, [25 values], 281.02, 291.23, 300.71],\n mtime=[masked, masked, masked, [25 values], masked, masked, masked]\n)\nbefore: Trace(\n value=[0.0004414144, 0.00048095727, 0.00046771928, [137 values], 0.00033030732, 0.00036985305, 0.0003302439],\n qc=[b'0', b'0', b'0', [137 values], b'0', b'0', b'0'],\n adjusted=[0.0004414144, 0.00048095727, 0.00046771928, [137 values], 0.00033030732, 0.00036985305, 0.0003302439],\n adjusted_qc=[b'4', b'4', b'4', [137 values], b'4', b'4', b'4'],\n adjusted_error=[masked, masked, masked, [137 values], masked, masked, masked],\n pres=[267.06625, 272.18, 277.255, [137 values], 982.7982, 987.80884, 991.61237],\n mtime=[masked, masked, masked, [137 values], masked, masked, masked]\n)\nafter: Trace(\n value=[0.0004414144, 0.00048095727, 0.00046771928, [137 values], 0.00033030732, 0.00036985305, 0.0003302439],\n qc=[b'3', b'3', b'3', [137 values], b'3', b'3', b'3'],\n adjusted=[0.0004414144, 0.00048095727, 0.00046771928, [137 values], 0.00033030732, 0.00036985305, 0.0003302439],\n adjusted_qc=[b'4', b'4', b'4', [137 values], b'4', b'4', b'4'],\n adjusted_error=[masked, masked, masked, [137 values], masked, masked, masked],\n pres=[267.06625, 272.18, 277.255, [137 values], 982.7982, 987.80884, 991.61237],\n mtime=[masked, masked, masked, [137 values], masked, masked, masked]\n)\nbefore: Trace(\n value=[0.0004893391, -0.00025918605, 0.0006870628, [1101 values], 0.00033398485, 0.001548573, 0.00043284672],\n qc=[b'0', b'0', b'0', [1101 values], b'0', b'0', b'0'],\n adjusted=[masked, masked, masked, [1101 values], masked, masked, masked],\n adjusted_qc=[b'4', b'4', b'4', [1101 values], b'4', b'4', b'4'],\n adjusted_error=[masked, masked, masked, [1101 values], masked, masked, masked],\n pres=[107.5, 107.5, 107.5, [1101 values], 107.5, 107.5, 107.5],\n mtime=[masked, masked, masked, [1101 values], masked, masked, masked]\n)\nafter: Trace(\n value=[0.0004893391, -0.00025918605, 0.0006870628, [1101 values], 0.00033398485, 0.001548573, 0.00043284672],\n qc=[b'4', b'4', b'4', [1101 values], b'4', b'4', b'4'],\n adjusted=[masked, masked, masked, [1101 values], masked, masked, masked],\n adjusted_qc=[b'4', b'4', b'4', [1101 values], b'4', b'4', b'4'],\n adjusted_error=[masked, masked, masked, [1101 values], masked, masked, masked],\n pres=[107.5, 107.5, 107.5, [1101 values], 107.5, 107.5, 107.5],\n mtime=[masked, masked, masked, [1101 values], masked, masked, masked]\n)\n\n[medsrtqc.nc.NetCDFProfile log] Setting previously unset flags for BBP to GOOD\n[medsrtqc.nc.NetCDFProfile log] Performing missing data test\n[medsrtqc.nc.NetCDFProfile log] Missing data test results: flags set to 3\n[medsrtqc.nc.NetCDFProfile log] Performing high deep value\n[medsrtqc.nc.NetCDFProfile log] Calculating running median over window size 5\n[medsrtqc.nc.NetCDFProfile log] High deep value test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing noisy profile test\n[medsrtqc.nc.NetCDFProfile log] Noisy profile test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing negative bbp test\n[medsrtqc.nc.NetCDFProfile log] Negative bbp test result: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Applying global range test to BBP\n[medsrtqc.nc.NetCDFProfile log] Performing negative spike test\n[medsrtqc.nc.NetCDFProfile log] Calculating running median over window size 5\n[medsrtqc.nc.NetCDFProfile log] Performing stuck value test on bbp\n[medsrtqc.nc.NetCDFProfile log] Setting previously unset flags for BBP to GOOD\n[medsrtqc.nc.NetCDFProfile log] 
Performing missing data test\n[medsrtqc.nc.NetCDFProfile log] Missing data test results: flags set to 3\n[medsrtqc.nc.NetCDFProfile log] Performing high deep value\n[medsrtqc.nc.NetCDFProfile log] Calculating running median over window size 5\n[medsrtqc.nc.NetCDFProfile log] High deep value test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing noisy profile test\n[medsrtqc.nc.NetCDFProfile log] Noisy profile test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing negative bbp test\n[medsrtqc.nc.NetCDFProfile log] Negative bbp test result: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing parking hook test\n[medsrtqc.nc.NetCDFProfile log] Parking hook test results: 0 points set to 4\n[medsrtqc.nc.NetCDFProfile log] Applying global range test to BBP\n[medsrtqc.nc.NetCDFProfile log] Performing negative spike test\n[medsrtqc.nc.NetCDFProfile log] Calculating running median over window size 5\n[medsrtqc.nc.NetCDFProfile log] Performing stuck value test on bbp\n[medsrtqc.nc.NetCDFProfile log] Setting previously unset flags for BBP to GOOD\n[medsrtqc.nc.NetCDFProfile log] Performing missing data test\n[medsrtqc.nc.NetCDFProfile log] Missing data test results: flags set to 4\n[medsrtqc.nc.NetCDFProfile log] Performing high deep value\n[medsrtqc.nc.NetCDFProfile log] Calculating running median over window size 5\n[medsrtqc.nc.NetCDFProfile log] High deep value test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing noisy profile test\n[medsrtqc.nc.NetCDFProfile log] Noisy profile test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing negative bbp test\n[medsrtqc.nc.NetCDFProfile log] Negative bbp test result: flags set to 3\n[medsrtqc.nc.NetCDFProfile log] Applying global range test to BBP\n[medsrtqc.nc.NetCDFProfile log] Performing negative spike test\n[medsrtqc.nc.NetCDFProfile log] Calculating running median over window size 5\n[medsrtqc.nc.NetCDFProfile log] Performing stuck value test on bbp\nplt.show()\n\n\nSummary\n✅ BD1901339_001: Flags are set to 3 as expected.\n✅ BD7900561_008: Flags are set to 3 as expected.\n✅ BD6901004_041: Flags are set to 4 as expected.\nHigh Deep Value Test\nThe high deep value test aims to flag profiles with anomalously high BBP values at depth. This could be a symptom of biofouling, poor calibration, or sensor error. These could also be valid data, in which case the flags would be returned to a value of 1 or 2 by the D-mode operator. The test checks if median-filtered BBP is higher than a threshold value of 5e-4.\nThe test is written as follows in python:\n\n# high deep value test\nself.log('Performing high deep value')\n# compute median, check which are above threshold value\nmedian_bbp = self.running_median(5)\nhigh_deep_value = (sum(bbp.pres > 700) > 5) & (np.nanmedian(median_bbp[bbp.pres > 700]) > 5e-4)\n# update to 3 if there are high deep values\nnew_flag = Flag.PROBABLY_BAD if high_deep_value else Flag.GOOD\nFlag.update_safely(bbp.qc, new_flag)\nself.log(f'High deep value test results: flags set to {new_flag}')\n\nThe example shown in the paper is on float 3901531 cycle 125. 
We expect the flags to be set to 3.\n\nfn = \"BD3901531_125.nc\"\nfig, ax = plt.subplots(constrained_layout=True)\nfig.set_size_inches(fig.get_figwidth()*2/5, fig.get_figheight())\n\nnc = read_nc_profile(\"data/\" + fn, mode=\"r+\")\nResetQCOperation().run(nc)\ntests = check.run(nc)\n\nprint(\"before: \", nc[\"BBP700\"])\nbefore: Trace(\n value=[0.009043367, 0.027663978, 0.021865454, [1082 values], 0.0023098632, 0.0025140366, 0.002350698],\n qc=[b'0', b'0', b'0', [1082 values], b'0', b'0', b'0'],\n adjusted=[0.009043367, 0.027663978, 0.021865454, [1082 values], 0.0023098632, 0.0025140366, 0.002350698],\n adjusted_qc=[b'3', b'3', b'3', [1082 values], b'3', b'4', b'4'],\n adjusted_error=[masked, masked, masked, [1082 values], masked, masked, masked],\n pres=[0.0, 0.0, 0.0, [1082 values], 1017.7, 1017.7, 1017.5],\n mtime=[masked, masked, masked, [1082 values], masked, masked, masked]\n)\nnc.prepare(tests)\nbbp.run(nc)\n[medsrtqc.nc.NetCDFProfile log] Setting previously unset flags for BBP to GOOD\n[medsrtqc.nc.NetCDFProfile log] Performing missing data test\n[medsrtqc.nc.NetCDFProfile log] Missing data test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing high deep value\n[medsrtqc.nc.NetCDFProfile log] Calculating running median over window size 5\n[medsrtqc.nc.NetCDFProfile log] High deep value test results: flags set to 3\n[medsrtqc.nc.NetCDFProfile log] Performing noisy profile test\n[medsrtqc.nc.NetCDFProfile log] Noisy profile test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing negative bbp test\n[medsrtqc.nc.NetCDFProfile log] Negative bbp test result: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing parking hook test\n[medsrtqc.nc.NetCDFProfile log] Parking hook test results: 12 points set to 4\n[medsrtqc.nc.NetCDFProfile log] Applying global range test to BBP\n[medsrtqc.nc.NetCDFProfile log] Performing negative spike test\n[medsrtqc.nc.NetCDFProfile log] Calculating running median over window size 5\n[medsrtqc.nc.NetCDFProfile log] Performing stuck value test on bbp\nprint(\"after: \", nc[\"BBP700\"])\nafter: Trace(\n value=[0.009043367, 0.027663978, 0.021865454, [1082 values], 0.0023098632, 0.0025140366, 0.002350698],\n qc=[b'3', b'3', b'3', [1082 values], b'3', b'4', b'4'],\n adjusted=[0.009043367, 0.027663978, 0.021865454, [1082 values], 0.0023098632, 0.0025140366, 0.002350698],\n adjusted_qc=[b'3', b'3', b'3', [1082 values], b'3', b'4', b'4'],\n adjusted_error=[masked, masked, masked, [1082 values], masked, masked, masked],\n pres=[0.0, 0.0, 0.0, [1082 values], 1017.7, 1017.7, 1017.5],\n mtime=[masked, masked, masked, [1082 values], masked, masked, masked]\n)\ng = plot_profile_flags(nc, ax=ax, ylim=(1200, 0), xlim=(-0.002, 0.015))\nnc.close()\nplt.show()\n\n\nSummary\n✅ BD3901531_125: Flags are set to 3 as expected. 
Some points at the bottom also fail the Parking Hook Test.\nNoisy Profile Test\nFlag profiles that are affected by noisy data by checking the portion of residuals to the median profile above a threshold value.\nThe test is written as follows in python:\n\nself.log('Performing noisy profile test')\n# below surface\ndeep_ix = bbp.pres > 100\n# compute residuals below surface\nresidual = bbp.value - median_bbp\nhigh_residuals = residual > 0.0005\nhigh_residuals = high_residuals[deep_ix]\n# portion of residuals\npct_residuals = 100*sum(high_residuals)/len(high_residuals)\nmany_high_residuals = pct_residuals > 10\n# if there are a lot of high residuals, flag as 3\nnew_flag = Flag.PROBABLY_BAD if many_high_residuals else Flag.GOOD\nFlag.update_safely(bbp.qc, new_flag)\nself.log(f'Noisy profile test results: flags set to {new_flag}')\n\nOn float 6901151 cycle 79 we expect to flag to profile as 3:\n\nfn = \"BD6901151_079.nc\"\nfig, ax = plt.subplots(constrained_layout=True)\nfig.set_size_inches(fig.get_figwidth()*2/5, fig.get_figheight())\n\nnc = read_nc_profile(\"data/\" + fn, mode=\"r+\")\nResetQCOperation().run(nc)\ntests = check.run(nc)\n\nprint(\"before: \", nc[\"BBP700\"])\nbefore: Trace(\n value=[0.0074264565, 0.0022428727, 0.0032238269, [328 values], 0.0036023555, 0.0008325885, 0.0008325483],\n qc=[b'0', b'0', b'0', [328 values], b'0', b'0', b'0'],\n adjusted=[0.0074264565, 0.0022428727, 0.0032238269, [328 values], 0.0036023555, 0.0008325885, 0.0008325483],\n adjusted_qc=[b'3', b'3', b'3', [328 values], b'3', b'3', b'3'],\n adjusted_error=[masked, masked, masked, [328 values], masked, masked, masked],\n pres=[0.7, 1.1, 1.2, [328 values], 977.6, 989.2, 993.8],\n mtime=[masked, masked, masked, [328 values], masked, masked, masked]\n)\nnc.prepare(tests)\nbbp.run(nc)\n[medsrtqc.nc.NetCDFProfile log] Setting previously unset flags for BBP to GOOD\n[medsrtqc.nc.NetCDFProfile log] Performing missing data test\n[medsrtqc.nc.NetCDFProfile log] Missing data test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing high deep value\n[medsrtqc.nc.NetCDFProfile log] Calculating running median over window size 5\n[medsrtqc.nc.NetCDFProfile log] High deep value test results: flags set to 3\n[medsrtqc.nc.NetCDFProfile log] Performing noisy profile test\n[medsrtqc.nc.NetCDFProfile log] Noisy profile test results: flags set to 3\n[medsrtqc.nc.NetCDFProfile log] Performing negative bbp test\n[medsrtqc.nc.NetCDFProfile log] Negative bbp test result: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing parking hook test\n[medsrtqc.nc.NetCDFProfile log] Parking hook test results: 1 points set to 4\n[medsrtqc.nc.NetCDFProfile log] Applying global range test to BBP\n[medsrtqc.nc.NetCDFProfile log] Performing negative spike test\n[medsrtqc.nc.NetCDFProfile log] Calculating running median over window size 5\n[medsrtqc.nc.NetCDFProfile log] Performing stuck value test on bbp\nprint(\"after: \", nc[\"BBP700\"])\nafter: Trace(\n value=[0.0074264565, 0.0022428727, 0.0032238269, [328 values], 0.0036023555, 0.0008325885, 0.0008325483],\n qc=[b'3', b'3', b'3', [328 values], b'3', b'3', b'3'],\n adjusted=[0.0074264565, 0.0022428727, 0.0032238269, [328 values], 0.0036023555, 0.0008325885, 0.0008325483],\n adjusted_qc=[b'3', b'3', b'3', [328 values], b'3', b'3', b'3'],\n adjusted_error=[masked, masked, masked, [328 values], masked, masked, masked],\n pres=[0.7, 1.1, 1.2, [328 values], 977.6, 989.2, 993.8],\n mtime=[masked, masked, masked, [328 values], masked, masked, masked]\n)\ng = 
plot_profile_flags(nc, ax=ax, ylim=(1200, 0), xlim=(-0.002, 0.015))\nnc.close()\nplt.show()\n\n\nSummary\n✅ BD6901151_079: Flags are set to 3 as expected. One point also fails the Parking Hook Test.\nNegative BBP Test\nThe objective of this test is to flag negative backscatter values. Negative values in the top 5dbar will be flagged as 4 as they most likely represent in-air backscatter samples. If there are other negative values, the profile will be flagged as 3, or if the profile consists of more than 10% negative values, the profile is flagged as 4.\nThe test is written as follows in python:\n\nself.log('Performing negative bbp test')\n# negative points at the very surface\nshallow_and_negative = (bbp.pres < 5) & (bbp.value < 0)\nif any(shallow_and_negative):\n self.log(f'Negative bbp test results: shallow negative flags set to {Flag.BAD}')\n# update surface flags\nFlag.update_safely(bbp.qc, Flag.BAD, where=shallow_and_negative)\n# negative values in rest of profile\ndeep_and_negative = (bbp.pres > 5) & (bbp.value < 0)\npct_negative = 100*sum(deep_and_negative)/len(deep_and_negative)\n# if more than 10 pct are negative\nmany_negative = pct_negative > 10\n# flag as 3 if any negative\nnew_flag = Flag.PROBABLY_BAD if any(deep_and_negative) else Flag.GOOD\n# flag as 4 if > 10% negative\nnew_flag = Flag.BAD if many_negative else new_flag\nFlag.update_safely(bbp.qc, new_flag)\nself.log(f'Negative bbp test result: flags set to {new_flag}')\n\nFor this test we will check 2 floats: 6901654 cycle 56 and 6901151 cycle 7. For the first profile, there is only a surface negative value, which should be flagged as 4 but the rest of the profile should be fine. For the second profile, there are negative deep values so the whole profile should be flagged 3 or 4 depending on the portion of negative data.\n\n# example files\nfiles = [\"BD6901654_056.nc\", \"BD6901151_007.nc\"]\n# fig/axes to plot results\nfig, axes = plt.subplots(1, 2, sharey=True, constrained_layout=True)\nfig.set_size_inches(fig.get_figwidth()*4/5, fig.get_figheight())\n\n# loop through each profile\nfor fn, ax in zip(files, axes):\n nc = read_nc_profile(\"data/\" + fn, mode=\"r+\")\n ResetQCOperation().run(nc)\n tests = check.run(nc)\n \n print(\"before: \", nc[\"BBP700\"])\n nc.prepare(tests)\n \n bbp.run(nc)\n print(\"after: \", nc[\"BBP700\"])\n \n g = plot_profile_flags(nc, ax=ax, ylim=(1200, 0), xlim=(-0.002, 0.005))\n nc.close()\nbefore: Trace(\n value=[-0.0004011667, 0.00046792105, 0.00046791686, [1190 values], 0.00025597008, 0.00024438222, 0.00025597008],\n qc=[b'0', b'0', b'0', [1190 values], b'0', b'0', b'0'],\n adjusted=[-0.0004011667, 0.00046792105, 0.00046791686, [1190 values], 0.00025597008, 0.00024438222, 0.00025597008],\n adjusted_qc=[b'4', b'1', b'1', [1190 values], b'1', b'1', b'1'],\n adjusted_error=[masked, masked, masked, [1190 values], masked, masked, masked],\n pres=[0.6, 0.7, 1.1, [1190 values], 986.7, 986.7, 986.7],\n mtime=[masked, masked, masked, [1190 values], masked, masked, masked]\n)\nafter: Trace(\n value=[-0.0004011667, 0.00046792105, 0.00046791686, [1190 values], 0.00025597008, 0.00024438222, 0.00025597008],\n qc=[b'4', b'1', b'1', [1190 values], b'1', b'1', b'1'],\n adjusted=[-0.0004011667, 0.00046792105, 0.00046791686, [1190 values], 0.00025597008, 0.00024438222, 0.00025597008],\n adjusted_qc=[b'4', b'1', b'1', [1190 values], b'1', b'1', b'1'],\n adjusted_error=[masked, masked, masked, [1190 values], masked, masked, masked],\n pres=[0.6, 0.7, 1.1, [1190 values], 986.7, 986.7, 986.7],\n mtime=[masked, 
masked, masked, [1190 values], masked, masked, masked]\n)\nbefore: Trace(\n value=[0.00080094114, 0.00082017767, 0.00083941227, [354 values], -0.0003456371, -0.0002303965, -0.00021136053],\n qc=[b'0', b'0', b'0', [354 values], b'0', b'0', b'0'],\n adjusted=[0.00080094114, 0.00082017767, 0.00083941227, [354 values], -0.0003456371, -0.0002303965, -0.00021136053],\n adjusted_qc=[b'4', b'4', b'4', [354 values], b'4', b'4', b'4'],\n adjusted_error=[masked, masked, masked, [354 values], masked, masked, masked],\n pres=[0.6, 1.1, 1.2, [354 values], 1840.2, 1903.7, 1970.2],\n mtime=[masked, masked, masked, [354 values], masked, masked, masked]\n)\nafter: Trace(\n value=[0.00080094114, 0.00082017767, 0.00083941227, [354 values], -0.0003456371, -0.0002303965, -0.00021136053],\n qc=[b'4', b'4', b'4', [354 values], b'4', b'4', b'4'],\n adjusted=[0.00080094114, 0.00082017767, 0.00083941227, [354 values], -0.0003456371, -0.0002303965, -0.00021136053],\n adjusted_qc=[b'4', b'4', b'4', [354 values], b'4', b'4', b'4'],\n adjusted_error=[masked, masked, masked, [354 values], masked, masked, masked],\n pres=[0.6, 1.1, 1.2, [354 values], 1840.2, 1903.7, 1970.2],\n mtime=[masked, masked, masked, [354 values], masked, masked, masked]\n)\n\n[medsrtqc.nc.NetCDFProfile log] Setting previously unset flags for BBP to GOOD\n[medsrtqc.nc.NetCDFProfile log] Performing missing data test\n[medsrtqc.nc.NetCDFProfile log] Missing data test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing high deep value\n[medsrtqc.nc.NetCDFProfile log] Calculating running median over window size 5\n[medsrtqc.nc.NetCDFProfile log] High deep value test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing noisy profile test\n[medsrtqc.nc.NetCDFProfile log] Noisy profile test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing negative bbp test\n[medsrtqc.nc.NetCDFProfile log] Negative bbp test results: shallow negative flags set to 4\n[medsrtqc.nc.NetCDFProfile log] Negative bbp test result: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing parking hook test\n[medsrtqc.nc.NetCDFProfile log] Parking hook test results: 1 points set to 4\n[medsrtqc.nc.NetCDFProfile log] Applying global range test to BBP\n[medsrtqc.nc.NetCDFProfile log] Performing negative spike test\n[medsrtqc.nc.NetCDFProfile log] Calculating running median over window size 5\n[medsrtqc.nc.NetCDFProfile log] Performing stuck value test on bbp\n[medsrtqc.nc.NetCDFProfile log] Setting previously unset flags for BBP to GOOD\n[medsrtqc.nc.NetCDFProfile log] Performing missing data test\n[medsrtqc.nc.NetCDFProfile log] Missing data test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing high deep value\n[medsrtqc.nc.NetCDFProfile log] Calculating running median over window size 5\n[medsrtqc.nc.NetCDFProfile log] High deep value test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing noisy profile test\n[medsrtqc.nc.NetCDFProfile log] Noisy profile test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing negative bbp test\n[medsrtqc.nc.NetCDFProfile log] Negative bbp test result: flags set to 4\n[medsrtqc.nc.NetCDFProfile log] Applying global range test to BBP\n[medsrtqc.nc.NetCDFProfile log] Performing negative spike test\n[medsrtqc.nc.NetCDFProfile log] Calculating running median over window size 5\n[medsrtqc.nc.NetCDFProfile log] Performing stuck value test on bbp\nplt.show()\n\n\nSummary\n✅ BD6901654_056: Top negative point marked as bad, rest of profile ok. 
One point also fails the Parking Hook Test.\n✅ BD6901151_007: Whole profile marked as bad as expected.\nParking Hook Test\nThe parking hook test applies to profiles that have the same parking depth as their profile depth. In this configuration, there is no descent phase before the beginning of the profile, but instead the float immediately starts sampling and ascending toward the surface. Accumulated particles during the parking stage cause a distinctive hook at the bottom of the profile before they are released off the sensor back into the water. Although these data can be useful to expert users, they are not the expected data a user would want to find in an Argo profile and so the goal of this test is to flag these anomalous points.\nThe test is written as follows in python:\n\nif ascending:\n pres = bbp.pres\n pres[np.abs(pres) > 6000] = np.nan # remove fill values of 99999\n pres = np.sort(pres)\n # calculate deepest difference\n deepest_diff = pres[-1] - pres[-2]\n # deep points closer than 20m\n if deepest_diff < 20:\n parking_pres = 1000 # placeholder, assumed for now\n # difference between parking pressure and \n parking_diff = np.abs(pres[-1] - parking_pres)\n if parking_diff < 100:\n self.log('Performing parking hook test')\n # check for high points in deepest 50m of the profile\n ix = (bbp.pres < (pres[-1] - 20)) & (bbp.pres > (pres[-1] - 50))\n baseline = np.median(bbp.value[ix]) + 0.0002\n deep_above_baseline = (bbp.pres > (pres[-1] - 50)) & (bbp.value > baseline)\n all_passed = all_passed and not any(deep_above_baseline)\n # update only the bottom flags\n Flag.update_safely(bbp.qc, Flag.BAD, where=deep_above_baseline)\n self.log(f'Parking hook test results: flags set to {new_flag}')\n\nThe example profile for this test is float 6903197 cycle 26.\n\nfn = \"BD6903197_026.nc\"\nfig, ax = plt.subplots(constrained_layout=True)\nfig.set_size_inches(fig.get_figwidth()*2/5, fig.get_figheight())\n\nnc = read_nc_profile(\"data/\" + fn, mode=\"r+\")\nResetQCOperation().run(nc)\ntests = check.run(nc)\n\nprint(\"before: \", nc[\"BBP700\"])\nbefore: Trace(\n value=[0.0013924582, 0.00164023, 0.0015381566, [436 values], 0.00073923037, 0.00094326853, 0.0018410363],\n qc=[b'0', b'0', b'0', [436 values], b'0', b'0', b'0'],\n adjusted=[0.0013924582, 0.00164023, 0.0015381566, [436 values], 0.00073923037, 0.00094326853, 0.0018410363],\n adjusted_qc=[b'1', b'1', b'1', [436 values], b'1', b'1', b'1'],\n adjusted_error=[0.00027849164, 0.00032804598, 0.00030763133, [436 values], 0.00014784608, 0.0001886537, 0.00036820726],\n pres=[0.3, 1.0, 1.2, [436 values], 932.8, 932.8, 932.8],\n mtime=[masked, masked, masked, [436 values], masked, masked, masked]\n)\nnc.prepare(tests)\nbbp.run(nc)\n[medsrtqc.nc.NetCDFProfile log] Setting previously unset flags for BBP to GOOD\n[medsrtqc.nc.NetCDFProfile log] Performing missing data test\n[medsrtqc.nc.NetCDFProfile log] Missing data test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing high deep value\n[medsrtqc.nc.NetCDFProfile log] Calculating running median over window size 5\n[medsrtqc.nc.NetCDFProfile log] High deep value test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing noisy profile test\n[medsrtqc.nc.NetCDFProfile log] Noisy profile test results: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing negative bbp test\n[medsrtqc.nc.NetCDFProfile log] Negative bbp test result: flags set to 1\n[medsrtqc.nc.NetCDFProfile log] Performing parking hook test\n[medsrtqc.nc.NetCDFProfile log] Parking hook test results: 9 
points set to 4\n[medsrtqc.nc.NetCDFProfile log] Applying global range test to BBP\n[medsrtqc.nc.NetCDFProfile log] Performing negative spike test\n[medsrtqc.nc.NetCDFProfile log] Calculating running median over window size 5\n[medsrtqc.nc.NetCDFProfile log] Performing stuck value test on bbp\nprint(\"after: \", nc[\"BBP700\"])\nafter: Trace(\n value=[0.0013924582, 0.00164023, 0.0015381566, [436 values], 0.00073923037, 0.00094326853, 0.0018410363],\n qc=[b'1', b'1', b'1', [436 values], b'4', b'4', b'4'],\n adjusted=[0.0013924582, 0.00164023, 0.0015381566, [436 values], 0.00073923037, 0.00094326853, 0.0018410363],\n adjusted_qc=[b'1', b'1', b'1', [436 values], b'1', b'1', b'1'],\n adjusted_error=[0.00027849164, 0.00032804598, 0.00030763133, [436 values], 0.00014784608, 0.0001886537, 0.00036820726],\n pres=[0.3, 1.0, 1.2, [436 values], 932.8, 932.8, 932.8],\n mtime=[masked, masked, masked, [436 values], masked, masked, masked]\n)\ng = plot_profile_flags(nc, ax=ax, ylim=(1070, 800), xlim=(-0.0005, 0.003))\nnc.close()\nplt.show()\n\n\nSummary\n✅ BD6903197_027: 9 high value points at depth flagged as bad.\nAppendix\nPlotting function\n\ndef plot_profile_flags(nc, ax=None, ylim=(2000, 0), xlim=\"auto\"):\n if ax is None:\n fig, ax = plt.subplots()\n\n # put netcdf Profile into dataframe\n df = pd.DataFrame(dict(\n PRES=nc[\"BBP700\"].pres, BBP700=nc[\"BBP700\"].value, \n QC=[s.decode() for s in nc[\"BBP700\"].qc]\n ))\n\n # plot results\n sns.lineplot(\n data=df, x=\"BBP700\", y=\"PRES\", \n color=\"k\", ax=ax, sort=False, \n legend=False, estimator=None, zorder=0\n )\n g = sns.scatterplot(\n data=df, x=\"BBP700\", y=\"PRES\", hue=\"QC\", \n hue_order=(\"0\", \"1\", \"2\", \"3\", \"4\"), ax=ax, zorder=1\n )\n \n ax.set_xlim(xlim)\n ax.set_ylim(ylim)\n\n return g\n\n\n\n\n", "preview": "posts/2023-12-05-implementing-updated-backscatter-rtqc/implementing-updated-backscatter-rtqc_files/figure-html5/missing data test-1.png", - "last_modified": "2024-02-13T18:11:56+00:00", + "last_modified": "2024-02-21T18:35:47+00:00", "input_file": {}, "preview_width": 1248, "preview_height": 768 @@ -32,7 +32,7 @@ "categories": [], "contents": "\nThis post is analogous to a presentation made at the 24th Argo Data Management Team meeting which can be found here.\nBackground & Motivation\nArgo floats have operated on, and been well served by, a 10-day cycle since its inception in the late 1990’s. Recently, however, it has been recommended that Argo floats begin to operate on near but not exact 10-day cycles in order to reduce biases. Reducing these biases is important to calculations of ocean heat content (Gille (2012), von Schuckmann et al. 2020) as well as primary productivity via biogeochemcial variables (Johnson & Bif 2021, Stoer & Fennel 2022).\nArgo Canada has made a significant effort to ensure good time-of-day coverage from its floats in the last 2 years (read more: Diversifying Argo Surfacing Times). As part of that effort, a protocol using python and github actions was developed to update the surfacing time of NKE PROVOR floats after each cycle.\nUnlike many other floats, PROVOR floats use a time of day parameter along with cycle time to determine their surfacing time. With other floats, setting the cycle time to say, 10 days plus 5 hours, would result in a different surfacing time with each cycle. With PROVOR floats however, because the time of day parameter controls the surfacing time, it must be updated after each cycle in order to achieve good time of day coverage throughout the lifetime of the float. 
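As a minimal sketch (for illustration only, not the actual ArgoCanada code), the default update described below amounts to a simple delta: the target local surfacing hour is shifted 5 hours earlier each cycle and wrapped around the clock.\n\ndef next_surface_hour(previous_hour, delta_hours=-5):\n    # shift the target local surfacing hour by a fixed delta, wrapping around the 24 hour clock\n    return (previous_hour + delta_hours) % 24\n\nnext_surface_hour(4)  # 23, consistent with the example log further below\n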
This parameter can be reprogrammed remotely, but doing so manually quickly becomes impossible to manage for even a few different PROVOR floats.\nFortunately the method for reprogramming a PROVOR float is to upload a text file to the remote command directory of that float’s RUDICS FTP server, where the float’s data also arrives. This means the problem is well suited to be solved programmatically.\nCode Framework\nThe following pseudocode outlines a script that runs once per day via github actions:\nLog into RUDICS FTP server1\nCheck which floats have reported within the last 24 hours\nFor those that have reported recently, create a command file with the new surface time2\nUpload the command file to the FTP server\nLog the actions taken for which float3\n1Login credentials are stored using github secrets, which allow you to securely call the stored variables within a github action without worry of data leaks\n2The method of deciding the next target time may be defined by the user, but by default is a simple delta (ex. most recent time minus 5 hours). See next section for details\n3Logged information uses the float’s IMEI number, which is how the directories are defined by default. The command files are also archived. Example log: [2023-10-18] Updated 300125061656740 surfacing time from 4 to 23\nTiming definitions & floats with Radiometry\nSo far, Argo Canada has been operating its PROVOR floats by making the surface time 5 hours earlier than the previous one. This effectively results in a 235 hour cycle. This choice of cycle time was made as (1) it provides good time of day coverage, (2) it was suggested by the manufacturer that slightly reducing cycle time rather than slightly increasing it may better maximize the float’s battery life in terms of number of completed cycles, and (3) Argo Canada is not currently operating any floats that are equipped with radiometry sensors.\nFor floats that have radiometry sensors, it is best to sample at local noon. To both comply with varying time of day and operate the float to more frequently profile at local noon time, the user can define for example a list of times for the float to cycle through that includes extra noon time profiles.\n\n\n\nResults\nTo demonstrate the results of this code, the figures below show the distribution of PROVOR surfacing times for the last 3 years, and an example of the surfacing times of one float deployed in 2022.\n\n\n\nFuture Work & Flexibility\nI believe for PROVOR CTS5 floats, this will still be necessary (though I don’t know for sure). Argo Canada (Dalhousie University) will be deploying this type of float in the near future, so the next priority will be to implement commands in the style/language of the new firmware.\nAdditionally, although this routine was built to automate PROVOR surfacing time of day, there could be many uses for automatically updating mission parameters of any type of float. Off the top of my head - though some of these may be possible within a float’s existing firmware. Of course I would not encourage anything outside the Argo mission :).\nIncreased vertical resolution on every nth profile\nDowncast measurements every nth profile\nSeasonally varying missions\n\n\n\n", "preview": {}, - "last_modified": "2024-02-13T18:11:56+00:00", + "last_modified": "2024-02-21T18:35:47+00:00", "input_file": {} }, { @@ -49,7 +49,7 @@ "categories": [], "contents": "\nA 2021 paper by Johnson and Bif (2021) demonstrated that diel cycles of oxygen can be observed using the Argo network as a collective set of sensors. 
More recently, Stoer and Fennel (2022) used similar methodology on particle backscatter to estimate Net Primary Production (NPP) from diel cycles in carbon biomass. Each of these recent papers applies the method Gille (2012) demonstrated for diel cycles of temperature to biogeochemical variables. The calculation of the diel cycle depends on multiple factors, one of them being that floats included in the analysis do not surface at a fixed or few times of day for every profile. Instead, floats must demonstrate good temporal coverage of a 24hr period with near even occurrences.\nFigure 2 from Johnson and Bif (2021). Their caption reads: a, Mean oxygen anomaly in the upper 20 m from each profile with acceptable cycle timing (N = 14,294) versus local hour of the day. Data from all days of year from 2010 to 2020 are included. b, Mean oxygen anomaly in each hourly interval and the least-squares fit of equation (2) to the data shown in a with GOP = 2.2 \\(\\pm\\) 0.3 (1 standard error) mmol m\\(^{−3}\\) d\\(^{−1}\\) O2. c, GOP determined at 2-month intervals in the upper 20 m versus day of year. d, Depth profile of GOP rates for all days of the year. Error bars are one standard error.\nThis is not necessarily typical behaviour for Argo floats. For the analysis presented in Johnson and Bif (2021), of the 50,736 profiles available in the Northern Hemisphere, only 14,294 profiles satisfied the surface time requirements. In this post we will detail Argo Canada’s effort over the last 2-3 years to shift its Argo floats to satisfy this surfacing schedule and contribute to future research using this methodology.\nThe post will be a mix of text and python code, showing both the changes over time and the python code needed to demonstrate the change. We will use pandas for data handling, and matplotlib and seaborn for plotting. Let’s load our packages and the Argo global index:\n\nimport pandas as pd\n\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nsns.set(style=\"ticks\", palette=\"colorblind\")\n\n# load the global index\nglobal_index = pd.read_csv(\n \"ftp://ftp.ifremer.fr/ifremer/argo/ar_index_global_prof.txt.gz\", \n compression=\"gzip\", header=8\n)\n# subset to only the MEDS DAC, profiles with valid dates\nmeds = global_index.loc[global_index.file.str.contains('meds')]\nmeds = meds.loc[meds.date.notna()]\n\n# convert date to pandas timestamps, take only profiles from the last 3 years\nmeds[\"date\"] = meds.date.astype(str)\\\n .apply(lambda x: x.replace(\".0\", \"\"))\\\n .apply(pd.Timestamp, tz=\"utc\")\nmeds = meds.loc[meds.date > pd.Timestamp(\"01-2020\", tz=\"utc\")]\n\nThe few surviving MetOcean NOVA floats are included here, but it should be noted that these were not specifically reprogrammed for varied surfacing times. Recently deployed deep ARVOR floats are also included, however their cycle period is defined as an integer number of days rather than in hours, and so those will not produce good diel coverage. 
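As an aside (a small illustration added here, not from the original post), the reason an integer number of days cannot give diel coverage, while the 245 hour cycle used for the ARVOR floats (described below) can, is simple modular arithmetic: a whole number of days returns the float to the surface at the same local hour every cycle, whereas an extra 5 hours walks the surfacing hour through the entire day.\n\n# local surfacing hour on successive cycles, starting from hour 12\nhours_240 = [(12 + i*240) % 24 for i in range(24)]  # 240 h = 10 days: always hour 12, no diel coverage\nhours_245 = [(12 + i*245) % 24 for i in range(24)]  # 245 h = 10 days + 5 h: visits all 24 hours\nprint(len(set(hours_240)), len(set(hours_245)))  # 1 24\n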
Now, we will calculate the local time using the location data, and python packages timezonefinder and pytz, and visualize local surfacing times over the last 3 years.\n\nimport timezonefinder\ntf = timezonefinder.TimezoneFinder()\nimport pytz\n\nprofiler_type = {\n \"865\":\"NOVA \",\n \"844\":\"ARVOR_SBE \",\n \"878\":\"ARVOR_RBR \",\n \"838\":\"ARVOR_DEEP\",\n \"836\":\"PROVOR \",\n}\n\n# exclude invalid locations\nmeds = meds.loc[(meds.latitude.notna()) & (meds.longitude.notna())]\n\n# get timezone, local time, and hour at surface for each profile\nmeds[\"timezone\"] = [\n pytz.timezone(tf.certain_timezone_at(lat=lat, lng=lon))\\\n for lat, lon in zip(meds.latitude, meds.longitude)\n]\nmeds[\"local_time\"] = [utc_time.tz_convert(tz) for utc_time, tz in zip(meds.date, meds.timezone)]\nmeds[\"surface_hour\"] = [local_time.hour + 0.5 for local_time in meds.local_time]\n\n# add a column for WMO number as well as platform name\nmeds[\"WMO\"] = [int(s.split(\"/\")[1]) for s in meds.file]\nmeds[\"platform\"] = [profiler_type[f\"{p}\"] for p in meds.profiler_type]\n\nfig, ax = plt.subplots(dpi=300, constrained_layout=True)\nsns.lineplot(data=meds, x=\"local_time\", y=\"surface_hour\", hue=\"platform\", \n units=\"WMO\", estimator=None, alpha=0.25, ax=ax\n)\nsns.move_legend(ax, \"upper left\", bbox_to_anchor=(1, 1))\nplt.setp(ax.get_xticklabels(), ha=\"right\", rotation=45)\n\n\n\n\nThe above plot is very busy, but clearly shows a shift in regime in mid-2021 when we began reprogramming all new deployments as well as existing floats via remote commands. For NKE ARVOR floats, which constitute most of the Canadian fleet, we set the cycle period to 245 hours, or 10 days + 5 hours. This creates diel coverage relatively quickly, without sampling at times near each other on subsequent profiles. NKE PROVOR floats were slightly trickier to reprogram, as instead of a cycle period they have a surface time they aim to achieve. This parameter therefore must be reprogrammed every cycle. This is achieved by running a daily github action powered by python, which you can find on the ArgoCanada github page.\nTo better understand the timing, lets look closely at an ARVOR float that was deployed in late 2020 and reprogrammed, and a PROVOR float that gets a new command each cycle.\n\n# ARVOR deployed in 2020, PROVOR in 2022\narvor = 4902523\nprovor = 4902623\nsubset = meds.loc[meds.WMO.isin((arvor, provor))]\n# make day of mission our time variable so we can plot them on the same axis\nsubset[\"mission_day\"] = [\n subset.date.loc[i] - subset.date.loc[subset.WMO == subset.WMO.loc[i]].min()\\\n for i in subset.index\n]\n# fractional days\nsubset[\"mission_day\"] = subset[\"mission_day\"].apply(lambda x: x/pd.to_timedelta(1, 'D'))\n\nfig, ax = plt.subplots(dpi=300, constrained_layout=True)\nsns.lineplot(data=subset, x=\"mission_day\", y=\"surface_hour\", hue=\"platform\", \n style=\"platform\", dashes=False, markers=True, ax=ax\n)\nax.set_xlim((0,300))\n\n\n\n\nThe ARVOR float was deployed in 2020, and was reprogrammed remotely in the second half of 2021. The PROVOR float was deployed before the surface time reprogramming protocol was live, but has begun taking commands as of May 2023. The lines slope in opposite directions because the ARVORs operate on a 10 days + 5 hours (245hr) cycle, while the PROVORs are being programmed for 10 days - 5 hours (235hr) cycle. 
The latter is a minor difference, but was a suggestion from the manufacturer as it may produce an extra profile or two if/when a float dies of exhausted battery life.\nFinally, to get a better idea of how we have performed fleet-wide, let’s look at the distributions of surfacing times since 2020 by year.\n\n# create column for profile year\nmeds[\"year\"] = [d.year for d in meds.local_time]\nmeds = meds.loc[meds.year > 2019] # 2 floats have local times in 2019\n# create a FacetGrid that will plot by year, 2020, 2021, 2022, 2023\ng = sns.displot(\n meds, x=\"surface_hour\", col=\"year\", hue=\"platform\", \n kind=\"hist\", bins=list(range(24)), multiple=\"stack\", \n col_wrap=2, facet_kws=dict(despine=False, sharey=False)\n)\ng.fig.set_dpi(300)\ng.fig.set_constrained_layout(True)\n\n\n\n\nThe above figure shows the progression of the Canadian Argo fleet over the years. Blue bars correspond to ARVOR floats with Seabird CTDs, orange bars to ARVOR floats with RBR CTDs and green bars to PROVOR floats. In the 2020 panel, there are two peaks, one representing local times in the Eastern Pacific, and the other in the Western Atlantic. In 2021 these peaks persist, but for roughly half the year we have good coverage. In 2022, some peaks remain as the final floats still operating on 240 hour cycles are reprogrammed. In the final panel, we see that the fleet operates well to cover the entire day. There are slight biases toward 0600-0700 and 1400-1500 local times in the PROVOR floats (green bars) as the live reprogramming was not active until May this year, but those profiles are now being well distributed. Overall, almost all recent profiles recorded by Argo Canada floats should now meet the statistical criteria to be able to construct diel cycles as in Gille (2012), Johnson and Bif (2021), and Stoer and Fennel (2022).\nReferences\nJohnson, K.S., Bif, M.B. Constraint on net primary productivity of the global ocean by Argo oxygen measurements. Nat. Geosci. 14, 769-774 (2021). https://doi.org/10.1038/s41561-021-00807-z\nStoer, A.C. and Fennel, K. (2023), Estimating ocean net primary productivity from daily cycles of carbon biomass measured by profiling floats. Limnol. Oceanogr. Lett, 8: 368-375. https://doi.org/10.1002/lol2.10295\nGille, S. T. (2012), Diurnal variability of upper ocean temperatures from microwave satellite measurements and Argo profiles, J. Geophys. Res., 117, C11027, doi:10.1029/2012JC007883.\n\n\n\n", "preview": {}, - "last_modified": "2024-02-13T18:11:56+00:00", + "last_modified": "2024-02-21T18:35:47+00:00", "input_file": {} }, { @@ -66,7 +66,7 @@ "categories": [], "contents": "\nIn this post we will work through performing the response time correction on oxygen observations following Bittig et al. (2014) on Argo data. The focus is more on accessing the proper variables within Argo than describing the actual correction. We will use argopandas package to manage our data fetching from Argo, and use a function from bgcArgoDMQC to do the response time correction. 
Other basic data manipulation and visualization will use the pandas, numpy, and scipy packages, and matplotlib and seaborn for plotting.\n# conda install -c conda-forge argopandas bgcArgoDMQC\n\nimport numpy as np\nimport pandas as pd\nfrom scipy.interpolate import interp1d\n\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nsns.set(style='ticks', palette='colorblind')\n\nimport argopandas as argo\nfrom bgcArgoDMQC import correct_response_time\nWe will use float 7900589, an APEX float in the North Atlantic which has the intermediate parameter MTIME, defined as the relative time in fractional days since the date of the profile JULD.\nflt = argo.float(7900589)\n# grab core and bgc files for just the most recent cycle\ncore = flt.prof\nbgc = flt.bio_prof\ncore = core[core.file == core.file.iloc[-1]]\nbgc = bgc[bgc.file == bgc.file.iloc[-1]]\nDownloading 'https://data-argo.ifremer.fr/ar_index_global_prof.txt.gz'\nDownloading 'https://data-argo.ifremer.fr/argo_bio-profile_index.txt.gz'\ncore\n\n\n\n\nfile\n\n\ndate\n\n\nlatitude\n\n\nlongitude\n\n\nocean\n\n\nprofiler_type\n\n\ninstitution\n\n\ndate_update\n\n\n1853971\n\n\ncoriolis/7900589/profiles/R7900589_043.nc\n\n\n2021-11-08 12:13:44+00:00\n\n\n55.682\n\n\n-46.691\n\n\nA\n\n\n846\n\n\nIF\n\n\n2021-11-08 15:34:25+00:00\n\n\nbgc\n\n\n\n\nfile\n\n\ndate\n\n\nlatitude\n\n\nlongitude\n\n\nocean\n\n\nprofiler_type\n\n\ninstitution\n\n\nparameters\n\n\nparameter_data_mode\n\n\ndate_update\n\n\n179672\n\n\ncoriolis/7900589/profiles/BR7900589_043.nc\n\n\n2021-11-08 12:13:44+00:00\n\n\n55.682\n\n\n-46.691\n\n\nA\n\n\n846\n\n\nIF\n\n\nMTIME PRES TEMP_DOXY C1PHASE_DOXY C2PHASE_DOXY…\n\n\nRRRRRARRRARR\n\n\n2021-11-08 16:02:45+00:00\n\n\ncore_df = core.levels[['PRES', 'TEMP', 'PSAL']]\nbgc_df = bgc.levels[['PRES', 'MTIME', 'DOXY']]\nDownloading 'https://data-argo.ifremer.fr/dac/coriolis/7900589/profiles/R7900589_043.nc'\nReading 1 file\nDownloading 'https://data-argo.ifremer.fr/dac/coriolis/7900589/profiles/BR7900589_043.nc'\nReading 1 file\ncore_df\n\n\n\n\n\n\n\n\nPRES\n\n\nTEMP\n\n\nPSAL\n\n\nfile\n\n\nN_PROF\n\n\nN_LEVELS\n\n\n\n\n\n\n\n\ncoriolis/7900589/profiles/R7900589_043.nc\n\n\n0\n\n\n0\n\n\n0.43\n\n\n6.7983\n\n\n34.502201\n\n\n1\n\n\n2.30\n\n\n6.7997\n\n\n34.501499\n\n\n2\n\n\n4.42\n\n\n6.8032\n\n\n34.501801\n\n\n3\n\n\n6.01\n\n\n6.8057\n\n\n34.501900\n\n\n4\n\n\n8.07\n\n\n6.8026\n\n\n34.502102\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n3\n\n\n470\n\n\nNaN\n\n\nNaN\n\n\nNaN\n\n\n471\n\n\nNaN\n\n\nNaN\n\n\nNaN\n\n\n472\n\n\nNaN\n\n\nNaN\n\n\nNaN\n\n\n473\n\n\nNaN\n\n\nNaN\n\n\nNaN\n\n\n474\n\n\nNaN\n\n\nNaN\n\n\nNaN\n\n\n1900 rows × 3 columns\n\n\nbgc_df\n\n\n\n\n\n\n\n\nPRES\n\n\nMTIME\n\n\nDOXY\n\n\nfile\n\n\nN_PROF\n\n\nN_LEVELS\n\n\n\n\n\n\n\n\ncoriolis/7900589/profiles/BR7900589_043.nc\n\n\n0\n\n\n0\n\n\n0.43\n\n\n-0.000613\n\n\nNaN\n\n\n1\n\n\n2.30\n\n\n-0.001296\n\n\nNaN\n\n\n2\n\n\n4.42\n\n\nNaN\n\n\nNaN\n\n\n3\n\n\n6.01\n\n\nNaN\n\n\nNaN\n\n\n4\n\n\n8.07\n\n\nNaN\n\n\nNaN\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n3\n\n\n470\n\n\nNaN\n\n\nNaN\n\n\nNaN\n\n\n471\n\n\nNaN\n\n\nNaN\n\n\nNaN\n\n\n472\n\n\nNaN\n\n\nNaN\n\n\nNaN\n\n\n473\n\n\nNaN\n\n\nNaN\n\n\nNaN\n\n\n474\n\n\nNaN\n\n\nNaN\n\n\nNaN\n\n\n1900 rows × 3 columns\n\n\nYou will notice looking at the printout of bgc_df that there are a lot of NaN values. The valid MTIME and DOXY values are in the N_PROF dimension 2. There are a variety of reasons why there might be N_PROF > 1 dimensions in an Argo profile. 
Where that is not the subject I won’t go into why, and frankly I only know the valid data is in N_PROF = 2 by inspecting the dataframe. The valid core data is in N_PROF = 0. If we simply tried to line these separate dataframes up into one, we would fail miserably since our time and oxygen data would not be aligned with our physical data. So instead, we will use the common pressure axis to interpolate onto a common axis.\n# create a dataframe to store interpolated data in\ndf = pd.DataFrame()\n\n# define a pressure axis to interpolate and a depth resolution\ndP = 2.5\ninterp_pressure = np.arange(0, core_df['PRES'].max(), dP)\ndf['PRES'] = interp_pressure\n\n# interpolate\nfor key, source in zip(['MTIME', 'TEMP', 'DOXY'], [bgc_df, core_df, bgc_df]):\n ix = source[key].notna() # remove nan values that will mess with interp\n f = interp1d(source['PRES'][ix], source[key][ix], bounds_error=False)\n df[key] = f(interp_pressure)\ndf\n\n\n\n\nPRES\n\n\nMTIME\n\n\nTEMP\n\n\nDOXY\n\n\n0\n\n\n0.0\n\n\nNaN\n\n\nNaN\n\n\nNaN\n\n\n1\n\n\n2.5\n\n\n-0.001345\n\n\n6.800030\n\n\n265.266078\n\n\n2\n\n\n5.0\n\n\n-0.001957\n\n\n6.804258\n\n\n265.227454\n\n\n3\n\n\n7.5\n\n\n-0.002542\n\n\n6.802751\n\n\n265.246096\n\n\n4\n\n\n10.0\n\n\n-0.003235\n\n\n6.804123\n\n\n264.956293\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n374\n\n\n935.0\n\n\n-0.139717\n\n\n3.358495\n\n\n263.701094\n\n\n375\n\n\n937.5\n\n\n-0.140046\n\n\n3.354090\n\n\n263.718486\n\n\n376\n\n\n940.0\n\n\n-0.140375\n\n\n3.351910\n\n\n263.735879\n\n\n377\n\n\n942.5\n\n\n-0.140704\n\n\n3.351850\n\n\n263.753272\n\n\n378\n\n\n945.0\n\n\n-0.141049\n\n\n3.351151\n\n\nNaN\n\n\n379 rows × 4 columns\n\n\nNow we are almost ready to perform the time response correction, except that we don’t know what the time response of this optode is. Without a reference data set like in Bittig et al. (2014) or consecutive up- and down-casts as in Gordon et al. (2020), knowing the response time is not possible. For the purposes of demonstration we will choose a boundary layer thickness (an equivalent parameter, but independent of temperature unlike response time) of 120 micrometers (equivalent to a response time of 67.2 seconds at 20 degrees C).\nIl = 120\ndf['DOXY_ADJUSTED'] = correct_response_time(df['MTIME'], df['DOXY'], df['TEMP'], Il)\ndf['DOXY_DELTA'] = df.DOXY - df.DOXY_ADJUSTED # change in oxygen\nFinally, we’ll plot the profiles to see the end result of the correction.\n# melt the dataframe so that we can use hue keyword when plotting\ndf_melt = df.melt(id_vars=['PRES', 'MTIME', 'TEMP', 'DOXY_DELTA'], var_name='DOXY_STATUS', value_name='DOXY')\n\nfig, axes = plt.subplots(1, 2, sharey=True)\nsns.lineplot(x='DOXY', y='PRES', hue='DOXY_STATUS', data=df_melt, sort=False, ax=axes[0])\nsns.lineplot(x='DOXY_DELTA', y='PRES', data=df, sort=False, ax=axes[1])\naxes[0].legend(loc=3)\naxes[0].set_ylim((250, 0))\noxygen and delta oxygen profilesSome observations based on the above:\nIt is important to recall that this is an ascending profile.\nThe first thing your eye was likely drawn to is the large change 70m depth. I would wager that this single point is probably too dramatic, but also could be real as the gradient is strong there and oxygen would be favouring the higher side. This point makes me uncomfortable without reference data, but I can’t say for sure that it is wrong.\nFrom 250-100m, oxygen is relatively linear. In this section of the profile, we see a slighly lower DOXY_ADJUSTED than the original DOXY. 
Since oxygen is decreasing as the float ascends, the float remembers the higher concentration from the deeper depth, and therefore slightly overestimates the true oxygen concentration.\nAt points where there are “notches” in the original profile, those “notches” are amplified in the corrected on.\nThinking more generally about the wider Argo program, there are a few key questions:\nHow would you include this adjusted data in the B-file? Would it go in the DOXY_ADJUSTED field, which currently is used for gain adjustment (Johnson et al. 2015), or would it merit a different field?\nAssuming there is no reliable way to determine boundary layer thickness (time constant), should Argo correct using a generic one since the adjusted data will be “more correct” than the original, even if it is not perfect?\nGiven a lack of reference data, how would you flag the above adjusted profile? Are there any points you don’t believe that should be flagged as bad?\n\n\n\n", "preview": {}, - "last_modified": "2024-02-13T18:11:56+00:00", + "last_modified": "2024-02-21T18:35:47+00:00", "input_file": {} }, { @@ -83,7 +83,7 @@ "categories": [], "contents": "\nHi Argo community! Today I present to you the argoFloats package. Not only does this package make Argo data more accessible by helping to identify, download, cache, and analyze Argo data, it does so in a way that makes it easy for someone with little coding experience.\nBefore getting ahead of ourselves, let’s download and load the package:\n\n\n\n\n\n\nTo eliminate the gap between Argo data and the end user, we created an easy 5 step process, shown below.\nWork flow for argoFloats package with descriptions on the left and relevent functions on the rightgetIndex()\ngetIndex(), the foundational tool of the package, queries an Argo server for an index of data files. Various servers can be used, with the default being to try Infremer and USGODAE in turn. The possible indices are listed below.\nFile Name\nNickname\nContents\nar_greylist.txt\n-\nSuspicious/malfunctioning floats\nar_index_global_meta.txt.gz\n-\nMetadata files\nar_index_global_prof.txt.gz\n\"core\"\nArgo data\nar_index_global_tech.txt.gz\n-\nTechnical files\nar_index_global_traj.txt.gz\n-\nTrajectory files\nargo_bio-profile_index.txt.gz\n\"bgc\" or \"bgcargo\"\nBiogeochemical data (without S or T)\nargo_bio-traj_index.txt.gz\n-\nBiogeochemical trajectory files\nargo_synthetic-profile_index.txt.gz\n\"synthetic\"\nSynthetic data, successor to \"merge\"\nUsing getIndex() saves the user from having to understand the file structures on the various servers (which are similar, but not quite the same). So, for example, a user need not study an FTP listing such as\n\n\n\nbut instead would simply write\n\n\n\nEasy, right? Now what about the next barrier? How do we cache an index to speed up our download? As of today (2021-10-19), there are 2527015 Argo profiles files. That means 2527015 files that look like this (ftp.ifremer.fr):\n\n\n\nContinuously downloading this index can be tiresome and time consuming. To avoid this problem, we created the age argument in getIndex(). This lets the user indicate how old a downloaded file must be (in days), for it to be considered out-of-date. For example\n\n\n\nindicates that getIndex() should return a previously-downloaded index, provided that it was downloaded less than 10 days ago. 
Since the previous index would have been downloaded with getIndex(), we call this process “caching”.\nsubset()\nThe indices (obtained with getIndex() and possibly subset using subset,argoFloats-method) store information about individual argo data files, including: the name of the data file, the date of observation, the observation location (latitude, longitude, and ocean), the profiler type, the institution that owns the float, and the most recent time of data update.\nOnce the indices are downloaded, a burning question still remains. How in the world do we handle over two million profiles? The answer to this is to use subset(), which provides the ability to subset by ID, time, geography, variable, institution, ocean, dataMode, cycle, direction, profile, dataStateIndicator, section, etc. This makes it easy to sift through the data based on parameters that the user is likely to be interested in. For more details and examples of each type of subset look at the help docs for subset within the argoFloats package.\nLets consider Argo floats near Nova Scotia. If a user wanted to only look at the core Argo profiling files within a 250 km radius of Halifax, Nova Scotia, they would do the following:\n\n\n\ngetProfiles() and readProfiles()\nTo take a deeper dive (excuse the pun) into the data stored within individual Argo data files, the user must download the data files, typically done with getProfiles(), and then read those file, typically with readProfiles().\nTo continue with our example, we might use\n\n\n\nto download and read the data files. Note that getProfiles() uses an age-based cache, just like getIndex() does, to speed up processing.\nAnalyzing Argo data\nThe argoFloats package has a suite of functions for analyzing Argo data, and these work well with the functions provided by the oce package (Kelley and Richards, 2021). These are described in great detail in our paper (Kelley et al. 2021) as well as the help documentation. As with oce, there is an emphasis on simplicity and a philosophy of isolating users from the details of storage. For example, the [[ operator provides a way to extract items from argoFloats objects, with e.g. index[[\"file\"]] being a replacement for index@data$index$file. Similarly, it is easy to extract individual Argo objects from a collection, and data within those objects, including derived data (such as potential temperature).\nPlotting being an important component of analysis, argoFloats provides specialized plots for maps, profiles, temperature-salinity, quality control analysis, to name a few. Continuing with our example,\n\n\n\ngives a temperature-salinity plot for a set of Argo profiles.\nQuality Control\nUh oh! What are all of those red points in the TS diagram shown above? That TS diagram must be useless, right? Wrong. We’ve got you covered. argoFloats has also included a three-step process for dealing with Quality Control of Argo data.\n\n\n\napplyQC\nTo first answer the question, “What are all of those red dots in the TS diagram?” They are points that were flagged as “bad” during quality control. The applyQC() function identifies any points that are classified as “bad” and sets them to NA to not be used in future plots or calculations.\nLet’s take a look at the impact of applyQC() where the left hand side shows before applyQC() was done and the right hand side shows after it was completed.\n\n\n\nshowQCTests()\nAlthough that was easy, your work is not done yet. It’s always a good idea to know why a certain point was flagged as bad. 
That is where showQCTests() comes into play. This function reveals which tests were passed and failed for each particular Argo profile. In our example, we might want to loop through all of the profiles in argos and when a “bad” point is encountered, display which tests were performed/failed. Note for the sake of space, the output is not shown.\n\n\n\nplot(x, which=“QC”)\nFor a last QC visual, we can use the plot(x, which=\"QC\"). Let’s take the first output (ie. float number 4901400) from above. As stated by the output, it failed the Density Inversion and Grey List test. We can look at the quality of each cycle in the data by doing the following:\n\n\n\nmapApp()\nLastly, just in case argoFloats wasn’t cool enough, we’ve also created an app, named mapApp(), which is a way to analyze spatial-temporal distribution of Argo profiles. One neat feature about mapApp() is the “Code” button. This button shows a popup window containing R code that performs the basic steps used to isolate data for the mapApp() display. It even provides code for some reasonable next steps, such as downloading, reading, and plotting the Argo profile data.\n\n\n\nReferences\nKelley, D. E., Harbin, J., & Richards, C. (2021). argoFloats: An R package for analyzing Argo data. Frontiers in Marine Science, 8, 636922. https://doi.org/10.3389/fmars.2021.635922\nKelley, D., and Richards, C. (2020). oce: Analysis of Oceanographic Data. Available online at: https://CRAN.R-project.org/package=oce (accessed October 22, 2021).\n\n\n\n", "preview": "posts/2021-10-22-argofloats-an-r-package-for-argo-researchers/infremer.png", - "last_modified": "2024-02-13T18:11:56+00:00", + "last_modified": "2024-02-21T18:35:47+00:00", "input_file": {}, "preview_width": 828, "preview_height": 894 @@ -102,7 +102,7 @@ "categories": [], "contents": "\nThis post is a rough guide to implementing real-time adjustments to CHLA values in Argo NetCDF files. The real-time adjustments are relatively new and reflect a balance between simplicity and completeness, with the understanding that delayed-mode QC will be more robust and incorporate more float-specific considerations. In case it’s not clear, nothing in this post should be construed as a definitive QC implementation and should be subject to considerable testing prior to deployment as actual live QC.\nTo implement the tests on existing ArgoNetCDF files in Python, we’ll use the argopandas package (to list float profiles and do some basic interaction with Argo NetCDF files) and gsw (to perform seawater calculations required for some of the checks). We’ll also use pandas data frames and numpy arrays to marshal the generic data handling and matplotlib to plot as we go along.\n# pip install argopandas gsw matplotlib\n# conda install -c conda-forge argopandas gsw matplotlib\nimport argopandas as argo\nimport gsw\nimport numpy as np\nimport matplotlib.pyplot as plt\nExample data\nWe’ll need some data to practice with, too. As an example, I’ll use all the profiles from float 6904117. 
It’s probably best to use a fresh cache for every Python session (the default), but for the purposes of rendering this post I’ll use a local cache to avoid downloading the files everytime I render it.\nprofiles = argo.prof \\\n .subset_float(6904117) \\\n .subset_direction('ascending') \nbgc_profiles = argo.bio_prof \\\n .subset_float(6904117) \\\n .subset_direction('ascending')\nDownloading 'https://data-argo.ifremer.fr/ar_index_global_prof.txt.gz'\nDownloading 'https://data-argo.ifremer.fr/argo_bio-profile_index.txt.gz'\nprofiles\n\n\n\n\nfile\n\n\ndate\n\n\nlatitude\n\n\nlongitude\n\n\nocean\n\n\nprofiler_type\n\n\ninstitution\n\n\ndate_update\n\n\n1826644\n\n\ncoriolis/6904117/profiles/R6904117_001.nc\n\n\n2021-02-03 12:42:43+00:00\n\n\n57.295\n\n\n19.963\n\n\nA\n\n\n834\n\n\nIF\n\n\n2021-09-14 15:09:29+00:00\n\n\n1826646\n\n\ncoriolis/6904117/profiles/R6904117_002.nc\n\n\n2021-02-05 00:55:14+00:00\n\n\n57.251\n\n\n19.854\n\n\nA\n\n\n834\n\n\nIF\n\n\n2021-09-14 15:09:42+00:00\n\n\n1826648\n\n\ncoriolis/6904117/profiles/R6904117_003.nc\n\n\n2021-02-06 12:42:44+00:00\n\n\n57.165\n\n\n19.849\n\n\nA\n\n\n834\n\n\nIF\n\n\n2021-09-14 15:09:52+00:00\n\n\n1826650\n\n\ncoriolis/6904117/profiles/R6904117_004.nc\n\n\n2021-02-08 00:59:14+00:00\n\n\n57.136\n\n\n19.973\n\n\nA\n\n\n834\n\n\nIF\n\n\n2021-09-14 15:10:04+00:00\n\n\n1826652\n\n\ncoriolis/6904117/profiles/R6904117_005.nc\n\n\n2021-02-09 12:44:44+00:00\n\n\n57.163\n\n\n20.046\n\n\nA\n\n\n834\n\n\nIF\n\n\n2021-09-14 15:10:15+00:00\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n1826978\n\n\ncoriolis/6904117/profiles/R6904117_168.nc\n\n\n2021-10-12 12:26:21+00:00\n\n\n57.789\n\n\n19.844\n\n\nA\n\n\n834\n\n\nIF\n\n\n2021-10-12 16:43:24+00:00\n\n\n1826980\n\n\ncoriolis/6904117/profiles/R6904117_169.nc\n\n\n2021-10-14 00:28:17+00:00\n\n\n57.828\n\n\n19.870\n\n\nA\n\n\n834\n\n\nIF\n\n\n2021-10-14 04:40:19+00:00\n\n\n1826982\n\n\ncoriolis/6904117/profiles/R6904117_170.nc\n\n\n2021-10-15 12:30:17+00:00\n\n\n57.816\n\n\n19.850\n\n\nA\n\n\n834\n\n\nIF\n\n\n2021-10-15 17:48:45+00:00\n\n\n1826984\n\n\ncoriolis/6904117/profiles/R6904117_171.nc\n\n\n2021-10-17 00:29:12+00:00\n\n\n57.831\n\n\n19.861\n\n\nA\n\n\n834\n\n\nIF\n\n\n2021-10-17 04:40:26+00:00\n\n\n1826986\n\n\ncoriolis/6904117/profiles/R6904117_172.nc\n\n\n2021-10-18 13:12:23+00:00\n\n\n57.825\n\n\n19.852\n\n\nA\n\n\n834\n\n\nIF\n\n\n2021-10-18 16:39:57+00:00\n\n\n172 rows × 8 columns\n\n\nbgc_profiles\n\n\n\n\nfile\n\n\ndate\n\n\nlatitude\n\n\nlongitude\n\n\nocean\n\n\nprofiler_type\n\n\ninstitution\n\n\nparameters\n\n\nparameter_data_mode\n\n\ndate_update\n\n\n172309\n\n\ncoriolis/6904117/profiles/BR6904117_001.nc\n\n\n2021-02-03 12:42:43+00:00\n\n\n57.295\n\n\n19.963\n\n\nA\n\n\n834\n\n\nIF\n\n\nMTIME PRES C1PHASE_DOXY C2PHASE_DOXY TEMP_DOXY…\n\n\nRRRRRARRRRRRRRRRRARRRRRRRRAR\n\n\n2021-09-14 15:09:29+00:00\n\n\n172311\n\n\ncoriolis/6904117/profiles/BR6904117_002.nc\n\n\n2021-02-05 00:55:14+00:00\n\n\n57.251\n\n\n19.854\n\n\nA\n\n\n834\n\n\nIF\n\n\nMTIME PRES C1PHASE_DOXY C2PHASE_DOXY TEMP_DOXY…\n\n\nRRRRRARRRRRRRRRRRARRRRRRRRAR\n\n\n2021-09-14 15:09:42+00:00\n\n\n172313\n\n\ncoriolis/6904117/profiles/BR6904117_003.nc\n\n\n2021-02-06 12:42:44+00:00\n\n\n57.165\n\n\n19.849\n\n\nA\n\n\n834\n\n\nIF\n\n\nMTIME PRES C1PHASE_DOXY C2PHASE_DOXY TEMP_DOXY…\n\n\nRRRRRARRRRRRRRRRRARRRRRRRRAR\n\n\n2021-09-14 15:09:52+00:00\n\n\n172315\n\n\ncoriolis/6904117/profiles/BR6904117_004.nc\n\n\n2021-02-08 00:59:14+00:00\n\n\n57.136\n\n\n19.973\n\n\nA\n\n\n834\n\n\nIF\n\n\nMTIME PRES 
C1PHASE_DOXY C2PHASE_DOXY TEMP_DOXY…\n\n\nRRRRRARRRRRRRRRRRARRRRRRRRAR\n\n\n2021-09-14 15:10:04+00:00\n\n\n172317\n\n\ncoriolis/6904117/profiles/BR6904117_005.nc\n\n\n2021-02-09 12:44:44+00:00\n\n\n57.163\n\n\n20.046\n\n\nA\n\n\n834\n\n\nIF\n\n\nMTIME PRES C1PHASE_DOXY C2PHASE_DOXY TEMP_DOXY…\n\n\nRRRRRARRRRRRRRRRRARRRRRRRRAR\n\n\n2021-09-14 15:10:15+00:00\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n…\n\n\n172643\n\n\ncoriolis/6904117/profiles/BR6904117_168.nc\n\n\n2021-10-12 12:26:21+00:00\n\n\n57.789\n\n\n19.844\n\n\nA\n\n\n834\n\n\nIF\n\n\nMTIME PRES C1PHASE_DOXY C2PHASE_DOXY TEMP_DOXY…\n\n\nRRRRRARRRRRRRRRRRARRRRRRRRAR\n\n\n2021-10-12 16:43:24+00:00\n\n\n172645\n\n\ncoriolis/6904117/profiles/BR6904117_169.nc\n\n\n2021-10-14 00:28:17+00:00\n\n\n57.828\n\n\n19.870\n\n\nA\n\n\n834\n\n\nIF\n\n\nMTIME PRES C1PHASE_DOXY C2PHASE_DOXY TEMP_DOXY…\n\n\nRRRRRARRRRRRRRRRRARRRRRRRRAR\n\n\n2021-10-14 04:40:19+00:00\n\n\n172647\n\n\ncoriolis/6904117/profiles/BR6904117_170.nc\n\n\n2021-10-15 12:30:17+00:00\n\n\n57.816\n\n\n19.850\n\n\nA\n\n\n834\n\n\nIF\n\n\nMTIME PRES C1PHASE_DOXY C2PHASE_DOXY TEMP_DOXY…\n\n\nRRRRRARRRRRRRRRRRARRRRRRRRAR\n\n\n2021-10-15 17:48:45+00:00\n\n\n172649\n\n\ncoriolis/6904117/profiles/BR6904117_171.nc\n\n\n2021-10-17 00:29:12+00:00\n\n\n57.831\n\n\n19.861\n\n\nA\n\n\n834\n\n\nIF\n\n\nMTIME PRES C1PHASE_DOXY C2PHASE_DOXY TEMP_DOXY…\n\n\nRRRRRARRRRRRRRRRRARRRRRRRRAR\n\n\n2021-10-17 04:40:26+00:00\n\n\n172651\n\n\ncoriolis/6904117/profiles/BR6904117_172.nc\n\n\n2021-10-18 13:12:23+00:00\n\n\n57.825\n\n\n19.852\n\n\nA\n\n\n834\n\n\nIF\n\n\nMTIME PRES C1PHASE_DOXY C2PHASE_DOXY TEMP_DOXY…\n\n\nRRRRRARRRRRRRRRRRARRRRRRRRAR\n\n\n2021-10-18 16:39:57+00:00\n\n\n172 rows × 10 columns\n\n\nThe recipe\nAs I understand it, this is what has to happen to the real-time measurements during the CHLA processing:\nCHLA values whose flags are unset are flagged as Flag.PROBABLY_BAD\nCHLA_ADJUSTED values whose flags are unset are flagged as Flag.GOOD\nApply global range test (flag for CHLA and CHLA_ADJUSTED set to Flag.BAD for CHLA values outside the range -0.1 to 100)\nApply the “dark correction” to CHLA values:\nRead the SCIENTIFIC_CALIB_COEFFICIENT variable for the CHLA profile and parse the FLOAT_DARK, and FLOAT_DARK_QC values. It is likely that FLOAT_DARK_QC doesn’t exist in any files yet, so default to Flag.NO_QC. For profiles that haven’t gone through 5 cycles deeper than the “mixed layer depth”, there should be a PRELIM_DARK variable in SCIENTIFIC_CALIB_COEFFICIENT\nIf FLOAT_DARK exists, apply the equation CHLA_ADJUSTED = ((FLUORESCENCE_CHLA-FLOAT_DARK)*SCALE_CHLA)/2.\nIf it doesn’t, calculate try to calculate PRELIM_DARK and look for 4 previous instances of PRELIM_DARK with which FLOAT_DARK/_QC should be calculated. If there are 5 PRELIM_DARK values, calculate FLOAT_DARK/_QC and use that. If there aren’t calculate the adjusted value using PRELIM_DARK.\n\nApply the non-photochemical quenching correction (NPC) to the value calculated by the dark correction. This is the correction for the upper portion of the ocean whereby phytoplankton have a reduced response to UV light because they have already been exposed to it from the sun.\nA few of these steps have details that need to be expanded upon, so let’s do those first\nQC Flags\nIn addition to computing adjusted realtime values, the QC checks for CHLA also flag some values using the Argo flag scheme. 
One of the challeneges in doing this is that you don’t want to make a data point look “better” by assigning a QC flag (e.g., flag a value as “probably bad” when it’s already been flagged as “bad”). This logic was implemented by the experimental argortqcpy package (here) and I’ve modified it slightly to get all of the flag logic in one happy Python class:\nclass Flag:\n \"\"\"\n Flags for check output. These values are valid values of the\n ``qc`` and ``adjusted_qc`` attributes of a\n :class:`~medsrtqc.core.Trace` object. Utility functions are\n provided as static methods to get the name or value of a flag\n or to update flag values ensuring that values that are already\n marked at a \"worse\" QC level are not inadvertently changed.\n \"\"\"\n\n @staticmethod\n def label(flag):\n \"\"\"Return the label of a QC flag\"\"\"\n return Flag._names[flag]\n\n @staticmethod\n def value(label):\n \"\"\"Return the value of a QC flag\"\"\"\n for value, lab in Flag._names.items():\n if label == lab:\n return value\n raise KeyError(f\"'{label}' is not the name of a QC flag\")\n\n @staticmethod\n def update_safely(qc, to, where=None):\n \"\"\"\n Safely update ``qc`` to the value ``to``. Values that are\n already marked at a \"worse\" QC level are not modified.\n \"\"\"\n where = slice(None) if where is None else where\n flags = qc[where]\n for overridable_flag in Flag._precedence[to]:\n flags[flags == overridable_flag] = to\n qc[where] = flags\n\n NO_QC = b'0'\n GOOD = b'1'\n PROBABLY_GOOD = b'2'\n PROBABLY_BAD = b'3'\n BAD = b'4'\n CHANGED = b'5'\n # '6' not used\n # '7' not used\n ESTIMATED = b'8'\n MISSING = b'9'\n FILL_VALUE = b''\n\n _names = {\n NO_QC: 'NO_QC',\n GOOD: 'GOOD',\n PROBABLY_GOOD: 'PROBABLY_GOOD',\n PROBABLY_BAD: 'PROBABLY_BAD',\n BAD: 'BAD',\n CHANGED: 'CHANGED',\n ESTIMATED: 'ESTIMATED',\n MISSING: 'MISSING',\n FILL_VALUE: 'FILL_VALUE'\n }\n\n _precedence = {\n NO_QC: set(),\n GOOD: {\n NO_QC,\n },\n PROBABLY_GOOD: {\n NO_QC,\n GOOD,\n CHANGED,\n },\n PROBABLY_BAD: {\n NO_QC,\n GOOD,\n PROBABLY_GOOD,\n CHANGED,\n },\n BAD: {\n NO_QC,\n GOOD,\n PROBABLY_GOOD,\n CHANGED,\n PROBABLY_BAD,\n },\n CHANGED: {\n NO_QC,\n },\n ESTIMATED: {\n NO_QC,\n GOOD,\n PROBABLY_GOOD,\n },\n MISSING: {\n NO_QC,\n },\n }\nParsing SCIENTIFIC_CALIB_COEFFICIENT\nThis variable in the NetCDF files looks like this:\nDARK_CHLA = 47, SCALE_CHLA = 0.0073\nand is in string form. It can be parsed by splitting a few times to return a dict() and reconstituted by pasting together a few times.\ndef parse_scientific_calib_coefficient(val):\n parts = val.split(',')\n vals = [part.split('=') for part in parts]\n return {k.strip(): float(v.strip()) for k, v in vals}\n\ndef unparse_calib_coefficient(coefs):\n return ', '.join(f'{k} = {v}' for k, v in coefs.items())\n\ncoefs = parse_scientific_calib_coefficient('DARK_CHLA = 47, SCALE_CHLA = 0.0073')\nprint(coefs)\ncoefs['SOMETHING_ELSE'] = 1.23\nunparse_calib_coefficient(coefs)\n{'DARK_CHLA': 47.0, 'SCALE_CHLA': 0.0073}\n\n\n\n\n\n'DARK_CHLA = 47.0, SCALE_CHLA = 0.0073, SOMETHING_ELSE = 1.23'\nThe mixed layer depth\nAs I understand it, there are as many ways to calculate the mixed layer depth as there are oceanographers. The following is an implementation based on the guidance provided in the draft CHLA QC document. I’m using plt as an argument here so that you can debug this stuff interactively (but skip installing matplotlib in production). 
The gist is that it looks for density changes > 0.03 below 10 dbar.\nimport gsw\n\ndef calc_mixed_layer_depth(pres, temp, psal, longitude=0, latitude=0, plt=None):\n abs_salinity = gsw.SA_from_SP(psal, pres, longitude, latitude)\n conservative_temp = gsw.CT_from_t(abs_salinity, temp, pres)\n density = gsw.sigma0(abs_salinity, conservative_temp)\n\n if plt:\n plt.plot(density, pres)\n plt.gca().invert_yaxis()\n plt.gca().set_xlabel('sigma0')\n\n mixed_layer_start = (np.diff(density) > 0.03) & (pres[:-1] > 10)\n if not np.any(mixed_layer_start):\n # Can't determine mixed layer depth (no density changes > 0.03 below 10 dbar)\n return None\n\n mixed_layer_start_index = np.where(mixed_layer_start)[0][0]\n mixed_layer_depth = pres[mixed_layer_start_index]\n\n if plt:\n plt.gca().axhline(y = mixed_layer_depth, linestyle='--')\n\n return mixed_layer_depth\nIt’s a little easier to parameterize this in terms of a NetCDFWrapper object since that’s what we get handed.\ndef calc_mixed_layer_depth_nc(nc_core, plt=None):\n if 'PRES' not in nc_core.variables or \\\n 'TEMP' not in nc_core.variables or \\\n 'PSAL' not in nc_core.variables:\n return None\n \n pres = nc_core['PRES'][:][0]\n temp = nc_core['TEMP'][:][0]\n psal = nc_core['PSAL'][:][0]\n\n if 'LONGITUDE' in nc_core.variables and 'LATITUDE' in nc_core.variables:\n longitude = nc_core['LONGITUDE'][:][0]\n latitude = nc_core['LATITUDE'][:][0]\n else:\n longitude = 0\n latitude = 0\n\n return calc_mixed_layer_depth(\n pres, temp, psal, \n longitude=longitude,\n latitude=latitude, \n plt=plt)\n\ncalc_mixed_layer_depth_nc(\n argo.nc('dac/' + profiles.reset_index().file[0]), \n plt=plt)\nDownloading 'https://data-argo.ifremer.fr/dac/coriolis/6904117/profiles/R6904117_001.nc'\n\n\n\n\n\n60.5\npngCalculating FLOAT_DARK\nThe guidance on calculating PRELIM_DARK is that it’s the minimum CHLA value below the mixed layer depth boundary. In Python:\ndef calc_prelim_dark(pres, chla, mixed_layer_depth):\n if mixed_layer_depth is None:\n return None\n \n chla_filter = chla[(pres > mixed_layer_depth) & ~np.isnan(chla)]\n if len(chla_filter) > 0:\n return chla_filter.min()\n else:\n return None\nBecause no PRELIM_DARK values have been calculated since the spec is new, we need something to do this too. Again, it’s a little easier below if we parameterize this in terms of a NetCDFWrapper.\ndef calc_prelim_dark_nc(nc_bgc, nc_core):\n if 'CHLA' not in nc_bgc.variables or 'PRES' not in nc_bgc.variables:\n raise KeyError(f\"'CHLA' or 'PRES' not found\")\n \n # we need a mixed layer depth for this calculation\n mixed_layer_depth = calc_mixed_layer_depth_nc(nc_core)\n if mixed_layer_depth is None:\n return None\n\n chla_prof_i = nc_bgc.param[nc_bgc.param.STATION_PARAMETERS.str.strip() == 'CHLA'] \\\n .reset_index() \\\n .iloc[0] \\\n .N_PROF\n\n pres = nc_bgc['PRES'][:][chla_prof_i]\n chla = nc_bgc['CHLA'][:][chla_prof_i]\n \n return calc_prelim_dark(pres, chla, mixed_layer_depth)\n\ncalc_prelim_dark_nc(\n argo.nc('dac/' + bgc_profiles.reset_index().file[0]),\n argo.nc('dac/' + profiles.reset_index().file[0]))\nDownloading 'https://data-argo.ifremer.fr/dac/coriolis/6904117/profiles/BR6904117_001.nc'\n\n\n\n\n\n0.2263\nCalculating FLOAT_DARK is a little more involved and it’s where we’ll make use of our float/cycle index that we created earlier. We need to start from the first NetCDF file for the float and collect any PRELIM_DARK values from SCIENTIFIC_CALIB_COEFFICIENT that might have been calculated. If there were 3 or fewer, we do nothing. 
If there are exactly 4, we add our PRELIM_DARK that we just calculated, calculate the mean of the values, and use that. The FLOAT_DARK_QC gets added based on the standard deviation of the PRELIM_DARK values.\ndef accumulate_prelim_dark(bio_prof_files):\n prelim_dark = []\n for nc in argo.nc(bio_prof_files):\n coefs_df = nc.calib[nc.calib.PARAMETER.str.strip() == 'CHLA'] \\\n .filter(['SCIENTIFIC_CALIB_COEFFICIENT'])\n if len(coefs_df) == 0 or 'SCIENTIFIC_CALIB_COEFFICIENT' not in coefs_df:\n continue\n\n coefs = parse_scientific_calib_coefficient(coefs_df.iloc[0][0])\n if 'PRELIM_DARK' not in coefs:\n continue\n \n prelim_dark.append(coefs['PRELIM_DARK'])\n \n return np.array(prelim_dark)\n\naccumulate_prelim_dark(['dac/' + f for f in bgc_profiles.head(10).file])\nDownloading 9 files from 'https://data-argo.ifremer.fr/dac/coriolis/6904117/profiles'\n\n\n\n\n\narray([], dtype=float64)\nAn empty result here is expected since the scheme was just invented.\nThe whole game\nThe rest is not very oceanographic but does involve navigating the structure of BGC variables’ encoding in Argo profile NetCDFs. First, we get a NetCDFWrapper handle and a local filename to the .nc file. In argopandas, a NetCDFWrapper is a thin wrapper around a netCDF4.Dataset that implements a few common accessors in addition to the most common accessors for the dataset (notably, obj.variables and obj['variable name']).\nnc_filename = argo.filename('dac/coriolis/6904117/profiles/BR6904117_171.nc')\nnc_core_filename = argo.filename('dac/coriolis/6904117/profiles/R6904117_171.nc')\nnc = argo.nc(nc_filename)\nnc_core = argo.nc(nc_core_filename)\n\n# make sure we've got a CHLA variable\nif 'CHLA' not in nc.variables or 'PRES' not in nc.variables:\n raise KeyError(f\"'CHLA' or 'PRES' not found in '{nc_filename}'\")\n\n# find the profile index associated with CHLA\nchla_prof_i = nc.param[nc.param.STATION_PARAMETERS.str.strip() == 'CHLA'] \\\n .reset_index() \\\n .iloc[0] \\\n .N_PROF\n\n# get the PRES/CHLA/CHLA_QC series we'll work with\n# (note that there will be trailing NaN values here\n# but we want to keep those because we'll have to keep the\n# size the same to reassign values to a copy later)\npres = nc['PRES'][:][chla_prof_i]\nchla = nc['CHLA'][:][chla_prof_i]\nchla_qc_original = nc['CHLA_QC'][:][chla_prof_i]\nchla_qc = chla_qc_original.copy() # keep original so we can compare!\n\n# create the chla_adjusted and chla_qc variables from originals\nchla_adjusted = chla.copy()\nchla_adjusted_qc = chla_qc.copy()\n\n# reset chla_qc to Flag.NO_QC\nchla_qc[:] = Flag.NO_QC\n\n# plot to verify!\nplt.plot(chla, pres)\nplt.gca().invert_yaxis()\nDownloading 'https://data-argo.ifremer.fr/dac/coriolis/6904117/profiles/BR6904117_171.nc'\nDownloading 'https://data-argo.ifremer.fr/dac/coriolis/6904117/profiles/R6904117_171.nc'\npngThe first step is to set the initial QC values to Flag.PROBABLY_BAD.\nchla_qc[:] = Flag.PROBABLY_BAD\nchla_adjusted_qc[:] = Flag.GOOD\nThen we apply the global range test:\nFlag.update_safely(chla_qc, Flag.BAD, where=(chla < 0.1) | (chla > 100))\nFlag.update_safely(chla_adjusted_qc, Flag.BAD, where=(chla < 0.1) | (chla > 100))\nNext, we go through the steps for the dark correction. This profile was collected before any of this QC was implemented so it doesn’t contain any PRELIM_DARK or FLOAT_DARK. 
I’ll write out the logic anyway.\ncoefs_df = nc.calib[nc.calib.PARAMETER.str.strip() == 'CHLA'] \\\n .filter(['SCIENTIFIC_CALIB_COEFFICIENT'])\nif len(coefs_df) == 0 or 'SCIENTIFIC_CALIB_COEFFICIENT' not in coefs_df:\n raise ValueError(f\"Can't find 'SCIENTIFIC_CALIB_COEFFICIENT' for 'CHLA' in file '{nc_filename}'\")\n\n# keep original and modified coefs so we know if they need updating\ncoefs = parse_scientific_calib_coefficient(coefs_df.iloc[0][0])\ncoefs_mod = coefs.copy()\n\nif 'FLOAT_DARK' in coefs:\n float_dark_qc = coefs['FLOAT_DARK_QC'] if 'FLOAT_DARK_QC' in coefs else Flag.NO_QC\n chla_adjusted[:] = chla - coefs['FLOAT_DARK']\n Flag.update_safely(chla_adjusted_qc, float_dark_qc)\nelse:\n prelim_dark_acc = accumulate_prelim_dark(['dac/' + f for f in bgc_profiles.file])\n prelim_dark_this = calc_prelim_dark_nc(nc, nc_core)\n if prelim_dark_this is None:\n prelim_dark_this = np.array([])\n else:\n coefs_mod['PRELIM_DARK'] = prelim_dark_this\n prelim_dark_this = np.array([prelim_dark_this])\n \n prelim_dark_acc = np.concatenate([prelim_dark_acc, prelim_dark_this])\n\n if len(prelim_dark_acc) > 0:\n float_dark = prelim_dark_acc.mean()\n float_dark_qc = Flag.PROBABLY_GOOD # not sure how this is actually calculated\n\n chla_adjusted[:] = chla - float_dark\n Flag.update_safely(chla_adjusted_qc, float_dark_qc)\n\n # potentially store in coefs_mod if there are enough PRELIM_DARK values\n if len(prelim_dark_acc) == 5:\n coefs_mod['FLOAT_DARK'] = float_dark\n coefs_mod['FLOAT_DARK_QC'] = float_dark_qc\n else:\n # we can't calculate the adjusted value\n chla_adjusted[:] = np.nan\n\n# should show the adjustment\nprint(chla[::100])\nprint(chla_adjusted[::100])\nDownloading 161 files from 'https://data-argo.ifremer.fr/dac/coriolis/6904117/profiles'\n\n\n[1.1241999864578247 1.1461000442504883 1.292099952697754\n 1.0292999744415283 0.23360000550746918 0.20440000295639038\n 0.21170000731945038 0.21170000731945038 0.23360000550746918\n 0.2773999869823456 0.2847000062465668 0.2847000062465668\n 0.2919999957084656]\n[0.9343999624252319 0.9563000202178955 1.1022999286651611\n 0.8394999504089355 0.04380001127719879 0.014600008726119995\n 0.021900013089179993 0.021900013089179993 0.04380001127719879\n 0.0875999927520752 0.09490001201629639 0.09490001201629639\n 0.10220000147819519]\nThere are no previous PRELIM_DARK values and no FLOAT_DARK value, so this applies the calculation based on the current profile.\nFrom my reading of the document, that’s the operation! What’s left is to figure out what changed and modify the NetCDF. There is probably an object diff library in Python that could do a good job of reporting what changed and I’d encourage readers to implement it! The objects that might have changed are coefs_mod (check differences with coefs) and chla_qc (check differences via chla_qc_original). 
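A bare-bones version of that comparison, assuming only the coefs/coefs_mod dictionaries and the chla_qc/chla_qc_original arrays defined above (illustrative only, not part of any package), could look like this:

# report calibration coefficients that were added or changed by the adjustment
changed_coefs = {k: v for k, v in coefs_mod.items() if coefs.get(k) != v}
print('changed calibration coefficients:', changed_coefs)

# report how many CHLA_QC flags were modified by the checks
n_changed = int(np.sum(chla_qc != chla_qc_original))
print(f'{n_changed} of {chla_qc.size} CHLA_QC values changed')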
chla_adjusted and chla_adjusted_qc can be written without checking diffs since in theory a brand new file won’t have those variables anyway.\nI imagine modification would be done by making a copy of the NetCDF file and opening via netCDF4.Dataset() in write mode.\nif coefs_mod != coefs:\n coefs_serialized = unparse_calib_coefficient(coefs_mod)\n # write coefs_searialized to the appropriate location in \n # 'SCIENTIFIC_CALIB_COEFFICIENT'\n\nchla_qc_updated = chla_qc != chla_qc_original\nTests\nIn the course of writing all of this I also wrote some unit tests and it’s worth putting them here so they don’t get lost!\nimport unittest\n\nclass TestFlag(unittest.TestCase):\n\n def test_value(self):\n self.assertEqual(Flag.NO_QC, Flag('NO_QC'))\n self.assertEqual(Flag.label(Flag.NO_QC), 'NO_QC')\n with self.assertRaises(KeyError):\n Flag('not a QC key')\n with self.assertRaises(KeyError):\n Flag.label(b'def not a flag')\n\n def test_update(self):\n qc = np.array([Flag.GOOD, Flag.PROBABLY_BAD, Flag.MISSING])\n Flag.update_safely(qc, to=Flag.BAD)\n self.assertTrue(np.all(qc == np.array([Flag.BAD, Flag.BAD, Flag.MISSING])))\n\n qc = np.array([Flag.GOOD, Flag.PROBABLY_BAD, Flag.MISSING])\n Flag.update_safely(qc, to=Flag.BAD, where=np.array([False, True, False]))\n self.assertTrue(np.all(qc == np.array([Flag.GOOD, Flag.BAD, Flag.MISSING])))\n\n\n\n", "preview": {}, - "last_modified": "2024-02-13T18:11:56+00:00", + "last_modified": "2024-02-21T18:35:47+00:00", "input_file": {} }, { @@ -119,7 +119,7 @@ "categories": [], "contents": "\nOne of the named quality control tests that Argo profile longitude/latitude measurements must undergo is the “Position on land” test. All data assembly centres have their own implementation of this test. However, in migrating some code between languages I noted some IT challenges that may pop up with several approaches (notably, those that require a Python PROJ, GDAL, or GEOS install). This post is going through some options for how to implement that test in both Python and R with varying levels of dependencies. I’ll use the argodata to load all the profile locations which I’ll use to test the various approaches.\n\n\nlibrary(argodata)\nprof <- argo_global_prof()\n\n\n\nUse a vector definition of ‘land’ in R\nThis the the most obvious choice and probably the way that the test is implemented most frequently. The question does arise, though, as to where one gets the polygons for “land”. I would suggest using the Natural Earth 1:10,000,000 ocean data set because it has a clear source/version history and has a reasonable file size for the kinds of accuracy that we need. Most floats are deployed in water over 1000 m deep and aren’t so close to the coat that a higher resolution data set would improve the accuracy of the test. If you need a higher resolution you can use the Global Administration Database which also has a clear source/version history (but much larger file sizes).\n\n\n# download/unzip 1:10,000,000 oceans\ncurl::curl_download(\n \"https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_ocean.zip\",\n \"ne_10m_ocean.zip\"\n)\nunzip(\"ne_10m_ocean.zip\", exdir = \".\")\n\n\n\nIn R the easiest way to go about this is to use the sf package, which you will need to load the files distributed by Natural Earth or GADM. Because of a recent update to sf, you have to omit the CRS values so that the longitude/latitude values are treated as Cartesian. 
Because the ocean data set was prepared for this use-case in mind, this isn’t a problem.\n\n\nlibrary(sf)\nocean <- read_sf(\"ne_10m_ocean.shp\") %>% \n st_set_crs(NA)\nplot(ocean$geometry, col = \"lightblue\")\n\n\n\n\nIf you want to check “is this point in the ocean”, you can use st_intersects().\n\n\nprofiles <- data.frame(\n id = \"prof1\",\n longitude = c(-65, -60),\n latitude = c(45, 45)\n)\n\nprofiles_sf <- st_as_sf(\n profiles,\n coords = c(\"longitude\", \"latitude\")\n)\n\nst_intersects(profiles_sf, ocean, sparse = FALSE)\n\n\n [,1]\n[1,] FALSE\n[2,] TRUE\n\nThe file size of the shapefile is about 6 MB unzipped, which is fairly reasonable. If you’re in an IT environment where installing R and R packages from CRAN is easy and you can maintain a recent GEOS/PROJ/GDAL stack, you’re good to go! If you can install packages but can’t maintain a system library stack, you can use the above as a “prep script” and distribute a well-known binary representation of the ocean polygon with your code. You can then use the geos package.\n\n\n# in some preparation script...\nocean_wkb <- st_as_binary(ocean$geometry)\nsaveRDS(ocean_wkb, \"ne_10m_ocean.WKB.rds\")\n\n# in your production code\nlibrary(geos)\nocean <- geos_read_wkb(readRDS(\"ne_10m_ocean.WKB.rds\"))\ngeos_intersects_any(\n geos_make_point(profiles$longitude, profiles$latitude),\n ocean\n)\n\n\n[1] FALSE TRUE\n\nUse a raster definition of ‘land’\nAnother option is to use a raster mask (zero or one values) to implement the point-on-land test. The nice part about this is that all you need is the NetCDF library installed (and you were never going to get away with an Argo QC package without it). There is no pre-computed land raster mask available but you can compute one reasonably easily using the ETOPO1. I’m going to prep the NetCDF using the stars package starting from the grid-registered GeoTIFF version of the ETOPO1 data set.\n\n\ncurl::curl_download(\n \"https://www.ngdc.noaa.gov/mgg/global/relief/ETOPO1/data/ice_surface/grid_registered/georeferenced_tiff/ETOPO1_Ice_g_geotiff.zip\",\n \"ETOPO1_Ice_g_geotiff.zip\"\n)\n\nunzip(\"ETOPO1_Ice_g_geotiff.zip\", exdir = \".\")\n\n\n\nI’m using a GeoTIFF version because it’s a little easier to load into R. The stars package takes care of the details but we do need to create the vector of latitude/longitude cell minimum values ourselves. The magic 1/60 here is one arc minute (the resolution of the data set).\n\n\ngrid <- stars::read_stars(\"ETOPO1_Ice_g_geotiff.tif\", proxy = FALSE)\nis_land <- grid > 0\n\n\n\nI’m using ncdf4 to write this but you can (and probably should) use the RNetCDF package because it’s more actively maintained and, in some cases, much faster. Note that the y values are in reverse order (north to south).\n\n\nlibrary(ncdf4)\n\ndim_x <- ncdim_def(\"longitude\", \"degrees\", seq(-180, 180, by = 1/60) - 1/60 / 2)\ndim_y <- ncdim_def(\"latitude\", \"degrees\", rev(seq(-90, 90, by = 1/60) - 1/60 / 2))\nvar_land <- ncvar_def(\n \"is_land\", \"boolean\", \n list(dim_x, dim_y),\n prec = \"byte\", compression = 9\n)\n\nnc <- nc_create(\"ETOPO1_is_land.nc\", vars = list(var_land))\nncvar_put(nc, var_land, vals = is_land[[1]])\nnc_close(nc)\n\n\n\nI turned the compression up big time here because the original grid was 400 MB! That’s unrealistic in terms of data distribution alongside code and way bigger than our compressed WKB version of the Natural Earth ocean boundaries (~3 MB). Compressed the file is just under 1 MB (!!!). 
To extract a longitude/latitude ‘is land’ value you have to do a tiny bit of math to find the cell index you’re after and then read the value of that cell.\n\n\nnc <- nc_open(\"ETOPO1_is_land.nc\")\nlon_values <- nc$dim$longitude$vals\nlat_values <- nc$dim$latitude$vals\n\ncell_x <- vapply(profiles$longitude, function(lon) which.min(abs(lon - lon_values)), integer(1))\ncell_y <- vapply(profiles$latitude, function(lat) which.min(abs(lat - lat_values)), integer(1))\n\nprof_is_land <- integer(nrow(profiles))\nfor (i in seq_along(prof_is_land)) {\n prof_is_land[i] = ncvar_get(\n nc, \"is_land\",\n start = c(cell_x[i], cell_y[i]),\n count = c(1, 1)\n )\n}\n\nnc_close(nc)\n\nprof_is_land\n\n\n[1] 1 0\n\nLet’s plot the results to see what we’re up against!\n\n\nplot(st_as_sfc(ocean), xlim = c(-70, -60), ylim = c(41, 49))\npoints(profiles[c(\"longitude\", \"latitude\")], pch = ifelse(prof_is_land, 16, 1))\n\n\n\n\nPython implementation\nBoth approaches can be implemented in Python, including the data preparation step (although I think this is easier in R for both). In particular, the NetCDF version results in a small (1 MB), distributable data file that can be included in a Python package and read via netCDF4 or other NetCDF backend. This doesn’t require a GEOS system install and might be eaiser to convince IT folks to work with.\n\n\n\n", "preview": "posts/2021-07-02-classifying-an-arbitrary-longitudelatitude-as-land-or-ocean/classifying-an-arbitrary-longitudelatitude-as-land-or-ocean_files/figure-html5/unnamed-chunk-3-1.png", - "last_modified": "2024-02-13T18:11:56+00:00", + "last_modified": "2024-02-21T18:35:47+00:00", "input_file": {}, "preview_width": 1248, "preview_height": 768 @@ -138,7 +138,7 @@ "categories": [], "contents": "\nIn preparation for writing some real-time quality control code, the question arose of how to check modified Argo files to make sure that they conform to the specification in the Argo User’s Manual. The official tool is hosted on Ifremer and is written in Java. When the latest version is downloaded you can run a shell command that will check one or more files. Running a bash shell, this can be done in a few lines:\n# download and unpack the tool in the current working directory\ncurl https://www.seanoe.org/data/00344/45538/data/83774.tar.gz | tar -xz\n\n# set the working directory\ncd format_control_1-17\n\n# download a test Argo file to check\ncurl -o R4902533_001.nc \\\n https://data-argo.ifremer.fr/dac/meds/4902533/profiles/R4902533_001.nc\n\n# ...and check it\njava -cp ./resources:./jar/formatcheckerClassic-1.17-jar-with-dependencies.jar \\\n -Dapplication.properties=application.properties \\\n -Dfile.encoding=UTF8 \\\n oco.FormatControl \\\n R4902533_001.nc\n\n
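If the same check needs to be run from Python, for example over a batch of freshly written files, one option is a thin subprocess wrapper around the identical command. This is a sketch only: it assumes the tool was unpacked to format_control_1-17 as above, and check_argo_format() is a hypothetical helper rather than part of the checker itself.

import subprocess

def check_argo_format(nc_files, checker_dir='format_control_1-17'):
    """Run the Ifremer format checker on each file and collect its text output."""
    classpath = './resources:./jar/formatcheckerClassic-1.17-jar-with-dependencies.jar'
    results = {}
    for nc_file in nc_files:
        proc = subprocess.run(
            ['java', '-cp', classpath,
             '-Dapplication.properties=application.properties',
             '-Dfile.encoding=UTF8',
             'oco.FormatControl', nc_file],
            cwd=checker_dir, capture_output=True, text=True,
        )
        results[nc_file] = proc.stdout
    return results

# e.g. check_argo_format(['R4902533_001.nc'])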