---
title: Using R to Access the Fishery Foreign Trade Data API
author: "Cameron Speir"
date: "8/18/2020"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## NOAA's Foreign Fishery Trade Data
NOAA Fisheries' Office of Science and Technology serves data on seafood imports and exports by product (species and/or product form), country of origin or destination, and U.S. Customs District. More information on these data can be found at this link:
<br>
<https://www.fisheries.noaa.gov/national/sustainable-fisheries/foreign-fishery-trade-data>
<br>
The web interface for querying the data is here:
<br>
<https://www.fisheries.noaa.gov/foss/>
<br>
## General info about APIs
This is not a tutorial about APIs in general. If you need one, this web tutorial is a good place to start:<br>
<https://www.earthdatascience.org/courses/earth-analytics/get-data-using-apis/intro-to-programmatic-data-access-r/> <br>
There are others out there, too.
## How to Access the trade data API using R
The web interface can be clunky to deal with if you want to retrieve a relatively large amount of data to analyze. It is much easier to query the data using the API. Here are some notes and basic examples on how to do this using R. The gist of it is that, in order to return a specific set of data, we construct a very specific URL that contains the details of the data we want. The server then returns the requested data in JSON format.
<br>
There is some general information on the FOSS API, with a link to the Oracle REST documentation here:
<https://www.fisheries.noaa.gov/foss/f?p=215:35:1042313548962:::::>
### Packages
You will need two packages. **httr** builds queries and interacts with the API. httr::GET() is essential here. We will also use **jsonlite** to convert JSON-formatted responses to something more civilized.
```{r}
library(httr) # query the API
library(jsonlite) # help converting to data frame.
```
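To make the "construct a very specific URL" idea concrete, here is a small optional sketch (not needed for the examples below): httr::modify_url() shows the URL that a given base URL and query list produce, without sending anything.
```{r}
# Build (but do not send) a request URL from a base URL and a list of query parameters
httr::modify_url("https://apps-st.fisheries.noaa.gov/ods/foss/trade_data",
                 query = list(limit = 5))
```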
### Example 1 - get the default data and put it into a data frame
The URL <https://apps-st.fisheries.noaa.gov/ods/foss/trade_data> is the default for accessing the data. Click on the link and you'll land on a page with the first 25 rows of data in the data set. It's not clear to me how this is sorted, but there you go.
<br>
You can get these data in R by using httr::GET():
```{r}
ex_1 <- httr::GET("https://apps-st.fisheries.noaa.gov/ods/foss/trade_data/")
ex_1
httr::http_status(ex_1) # gives details on the status code returned by the server
```
The response contains the status code '200', which means the server was able to return some data. The raw response isn't very useful on its own, so we use the jsonlite package to convert the JSON-formatted response (stored as raw bytes in the response's "content" element) into a list object containing the substance of the response.
```{r}
ex_1_list <- jsonlite::fromJSON(rawToChar(ex_1$content))
#str(ex_1_list)
```
We now have a list of 6 elements. It's kind of messy to print out, but use the str() statement that is commented out above to see for yourself what this looks like. Visual inspection of this list reveals that "items" is the element containing the actual data.
<br>
We can make the "items" element a data frame to work with.
```{r}
ex_1_df <- ex_1_list$items
#str(ex_1_df)
#head(ex_1_df)
```
Note that one of the variables in this data frame, "links", is unusual: it contains a list in each row. This makes the data frame very messy to view. You can see this for yourself by running the str() and head() statements that are commented out above.
<br>
Let's get rid of the "links" variable in the data frame we've created to improve readability.
```{r}
ex_1_df$links <- NULL
str(ex_1_df)
head(ex_1_df)
```
### Example 2 - Specify the amount of data returned: the "limit" parameter
By default, queries via the API return 25 rows of data. You can get more by using the query parameter "limit". You can pass values to query parameters using the query argument to httr::GET(). The query argument is specified as a list.
<br>
Let's get more than the default number of observations by setting the limit query parameter to 100.
```{r}
ex_2 <- GET("https://apps-st.fisheries.noaa.gov/ods/foss/trade_data",
query = list(limit = 100))
ex_2
```
This gives us 100 rows of data.
<br>
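As a quick check, you can convert this response to a data frame the same way as in Example 1 and count the rows (a sketch; the ex_2_list and ex_2_df names are just illustrative):
```{r}
ex_2_list <- jsonlite::fromJSON(rawToChar(ex_2$content))
ex_2_df <- ex_2_list$items
ex_2_df$links <- NULL
nrow(ex_2_df)  # should be 100, matching the limit we requested
```
<br>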
**Very important note:** the maximum number of rows the server will return is **10,000**.
<br>
## Filtering the data
The real usefulness of the API is the ability to filter the data down to a very specific subset. Here is an example of the URL you would use to get all data from 2011 onwards: https://apps-st.fisheries.noaa.gov/ods/foss/trade_data/?q={"year":{"\$gt": 2010}}
<br>
If you copy and paste that link into a browser, it will return those specific data.
<br>
The grammar for this type of filter is pretty specific and wasn't familiar to me. For example, "\$gt" means "greater than". For a list of filter operators you can use, check out "3.2.5.2.1 FilterObject Grammar" at this link: <https://docs.oracle.com/cd/E56351_01/doc.30/e87809/developing-REST-applications.htm#AELIG90103>
<br>
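Here is a sketch of the same filter sent from R (the ex_gt names are just illustrative; Examples 3 and 4 below walk through the syntax in more detail). The whole filter object goes into a query parameter named "q":
```{r}
# All records with year greater than 2010, the same filter as the URL above
ex_gt <- GET("https://apps-st.fisheries.noaa.gov/ods/foss/trade_data/",
             query = list(q = '{"year":{"$gt": 2010}}'))
ex_gt_df <- jsonlite::fromJSON(rawToChar(ex_gt$content))$items
ex_gt_df$links <- NULL
head(ex_gt_df)
```
<br>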
You can filter on all of the variables in the data set:
```{r}
names(ex_1_df)
```
### Example 3 - Salmon exports from Seattle in 2016
We need to filter on four variables:

* "name" contains "SALMON" (case sensitive)
* "source" is equal to "EXP"
* "custom_district_name" is equal to "SEATTLE, WA"
* "year" is equal to 2016

We'll use httr::GET() to build this URL: <br>
https://apps-st.fisheries.noaa.gov/ods/foss/trade_data/?q={"name":{"\$like":"%25SALMON%25"},"source":"EXP","custom_district_name":"SEATTLE, WA","year":"2016"}
<br>
I found the httr::GET() syntax used to build queries with this type of filtering grammar a little tricky. The query argument consists of a list with *one* element. The element is named "q". Put the entire filter object, including the curly braces, between single quotes.
```{r}
ex_3 <- GET("https://apps-st.fisheries.noaa.gov/ods/foss/trade_data/",
query = list(q = '{"name":{"$like":"%SALMON%"},"source":"EXP","custom_district_name":"SEATTLE, WA","year":"2016"}'))
ex_3
ex_3_list <- jsonlite::fromJSON(rawToChar(ex_3$content))
ex_3_df <- ex_3_list$items
ex_3_df$links <- NULL
head(ex_3_df)
```
**Important Note**: In the URL above, notice the '"\$like":"%25SALMON%25"' filter. In the filtering syntax being used, the notation for finding a string is to surround it with '%'. Since '%' is a reserved character in a URL, you have to replace '%' with '%25' when copying and pasting into a browser. However, you do NOT need to do this when using httr::GET() to send your request. The '%' is sufficient; no '%25' is needed.
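If you want to see this for yourself (a quick optional check), the response object stores the URL that httr actually sent, with the reserved characters percent-encoded:
```{r}
ex_3$url  # the request URL as sent; the '%' wildcards show up as '%25'
```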
### Example 4 - Hake exports for Erin
We need to filter on three variables:

* "name" contains "HAKE" (case sensitive)
* "source" is equal to "EXP"
* "year" is greater than 2018

```{r}
ex_4 <- GET("https://apps-st.fisheries.noaa.gov/ods/foss/trade_data/",
query = list(q = '{"name":{"$like":"%HAKE%"},"source":"EXP","year":{"$gt": 2018}}', limit=10000))
ex_4
ex_4_list <- jsonlite::fromJSON(rawToChar(ex_4$content))
ex_4_df <- ex_4_list$items
ex_4_df$links <- NULL
head(ex_4_df)
```
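As a final sanity check against the 10,000-row cap noted earlier (a sketch; hasMore is assumed here to be one of the six elements the server returns alongside "items"):
```{r}
nrow(ex_4_df)      # number of rows actually returned
ex_4_list$hasMore  # assumed pagination flag; TRUE would suggest the results were truncated
```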