Merge pull request #191 from ICOS-Carbon-Portal/190-update-documentation
Update documentation
ZogopZ authored Jul 12, 2024
2 parents 380367e + c662b7d commit 5fe9871
Showing 10 changed files with 283 additions and 207 deletions.
113 changes: 113 additions & 0 deletions icoscp/docs/authentication.md
@@ -0,0 +1,113 @@
# Authentication
To ensure users' licence acceptance when accessing data objects
through the `icoscp` Python library, authentication is required. Users
must either have ICOS Carbon Portal login credentials or
log in to Carbon Portal using another mechanism
[https://cpauth.icos-cp.eu/login/](https://cpauth.icos-cp.eu/login/)
to obtain the token to access ICOS data.

Users also need to read and accept the ICOS Data Licence in their
[user profile](https://cpauth.icos-cp.eu/).

**Metadata-only access does not require authentication.**

Users with direct access to the data files,
namely, **users of our [ICOS Jupyter services](
https://www.icos-cp.eu/data-services/tools/jupyter-notebook), are
not required to configure authentication**.

In order to fetch data, users make their requests to data objects and
must provide an API token to do so. For security reasons the API token is
valid for 100'000 seconds (just under 28 hours) and may therefore need to be
refreshed; this process is automated when using credentials-based
authentication (see below).
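
The validity period quoted above can be checked with a few lines of standard-library Python. This is only an illustrative sketch: the helper names `token_expiry` and `is_expired` are hypothetical and not part of `icoscp` or `icoscp_core`, and the timestamp is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone

# Token lifetime as stated in this documentation: 100'000 seconds
TOKEN_VALIDITY = timedelta(seconds=100_000)


def token_expiry(obtained_at: datetime) -> datetime:
    """Return the moment a token obtained at `obtained_at` stops being valid."""
    return obtained_at + TOKEN_VALIDITY


def is_expired(obtained_at: datetime, now: datetime) -> bool:
    """Return True if the token lifetime has elapsed by `now`."""
    return now >= token_expiry(obtained_at)


# 100'000 s / 3600 s per hour is roughly 27.78 hours
validity_hours = TOKEN_VALIDITY.total_seconds() / 3600

# Arbitrary example: a token obtained on 2024-07-12 at 09:00 UTC
obtained = datetime(2024, 7, 12, 9, 0, tzinfo=timezone.utc)
expires = token_expiry(obtained)  # 2024-07-13 12:46:40 UTC
```

With credentials-based authentication this bookkeeping should not be needed, since the refresh is automated as described above.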

Authentication can be initialized in a number of ways. **Please note** that
when using an approach other than the default one (see below), it is
necessary to use the `data` value (an instance of the `DataClient` class)
obtained during the authentication bootstrap, rather than the default instance
imported with `from icoscp_core.icos import data`, which is used in the code
examples in this documentation.

## By credentials file (default)
This approach should only be used on machines the developer trusts.

A username/password account for the [ICOS](https://cpauth.icos-cp.eu/)
authentication service is required for this. An obfuscated (not readable by
humans) password is stored in a file on the local machine in a **default
user-specific folder**. To initialize this file, run the following code
interactively (this only needs to be done once per machine):

```Python
from icoscp_core.icos import auth
auth.init_config_file()
```

After the initialization step is done in this way, access to the data can be
achieved using both the new `icoscp_core` machinery and the legacy
[Dobj classes](modules.md#dobj).

## By custom credentials file
The developer may wish to use a specific file to store obfuscated
credentials and token cache. In this scenario, data and
metadata access are achieved as follows:

```Python
from icoscp_core.icos import bootstrap
auth, meta, data = bootstrap.fromPasswordFile("<desired path to the file>")
# The next line needs to be run interactively (only once per file).
auth.init_config_file()
```

If the legacy library functionality will be used, the following extra step is
needed as well:

```Python
from icoscp import cpauth
cpauth.init_by(auth)
```

## By authentication token (prototyping)
This option is good for testing, on a public machine or in general. Its
only disadvantage is that the tokens have a limited period of validity
(100'000 seconds, less than 28 hours), but this is precisely what makes
it acceptable to include them directly in the Python source code.

The token can be obtained from the "My Account" page ([ICOS](
https://cpauth.icos-cp.eu/)), which can be accessed by logging in
using one of the supported authentication mechanisms (username/password,
university sign-in, OAuth sign-in). After this, the bootstrapping can be
done as follows:

```Python
from icoscp_core.icos import bootstrap
cookie_token = 'cpauthToken=WzE2OTY2NzQ5OD...'
meta, data = bootstrap.fromCookieToken(cookie_token)
```

If the legacy library functionality will be used, the following extra step is
needed as well:

```Python
from icoscp import cpauth
cpauth.init_by(data.auth)
```

## By explicit credentials (advanced option)
The user may choose to use their own mechanism of providing the
credentials to initialize the authentication. This should be considered
as an advanced option. **(Please do not put your password as clear text
in your Python code!)** This can be achieved as follows:

```Python
from icoscp_core.icos import bootstrap
meta, data = bootstrap.fromCredentials(username_variable, password_containing_variable)
```
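
One possible way to keep the password out of the source code is sketched below. It reads the credentials from environment variables and falls back to interactive prompts. The function `read_credentials` and the variable names `ICOS_USERNAME` and `ICOS_PASSWORD` are assumptions made for this sketch; they are not defined by `icoscp` or `icoscp_core`.

```python
import getpass
import os
from typing import Mapping, Tuple


def read_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (username, password) without hard-coding either in the source.

    ICOS_USERNAME and ICOS_PASSWORD are hypothetical variable names chosen
    for this sketch only.
    """
    # Prompt for the username if the variable is missing or empty
    username = env.get("ICOS_USERNAME") or input("ICOS username: ")
    # Fall back to a hidden prompt if the password variable is not set
    password = env.get("ICOS_PASSWORD") or getpass.getpass("ICOS password: ")
    return username, password


# Usage (requires icoscp_core and a valid ICOS account):
# from icoscp_core.icos import bootstrap
# meta, data = bootstrap.fromCredentials(*read_credentials())
```

Environment variables are only one option; a system keyring or a secrets manager could fill the same role.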

If the legacy library functionality will be used, the following extra step is
needed as well:

```Python
from icoscp import cpauth
cpauth.init_by(data.auth)
```
2 changes: 1 addition & 1 deletion icoscp/docs/changelog.md
@@ -1,6 +1,6 @@
# Changelog

## 0.2.0a0
## 0.2.0
- #### cpauth module
- Remove legacy authentication.
- Authenticate via [icoscp_core](https://pypi.org/project/icoscp_core/).
66 changes: 66 additions & 0 deletions icoscp/docs/examples.md
@@ -0,0 +1,66 @@
# Examples

## Monthly CO2 averages

This example uses `icoscp_core` functionality to merge, monthly-average and
plot CO2 molar fraction data from selected stations, sampled at certain
heights. See [discover data types](getting_started.md#discover-data-types)
to find out how to discover a URL for a data type.

```Python
from icoscp_core.icos import data, meta, ATMO_STATION
import pandas as pd
import numpy as np

# Swedish atmospheric stations whose metadata is supplied to the Carbon Portal
# by the Atmospheric Thematic Center
se_atmo_stations = [
    s for s in meta.list_stations(ATMO_STATION)
    if s.country_code=='SE'
]

# Find basic metadata for CO2 release data sampled at at least 100 m above the ground
se_co2_from_100 = [
    dobj for dobj in meta.list_data_objects(
        # URL for official ICOS CO2 molar fraction release data
        datatype='http://meta.icos-cp.eu/resources/cpmeta/atcCo2L2DataObject',
        station=se_atmo_stations
    )
    if dobj.sampling_height >= 100
]

# prepare an empty pandas DataFrame to merge the data into
merged_co2 = pd.DataFrame(columns=['TIMESTAMP', 'co2'])

# batch-fetch the interesting columns and iterate through the results
for dobj, arrs in data.batch_get_columns_as_arrays(se_co2_from_100, ['TIMESTAMP', 'co2']):
    st_uri = dobj.station_uri
    # ICOS atmospheric station URIs end with underscore followed by a 3-letter station ID
    # this ID is convenient to use as a suffix to rename 'co2' with
    station_id = st_uri[st_uri.rfind('_'):]
    df = pd.DataFrame(arrs)
    # next line would be needed if `keep_bad_data` flag in batch_get_columns_as_arrays was set to True
    #df.loc[df['Flag'] != 'O', 'co2'] = np.nan
    del df['Flag']
    merged_co2 = pd.merge(merged_co2, df, on='TIMESTAMP', how='outer', suffixes=('', station_id))

del merged_co2['co2']

# compute monthly averages
by_month = merged_co2.groupby(pd.Grouper(key='TIMESTAMP', freq='M')).mean().reset_index()

# Let us retrieve column metadata to construct Y-axis plot label

# fetch detailed metadata for one of the data objects (they all have same columns)
dobj_meta = meta.get_dobj_meta(se_co2_from_100[0])

# get 'value type' part of column metadata for co2 column
columns_meta = dobj_meta.specificInfo.columns
co2_value_type = [col for col in columns_meta if col.label=='co2'][0].valueType

# construct the Y-axis label from the value type data structure
co2_axis_label = f'{co2_value_type.self.label} [{co2_value_type.unit}]'

# plotting should work in Jupyter
by_month.plot(x='TIMESTAMP', ylabel=co2_axis_label);
```
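
The slice `st_uri[st_uri.rfind('_'):]` used in the example keeps the underscore, so the resulting suffix can be appended directly to `co2`. The short sketch below illustrates this with plain strings. The helper `station_suffix` is hypothetical, and the two URIs are illustrative values that assume the URI pattern described in the code comments above.

```python
def station_suffix(station_uri: str) -> str:
    """Return the part of the URI from the last underscore onwards."""
    return station_uri[station_uri.rfind('_'):]


# Illustrative station URIs (assumed to follow the pattern described above)
uris = [
    'http://meta.icos-cp.eu/resources/stations/AS_HTM',
    'http://meta.icos-cp.eu/resources/stations/AS_NOR',
]

# Column names as they would appear after merging with these suffixes
column_names = ['co2' + station_suffix(u) for u in uris]
print(column_names)  # ['co2_HTM', 'co2_NOR']
```

Note that for a URI without any underscore, `rfind` returns `-1` and the slice yields only the last character, so this approach relies on the naming convention holding.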
10 changes: 6 additions & 4 deletions icoscp/docs/faq.md
@@ -4,10 +4,12 @@ Please see [Getting started](getting_started.md) for possible answers to questio
covered here.

### `icoscp_core` is very different from the old `icoscp`, do I have to rewrite everything?
No, your code depending on the older version will continue working. No code got
removed from `icoscp` with release `0.2.0`. But you may benefit from the new
`icoscp_core` features for new developments, and from gradual porting of your
older code to using `icoscp_core`.
No, your code depending on the older version will continue working. Apart from
moving STILT-related functionality to a new dedicated library and overhauling
the authentication that was present in later versions of the `0.1.x` series,
no code was removed from `icoscp` with release `0.2.0`. But you should use the
new `icoscp_core` features for new developments, and can benefit from
gradually porting at least some of your older code to `icoscp_core`.

### How can I retrieve the latest/newest version of a dataset?
```python
123 changes: 7 additions & 116 deletions icoscp/docs/getting_started.md
@@ -1,7 +1,6 @@
# Getting started
# Getting started with icoscp_core
The examples in this section can be tried on a public Jupyter Hub running
Python3 notebooks, where the library is preinstalled.

Python3 notebooks, where the library is preinstalled, for example
[https://exploredata.icos-cp.eu/](https://exploredata.icos-cp.eu/)

Please, click [here](
@@ -12,119 +11,6 @@ If run on a standalone machine rather than an ICOS Carbon Portal Jupyter Hub
instance, the data access examples assume that the authentication has been
configured as explained in the next section.

## Authentication
To ensure users' licence acceptance when accessing data objects
through the `icoscp` Python library, authentication is required. Users
must either have ICOS Carbon Portal login credentials or
log in to Carbon Portal using another mechanism
[https://cpauth.icos-cp.eu/login/](https://cpauth.icos-cp.eu/login/)
to obtain the token to access ICOS data.

Users also need to read and accept the ICOS Data Licence in their
[user profile](https://cpauth.icos-cp.eu/).

Metadata-only access does not require authentication.

Users with direct access to the data files,
namely, **users of our [ICOS Jupyter services](
https://www.icos-cp.eu/data-services/tools/jupyter-notebook), are
not required to configure authentication**.

In order to fetch data, users make their requests to data objects and
must provide an API token to do so. For security reasons the API token is
valid for 100'000 seconds (27 hours) and may therefore need to be refreshed;
this process is automated when using credentials-based authentication (see
below).

Authentication can be initialized in a number of ways. **Please note** that
when using other approaches than the default one (see below), it becomes
necessary to use the `data` value (instance of `DataClient` class) obtained
in the process of authentication bootstrap, rather than the default instance
obtained by import `from icoscp_core.icos import data` used in the code
examples in this documentation.

### By credentials file (default)
This approach should only be used on machines the developer trusts.

A username/password account for the [ICOS](https://cpauth.icos-cp.eu/)
authentication service is required for this. Obfuscated (not readable by
humans) password is stored in a file on the local machine in a **default
user-specific folder**. To initialize this file, run the following code
interactively (only needs to be done once for every machine):

```Python
from icoscp_core.icos import auth
auth.init_config_file()
```

After the initialization step is done in this way, access to the data can be
achieved using both the new `icoscp_core` machinery and the legacy
[Dobj classes](modules.md#dobj).

### By custom credentials file
The developer may wish to use a specific file to store obfuscated
credentials and token cache. In this scenario, data and
metadata access are achieved as follows:

```Python
from icoscp_core.icos import bootstrap
auth, meta, data = bootstrap.fromPasswordFile("<desired path to the file>")
# The next line needs to be run interactively (only once per file).
auth.init_config_file()
```

If the legacy library functionality will be used, the following extra step is
needed as well:

```Python
from icoscp import cpauth
cpauth.init_by(auth)
```

### By authentication token (prototyping)
This option is good for testing, on a public machine or in general. Its
only disadvantage is that the tokens have limited period of validity
(100'000 seconds, less than 28 hours), but this is precisely what makes
it acceptable to include them directly in the Python source code.

The token can be obtained from the "My Account" page ([ICOS](
https://cpauth.icos-cp.eu/)), which can be accessed by logging in
using one of the supported authentication mechanisms (username/password,
university sign-in, OAuth sign in). After this the bootstrapping can be
done as follows:

```Python
from icoscp_core.icos import bootstrap
cookie_token = 'cpauthToken=WzE2OTY2NzQ5OD...'
meta, data = bootstrap.fromCookieToken(cookie_token)
```

If the legacy library functionality will be used, the following extra step is
needed as well:

```Python
from icoscp import cpauth
cpauth.init_by(data.auth)
```

### By explicit credentials (advanced option)
The user may choose to use their own mechanism of providing the
credentials to initialize the authentication. This should be considered
as an advanced option. **(Please do not put your password as clear text
in your Python code!)** This can be achieved as follows:

```Python
from icoscp_core.icos import bootstrap
meta, data = bootstrap.fromCredentials(username_variable, password_containing_variable)
```

If the legacy library functionality will be used, the following extra step is
needed as well:

```Python
from icoscp import cpauth
cpauth.init_by(data.auth)
```
## General note on metadata
An important piece of background information on ICOS metadata is that all the
metadata-represented entities (data objects, data types, documents,
@@ -205,6 +91,11 @@ import pandas as pd
co2_release_data_pd = ( (dobj, pd.DataFrame(arrs)) for dobj, arrs in co2_release_data)
```

## Examples

See [Examples](examples.md#examples) for longer examples that use all of the
functionality introduced above.

## Accessing documentation
As this library depends on `icoscp_core`, all the functionality of the latter
can be used, not only the examples from above. It is introduced on the
1 change: 0 additions & 1 deletion icoscp/docs/modules.md
@@ -7,7 +7,6 @@ your python environment you should be able to load the modules with:
- `from icoscp.cpb.dobj import Dobj`
- `from icoscp.station import station`
- `from icoscp.collection import collection`
- `from icoscp.stilt import stiltstation`
- `from icoscp.sparql.runsparql import RunSparql`
- `from icoscp.sparql import sparqls`

8 changes: 5 additions & 3 deletions icoscp/mkdocs.yml
@@ -4,11 +4,13 @@ site_name: ICOS Carbon Portal icoscp Library
nav:
- About: index.md
- Installation: install.md
- Getting started: getting_started.md
- Authentication: authentication.md
- Getting started with icoscp_core: getting_started.md
- Examples: examples.md
- Legacy icoscp modules: modules.md
- Legacy icoscp examples: legacy_examples.md
- FAQ: faq.md
- Changelog: changelog.md
- Legacy modules: modules.md
- Legacy examples: legacy_examples.md

theme:
name: readthedocs