diff --git a/icoscp/docs/authentication.md b/icoscp/docs/authentication.md new file mode 100644 index 0000000..7ac40a4 --- /dev/null +++ b/icoscp/docs/authentication.md @@ -0,0 +1,113 @@ +# Authentication +To ensure users' licence acceptance when accessing data objects +through the `icoscp` Python library, authentication is required. Users +must either have ICOS Carbon Portal login credentials or +log in to Carbon Portal using another mechanism +[https://cpauth.icos-cp.eu/login/](https://cpauth.icos-cp.eu/login/) +to obtain the token to access ICOS data. + +Users also need to read and accept the ICOS Data Licence in their +[user profile](https://cpauth.icos-cp.eu/). + +**Metadata-only access does not require authentication.** + +Users with direct access to the data files, +namely, **users of our [ICOS Jupyter services]( +https://www.icos-cp.eu/data-services/tools/jupyter-notebook), are +not required to configure authentication**. + +In order to fetch data, users make their requests to data objects and +must provide an API token to do so. For security reasons the API token is +valid for 100'000 seconds (just under 28 hours) and may therefore need to be refreshed; +this process is automated when using credentials-based authentication (see +below). + +Authentication can be initialized in a number of ways. **Please note** that +when using an approach other than the default one (see below), it becomes +necessary to use the `data` value (an instance of the `DataClient` class) obtained +in the process of authentication bootstrap, rather than the default instance +obtained by the import `from icoscp_core.icos import data` used in the code +examples in this documentation. + +## By credentials file (default) +This approach should only be used on machines the developer trusts. + +A username/password account for the [ICOS](https://cpauth.icos-cp.eu/) +authentication service is required for this.
An obfuscated (not human-readable) +password is stored in a file on the local machine in a **default +user-specific folder**. To initialize this file, run the following code +interactively (only needs to be done once for every machine): + +```Python +from icoscp_core.icos import auth +auth.init_config_file() +``` + +After the initialization step is done in this way, access to the data can be +achieved using both the new `icoscp_core` machinery and the legacy +[Dobj classes](modules.md#dobj). + +## By custom credentials file +The developer may wish to use a specific file to store obfuscated +credentials and the token cache. In this scenario, data and +metadata access are achieved as follows: + +```Python +from icoscp_core.icos import bootstrap +auth, meta, data = bootstrap.fromPasswordFile("") +# The next line needs to be run interactively (only once per file). +auth.init_config_file() +``` + +If the legacy library functionality will be used, the following extra step is +needed as well: + +```Python +from icoscp import cpauth +cpauth.init_by(auth) +``` + +## By authentication token (prototyping) +This option is good for testing, on a public machine or in general. Its +only disadvantage is that the tokens have a limited period of validity +(100'000 seconds, less than 28 hours), but this is precisely what makes +it acceptable to include them directly in the Python source code. + +The token can be obtained from the "My Account" page ([ICOS]( +https://cpauth.icos-cp.eu/)), which can be accessed by logging in +using one of the supported authentication mechanisms (username/password, +university sign-in, OAuth sign-in). After this, the bootstrapping can be +done as follows: + +```Python +from icoscp_core.icos import bootstrap +cookie_token = 'cpauthToken=WzE2OTY2NzQ5OD...'
+meta, data = bootstrap.fromCookieToken(cookie_token) +``` + +If the legacy library functionality will be used, the following extra step is +needed as well: + +```Python +from icoscp import cpauth +cpauth.init_by(data.auth) +``` + +## By explicit credentials (advanced option) +The user may choose to use their own mechanism of providing the +credentials to initialize the authentication. This should be considered +as an advanced option. **(Please do not put your password as clear text +in your Python code!)** This can be achieved as follows: + +```Python +from icoscp_core.icos import bootstrap +meta, data = bootstrap.fromCredentials(username_variable, password_containing_variable) +``` + +If the legacy library functionality will be used, the following extra step is +needed as well: + +```Python +from icoscp import cpauth +cpauth.init_by(data.auth) +``` diff --git a/icoscp/docs/changelog.md b/icoscp/docs/changelog.md index b9ff004..c66bec2 100644 --- a/icoscp/docs/changelog.md +++ b/icoscp/docs/changelog.md @@ -1,6 +1,6 @@ # Changelog -## 0.2.0a0 +## 0.2.0 - #### cpauth module - Remove legacy authentication. - Authenticate via [icoscp_core](https://pypi.org/project/icoscp_core/). diff --git a/icoscp/docs/examples.md b/icoscp/docs/examples.md new file mode 100644 index 0000000..8df0597 --- /dev/null +++ b/icoscp/docs/examples.md @@ -0,0 +1,66 @@ +# Examples + +## Monthly CO2 averages + +This example uses `icoscp_core` functionality to merge, monthly-average and +plot CO2 molar fraction data from selected stations, sampled at certain +heights. See [discover data types](getting_started.md#discover-data-types) +to find out how to discover a URL for a data type. 
+ +```Python +from icoscp_core.icos import data, meta, ATMO_STATION +import pandas as pd +import numpy as np + +# Swedish atmospheric stations whose metadata is supplied to the Carbon Portal +# by the Atmospheric Thematic Center +se_atmo_stations = [ + s for s in meta.list_stations(ATMO_STATION) + if s.country_code=='SE' +] + +# Find basic metadata for CO2 release data sampled at at least 100 m above the ground +se_co2_from_100 = [ + dobj for dobj in meta.list_data_objects( + # URL for official ICOS CO2 molar fraction release data + datatype='http://meta.icos-cp.eu/resources/cpmeta/atcCo2L2DataObject', + station=se_atmo_stations + ) + if dobj.sampling_height >= 100 +] + +# prepare an empty pandas DataFrame to merge the data into +merged_co2 = pd.DataFrame(columns=['TIMESTAMP', 'co2']) + +# batch-fetch the interesting columns and iterate through the results +for dobj, arrs in data.batch_get_columns_as_arrays(se_co2_from_100, ['TIMESTAMP', 'co2']): + st_uri = dobj.station_uri + # ICOS atmospheric station URIs end with underscore followed by a 3-letter station ID + # this ID is convenient to use as a suffix to rename 'co2' with + station_id = st_uri[st_uri.rfind('_'):] + df = pd.DataFrame(arrs) + # next line would be needed if `keep_bad_data` flag in batch_get_columns_as_arrays was set to True + #df.loc[df['Flag'] != 'O', 'co2'] = np.nan + del df['Flag'] + merged_co2 = pd.merge(merged_co2, df, on='TIMESTAMP', how='outer', suffixes=('', station_id)) + +del merged_co2['co2'] + +# compute monthly averages +by_month = merged_co2.groupby(pd.Grouper(key='TIMESTAMP', freq='M')).mean().reset_index() + +# Let us retrieve column metadata to construct Y-axis plot label + +# fetch detailed metadata for one of the data objects (they all have same columns) +dobj_meta = meta.get_dobj_meta(se_co2_from_100[0]) + +# get 'value type' part of column metadata for co2 column +columns_meta = dobj_meta.specificInfo.columns +co2_value_type = [col for col in columns_meta if 
col.label=='co2'][0].valueType + +# construct the Y-axis label from the value type data structure +co2_axis_label = f'{co2_value_type.self.label} [{co2_value_type.unit}]' + +# plotting should work in Jupyter +by_month.plot(x='TIMESTAMP', ylabel=co2_axis_label); +``` \ No newline at end of file diff --git a/icoscp/docs/faq.md b/icoscp/docs/faq.md index 437324e..1e73008 100644 --- a/icoscp/docs/faq.md +++ b/icoscp/docs/faq.md @@ -4,10 +4,12 @@ Please see [Getting started](getting_started.md) for possible answers to questio covered here. ### `icoscp_core` is very different from the old `icoscp`, do I have to rewrite everything? -No, your code depending on the older version will continue working. No code got -removed from `icoscp` with release `0.2.0`. But you may benefit from the new -`icoscp_core` features for new developments, and from gradual porting of your -older code to using `icoscp_core`. +No, your code depending on the older version will continue working. Apart from +moving STILT-related functionality to a new dedicated library and overhauling +authentication that was present in later versions of the `0.1.x` series, no code was +removed from `icoscp` with release `0.2.0`. But you should use the new +`icoscp_core` features for new developments, and can benefit from gradual +porting of at least some of your older code to using `icoscp_core`. ### How can I retrieve the latest/newest version of a dataset? ```python diff --git a/icoscp/docs/getting_started.md b/icoscp/docs/getting_started.md index 6fe3df4..e98da01 100644 --- a/icoscp/docs/getting_started.md +++ b/icoscp/docs/getting_started.md @@ -1,7 +1,6 @@ -# Getting started +# Getting started with icoscp_core The examples in this section can be tried on a public Jupyter Hub running -Python3 notebooks, where the library is preinstalled.
- +Python3 notebooks, where the library is preinstalled, for example [https://exploredata.icos-cp.eu/](https://exploredata.icos-cp.eu/) Please, click [here]( @@ -12,119 +11,6 @@ If run on a standalone machine rather than an ICOS Carbon Portal Jupyter Hub instance, the data access examples assume that the authentication has been configured as explained in the next section. -## Authentication -To ensure users' licence acceptance when accessing data objects -through the `icoscp` Python library, authentication is required. Users -must either have ICOS Carbon Portal login credentials or -log in to Carbon Portal using another mechanism -[https://cpauth.icos-cp.eu/login/](https://cpauth.icos-cp.eu/login/) -to obtain the token to access ICOS data. - -Users also need to read and accept the ICOS Data Licence in their -[user profile](https://cpauth.icos-cp.eu/). - -Metadata-only access does not require authentication. - -Users with direct access to the data files, -namely, **users of our [ICOS Jupyter services]( -https://www.icos-cp.eu/data-services/tools/jupyter-notebook), are -not required to configure authentication**. - -In order to fetch data, users make their requests to data objects and -must provide an API token to do so. For security reasons the API token is -valid for 100'000 seconds (27 hours) and may therefore need to be refreshed; -this process is automated when using credentials-based authentication (see -below). - -Authentication can be initialized in a number of ways. **Please note** that -when using other approaches than the default one (see below), it becomes -necessary to use the `data` value (instance of `DataClient` class) obtained -in the process of authentication bootstrap, rather than the default instance -obtained by import `from icoscp_core.icos import data` used in the code -examples in this documentation. - -### By credentials file (default) -This approach should only be used on machines the developer trusts. 
- -A username/password account for the [ICOS](https://cpauth.icos-cp.eu/) -authentication service is required for this. Obfuscated (not readable by -humans) password is stored in a file on the local machine in a **default -user-specific folder**. To initialize this file, run the following code -interactively (only needs to be done once for every machine): - -```Python -from icoscp_core.icos import auth -auth.init_config_file() -``` - -After the initialization step is done in this way, access to the data can be -achieved using both the new `icoscp_core` machinery and the legacy -[Dobj classes](modules.md#dobj). - -### By custom credentials file -The developer may wish to use a specific file to store obfuscated -credentials and token cache. In this scenario, data and -metadata access are achieved as follows: - -```Python -from icoscp_core.icos import bootstrap -auth, meta, data = bootstrap.fromPasswordFile("") -# The next line needs to be run interactively (only once per file). -auth.init_config_file() -``` - -If the legacy library functionality will be used, the following extra step is -needed as well: - -```Python -from icoscp import cpauth -cpauth.init_by(auth) -``` - -### By authentication token (prototyping) -This option is good for testing, on a public machine or in general. Its -only disadvantage is that the tokens have limited period of validity -(100'000 seconds, less than 28 hours), but this is precisely what makes -it acceptable to include them directly in the Python source code. - -The token can be obtained from the "My Account" page ([ICOS]( -https://cpauth.icos-cp.eu/)), which can be accessed by logging in -using one of the supported authentication mechanisms (username/password, -university sign-in, OAuth sign in). After this the bootstrapping can be -done as follows: - -```Python -from icoscp_core.icos import bootstrap -cookie_token = 'cpauthToken=WzE2OTY2NzQ5OD...' 
-meta, data = bootstrap.fromCookieToken(cookie_token) -``` - -If the legacy library functionality will be used, the following extra step is -needed as well: - -```Python -from icoscp import cpauth -cpauth.init_by(data.auth) -``` - -### By explicit credentials (advanced option) -The user may choose to use their own mechanism of providing the -credentials to initialize the authentication. This should be considered -as an advanced option. **(Please do not put your password as clear text -in your Python code!)** This can be achieved as follows: - -```Python -from icoscp_core.icos import bootstrap -meta, data = bootstrap.fromCredentials(username_variable, password_containing_variable) -``` - -If the legacy library functionality will be used, the following extra step is -needed as well: - -```Python -from icoscp import cpauth -cpauth.init_by(data.auth) -``` ## General note on metadata An important background information on ICOS metadata is that all the metadata-represented entities (data objects, data types, documents, @@ -205,6 +91,11 @@ import pandas as pd co2_release_data_pd = ( (dobj, pd.DataFrame(arrs)) for dobj, arrs in co2_release_data) ``` +## Examples + +See [Examples](examples.md#examples) for more lengthy examples using all of the +functionality introduced above. + ## Accessing documentation As this library depends on `icoscp_core`, all the functionality of the latter can be used, not only the examples from above. 
It is introduced on the diff --git a/icoscp/docs/modules.md b/icoscp/docs/modules.md index d5be9b0..208155b 100644 --- a/icoscp/docs/modules.md +++ b/icoscp/docs/modules.md @@ -7,7 +7,6 @@ your python environment you should be able to load the modules with: - `from icoscp.cpb.dobj import Dobj` - `from icoscp.station import station` - `from icoscp.collection import collection` -- `from icoscp.stilt import stiltstation` - `from icoscp.sparql.runsparql import RunSparql` - `from icoscp.sparql import sparqls` diff --git a/icoscp/mkdocs.yml b/icoscp/mkdocs.yml index 40aaabd..54afc18 100644 --- a/icoscp/mkdocs.yml +++ b/icoscp/mkdocs.yml @@ -4,11 +4,13 @@ site_name: ICOS Carbon Portal icoscp Library nav: - About: index.md - Installation: install.md - - Getting started: getting_started.md + - Authentication: authentication.md + - Getting started with icoscp_core: getting_started.md + - Examples: examples.md + - Legacy icoscp modules: modules.md + - Legacy icoscp examples: legacy_examples.md - FAQ: faq.md - Changelog: changelog.md - - Legacy modules: modules.md - - Legacy examples: legacy_examples.md theme: name: readthedocs diff --git a/icoscp_stilt/docs/getting_started.md b/icoscp_stilt/docs/getting_started.md new file mode 100644 index 0000000..09f56ab --- /dev/null +++ b/icoscp_stilt/docs/getting_started.md @@ -0,0 +1,80 @@ +# Getting started with the new icoscp_stilt + +The library is published to PyPI. + +As stated in [Background and history](index.md#background-and-history), +the [legacy functionality](modules.md#legacy-modules) is +still available, but for new code the developers are encouraged to consider +the new module `icoscp_stilt.stilt` as the first choice. + +The following code demonstrates the new functionality. 
+ +```Python +from icoscp_stilt import stilt +from icoscp_stilt.const import CP_OBSPACK_CO2_SPEC + +# list of stilt.StiltStation dataclass instances +stations = stilt.list_stations() + +station_info_lookup = {s.id: s for s in stations} + +# example: Hyltemossa station, altitude 150 m +htm_info = station_info_lookup['HTM150'] +htm_info +``` +Here is the output of the above code. + + StiltStation( + id='HTM150', name='Hyltemossa', lat=56.1, lon=13.42, alt=150, + countryCode='SE', + years=[2006, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022], + icosId='HTM', icosHeight=150.0 + ) + +```Python +# years for which the station has calculation results +htm_years = htm_info.years + +# grouped STILT time series results, all columns, as pandas DataFrame +# can be slow when fetching first time after calculation +htm_time_series_result = stilt.fetch_result_ts('HTM150', '2022-01-01', '2022-01-31') + +# list of time-series columns (~60) +ts_columns = htm_time_series_result.columns + +# fetch selected columns only +htm_ts_ch4_basics = stilt.fetch_result_ts('HTM150', '2022-01-01', '2022-01-31', columns=['isodate', 'ch4.stilt', 'metadata']) + +# find months for which calculation was run +stilt.available_months('KRE250', 2022) + +# find all months for all years for which calculation was run +htm_yearmonths = stilt.available_year_months(htm_info) + +# list footprint time slots that were computed for a station within date interval +htm_slots_jan2022 = stilt.list_footprints('HTM150', '2022-01-01', '2022-01-31') + +# load footprint for one time slot +htm_fp_example = stilt.load_footprint('HTM150', htm_slots_jan2022[0]) + +# filter stations +de_stations = [s for s in stations if s.countryCode == 'DE'] + +# fetch observations for the German stations as numpy array dicts +# interesting columns are requested explicitly (all returned otherwise) +# using bare numpy gives maximum performance +de_co2_numpy = stilt.fetch_observations(CP_OBSPACK_CO2_SPEC, de_stations, ['value', 
'time']) + +# same as previous example, but returning pandas DataFrames instead +# performance may be worse, especially on Jupyter +de_co2_pandas = stilt.fetch_observations_pandas(CP_OBSPACK_CO2_SPEC, de_stations, ['value', 'time']) +``` + +## Getting help + +All the methods in the new `stilt` module have Python documentation +accessible by standard means, for example: + +``` +help(stilt.fetch_observations) +``` \ No newline at end of file diff --git a/icoscp_stilt/docs/index.md b/icoscp_stilt/docs/index.md index 76a5da9..501fe03 100644 --- a/icoscp_stilt/docs/index.md +++ b/icoscp_stilt/docs/index.md @@ -32,10 +32,10 @@ from icoscp_stilt import stiltstation ``` Additionally, handling of country-association of the STILT stations is -radically simplified with this new release. This was made possible by a -change on the server side that guaranteed ISO-3166 alpha-2 country code -association with every STILT station. The previously-utilized geo lookup of -countries thus became redundant. Also, in the context of STILT, detailed +simplified in this new release, which was made possible by a +change on the server side that enforced ISO-3166 alpha-2 country code +association with every STILT station. (The geo-lookup of countries used previously +thus became redundant.) Also, for the STILT station metadata, detailed country metadata was deemed unnecessary, retaining the country code only, with a possibility of country name lookup. This resulted in a potential breaking change for the existing STILT-related Jupyter notebooks, namely @@ -62,81 +62,3 @@ In general, library users are encouraged to switch to using the new functionality (`stilt` module) instead whenever possible (see the code examples below). -## Getting started -The library is published to PyPI. - -As stated above, the [legacy functionality](modules.md#legacy-modules) is -still available, but for new code the developers are encouraged to consider -the new module `icoscp_stilt.stilt` as the first choice.
- -The following code demonstrates the new functionality. - -```Python -from icoscp_stilt import stilt -from icoscp_stilt.const import CP_OBSPACK_CO2_SPEC - -# list of stilt.StiltStation dataclass instances -stations = stilt.list_stations() - -station_info_lookup = {s.id: s for s in stations} - -# example: Hyltemossa station, altitude 150 m -htm_info = station_info_lookup['HTM150'] -htm_info -``` -Here is the output of the above code. - - StiltStation( - id='HTM150', name='Hyltemossa', lat=56.1, lon=13.42, alt=150, - countryCode='SE', - years=[2006, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022], - icosId='HTM', icosHeight=150.0 - ) - -```Python -# years for which the station has calculation results -htm_years = htm_info.years - -# grouped STILT time series results, all columns, as pandas DataFrame -# can be slow when fetching first time after calculation -htm_time_series_result = stilt.fetch_result_ts('HTM150', '2022-01-01', '2022-01-31') - -# list of time-series columns (~60) -ts_columns = htm_time_series_result.columns - -# fetch selected columns only -htm_ts_ch4_basics = stilt.fetch_result_ts('HTM150', '2022-01-01', '2022-01-31', columns=['isodate', 'ch4.stilt', 'metadata']) - -# find months for which calculation was run -stilt.available_months('KRE250', 2022) - -# find all months for all years for which calculation was run -htm_yearmonths = stilt.available_year_months(htm_info) - -# list footprint time slots that were computed for a station within date interval -htm_slots_jan2022 = stilt.list_footprints('HTM150', '2022-01-01', '2022-01-31') - -# load footprint for one time slot -htm_fp_example = stilt.load_footprint('HTM150', htm_slots_jan2022[0]) - -# filter stations -de_stations = [s for s in stations if s.countryCode == 'DE'] - -# fetch observations for the German stations as numpy array dicts -# interesting columns are requested explicitly (all returned otherwise) -# using bare numpy gives maximum performance -de_co2_numpy = 
stilt.fetch_observations(CP_OBSPACK_CO2_SPEC, de_stations, ['value', 'time']) - -# same as previous example, but returning pandas DataFrames instead -# performance may be worse, especially on Jupyter -de_co2_pandas = stilt.fetch_observations_pandas(CP_OBSPACK_CO2_SPEC, de_stations, ['value', 'time']) -``` - -## Getting help - -All the methods in the new `stilt` module have a Python documentation -accessible by standard means, for example: - -``` -help(stilt.fetch_observations) -``` \ No newline at end of file diff --git a/icoscp_stilt/mkdocs.yml b/icoscp_stilt/mkdocs.yml index aad344f..9ebd704 100644 --- a/icoscp_stilt/mkdocs.yml +++ b/icoscp_stilt/mkdocs.yml @@ -3,6 +3,7 @@ site_url: 'https://icos-carbon-portal.github.io/pylib/icoscp_stilt/' site_name: ICOS Carbon Portal icoscp_stilt Library nav: - About: index.md + - Getting started with the new icoscp_stilt: getting_started.md - Legacy modules: modules.md theme: