diff --git a/content/scripts.rst b/content/scripts.rst
index b7e9656b..ca04ecd4 100644
--- a/content/scripts.rst
+++ b/content/scripts.rst
@@ -45,7 +45,7 @@ Jupyter notebooks can be parameterized for instance using `papermill ` and the weather_data file and upload them to your Jupyterlab. The script plots the temperature data for Tapiola in Espoo. The data is originally from `rp5.kz `_ and was slightly adjusted for this lecture.
 
-      **Note:** If you haven't downloaded the file directly to your Jupyterlab folder, it will be located in your **Downloads** folder or the folder you selected. In Jupyterlab click on the 'upload file' button, navigate to the folder containing the file and select it to load it into your Jupyterlab folder.
+      **Hint:** Copy the URL above (right-click) and in JupyterLab, use
+      File → Open from URL → Paste the URL. It will both download it to
+      the directory JupyterLab is in and open it for you.
 
-   2. Open a terminal in Jupyter (File → New → Terminal).
+   2. Open a terminal in Jupyter: File → New Launcher, then click
+      "Terminal" there. (if you do it this way, it will be in the right
+      directory. File → New → Terminal might not be.)
 
    3. Convert the Jupyter script to a Python script by calling::
 
@@ -81,6 +85,8 @@ Exercises 1
 
         $ python weather_observations.py
 
+
+
 Command line arguments with :data:`sys.argv`
 --------------------------------------------
 
@@ -100,29 +106,31 @@ and any further argument (separated by space) is appended to this list, like suc
    $ # sys.argv[2] is 'B'
 
 Lets see how it works: We modify the **weather_observations.py** script such that we allow start
-and end times as well as the output file to be passed in as arguments to the function:
+and end times as well as the output file to be passed in as arguments
+to the function. Open it (find the ``.py`` file from the JupyterLab
+file browser) and make these edits:
 
 .. code-block:: python
-   :emphasize-lines: 1,5-6,8,16
+   :emphasize-lines: 1,5-6,8,14-15
 
    import sys
    import pandas as pd
 
-   # set start and end time
-   start_date = pd.to_datetime(sys.argv[1],dayfirst=True)
-   end_date = pd.to_datetime(sys.argv[2],dayfirst=True)
-
-   output_file_name = sys.argv[3]
-
+   # define the start and end time for the plot
+   start_date = pd.to_datetime(sys.argv[1], dayfirst=True)
+   end_date = pd.to_datetime(sys.argv[2], dayfirst=True)
    ...
 
    # select the data
    weather = weather[weather['Local time'].between(start_date,end_date)]
    ...
 
+   # save the figure
+   output_file_name = sys.argv[3]
    fig.savefig(output_file_name)
 
-We can try it out:
+We can try it out (see the file ``spring_in_tapiola.png`` made in the
+file browser):
 
 .. code-block:: console
 
@@ -185,6 +193,7 @@ would show the following message:
 
 .. code-block:: console
 
+   $ python birthday.py --help
    usage: birthday.py [-h] [-d DATE] N
 
    positional arguments:
@@ -201,7 +210,7 @@ Exercises 2
 .. challenge:: Scripts-2
 
    1. Take the Python script (``weather_observations.py``) we have written in the preceding exercise and use
-      :py:mod:`argparse` to specify the input and output files and allow the start and end dates to be set.
+      :py:mod:`argparse` to specify the input (URL) and output files and allow the start and end dates to be set.
 
       * Hint: try not to do it all at once, but add one or two arguments, test, then add more, and so on.
       * Hint: The input and output filenames make sense as positional arguments, since they must always be given. Input is usually first, then output.
@@ -236,6 +245,7 @@ Exercises 2
 
       - We can now process different input files without changing the script.
       - We can select multiple time ranges without modifying the script.
+      - We can easily save these commands to know what we did.
       - This way we can also loop over file patterns (using shell loops or similar) or use the script in a workflow management system and process many files in parallel.
       - By changing from :data:`sys.argv` to :mod:`argparse` we made the script more robust against
@@ -287,9 +297,9 @@ Exercises 3 (optional)
 .. challenge:: Scripts-3
 
    1. Download the :download:`optionsparser.py `
-      function and load it into your working folder in Jupyterlab.
+      function and load it into your working folder in Jupyterlab (Hint: in JupyterLab, File → Open from URL).
       Modify the previous script to use a config file parser to read all arguments. The config file is passed in as a single argument on the command line
-      (using e.g. argparse or sys.argv) still needs to be read from the command line.
+      (using e.g. :mod:`argparse` or :data:`sys.argv`) still needs to be read from the command line.
 
    2. Run your script with different config files.
 
@@ -303,6 +313,12 @@ Exercises 3 (optional)
          :language: python
          :emphasize-lines: 5,9-12,15-27,30,33,36-37,58
 
+What did this config file parser get us? Now, we have separated the
+code from the configuration. We could save all the configuration in
+version control - separately and have one script that runs them. If
+done right, our work could be much more reproducible and
+understandable.
+
 .. admonition:: Further reading
 
diff --git a/resources/code/scripts/weather_observations.ipynb b/resources/code/scripts/weather_observations.ipynb
index 4a5e214a..3c1ecd2a 100644
--- a/resources/code/scripts/weather_observations.ipynb
+++ b/resources/code/scripts/weather_observations.ipynb
@@ -12,12 +12,12 @@
     "weather = pd.read_csv(url,comment='#')\n",
     "\n",
     "# define the start and end time for the plot \n",
-    "start_date=pd.to_datetime('01/06/2021',dayfirst=True)\n",
-    "end_date=pd.to_datetime('01/10/2021',dayfirst=True)\n",
+    "start_date=pd.to_datetime('01/06/2021', dayfirst=True)\n",
+    "end_date=pd.to_datetime('01/10/2021', dayfirst=True)\n",
     "\n",
     "# The date format in the file is in a day-first format, which matplotlib does nto understand.\n",
     "# so we need to convert it.\n",
-    "weather['Local time'] = pd.to_datetime(weather['Local time'],dayfirst=True)\n",
+    "weather['Local time'] = pd.to_datetime(weather['Local time'], dayfirst=True)\n",
     "# select the data\n",
     "weather = weather[weather['Local time'].between(start_date,end_date)]\n"
    ]
diff --git a/resources/code/scripts/weather_observations.py b/resources/code/scripts/weather_observations.py
index 8dece545..f18666f4 100644
--- a/resources/code/scripts/weather_observations.py
+++ b/resources/code/scripts/weather_observations.py
@@ -8,10 +8,10 @@ weather = pd.read_csv(url,comment='#')
 
 # define the start and end time for the plot
-start_date=pd.to_datetime('01/06/2021',dayfirst=True)
-end_date=pd.to_datetime('01/10/2021',dayfirst=True)
+start_date=pd.to_datetime('01/06/2021', dayfirst=True)
+end_date=pd.to_datetime('01/10/2021', dayfirst=True)
 
 #Preprocess the data
-weather['Local time'] = pd.to_datetime(weather['Local time'],dayfirst=True)
+weather['Local time'] = pd.to_datetime(weather['Local time'], dayfirst=True)
 
 # select the data
 weather = weather[weather['Local time'].between(start_date,end_date)]
diff --git a/resources/code/scripts/weather_observations_argparse.py b/resources/code/scripts/weather_observations_argparse.py
index ae9036ea..ceabe8d5 100644
--- a/resources/code/scripts/weather_observations_argparse.py
+++ b/resources/code/scripts/weather_observations_argparse.py
@@ -5,7 +5,7 @@ parser.add_argument("input", type=str, help="Input data file")
 parser.add_argument("output", type=str, help="Output plot file")
 parser.add_argument("-s", "--start", default="01/01/2019", type=str, help="Start date in DD/MM/YYYY format")
-parser.add_argument("-e", "--end", default="16/10/2021", type=str, help="End date in DD/MM/YYYY format")
+parser.add_argument("-e", "--end", default="16/10/2021", type=str, help="End date in DD/MM/YYYY format")
 
 args = parser.parse_args()
 
@@ -13,11 +13,11 @@ weather = pd.read_csv(args.input,comment='#')
 
 # define the start and end time for the plot
-start_date=pd.to_datetime(args.start,dayfirst=True)
-end_date=pd.to_datetime(args.end,dayfirst=True)
+start_date=pd.to_datetime(args.start, dayfirst=True)
+end_date=pd.to_datetime(args.end, dayfirst=True)
 
 # preprocess the data
-weather['Local time'] = pd.to_datetime(weather['Local time'],dayfirst=True)
+weather['Local time'] = pd.to_datetime(weather['Local time'], dayfirst=True)
 
 # select the data
 weather = weather[weather['Local time'].between(start_date,end_date)]
diff --git a/resources/code/scripts/weather_observations_config.py b/resources/code/scripts/weather_observations_config.py
index c07e7d06..d31c2436 100644
--- a/resources/code/scripts/weather_observations_config.py
+++ b/resources/code/scripts/weather_observations_config.py
@@ -33,11 +33,11 @@ weather = pd.read_csv(parameters.input,comment='#')
 
 # obtain start and end date
-start_date=pd.to_datetime(parameters.start,dayfirst=True)
-end_date=pd.to_datetime(parameters.end,dayfirst=True)
+start_date=pd.to_datetime(parameters.start, dayfirst=True)
+end_date=pd.to_datetime(parameters.end, dayfirst=True)
 
 # Data preprocessing
-weather['Local time'] = pd.to_datetime(weather['Local time'],dayfirst=True)
+weather['Local time'] = pd.to_datetime(weather['Local time'], dayfirst=True)
 
 # select the data
 weather = weather[weather['Local time'].between(start_date,end_date)]