From ca59c5ddcaa20ff3eea767e336b3723eb6c0d4ac Mon Sep 17 00:00:00 2001
From: Julien Brun
Date: Tue, 19 Mar 2024 13:38:55 -0700
Subject: [PATCH] switch to new data repo and update course description

---
 index.qmd | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/index.qmd b/index.qmd
index 678bd83..b35a0ad 100644
--- a/index.qmd
+++ b/index.qmd
@@ -1,18 +1,16 @@
 ---
-title: "Bren MEDS 213: Databases and Data Management"
+title: "EDS 213: Databases and Data Management"
 ---
 
 ## Course description
 
-This is an archive of the materials used for a 4-unit, letter-grade course delivered in Spring 2024 as part of the [Master of Environmental Data Science (MEDS)](https://bren.ucsb.edu/masters-programs/master-environmental-data-science) program in the [Bren School of Environmental Science & Management](https://bren.ucsb.edu). It includes PowerPoint presentations, instructor notes, live coding transcripts, supplemental materials and readings, and homework assignments.
-
 The goals of the course were to give MEDS students the skills they need to practically, successfully, and ethically manage their data, and to create, manage, and use relational databases where appropriate. Relational database topics went farther than just SQL queries and included a significant unit on data modeling and database constraints and integrity, in addition to advanced database topics such as triggers and indexes and accessing databases from programming environments. The data management portion tied into the students' capstone projects in a couple places, and included analyzing data from an ethical perspective, creating standards-compliant metadata, and employing data de-identification techniques. The course also included a unit on the Unix command line, with an emphasis on creating reusable Bash scripts, given in the spirit that Bash is a generally useful tool that all data scientists should have at least some familiarity with.
 
 For the database portion of the course the [Arctic Shorebird Demographics Network](https://doi.org/10.18739/A2222R68W) dataset, obtained from the [Arctic Data Center](https://arcticdata.io), was used as a running example. While this dataset is not distributed as a relational database (it is packaged as a set of related CSV files), its structure is highly amenable to a relational approach and provides a realistic example of where and why one would want to use a relational database in the Earth and environmental sciences. It also provides just enough complexity to support realistic and complex queries and views. Note that the dataset used in the course, and included in this archive, is a cleaned-up subset of the full dataset. It is necessarily a subset of the full dataset to keep the size and complexity manageable for pedagogical purposes, and it had to be cleaned up because, unfortunately, the full dataset has many errors that would have precluded creating foreign keys.
 
-[DuckDB](https://duckdb.org/) is used as the database platform due to its strict implementation to data types that turn out to be a weakness of teaching with SQLlite last year. DuckDB is a fast in-process analytical personal database.
+[DuckDB](https://duckdb.org/) is used as the database platform due to its strict implementation to data types that turned out to be a weakness when teaching with SQLlite last year. DuckDB is a fast in-process analytical personal database.
 
-A class data GitHub repository, linked below, was used as the mechanism for distributing files to students. Each week a new directory of files was added to the repository and the students were asked to pull the repository to their local environment. The repository linked here includes the files for all weeks.
+A class data GitHub repository, linked below, is used as the mechanism for distributing data files to students. Each week a new directory of files will be added to the repository and the students will be asked to pull the repository to their local environment to get the updates.
 
 ## Instructors
 
@@ -25,6 +23,7 @@ A class data GitHub repository, linked below, was used as the mechanism for dist
 
 - Jamie Miller (jkmiller\@ucsb.edu)
 
+
 ## Schedule
 
 - Class: Monday & Wednesday 9:30-10:45 am (NCEAS)
@@ -41,7 +40,7 @@ A class data GitHub repository, linked below, was used as the mechanism for dist
 
 [Resources](resources.qmd)
 
-[Class data GitHub repository](https://github.com/UCSB-Library-Research-Data-Services/bren-meds213-spring-2023-class-data)
+[Class data GitHub repository](https://github.com/UCSB-Library-Research-Data-Services/bren-meds213-spring-2024-class-data)
 
 ## Modules
 
@@ -58,3 +57,4 @@ A class data GitHub repository, linked below, was used as the mechanism for dist
 | 8 | [Sensitive data](modules/week08/index-08.qmd) |
 | 9 | [Ethical & responsible data mgnt](modules/week09/index-09.qmd) |
 | 10 | [Data licensing and publication](modules/week10/index-10.qmd) |
+