Commit

switch to new data repo and update course description
brunj7 committed Mar 19, 2024
1 parent 2f48d05 commit ca59c5d
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions index.qmd
@@ -1,18 +1,16 @@
---
title: "Bren MEDS 213: Databases and Data Management"
title: "EDS 213: Databases and Data Management"
---

## Course description

This is an archive of the materials used for a 4-unit, letter-grade course delivered in Spring 2024 as part of the [Master of Environmental Data Science (MEDS)](https://bren.ucsb.edu/masters-programs/master-environmental-data-science) program in the [Bren School of Environmental Science & Management](https://bren.ucsb.edu). It includes PowerPoint presentations, instructor notes, live coding transcripts, supplemental materials and readings, and homework assignments.

The goals of the course were to give MEDS students the skills they need to practically, successfully, and ethically manage their data, and to create, manage, and use relational databases where appropriate. Relational database topics went farther than just SQL queries and included a significant unit on data modeling and database constraints and integrity, in addition to advanced database topics such as triggers and indexes and accessing databases from programming environments. The data management portion tied into the students' capstone projects in a couple places, and included analyzing data from an ethical perspective, creating standards-compliant metadata, and employing data de-identification techniques. The course also included a unit on the Unix command line, with an emphasis on creating reusable Bash scripts, given in the spirit that Bash is a generally useful tool that all data scientists should have at least some familiarity with.

For the database portion of the course the [Arctic Shorebird Demographics Network](https://doi.org/10.18739/A2222R68W) dataset, obtained from the [Arctic Data Center](https://arcticdata.io), was used as a running example. While this dataset is not distributed as a relational database (it is packaged as a set of related CSV files), its structure is highly amenable to a relational approach and provides a realistic example of where and why one would want to use a relational database in the Earth and environmental sciences. It also provides just enough complexity to support realistic and complex queries and views. Note that the dataset used in the course, and included in this archive, is a cleaned-up subset of the full dataset. It is necessarily a subset of the full dataset to keep the size and complexity manageable for pedagogical purposes, and it had to be cleaned up because, unfortunately, the full dataset has many errors that would have precluded creating foreign keys.

[DuckDB](https://duckdb.org/) is used as the database platform due to its strict implementation to data types that turn out to be a weakness of teaching with SQLlite last year. DuckDB is a fast in-process analytical personal database.
[DuckDB](https://duckdb.org/) is used as the database platform because it enforces data types strictly; loose typing turned out to be a weakness when teaching with SQLite last year. DuckDB is a fast, in-process analytical personal database.

A class data GitHub repository, linked below, was used as the mechanism for distributing files to students. Each week a new directory of files was added to the repository and the students were asked to pull the repository to their local environment. The repository linked here includes the files for all weeks.
A class data GitHub repository, linked below, is used as the mechanism for distributing data files to students. Each week a new directory of files will be added to the repository and the students will be asked to pull the repository to their local environment to get the updates.

## Instructors

@@ -25,6 +23,7 @@ A class data GitHub repository, linked below, was used as the mechanism for dist

- Jamie Miller (jkmiller\@ucsb.edu)


## Schedule

- Class: Monday & Wednesday 9:30-10:45 am (NCEAS)
@@ -41,7 +40,7 @@ A class data GitHub repository, linked below, was used as the mechanism for dist

[Resources](resources.qmd)

[Class data GitHub repository](https://github.com/UCSB-Library-Research-Data-Services/bren-meds213-spring-2023-class-data)
[Class data GitHub repository](https://github.com/UCSB-Library-Research-Data-Services/bren-meds213-spring-2024-class-data)


## Modules
@@ -58,3 +57,4 @@ A class data GitHub repository, linked below, was used as the mechanism for dist
| 8 | [Sensitive data](modules/week08/index-08.qmd) |
| 9 | [Ethical & responsible data mgnt](modules/week09/index-09.qmd) |
| 10 | [Data licensing and publication](modules/week10/index-10.qmd) |
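
The course description cites weak data type enforcement as the problem with teaching on SQLite. The sketch below is one way to see that behavior, using only Python's standard `sqlite3` module. The table name `nest` and column name `egg_count` are hypothetical and are not taken from the course materials. It inserts a value into a column declared `INTEGER` and reports the storage type SQLite actually used. A strictly typed engine such as DuckDB would be expected to reject the non-numeric string instead of storing it, though that side is not exercised here.

```python
import sqlite3


def sqlite_stored_type(value):
    """Insert `value` into an INTEGER column and return the type SQLite stored.

    The table and column names are hypothetical, chosen only for illustration.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE nest (egg_count INTEGER)")
        # SQLite applies "type affinity": it tries to convert the value to an
        # integer, but stores it unchanged if the conversion is not possible.
        conn.execute("INSERT INTO nest VALUES (?)", (value,))
        (stored,) = conn.execute("SELECT typeof(egg_count) FROM nest").fetchone()
        return stored
    finally:
        conn.close()


if __name__ == "__main__":
    for candidate in (4, "4", "four"):
        print(repr(candidate), "->", sqlite_stored_type(candidate))
```

A numeric string such as `"4"` is converted to an integer, while `"four"` is stored as text in the `INTEGER` column without any error, which is the kind of silent acceptance that can hide data-quality problems from students.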
