Commit

better sections
brunj7 committed Mar 19, 2024
1 parent ca59c5d commit ab4180c
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion index.qmd
@@ -6,11 +6,18 @@ title: "EDS 213: Databases and Data Management"

The goals of the course were to give MEDS students the skills they need to practically, successfully, and ethically manage their data, and to create, manage, and use relational databases where appropriate. Relational database topics went further than just SQL queries and included a significant unit on data modeling, database constraints, and integrity, in addition to advanced database topics such as triggers, indexes, and accessing databases from programming environments. The data management portion tied into the students' capstone projects in a couple of places, and included analyzing data from an ethical perspective, creating standards-compliant metadata, and employing data de-identification techniques. The course also included a unit on the Unix command line, with an emphasis on creating reusable Bash scripts, given in the spirit that Bash is a generally useful tool that all data scientists should have at least some familiarity with.


## Data

For the database portion of the course, the [Arctic Shorebird Demographics Network](https://doi.org/10.18739/A2222R68W) dataset, obtained from the [Arctic Data Center](https://arcticdata.io), was used as a running example. While this dataset is not distributed as a relational database (it is packaged as a set of related CSV files), its structure is highly amenable to a relational approach and provides a realistic example of where and why one would want to use a relational database in the Earth and environmental sciences. It also provides just enough complexity to support realistic and complex queries and views. Note that the dataset used in the course, and included in this archive, is a cleaned-up subset of the full dataset. It is a subset to keep the size and complexity manageable for pedagogical purposes, and it had to be cleaned up because, unfortunately, the full dataset has many errors that would have precluded creating foreign keys.
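
To illustrate why errors in the data can preclude foreign keys, here is a minimal sketch. It uses Python's built-in `sqlite3` module only so that it runs without extra installs; it is not the course's DuckDB setup. The `site` and `nest` tables and their values are hypothetical stand-ins, not the dataset's actual schema. Once a foreign key is declared, a row that references a nonexistent parent (for example, a misspelled site code) is rejected, so such rows have to be cleaned up first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical parent and child tables, for illustration only.
conn.execute("CREATE TABLE site (code TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE nest (
        nest_id TEXT PRIMARY KEY,
        site_code TEXT NOT NULL REFERENCES site (code)
    )
    """
)

conn.execute("INSERT INTO site VALUES ('barr')")
conn.execute("INSERT INTO nest VALUES ('n1', 'barr')")  # valid reference

try:
    # 'barrow' does not exist in site, so the foreign key rejects this row.
    conn.execute("INSERT INTO nest VALUES ('n2', 'barrow')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

nest_count = conn.execute("SELECT count(*) FROM nest").fetchone()[0]
print(rejected, nest_count)
```

Only the row with the valid site code is stored; the mismatched one raises an integrity error instead of being loaded silently.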

A class data GitHub repository, linked below, is used as the mechanism for distributing data files to students. Each week a new directory of files will be added to the repository and the students will be asked to pull the repository to their local environment to get the updates.


## Database

[DuckDB](https://duckdb.org/) is used as the database platform because it enforces data types strictly; the lack of strict typing turned out to be a weakness when teaching with SQLite last year. DuckDB is a fast, in-process, analytical personal database.
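
The typing weakness can be seen in a minimal sketch using Python's built-in `sqlite3` module. The `nest` table and its columns are hypothetical, chosen only for illustration. By default, SQLite treats a column's declared type as a preference rather than a constraint, so a text value that cannot be converted is stored as text in an `INTEGER` column without any error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: egg_count is declared INTEGER.
conn.execute("CREATE TABLE nest (nest_id INTEGER PRIMARY KEY, egg_count INTEGER)")

# SQLite accepts a non-numeric string here and stores it as text.
conn.execute("INSERT INTO nest (nest_id, egg_count) VALUES (1, 'four')")

stored = conn.execute("SELECT egg_count, typeof(egg_count) FROM nest").fetchone()
print(stored)  # ('four', 'text')
```

A strictly typed database such as DuckDB is expected to reject the same insert with a conversion error, which surfaces data problems at load time rather than later in a query.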


## Instructors
