diff --git a/.quarto/xref/8469fe7d b/.quarto/xref/8469fe7d index 01a47fc..26daf76 100644 --- a/.quarto/xref/8469fe7d +++ b/.quarto/xref/8469fe7d @@ -1 +1 @@ -{"headings":["about-the-course","instruction-on-how-to-use-the-course-materiterals","contact"],"entries":[]} \ No newline at end of file +{"entries":[],"headings":["about-the-course","instruction-on-how-to-use-the-course-materiterals","contact"]} \ No newline at end of file diff --git a/.quarto/xref/INDEX b/.quarto/xref/INDEX index 921e52a..1127d09 100644 --- a/.quarto/xref/INDEX +++ b/.quarto/xref/INDEX @@ -208,5 +208,8 @@ }, "discussions/M4-2.qmd": { "M4-2.html": "235abdc8" + }, + "discussions/M5-1.qmd": { + "M5-1.html": "598966b3" } } \ No newline at end of file diff --git a/.quarto/xref/ccc2dbef b/.quarto/xref/ccc2dbef index d37356e..4e0f584 100644 --- a/.quarto/xref/ccc2dbef +++ b/.quarto/xref/ccc2dbef @@ -1 +1 @@ -{"entries":[],"headings":["overview","course-goal-objectives","helpful-notes-from-your-professor","required-book","contact-information"]} \ No newline at end of file +{"headings":["overview","course-goal-objectives","helpful-notes-from-your-professor","required-book","contact-information"],"entries":[]} \ No newline at end of file diff --git a/_quarto.yml b/_quarto.yml index 49f146d..8aca1d4 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -121,8 +121,10 @@ website: - text: "Discussion 6" file: discussions/M4-2.qmd - text: "Discussion 7" + file: discussions/M5-1.qmd + - text: "Discussion 8" file: discussions/M5-2.qmd - + - title: "Rubrics" style: "docked" background: light diff --git a/assignments/lab4.qmd b/assignments/lab4.qmd index fc3edc3..5a6f322 100644 --- a/assignments/lab4.qmd +++ b/assignments/lab4.qmd @@ -1,5 +1,5 @@ --- -title: Lab 4 Bia in Modeling +title: Lab 4 Bias in Modeling --- The final lab of the semester dives deeper into bias in machine learning. It has four parts with **bolded questions** for you to answer in each part. 
It begins with an activity asking you to crop a series of photos. diff --git a/assignments/project_proposal.qmd b/assignments/project_proposal.qmd index 4b356ea..c40b309 100644 --- a/assignments/project_proposal.qmd +++ b/assignments/project_proposal.qmd @@ -2,4 +2,63 @@ title: Big Data Project Proposal --- -Add assignment content here. \ No newline at end of file +At this point in your graduate studies, you’ve read multiple research papers and have hopefully recognized a pattern in how they’re written. Although the order can vary, papers in most social science disciplines include these sections: Introduction, Purpose, Literature Review, Methods, Findings, Discussion, Limitations. A project proposal is organized in much the same way, except that it omits the Findings and Discussion sections. + +## Requirements + +For this assignment, you’ll write a proposal of **at least five pages** (1.15” line spacing, 1” margins) that describes how you could conduct a big data project on your topic of interest. Your proposal should include the following sections: + +### I. Introduction + +Treat the introduction as the initial pitch of your idea or a summary of the significance of a research problem. After reading the introduction, I should understand what you want to do and why it’s worth doing. + +Think of your introduction as a narrative of **two to four paragraphs** that succinctly answers the following four questions: + +- What is the central research problem? + +- What is the topic of study related to that research problem? + +- What methods should be used to analyze the research problem? + +- Where does this project fit in the research that’s already been done? What’s its contribution? + +### II. Purpose + +This section explains the context of your proposal and describes in detail why it's important. While there are no prescribed rules, you should: + +- Provide a more detailed explanation of the purpose of the study than you gave in the introduction.
This is particularly important if the problem is complex or multifaceted. + +- Describe the major issues or problems your research will address. These can take the form of research questions or hypotheses. + +- End this section with a short paragraph that describes how the remainder of your proposal is organized. + +### III. Literature Review + +Connected to the purpose of your study is a deliberate review and synthesis of prior studies related to your research question. The goal is to place your project within the larger context of what’s already been done on the topic while demonstrating that your work is original and innovative. Describe the questions other researchers have asked, the methods they have used, and your understanding of their findings. **This section should be at least one to two pages long.** + +Since a literature review is information-dense, it is crucial that this section is well structured so that a reader can grasp the key arguments underpinning your proposed study in relation to the work of other researchers. You can organize it historically, by methodology, by themes within the subject, etc. You must synthesize the research, discuss what gaps remain, and state how your research addresses them. + +### IV. Research Design and Methods + +The objective of this section is to convince the reader that your overall research design and proposed methods of analysis will correctly address the problem and will allow you to interpret the potential results effectively. + +Describe the overall research design by building upon and drawing examples from your review of the literature. Consider not only the methods other researchers have used but also methods of data gathering that have not yet been used but could be. Be specific about the methodological approaches you plan to take to obtain information, the techniques you will use to analyze the data, and the assessments of external validity that apply.
+ +When describing the methods, include the following: + +- Describe the data source(s) you will use. Who or what generates the data? What time frames will you use? What’s the unit of analysis (e.g., individuals, events)? + +- Describe how you plan to obtain the data, or how you got it if you already have it. Describe the tools you will need to get the data (if you are not downloading it in a structured form). + +- Summarize the cleaning and joining of data that you expect to do before you begin your analysis, along with any tools or applications you need to do it. + +- Describe the phases of the project that involve analysis. These will include exploratory data analysis and visualization. Some projects may stop here. Others will move further into the data science life cycle and require an explanation of modeling using causal inference, machine learning, etc. +- Describe the data products of your project, which may include results of statistical tests, performance analyses of learning algorithms, and visualizations of the data or model parameters. + +### V. Challenges/Limitations + +Anticipate and acknowledge any potential barriers and pitfalls in carrying out your research design, and explain whether and how you can address them. No method is perfect, so you need to describe where you believe challenges may exist in obtaining data or accessing information. + +### VI. Citations + +As with any scholarly research paper, you must cite the sources you used. List only the literature that you actually used or cited in your proposal. **This section does not count toward the five-page minimum.** diff --git a/discussions/M5-1.qmd b/discussions/M5-1.qmd new file mode 100644 index 0000000..09b1e5f --- /dev/null +++ b/discussions/M5-1.qmd @@ -0,0 +1,19 @@ +--- +title: "Discussion 7" +--- + +## M5.2 Data Stewardship + +## Post + +Address the following: + +1. Describe two differences between US and EU data privacy laws. What are the implications of these differences for safeguarding your personal information?
+ +2. As we've learned this semester, different entities can use your personal data for economic gain as well as to improve how we live, work, and play. If you could choose how your data are used, what would you permit versus prohibit, and why? + +Discussion posts are the primary assessment of your understanding and critical analysis of the readings. You must cite the readings you analyze in your posts using APA-style in-text citations. Posts should be between 400 and 500 words. + +**Due by:** 11/16 11:59 pm EST + +See here for [Rubrics](/resources/rubrics-discussion.qmd) diff --git a/docs/assignments/certification.html b/docs/assignments/certification.html deleted file mode 100644 index 4449967..0000000 --- a/docs/assignments/certification.html +++ /dev/null @@ -1,611 +0,0 @@ - - - - - - - - - -Big Data for Public Good - Data Certification - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
- -
- -
- - - - -
- -
-
-

Data Certification

-
- - - -
- - - - -
- - - -
- - -

Add assignment content here.

- - - -
- -
- - - - - \ No newline at end of file diff --git a/docs/assignments/lab4.html b/docs/assignments/lab4.html index 6fe9053..31a3f27 100644 --- a/docs/assignments/lab4.html +++ b/docs/assignments/lab4.html @@ -7,7 +7,7 @@ -Big Data for Public Good - Lab 4 Bia in Modeling +Big Data for Public Good - Lab 4 Bias in Modeling - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
- -
- -
- - - - -
- -
-
-

Module 1 Discussions

-
- - - -
- - - - -
- - - -
- - -
-

M1.2 The Value of Big Data

-

Post

-

Address the following:

-
    -
  1. Describe two characteristics of big data and how they apply to data produced or used by your field of interest.

  2. -
  3. Describe ideas presented in two of this week’s readings that you had not considered before.

  4. -
  5. What questions do you have about big data that we can address this week in class?

  6. -
-

Due by: 8/29 11:59 pm EST

-

Discuss

-

Online discussions are important to the dialogue and learning in this class. Take some time to respond to at least two of your classmates’ posts by Friday.

-

Due by: 9/3 11:59 pm EST

-
-
-

M1.4 The Value of Big Data

-

Post

-

Address the following:

-
    -
  1. Describe two characteristics of big data and how they apply to data produced or used by your field of interest.

  2. -
  3. Describe ideas presented in two of this week’s readings that you had not considered before.

  4. -
  5. What questions do you have about big data that we can address this week in class?

  6. -
-

Due by: 8/29 11:59 pm EST

-

Discuss

-

Online discussions are important to the dialogue and learning in this class. Take some time to respond to at least two of your classmates’ posts by Friday.

-

Due by: 9/3 11:59 pm EST

- - -
- -
- -
- - - - - \ No newline at end of file diff --git a/docs/labs/module2.html b/docs/labs/module2.html deleted file mode 100644 index fde0231..0000000 --- a/docs/labs/module2.html +++ /dev/null @@ -1,556 +0,0 @@ - - - - - - - - - -Big Data for Public Good - Module 2 Discussions - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
- -
- -
- - - - -
- -
-
-

Module 2 Discussions

-
- - - -
- - - - -
- - - -
- - - - - -
- -
- - - - - \ No newline at end of file diff --git a/docs/labs/module3.html b/docs/labs/module3.html deleted file mode 100644 index 953f9e6..0000000 --- a/docs/labs/module3.html +++ /dev/null @@ -1,556 +0,0 @@ - - - - - - - - - -Big Data for Public Good - Module 3 Discussions - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
- -
- -
- - - - -
- -
-
-

Module 3 Discussions

-
- - - -
- - - - -
- - - -
- - - - - -
- -
- - - - - \ No newline at end of file diff --git a/docs/labs/module4.html b/docs/labs/module4.html deleted file mode 100644 index 3354788..0000000 --- a/docs/labs/module4.html +++ /dev/null @@ -1,556 +0,0 @@ - - - - - - - - - -Big Data for Public Good - Module 4 Discussions - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
- -
- -
- - - - -
- -
-
-

Module 4 Discussions

-
- - - -
- - - - -
- - - -
- - - - - -
- -
- - - - - \ No newline at end of file diff --git a/docs/labs/module5.html b/docs/labs/module5.html deleted file mode 100644 index c55d8ad..0000000 --- a/docs/labs/module5.html +++ /dev/null @@ -1,556 +0,0 @@ - - - - - - - - - -Big Data for Public Good - Module 5 Discussions - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
- -
- -
- - - - -
- -
-
-

Module 5 Discussions

-
- - - -
- - - - -
- - - -
- - - - - -
- -
- - - - - \ No newline at end of file diff --git a/docs/labs/module6.html b/docs/labs/module6.html deleted file mode 100644 index b3a5a35..0000000 --- a/docs/labs/module6.html +++ /dev/null @@ -1,556 +0,0 @@ - - - - - - - - - -Big Data for Public Good - Module 6 Discussions - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
- -
- -
- - - - -
- -
-
-

Module 6 Discussions

-
- - - -
- - - - -
- - - -
- - - - - -
- -
- - - - - \ No newline at end of file diff --git a/docs/labs/overview.html b/docs/labs/overview.html deleted file mode 100644 index 45db89f..0000000 --- a/docs/labs/overview.html +++ /dev/null @@ -1,576 +0,0 @@ - - - - - - - - - -Big Data for Public Good - General Class Discussion - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
- -
- -
- - - - -
- -
-
-

General Class Discussion

-
- - - -
- - - - -
- - - -
- - -
-

Class Q & A

-

Please use this section for asking course related questions about assignments, reading, lectures etc. It is likely that if you have a question, others do too.

-
-
-

Class Introduction Activity

-

Find a digital image that represents you. Post it here with the following information:

-
    -
  • Name
  • -
  • Hometown
  • -
  • Degree Program -Describe why you’re taking this course and your research interests.
  • -
- - -
- -
- -
- - - - - \ No newline at end of file diff --git a/docs/modules/module-index.html b/docs/modules/module-index.html deleted file mode 100644 index 4d48784..0000000 --- a/docs/modules/module-index.html +++ /dev/null @@ -1,493 +0,0 @@ - - - - - - - - - -Big Data for Public Good - Module 1 - Introduction to Big Data - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
- - -
- -
- - -
- - - -
- -
-
-

Module 1 - Introduction to Big Data

-
- - - -
- - - - -
- - -
- -
-

1. 1 Module Overview

-

This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.

-
-
-

Assignments

- - - - - - - - - - - - - - - - - - - - - -
Section & AssignmentDue Date
1.2 Discussion
1.3 Data Certification Plan
1.4 Discussion
-
-
-

1. 2 The Value of Big Data

-

This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.

-
-
-

Transforming Information to Ideas

-

This week we’ll explore the characteristics of big data and its value to the public sector. 

-
-
-

Read

-
    -
  1. Desouza & Smith, Big Data for Social Innovation, Stanford Social Innovation Review, Summer 2014.

  2. -
  3. World Bank Group, Big Data in Action for Government, World Bank Governance Practice, 2017.

  4. -
  5. Meier, P., Digital Humanitarians, Chapter 1, CRC Press, 2015.

  6. -
  7. Pentland, A. Social Physics, Chapter 1, Penguin Group, 2015.

  8. -
-
-
-

Prepare

-

Please prepare to discuss your thoughts on the following prompts:

-
    -
  1. Describe two characteristics of big data and how they apply to data produced or are used by your field of interest.

  2. -
  3. Describe ideas or practices in this week’s readings that you had not considered before.

  4. -
  5. What claims about social physics do you find valid? In what ways would you challenge Pentland’s “promise” of social physics?

  6. -
-
-
-

Attend

-

In person discussion, 1/17 at 4:30-5:30pm

- - -
- -
- -
- - - - \ No newline at end of file diff --git a/docs/modules/module1-0.html b/docs/modules/module1-0.html index 51732f0..7855298 100644 --- a/docs/modules/module1-0.html +++ b/docs/modules/module1-0.html @@ -259,7 +259,7 @@

Module Overview

diff --git a/docs/modules/module1-1.html b/docs/modules/module1-1.html index 5082d03..f4e3240 100644 --- a/docs/modules/module1-1.html +++ b/docs/modules/module1-1.html @@ -259,7 +259,7 @@

1.1 The Value of Big Data

diff --git a/docs/modules/module1-2.html b/docs/modules/module1-2.html index b1baec3..eb038aa 100644 --- a/docs/modules/module1-2.html +++ b/docs/modules/module1-2.html @@ -259,7 +259,7 @@

1.2 Big Data and You

diff --git a/docs/modules/module1-3.html b/docs/modules/module1-3.html index 8e1476e..f3267ab 100644 --- a/docs/modules/module1-3.html +++ b/docs/modules/module1-3.html @@ -259,7 +259,7 @@

1.3 The Challenges of Big Data

diff --git a/docs/modules/module1.html b/docs/modules/module1.html deleted file mode 100644 index b1e88ba..0000000 --- a/docs/modules/module1.html +++ /dev/null @@ -1,654 +0,0 @@ - - - - - - - - - -Big Data for Public Good - Module 1 - Introduction to Big Data - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
- -
- -
- - - - -
- -
-
-

Module 1 - Introduction to Big Data

-
- - - -
- - - - -
- - - -
- - -
-

-
-
-

Module Overview

-

This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.

-
-
-

Content

- - - - - - - - - - - - - - - - - - - - - - - - - -
SectionAssignmentDue Date
1.1 The Value of Big DataDiscussion Post8/27
1.2 Big Data and YouData Certification Plan9/3
1.3 The Challenges of Big DataDiscussion Post & Peer Response9/10 & 9/14
- - -
- -
- -
- - - - - \ No newline at end of file diff --git a/docs/modules/module1/module1-1.html b/docs/modules/module1/module1-1.html deleted file mode 100644 index a7639e4..0000000 --- a/docs/modules/module1/module1-1.html +++ /dev/null @@ -1,769 +0,0 @@ - - - - - - - - - -Big Data for Public Good - Module 1 - Introduction to Big Data - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
- -
- -
- - - - -
- -
-
-

Module 1 - Introduction to Big Data

-
- - - -
- - - - -
- - - -
- - -
-

Module Overview

-

This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.

-
-
-

Content

- - - - - - - - - - - - - - - - - - - - - -
SectionsDue Date
1.1 Discussion1/17
1.2 Data Certification Plan1/24
1.3 Discussion1/31
-
-
-

Assignments

- - - - - - - - - - - - - - - - - - - - - -
AssignmentsDue Date
1.1 Discussion1/17
1.2 Data Certification Plan1/24
1.3 Discussion1/31
-
-
-

1. 2 The Value of Big Data

-

This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.

-
-
-

Transforming Information to Ideas

-

This week we’ll explore the characteristics of big data and its value to the public sector.

-
-
-

Read

-
    -
  1. Desouza & Smith, Big Data for Social Innovation, Stanford Social Innovation Review, Summer 2014.

  2. -
  3. World Bank Group, Big Data in Action for Government, World Bank Governance Practice, 2017.

  4. -
  5. Meier, P., Digital Humanitarians, Chapter 1, CRC Press, 2015.

  6. -
  7. Pentland, A. Social Physics, Chapter 1, Penguin Group, 2015.

  8. -
-
-
-

Prepare

-

Please prepare to discuss your thoughts on the following prompts:

-
    -
  1. Describe two characteristics of big data and how they apply to data produced or are used by your field of interest.

  2. -
  3. Describe ideas or practices in this week’s readings that you had not considered before.

  4. -
  5. What claims about social physics do you find valid? In what ways would you challenge Pentland’s “promise” of social physics?

  6. -
-
-
-

Attend

-

In person discussion, 1/17 at 4:30-5:30pm

-
-
-

1. 3 Big Data and You

-
-
-

- --- - - - - - -

Find Your Inner Data Geek

- --- - - - - - -
-

Now that you know the characteristics of big data and some of their applications to the public sector, it’s time to consider how they serve your field of interest. Whether you plan to be a data scientist or work with a team of them, you need to understand how big data are collected, analyzed, and communicated. This module starts with an overview of data science to learn the steps in the life cycle of a big data project. It ends with you developing a plan to earn a data certification that exposes you to the tools and techniques of stages in the data science life cycle that fit your skill needs.

-
-
-
-

Read

-
    -
  1. The Data Science Process: A Visual Guide to Standard Procedures in Data Science by Chanin Nantasenamat

  2. -
  3. Hilary: the most poisoned baby name in US history by Hilary Parker

  4. -
-
-
-

Watch

-
    -
  1. How I Would Learn Data Science (If I Had to Start Over) by Ken Jee
  2. -
-
-
-

Complete

-

This week you’ll spend most of your time creating a data certification plan. This assignment requires that you identify a research question and develop a plan to complete workshops over the semester that will expose you to the data science knowledge and skills needed to develop a big data project. By the end of the semester, you’ll submit a proposal that describes how you can answer the research question using big data and data science techniques.

-

You will turn in this plan as your “discussion” this week. But don’t wait until class day to start this assignment. Formulating a good research question takes time, and if you’re struggling with how to use big data in a project that interests you then you need to give me time to help you. Reach out by email to set up a time to talk or post questions to the Class Q & A discussion board. 

-
    -
  • Download the instructions for the data certification plan.

  • -
  • If you need some inspiration, check out this web blog. Or Google “big data and X (your field).” It’s also a good idea to look at some academic journals in your field to see how the techniques of data science are being applied. 

  • -
  • Write me an email or set up a time to talk if you need help.

  • -
  • Submit your data certification plan to the “1.3 Data Certification” assignment folder.

  • -
-

Due by: 11/24 at 11:59pm

-
-
-

1. 4 The Challenges of Big Data

- - -
- -
- -
- - - - - \ No newline at end of file diff --git a/docs/modules/module1/module1-2.html b/docs/modules/module1/module1-2.html deleted file mode 100644 index 911fe3e..0000000 --- a/docs/modules/module1/module1-2.html +++ /dev/null @@ -1,743 +0,0 @@ - - - - - - - - - -Big Data for Public Good - Module 1 - Introduction to Big Data - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
- -
- -
- - - - -
- -
-
-

Module 1 - Introduction to Big Data

-
- - - -
- - - - -
- - - -
- - -
-

1. 1 Module Overview

-

This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.

-
-
-

Assignments

- - - - - - - - - - - - - - - - - - - - - -
Section & AssignmentDue Date
1.2 Discussion1/17
1.3 Data Certification Plan1/24
1.4 Discussion1/31
-
-
-

1. 2 The Value of Big Data

-

This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.

-
-
-

Transforming Information to Ideas

-

This week we’ll explore the characteristics of big data and its value to the public sector. 

-
-
-

Read

-
    -
  1. Desouza & Smith, Big Data for Social Innovation, Stanford Social Innovation Review, Summer 2014.

  2. -
  3. World Bank Group, Big Data in Action for Government, World Bank Governance Practice, 2017.

  4. -
  5. Meier, P., Digital Humanitarians, Chapter 1, CRC Press, 2015.

  6. -
  7. Pentland, A. Social Physics, Chapter 1, Penguin Group, 2015.

  8. -
-
-
-

Prepare

-

Please prepare to discuss your thoughts on the following prompts:

-
    -
  1. Describe two characteristics of big data and how they apply to data produced or are used by your field of interest.

  2. -
  3. Describe ideas or practices in this week’s readings that you had not considered before.

  4. -
  5. What claims about social physics do you find valid? In what ways would you challenge Pentland’s “promise” of social physics?

  6. -
-
-
-

Attend

-

In person discussion, 1/17 at 4:30-5:30pm

-
-
-

1. 3 Big Data and You

-
-
-

- --- - - - - - -

Find Your Inner Data Geek

- --- - - - - - -
-

Now that you know the characteristics of big data and some of their applications to the public sector, it’s time to consider how they serve your field of interest. Whether you plan to be a data scientist or work with a team of them, you need to understand how big data are collected, analyzed, and communicated. This module starts with an overview of data science to learn the steps in the life cycle of a big data project. It ends with you developing a plan to earn a data certification that exposes you to the tools and techniques of stages in the data science life cycle that fit your skill needs.

-
-
-
-

Read

-
    -
  1. The Data Science Process: A Visual Guide to Standard Procedures in Data Science by Chanin Nantasenamat

  2. -
  3. Hilary: the most poisoned baby name in US history by Hilary Parker

  4. -
-
-
-

Watch

-
    -
  1. How I Would Learn Data Science (If I Had to Start Over) by Ken Jee
  2. -
-
-
-

Complete

-

This week you’ll spend most of your time creating a data certification plan. This assignment requires that you identify a research question and develop a plan to complete workshops over the semester that will expose you to the data science knowledge and skills needed to develop a big data project. By the end of the semester, you’ll submit a proposal that describes how you can answer the research question using big data and data science techniques.

-

You will turn in this plan as your “discussion” this week. But don’t wait until class day to start this assignment. Formulating a good research question takes time, and if you’re struggling with how to use big data in a project that interests you then you need to give me time to help you. Reach out by email to set up a time to talk or post questions to the Class Q & A discussion board. 

-
    -
  • Download the instructions for the data certification plan.

  • -
  • If you need some inspiration, check out this web blog. Or Google “big data and X (your field).” It’s also a good idea to look at some academic journals in your field to see how the techniques of data science are being applied. 

  • -
  • Write me an email or set up a time to talk if you need help.

  • -
  • Submit your data certification plan to the “1.3 Data Certification” assignment folder.

  • -
-

Due by: 11/24 at 11:59pm

-
-
-

1. 4 The Challenges of Big Data

- - -
- -
- -
- - - - - \ No newline at end of file diff --git a/docs/modules/module1/module1-3.html b/docs/modules/module1/module1-3.html deleted file mode 100644 index 911fe3e..0000000 --- a/docs/modules/module1/module1-3.html +++ /dev/null @@ -1,743 +0,0 @@ - - - - - - - - - -Big Data for Public Good - Module 1 - Introduction to Big Data - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
- -
- -
- - - - -
- -
-
-

Module 1 - Introduction to Big Data

-
- - - -
- - - - -
- - - -
- - -
-

1. 1 Module Overview

-

This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.

-
-
-

Assignments

- - - - - - - - - - - - - - - - - - - - - -
Section & AssignmentDue Date
1.2 Discussion1/17
1.3 Data Certification Plan1/24
1.4 Discussion1/31
-
-
-

1. 2 The Value of Big Data

-

This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.

-
-
-

Transforming Information to Ideas

-

This week we’ll explore the characteristics of big data and its value to the public sector. 

-
-
-

Read

-
    -
  1. Desouza & Smith, Big Data for Social Innovation, Stanford Social Innovation Review, Summer 2014.

  2. -
  3. World Bank Group, Big Data in Action for Government, World Bank Governance Practice, 2017.

  4. -
  5. Meier, P., Digital Humanitarians, Chapter 1, CRC Press, 2015.

  6. -
  7. Pentland, A. Social Physics, Chapter 1, Penguin Group, 2015.

  8. -
-
-
-

Prepare

-

Please prepare to discuss your thoughts on the following prompts:

-
    -
  1. Describe two characteristics of big data and how they apply to data produced or are used by your field of interest.

  2. -
  3. Describe ideas or practices in this week’s readings that you had not considered before.

  4. -
  5. What claims about social physics do you find valid? In what ways would you challenge Pentland’s “promise” of social physics?

  6. -
-
-
-

Attend

-

In person discussion, 1/17 at 4:30-5:30pm

-
-
-

1. 3 Big Data and You

-
-
-

- --- - - - - - -

Find Your Inner Data Geek

- --- - - - - - -
-

Now that you know the characteristics of big data and some of their applications to the public sector, it’s time to consider how they serve your field of interest. Whether you plan to be a data scientist or work with a team of them, you need to understand how big data are collected, analyzed, and communicated. This module starts with an overview of data science to learn the steps in the life cycle of a big data project. It ends with you developing a plan to earn a data certification that exposes you to the tools and techniques of stages in the data science life cycle that fit your skill needs.

-
-
-
-

Read

-
    -
  1. The Data Science Process: A Visual Guide to Standard Procedures in Data Science by Chanin Nantasenamat

  2. -
  3. Hilary: the most poisoned baby name in US history by Hilary Parker

  4. -
-
-
-

Watch

-
    -
  1. How I Would Learn Data Science (If I Had to Start Over) by Ken Jee
  2. -
-
-
-

Complete

-

This week you’ll spend most of your time creating a data certification plan. This assignment requires that you identify a research question and develop a plan to complete workshops over the semester that will expose you to the data science knowledge and skills needed to develop a big data project. By the end of the semester, you’ll submit a proposal that describes how you can answer the research question using big data and data science techniques.

-

You will turn in this plan as your “discussion” this week. But don’t wait until class day to start this assignment. Formulating a good research question takes time, and if you’re struggling with how to use big data in a project that interests you then you need to give me time to help you. Reach out by email to set up a time to talk or post questions to the Class Q & A discussion board. 

-
    -
  • Download the instructions for the data certification plan.

  • -
  • If you need some inspiration, check out this web blog. Or Google “big data and X (your field).” It’s also a good idea to look at some academic journals in your field to see how the techniques of data science are being applied. 

  • -
  • Write me an email or set up a time to talk if you need help.

  • -
  • Submit your data certification plan to the “1.3 Data Certification” assignment folder.

  • -
-

Due by: 11/24 at 11:59pm

-
-
-

1. 4 The Challenges of Big Data

- - -
- -
- -
- - - - - \ No newline at end of file diff --git a/docs/modules/module1/module1.html b/docs/modules/module1/module1.html deleted file mode 100644 index 2cdc235..0000000 --- a/docs/modules/module1/module1.html +++ /dev/null @@ -1,654 +0,0 @@ - - - - - - - - - -Big Data for Public Good - Module 1 - Introduction to Big Data - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Module 1 - Introduction to Big Data

Module Overview

This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.

Content

| Section | Assignment | Due Date |
|---|---|---|
| 1.1 The Value of Big Data | Discussion Post | 8/27 |
| 1.2 Big Data and You | Data Certification Plan | 9/3 |
| 1.3 The Challenges of Big Data | Discussion Post & Peer Response | 9/10 & 9/14 |
\ No newline at end of file
diff --git a/docs/modules/module2-1.html b/docs/modules/module2-1.html
index e0f54b1..b2402a9 100644
--- a/docs/modules/module2-1.html
+++ b/docs/modules/module2-1.html
@@ -259,7 +259,7 @@

2.1 Social Data

diff --git a/docs/modules/module2.html b/docs/modules/module2.html
deleted file mode 100644
index 8804e34..0000000
--- a/docs/modules/module2.html
+++ /dev/null
@@ -1,610 +0,0 @@
-Big Data for Public Good - Module 2 - Types of Big Data

Module 2 - Types of Big Data

\ No newline at end of file
diff --git a/docs/modules/module3.html b/docs/modules/module3.html
deleted file mode 100644
index c0bf459..0000000
--- a/docs/modules/module3.html
+++ /dev/null
@@ -1,436 +0,0 @@
-Big Data for Public Good - Module 3 - Discovery & Insights

Module 3 - Discovery & Insights

\ No newline at end of file
diff --git a/docs/modules/module4.html b/docs/modules/module4.html
deleted file mode 100644
index 7ffdd5c..0000000
--- a/docs/modules/module4.html
+++ /dev/null
@@ -1,436 +0,0 @@
-Big Data for Public Good - Module 4 - Bias in Big Data

Module 4 - Bias in Big Data

\ No newline at end of file
diff --git a/docs/modules/module5-0.html b/docs/modules/module5-0.html
index ce92ea6..ecaacca 100644
--- a/docs/modules/module5-0.html
+++ b/docs/modules/module5-0.html
@@ -333,7 +333,14 @@

Module 5 - Data Privacy & Stewardship
@@ -357,8 +364,41 @@

Module 5 - Data Privacy & Stewardship

Introduction

Your personal data is everywhere and is being used by private and public entities for purposes you may not like. This module explores issues of data privacy and what options we have for protecting it. It also describes the responsibility that public entities have to collect, manage, and secure it.

Content

| Section | Assignment | Due Date |
|---|---|---|
| 5.1 Data Privacy | Discussion Post | 11/5 |
| 5.1 Data Privacy | Discussion Peer Response | 11/9 |
| 5.2 Data Stewardship | Discussion Post | 11/16 |

Module 5 - Data Privacy & Stewardship

\ No newline at end of file
diff --git a/docs/modules/module6-0.html b/docs/modules/module6-0.html
index 717e5b4..01e77e8 100644
--- a/docs/modules/module6-0.html
+++ b/docs/modules/module6-0.html
@@ -7,7 +7,7 @@
-Big Data for Public Good - Module 6 - Course Conclusion
+Big Data for Public Good - Course Completion Checklist

Module 6 - Course Conclusion

\ No newline at end of file
diff --git a/docs/pictures/Picture16.gif b/docs/pictures/Picture16.gif
deleted file mode 100644
index 856043d..0000000
Binary files a/docs/pictures/Picture16.gif and /dev/null differ
diff --git a/docs/resources/data-certification-plan.html b/docs/resources/data-certification-plan.html
deleted file mode 100644
index 1391b19..0000000
--- a/docs/resources/data-certification-plan.html
+++ /dev/null
@@ -1,666 +0,0 @@
-Big Data for Public Good - Data Certification Plan

Data Certification Plan

This assignment requires you to identify at least five workshops or other modules on data techniques that support your development as a data user. It’s intended to align with the big data project proposal that you’ll develop during the semester.

This plan starts by asking you to identify a research question that you can answer using the data skills that you’ll learn in the certification. Since it’s early in the semester, what you propose to do in this plan may change as you learn more about big data and its applications to your interests. However, I’m requiring that you submit this plan now so that I can learn what topics you’re interested in and how the data certification will enhance your data skills.

GSU provides multiple opportunities for learning data skills and tools. Some examples include the library’s Research Data Services workshops, LinkedIn Learning, and O’Reilly for Higher Education. You may also use open access courses via Coursera, edX, Kaggle, etc., should they meet your needs. Each of the five workshops/modules that you complete must be:

  • at least 90 minutes long

  • certifiable, with documentation of completion dates during this semester

  • unique and progressive, introducing new skills not covered in another workshop in the certification and advancing your skills in a domain

Ideally, the series of workshops that you pursue offers a badge or certificate that you can link to your resume or other digital profile to showcase your professional qualifications (e.g., a personal website, LinkedIn profile, or Handshake profile). If you’re curating workshops/courses across multiple sources or platforms, you likely won’t be able to earn a badge or certificate. Be aware that some platforms offer free access to courses but charge a fee to earn a badge/certificate.

You may not use trainings or modules that you complete for other courses that you’re taking this semester. You are earning unique credit for this course, so the work you conduct for it must be different from what you may be learning in other classes. This exercise is intended to give you time (and credit) to pursue data knowledge or skills gaps that aren’t covered in other courses. It’s an opportunity to invest in data skills that you think will be important to your future work.

Because data workshops and free data courses vary so widely, it’s impossible for me to know in advance which ones will interest you or how they’re formatted. If you have a question about whether a workshop or course is suitable for this assignment, please contact me well before the deadline for this plan.

After I review and approve your data certification plan, you will be required to submit a short report after each workshop to identify what you learned, how you’ll use the new skills in your project proposal (or elsewhere), and whether you recommend the workshop to other students. You’ll also submit an example of work that you generated in the training.

Instructions

Complete each of the sections below to construct your plan to earn a data certification this semester. After my review, you may need to modify the plan. Be prepared to resubmit it for final approval before starting.

I. Ask an Interesting Question

Describe the question that you hope to answer with big data. Why does it interest you and/or why is it important?

Most research starts with a large question and narrows to a very specific one that can be answered once you know more about what data is available and how it can be analyzed. If you haven’t yet decided which question you’ll propose to answer, describe several related questions that cluster around the same topic for this assignment.

For example, I’m interested in understanding the relationship between needle aversion and COVID vaccination rates. I think that the language and imagery used to encourage vaccination is having the opposite effect. Nearly every advertisement or news segment about COVID vaccination shows a needle going in an arm (up close!) and uses language like “get the shot” or “jab,” which evokes negative feelings in most people (much like going to the dentist or getting booster shots as a child). For those who also fear that the vaccine is risky, these images and words reduce the likelihood that they’ll get vaccinated. I hypothesize that the ubiquity and volume of this imagery on TV and social media (big data) is reinforcing needle aversion to the extent that it’s driving down vaccination rates. So, my research question is “Has intense exposure to imagery and language of needles decreased COVID vaccination rates in the United States?” If a relationship exists, removing needle imagery and language from mass media could improve vaccination compliance and lower the threat of COVID to public health.

II. Collect Data

Describe the big data that’s available to answer your research question, its source, and its format. Describe how it meets the characteristics of big data, and how you plan to access it.

III. Data Workshops

List the workshops you plan to complete by the end of the semester to build your data knowledge and skills to develop a big data project. For each workshop, describe:

  • Title and purpose

  • Length/duration

  • Organization providing the training

  • Link to the training

  • Date(s) you plan to attend/do the training

  • Where it fits in the data science cycle

  • How it will enhance your data knowledge and/or skills

IV. Certification

Describe how you will certify completion of each workshop. (Make sure that the badge(s) or certification(s) you may earn will be available to post to iCollege by December 14.)

\ No newline at end of file
diff --git a/docs/search.json b/docs/search.json
index 0662596..1787c3d 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -1,1458 +1,213 @@
[ { - "objectID": "index.html", - "href": "index.html", - "title": "Welcome to Big Data for Public Good!", - "section": "", - "text": "Overview\nPublic and nonprofit agencies are beginning to unlock the potential of large-scale data to improve service delivery and inform policy. Computational tools capable of making productive use of big data have proliferated in recent years, drastically decreasing the barriers to entry for these applications. This course explores the practice of using data to improve decision-making and evaluation, including techniques for data collection, analysis, and behavior change. You will learn about the opportunities and challenges of big data and devise a plan to apply them to areas of personal interest within the public sector.\n\n\n\nCourse Goal & Objectives\nBy the end of this course you should be able to:\n\nDefine big data and describe its characteristics;\nDiscuss how public agencies harness large-scale data to inform policy design, increase stakeholder engagement, and improve service delivery.\nRecognize situations when it’s possible to collect data to inform evidence-based policy decision-making;\nIntelligently consider the social, political, and ethical considerations of using big data and its analytical techniques for public uses.\n\n\n\nHelpful Notes from Your Professor\n\nComplete all readings and watch the videos. They complement each other, so both are critical to understanding course materials\nDon’t hesitate to contact me with questions or problems at any point during the semester. I will always try to respond to e-mails quickly.\nThe topics in this class are in the news all the time. Don’t hesitate to send me or post articles or videos related to course material.\n\n\n\nRequired Book\nNo books are required for this course.
All learning materials are available through the course site.\n\n\nContact Information\nProfessor A Phone: (000) 123-4567 Email: Please contact Dr. Cynthia Searcy at csearcy@gsu.edu for questions regarding this open course Office hour: by appointment" - }, { - "objectID": "about.html", - "href": "about.html", - "title": "About the Big Data for Public Good Open Course", - "section": "", - "text": "About the course\nThis “Big Data for Public Good” course website serves as a public template for higher education instructors in the fields of public administration or public policy teaching about big data fundamentals.\n\n\nInstruction on how to use the course materials\nPlease properly cite this site when using or adapting the materials.\n\n\nContact\nPlease contact Dr. Cynthia Searcy at csearcy@gsu.edu for questions regarding this open course" - }, { - "objectID": "modules/module1.html", - "href": "modules/module1.html", - "title": "Module 1 - Introduction to Big Data", - "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.", - "crumbs": [ - "Module 1 - Introduction to Big Data" - ] - }, { - "objectID": "assignments/lab1.html", - "href": "assignments/lab1.html", - "title": "Lab 1", - "section": "", - "text": "This lab introduces you to two data-driven models of neighborhood change. We will use this case study over the semester to discuss things like data needs for predictive models. You will be required to think critically about the data used in the labs, but you will not be responsible for things like the advanced analytical models in the paper.
I am approaching the labs with the assumption that you are likely to be new analysts or a manager hiring an analyst, so you just need a high-level understanding of the models in order to participate in the task.\nNeighborhood change is a complicated concept with a lot of loaded terminology. We might think about neighborhoods that are “revitalized”, “gentrified”, that are “stable”, or that “decline”. We could spend an entire semester unpacking all of these constructs, but that is out of scope of the lab. Here we are more interested in how we might make sense of our data, and then once we have meaningful groups how we might use them to make predictions with the data. Can a city forecast how its current neighborhoods are likely to change over the next decade, and can that help with urban planning processes?\nRead the following articles:\n\nGoldstein, I. Market Value Analysis: A Data-Based Approach to Understanding Urban Housing Markets in Board of Governors, Federal Reserve Systems, Putting Data to Work: Data-Driven Approaches to Strengthening Neighborhoods. 2011. pp 49-59\nDelmelle, E. C. (2017). Differentiating pathways of neighborhood change in 50 US metropolitan areas. Environment and planning A, 49(10), 2402-2424.\n\nWe are interested in understanding neighborhood change. These data-driven approaches to the phenomenon use machine-learning algorithms to “discover” coherent communities within the city by grouping census tracts into groups that minimize within-group differences and maximize between-group differences.\nYou can explore one of these algorithms by looking at examples of how botanists might create “species” based upon characteristics of flowers:\nClustering Example\nA data-driven approach to understanding neighborhood change requires us to (1) define “neighborhoods”, or groups of census tracts in the data that are very similar, and (2) use those group characteristics at a point in time to predict how the “neighborhood” might change in the future.
Both of the papers present variations on Step (1) above.\nRead the two papers linked on iCollege, then answer the following questions:\n\nHow did each author identify coherent “neighborhoods” (or groups) in each model?\nWould these “neighborhoods” line up with neighborhoods that are defined on a city’s zoning maps?\nDid the two models use the same data to create the groups?\nHow do the labels and descriptions of the groups differ in each model and why?\n\nWrite your responses in a Word document and name your file LAB-01-YOUR-LAST-NAME, then submit it via the iCollege assignment folder. Concise and precise answers are preferred to meandering paragraphs! One page would be fine for this assignment.", - "crumbs": [ - "Lab 1" - ] - }, { - "objectID": "assignments/lab2.html", - "href": "assignments/lab2.html", - "title": "Lab 2", - "section": "", - "text": "Instructions\nIn the first lab we examined two articles (Goldstein, 2019; Delmelle, 2017) where machine learning models were used to identify neighborhood “types” in cities.\nThe authors explained how each type tended to change in different ways over time, so understanding what category or stage a neighborhood belongs to helps the city understand what types of change might be imminent.\nIf you read about the methodologies in both articles, you will see that the advanced machine learning methods they apply are not doing anything more complicated than what many indices developed for well-being or consumer prices do – combine different variables and assign probabilities to each to measure an underlying or latent social construct. The main advantage that a computer has in this process is the ability to try every combination of variables to find the optimal way(s) to group them to create distinct dimensions of the construct you want to measure.
Once the computers identify the stable groups, it is up to the human to come up with meaningful titles for each and determine if the groups actually tell us anything useful about the world (just like we would like to know if happiness scores tell us something useful).\nUnderstanding the process of how indices are developed is a stepping stone to using predictive analytics to prescribe outcomes to social or organizational phenomena. The hardest part is identifying the right types of data to collect and ensuring they are high quality. You can always hire data scientists to build the models once your organization has begun collecting meaningful data.\nIn the previous lab you read about a set of variables used in a model, so you did not have to think very hard about where that data came from and why they were selected. This lab challenges you to break open the black box and think about the process of identifying data to collect for a project. How do you know which variables are useful? How do you know what data is needed to predict success or some other outcome?\nThe short answer is you don’t, not if you are starting a new project. This lab is about “feature selection” - the process of identifying the data you will need for your project.\nStart by listening to this story about the birth and evolution of a large-scale data-driven social program. Pay particular attention to how the researchers figured out what data they needed for the program.\n“Data is like vegetables. It needs to be fresh, and it needs to be local.”\nHow Iceland Saved Its Teens (23:30)\nHow long did it take Iceland to develop its survey for youth? Did they know what they were looking for when they started?\nFeature Selection\nIf we want to use predictive analytics for a problem, we need to identify the data that is best suited for predicting the outcome.
At the beginning of most projects, however, we rarely know which factors will be the biggest drivers of outcomes.\nFor example, which school characteristics best predict student performance? Is it the facilities and technology? The level of funding? Classroom sizes? Training and support provided to teachers? Parent involvement? Peer networks? All of these are plausible drivers of student performance – the most important factors are rarely self-evident in advance of having data to test them all.\n“Feature selection” is data science speak for generating a set of hypotheses and measures about what generates the outcome of interest. In many cases, feature selection is an iterative process of generating hypotheses then determining how to find or collect data to test them.\nFeature selection requires critical thinking and creativity more than technical expertise, but is a core component of any successful data science project.\nLab Takeaway: Most data architects and engineers will not be domain or subject matter experts, so they are not always good at identifying useful features. The best approach is often to assemble people close to the problem, brainstorm a large list of features, collect test data, and see what is working before you encode your data collection process for a large project or organizational need. More learning occurs during this phase of the project than any other.\nPART 1: Predicting Divorce in 3 Minutes\nThe following excerpt from Malcolm Gladwell’s book Blink describes the work of John Gottman, one of the world’s foremost experts on predictors of divorce in marriage. Unlike many scholars that study marriage and divorce in the academic field of counseling, Gottman did not approach the problem using traditional psychological theories and counseling tools. 
Instead he brought a data-driven approach to the subject and meticulously developed models to predict whether relationships would last or would end in divorce.\nSince the 1980s, Gottman has brought more than three thousand married couples—just like Bill and Sue— into that small room in his “love lab” near the University of Washington campus. Each couple has been videotaped, and the results have been analyzed according to something Gottman dubbed SPAFF (for specific affect), a coding system that has twenty separate categories corresponding to every conceivable emotion that a married couple might express during a conversation. Disgust, for example, is 1, contempt is 2, anger is 7, defensiveness is 10, whining is 11, sadness is 12, stonewalling is 13, neutral is 14, and so on.\nGottman has taught his staff how to read every emotional nuance in people’s facial expressions and how to interpret seemingly ambiguous bits of dialogue. When they watch a marriage videotape, they assign a SPAFF code to every second of the couple’s interaction, so that a fifteen-minute conflict discussion ends up being translated into a row of eighteen hundred numbers—nine hundred for the husband and nine hundred for the wife. The notation “7, 7, 14, 10, 11, 11,” for instance, means that in one six-second stretch, one member of the couple was briefly angry, then neutral, had a moment of defensiveness, and then began whining. Then the data from the electrodes and sensors is factored in, so that the coders know, for example, when the husband’s or the wife’s heart was pounding or when his or her temperature was rising or when either of them was jiggling in his or her seat, and all of that information is fed into a complex equation.\nOn the basis of those calculations, Gottman has proven something remarkable. If he analyzes an hour of a husband and wife talking, he can predict with 95 percent accuracy whether that couple will still be married fifteen years later. 
If he watches a couple for fifteen minutes, his success rate is around 90 percent.\nRecently, a professor who works with Gottman named Sybil Carrère, who was playing around with some of the videotapes, trying to design a new study, discovered that if they looked at only three minutes of a couple talking, they could still predict with fairly impressive accuracy who was going to get divorced and who was going to make it. The truth of a marriage can be understood in a much shorter time than anyone ever imagined. Gladwell, M. (2006)\nSome key findings from Gottman’s years of research can be summed up by his description of the “Four Horsemen of the Apocalypse” - the signs that a relationship is in danger. In this interview he describes the signs, gives examples, and demonstrates them using video clips of couples: https://youtu.be/625t8Rr9o6o\nRecall he began with a code book of twenty distinctive emotions that can be conveyed during a conversation. How difficult would it be for you to accurately differentiate between all 20 emotions using only the video clip? How did Gottman develop this system?\nA math major at MIT before he switched to psychology, Gottman developed a coding system that not only tracked the content of speech but the emotional messages that spouses send with minute changes in expressions, vocal tone, and body language. Using facial recognition systems, Gottman’s code accounts for the fact that, for instance, in “coy, playful, or flirtatious interactions,” the lips are often turned down. “It looks like the person is working hard not to smile,” he writes.
Conversely, “many ‘smiles’ involve upturned corners of the mouth but are often indices of negative affect.” [ Dissecting Gottman’s Love Lab: Slate Magazine ]\nNote that in this case, “using facial recognition systems” does not refer to computer algorithms but rather to training grad students to watch hours and hours of interviews!\nGottman’s contribution was figuring out how to systematize data collection about marital conflict. He may have ended up with 20 emotional constructs that were coded in all of the studies, but he no doubt started with dozens more ideas that were intractable to operationalize or not predictive of the outcomes. The list was eventually narrowed to 20, and time was spent on improving the coding protocols so data could be collected consistently.\nTo use the language of data science, Gottman went through a process of feature selection - identifying a set of meaningful variables that have the potential to predict the outcome of interest and looking for which are most useful. Loosely speaking, the more highly correlated a feature (variable) is with the outcome, the better a predictor it will be.\nPart 1 Questions:\n\nGottman’s lab records 15-minute videos of each couple, which sounds like a small amount of data relative to some of the other case studies we have examined this semester. How many data points are generated from those 15 minutes of footage, though? Stated differently, how many observations do the lab scientists record in each 15-minute interview?\nWhat is the measured outcome in the study described by Gladwell? How would that data be collected? And consequently how long did these studies take? Note that in machine learning jargon this would be called the “training dataset” since it includes outcomes that are used to teach computer models which features are useful to accurately predict the outcome.
The calibrated models can then be used to predict outcomes using data that does not include results.\nDo you think that a marriage counselor working with couples for 30 years would be able to accurately predict those that will get divorced after a 15-minute session 95 percent of the time, relying on intuition from practice alone? What was unique about Gottman’s approach that allowed him to achieve that kind of accuracy?\n\nPart 2: Predicting Home Values\nThe hard part of feature selection is that it’s always fairly easy to generate a large list of candidate variables, and often the only way to know which actually work is to test them all. It is typically hard to predict which variables might be more predictive before collecting data and testing them out.\nConsider the project to reduce harmful levels of binge drinking by youth in Iceland. To identify some key causes of binge drinking they developed literally hundreds of theories, and tested as many as they could. Some explanations were still unexpected:\nThe team has analyzed 99,000 questionnaires from places as far afield as the Faroe Islands, Malta and Romania—as well as South Korea and, very recently, Nairobi and Guinea-Bissau. Broadly, the results show that when it comes to teen substance use, the same protective and risk factors identified in Iceland apply everywhere. There are some differences: in one location (in a country “on the Baltic Sea”), participation in organized sport actually emerged as a risk factor. Further investigation revealed that this was because young ex-military men who were keen on muscle-building drugs, drinking and smoking were running the clubs. Here, then, was a well-defined, immediate, local problem that could be addressed.\nData scientists have grown very skilled at using data to predict home values before houses are listed for sale. 
Zillow’s median national error rate is under 4%, for example, meaning that more than half of their predictions about home values are within 4% of true selling prices. They are becoming so accurate that Zillow is experimenting with a new service of buying homes based upon their estimates and re-selling them on their platform without realtors ever being involved in order to bypass the painful process of spending 6 months in a house that is for sale.\nHow does Zillow do this? Which variables or features are the best predictors of home values? The variables (“features”) that Zillow uses in their model are reported below. Can you guess the three factors that are most predictive of home value just by reading the list?\nFeature Description\nLabel Description\n\n\n\n\n\n\n\nLabel\nDescription\n\n\n\n\nairconditioningtypeid\nType of cooling system present in the home (if any)\n\n\narchitecturalstyletypeid\nArchitectural style of the home (i.e. ranch, colonial, split-level, etc…)\n\n\nbasementsqft\nFinished living area below or partially below ground level\n\n\nbathroomcnt\nNumber of bathrooms in home including fractional bathrooms\n\n\nbedroomcnt\nNumber of bedrooms in home\n\n\nbuildingqualitytypeid\nOverall assessment of condition of the building from best (lowest) to worst (highest)\n\n\nbuildingclasstypeid\nThe building framing type (steel frame, wood frame, concrete/brick)\n\n\ncalculatedbathnb\nNumber of bathrooms in home including fractional bathroom\n\n\ndecktypeid\nType of deck (if any) present on parcel\n\n\nthreequarterbathnbr\nNumber of 3/4 bathrooms in house (shower + sink + toilet)\n\n\nfinishedfloor1squarefeet\nSize of the finished living area on the first (entry) floor of the home\n\n\ncalculatedfinishedsquarefeet\nCalculated total finished living area of the home\n\n\nfinishedsquarefeet6\nBase unfinished and finished area\n\n\nfinishedsquarefeet12\nFinished living area\n\n\nfinishedsquarefeet13\nPerimeter living area\n\n\nfinishedsquarefeet15\nTotal 
area\n\n\nfinishedsquarefeet50\nSize of the finished living area on the first (entry) floor of the home\n\n\nfips\nFederal Information Processing Standard code - see https://en.wikipedia.org/wiki/FIPS_county_code for more details\n\n\nfireplacecnt\nNumber of fireplaces in a home (if any)\n\n\nfireplaceflag\nIs a fireplace present in this home\n\n\nfullbathcnt\nNumber of full bathrooms (sink, shower + bathtub, and toilet) present in home\n\n\ngaragecarcnt\nTotal number of garages on the lot including an attached garage\n\n\ngaragetotalsqft\nTotal number of sqft of all garages on lot including an attached garage\n\n\nhashottuborspa\nDoes the home have a hot tub or spa\n\n\nheatingorsystemtypeid\nType of home heating system\n\n\nlatitude\nLatitude of the middle of the parcel multiplied by 10e6\n\n\nlongitude\nLongitude of the middle of the parcel multiplied by 10e6\n\n\nlotsizesquarefeet\nArea of the lot in square feet\n\n\nnumberofstories\nNumber of stories or levels the home has\n\n\nparcelid\nUnique identifier for parcels (lots)\n\n\npoolcnt\nNumber of pools on the lot (if any)\n\n\npoolsizesum\nTotal square footage of all pools on property\n\n\npooltypeid10\nSpa or Hot Tub\n\n\npooltypeid2\nPool with Spa/Hot Tub\n\n\npooltypeid7\nPool without hot tub\n\n\npropertycountylandusecode\nCounty land use code i.e.
its zoning at the county level\n\n\npropertylandusetypeid\nType of land use the property is zoned for\n\n\npropertyzoningdesc\nDescription of the allowed land uses (zoning) for that property\n\n\nrawcensustractandblock\nCensus tract and block ID combined - also contains blockgroup assignment by extension\n\n\ncensustractandblock\nCensus tract and block ID combined - also contains blockgroup assignment by extension\n\n\nregionidcounty\nCounty in which the property is located\n\n\nregionidcity\nCity in which the property is located (if any)\n\n\nregionidzip\nZip code in which the property is located\n\n\nregionidneighborhood\nNeighborhood in which the property is located\n\n\nroomcnt\nTotal number of rooms in the principal residence\n\n\nstorytypeid\nType of floors in a multi-story house (i.e. basement and main level, split-level, attic, etc.). See tab for details.\n\n\ntypeconstructiontypeid\nWhat type of construction material was used to construct the home\n\n\nunitcnt\nNumber of units of structure (i.e. 2 = duplex, 3 = triplex, etc…)\n\n\nyardbuildingsqft17\nPatio in yard\n\n\nyardbuildingsqft26\nStorage shed/building in yard\n\n\nyearbuilt\nThe year the principal residence was built\n\n\ntaxvaluedollarcnt\nThe total tax assessed value of the parcel\n\n\nstructuretaxvaluedollarcnt\nThe assessed value of the built structure on the parcel\n\n\nlandtaxvaluedollarcnt\nThe assessed value of the land area of the parcel\n\n\ntaxamount\nThe total property tax assessed for that assessment year\n\n\nassessmentyear\nThe year of the property tax assessment\n\n\ntaxdelinquencyflag\nProperty taxes for this parcel are past due as of 2015\n\n\ntaxdelinquencyyear\nYear for which the unpaid property taxes were due\n\n\n\nZillow tracks an impressive amount of data on homes, but this dataset is far from exhaustive. 
If we wanted to improve their models by adding new data, which features of homes and neighborhoods would you propose?\nPart 2 Questions:\nIn order to demonstrate what a feature selection exercise might look like, part 2 of this lab asks you to come up with one variable that predicts home value that is not included in the Zillow dataset.\nThe obvious features are already in the data - square footage, number of bedrooms, whether there is a garage and a pool, etc. You need to be a little creative to come up with another feature.\nNote that features in this case might be characteristics of houses themselves, but they also might be characteristics of neighborhoods or cities. These broader characteristics of the community, positive or negative, are baked into the selling price of the home.\nYou only need to think up one other variable that impacts home values. There are hundreds more. In order to verify whether your hunch was correct, you need to identify an academic article or peer-reviewed report that supports your claim. I suggest using Google Scholar, but many search engines work fine.\nFor example, I might hypothesize that high-end restaurants or coffee shops increase the value of homes. After a little searching I found this study: Measuring Gentrification: Using Yelp Data to Quantify Neighborhood Change. It finds that “changes in the local business landscape is a leading indicator of housing price changes. Each additional Starbucks that enters a zip code is associated with a 0.5% increase in housing prices.”\nAfter identifying the feature and an academic source to support its selection, write a paragraph about your predictor and how you think it will impact home values. Include a citation to the article that supports your claim. 
Combine your response with your answers to part 1 and submit them to iCollege to the Lab2 assignment folder by the due date.", - "crumbs": [ - "Lab 2" - ] - }, - { - "objectID": "assignments/lab3.html", - "href": "assignments/lab3.html", - "title": "Lab 3", - "section": "", -<<<<<<< Updated upstream - "text": "Add Lab 3 assignments here.", -======= - "text": "Overview\nIn the previous lab we examined the practice of feature selection - identifying the proper set of variables to use as inputs into a predictive model. Gottman developed a framework using 20 micro-expressions coded at one-second intervals. Iceland developed a set of variables that help predict whether teens are likely to abuse alcohol.\nA key pattern that emerges from the case studies and a take-away from Lab 2 is that identifying the proper data to include in a model is difficult. Typically, experts start with a large number of measures and eliminate variables as they are tested. It requires patience and attention to detail to identify the right set of variables for a given problem.\nThis week we will explore the process of “feature engineering,” the task of starting from raw data sources and “engineering” new variables by finding ways to code or extract new features. Feature engineering is common when using data like satellite imagery, document archives, or video. It is common that the data was not designed for the task at hand, but it contains rich information that is extremely valuable if it can be extracted.\nFeature engineering is also necessary because predictive models typically require quantitative variables, so data sources like images and text need to be transformed into counts and scales. “Engineering” can be simple or complex. A simple example is computing a new variable, BMI, from two existing ones (height and weight). More complex examples of what this process might look like for unstructured data are below.\nFor example, how can a computer read handwriting? 
It has to be able to translate easily from an image of handwritten text to some sort of mathematical abstraction of the sentences.\n\n\n\n\n\nIt accomplishes this by isolating words, then isolating individual characters, then using a pixel grid to code which features exist on each character (vertical lines, curved lines, cross-bars, etc.), and finally predicting which letter the character is based upon those features.\n\n\n\n\n\n(Note: the elements above are concepts from typography that describe fonts, not a list of features from natural language processing, but you get the idea.)\nWithout a set of features extracted from the raw image of a character, the computer can’t predict which letter it might be. In this way Lab 4 on Feature Engineering combines elements of Lab 2 on Measurement and Lab 3 on Feature Selection. You “engineer” or “extract” a feature by first defining it (does the character possess a “bowl”?) and then figuring out how the computer will observe or measure that feature.\nThe goal of this lab is to demonstrate a couple of feature engineering processes from common machine learning and artificial intelligence applications. I would like you to develop an intuitive sense of how engineers approach the creative endeavor of turning raw data into meaningful quantitative measures. Some steps below show how raw data might be processed to make it interpretable (to code characters from text, you first need to isolate individual characters). Other steps show how raw data is translated into a quantitative variable that can be used as an input into a model.\nAs you read through the lab, think about the readings from Social Physics. How did computer scientists approach the task of understanding team performance in organizations? What sorts of data did they collect, and how did they generate quantitative measures from the raw data inputs of environmental sensors and employee smart badges? 
There are thousands of potential variables you could generate from human interactions in organizations, so how did they decide which measures were important? These themes will be revisited when we look at how Google tried to design the perfect team.\nWe start here with some basic examples of feature engineering in a few task domains.\nOptical Character Recognition\nThe process by which computer programs read handwriting or scan images of text is quite interesting because of how raw image files are converted into structured data. The process is broadly called Optical Character Recognition (OCR). short video\nFor these tasks the program must first take an image and identify the different units of text:\n\n\n\n\n\nThese can then be broken into their most basic component parts: characters. The predictive models inside text-reading programs operate on a letter-by-letter basis, and the paragraphs are reconstructed after all letters are identified:\n\n\n\n\n\nFor these programs to work, the computer must be able to effectively isolate each character:\n\n\n\n\n\nImages are a challenging data source because camera resolution, light sources, distance from the subject, and focus can all impact data quality. As a result, the first step in many applications that require computers to extract features from images is to process the image in a way that isolates the important information and standardizes some of the inputs.\nConsider a basic program that lets you take a picture of a graph and then generates the underlying data for you. It does this by identifying the trend line, converting it to a pixel grid, and then measuring, for each horizontal pixel, the vertical location of the trend line.\nInput image data can be messy, though. 
So first we need to standardize it to isolate the trend line.\nRaw image of a graph:\n\n\n\n\n\nStep 1: Convert the colored picture to a grayscale version that emphasizes boundaries of graphics.\n\n\n\n\n\nStep 2: Filter out any data that is below a threshold for opacity or darkness.\n\n\n\n\n\nStep 3: Convert to a negative view to maximize the contrast of the image.\n\n\n\n\n\nThe use of filters in this way is a common first step for models that use images as raw data sources. This is true for self-driving vehicles that use cameras to collect data streams, and for examples from the Digital Humanitarians text that use satellite images to predict the location of damage from a disaster. Next we will look at how these techniques are used to conduct a tree census.\nFor an interesting urban policy application of this approach: thanks to a program built by Microsoft, you can now download a database of every single building in the United States, generated by an AI application that was taught to read satellite images.\nDoes anyone recognize this community?\nTrees\nTrees are important for cities, but maintaining a robust urban forest is expensive and challenging. “Trees clean the air and water, reduce stormwater floods, improve building energy use, and mitigate climate change, among other things. For every dollar invested in planting, cities see an average $2.25 return on their investment each year.” cite\nIf we want to decide where to plant trees in our city by bringing a data-driven approach to tree policy, we first need to measure the outcome. “How many trees are in your city? It might seem like a straightforward question, but finding the answer can be a monumental task. New York City’s 2015-2016 tree census, for example, took nearly two years (12,000 hours total) and more than 2,200 volunteers.” cite\nUsing high-resolution images that can capture a wide spectrum of light, data scientists have designed ways to use public data sources and AI to perform this task. 
Here is a high-resolution image of Washington, DC, filtered to remove all light except that reflecting off green objects, i.e., plant life in the city (as opposed to buildings and parking lots):\n\n\n\n\n\nEasy enough, right? But wait! Green patches might be grass, shrubs, or flowers. How does the program know NOT to count these green patches as trees?\n\n\n\n\n\nIt turns out that trees reflect green light differently than other plants. If you apply the right filters, you can further isolate the trees in the image from the rest of the plants. For example, watch the grass in the park disappear:\n\n\n\n\n\nOr eliminate all of the open green spaces in Boston (the grass at the airport is especially vivid):\n\n\n\n\n\nThese techniques allow us to isolate trees from everything else. But we now have another problem: a green island rarely contains a single tree. How can we isolate individual trees from a group of trees in a cluster?\nThis tutorial on tree canopy analysis offers some solutions.\nThe basic recipe for identifying trees within a cluster is to:\n1. Apply a Filter to isolate trees from other elements on the landscape.\n\n\n\n\n\n2. Detect Treetops using an algorithm that can predict the height of each pixel in the image.\n\n\n\n\n\nThis step returns the geographic coordinates of each treetop in the cluster so that you can see the tree through the forest.\nNote that Lidar uses lasers to enhance digital photographic techniques by including rich measures of light frequency and the ability to triangulate height. Lidar is an expensive technology, but many cities have a database of Lidar images that are open for public use.\n3. 
Model Canopies by using a geometric tessellation algorithm to predict canopy boundaries and create a new spatial file that contains tree coordinates, heights, and canopy sizes.\n\n\n\n\n\nThis information would be useful for tracking the number of trees as well as changes in canopies over time.\nFaces\nFacial recognition software has become accurate and ubiquitous. What features does a computer need in order to recognize a person? How are these features engineered from a two-dimensional image of a face?\nFeature Extraction\nSimilar to the programs above that read text in images by identifying text and then isolating characters, facial recognition software starts by scanning an image to look for faces. If it finds one, it runs a routine to frame the face and then isolate prominent features on the face:\n\n\n\n\n\nSimilar to the handwriting recognition example, we can apply filters to an image to accentuate specific features. With letters on paper we are trying to maximize the contrast between the ink and the page. With faces, different filters applied to images (or image processing algorithms) will highlight specific facial features.\n\n\n\n\n\nOnce oriented to the face, an algorithm can identify facial landmarks (which is actually similar to how our own brains recognize faces):\n\n\n\n\n\nThe computer translates the landmark view of a small set of prominent features of a face to a grid model that measures the distances between features:\n\n\n\n\n\nVoila! We now have a format that can be used to generate quantitative variables describing a face. Each line represents a distance between facial landmarks. The length of each line becomes a distinct feature that can be measured and quantified.\nYou don’t always know the distance from the camera to the face, so you might not be able to predict the actual size of specific features (is my nose big or was the camera just way too close?), but it is easy enough to calculate relative sizes. 
If you set the distance between the eyes as a measure of one unit, for example, then every other distance on this graph (each line) can be calculated relative to that distance. Thus, you are identifying individuals not by the actual size of their noses, but by the relative distances between all features on their face.\nThere are different ways to accomplish this basic process of creating abstract mathematical models of the face. They all return a list of quantitative features that describe an individual face.\nAnd finally, we compare the measurements from a specific face against measurements in a large database of candidate faces. You can do this quickly because you are working with a few dozen measures (distance between eyes, distance between edges of the mouth, distance from edge of mouth to eye, etc.). You would calculate the difference between the face you are trying to identify and each face in the database by comparing the length of each line.\n\n\n\n\n\nIf the total difference across all of the features falls below a threshold, then the faces are flagged as a match to be examined further by a human, or some action is triggered (unlocking your phone or your front door, for example).\nFeature Selection\nAssuming that the photos are taken in good light with forward-facing subjects and decent-resolution cameras, what do you anticipate being a challenge with this process? Facial feature landmarks might vary greatly based upon expressions (or angles of the camera):\n\n\n\n\n\nNote that some features, like distance between the eyes and size of the nose, will be static (i.e., reliable measures). Others, like the edges of the mouth or the size of lips, will be highly dependent upon the expression (i.e., low alpha if we think about facial features as latent constructs).\nStated another way, some features convey more information about the unique identity of an individual than others. 
The feature selection task is to identify which measures will contain the highest signal-to-noise ratios. The algorithms that match faces can also weight certain features more than others to account for expressions.\nThis paper explores the issue by examining which facial features contain the most information during the recognition task. They do this by examining which features, when changed, make the individual hardest to recognize.\n\n\n\n\n\nStated differently, if you omitted high-salience features like lip thickness from your model, you would see a large drop in correct matches to the database. If you omitted low-salience features like mouth size, the accuracy of facial recognition would suffer less.\nThis example is meant to give you an intuitive sense of what the computer algorithm might experience as features are “corrupted” in the images and remind you of the importance of the feature selection task.\nSo, if you are trying to escape the country by crossing a border, which feature would you try to disguise?\nLAB QUESTIONS\nPart 1: Letters\nAssume you have designed a program that can effectively isolate individual characters from an image of a license plate:\n\n\n\n\n\nYou now want to develop a machine learning model that will accurately identify each character, so you need a set of features that describe the letters well enough for the computer to begin telling them apart.\nFor the lab, list three features of characters that could be reliably derived (“engineered”) from an image of a single letter and used as input data for a predictive model that can read a license plate accurately. Note that fonts used by states might vary, so letters and digits will not be identical.\n\n\n\n\n\n\n\n\n\n\nIf you think this sounds challenging, recognize that there are hundreds of potential features (characteristics of the letter). 
Just look at how many terms hipsters have invented to lovingly diagram their favorite typefaces:\n\n\n\n\n\nOr spend five minutes with a handwriting analyst.\nIt may be helpful for you to start by describing the difference between these letters to a child.\n\nYou might start by pondering how someone would distinguish between a lowercase “o” and a lowercase “a”.\nHow would you describe the difference between a “1” and an “l”?\nBetween a zero “0” and an upper-case “O”?\nCapital “T” versus lower-case “t”?\n\nPart 2: Using Uniforms to Predict Job Title\nThe blog Towards Data Science describes an interesting machine learning application that predicts which category an object or person belongs to based upon an image. They demonstrate the software by training it to guess people’s careers based upon a picture of them at work:\nFor this tutorial, we have provided a dataset called IdenProf. IdenProf (Identifiable Professionals) is a dataset that contains 11,000 pictures of 10 different professionals that humans can see and recognize their jobs by their mode of dressing.\n\n\n\n\n\nThere are ten professions used in the example:\n\nChef\nDoctor\nEngineer\nFarmer\nFirefighter\nJudge\nMechanic\nPilot\nPolice\nWaiter\n\nThis dataset is split into 9000 pictures (900 pictures for each profession) to train the artificial intelligence model and 2000 pictures (200 pictures for each profession) to test the performance of the artificial intelligence model as it is training. IdenProf has been properly arranged and made ready for training your artificial intelligence model to recognize professionals by their mode of dressing. For reference purposes, if you are using your own image dataset, you must collect at least 500 pictures for each object or scene you want your artificial intelligence model to recognize.\nFor the lab, suggest three features of uniforms that might be used to classify images by profession. 
Also include a rule statement about how each feature maps to a specific profession.\nFor example:\nRule: If the uniform has stripes on the arms, then the individual will be either a firefighter or a pilot.\nSome more examples:\n\nIf the uniform includes a skirt, the individual is not a farmer, mechanic, or chef (on the job).\nIf the uniform is mostly black, the individual will be a judge, pilot, or police officer.\nIf the uniform is mostly white, the individual will be a chef or a doctor.\nIf the uniform includes a bow tie, the individual will likely be a waiter (or an engineer?).\n\n\n\n\n\n\nPart 3: Home Values\nWe have focused so far on feature engineering examples that use images as the input data, then extract features based upon models of letter style, tree structure, or the topology of faces. In these instances, we are translating from one data type (an image) to a traditional set of quantitative variables in a spreadsheet format (columns represent variables or features, and rows represent observations).\nIt is also common to engineer features from an existing dataset, creating new variables from existing variables. For example, population density is a common variable used in urban policy. This variable requires that you have a measure of the population of an administrative unit (number of people in a census tract) and the total geographic area of that unit. Density is then calculated as people per square mile (or whatever unit you use for area).\nPopulation density is often a more useful variable than the raw population because it tells you something about the average distance between individuals in a city. If you are opening a pizza delivery business, for example, the total population of the city matters less if residents are spread out over a large area. You will be more profitable serving a smaller population that is packed into a tight neighborhood than a large suburb that requires long delivery times and high operating costs. 
Stated differently, population density is a better predictor of the profitability of your new business.\nThe urban policy research group Urban Spatial offers a nice example of this in a model they created to predict gentrification of census tracts using historical census data.\nThey describe several features that they engineer for the model. Similar to other community change models we have examined, they measure characteristics of housing in census tracts to predict how each might change over time.\nOne thing they do differently from previous models, however, is use information about home value contagion. When home values rise in an adjacent census tract, home values in my census tract become more likely to rise. This type of relationship is called a spatial correlation. This might occur because people who are looking to buy in a specific neighborhood are priced out by rising costs in that neighborhood, so they instead purchase a home in an adjacent neighborhood. Higher demand from the spill-over will drive up prices.\nSimilarly, a drop in prices in a neighborhood can have a contagion effect if buyers looking at purchasing in an adjacent neighborhood get nervous about falling prices and look elsewhere, thus reducing the demand and lowering prices, resulting in a self-fulfilling prophecy in some instances.\nTo model these processes, you need to take into account information about neighboring communities. The Urban Spatial paper explains how they create (engineer) two variables (features) for their model.\nOne variable is created by calculating the average value of homes in all of the surrounding census tracts. Only those census tracts that are contiguous to your own (they share a border) are included. So, the census tract in the top left corner is excluded from the calculation, for example:\n\n\n\n\n\nThe second variable was created after recognizing that the average value of surrounding homes might not be the best predictor of change. 
Rather, home buyers typically want to move into hot neighborhoods that have nice amenities and cool people. Since everyone wants to move there, though, these neighborhoods quickly become saturated and overpriced, and buyers spill over into nearby neighborhoods.\nThey identified all of the census tracts in the top fifth of the data (the top 20% of tracts based upon home values) and categorized them as highly desirable tracts. They then calculated the distance from each tract in the dataset to the nearest highly desirable tract:\n\n\n\n\n\nIf we are predicting home values for a tract in 2010, the value in 2000 will provide a reference point for where home values start but will not tell you much about whether you expect them to be rising or falling. The value of neighboring tracts, however, is a good predictor (if adjacent tracts have more expensive homes, prices are likely to rise; if they have cheaper homes, prices will likely fall). And the distance to the closest “hot” neighborhood in 2000 will provide a different type of information about possible trends. The two variables that explain trends are both second-order variables that are created from the raw census data.\nFor the Lab: recall that in the previous lab you were required to identify a neighborhood or metropolitan feature that impacts home values (for example, each additional Starbucks that enters a zip code is associated with a 0.5% increase in housing prices).\nUsing the feature that you identified in Lab 2 (or another feature of your choosing), explain how you would “engineer” it by answering the following questions. You are NOT allowed to use a metric that is already an existing census variable and requires no engineering (i.e., calculation).\n1. What is your unit of analysis? 
Census tract, zip code, city, etc.\nSome variables are better at a very local scale (crimes tend to impact prices on surrounding blocks but not much further), whereas others are only meaningful at a metro or regional scale (e.g., activity of regional airports).\n2. What type of data do you need to calculate the variable?\nWhat measures are included in your formula? For example, if a new Starbucks impacts home values, you would measure the distance to it for each home. You would need a database of Starbucks in your city with their locations.\n3. What is the process or formula you would follow to create the variable?\nWrite out instructions for calculating your metric in a way that a data engineer could implement. For example, to calculate the distance to the nearest Starbucks:\n\nIdentify the location of each home in the dataset.\nIdentify the nearest Starbucks to each home.\nCalculate the distance between the two points.\n\nNote! These instructions are not specific enough. You can easily calculate the Euclidean distance between two points on a map with a formula, but how often do you travel in a straight line to a destination? Maybe a travel distance derived from Google Maps would be a better metric? Or maybe travel time, rather than distance? Are you assuming people are walking, driving, or taking public transit?\n4. How reliable will your measure be?\nUsing your knowledge about instrument development, do you think your new variable will accurately reflect the true underlying latent construct you are trying to measure? For example, using the programs described in the lab, I could fairly accurately measure the total tree canopy cover for a specific neighborhood (the average is about 20% coverage in most cities). 
If I were trying to create a hipster scale to measure how cool a neighborhood is by identifying how many local menus include craft beer, my scale might be less reliable (largely because craft beer is now too mainstream to signal coolness).\nLooking Ahead\nThe goal behind the design of the last three labs was (1) to encourage you to think about how the data we use every day, which has a big impact on our lives, is created, and (2) to demystify machine learning and artificial intelligence. These exercises were designed to give you insight into the black box of predictive analytics, remote sensors, and artificial intelligence.\nYou do not need to know how to build a car from scratch to be a good driver. Similarly, you do not need to be a data scientist to incorporate these new tools into your future career. Your job as a future analyst or manager, for example, may be to determine whether automation or a predictive model would add value to your organization. If so, you can hire an expert to build the models for you.\nThat said, a little bit of vocabulary will help you write a call for proposals, interview potential firms, and manage the process along the way. The processes of feature selection and measurement can be challenging for an outside expert who doesn’t intimately understand your program. 
You can add a lot of value by collaborating with the experts during the process to identify latent constructs, select features that are important to the question at hand, and discussing how the data may need to be engineered for analysis and/or modeling.", ->>>>>>> Stashed changes - "crumbs": [ - "Lab 3" - ] - }, - { - "objectID": "assignments/lab4.html", - "href": "assignments/lab4.html", - "title": "Lab 4", + "objectID": "resources/rubrics-written.html", + "href": "resources/rubrics-written.html", + "title": "Written Assignment Rubrics", "section": "", - "text": "Add Lab 4 assignments here.", + "text": "Rubric Name: Grading Rubric_Written Assignment\n\n\n\n\n\n\n\n\n\n\nCriteria\nAdvanced\nProficient\nDeveloping\nInsufficient\n\n\nAssignment Specific Grading Items\n40 points\nAssignment instructions are followed.\nRequired items and formatting are addressed comprehensively and accurately.\n35 points\nAssignment instructions are mostly followed.\nRequired items and formatting are addressed accurately.\n30 points\nAssignment instructions largely are not followed or don’t address many requirements.\n20 points\nAssignment instructions are not followed or attempts are largely inaccurate.\n\n\nCritical Analysis (Understanding of Topic)\n40 points\nContent displays excellent comprehension, application, and/or analysis of the topic and underlying concepts.\nIncludes correct use of terminology.\n35 points\nContent presents basic comprehension, application, and/or analysis of the topic and underlying concepts.\nIncludes correct use of terminology.\n30 points\nContent presents little to no comprehension, application, and/or analysis of the topic and underlying concepts.\nIncludes incorrect use of terminology.\n20 points\nContent presents no comprehension, application, and/or analysis of the topic and underlying concepts.\nIntegrates no relevant evidence to support important points.\n\n\nWriting & Clarity\n20 points\nWriting is clear, well-written and free of grammatical and 
spelling mistakes. Required citations are included and correct.\n15 points\nWriting is clear, with a few grammatical and spelling mistakes. Required citations are included and correct.\n10 points\nWriting is clear but there are many spelling and grammatical mistakes. Required citations are incorrect or some are missing.\n5 points\nWriting is unclear or diff", "crumbs": [ - "Lab 4" + "Written Assignment Rubrics" ] }, { - "objectID": "assignments/lab5.html", - "href": "assignments/lab5.html", - "title": "Lab 5", - "section": "", - "text": "Add Lab 5 assignments here." - }, - { - "objectID": "assignments/lab6.html", - "href": "assignments/lab6.html", - "title": "Lab 6", + "objectID": "resources/grade-policy.html", + "href": "resources/grade-policy.html", + "title": "Assignment Grade and Policies", "section": "", - "text": "Add Lab 6 assignments here." + "text": "(Based on a Fall semester calendar)\n\n\n\n\nAssignments\n\n\n\n\n\n\nDiscussions (30%)\n\nDiscussions posts involve your critical assessments of course content. As an asynchronous class at the graduate level, my expectations for their quality are high.\n\n\n\nLabs (20%)\n\nThe course includes four labs that ask you to think critically about big data and its characteristics. The labs are largely non-technical.\nYou are expected to complete all four labs by their due dates. Each lab is worth 5% of your final grade.\n\n\n\nData Certification (20%)\n\nAppreciation of big data includes knowing how to use it. Although this isn’t a technical course, you are required to build competency in data tools that fit your interests and need for future work.\nYou will curate a series of five data courses via GSU’s Research Data Services or an online learning platform to earn a data certification. 
Ideally, these courses help you develop the big data project proposal that is the final assessment for this course.\nYou must complete all five data courses to earn a certification and credit for this assessment.\n\n\n\nBig Data Project Proposal (30%)\n\nThroughout the course you will use the data science process to design a big data project of interest to you. It will start with asking an interesting question and culminate in developing a model using big data to answer the question.\nAs you’ll learn, big data projects are tedious and take time to conduct responsibly. This assessment asks you to design a project, not implement it.\nAs a proposal, you will design the big data project in stages throughout the semester as we cover them in class. Most of the stages will be shared with the class through online discussion posts for peer and instructor feedback.\nYou will complete a proposal of at least five pages, which is worth 30% of your final grade.\n\n\n\n\n\n\n\n\nPolicies\n\n\n\n\n\n\nLate Assignments\n\nNo unexcused late assignments will be accepted without penalty. All deadlines will be posted on iCollege.\nIf you anticipate missing an assignment deadline due to a university-approved reason, please write me an email ASAP. Excused absences/missed work include legal obligations (jury duty, military orders); religious obligations; death or illness of a student’s immediate family; and illness that is too severe or contagious to attend class/do classwork. 
All excused absences require written documentation from an authoritative source verifying the absence by date and reason.\n\n\n\nWithdrawal Policy\n\nLast day to withdraw from this class is February 28, 20xx.\nStudents who withdraw after the midpoint of each term will not be eligible for a “W” except in cases of Emergency Withdrawal.\nImportant University dates can be found at https://registrar.gsu.edu/registration/semester-calendars-exam-schedules/\n\n\n\nPolicy on Academic Honesty\n\nReview GSU’s policy about cheating, plagiarism, and the repercussions if you do not operate with academic honesty. Note: Lack of knowledge of this policy is not an acceptable defense to any charge of academic dishonesty.\n\n\n\nStudent Code of Conduct\n\nReview the following at the link to the Student Code of Conduct:\n\nStudent Rights and Obligations\nGeneral Conduct Policies and Procedures\nAcademic Conduct Policies and Procedures\nAdministrative Policies" }, { - "objectID": "discussions/module1.html", - "href": "discussions/module1.html", + "objectID": "project/Project-overview.html", + "href": "project/Project-overview.html", "title": "Module 1 Discussions", "section": "", - "text": "Post\nAddress the following:\n\nDescribe two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data that we can address this week in class?\n\nDue by: 8/29 11:59 pm EST\nDiscuss\nOnline discussions are important to the dialogue and learning in this class. 
Take some time to respond to at least two of your classmates’ posts by Friday.\nDue by: 9/3 11:59 pm EST", - "crumbs": [ - "Module 1 Discussions" - ] - }, - { - "objectID": "discussions/module2.html", - "href": "discussions/module2.html", - "title": "ASDFASDF", - "section": "", - "text": "asdfasdf" - }, - { - "objectID": "discussions/module3.html", - "href": "discussions/module3.html", - "title": "ASDFASDF", - "section": "", - "text": "asdfasdf" - }, - { - "objectID": "discussions/module4.html", - "href": "discussions/module4.html", - "title": "ASDFASDF", - "section": "", - "text": "asdfasdf" - }, - { - "objectID": "discussions/module5.html", - "href": "discussions/module5.html", - "title": "ASDFASDF", - "section": "", - "text": "asdfasdf" - }, - { - "objectID": "discussions/module6.html", - "href": "discussions/module6.html", - "title": "ASDFASDF", - "section": "", - "text": "asdfasdf" - }, - { - "objectID": "modules/module2.html", - "href": "modules/module2.html", - "title": "Module 2", - "section": "", - "text": "asdfasdf" - }, - { - "objectID": "modules/module3.html", - "href": "modules/module3.html", - "title": "ASDFASDF", - "section": "", - "text": "asdfasdf" - }, - { - "objectID": "modules/module4.html", - "href": "modules/module4.html", - "title": "ASDFASDF", - "section": "", - "text": "asdfasdf" - }, - { - "objectID": "modules/module5.html", - "href": "modules/module5.html", - "title": "ASDFASDF", - "section": "", - "text": "asdfasdf" - }, - { - "objectID": "modules/module6.html", - "href": "modules/module6.html", - "title": "ASDFASDF", - "section": "", - "text": "asdfasdf" - }, - { - "objectID": "modules/module-index.html", - "href": "modules/module-index.html", - "title": "Module 1 - Introduction to Big Data", - "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. 
You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." + "text": "Post\nAddress the following:\n\nDescribe two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data that we can address this week in class?\n\nDue by: 8/29 11:59 pm EST\nDiscuss\nOnline discussions are important to the dialogue and learning in this class. Take some time to respond to at least two of your classmates’ posts by Friday.\nDue by: 9/3 11:59 pm EST" }, { - "objectID": "modules/module-index.html#module-overview", - "href": "modules/module-index.html#module-overview", - "title": "Module 1 - Introduction to Big Data", + "objectID": "project/Project-overview.html#m1.2-the-value-of-big-data", + "href": "project/Project-overview.html#m1.2-the-value-of-big-data", + "title": "Module 1 Discussions", "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." - }, - { - "objectID": "modules/module-index.html#assignments", - "href": "modules/module-index.html#assignments", - "title": "Module 1 - Introduction to Big Data", - "section": "Assignments", - "text": "Assignments\n\n\n\nSection & Assignment\nDue Date\n\n\n\n\n1.2 Discussion\n\n\n\n1.3 Data Certification Plan\n\n\n\n1.4 Discussion" - }, - { - "objectID": "modules/module-index.html#the-value-of-big-data", - "href": "modules/module-index.html#the-value-of-big-data", - "title": "Module 1 - Introduction to Big Data", - "section": "1. 2 The Value of Big Data", - "text": "1. 2 The Value of Big Data\nThis module introduces you to the opportunities and challenges of big data for the public sector. 
You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." + "text": "Post\nAddress the following:\n\nDescribe two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data that we can address this week in class?\n\nDue by: 8/29 11:59 pm EST\nDiscuss\nOnline discussions are important to the dialogue and learning in this class. Take some time to respond to at least two of your classmates’ posts by Friday.\nDue by: 9/3 11:59 pm EST" }, { - "objectID": "modules/module-index.html#transforming-information-to-ideas", - "href": "modules/module-index.html#transforming-information-to-ideas", - "title": "Module 1 - Introduction to Big Data", - "section": "Transforming Information to Ideas", - "text": "Transforming Information to Ideas\nThis week we’ll explore the characteristics of big data and its value to the public sector." + "objectID": "project/Project-overview.html#m1.4-the-value-of-big-data", + "href": "project/Project-overview.html#m1.4-the-value-of-big-data", + "title": "Module 1 Discussions", + "section": "M1.4 The Value of Big Data", + "text": "M1.4 The Value of Big Data\nPost\nAddress the following:\n\nDescribe two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data that we can address this week in class?\n\nDue by: 8/29 11:59 pm EST\nDiscuss\nOnline discussions are important to the dialogue and learning in this class. 
Take some time to respond to at least two of your classmates’ posts by Friday.\nDue by: 9/3 11:59 pm EST" }, { - "objectID": "modules/module-index.html#read", - "href": "modules/module-index.html#read", - "title": "Module 1 - Introduction to Big Data", + "objectID": "modules/module5-2.html#read", + "href": "modules/module5-2.html#read", + "title": "Module 5 - Data Privacy & Stewardship", "section": "Read", - "text": "Read\n\nDesouza & Smith, Big Data for Social Innovation, Stanford Social Innovation Review, Summer 2014.\nWorld Bank Group, Big Data in Action for Government, World Bank Governance Practice, 2017.\nMeier, P., Digital Humanitarians, Chapter 1, CRC Press, 2015.\nPentland, A. Social Physics, Chapter 1, Penguin Group, 2015." - }, - { - "objectID": "modules/module-index.html#prepare", - "href": "modules/module-index.html#prepare", - "title": "Module 1 - Introduction to Big Data", - "section": "Prepare", - "text": "Prepare\nPlease prepare to discuss your thoughts on the following prompts:\n\nDescribe two characteristics of big data and how they apply to data produced or are used by your field of interest.\nDescribe ideas or practices in this week’s readings that you had not considered before.\nWhat claims about social physics do you find valid? In what ways would you challenge Pentland’s “promise” of social physics?" - }, - { - "objectID": "modules/module-index.html#attend", - "href": "modules/module-index.html#attend", - "title": "Module 1 - Introduction to Big Data", - "section": "Attend", - "text": "Attend\nIn person discussion, 1/17 at 4:30-5:30pm" - }, - { - "objectID": "modules/module1.html#module-overview", - "href": "modules/module1.html#module-overview", - "title": "Module 1 - Introduction to Big Data", - "section": "Module Overview", - "text": "Module Overview\nThis module introduces you to the opportunities and challenges of big data for the public sector. 
You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." + "text": "Read", + "crumbs": [ + "Module 5 - Data Privacy & Stewardship", + "5.2 Data Stewardship" + ] }, { - "objectID": "modules/module1.html#assignments", - "href": "modules/module1.html#assignments", - "title": "Module 1 - Introduction to Big Data", - "section": "Assignments", - "text": "Assignments\n\n\n\nAssignments\nDue Date\n\n\n\n\n1.1 Discussion\n1/17\n\n\n1.2 Data Certification Plan\n1/24\n\n\n1.3 Discussion\n1/31", + "objectID": "modules/module5-2.html#post-discussion-1", + "href": "modules/module5-2.html#post-discussion-1", + "title": "Module 5 - Data Privacy & Stewardship", + "section": "Post Discussion 1", + "text": "Post Discussion 1\nAddress the following:\n\nDescribe at least two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data from the readings this week that I can address in my weekly announcement?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in-text APA style. Posts should range between 400-500 words.\nDue by: 8/27 11:59 pm EST", "crumbs": [ - "Module 1 - Introduction to Big Data" + "Module 5 - Data Privacy & Stewardship", + "5.2 Data Stewardship" ] }, { - "objectID": "modules/module1.html#the-value-of-big-data", - "href": "modules/module1.html#the-value-of-big-data", - "title": "Module 1 - Introduction to Big Data", - "section": "1. 2 The Value of Big Data", - "text": "1. 2 The Value of Big Data\nThis module introduces you to the opportunities and challenges of big data for the public sector. 
You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.", + "objectID": "modules/module4-1.html#read", + "href": "modules/module4-1.html#read", + "title": "4.1 Data Quality", + "section": "Read", + "text": "Read\n\nHillman, Jane, Data quality and AI safety: 4 ways bad data affects AI and how to avoid it. Prolific. 2023. **Note: Prolific is a survey-based application that claims to use ethical, careful techniques for selecting its pool of potential respondents to academic surveys. It has an agenda, but this article is largely a good overview of why quality data is important in AI environments.**\nGovernment Data Quality Hub, The Government Data Quality Framework. Data quality principles. December 2020. (Also read one of the case studies at the end of the section.)", "crumbs": [ - "Module 1 - Introduction to Big Data" + "Module 4 - Bias in Big Data", + "4.1 Data Quality" ] }, { - "objectID": "modules/module1.html#transforming-information-to-ideas", - "href": "modules/module1.html#transforming-information-to-ideas", - "title": "Module 1 - Introduction to Big Data", - "section": "Transforming Information to Ideas", - "text": "Transforming Information to Ideas\nThis week we’ll explore the characteristics of big data and its value to the public sector.", + "objectID": "modules/module4-1.html#post", + "href": "modules/module4-1.html#post", + "title": "4.1 Data Quality", + "section": "Post", + "text": "Post\nAddress the following in the 4.1 Data Quality discussion board:\n\nPoor data quality and data practices can lead to downstream bias that impacts individuals and decision-making. What examples from the readings (or your own reading) do you consider most harmful? Overblown?\nGovernments around the world are developing frameworks like the reading from the United Kingdom for improving data integrity and reducing bias. 
Based on your own experience as a user and/or analyst of data, what elements of the framework do you think are least understood or practiced by researchers/government agencies? How might they improve?\nFind an example of a recent data quality problem in the news that resulted in bias from big data. What was the origin of the bias and did the media address how to prevent it? Describe what you find surprising about the problem. (Make sure to cite the article.)\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words.\nDue by: 10/15 11:59 pm EST", "crumbs": [ - "Module 1 - Introduction to Big Data" + "Module 4 - Bias in Big Data", + "4.1 Data Quality" ] }, { - "objectID": "modules/module1.html#read", - "href": "modules/module1.html#read", - "title": "Module 1 - Introduction to Big Data", + "objectID": "modules/module3-2.html#read", + "href": "modules/module3-2.html#read", + "title": "3.2 Machine Learning & Prediction", "section": "Read", - "text": "Read\n\nDesouza & Smith, Big Data for Social Innovation, Stanford Social Innovation Review, Summer 2014.\nWorld Bank Group, Big Data in Action for Government, World Bank Governance Practice, 2017.\nMeier, P., Digital Humanitarians, Chapter 1, CRC Press, 2015.\nPentland, A. Social Physics, Chapter 1, Penguin Group, 2015.", + "text": "Read\n\nBrown, S. Machine Learning, Explained. MIT Sloan School of Management. 2021.\nMaini, V. Machine Learning for Humans. 2017. *Read Introduction** and dive deeper into types of ML per your interests.\nOffice for Artificial Intelligence (UK.GOV), 2020. A guide to using AI in the public sector. *Read p. 1-20. 
Pay attention to the examples of uses for ML and consider how your big data project fits with the method(s).*", "crumbs": [ - "Module 1 - Introduction to Big Data" + "Module 3 - Discovery & Insights", + "3.2 Machine Learning & Prediction" ] }, { - "objectID": "modules/module1.html#prepare", - "href": "modules/module1.html#prepare", - "title": "Module 1 - Introduction to Big Data", - "section": "Prepare", - "text": "Prepare\nPlease prepare to discuss your thoughts on the following prompts:\n\nDescribe two characteristics of big data and how they apply to data produced or are used by your field of interest.\nDescribe ideas or practices in this week’s readings that you had not considered before.\nWhat claims about social physics do you find valid? In what ways would you challenge Pentland’s “promise” of social physics?", + "objectID": "modules/module3-2.html#respond", + "href": "modules/module3-2.html#respond", + "title": "3.2 Machine Learning & Prediction", + "section": "Respond", + "text": "Respond\nYou posted an annotated bibliography for your big data project that summarizes 7-10 articles/reports related to your research topic. This exercise prepares you for the literature review section of your proposal and exposes you to data and methods used by other researchers to answer your research question (or a related one.)\nThis week you are assigned to small groups to offer feedback on your peers’ questions and progress on their big data proposals. Read two of your group members’ annotated bibliographies and provide constructive comments on strengths and weaknesses. 
Please make sure that everyone in your group gets at least one set of comments.\nDue by: 10/5 at 11:59pm EST", "crumbs": [ - "Module 1 - Introduction to Big Data" + "Module 3 - Discovery & Insights", + "3.2 Machine Learning & Prediction" ] }, { - "objectID": "modules/module1.html#attend", - "href": "modules/module1.html#attend", - "title": "Module 1 - Introduction to Big Data", - "section": "Attend", - "text": "Attend\nIn person discussion, 1/17 at 4:30-5:30pm", + "objectID": "modules/module3-2.html#complete", + "href": "modules/module3-2.html#complete", + "title": "3.2 Machine Learning & Prediction", + "section": "Complete", + "text": "Complete\nLab 3 unpacks an essential process of machine learning called feature engineering. Sometimes data as “features” nicely fit the construct that needs to be measured (e.g., height as measured in inches or cm). Other times, the data must be converted to a form that can be useful for prediction. For example, when you’re trying to use digital data to predict images that contain cats, the machine needs to know what features to look for to distinguish a cat from a dog. This lab pulls back the curtain on how these processes are engineered with digital data to give you insights into how machines “learn” to predict X. 
The lab is largely instructional; however, as you learn about these processes, consider how they influence the output of machine learning and how they can introduce errors.\nDue by: 10/8 at 11:59pm EST", "crumbs": [ - "Module 1 - Introduction to Big Data" + "Module 3 - Discovery & Insights", + "3.2 Machine Learning & Prediction" ] }, { - "objectID": "syllabus.html", - "href": "syllabus.html", - "title": "Course Syllabus", - "section": "", - "text": "View course ![syllabus](syllabus.pdf)" - }, - { - "objectID": "discussions/general.html", - "href": "discussions/general.html", - "title": "General Class Discussion", + "objectID": "modules/module3-0.html", + "href": "modules/module3-0.html", + "title": "Module 3 - Discovery & Insights", "section": "", - "text": "Please use this section for asking course related questions about assignments, reading, lectures etc. It is likely that if you have a question, others do too.", + "text": "This module explores use cases of big data for insights and prediction. We’ll spend the first week learning about open data and its value to the public, followed by two weeks learning about algorithms and machine learning.", "crumbs": [ - "General Class Discussion" + "Module 3 - Discovery & Insights" ] }, { - "objectID": "discussions/general.html#class-q-a", - "href": "discussions/general.html#class-q-a", - "title": "General Class Discussion", + "objectID": "modules/module3-0.html#introduction", + "href": "modules/module3-0.html#introduction", + "title": "Module 3 - Discovery & Insights", "section": "", - "text": "Please use this section for asking course related questions about assignments, reading, lectures etc. It is likely that if you have a question, others do too.", + "text": "This module explores use cases of big data for insights and prediction. 
We’ll spend the first week learning about open data and its value to the public, followed by two weeks learning about algorithms and machine learning.", "crumbs": [ - "General Class Discussion" + "Module 3 - Discovery & Insights" ] }, { - "objectID": "discussions/general.html#class-introduction-activity", - "href": "discussions/general.html#class-introduction-activity", - "title": "General Class Discussion", - "section": "Class Introduction Activity", - "text": "Class Introduction Activity\nFind a digital image that represents you. Post it here with the following information:\n\nName\nHometown\nDegree Program -Describe why you’re taking this course and your research interests.", + "objectID": "modules/module3-0.html#content", + "href": "modules/module3-0.html#content", + "title": "Module 3 - Discovery & Insights", + "section": "Content", + "text": "Content\n\n\n\n\n\n\n\n\nSection\nAssignment\nDue Date\n\n\n\n\n3.1 Open Data and Discovery\nLab 2\n9/21\n\n\n3.2 Machine Learning & Prediction\nDiscussion Post\n9/24\n\n\n3.2 Machine Learning & Prediction\nDiscussion Peer Response\n10/5\n\n\n3.2 Machine Learning & Prediction\nLab 3\n10/8", "crumbs": [ - "General Class Discussion" + "Module 3 - Discovery & Insights" ] }, { - "objectID": "discussions/module1.html#m1.2-the-value-of-big-data", - "href": "discussions/module1.html#m1.2-the-value-of-big-data", - "title": "Module 1 Discussions", - "section": "", - "text": "Post\nAddress the following:\n\nDescribe two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data that we can address this week in class?\n\nDue by: 8/29 11:59 pm EST\nDiscuss\nOnline discussions are important to the dialogue and learning in this class. 
Take some time to respond to at least two of your classmates’ posts by Friday.\nDue by: 9/3 11:59 pm EST", + "objectID": "modules/module2-1.html#read", + "href": "modules/module2-1.html#read", + "title": "2.1 Social Data", + "section": "Read", + "text": "Read\n\nBeheshti, A. et al. 2022. Social Data Analytics. Chapter 1 Social Data Analytics; Challenges and Opportunities. p 1-17\nDong, X., Morales, A.J., Jahani, E. et al. 2020. Segregated interactions in urban and online space. EPJ Data Sci. 9, 20.\nMcCosker, A., Farmer, J., and Soltani Panah, A. 2020. Community Responses to Family Violence: Charting Policy Outcomes using Novel Data Sources, Text Mining and Topic Modelling. Swinburne University of Technology, Melbourne.", "crumbs": [ - "Module 1 Discussions" + "Module 2 - Types of big data", + "2.1 Social Data" ] }, { - "objectID": "discussions/module1.html#m1.4-the-value-of-big-data", - "href": "discussions/module1.html#m1.4-the-value-of-big-data", - "title": "Module 1 Discussions", - "section": "M1.4 The Value of Big Data", - "text": "M1.4 The Value of Big Data\nPost\nAddress the following:\n\nDescribe two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data that we can address this week in class?\n\nDue by: 8/29 11:59 pm EST\nDiscuss\nOnline discussions are important to the dialogue and learning in this class. Take some time to respond to at least two of your classmates’ posts by Friday.\nDue by: 9/3 11:59 pm EST", + "objectID": "modules/module2-1.html#complete", + "href": "modules/module2-1.html#complete", + "title": "2.1 Social Data", + "section": "Complete", + "text": "Complete\nYour first lab assignment is due on 9/21. 
The labs are non-technical, but they do require you to consider the origins of data, how they perform as measures of individual/social constructs, and how data scientists make choices about data to use in their models and the effects of those choices.\nThis lab introduces you to two data-driven models of neighborhood change. We will use this case study throughout the semester to discuss things like data needs for predictive models. You will be required to think critically about the data used in the labs, but you will not be responsible for things like the advanced analytical models in the papers. I am approaching the labs with the assumption that you are likely to be new analysts or managers hiring analysts, so you just need a high-level understanding of the models in order to participate in the task.\nTo prepare for the lab, you should read these two reports (**note the page numbers assigned**):\n\nGoldstein, I. Market Value Analysis: A Data-Based Approach to Understanding Urban Housing Markets in Board of Governors, Federal Reserve System, Putting Data to Work: Data-Driven Approaches to Strengthening Neighborhoods. 2011. pp 49-59\n\nDelmelle, E. C. (2017). Differentiating pathways of neighborhood change in 50 US metropolitan areas. Environment and Planning A, 49(10), 2402-2424.\n\nRead the instructions for Lab 1 and post your answers to the questions posed in the lab to the 2.2 Lab 1 assignment folder.\nDue by: 9/21 11:59pm EST", "crumbs": [ - "Module 1 Discussions" + "Module 2 - Types of big data", + "2.1 Social Data" ] }, { - "objectID": "resources.html", - "href": "resources.html", - "title": "Resources", - "section": "", - "text": "Add class resources here." - }, - { - "objectID": "assignments/project_proposal.html", - "href": "assignments/project_proposal.html", - "title": "Big Data Project Proposal", - "section": "", - "text": "Add assignment content here." 
- }, - { - "objectID": "assignments/certification.html", - "href": "assignments/certification.html", - "title": "Data Certification", - "section": "", - "text": "Add assignment content here." - }, - { - "objectID": "resources/rubrics-written.html", - "href": "resources/rubrics-written.html", - "title": "Written Assignment Rubrics", - "section": "", - "text": "Rubric Name: Grading Rubric_Written Assignment\n\n\n\n\n\n\n\n\n\n\nCriteria\nAdvanced\nProficient\nDeveloping\nInsufficient\n\n\nAssignment Specific Grading Items\n40 points\nAssignment instructions are followed.\nRequired items and formatting are addressed comprehensively and accurately.\n35 points\nAssignment instructions are mostly followed.\nRequired items and formatting are addressed accurately.\n30 points\nAssignment instructions largely are not followed or don’t address many requirements.\n20 points\nAssignment instructions are not followed or attempts are largely inaccurate.\n\n\nCritical Analysis (Understanding of Topic)\n40 points\nContent displays excellent comprehension, application, and/or analysis of the topic and underlying concepts.\nIncludes correct use of terminology.\n35 points\nContent presents basic comprehension, application, and/or analysis of the topic and underlying concepts.\nIncludes correct use of terminology.\n30 points\nContent presents little to no comprehension, application, and/or analysis of the topic and underlying concepts.\nIncludes incorrect use of terminology.\n20 points\nContent presents no comprehension, application, and/or analysis of the topic and underlying concepts.\nIntegrates no relevant evidence to support important points.\n\n\nWriting & Clarity\n20 points\nWriting is clear, well-written and free of grammatical and spelling mistakes. Required citations are included and correct.\n15 points\nWriting is clear, with a few grammatical and spelling mistakes. 
Required citations are included and correct.\n10 points\nWriting is clear but there are many spelling and grammatical mistakes. Required citations are incorrect or some are missing.\n5 points\nWriting is unclear or diff", + "objectID": "modules/module1-3.html#read", + "href": "modules/module1-3.html#read", + "title": "1.3 The Challenges of Big Data", + "section": "Read", + "text": "Read\n\nO’Neil, C. On Being a Data Skeptic. O’Reilly Press. 2014\nNoble, S. Algorithms of Oppression, Introduction. NYU Press. 2018.\nPayton, T. and Claypoole T. Privacy in the Age of Big Data. 2023.**Note: Follow hyperlink, then select “Click to preview” above the image of the book. Read the Introduction and Chapter 1, p. 1-26; Note that the last page is missing from this free source.)", "crumbs": [ - "Written Assignment Rubrics" + "Module 1 - Introduction to Big Data", + "1.3 The Challenges of Big Data" ] }, { - "objectID": "discussions/overview.html", - "href": "discussions/overview.html", - "title": "General Class Discussion", - "section": "", - "text": "Please use this section for asking course related questions about assignments, reading, lectures etc. It is likely that if you have a question, others do too.", + "objectID": "modules/module1-3.html#post", + "href": "modules/module1-3.html#post", + "title": "1.3 The Challenges of Big Data", + "section": "Post", + "text": "Post\nAddress the following in Discussion 2:\n\nUsing all three of this week’s readings, describe the challenges of big data that you feel are urgent to address by the public and why.\nWhat issues do the authors address that you don’t feel are urgent? Why?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. 
Posts should range between 400-500 words.\nDue by: 9/10 11:59 pm EST", "crumbs": [ - "General Class Discussion" + "Module 1 - Introduction to Big Data", + "1.3 The Challenges of Big Data" ] }, { - "objectID": "discussions/overview.html#class-q-a", - "href": "discussions/overview.html#class-q-a", - "title": "General Class Discussion", - "section": "", - "text": "Please use this section for asking course related questions about assignments, reading, lectures etc. It is likely that if you have a question, others do too.", + "objectID": "modules/module1-3.html#respond", + "href": "modules/module1-3.html#respond", + "title": "1.3 The Challenges of Big Data", + "section": "Respond", + "text": "Respond\nYou are assigned to a small discussion group this week. Read the posts of your peers and critique two group members’ concern/lack of concern about big data. Is the concern outweighed by the benefits of the data/its use? What’s missing from your peers’ argument?\nDue by: 9/14 11:59 pm EST", "crumbs": [ - "General Class Discussion" + "Module 1 - Introduction to Big Data", + "1.3 The Challenges of Big Data" ] }, { - "objectID": "discussions/overview.html#class-introduction-activity", - "href": "discussions/overview.html#class-introduction-activity", - "title": "General Class Discussion", - "section": "Class Introduction Activity", - "text": "Class Introduction Activity\nFind a digital image that represents you. Post it here with the following information:\n\nName\nHometown\nDegree Program -Describe why you’re taking this course and your research interests.", + "objectID": "modules/module1-1.html#read", + "href": "modules/module1-1.html#read", + "title": "1.1 The Value of Big Data", + "section": "Read", + "text": "Read\n\nDesouza & Smith, Big Data for Social Innovation, Stanford Social Innovation Review, Summer 2014.\nU.S. 
Census Bureau, Big Data, 2022.\nMeier, P., Digital Humanitarians, Chapter 1, CRC Press, 2015.\nPentland, A., Social Physics, Chapter 1, Penguin Group, 2015.", "crumbs": [ - "General Class Discussion" - ] - }, - { - "objectID": "labs/module1.html", - "href": "labs/module1.html", - "title": "Module 1 Discussions", - "section": "", - "text": "Post\nAddress the following:\n\nDescribe two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data that we can address this week in class?\n\nDue by: 8/29 11:59 pm EST\nDiscuss\nOnline discussions are important to the dialogue and learning in this class. Take some time to respond to at least two of your classmates’ posts by Friday.\nDue by: 9/3 11:59 pm EST" - }, - { - "objectID": "labs/module1.html#m1.2-the-value-of-big-data", - "href": "labs/module1.html#m1.2-the-value-of-big-data", - "title": "Module 1 Discussions", - "section": "", - "text": "Post\nAddress the following:\n\nDescribe two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data that we can address this week in class?\n\nDue by: 8/29 11:59 pm EST\nDiscuss\nOnline discussions are important to the dialogue and learning in this class. 
Take some time to respond to at least two of your classmates’ posts by Friday.\nDue by: 9/3 11:59 pm EST" - }, - { - "objectID": "labs/module1.html#m1.4-the-value-of-big-data", - "href": "labs/module1.html#m1.4-the-value-of-big-data", - "title": "Module 1 Discussions", - "section": "M1.4 The Value of Big Data", - "text": "M1.4 The Value of Big Data\nPost\nAddress the following:\n\nDescribe two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data that we can address this week in class?\n\nDue by: 8/29 11:59 pm EST\nDiscuss\nOnline discussions are important to the dialogue and learning in this class. Take some time to respond to at least two of your classmates’ posts by Friday.\nDue by: 9/3 11:59 pm EST" - }, - { - "objectID": "labs/overview.html", - "href": "labs/overview.html", - "title": "General Class Discussion", - "section": "", - "text": "Please use this section for asking course related questions about assignments, reading, lectures etc. It is likely that if you have a question, others do too." - }, - { - "objectID": "labs/overview.html#class-q-a", - "href": "labs/overview.html#class-q-a", - "title": "General Class Discussion", - "section": "", - "text": "Please use this section for asking course related questions about assignments, reading, lectures etc. It is likely that if you have a question, others do too." - }, - { - "objectID": "labs/overview.html#class-introduction-activity", - "href": "labs/overview.html#class-introduction-activity", - "title": "General Class Discussion", - "section": "Class Introduction Activity", - "text": "Class Introduction Activity\nFind a digital image that represents you. Post it here with the following information:\n\nName\nHometown\nDegree Program -Describe why you’re taking this course and your research interests." 
- }, - { - "objectID": "modules/module1.html#the-value-of-big-data-1", - "href": "modules/module1.html#the-value-of-big-data-1", - "title": "Module 1 - Introduction to Big Data", - "section": "1. 2 The Value of Big Data", - "text": "1. 2 The Value of Big Data", - "crumbs": [ - "Module 1 - Introduction to Big Data" - ] - }, - { - "objectID": "modules/module1.html#read-1", - "href": "modules/module1.html#read-1", - "title": "Module 1 - Introduction to Big Data", - "section": "Read", - "text": "Read\n\nThe Data Science Process: A Visual Guide to Standard Procedures in Data Science by Chanin Nantasenamat\nHilary: the most poisoned baby name in US history by Hilary Parker", - "crumbs": [ - "Module 1 - Introduction to Big Data" - ] - }, - { - "objectID": "modules/module1.html#watch", - "href": "modules/module1.html#watch", - "title": "Module 1 - Introduction to Big Data", - "section": "Watch", - "text": "Watch\n\nHow I Would Learn Data Science (If I Had to Start Over) by Ken Jee", - "crumbs": [ - "Module 1 - Introduction to Big Data" - ] - }, - { - "objectID": "modules/module1.html#complete", - "href": "modules/module1.html#complete", - "title": "Module 1 - Introduction to Big Data", - "section": "Complete", - "text": "Complete\nThis week you’ll spend most of your time creating a data certification plan. This assignment requires that you identify a research question and develop a plan to complete workshops over the semester that will expose you to the data science knowledge and skills needed to develop a big data project. By the end of the semester, you’ll submit a proposal that describes how you can answer the research question using big data and data science techniques.\nYou will turn in this plan as your “discussion” this week. But don’t wait until class day to start this assignment. Formulating a good research question takes time, and if you’re struggling with how to use big data in a project that interests you then you need to give me time to help you. 
Reach out by email to set up a time to talk or post questions to the Class Q & A discussion board. \n\nDownload the instructions for the data certification plan.\nIf you need some inspiration, check out this web blog. Or Google “big data and X (your field).” It’s also a good idea to look at some academic journals in your field to see how the techniques of data science are being applied. \nWrite me an email or set up a time to talk if you need help.\nSubmit your data certification plan to the “1.3 Data Certification” assignment folder.\n\nDue by: 11/24 at 11:59pm", - "crumbs": [ - "Module 1 - Introduction to Big Data" - ] - }, - { - "objectID": "project/Project-overview.html", - "href": "project/Project-overview.html", - "title": "Module 1 Discussions", - "section": "", - "text": "Post\nAddress the following:\n\nDescribe two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data that we can address this week in class?\n\nDue by: 8/29 11:59 pm EST\nDiscuss\nOnline discussions are important to the dialogue and learning in this class. Take some time to respond to at least two of your classmates’ posts by Friday.\nDue by: 9/3 11:59 pm EST" - }, - { - "objectID": "project/Project-overview.html#m1.2-the-value-of-big-data", - "href": "project/Project-overview.html#m1.2-the-value-of-big-data", - "title": "Module 1 Discussions", - "section": "", - "text": "Post\nAddress the following:\n\nDescribe two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data that we can address this week in class?\n\nDue by: 8/29 11:59 pm EST\nDiscuss\nOnline discussions are important to the dialogue and learning in this class. 
Take some time to respond to at least two of your classmates’ posts by Friday.\nDue by: 9/3 11:59 pm EST" - }, - { - "objectID": "project/Project-overview.html#m1.4-the-value-of-big-data", - "href": "project/Project-overview.html#m1.4-the-value-of-big-data", - "title": "Module 1 Discussions", - "section": "M1.4 The Value of Big Data", - "text": "M1.4 The Value of Big Data\nPost\nAddress the following:\n\nDescribe two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data that we can address this week in class?\n\nDue by: 8/29 11:59 pm EST\nDiscuss\nOnline discussions are important to the dialogue and learning in this class. Take some time to respond to at least two of your classmates’ posts by Friday.\nDue by: 9/3 11:59 pm EST" - }, - { - "objectID": "modules/module1.html#section", - "href": "modules/module1.html#section", - "title": "Module 1 - Introduction to Big Data", - "section": "", - "text": "Find Your Inner Data Geek\n\n\n\n\n\n\n\n\n\n\nNow that you know the characteristics of big data and some of their applications to the public sector, it’s time to consider how they serve your field of interest. Whether you plan to be a data scientist or work with a team of them, you need to understand how big data are collected, analyzed, and communicated. This module starts with an overview of data science to learn the steps in the life cycle of a big data project. 
It ends with you developing a plan to earn a data certification that exposes you to the tools and techniques of stages in the data science life cycle that fit your skill needs.", - "crumbs": [ - "Module 1 - Introduction to Big Data" - ] - }, - { - "objectID": "resources/data-certification-plan.html", - "href": "resources/data-certification-plan.html", - "title": "Data Certification Plan", - "section": "", - "text": "This assignment requires you to identify at least five workshops or other modules on data techniques that support your development as a data user. It’s intended to align with the big data project proposal that you’ll develop during the semester.\nThis plan starts by asking you to identify a research question that you can answer using the data skills that you’ll learn in the certification. Since it’s early in the semester, what you propose to do in this plan may change as you learn more about big data and its applications to your interests. However, I’m requiring that you submit this plan now so that I can learn what topics you’re interested in and how the data certification will enhance your data skills.\nGSU provides multiple opportunities for learning data skills and tools. Some examples include the library’s Research Data Services workshops, LinkedIn Learning, and O’Reilly for Higher Education. You may also use open access courses via Coursera, EdEx, Kaggle, etc. should they meet your needs. Each of the five workshops/modules that you complete must be:\nIdeally, the series of workshops that you pursue offers a badge or certificate that you can link to your resume or other digital profile to showcase your professional qualifications (e.g., personal website, LinkedIn profile, Handshake profile, etc.) If you’re curating workshops/courses across multiple sources or platforms, you likely won’t be able to earn a badge or certificate. 
Be aware that some platforms offer free access to courses, but earning a badge/certificate comes with a fee.\nYou may not use trainings or modules that you complete for other courses that you’re taking this semester. You are earning unique credit for this course, so the work you conduct for it must be different than what you may be learning in other classes. This exercise is intended to give you time (and credit) to pursue data knowledge or skills gaps that that aren’t covered in other courses. It’s an opportunity to invest in data skills that you think will be important to your future work.\nDue to the variety of data workshops and free data courses, it’s impossible for me to know what will interest you or their formats. If you have a question about the suitability of a workshop or course for this assignment, please contact me well before the deadline of this plan.\nAfter I review and approve your data certification plan, you will be required to submit a short report after each workshop to identify what you learned, how you’ll use the new skills in your project proposal (or elsewhere), and whether you recommend the workshop to other students. You’ll also submit an example of work that you generated in the training." - }, - { - "objectID": "resources/data-certification-plan.html#i.ask-an-interesting-question", - "href": "resources/data-certification-plan.html#i.ask-an-interesting-question", - "title": "Data Certification Plan", - "section": "I.Ask an Interesting Question", - "text": "I.Ask an Interesting Question\nDescribe the question that you hope to answer with big data. Why does it interest you and/or why is it important?\nMost research starts with a large question and narrows to a very specific one that can be answered once you know more about what data is available to answer the question and how it can be analyzed. 
If you’re undecided at this point in the semester what question you’ll propose to answer, describe ones that cluster around the same topic for this assignment.\nFor example, I’m interested in understanding the relationship between needle aversion and COVID vaccination rates. I think that the language and imagery used to encourage vaccination is having the opposite effect. Nearly every advertisement or news segment about COVID vaccination shows a needle going in an arm (up close!) and uses language like “get the shot” or “jab,” which evokes negative feelings in most people (e.g., like going to the dentist or booster shots as a child). For those who also have fears that the vaccination is risky, these images and words reduce the likelihood that they’ll get a vaccine. I hypothesize that the ubiquity and volume of this imagery on TV and social media (big data) is reinforcing needle aversion to an extent that it’s driving down vaccination rates. So, my research question is “Has intense exposure to imagery and language of needles decreased COVID vaccination rates in the United States?” If a relationship exists, removing needle imagery and language from mass media could improve vaccination compliance and lower the threat of COVID to public health." - }, - { - "objectID": "resources/data-certification-plan.html#ii.-collect-data", - "href": "resources/data-certification-plan.html#ii.-collect-data", - "title": "Data Certification Plan", - "section": "II. Collect Data", - "text": "II. Collect Data\nDescribe the big data that’s available to answer your research question, its source, and its format. Describe how it meets the characteristics of big data, and how you plan to access it." - }, - { - "objectID": "resources/data-certification-plan.html#iii.-data-workshops", - "href": "resources/data-certification-plan.html#iii.-data-workshops", - "title": "Data Certification Plan", - "section": "III. Data Workshops", - "text": "III. 
Data Workshops\nList the workshops you plan to complete by the end of the semester to build your data knowledge and skills to develop a big data project. For each workshop, describe:\n\nTitle and purpose\nLength/duration\nOrganization providing the training\nLink to the training\nDate(s) you plan to attend/do the training\nWhere it fits in the data science cycle\nHow it will enhance your data knowledge and/or skills" - }, - { - "objectID": "resources/data-certification-plan.html#iv.-certification", - "href": "resources/data-certification-plan.html#iv.-certification", - "title": "Data Certification Plan", - "section": "IV. Certification", - "text": "IV. Certification\nDescribe how will you certify completion of each workshop. (Make sure that the badge(s) or certification(s) you may earn will be available to post to iCollege by December 14.)" - }, - { - "objectID": "about.html#instruction-on-how-to-use-the-course-materiterals", - "href": "about.html#instruction-on-how-to-use-the-course-materiterals", - "title": "About the Big Data for Public Good Open Course", - "section": "Instruction on how to use the course materiterals", - "text": "Instruction on how to use the course materiterals" - }, - { - "objectID": "about.html#contact", - "href": "about.html#contact", - "title": "About the Big Data for Public Good Open Course", - "section": "Contact", - "text": "Contact" - }, - { - "objectID": "resources/about-course-project.html#instruction-on-how-to-use-the-course-materiterals", - "href": "resources/about-course-project.html#instruction-on-how-to-use-the-course-materiterals", - "title": "About the Big Data for Public Good Open Course", - "section": "Instruction on how to use the course materiterals", - "text": "Instruction on how to use the course materiterals" - }, - { - "objectID": "resources/about-course-project.html#contact", - "href": "resources/about-course-project.html#contact", - "title": "About the Big Data for Public Good Open Course", - "section": "Contact", - 
"text": "Contact" - }, - { - "objectID": "resources/grade-policy.html", - "href": "resources/grade-policy.html", - "title": "Assignment Grade and Policies", - "section": "", - "text": "(Based on a Fall semester calendar)\n\n\n\n\nAssignments\n\n\n\n\n\n\nDiscussions (30%)\n\nDiscussions posts involve your critical assessments of course content. As an asynchronous class at the graduate level, my expectations for their quality are high.\n\n\n\nLabs (20%)\n\nThe course includes four labs that ask you to think critically about big data and its characteristics. The labs are largely non-technical.\nYou are expected to complete all four labs by their due dates. Each lab is worth 5% of your final grade.\n\n\n\nData Certification (20%)\n\nAppreciation of big data includes knowing how to use it. Although this isn’t a technical course, you are required to build competency in data tools that fit your interests and need for future work.\nYou will curate a series of five data courses via GSU’s Research Data Services or an online learning platform to earn a data certification. Ideally these courses help you to develop the big data project proposal that is the final assessment for this course.\nYou must complete all five data courses to earn a certification and credit for this assessment.\n\n\n\nBig Data Project Proposal (30%)\n\nThroughout the course you will use the data science process to design a big data project of interest to you. It will start with asking an interesting question and culminate in developing a model using big data to answer the question.\nAs you’ll learn, big data projects are tedious and take time to conduct responsibly. This assessment asks you to design a project, not implement it.\nAs a proposal, you will design the big data project in stages throughout the semester as we cover them in class. 
Most of the stages will be shared with the class through online discussion posts for peer and instructor feedback.\nYou will complete a proposal of at least five pages, which is worth 30% of your final grade.\n\n\n\n\n\n\n\n\nPolicies\n\n\n\n\n\n\nLate Assignments\n\nNo unexcused late assignments will be accepted without penalty. All deadlines will be posted on iCollege.\nIf you anticipate missing assignment deadline due to a university approved reason, please write me an email ASAP. Excused absences/missed work include legal obligations (jury duty, military orders); religious obligation; death or illness of a student’s immediate family; and illness that is too severe or contagious to attend class/do classwork. All excused absences require written documentation from an authoritative source verifying the absence by date and reason.\n\n\n\nWithdrawal Policy\n\nLast day to withdraw from this class is February 28, 20xx.\nStudents who withdraw after the midpoint of each term will not be eligible for a “W” except in cases of Emergency Withdrawal.\nImportant University dates can be found at https://registrar.gsu.edu/registration/semester-calendars-exam-schedules/\n\n\n\nPolicy on Academic Honesty\n\nReview GSU’s policy about cheating, plagiarism, and the repercussions if you do not operate with academic honesty. 
Note: Lack of knowledge of this policy is not an acceptable defense to any charge of academic dishonesty.\n\n\n\nStudent Code of Conduct\n\nReview the following at the link to the Student Code of Conduct:\n\nStudent Rights and Obligations\nGeneral Conduct Policies and Procedures\nAcademic Conduct Policies and Procedures\nAdministrative Policies" - }, - { - "objectID": "assignments/data-certification-plan.html", - "href": "assignments/data-certification-plan.html", - "title": "Data Certification Plan", - "section": "", - "text": "This assignment requires you to identify at least five workshops or other modules on data techniques that support your development as a data user. It’s intended to align with the big data project proposal that you’ll develop during the semester.\nThis plan starts by asking you to identify a research question that you can answer using the data skills that you’ll learn in the certification. Since it’s early in the semester, what you propose to do in this plan may change as you learn more about big data and its applications to your interests. However, I’m requiring that you submit this plan now so that I can learn what topics you’re interested in and how the data certification will enhance your data skills.\nGSU provides multiple opportunities for learning data skills and tools. Some examples include the library’s Research Data Services workshops, LinkedIn Learning, and O’Reilly for Higher Education. You may also use open access courses via Coursera, EdEx, Kaggle, etc. should they meet your needs. Each of the five workshops/modules that you complete must be:\nIdeally, the series of workshops that you pursue offers a badge or certificate that you can link to your resume or other digital profile to showcase your professional qualifications (e.g., personal website, LinkedIn profile, Handshake profile, etc.) If you’re curating workshops/courses across multiple sources or platforms, you likely won’t be able to earn a badge or certificate. 
Be aware that some platforms offer free access to courses, but earning a badge/certificate comes with a fee.\nYou may not use trainings or modules that you complete for other courses that you’re taking this semester. You are earning unique credit for this course, so the work you conduct for it must be different than what you may be learning in other classes. This exercise is intended to give you time (and credit) to pursue data knowledge or skills gaps that that aren’t covered in other courses. It’s an opportunity to invest in data skills that you think will be important to your future work.\nDue to the variety of data workshops and free data courses, it’s impossible for me to know what will interest you or their formats. If you have a question about the suitability of a workshop or course for this assignment, please contact me well before the deadline of this plan.\nAfter I review and approve your data certification plan, you will be required to submit a short report after each workshop to identify what you learned, how you’ll use the new skills in your project proposal (or elsewhere), and whether you recommend the workshop to other students. You’ll also submit an example of work that you generated in the training." - }, - { - "objectID": "assignments/data-certification-plan.html#i.ask-an-interesting-question", - "href": "assignments/data-certification-plan.html#i.ask-an-interesting-question", - "title": "Data Certification Plan", - "section": "I.Ask an Interesting Question", - "text": "I.Ask an Interesting Question\nDescribe the question that you hope to answer with big data. Why does it interest you and/or why is it important?\nMost research starts with a large question and narrows to a very specific one that can be answered once you know more about what data is available to answer the question and how it can be analyzed. 
If you’re undecided at this point in the semester what question you’ll propose to answer, describe ones that cluster around the same topic for this assignment.\nFor example, I’m interested in understanding the relationship between needle aversion and COVID vaccination rates. I think that the language and imagery used to encourage vaccination is having the opposite effect. Nearly every advertisement or news segment about COVID vaccination shows a needle going in an arm (up close!) and uses language like “get the shot” or “jab,” which evokes negative feelings in most people (e.g., like going to the dentist or booster shots as a child). For those who also have fears that the vaccination is risky, these images and words reduce the likelihood that they’ll get a vaccine. I hypothesize that the ubiquity and volume of this imagery on TV and social media (big data) is reinforcing needle aversion to an extent that it’s driving down vaccination rates. So, my research question is “Has intense exposure to imagery and language of needles decreased COVID vaccination rates in the United States?” If a relationship exists, removing needle imagery and language from mass media could improve vaccination compliance and lower the threat of COVID to public health." - }, - { - "objectID": "assignments/data-certification-plan.html#ii.-collect-data", - "href": "assignments/data-certification-plan.html#ii.-collect-data", - "title": "Data Certification Plan", - "section": "II. Collect Data", - "text": "II. Collect Data\nDescribe the big data that’s available to answer your research question, its source, and its format. Describe how it meets the characteristics of big data, and how you plan to access it." - }, - { - "objectID": "assignments/data-certification-plan.html#iii.-data-workshops", - "href": "assignments/data-certification-plan.html#iii.-data-workshops", - "title": "Data Certification Plan", - "section": "III. Data Workshops", - "text": "III. 
Data Workshops\nList the workshops you plan to complete by the end of the semester to build your data knowledge and skills to develop a big data project. For each workshop, describe:\n\nTitle and purpose\nLength/duration\nOrganization providing the training\nLink to the training\nDate(s) you plan to attend/do the training\nWhere it fits in the data science cycle\nHow it will enhance your data knowledge and/or skills" - }, - { - "objectID": "assignments/data-certification-plan.html#iv.-certification", - "href": "assignments/data-certification-plan.html#iv.-certification", - "title": "Data Certification Plan", - "section": "IV. Certification", - "text": "IV. Certification\nDescribe how will you certify completion of each workshop. (Make sure that the badge(s) or certification(s) you may earn will be available to post to iCollege by December 14.)" - }, - { - "objectID": "resources/grade-policy.html#read-and-remember", - "href": "resources/grade-policy.html#read-and-remember", - "title": "Assignment Grade and Policies", - "section": "", - "text": "Discussion (30%)\n\nDiscussions will be in person or online, depending on the week.\nOnline posts are due by 11:59pm on Mondays.\nYour grade is based on completing 15 discussions." - }, - { - "objectID": "resources/grade-policy.html#section", - "href": "resources/grade-policy.html#section", - "title": "Assignment Grade and Policies", - "section": "", - "text": "Discussions (30%)\n\nDiscussions will be in person or online, depending on the week.\nOnline posts are due by 11:59pm on Mondays.\nYour grade is based on completing 15 discussions.\n\n\n\nLabs (20%)\n\nThe course includes four labs that ask you to think critically about big data and its characteristics. The labs are largely non-technical.\nYou are expected to complete all four labs by their due dates. Each lab is worth 5% of your final grade.\n\n\n\nData Certification (20%)\n\nAppreciation of big data includes knowing how to use it. 
Although this isn’t a technical course, you are required to build competency in data tools that fit your interests and need for future work.\nYou will curate a series of five data courses via GSU’s Research Data Services or an online learning platform to earn a data certification. Ideally these courses help you to develop the big data project proposal that is the final assessment for this course.\nYou must complete all five data courses to earn a certification and credit for this assessment.\n\n\n\nBig Data Project Proposal (30%)\n\nThroughout the course you will use the data science process to design a big data project of interest to you. It will start with asking an interesting question and culminate in developing a model using big data to answer the question.\nAs you’ll learn, big data projects are tedious and take time to conduct responsibly. This assessment asks you to design a project, not implement it.\nAs a proposal, you will design the big data project in stages throughout the semester as we cover them in class. Most of the stages will be shared for instructor feedback prior to the due date of the final proposal.\nYou will complete a proposal and present it to me as the final for the course. The paper and PPT will compose 30% of your final grade.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nDiscussions (30%)\n\nDiscussions will be in person or online, depending on the week.\nOnline posts are due by 11:59pm on Mondays.\nYour grade is based on completing 15 discussions.\n\n\n\nLabs (20%)\n\nThe course includes four labs that ask you to think critically about big data and its characteristics. The labs are largely non-technical.\nYou are expected to complete all four labs by their due dates. Each lab is worth 5% of your final grade.\n\n\n\nData Certification (20%)\n\nAppreciation of big data includes knowing how to use it. 
Although this isn’t a technical course, you are required to build competency in data tools that fit your interests and need for future work.\nYou will curate a series of five data courses via GSU’s Research Data Services or an online learning platform to earn a data certification. Ideally these courses help you to develop the big data project proposal that is the final assessment for this course.\nYou must complete all five data courses to earn a certification and credit for this assessment.\n\n\n\nBig Data Project Proposal (30%)\n\nThroughout the course you will use the data science process to design a big data project of interest to you. It will start with asking an interesting question and culminate in developing a model using big data to answer the question.\nAs you’ll learn, big data projects are tedious and take time to conduct responsibly. This assessment asks you to design a project, not implement it.\nAs a proposal, you will design the big data project in stages throughout the semester as we cover them in class. Most of the stages will be shared for instructor feedback prior to the due date of the final proposal.\nYou will complete a proposal and present it to me as the final for the course. The paper and PPT will compose 30% of your final grade." - }, - { - "objectID": "resources/grade-policy.html#based-on-a-spring-semester-calendar", - "href": "resources/grade-policy.html#based-on-a-spring-semester-calendar", - "title": "Assignment Grade and Policies", - "section": "", - "text": "Discussions (30%)\n\nDiscussions will be in person or online, depending on the week.\nOnline posts are due by 11:59pm on Mondays.\nYour grade is based on completing 15 discussions.\n\n\n\nLabs (20%)\n\nThe course includes four labs that ask you to think critically about big data and its characteristics. The labs are largely non-technical.\nYou are expected to complete all four labs by their due dates. 
Each lab is worth 5% of your final grade.\n\n\n\nData Certification (20%)\n\nAppreciation of big data includes knowing how to use it. Although this isn’t a technical course, you are required to build competency in data tools that fit your interests and need for future work.\nYou will curate a series of five data courses via GSU’s Research Data Services or an online learning platform to earn a data certification. Ideally these courses help you to develop the big data project proposal that is the final assessment for this course.\nYou must complete all five data courses to earn a certification and credit for this assessment.\n\n\n\nBig Data Project Proposal (30%)\n\nThroughout the course you will use the data science process to design a big data project of interest to you. It will start with asking an interesting question and culminate in developing a model using big data to answer the question.\nAs you’ll learn, big data projects are tedious and take time to conduct responsibly. This assessment asks you to design a project, not implement it.\nAs a proposal, you will design the big data project in stages throughout the semester as we cover them in class. Most of the stages will be shared for instructor feedback prior to the due date of the final proposal.\nYou will complete a proposal and present it to me as the final for the course. The paper and PPT will compose 30% of your final grade.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nDiscussions (30%)\n\nDiscussions will be in person or online, depending on the week.\nOnline posts are due by 11:59pm on Mondays.\nYour grade is based on completing 15 discussions.\n\n\n\nLabs (20%)\n\nThe course includes four labs that ask you to think critically about big data and its characteristics. The labs are largely non-technical.\nYou are expected to complete all four labs by their due dates. Each lab is worth 5% of your final grade.\n\n\n\nData Certification (20%)\n\nAppreciation of big data includes knowing how to use it. 
Although this isn’t a technical course, you are required to build competency in data tools that fit your interests and need for future work.\nYou will curate a series of five data courses via GSU’s Research Data Services or an online learning platform to earn a data certification. Ideally these courses help you to develop the big data project proposal that is the final assessment for this course.\nYou must complete all five data courses to earn a certification and credit for this assessment.\n\n\n\nBig Data Project Proposal (30%)\n\nThroughout the course you will use the data science process to design a big data project of interest to you. It will start with asking an interesting question and culminate in developing a model using big data to answer the question.\nAs you’ll learn, big data projects are tedious and take time to conduct responsibly. This assessment asks you to design a project, not implement it.\nAs a proposal, you will design the big data project in stages throughout the semester as we cover them in class. Most of the stages will be shared for instructor feedback prior to the due date of the final proposal.\nYou will complete a proposal and present it to me as the final for the course. The paper and PPT will compose 30% of your final grade." - }, - { - "objectID": "resources/rubrics-discussion.html", - "href": "resources/rubrics-discussion.html", - "title": "Discussion Rubrics", - "section": "", - "text": "Criteria\nAdvanced\nProficient\nDeveloping\nEmerging\n\n\nAssignment Specific Grading Items\n5 points\nAssignment questions and required items are addressed comprehensively, accurately and specifically. Offers strong support for arguments or points.\n4 points\nMost questions and requirements are addressed accurately and specifically. Support for arguments or points may be weak.\n3 points\nMissing answers to questions or doesn’t address many requirements. 
Support for arguments or points is weak.\n2 points\nDoes not address questions or requirements, or attempts are largely inaccurate.\n\n\nCritical Analysis (Understanding of Topic)\n10 points\nWriting displays an excellent understanding of the required readings and underlying concepts including correct use of terminology. Integrates lectures, readings, or relevant research, or specific real-life application (work experience, prior coursework, etc.) to support important points.\n8 points\nWriting repeats and summarizes basic, correct information, but does not apply enough lectures, readings, or relevant research or specific real-life applications\n6 points\nWriting shows little or no evidence that class lectures and readings were understood. Writings are largely personal opinions or feelings, or “I agree” or “Great idea,” without supporting statements with concepts from the readings, outside resources, relevant research, or specific real-life application.\n4 points\nWriting lacks evidence of critical analysis, and poor use of supportive evidence.\n\n\nWriting & Clarity\n5 points\nAnswers are clear, well-written and free of grammatical and spelling mistakes. No errors in APA style. Scholarly style. Cites all data obtained from other sources. APA citation style is used in both text and/or bibliography.\n4 points\nAnswers are clear, a few small grammatical and spelling mistakes. Rare errors in APA style that do not detract from the paper. Scholarly style. Cites most data obtained from other sources.\n3 points\nAnswers are clear but there are many spelling and grammatical mistakes. Errors in APA style are noticeable. Cites some data obtained from other sources. Citation style is either inconsistent or incorrect.\n2 points\nAnswers are unclear or difficult to understand, and/or there are many spelling and grammatical mistakes. Errors in APA style detracts substantially from the paper. Word choice is informal in tone. 
Does not cite sources.\n\n\n\nTotal Score of Grading Rubric_Discussion Posts, / 20", - "crumbs": [ - "Discussion Rubrics" - ] - }, - { - "objectID": "modules/module1.html#big-data-and-you", - "href": "modules/module1.html#big-data-and-you", - "title": "Module 1 - Introduction to Big Data", - "section": "1. 3 Big Data and You", - "text": "1. 3 Big Data and You", - "crumbs": [ - "Module 1 - Introduction to Big Data" - ] - }, - { - "objectID": "modules/module1.html#the-challenges-of-big-data", - "href": "modules/module1.html#the-challenges-of-big-data", - "title": "Module 1 - Introduction to Big Data", - "section": "1. 4 The Challenges of Big Data", - "text": "1. 4 The Challenges of Big Data", - "crumbs": [ - "Module 1 - Introduction to Big Data" - ] - }, - { - "objectID": "modules/module1-3.html", - "href": "modules/module1-3.html", - "title": "1.3 The Challenges of Big Data", - "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." - }, - { - "objectID": "modules/module1-3.html#module-overview", - "href": "modules/module1-3.html#module-overview", - "title": "1.3 The Challenges of Big Data", - "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." 
- }, - { - "objectID": "modules/module1-3.html#assignments", - "href": "modules/module1-3.html#assignments", - "title": "1.3 The Challenges of Big Data", - "section": "Assignments", - "text": "Assignments\n\n\n\nSection & Assignment\nDue Date\n\n\n\n\n1.2 Discussion\n1/17\n\n\n1.3 Data Certification Plan\n1/24\n\n\n1.4 Discussion\n1/31" - }, - { - "objectID": "modules/module1-3.html#the-value-of-big-data", - "href": "modules/module1-3.html#the-value-of-big-data", - "title": "1.3 The Challenges of Big Data", - "section": "1. 2 The Value of Big Data", - "text": "1. 2 The Value of Big Data\nThis module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." - }, - { - "objectID": "modules/module1-3.html#transforming-information-to-ideas", - "href": "modules/module1-3.html#transforming-information-to-ideas", - "title": "1.3 The Challenges of Big Data", - "section": "Transforming Information to Ideas", - "text": "Transforming Information to Ideas\nThis week we’ll explore the characteristics of big data and its value to the public sector." - }, - { - "objectID": "modules/module1-3.html#read", - "href": "modules/module1-3.html#read", - "title": "1.3 The Challenges of Big Data", - "section": "Read", - "text": "Read\n\nO’Neil, C. On Being a Data Skeptic. O’Reilly Press. 2014\nNoble, S. Algorithms of Oppression, Introduction. NYU Press. 2018.\nPayton, T. and Claypoole T. Privacy in the Age of Big Data. 2023.**Note: Follow hyperlink, then select “Click to preview” above the image of the book. Read the Introduction and Chapter 1, p. 
1-26; Note that the last page is missing from this free source.)", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.3 The Challenges of Big Data" - ] - }, - { - "objectID": "modules/module1-3.html#prepare", - "href": "modules/module1-3.html#prepare", - "title": "1.3 The Challenges of Big Data", - "section": "Prepare", - "text": "Prepare\nPlease prepare to discuss your thoughts on the following prompts:\n\nDescribe two characteristics of big data and how they apply to data produced or are used by your field of interest.\nDescribe ideas or practices in this week’s readings that you had not considered before.\nWhat claims about social physics do you find valid? In what ways would you challenge Pentland’s “promise” of social physics?" - }, - { - "objectID": "modules/module1-3.html#attend", - "href": "modules/module1-3.html#attend", - "title": "1.3 The Challenges of Big Data", - "section": "Attend", - "text": "Attend\nIn person discussion, 1/17 at 4:30-5:30pm" - }, - { - "objectID": "modules/module1-3.html#big-data-and-you", - "href": "modules/module1-3.html#big-data-and-you", - "title": "1.3 The Challenges of Big Data", - "section": "1. 3 Big Data and You", - "text": "1. 3 Big Data and You" - }, - { - "objectID": "modules/module1-3.html#section", - "href": "modules/module1-3.html#section", - "title": "1.3 The Challenges of Big Data", - "section": "", - "text": "Find Your Inner Data Geek\n\n\n\n\n\n\n\n\n\n\nNow that you know the characteristics of big data and some of their applications to the public sector, it’s time to consider how they serve your field of interest. Whether you plan to be a data scientist or work with a team of them, you need to understand how big data are collected, analyzed, and communicated. This module starts with an overview of data science to learn the steps in the life cycle of a big data project. 
It ends with you developing a plan to earn a data certification that exposes you to the tools and techniques of stages in the data science life cycle that fit your skill needs." - }, - { - "objectID": "modules/module1-3.html#read-1", - "href": "modules/module1-3.html#read-1", - "title": "1.3 The Challenges of Big Data", - "section": "Read", - "text": "Read\n\nThe Data Science Process: A Visual Guide to Standard Procedures in Data Science by Chanin Nantasenamat\nHilary: the most poisoned baby name in US history by Hilary Parker" - }, - { - "objectID": "modules/module1-3.html#watch", - "href": "modules/module1-3.html#watch", - "title": "1.3 The Challenges of Big Data", - "section": "Watch", - "text": "Watch\n\nHow I Would Learn Data Science (If I Had to Start Over) by Ken Jee" - }, - { - "objectID": "modules/module1-3.html#complete", - "href": "modules/module1-3.html#complete", - "title": "1.3 The Challenges of Big Data", - "section": "Complete", - "text": "Complete\nThis week you’ll spend most of your time creating a data certification plan. This assignment requires that you identify a research question and develop a plan to complete workshops over the semester that will expose you to the data science knowledge and skills needed to develop a big data project. By the end of the semester, you’ll submit a proposal that describes how you can answer the research question using big data and data science techniques.\nYou will turn in this plan as your “discussion” this week. But don’t wait until class day to start this assignment. Formulating a good research question takes time, and if you’re struggling with how to use big data in a project that interests you then you need to give me time to help you. Reach out by email to set up a time to talk or post questions to the Class Q & A discussion board. \n\nDownload the instructions for the data certification plan.\nIf you need some inspiration, check out this web blog. 
Or Google “big data and X (your field).” It’s also a good idea to look at some academic journals in your field to see how the techniques of data science are being applied. \nWrite me an email or set up a time to talk if you need help.\nSubmit your data certification plan to the “1.3 Data Certification” assignment folder.\n\nDue by: 11/24 at 11:59pm" - }, - { - "objectID": "modules/module1-3.html#the-challenges-of-big-data", - "href": "modules/module1-3.html#the-challenges-of-big-data", - "title": "1.3 The Challenges of Big Data", - "section": "1. 4 The Challenges of Big Data", - "text": "1. 4 The Challenges of Big Data" - }, - { - "objectID": "modules/module1-1.html", - "href": "modules/module1-1.html", - "title": "1.1 The Value of Big Data", - "section": "", - "text": "|:————————————:| | ## Transforming Information to Ideas | |————————————–|", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" - ] - }, - { - "objectID": "modules/module1-1.html#module-overview", - "href": "modules/module1-1.html#module-overview", - "title": "1.1 The Value of Big Data", - "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. 
You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" - ] - }, - { - "objectID": "modules/module1-1.html#content", - "href": "modules/module1-1.html#content", - "title": "1.1 The Value of Big Data", - "section": "Content", - "text": "Content\n\n\n\nSections\nDue Date\n\n\n\n\n1.1 Discussion\n1/17\n\n\n1.2 Data Certification Plan\n1/24\n\n\n1.3 Discussion\n1/31", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" - ] - }, - { - "objectID": "modules/module1-1.html#assignments", - "href": "modules/module1-1.html#assignments", - "title": "1.1 The Value of Big Data", - "section": "Assignments", - "text": "Assignments\n\n\n\nAssignments\nDue Date\n\n\n\n\n1.1 Discussion\n1/17\n\n\n1.2 Data Certification Plan\n1/24\n\n\n1.3 Discussion\n1/31", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" - ] - }, - { - "objectID": "modules/module1-1.html#the-value-of-big-data", - "href": "modules/module1-1.html#the-value-of-big-data", - "title": "1.1 The Value of Big Data", - "section": "1. 2 The Value of Big Data", - "text": "1. 2 The Value of Big Data\nThis module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" - ] - }, - { - "objectID": "modules/module1-1.html#transforming-information-to-ideas", - "href": "modules/module1-1.html#transforming-information-to-ideas", - "title": "1.1 The Value of Big Data", - "section": "", - "text": "The course starts this week with readings about the characteristics of big data and their value to the public sector. 
Consider how your field is mining big data to shape public policy and improve service delivery.", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" - ] - }, - { - "objectID": "modules/module1-1.html#read", - "href": "modules/module1-1.html#read", - "title": "1.1 The Value of Big Data", - "section": "Read", - "text": "Read\n\nDesouza & Smith, Big Data for Social Innovation, Stanford Social Innovation Review, Summer 2014.\nU.S. Census Bureau, Big Data, 2022.\nMeier, P., Digital Humanitarians, Chapter 1, CRC Press, 2015.\nPentland, A., Social Physics, Chapter 1, Penguin Group, 2015.", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" - ] - }, - { - "objectID": "modules/module1-1.html#prepare", - "href": "modules/module1-1.html#prepare", - "title": "1.1 The Value of Big Data", - "section": "Prepare", - "text": "Prepare\nPlease prepare to discuss your thoughts on the following prompts:\n\nDescribe two characteristics of big data and how they apply to data produced or are used by your field of interest.\nDescribe ideas or practices in this week’s readings that you had not considered before.\nWhat claims about social physics do you find valid? In what ways would you challenge Pentland’s “promise” of social physics?", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" - ] - }, - { - "objectID": "modules/module1-1.html#attend", - "href": "modules/module1-1.html#attend", - "title": "1.1 The Value of Big Data", - "section": "Attend", - "text": "Attend\nIn person discussion, 1/17 at 4:30-5:30pm", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" - ] - }, - { - "objectID": "modules/module1-1.html#big-data-and-you", - "href": "modules/module1-1.html#big-data-and-you", - "title": "1.1 The Value of Big Data", - "section": "1. 3 Big Data and You", - "text": "1. 
3 Big Data and You", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" - ] - }, - { - "objectID": "modules/module1-1.html#section", - "href": "modules/module1-1.html#section", - "title": "1.1 The Value of Big Data", - "section": "", - "text": "Find Your Inner Data Geek\n\n\n\n\n\n\n\n\n\n\nNow that you know the characteristics of big data and some of their applications to the public sector, it’s time to consider how they serve your field of interest. Whether you plan to be a data scientist or work with a team of them, you need to understand how big data are collected, analyzed, and communicated. This module starts with an overview of data science to learn the steps in the life cycle of a big data project. It ends with you developing a plan to earn a data certification that exposes you to the tools and techniques of stages in the data science life cycle that fit your skill needs.", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" - ] - }, - { - "objectID": "modules/module1-1.html#read-1", - "href": "modules/module1-1.html#read-1", - "title": "1.1 The Value of Big Data", - "section": "Read", - "text": "Read\n\nThe Data Science Process: A Visual Guide to Standard Procedures in Data Science by Chanin Nantasenamat\nHilary: the most poisoned baby name in US history by Hilary Parker", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" - ] - }, - { - "objectID": "modules/module1-1.html#watch", - "href": "modules/module1-1.html#watch", - "title": "1.1 The Value of Big Data", - "section": "Watch", - "text": "Watch\n\nHow I Would Learn Data Science (If I Had to Start Over) by Ken Jee", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" - ] - }, - { - "objectID": "modules/module1-1.html#complete", - "href": "modules/module1-1.html#complete", - "title": "1.1 The Value of Big Data", - "section": "Complete", - "text": "Complete\nThis week 
you’ll spend most of your time creating a data certification plan. This assignment requires that you identify a research question and develop a plan to complete workshops over the semester that will expose you to the data science knowledge and skills needed to develop a big data project. By the end of the semester, you’ll submit a proposal that describes how you can answer the research question using big data and data science techniques.\nYou will turn in this plan as your “discussion” this week. But don’t wait until class day to start this assignment. Formulating a good research question takes time, and if you’re struggling with how to use big data in a project that interests you then you need to give me time to help you. Reach out by email to set up a time to talk or post questions to the Class Q & A discussion board. \n\nDownload the instructions for the data certification plan.\nIf you need some inspiration, check out this web blog. Or Google “big data and X (your field).” It’s also a good idea to look at some academic journals in your field to see how the techniques of data science are being applied. \nWrite me an email or set up a time to talk if you need help.\nSubmit your data certification plan to the “1.3 Data Certification” assignment folder.\n\nDue by: 11/24 at 11:59pm", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" - ] - }, - { - "objectID": "modules/module1-1.html#the-challenges-of-big-data", - "href": "modules/module1-1.html#the-challenges-of-big-data", - "title": "1.1 The Value of Big Data", - "section": "1. 4 The Challenges of Big Data", - "text": "1. 4 The Challenges of Big Data", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" - ] - }, - { - "objectID": "modules/module1-2.html", - "href": "modules/module1-2.html", - "title": "1.2 Big Data and You", - "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. 
You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.2 Big Data and You" - ] - }, - { - "objectID": "modules/module1-2.html#module-overview", - "href": "modules/module1-2.html#module-overview", - "title": "1.2 Big Data and You", - "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.2 Big Data and You" - ] - }, - { - "objectID": "modules/module1-2.html#assignments", - "href": "modules/module1-2.html#assignments", - "title": "1.2 Big Data and You", - "section": "Assignments", - "text": "Assignments\n\n\n\nSection & Assignment\nDue Date\n\n\n\n\n1.2 Discussion\n1/17\n\n\n1.3 Data Certification Plan\n1/24\n\n\n1.4 Discussion\n1/31", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.2 Big Data and You" - ] - }, - { - "objectID": "modules/module1-2.html#the-value-of-big-data", - "href": "modules/module1-2.html#the-value-of-big-data", - "title": "1.2 Big Data and You", - "section": "1. 2 The Value of Big Data", - "text": "1. 2 The Value of Big Data\nThis module introduces you to the opportunities and challenges of big data for the public sector. 
You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.2 Big Data and You" - ] - }, - { - "objectID": "modules/module1-2.html#transforming-information-to-ideas", - "href": "modules/module1-2.html#transforming-information-to-ideas", - "title": "1.2 Big Data and You", - "section": "Transforming Information to Ideas", - "text": "Transforming Information to Ideas\nThis week we’ll explore the characteristics of big data and its value to the public sector.", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.2 Big Data and You" - ] - }, - { - "objectID": "modules/module1-2.html#read", - "href": "modules/module1-2.html#read", - "title": "1.2 Big Data and You", - "section": "Read", - "text": "Read\n\nThe Data Science Process: A Visual Guide to Standard Procedures in Data Science by Chanin Nantasenamat\nHilary: the most poisoned baby name in US history by Hilary Parker", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.2 Big Data and You" - ] - }, - { - "objectID": "modules/module1-2.html#prepare", - "href": "modules/module1-2.html#prepare", - "title": "1.2 Big Data and You", - "section": "Prepare", - "text": "Prepare\nPlease prepare to discuss your thoughts on the following prompts:\n\nDescribe two characteristics of big data and how they apply to data produced or are used by your field of interest.\nDescribe ideas or practices in this week’s readings that you had not considered before.\nWhat claims about social physics do you find valid? 
In what ways would you challenge Pentland’s “promise” of social physics?", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.2 Big Data and You" - ] - }, - { - "objectID": "modules/module1-2.html#attend", - "href": "modules/module1-2.html#attend", - "title": "1.2 Big Data and You", - "section": "Attend", - "text": "Attend\nIn person discussion, 1/17 at 4:30-5:30pm", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.2 Big Data and You" - ] - }, - { - "objectID": "modules/module1-2.html#big-data-and-you", - "href": "modules/module1-2.html#big-data-and-you", - "title": "1.2 Big Data and You", - "section": "1. 3 Big Data and You", - "text": "1. 3 Big Data and You", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.2 Big Data and You" - ] - }, - { - "objectID": "modules/module1-2.html#section", - "href": "modules/module1-2.html#section", - "title": "1.2 Big Data and You", - "section": "", - "text": "Find Your Inner Data Geek\n\n\n\n\n\n\n\n\n\n\nNow that you know the characteristics of big data and some of their applications to the public sector, it’s time to consider how they serve your field of interest. Whether you plan to be a data scientist or work with a team of them, you need to understand how big data are collected, analyzed, and communicated. This module starts with an overview of data science to learn the steps in the life cycle of a big data project. 
It ends with you developing a plan to earn a data certification that exposes you to the tools and techniques of stages in the data science life cycle that fit your skill needs.", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.2 Big Data and You" - ] - }, - { - "objectID": "modules/module1-2.html#read-1", - "href": "modules/module1-2.html#read-1", - "title": "1.2 Big Data and You", - "section": "Read", - "text": "Read\n\nThe Data Science Process: A Visual Guide to Standard Procedures in Data Science by Chanin Nantasenamat\nHilary: the most poisoned baby name in US history by Hilary Parker", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.2 Big Data and You" - ] - }, - { - "objectID": "modules/module1-2.html#watch", - "href": "modules/module1-2.html#watch", - "title": "1.2 Big Data and You", - "section": "Watch", - "text": "Watch\n\nHow I Would Learn Data Science (If I Had to Start Over) by Ken Jee", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.2 Big Data and You" - ] - }, - { - "objectID": "modules/module1-2.html#complete", - "href": "modules/module1-2.html#complete", - "title": "1.2 Big Data and You", - "section": "Complete", - "text": "Complete\nThis week you’ll spend most of your time creating a plan to earn a data certification by the end of the semester. This assignment requires that you identify a research question and develop a plan to complete workshops over the semester that will expose you to the data science knowledge and skills needed to develop a big data project. By the end of the semester, you’ll submit a proposal that describes how you can answer the research question using big data and data science techniques.\nYou may be a novice to big data terms and sources at this point in the semester, but you probably have questions in your field that can be explored with data. For example, you may be interested in pollution levels of major cities and what conditions lead to poor air quality days. 
While structured data like measures of ozone levels, particulate matter, and nitrogen oxides can be used to identify poor air quality, big data generated from the sources of pollutants can be used to predict poor air quality days.  Vehicles, traffic lights, navigation apps, satellites, etc. generate high velocity, high volume, and unstructured data that can be harnessed to predict poor air quality days. With these data, you can explore questions like, “What impact could limiting semi-trucks driving on I-75 between 6am and 6pm have on air quality in Atlanta?” The purpose of the data certification is to learn about the types of big data and analysis techniques needed to answer the question that you formulate from your field of interest.\nFormulating a good research question takes time and is iterative as you learn about types of data and techniques. This exercise gets you started on crafting a question and finding tutorials that can help you develop a project plan to answer it. It requires that you identify five, 90 minute tutorials that can tailor your learning about big data to answer the research question that you develop. To complete the assignment for this week:\n\nRead the instructions for the data certification plan.\nIf you need some inspiration, Google “big data and X (your field).” It’s also a good idea to look at some academic journals in your field to see how the techniques of data science are being applied.\nWrite me an email or set up a time to talk if you need help.\nSubmit your data certification plan to corresponding assignment folder.\n\nDue by: 9/3 at 11:59pm", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.2 Big Data and You" - ] - }, - { - "objectID": "modules/module1-2.html#the-challenges-of-big-data", - "href": "modules/module1-2.html#the-challenges-of-big-data", - "title": "1.2 Big Data and You", - "section": "1. 4 The Challenges of Big Data", - "text": "1. 
4 The Challenges of Big Data", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.2 Big Data and You" - ] - }, - { - "objectID": "modules/module1.html#content", - "href": "modules/module1.html#content", - "title": "Module 1 - Introduction to Big Data", - "section": "Content", - "text": "Content\n\n\n\nSection\nAssignment\nDue Date\n\n\n\n\n1.1 The Value of Big Data\nDiscussion Post\n8/27\n\n\n1.2 Big Data and You\nData Certification Plan\n9/3\n\n\n1.3 The Challenges of Big Data\nDiscussion Post & Peer Response\n9/10 & 9/14" - }, - { - "objectID": "modules/module1/module1.html#module-overview", - "href": "modules/module1/module1.html#module-overview", - "title": "Module 1 - Introduction to Big Data", - "section": "Module Overview", - "text": "Module Overview\nThis module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." - }, - { - "objectID": "modules/module1/module1.html#content", - "href": "modules/module1/module1.html#content", - "title": "Module 1 - Introduction to Big Data", - "section": "Content", - "text": "Content\n\n\n\nSection\nAssignment\nDue Date\n\n\n\n\n1.1 The Value of Big Data\nDiscussion Post\n8/27\n\n\n1.2 Big Data and You\nData Certification Plan\n9/3\n\n\n1.3 The Challenges of Big Data\nDiscussion Post & Peer Response\n9/10 & 9/14" - }, - { - "objectID": "modules/module1/module1-2.html", - "href": "modules/module1/module1-2.html", - "title": "Module 1 - Introduction to Big Data", - "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." 
- }, - { - "objectID": "modules/module1/module1-2.html#module-overview", - "href": "modules/module1/module1-2.html#module-overview", - "title": "Module 1 - Introduction to Big Data", - "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." - }, - { - "objectID": "modules/module1/module1-2.html#assignments", - "href": "modules/module1/module1-2.html#assignments", - "title": "Module 1 - Introduction to Big Data", - "section": "Assignments", - "text": "Assignments\n\n\n\nSection & Assignment\nDue Date\n\n\n\n\n1.2 Discussion\n1/17\n\n\n1.3 Data Certification Plan\n1/24\n\n\n1.4 Discussion\n1/31" - }, - { - "objectID": "modules/module1/module1-2.html#the-value-of-big-data", - "href": "modules/module1/module1-2.html#the-value-of-big-data", - "title": "Module 1 - Introduction to Big Data", - "section": "1. 2 The Value of Big Data", - "text": "1. 2 The Value of Big Data\nThis module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." - }, - { - "objectID": "modules/module1/module1-2.html#transforming-information-to-ideas", - "href": "modules/module1/module1-2.html#transforming-information-to-ideas", - "title": "Module 1 - Introduction to Big Data", - "section": "Transforming Information to Ideas", - "text": "Transforming Information to Ideas\nThis week we’ll explore the characteristics of big data and its value to the public sector." 
- }, - { - "objectID": "modules/module1/module1-2.html#read", - "href": "modules/module1/module1-2.html#read", - "title": "Module 1 - Introduction to Big Data", - "section": "Read", - "text": "Read\n\nDesouza & Smith, Big Data for Social Innovation, Stanford Social Innovation Review, Summer 2014.\nWorld Bank Group, Big Data in Action for Government, World Bank Governance Practice, 2017.\nMeier, P., Digital Humanitarians, Chapter 1, CRC Press, 2015.\nPentland, A. Social Physics, Chapter 1, Penguin Group, 2015." - }, - { - "objectID": "modules/module1/module1-2.html#prepare", - "href": "modules/module1/module1-2.html#prepare", - "title": "Module 1 - Introduction to Big Data", - "section": "Prepare", - "text": "Prepare\nPlease prepare to discuss your thoughts on the following prompts:\n\nDescribe two characteristics of big data and how they apply to data produced or are used by your field of interest.\nDescribe ideas or practices in this week’s readings that you had not considered before.\nWhat claims about social physics do you find valid? In what ways would you challenge Pentland’s “promise” of social physics?" - }, - { - "objectID": "modules/module1/module1-2.html#attend", - "href": "modules/module1/module1-2.html#attend", - "title": "Module 1 - Introduction to Big Data", - "section": "Attend", - "text": "Attend\nIn person discussion, 1/17 at 4:30-5:30pm" - }, - { - "objectID": "modules/module1/module1-2.html#big-data-and-you", - "href": "modules/module1/module1-2.html#big-data-and-you", - "title": "Module 1 - Introduction to Big Data", - "section": "1. 3 Big Data and You", - "text": "1. 
3 Big Data and You" - }, - { - "objectID": "modules/module1/module1-2.html#section", - "href": "modules/module1/module1-2.html#section", - "title": "Module 1 - Introduction to Big Data", - "section": "", - "text": "Find Your Inner Data Geek\n\n\n\n\n\n\n\n\n\n\nNow that you know the characteristics of big data and some of their applications to the public sector, it’s time to consider how they serve your field of interest. Whether you plan to be a data scientist or work with a team of them, you need to understand how big data are collected, analyzed, and communicated. This module starts with an overview of data science to learn the steps in the life cycle of a big data project. It ends with you developing a plan to earn a data certification that exposes you to the tools and techniques of stages in the data science life cycle that fit your skill needs." - }, - { - "objectID": "modules/module1/module1-2.html#read-1", - "href": "modules/module1/module1-2.html#read-1", - "title": "Module 1 - Introduction to Big Data", - "section": "Read", - "text": "Read\n\nThe Data Science Process: A Visual Guide to Standard Procedures in Data Science by Chanin Nantasenamat\nHilary: the most poisoned baby name in US history by Hilary Parker" - }, - { - "objectID": "modules/module1/module1-2.html#watch", - "href": "modules/module1/module1-2.html#watch", - "title": "Module 1 - Introduction to Big Data", - "section": "Watch", - "text": "Watch\n\nHow I Would Learn Data Science (If I Had to Start Over) by Ken Jee" - }, - { - "objectID": "modules/module1/module1-2.html#complete", - "href": "modules/module1/module1-2.html#complete", - "title": "Module 1 - Introduction to Big Data", - "section": "Complete", - "text": "Complete\nThis week you’ll spend most of your time creating a data certification plan. 
This assignment requires that you identify a research question and develop a plan to complete workshops over the semester that will expose you to the data science knowledge and skills needed to develop a big data project. By the end of the semester, you’ll submit a proposal that describes how you can answer the research question using big data and data science techniques.\nYou will turn in this plan as your “discussion” this week. But don’t wait until class day to start this assignment. Formulating a good research question takes time, and if you’re struggling with how to use big data in a project that interests you then you need to give me time to help you. Reach out by email to set up a time to talk or post questions to the Class Q & A discussion board. \n\nDownload the instructions for the data certification plan.\nIf you need some inspiration, check out this web blog. Or Google “big data and X (your field).” It’s also a good idea to look at some academic journals in your field to see how the techniques of data science are being applied. \nWrite me an email or set up a time to talk if you need help.\nSubmit your data certification plan to the “1.3 Data Certification” assignment folder.\n\nDue by: 11/24 at 11:59pm" - }, - { - "objectID": "modules/module1/module1-2.html#the-challenges-of-big-data", - "href": "modules/module1/module1-2.html#the-challenges-of-big-data", - "title": "Module 1 - Introduction to Big Data", - "section": "1. 4 The Challenges of Big Data", - "text": "1. 4 The Challenges of Big Data" - }, - { - "objectID": "modules/module1/module1-1.html", - "href": "modules/module1/module1-1.html", - "title": "Module 1 - Introduction to Big Data", - "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." 
- }, - { - "objectID": "modules/module1/module1-1.html#module-overview", - "href": "modules/module1/module1-1.html#module-overview", - "title": "Module 1 - Introduction to Big Data", - "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." - }, - { - "objectID": "modules/module1/module1-1.html#content", - "href": "modules/module1/module1-1.html#content", - "title": "Module 1 - Introduction to Big Data", - "section": "Content", - "text": "Content\n\n\n\nSections\nDue Date\n\n\n\n\n1.1 Discussion\n1/17\n\n\n1.2 Data Certification Plan\n1/24\n\n\n1.3 Discussion\n1/31" - }, - { - "objectID": "modules/module1/module1-1.html#assignments", - "href": "modules/module1/module1-1.html#assignments", - "title": "Module 1 - Introduction to Big Data", - "section": "Assignments", - "text": "Assignments\n\n\n\nAssignments\nDue Date\n\n\n\n\n1.1 Discussion\n1/17\n\n\n1.2 Data Certification Plan\n1/24\n\n\n1.3 Discussion\n1/31" - }, - { - "objectID": "modules/module1/module1-1.html#the-value-of-big-data", - "href": "modules/module1/module1-1.html#the-value-of-big-data", - "title": "Module 1 - Introduction to Big Data", - "section": "1. 2 The Value of Big Data", - "text": "1. 2 The Value of Big Data\nThis module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." 
- }, - { - "objectID": "modules/module1/module1-1.html#transforming-information-to-ideas", - "href": "modules/module1/module1-1.html#transforming-information-to-ideas", - "title": "Module 1 - Introduction to Big Data", - "section": "Transforming Information to Ideas", - "text": "Transforming Information to Ideas\nThis week we’ll explore the characteristics of big data and its value to the public sector." - }, - { - "objectID": "modules/module1/module1-1.html#read", - "href": "modules/module1/module1-1.html#read", - "title": "Module 1 - Introduction to Big Data", - "section": "Read", - "text": "Read\n\nDesouza & Smith, Big Data for Social Innovation, Stanford Social Innovation Review, Summer 2014.\nWorld Bank Group, Big Data in Action for Government, World Bank Governance Practice, 2017.\nMeier, P., Digital Humanitarians, Chapter 1, CRC Press, 2015.\nPentland, A. Social Physics, Chapter 1, Penguin Group, 2015." - }, - { - "objectID": "modules/module1/module1-1.html#prepare", - "href": "modules/module1/module1-1.html#prepare", - "title": "Module 1 - Introduction to Big Data", - "section": "Prepare", - "text": "Prepare\nPlease prepare to discuss your thoughts on the following prompts:\n\nDescribe two characteristics of big data and how they apply to data produced or are used by your field of interest.\nDescribe ideas or practices in this week’s readings that you had not considered before.\nWhat claims about social physics do you find valid? In what ways would you challenge Pentland’s “promise” of social physics?" - }, - { - "objectID": "modules/module1/module1-1.html#attend", - "href": "modules/module1/module1-1.html#attend", - "title": "Module 1 - Introduction to Big Data", - "section": "Attend", - "text": "Attend\nIn person discussion, 1/17 at 4:30-5:30pm" - }, - { - "objectID": "modules/module1/module1-1.html#big-data-and-you", - "href": "modules/module1/module1-1.html#big-data-and-you", - "title": "Module 1 - Introduction to Big Data", - "section": "1. 
3 Big Data and You", - "text": "1. 3 Big Data and You" - }, - { - "objectID": "modules/module1/module1-1.html#section", - "href": "modules/module1/module1-1.html#section", - "title": "Module 1 - Introduction to Big Data", - "section": "", - "text": "Find Your Inner Data Geek\n\n\n\n\n\n\n\n\n\n\nNow that you know the characteristics of big data and some of their applications to the public sector, it’s time to consider how they serve your field of interest. Whether you plan to be a data scientist or work with a team of them, you need to understand how big data are collected, analyzed, and communicated. This module starts with an overview of data science to learn the steps in the life cycle of a big data project. It ends with you developing a plan to earn a data certification that exposes you to the tools and techniques of stages in the data science life cycle that fit your skill needs." - }, - { - "objectID": "modules/module1/module1-1.html#read-1", - "href": "modules/module1/module1-1.html#read-1", - "title": "Module 1 - Introduction to Big Data", - "section": "Read", - "text": "Read\n\nThe Data Science Process: A Visual Guide to Standard Procedures in Data Science by Chanin Nantasenamat\nHilary: the most poisoned baby name in US history by Hilary Parker" - }, - { - "objectID": "modules/module1/module1-1.html#watch", - "href": "modules/module1/module1-1.html#watch", - "title": "Module 1 - Introduction to Big Data", - "section": "Watch", - "text": "Watch\n\nHow I Would Learn Data Science (If I Had to Start Over) by Ken Jee" - }, - { - "objectID": "modules/module1/module1-1.html#complete", - "href": "modules/module1/module1-1.html#complete", - "title": "Module 1 - Introduction to Big Data", - "section": "Complete", - "text": "Complete\nThis week you’ll spend most of your time creating a data certification plan. 
This assignment requires that you identify a research question and develop a plan to complete workshops over the semester that will expose you to the data science knowledge and skills needed to develop a big data project. By the end of the semester, you’ll submit a proposal that describes how you can answer the research question using big data and data science techniques.\nYou will turn in this plan as your “discussion” this week. But don’t wait until class day to start this assignment. Formulating a good research question takes time, and if you’re struggling with how to use big data in a project that interests you then you need to give me time to help you. Reach out by email to set up a time to talk or post questions to the Class Q & A discussion board. \n\nDownload the instructions for the data certification plan.\nIf you need some inspiration, check out this web blog. Or Google “big data and X (your field).” It’s also a good idea to look at some academic journals in your field to see how the techniques of data science are being applied. \nWrite me an email or set up a time to talk if you need help.\nSubmit your data certification plan to the “1.3 Data Certification” assignment folder.\n\nDue by: 11/24 at 11:59pm" - }, - { - "objectID": "modules/module1/module1-1.html#the-challenges-of-big-data", - "href": "modules/module1/module1-1.html#the-challenges-of-big-data", - "title": "Module 1 - Introduction to Big Data", - "section": "1. 4 The Challenges of Big Data", - "text": "1. 4 The Challenges of Big Data" - }, - { - "objectID": "modules/module1/module1-3.html", - "href": "modules/module1/module1-3.html", - "title": "Module 1 - Introduction to Big Data", - "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." 
- }, - { - "objectID": "modules/module1/module1-3.html#module-overview", - "href": "modules/module1/module1-3.html#module-overview", - "title": "Module 1 - Introduction to Big Data", - "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." - }, - { - "objectID": "modules/module1/module1-3.html#assignments", - "href": "modules/module1/module1-3.html#assignments", - "title": "Module 1 - Introduction to Big Data", - "section": "Assignments", - "text": "Assignments\n\n\n\nSection & Assignment\nDue Date\n\n\n\n\n1.2 Discussion\n1/17\n\n\n1.3 Data Certification Plan\n1/24\n\n\n1.4 Discussion\n1/31" - }, - { - "objectID": "modules/module1/module1-3.html#the-value-of-big-data", - "href": "modules/module1/module1-3.html#the-value-of-big-data", - "title": "Module 1 - Introduction to Big Data", - "section": "1. 2 The Value of Big Data", - "text": "1. 2 The Value of Big Data\nThis module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study." - }, - { - "objectID": "modules/module1/module1-3.html#transforming-information-to-ideas", - "href": "modules/module1/module1-3.html#transforming-information-to-ideas", - "title": "Module 1 - Introduction to Big Data", - "section": "Transforming Information to Ideas", - "text": "Transforming Information to Ideas\nThis week we’ll explore the characteristics of big data and its value to the public sector." 
- }, - { - "objectID": "modules/module1/module1-3.html#read", - "href": "modules/module1/module1-3.html#read", - "title": "Module 1 - Introduction to Big Data", - "section": "Read", - "text": "Read\n\nDesouza & Smith, Big Data for Social Innovation, Stanford Social Innovation Review, Summer 2014.\nWorld Bank Group, Big Data in Action for Government, World Bank Governance Practice, 2017.\nMeier, P., Digital Humanitarians, Chapter 1, CRC Press, 2015.\nPentland, A. Social Physics, Chapter 1, Penguin Group, 2015." - }, - { - "objectID": "modules/module1/module1-3.html#prepare", - "href": "modules/module1/module1-3.html#prepare", - "title": "Module 1 - Introduction to Big Data", - "section": "Prepare", - "text": "Prepare\nPlease prepare to discuss your thoughts on the following prompts:\n\nDescribe two characteristics of big data and how they apply to data produced or are used by your field of interest.\nDescribe ideas or practices in this week’s readings that you had not considered before.\nWhat claims about social physics do you find valid? In what ways would you challenge Pentland’s “promise” of social physics?" - }, - { - "objectID": "modules/module1/module1-3.html#attend", - "href": "modules/module1/module1-3.html#attend", - "title": "Module 1 - Introduction to Big Data", - "section": "Attend", - "text": "Attend\nIn person discussion, 1/17 at 4:30-5:30pm" - }, - { - "objectID": "modules/module1/module1-3.html#big-data-and-you", - "href": "modules/module1/module1-3.html#big-data-and-you", - "title": "Module 1 - Introduction to Big Data", - "section": "1. 3 Big Data and You", - "text": "1. 
3 Big Data and You" - }, - { - "objectID": "modules/module1/module1-3.html#section", - "href": "modules/module1/module1-3.html#section", - "title": "Module 1 - Introduction to Big Data", - "section": "", - "text": "Find Your Inner Data Geek\n\n\n\n\n\n\n\n\n\n\nNow that you know the characteristics of big data and some of their applications to the public sector, it’s time to consider how they serve your field of interest. Whether you plan to be a data scientist or work with a team of them, you need to understand how big data are collected, analyzed, and communicated. This module starts with an overview of data science to learn the steps in the life cycle of a big data project. It ends with you developing a plan to earn a data certification that exposes you to the tools and techniques of stages in the data science life cycle that fit your skill needs." - }, - { - "objectID": "modules/module1/module1-3.html#read-1", - "href": "modules/module1/module1-3.html#read-1", - "title": "Module 1 - Introduction to Big Data", - "section": "Read", - "text": "Read\n\nThe Data Science Process: A Visual Guide to Standard Procedures in Data Science by Chanin Nantasenamat\nHilary: the most poisoned baby name in US history by Hilary Parker" - }, - { - "objectID": "modules/module1/module1-3.html#watch", - "href": "modules/module1/module1-3.html#watch", - "title": "Module 1 - Introduction to Big Data", - "section": "Watch", - "text": "Watch\n\nHow I Would Learn Data Science (If I Had to Start Over) by Ken Jee" - }, - { - "objectID": "modules/module1/module1-3.html#complete", - "href": "modules/module1/module1-3.html#complete", - "title": "Module 1 - Introduction to Big Data", - "section": "Complete", - "text": "Complete\nThis week you’ll spend most of your time creating a data certification plan. 
This assignment requires that you identify a research question and develop a plan to complete workshops over the semester that will expose you to the data science knowledge and skills needed to develop a big data project. By the end of the semester, you’ll submit a proposal that describes how you can answer the research question using big data and data science techniques.\nYou will turn in this plan as your “discussion” this week. But don’t wait until class day to start this assignment. Formulating a good research question takes time, and if you’re struggling with how to use big data in a project that interests you then you need to give me time to help you. Reach out by email to set up a time to talk or post questions to the Class Q & A discussion board. \n\nDownload the instructions for the data certification plan.\nIf you need some inspiration, check out this web blog. Or Google “big data and X (your field).” It’s also a good idea to look at some academic journals in your field to see how the techniques of data science are being applied. \nWrite me an email or set up a time to talk if you need help.\nSubmit your data certification plan to the “1.3 Data Certification” assignment folder.\n\nDue by: 11/24 at 11:59pm" - }, - { - "objectID": "modules/module1/module1-3.html#the-challenges-of-big-data", - "href": "modules/module1/module1-3.html#the-challenges-of-big-data", - "title": "Module 1 - Introduction to Big Data", - "section": "1. 4 The Challenges of Big Data", - "text": "1. 4 The Challenges of Big Data" - }, - { - "objectID": "modules/module1-0.html#module-overview", - "href": "modules/module1-0.html#module-overview", - "title": "Big Data for Public Good", - "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. 
You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.", - "crumbs": [ - "Module 1 - Introduction to Big Data", - "Module Overview" - ] - }, - { - "objectID": "modules/module1-0.html#content", - "href": "modules/module1-0.html#content", - "title": "Module Overview", - "section": "Content", - "text": "Content\n\n\n\n\n\n\n\n\nSection\nAssignment\nDue Date\n\n\n\n\n1.1 The Value of Big Data\nDiscussion Post\n8/27\n\n\n1.2 Big Data and You\nData Certification Plan\n9/3\n\n\n1.3 The Challenges of Big Data\nDiscussion Post & Peer Response\n9/10 & 9/14", - "crumbs": [ - "Module 1 - Introduction to Big Data" - ] - }, - { - "objectID": "modules/module1-0.html", - "href": "modules/module1-0.html", - "title": "Module Overview", - "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.", - "crumbs": [ - "Module 1 - Introduction to Big Data" - ] - }, - { - "objectID": "modules/module1-0.html#introduction", - "href": "modules/module1-0.html#introduction", - "title": "Module Overview", - "section": "", - "text": "This module introduces you to the opportunities and challenges of big data for the public sector. 
You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.", - "crumbs": [ - "Module 1 - Introduction to Big Data" + "Module 1 - Introduction to Big Data", + "1.1 The Value of Big Data" ] }, { @@ -1467,44 +222,50 @@ ] }, { - "objectID": "modules/module1-1.html#post-discussion-1", - "href": "modules/module1-1.html#post-discussion-1", - "title": "1.1 The Value of Big Data", - "section": "Post Discussion 1", - "text": "Post Discussion 1\nAddress the following:\n\nDescribe at least two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data from the readings this week that I can address in my weekly announcement?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in text using APA style. Posts should range between 400-500 words.\nDue by: 8/27 11:59 pm EST", + "objectID": "index.html", + "href": "index.html", + "title": "Welcome to Big Data for Public Good!", + "section": "", + "text": "Overview\nPublic and nonprofit agencies are beginning to unlock the potential of large-scale data to improve service delivery and inform policy. Computational tools capable of making productive use of big data have proliferated in recent years, drastically decreasing the barriers to entry for these applications. This course explores the practice of using data to improve decision-making and evaluation, including techniques for data collection, analysis, and behavior change. 
You will learn about the opportunities and challenges of big data and devise a plan to apply them to areas of personal interest within the public sector.\n\n\n\nCourse Goal & Objectives\nBy the end of this course you should be able to:\n\nDefine big data and describe its characteristics;\nDiscuss how public agencies harness large-scale data to inform policy design, increase stakeholder engagement, and improve service delivery;\nRecognize situations when it’s possible to collect data to inform evidence-based policy decision-making;\nIntelligently weigh the social, political, and ethical considerations of using big data and its analytical techniques for public uses.\n\n\n\nHelpful Notes from Your Professor\n\nComplete all readings and watch the videos. They complement each other, so both are critical to understanding course materials.\nDon’t hesitate to contact me with questions or problems at any point during the semester. I will always try to respond to e-mails quickly.\nThe topics in this class are in the news all the time. Don’t hesitate to send me or post articles or videos related to course material.\n\n\n\nRequired Book\nNo books are required for this course. All learning materials are available through the course site.\n\n\nContact Information\nProfessor A Phone: (000) 123-4567 Email: Please contact Dr. Cynthia Searcy at csearcy@gsu.edu for questions regarding this open course Office hour: by appointment" + }, + { + "objectID": "discussions/M4-2.html#post", + "href": "discussions/M4-2.html#post", + "title": "Discussion 6", + "section": "Post", + "text": "Post\nO’Neil’s article (2023) reports on the voices of five women from tech fields who have been exposing problems with bias and discrimination in algorithms for years. Hendrycks and colleagues (2023) released a report earlier this month warning that AI poses  catastrophic risks to humanity.  
Answer the following:\n\nAfter reading the O’Neil article, further explore the work of one of Buolamwini, Chowdhury, Gangadharan, Gebru, or Noble. Describe the site/content you reviewed, why you chose it, and what you learned.\nThe report by Hendrycks and colleagues (2023) underpins the AI risks raised by the “AI Doomers” as described in the O’Neil article (2023). One of the signers of the Statement on AI risks is Geoffrey Hinton, an Emeritus Professor of Computer Science at the University of Toronto. He is quoted by O’Neil as saying “I believe that the possibility that digital intelligence will become much smarter than humans and will replace us as the apex intelligence is a more serious threat to humanity than bias and discrimination, even though bias and discrimination are happening now and need to be confronted urgently.” Do you agree? Why or why not?\nObermeyer and colleagues (2021) provide practical steps that organizations can take to diminish the harmful effects of AI. Which of these overlap with calls from Buolamwini, Chowdhury, Gangadharan, Gebru, Noble, and the “AI Doomers”? Which steps do you think are most urgent and why?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. 
Posts should range between 400-500 words.\nDue by: 10/22 11:59 pm EST", "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" + "Discussion 6" ] }, { - "objectID": "discussions/M1-1.html", - "href": "discussions/M1-1.html", - "title": "Discussion 1", - "section": "", - "text": "Post\nAddress the following:\n\nDescribe at least two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data from the readings this week that I can address in my weekly announcement?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in text using APA style. Posts should range between 400-500 words.\nDue by: 8/27 11:59 pm EST\nSee here for Rubrics", + "objectID": "discussions/M4-2.html#respond", + "href": "discussions/M4-2.html#respond", + "title": "Discussion 6", + "section": "Respond", + "text": "Respond\nThis week you are assigned to small groups again to learn what content your peers explored and their thoughts on the risks of AI. Read two of your group members’ posts and describe how your viewpoints converge and diverge. 
Please make sure that everyone in your group gets at least one set of comments.\nDue by: 10/26 at 11:59pm EST\nSee here for Rubrics", "crumbs": [ - "Discussion 1" + "Discussion 6" ] }, { - "objectID": "discussions/M1-1.html#m1.2-the-value-of-big-data", - "href": "discussions/M1-1.html#m1.2-the-value-of-big-data", - "title": "Module 1 Discussions", + "objectID": "discussions/M3-2.html", + "href": "discussions/M3-2.html", + "title": "Discussion 4", "section": "", - "text": "Post\nAddress the following:\n\nDescribe two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data that we can address this week in class?\n\nDue by: 8/29 11:59 pm EST\nDiscuss\nOnline discussions are important to the dialogue and learning in this class. Take some time to respond to at least two of your classmates’ posts by Friday.\nDue by: 9/3 11:59 pm EST", + "text": "Post\nAt this point in the semester, you should have identified the topic of your big data research proposal and have some idea of what data you will use. \nFor this discussion, you should create an annotated bibliography to support the Literature Review of your research proposal. Hopefully you’ve already been reading studies related to your topic and the methods used in them. The literature review requires you to describe this research and how your project using big data fills a gap in what is known.\nThese instructions for the proposal detail what sections to include and the requirements of each section. Make sure that you **upload the annotated bibliography in a Word Document in the post.**\nThe annotated bibliography should start with a short paragraph that states your research question and why it’s important. 
Next, I would like you to 1) create a list of 7-10 articles/reports that you will include in your literature review; 2) for each work write 4-6 sentences that describes its findings and how it relates to your topic; and 3) order them to demonstrate how they will relate to each other when you write the literature review section of your proposal. \nDue by: 10/1 at 11:59pm EST\nRespond\nYou will be assigned to small groups for this discussion post. Read two of your group member’s annotated bibliographies and provide constructive comments on strengths and weaknesses. Please make sure that everyone in your group gets at least one set of comments.\nDue by: 10/5 at 11:59pm\nSee here for Rubrics", "crumbs": [ - "Module 1 Discussions" + "Discussion 4" ] }, { - "objectID": "discussions/M1-1.html#m1.4-the-value-of-big-data", - "href": "discussions/M1-1.html#m1.4-the-value-of-big-data", - "title": "Module 1 Discussions", - "section": "M1.4 The Value of Big Data", - "text": "M1.4 The Value of Big Data\nPost\nAddress the following:\n\nDescribe two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data that we can address this week in class?\n\nDue by: 8/29 11:59 pm EST\nDiscuss\nOnline discussions are important to the dialogue and learning in this class. Take some time to respond to at least two of your classmates’ posts by Friday.\nDue by: 9/3 11:59 pm EST", + "objectID": "discussions/M3-2.html#m3.2-annotated-bibliography", + "href": "discussions/M3-2.html#m3.2-annotated-bibliography", + "title": "Discussion 4", + "section": "", + "text": "Post\nAt this point in the semester, you should have identified the topic of your big data research proposal and have some idea of what data you will use. 
\nFor this discussion, you should create an annotated bibliography to support the Literature Review of your research proposal. Hopefully you’ve already been reading studies related to your topic and the methods used in them. The literature review requires you to describe this research and how your project using big data fills a gap in what is known.\nThese instructions for the proposal detail what sections to include and the requirements of each section. Make sure that you **upload the annotated bibliography in a Word Document in the post.**\nThe annotated bibliography should start with a short paragraph that states your research question and why it’s important. Next, I would like you to 1) create a list of 7-10 articles/reports that you will include in your literature review; 2) for each work write 4-6 sentences that describe its findings and how it relates to your topic; and 3) order them to demonstrate how they will relate to each other when you write the literature review section of your proposal. \nDue by: 10/1 at 11:59pm EST\nRespond\nYou will be assigned to small groups for this discussion post. Read two of your group members’ annotated bibliographies and provide constructive comments on strengths and weaknesses. 
Please make sure that everyone in your group gets at least one set of comments.\nDue by: 10/5 at 11:59pm\nSee here for Rubrics", "crumbs": [ - "Module 1 Discussions" + "Discussion 4" ] }, { @@ -1518,139 +279,152 @@ ] }, { - "objectID": "discussions/M1-3.html#m1.2-the-value-of-big-data", - "href": "discussions/M1-3.html#m1.2-the-value-of-big-data", - "title": "Module 1 Discussions", + "objectID": "discussions/M1-3.html#m1.3-the-challenges-of-big-data", + "href": "discussions/M1-3.html#m1.3-the-challenges-of-big-data", + "title": "Discussion 2", "section": "", - "text": "Post\nAddress the following:\n\nDescribe two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data that we can address this week in class?\n\nDue by: 8/29 11:59 pm EST\nDiscuss\nOnline discussions are important to the dialogue and learning in this class. Take some time to respond to at least two of your classmates’ posts by Friday.\nDue by: 9/3 11:59 pm EST", + "text": "Post\nAddress the following:\n\nUsing all three of this week’s readings, describe the challenges of big data that you feel are urgent to address by the public and why.\nWhat issues do the authors address that you don’t feel are urgent? Why?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. 
Posts should range between 400-500 words.\nDue by: 9/10 11:59 pm EST\nSee here for Rubrics", "crumbs": [ - "Module 1 Discussions" + "Discussion 2" ] }, { - "objectID": "discussions/M1-3.html#m1.4-the-value-of-big-data", - "href": "discussions/M1-3.html#m1.4-the-value-of-big-data", - "title": "Module 1 Discussions", - "section": "M1.4 The Value of Big Data", - "text": "M1.4 The Value of Big Data\nPost\nAddress the following:\n\nDescribe two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data that we can address this week in class?\n\nDue by: 8/29 11:59 pm EST\nDiscuss\nOnline discussions are important to the dialogue and learning in this class. Take some time to respond to at least two of your classmates’ posts by Friday.\nDue by: 9/3 11:59 pm EST", + "objectID": "discussions/M1-3.html#respond", + "href": "discussions/M1-3.html#respond", + "title": "Discussion 2", + "section": "Respond", + "text": "Respond\nYou are assigned to a small discussion group this week. Read the posts of your peers and critique two group members’ concern/lack of concern about big data. Is the concern outweighed by the benefits of the data/its use? 
What’s missing from your peers’ argument?\nDue by: 9/14 11:59 pm EST", "crumbs": [ - "Module 1 Discussions" + "Discussion 2" ] }, { - "objectID": "discussions/M1-1.html#m1.1-the-value-of-big-data", - "href": "discussions/M1-1.html#m1.1-the-value-of-big-data", - "title": "Discussion 1", + "objectID": "assignments/project_proposal.html", + "href": "assignments/project_proposal.html", + "title": "Big Data Project Proposal", "section": "", - "text": "Post\nAddress the following:\n\nDescribe at least two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data from the readings this week that I can address in my weekly announcement?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in text using APA style. Posts should range between 400-500 words.\nDue by: 8/27 11:59 pm EST\nSee here for Rubrics", - "crumbs": [ - "Discussion 1" - ] + "text": "At this point in your graduate studies, you’ve read multiple research papers and hopefully recognized a pattern of how they’re written. Although order can vary, most social science disciplines include these sections: Introduction, Purpose, Literature Review, Methods, Findings, Discussion, Limitations. The organization of a project proposal is very similar, except it stops after the Methods section." 
}, { - "objectID": "modules/module1-3.html#post-discussion-m1.3", - "href": "modules/module1-3.html#post-discussion-m1.3", - "title": "1.3 The Challenges of Big Data", - "section": "Post Discussion M1.3", - "text": "Post Discussion M1.3\nAddress the following:\n\nUsing all three of this week’s readings, describe the challenges of big data that you feel are urgent to address by the public and why.\nWhat issues do the authors address that you don’t feel are urgent? Why?\n\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words.\n\nDue by: 9/10 11:59 pm EST", + "objectID": "assignments/lab3.html", + "href": "assignments/lab3.html", + "title": "Lab 3 Machine Learning & Prediction", + "section": "", + "text": "Overview\nIn the previous lab we examined the practice of feature selection - identifying the proper set of variables to use as inputs into a predictive model. Gottman developed a framework using 20 micro-expressions coded at one-second intervals. Iceland developed a set of variables that help predict whether teens are likely to abuse alcohol.\nA key pattern that emerges from the case studies and a take-away from Lab 2 is that identifying the proper data to include in a model is difficult. Typically, experts start with a large number of measures and eliminate variables as they are tested. It requires patience and attention to detail to identify the right set of variables for a given problem.\nThis week we will explore the process of “feature engineering,” the task of starting from raw data sources and “engineering” new variables by finding ways to code or extract new features. Feature engineering is common when using data like satellite imagery, document archives, or video. 
It is common that the data was not designed for the task at hand, but it contains rich information that is extremely valuable if it can be extracted.\nFeature engineering is also necessary because predictive models typically require quantitative variables, so data sources like images and text need to be transformed into counts and scales. “Engineering” can be simple or complex. A simple example is computing a new variable, BMI, from two existing ones (height and weight). More complex examples of what this process might look like for unstructured data are below.\nFor example, how can a computer read handwriting? It has to be able to translate easily from an image of handwritten text to some sort of mathematical abstraction of the sentences.\n\n\n\n\n\nIt accomplishes this by isolating words, then isolating individual characters, then using a pixel grid to code which features exist on each character (vertical lines, curved lines, cross-bars, etc.), and finally predicting which letter each character is based upon those features.\n\n\n\n\n\n(Note: the elements above are concepts from typography that describe fonts, not a list of features from natural language processing, but you get the idea.)\nWithout a set of features extracted from the raw image of a character, the computer can’t predict which letter it might be. In this way Lab 4 on Feature Engineering combines elements of Lab 2 on Measurement, and Lab 3 on Feature Selection. You “engineer” or “extract” a feature by first defining it (does the character possess a “bowl”?), and then figuring out how the computer will observe or measure that feature.\nThe goal of this lab is to demonstrate a couple of processes of feature engineering from common machine learning and artificial intelligence applications. I would like you to develop an intuitive sense of how engineers approach the creative endeavor of turning raw data into meaningful quantitative measures. 
Some steps below show how raw data might be processed to make it interpretable (to code characters from text, you first need to isolate individual characters). Other steps show how raw data is translated into a quantitative variable that can be used as an input into a model.\nAs you read through the lab, think about the readings from Social Physics. How did computer scientists approach the task of understanding team performance in organizations? What sorts of data did they collect, and how did they generate quantitative measures from the raw data inputs of environmental sensors and employee smart badges? There are thousands of potential variables you could generate from human interactions in organizations, so how did they decide which measures were important? These themes will be revisited when we look at how Google tried to design the perfect team.\nWe start here with some basic examples of feature engineering in a few task domains.\nOptical Character Recognition\nThe process by which computer programs read handwriting or scan images of text is quite interesting because of how raw image files are converted into structured data. The process is broadly called Optical Character Recognition. short video\nFor these tasks the program must first take an image and identify the different units of text:\n\n\n\n\n\nThese can then be broken into their most basic component parts, characters. The actual predictive models that comprise programs reading text operate on a letter-by-letter basis, and the paragraphs are reconstructed after all letters are identified:\n\n\n\n\n\nFor these programs to work, the computer must be able to effectively isolate each character:\n\n\n\n\n\nImages are a challenging data source because camera resolution, light sources, distance from the subject, and focus can all impact data quality. 
As a result, the first step in many applications that require computers to extract features from images is to process the image in a way that isolates the important information and standardizes some of the inputs.\nLet’s consider this basic program that lets you take a picture of a graph and then generates the underlying data for you. It does this by identifying the trend line, converting it to a pixel grid, then for each horizontal pixel measuring the vertical location of the trend line.\nInput image data can be messy, though. So first we need to standardize it to isolate the trend line.\nRaw image of a graph:\n\n\n\n\n\nStep 1: Convert the colored picture to a grayscale version that emphasizes boundaries of graphics.\n\n\n\n\n\nStep 2: Filter out any data that is below a threshold for opacity or darkness.\n\n\n\n\n\nStep 3: Convert to a negative view to maximize the contrast of the image.\n\n\n\n\n\nThe use of filters in this way is a common first step for models that use images as raw data sources. This is true for self-driving vehicles that use cameras to collect data streams, and for examples from the Digital Humanitarians text that use satellite images to predict the location of damage from a disaster. Next we will look at how these techniques are used to conduct a tree census.\nFor an interesting urban policy application of this approach, thanks to a program built by Microsoft you can now download a database of every single building in the United States, compiled by an AI application that was taught to read satellite images.\nDoes anyone recognize this community?\n Trees\nTrees are important for cities, but maintaining a robust urban forest is expensive and challenging. “Trees clean the air and water, reduce stormwater floods, improve building energy use, and mitigate climate change, among other things. 
For every dollar invested in planting, cities see an average $2.25 return on their investment each year.” cite\nIf we want to decide where to plant trees in our city by bringing a data-driven approach to tree policy, we first need to measure the outcome. “How many trees are in your city? It might seem like a straightforward question, but finding the answer can be a monumental task. New York City’s 2015-2016 tree census, for example, took nearly two years (12,000 hours total) and more than 2,200 volunteers.” cite\nUsing high-resolution images that can capture a wide spectrum of light, data scientists have designed ways to use public data sources and AI to perform this task. Here is a high-resolution image of Washington DC that has eliminated all of the light except that reflecting off of green objects, i.e. plant life in the city (as opposed to buildings and parking lots):\n\n\n\n\n\nEasy enough, right? But wait! Green patches might be grass, shrubs, or flowers. How does the program know NOT to count these green patches as trees?\n\n\n\n\n\nIt turns out that trees reflect green light differently than other plants. If you apply the right filters you can further isolate the trees in the image from the rest of the plants. For example, watch the grass in the park disappear:\n\n\n\n\n\nOr eliminate all of the open green spaces in Boston (the grass at the airport is especially vivid):\n\n\n\n\n\nThese techniques allow us to isolate trees from everything else. But we now have another problem - a green island rarely contains a single tree. How can we isolate individual trees from a group of trees in a cluster?\nThe nice tutorial on tree canopy analysis offers some solutions.\nThe basic recipe for identifying trees within a cluster is to:\n1. Apply a Filter to isolate trees from other elements on the landscape.\n\n\n\n\n\n2. 
Detect Treetops using an algorithm that can predict the height of each pixel in the image.\n\n\n\n\n\nThis step returns the geographic coordinates of each treetop in the cluster so that you can see the tree through the forest.\nNote that Lidar uses lasers to enhance digital photographic techniques by including rich measures of light frequency and the ability to triangulate height. Lidar is an expensive technology, but many cities have a database of Lidar images that are open for public use.\n3. Model Canopies by using a geometric tessellation algorithm to predict canopy boundaries and create a new spatial file that contains tree coordinates, heights, and canopy sizes.\n\n\n\n\n\nThis information would be useful for tracking the number of trees as well as changes in canopies over time.\nFaces\nFacial recognition software has become accurate and ubiquitous. What features does a computer need in order to recognize a person? How are these features engineered from a two-dimensional image of a face?\nFeature Extraction\nSimilar to the programs that read text in images above by identifying text then isolating characters, facial recognition software starts by scanning an image to look for faces. If it finds one, it then has a routine to frame the face, then isolate prominent features on the face:\n\n\n\n\n\nSimilar to the handwriting recognition example, we can apply filters to an image to accentuate specific features. With letters on paper we are trying to maximize the contrast between the ink and the page. With faces, different filters applied to images (or image processing algorithms) will highlight specific facial features.\n\n\n\n\n\nOnce oriented to the face, an algorithm can identify facial landmarks (which is actually similar to how our own brains recognize faces):\n\n\n\n\n\nThe computer translates the landmark view of a small set of prominent features of a face to a grid model that measures the distance between each pair of features:\n\n\n\n\n\nVoilà! 
We now have a format that can be used to generate quantitative variables describing a face. Each line represents a distance between facial landmarks. The length of each line becomes a distinct feature that can be measured and quantified.\nYou don’t always know the distance from the camera to the face, so you might not be able to predict the actual size of specific features (is my nose big or was the camera just way too close?), but it is easy enough to calculate relative sizes. If you set the distance between the eyes as a measure of one unit, for example, then every other distance on this graph (each line) can be calculated relative to that distance. Thus, you are not identifying individuals based upon the actual size of their noses, but by the relative distances between all features on their face.\nThere are different ways to accomplish this basic process of creating abstract mathematical models of the face. They all return a list of quantitative features that describe an individual face.\nAnd finally, we compare the measurements from a specific face against measurements in a large database of candidate faces. You can do this quickly because you are working with a few dozen measures (distance between eyes, distance between edges of the mouth, distance between edge of mouth to eye, etc.). You would calculate the difference between the face you are trying to identify, and each face in the database by comparing the length of each line.\n\n\n\n\n\nIf the total distance between all of the features falls below a threshold, then the faces are flagged as a match to be examined further by a human, or some action is triggered (unlocking your phone or your front door, for example).\nFeature Selection\nAssuming that the photos are taken in good light with forward-facing subjects and decent resolution cameras, what do you anticipate being a challenge with this process? 
Facial feature landmarks might vary greatly based upon expressions (or angles of the camera):\n\n\n\n\n\nNote that some features, like distance between the eyes and size of the nose, will be static (i.e., reliable measures). Others, like the edges of the mouth or the size of lips, will be highly dependent upon the expression (i.e., low alpha if we think about facial features as latent constructs).\nStated another way, some features convey more information about the unique identity of an individual than others. The feature selection task is to identify which measures will contain the highest signal-to-noise ratios. The algorithms that match faces can also weight certain features more than others to account for expressions.\nThis paper explores the issue by examining which facial features contain the most information during the recognition task. They do this by examining which features, when changed, render the individual most unrecognizable.\n\n\n\n\n\nStated differently, if you omitted the high-salience features from your model like lip thickness, you would see a large drop in correct matches to the database. 
If you omit low-salience features like mouth size, you see a smaller impact on the accuracy of facial recognition.\nThis example is meant to give you an intuitive sense of what the computer algorithm might experience as features are “corrupted” in the images and remind you of the importance of the feature selection task.\nSo, if you are trying to escape the country by crossing a border, which feature would you try to disguise?\nLAB QUESTIONS\nPart 1: Letters\nAssume you have designed a program that can effectively isolate individual characters from an image of a license plate:\n\n\n\n\n\nYou now want to develop a machine learning model that will accurately identify each character, so you need a set of features that describe the letters well enough for the computer to begin to tell them apart.\nFor the lab, list three features of characters that could be reliably derived (“engineered”) from an image of a single letter and used as input data for a predictive model that can read a license plate accurately. Note that fonts used by states might vary, so letters and digits will not be identical.\n\n\n\n\n\n\n\n\n\n\nIf you think this sounds challenging, recognize that there are hundreds of potential features (characteristics of the letter). Just look at how many terms hipsters have invented to lovingly diagram their favorite typefaces:\n\n\n\n\n\nOr spend five minutes with a handwriting analyst.\nIt may be helpful for you to start by describing the difference between these letters to a child.\n\nYou might start by pondering how someone would distinguish a lowercase “o” from a lowercase “a”.\nHow would you describe the difference between a “1” and an “l”?\nBetween a zero “0” and an upper-case “O”?\nCapital “T” versus lower-case “t”?\n\nPart 2: Using Uniforms to Predict Job Title\nThe blog Towards Data Science describes an interesting machine learning application that predicts which category an object or person belongs to based upon an image. 
They demonstrate the software by training it to guess people’s career based upon a picture of them at work:\nFor this tutorial, we have provided a dataset called IdenProf. IdenProf (Identifiable Professionals) is a dataset that contains 11,000 pictures of 10 different professionals that humans can see and recognize their jobs by their mode of dressing.\n\n\n\n\n\nThere are ten professions used in the example:\n\nChef\nDoctor\nEngineer\nFarmer\nFirefighter\nJudge\nMechanic\nPilot\nPolice\nWaiter\n\nThis dataset is split into 9000 pictures (900 pictures for each profession) to train the artificial intelligence model and 2000 pictures (200 pictures for each profession) to test the performance of the artificial intelligence model as it is training. IdenProf has been properly arranged and made ready for training your artificial intelligence model to recognize professionals by their mode of dressing. For reference purposes, if you are using your own image dataset, you must collect at least 500 pictures for each object or scene you want your artificial intelligence model to recognize.\nFor the lab, suggest three features of uniforms that might be used to classify images from a profession. 
Also include a rule statement about how the feature maps to a specific profession.\nFor example:\nRule: If the uniform has stripes on the arms, predict that the individual will be either a firefighter or a pilot.\nSome more examples:\n\nIf the uniform includes a skirt, the individual is not a farmer, mechanic, or chef (on the job).\nIf the uniform is mostly black, the individual will be a judge, pilot, or police officer.\nIf the uniform is mostly white, the individual will be a chef or a doctor.\nIf the uniform includes a bow tie, the individual will likely be a waiter (or an engineer?).\n\n\n\n\n\n\nPart 3: Home Values\nWe have focused so far on feature engineering examples that use images as the input data, then extract features based upon models of letter style, tree structure, or the topology of faces. In these instances, we are translating from one data type (an image) to a traditional set of quantitative variables in a spreadsheet format (columns represent variables or features, and rows represent observations).\nIt is also common to engineer features from an existing dataset, that is, to create new variables from existing variables. For example, population density is a common variable used in urban policy. This variable requires that you have a measure of the population of an administrative unit (number of people in a census tract) and the total geographic area of that unit. Density is then calculated as people per square mile (or whatever unit you use for area).\nPopulation density is often a more useful variable than the raw population because it tells you something about the average distance between individuals in a city. If you are opening a pizza delivery business, for example, the total population of the city does not matter if those people are spread out over a large area. You will be more profitable serving a smaller population that is packed into a tight neighborhood than serving a large suburb that requires long delivery times and high operating costs. 
Stated differently, population density is a better predictor of the profitability of your new business.\nThe urban policy research group, Urban Spatial, offers a nice example of this in a model they created to predict gentrification of census tracts using historical census data.\nThey describe several features that they engineer for the model. Similar to other community change models we have examined, they measure characteristics of housing in census tracts to predict how each might change over time.\nOne thing they do differently from previous models, however, is use information about home value contagion. When home values rise in an adjacent census tract, it increases the likelihood that home values rise in my census tract. This type of relationship is called a spatial correlation. This might occur because people who are looking to buy in a specific neighborhood might be priced out by rising costs in that neighborhood, so they instead purchase a home in an adjacent neighborhood. Higher demand from the spillover will drive up prices.\nSimilarly, a drop in prices in a neighborhood can have a contagion effect if buyers looking at purchasing in an adjacent neighborhood get nervous about falling prices and look elsewhere, thus reducing the demand and lowering prices, resulting in a self-fulfilling prophecy in some instances.\nTo model these processes, you need to take into account information about neighboring communities. The Urban Spatial paper explains how they create (engineer) two variables (features) for their model.\nOne variable is created by calculating the average value of homes in all of the surrounding census tracts. Only those census tracts that are contiguous to your own (they share a border) are included. So, for example, the census tract in the top left corner is excluded from the calculation:\n\n\n\n\n\nThe second variable was created after recognizing that the average value of surrounding homes might not be the best predictor of change. 
Rather, homebuyers typically want to move into hot neighborhoods that have nice amenities and cool people. Since everyone wants to move there, though, these neighborhoods quickly become saturated and overpriced, and buyers spill over into nearby neighborhoods.\nThey identified all of the census tracts in the top fifth of the data (the top 20% of tracts based upon home values) and categorized them as highly desirable tracts. They then calculated the distance from each tract in the dataset to the nearest highly desirable tract:\n\n\n\n\n\nIf we are predicting home values for a tract in 2010, the value in 2000 will provide a reference point for where home values start but will not tell you much about whether you expect them to be rising or falling. The value of neighboring tracts, however, is a good predictor (if adjacent tracts have more expensive homes, prices are likely to rise; if they have cheaper homes, prices will likely fall). And the distance to the closest “hot” neighborhood in 2000 will provide a different type of information about possible trends. The two variables that explain trends are both second-order variables that are created from the raw census data.\nFor the Lab: recall that in the previous lab you were required to identify a neighborhood or metropolitan feature that impacts home values (for example, every time a Starbucks opens in a new neighborhood, home values increase by 0.5%).\nUsing the feature that you identified in Lab 2 (or another you may need to choose), explain how you would “engineer” it by answering the following questions. You are NOT allowed to use a metric that is already an existing census variable and requires no engineering (i.e., calculation).\n1. What is your unit of analysis? 
Census tract, zip code, city, etc.\nSome variables are better at a very local scale (crimes tend to impact prices on surrounding blocks but not much further), whereas others are only meaningful at a metro or regional scale (e.g., activity of regional airports).\n2. What type of data do you need to calculate the variable?\nWhat measures are included in your formula? For example, if a new Starbucks impacts home values, you would measure the distance to it for each home. You would need a database of Starbucks in your city with their locations.\n3. What is the process or formula you would follow to create the variable?\nWrite out instructions for calculating your metric in a way that a data engineer could implement. For example, to calculate the distance to the nearest Starbucks:\n\nIdentify the location of each home in the dataset.\nIdentify the nearest Starbucks.\nCalculate the distance between the two points.\n\nNote! These instructions are not specific enough because you can easily calculate the Euclidean distance between two points on a map with a formula, but how often do you travel in a straight line to a destination? Would a travel distance derived from Google Maps be a better metric? Or maybe travel time, rather than distance? Are you assuming people are walking, driving, or taking public transit?\n4. How reliable will your measure be?\nUsing your knowledge about instrument development, do you think your new variable will accurately reflect the true underlying latent construct you are trying to measure? For example, using the programs described in the lab, I could fairly accurately measure the total tree canopy cover for a specific neighborhood (the average is about 20% coverage in most cities). 
If I were trying to create a hipster scale to measure how cool a neighborhood is by identifying how many local menus include craft beer, my scale might be less reliable (largely because craft beer is now mainstream, not cool enough).\nLooking Ahead\nThe goal behind the design of the last three labs was (1) to encourage you to think about how the data that we use every day, and that has a big impact on our lives, is created, and (2) to demystify machine learning and artificial intelligence. These exercises were designed to give you insight into the black box of predictive analytics, remote sensors, and artificial intelligence.\nYou do not need to know how to build a car from scratch to be a good driver. Similarly, you do not need to be a data scientist to incorporate these new tools into your future career. Your job as a future analyst or manager, for example, may be to determine whether automation or a predictive model would add value to your organization. If so, you can hire an expert to build the models for you.\nThat said, a little bit of vocabulary will help you write a call for proposals, interview potential firms, and manage the process along the way. The processes of feature selection and measurement can be challenging for an outside expert who doesn’t intimately understand your program. You can add a lot of value by collaborating with the experts during the process to identify latent constructs, select features that are important to the question at hand, and discuss how the data may need to be engineered for analysis and/or modeling.", "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.3 The Challenges of Big Data" + "Lab 3 Machine Learning & Prediction" ] }, { - "objectID": "modules/module1-3.html#respond", - "href": "modules/module1-3.html#respond", - "title": "1.3 The Challenges of Big Data", - "section": "Respond", - "text": "Respond\nYou are assigned to a small discussion group this week. 
Read the posts of your peers and critique two group members’ concern/lack of concern about big data. Is the concern outweighed by the benefits of the data/its use? What’s missing from your peers’ argument?\nDue by: 9/14 11:59 pm EST", + "objectID": "assignments/lab1.html", + "href": "assignments/lab1.html", + "title": "Lab 1 Social Data", + "section": "", + "text": "This lab introduces you to two data-driven models of neighborhood change. We will use this case study over the semester to discuss things like data needs for predictive models. You will be required to think critically about the data used in the labs, but you will not be responsible for things like the advanced analytical models in the papers. I am approaching the labs with the assumption that you are likely to be a new analyst or a manager hiring an analyst, so you just need a high-level understanding of the models in order to participate in the task.\nNeighborhood change is a complicated concept with a lot of loaded terminology. We might think about neighborhoods that are “revitalized,” “gentrified,” or “stable,” or that “decline.” We could spend an entire semester unpacking all of these constructs, but that is outside the scope of this lab. Here we are more interested in how we might make sense of our data and then, once we have meaningful groups, how we might use them to make predictions with the data. Can a city forecast how its current neighborhoods are likely to change over the next decade, and can that help with urban planning processes?\nRead the following articles:\n\nGoldstein, I. (2011). Market Value Analysis: A Data-Based Approach to Understanding Urban Housing Markets. In Board of Governors of the Federal Reserve System, Putting Data to Work: Data-Driven Approaches to Strengthening Neighborhoods (pp. 49-59).\nDelmelle, E. C. (2017). Differentiating pathways of neighborhood change in 50 US metropolitan areas. 
Environment and planning A, 49(10), 2402-2424.\n\nWe are interested in understanding neighborhood change. These data-driven approaches to the phenomenon use machine-learning algorithms to “discover” coherent communities within the city by grouping census tracks into groups that minimize within-group differences and maximize between-group differences.\nYou can explore one of these algorithms by looking at examples of how botanists might create “species” based upon characteristics of flowers:\nClustering Example\nA data-driven approach to understanding neighborhood change requires us to (1) define “neighborhoods”, or groups of census tracks in the data that are very similar, and (2) use those group characteristics at a point in time to predict how the “neighborhood” might change in the future. Both of the papers present variations on Step (1) above.\nRead the two papers linked on iCollege, then answer the following questions:\n\nHow did each author identify coherent “neighborhoods” (or groups) in each model?\nWould these “neighborhoods” line up with neighborhoods that are defined on a city’s zoning maps?\nDid the two models use the same data to create the groups?\nHow do the labels and descriptions of the groups differ in each model and why?\n\nWrite your responses in a word document and name your file LAB-01-YOUR-LAST-NAME, then submit it via the iCollege assignment folder. Concise and precise answers are preferred to meandering paragraphs! One page would be fine for this assignment.\nConcise and precise answers are preferred to meandering paragraphs! 
One page would be fine for this assignment.", "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.3 The Challenges of Big Data" + "Lab 1 Social Data" ] }, { - "objectID": "discussions/M1-3.html#m1.3-the-value-of-big-data", - "href": "discussions/M1-3.html#m1.3-the-value-of-big-data", - "title": "Discussion 2", + "objectID": "about.html", + "href": "about.html", + "title": "About the Big Data for Public Good Open Course", "section": "", - "text": "Post", - "crumbs": [ - "Discussion 2" - ] + "text": "About the course\nThis “Big Data for Public Good” course website serves as a public template for higher education instructors in the fields of public administration or public policy teaching about big data fundamentals.\n\n\nInstructions on how to use the course materials\nPlease cite this site properly when using or adapting the materials.\n\n\nContact\nPlease contact Dr. Cynthia Searcy at csearcy@gsu.edu with questions regarding this open course" }, { - "objectID": "modules/module2-1.html#read", - "href": "modules/module2-1.html#read", - "title": "2.1 Social Data", - "section": "Read", - "text": "Read\n\nBeheshti, A. et al. 2022. Social Data Analytics. Chapter 1 Social Data Analytics; Challenges and Opportunities. p 1-17\nDong, X., Morales, A.J., Jahani, E. et al. 2020. Segregated interactions in urban and online space. EPJ Data Sci. 9, 20.\nMcCosker, A., Farmer, J., and Soltani Panah, A. 2020. Community Responses to Family Violence: Charting Policy Outcomes using Novel Data Sources, Text Mining and Topic Modelling. Swinburne University of Technology, Melbourne.", - "crumbs": [ - "Module 2 - Types of big data", - "2.1 Social Data" - ] + "objectID": "assignments/data-certification-plan.html", + "href": "assignments/data-certification-plan.html", + "title": "Data Certification Plan", + "section": "", + "text": "This assignment requires you to identify at least five workshops or other modules on data techniques that support your development as a data user.
It’s intended to align with the big data project proposal that you’ll develop during the semester.\nThis plan starts by asking you to identify a research question that you can answer using the data skills that you’ll learn in the certification. Since it’s early in the semester, what you propose to do in this plan may change as you learn more about big data and its applications to your interests. However, I’m requiring that you submit this plan now so that I can learn what topics you’re interested in and how the data certification will enhance your data skills.\nGSU provides multiple opportunities for learning data skills and tools. Some examples include the library’s Research Data Services workshops, LinkedIn Learning, and O’Reilly for Higher Education. You may also use open access courses via Coursera, edX, Kaggle, etc., should they meet your needs. Each of the five workshops/modules that you complete must be:\nIdeally, the series of workshops that you pursue offers a badge or certificate that you can link to your resume or other digital profile to showcase your professional qualifications (e.g., personal website, LinkedIn profile, Handshake profile, etc.). If you’re curating workshops/courses across multiple sources or platforms, you likely won’t be able to earn a badge or certificate. Be aware that some platforms offer free access to courses, but earning a badge/certificate comes with a fee.\nYou may not use trainings or modules that you complete for other courses that you’re taking this semester. You are earning unique credit for this course, so the work you conduct for it must be different from what you may be learning in other classes. This exercise is intended to give you time (and credit) to pursue data knowledge or skills gaps that aren’t covered in other courses.
It’s an opportunity to invest in data skills that you think will be important to your future work.\nDue to the variety of data workshops and free data courses, it’s impossible for me to know what will interest you or their formats. If you have a question about the suitability of a workshop or course for this assignment, please contact me well before the deadline of this plan.\nAfter I review and approve your data certification plan, you will be required to submit a short report after each workshop to identify what you learned, how you’ll use the new skills in your project proposal (or elsewhere), and whether you recommend the workshop to other students. You’ll also submit an example of work that you generated in the training." }, { - "objectID": "modules/module2-1.html#complete", - "href": "modules/module2-1.html#complete", - "title": "2.1 Social Data", - "section": "Complete", - "text": "Complete\nYour first lab assignment is due on 9/21. The labs are non-technical, but they do require you to consider the origins of data, how they perform as measures of individual/social constructs, and how data scientists make choices about data to use in their models and the effects of those choices.\nThis lab introduces you to two data-driven models of neighborhood change. We will use this case study over the semester to discuss things like data needs for predictive models. You will be required to think critically about the data used in the labs, but you will not be responsible for things like the advanced analytical models in the paper. I am approaching the labs with the assumption that you are likely to be new analysts or a manager hiring an analyst, so you just need a high-level understanding of the models in order to participate in the task.\nTo prepare for the lab, you should read these two report (**note the page numbers assigned**):\n\nGoldstein, I. 
Market Value Analysis: A Data-Based Approach to Understanding Urban Housing Markets in Board of Governors, Federal Reserve Systems, Putting Data to Work: Data-Driven Approaches to Strengthening Neighborhoods. 2011. pp 49-59\n\nDelmelle, E. C. (2017). Differentiating pathways of neighborhood change in 50 US metropolitan areas. Environment and planning A, 49(10), 2402-2424.\n\nRead the instructions for Lab 1 and post your answers to the questions posed in the lab to the 2.2 Lab 1 assignment folder.\nDue by: 9/21 11:59pm EST", + "objectID": "assignments/data-certification-plan.html#i.ask-an-interesting-question", + "href": "assignments/data-certification-plan.html#i.ask-an-interesting-question", + "title": "Data Certification Plan", + "section": "I. Ask an Interesting Question", + "text": "I. Ask an Interesting Question\nDescribe the question that you hope to answer with big data. Why does it interest you and/or why is it important?\nMost research starts with a large question and narrows to a very specific one that can be answered once you know more about what data is available to answer the question and how it can be analyzed. If you’re undecided at this point in the semester what question you’ll propose to answer, describe several questions that cluster around the same topic for this assignment.\nFor example, I’m interested in understanding the relationship between needle aversion and COVID vaccination rates. I think that the language and imagery used to encourage vaccination is having the opposite effect. Nearly every advertisement or news segment about COVID vaccination shows a needle going in an arm (up close!) and uses language like “get the shot” or “jab,” which evokes negative feelings in most people (e.g., going to the dentist or getting booster shots as a child). For those who also have fears that the vaccination is risky, these images and words reduce the likelihood that they’ll get a vaccine.
I hypothesize that the ubiquity and volume of this imagery on TV and social media (big data) is reinforcing needle aversion to an extent that it’s driving down vaccination rates. So, my research question is “Has intense exposure to imagery and language of needles decreased COVID vaccination rates in the United States?” If a relationship exists, removing needle imagery and language from mass media could improve vaccination compliance and lower the threat of COVID to public health." + }, + { + "objectID": "assignments/data-certification-plan.html#ii.-collect-data", + "href": "assignments/data-certification-plan.html#ii.-collect-data", + "title": "Data Certification Plan", + "section": "II. Collect Data", + "text": "II. Collect Data\nDescribe the big data that’s available to answer your research question, its source, and its format. Describe how it meets the characteristics of big data, and how you plan to access it." + }, + { + "objectID": "assignments/data-certification-plan.html#iii.-data-workshops", + "href": "assignments/data-certification-plan.html#iii.-data-workshops", + "title": "Data Certification Plan", + "section": "III. Data Workshops", + "text": "III. Data Workshops\nList the workshops you plan to complete by the end of the semester to build your data knowledge and skills to develop a big data project. For each workshop, describe:\n\nTitle and purpose\nLength/duration\nOrganization providing the training\nLink to the training\nDate(s) you plan to attend/do the training\nWhere it fits in the data science cycle\nHow it will enhance your data knowledge and/or skills" + }, + { + "objectID": "assignments/data-certification-plan.html#iv.-certification", + "href": "assignments/data-certification-plan.html#iv.-certification", + "title": "Data Certification Plan", + "section": "IV. Certification", + "text": "IV. Certification\nDescribe how you will certify completion of each workshop.
(Make sure that the badge(s) or certification(s) you may earn will be available to post to iCollege by December 14.)" + }, + { + "objectID": "assignments/lab2.html", + "href": "assignments/lab2.html", + "title": "Lab 2 Open Data and Discovery", + "section": "", + "text": "Instructions\nIn the first lab we examined two articles (Goldstein, 2011; Delmelle, 2017) where machine learning models were used to identify neighborhood “types” in cities.\nThe authors explained how each type tended to change in different ways over time, so understanding what category or stage a neighborhood belongs to helps the city understand what types of change might be imminent.\nIf you read about the methodologies in both articles, you will see that the advanced machine learning methods they apply are not doing anything more complicated than what many indices developed for well-being or consumer prices do – combine different variables and assign probabilities to each to measure an underlying or latent social construct. The main advantage that a computer has in this process is the ability to try every combination of variables to find the optimal way(s) to group them to create distinct dimensions of the construct you want to measure. Once the computers identify the stable groups, it is up to the human to come up with meaningful titles for each and determine if the groups actually tell us anything useful about the world (just like we would like to know if happiness scores tell us something useful).\nUnderstanding the process of how indices are developed is a stepping stone to using predictive analytics to prescribe outcomes to social or organizational phenomena. The hardest part is identifying the right types of data to collect and ensuring they are high quality.
You can always hire data scientists to build the models once your organization has begun collecting meaningful data.\nIn the previous lab you read about a set of variables used in a model, so you did not have to think very hard about where that data came from and why they were selected. This lab challenges you to break open the black box and think about the process of identifying data to collect for a project. How do you know which variables are useful? How do you know what data is needed to predict success or some other outcome?\nThe short answer is you don’t, not if you are starting a new project. This lab is about “feature selection” - the process of identifying the data you will need for your project.\nStart by listening to this story about the birth and evolution of a large-scale data-driven social program. Pay particular attention to how the researchers figured out what data they needed for the program.\n“Data is like vegetables. It needs to be fresh, and it needs to be local.”\nHow Iceland Saved Its Teens (23:30)\nHow long did it take Iceland to develop its survey for youth? Did they know what they were looking for when they started?\nFeature Selection\nIf we want to use predictive analytics for a problem, we need to identify the data that is best suited for predicting the outcome. At the beginning of most projects, however, we rarely know which factors will be the biggest drivers of outcomes.\nFor example, which school characteristics best predict student performance? Is it the facilities and technology? The level of funding? Classroom sizes? Training and support provided to teachers? Parent involvement? Peer networks? All of these are plausible drivers of student performance – the most important factors are rarely self-evident in advance of having data to test them all.\n“Feature selection” is data science speak for generating a set of hypotheses and measures about what generates the outcome of interest. 
In many cases, feature selection is an iterative process of generating hypotheses then determining how to find or collect data to test them.\nFeature selection requires critical thinking and creativity more than technical expertise, but is a core component of any successful data science project.\nLab Takeaway: Most data architects and engineers will not be domain or subject matter experts, so they are not always good at identifying useful features. The best approach is often to assemble people close to the problem, brainstorm a large list of features, collect test data, and see what is working before you encode your data collection process for a large project or organizational need. More learning occurs during this phase of the project than any other.\nPART 1: Predicting Divorce in 3 Minutes\nThe following excerpt from Malcolm Gladwell’s book Blink describes the work of John Gottman, one of the world’s foremost experts on predictors of divorce in marriage. Unlike many scholars that study marriage and divorce in the academic field of counseling, Gottman did not approach the problem using traditional psychological theories and counseling tools. Instead he brought a data-driven approach to the subject and meticulously developed models to predict whether relationships would last or would end in divorce.\nSince the 1980s, Gottman has brought more than three thousand married couples—just like Bill and Sue— into that small room in his “love lab” near the University of Washington campus. Each couple has been videotaped, and the results have been analyzed according to something Gottman dubbed SPAFF (for specific affect), a coding system that has twenty separate categories corresponding to every conceivable emotion that a married couple might express during a conversation. 
Disgust, for example, is 1, contempt is 2, anger is 7, defensiveness is 10, whining is 11, sadness is 12, stonewalling is 13, neutral is 14, and so on.\nGottman has taught his staff how to read every emotional nuance in people’s facial expressions and how to interpret seemingly ambiguous bits of dialogue. When they watch a marriage videotape, they assign a SPAFF code to every second of the couple’s interaction, so that a fifteen-minute conflict discussion ends up being translated into a row of eighteen hundred numbers—nine hundred for the husband and nine hundred for the wife. The notation “7, 7, 14, 10, 11, 11,” for instance, means that in one six-second stretch, one member of the couple was briefly angry, then neutral, had a moment of defensiveness, and then began whining. Then the data from the electrodes and sensors is factored in, so that the coders know, for example, when the husband’s or the wife’s heart was pounding or when his or her temperature was rising or when either of them was jiggling in his or her seat, and all of that information is fed into a complex equation.\nOn the basis of those calculations, Gottman has proven something remarkable. If he analyzes an hour of a husband and wife talking, he can predict with 95 percent accuracy whether that couple will still be married fifteen years later. If he watches a couple for fifteen minutes, his success rate is around 90 percent.\nRecently, a professor who works with Gottman named Sybil Carrère, who was playing around with some of the videotapes, trying to design a new study, discovered that if they looked at only three minutes of a couple talking, they could still predict with fairly impressive accuracy who was going to get divorced and who was going to make it. The truth of a marriage can be understood in a much shorter time than anyone ever imagined. Gladwell, M. 
(2006)\nSome key findings from Gottman’s years of research can be summed up by his description of the “Four Horsemen of the Apocalypse” - the signs that a relationship is in danger. In this interview he describes the signs, gives examples, and demonstrates them using video clips of couples: https://youtu.be/625t8Rr9o6o\nRecall he began with a code book of twenty distinctive emotions that can be conveyed during a conversation. How difficult would it be for you to accurately differentiate between all 20 emotions using only the video clip? How did Gottman develop this system?\nA math major at MIT before he switched to psychology, Gottman developed a coding system that not only tracked the content of speech but the emotional messages that spouses send with minute changes in expressions, vocal tone, and body language. Using facial recognition systems, Gottman’s code accounts for the fact that, for instance, in “coy, playful, or flirtatious interactions,” the lips are often turned down. “It looks like the person is working hard not to smile,” he writes. Conversely, “many ‘smiles’ involve upturned corners of the mouth but are often indices of negative affect.” [ Dissecting Gottman’s Love Lab: Slate Magazine ]\nNote that in this case, “using facial recognition systems” does not refer to computer algorithms but rather to training grad students to watch hours and hours of interviews!\nGottman’s contribution was figuring out how to systematize data collection about marital conflict. He may have ended up with 20 emotional constructs that were coded in all of the studies, but he no doubt started with dozens more ideas that were intractable to operationalize or not predictive of the outcomes.
The list was eventually narrowed to 20, and time was spent on improving the coding protocols so data could be collected consistently.\nTo use the language of data science, Gottman went through a process of feature selection - identifying a set of meaningful variables that have the potential to predict the outcome of interest and determining which are most useful. Loosely speaking, the more highly correlated a feature (variable) is with the outcome, the better a predictor it tends to be.\nPart 1 Questions:\n\nGottman’s lab records 15-minute videos of each couple, which sounds like a small amount of data relative to some of the other case studies we have examined this semester. How many data points are generated from those 15 minutes of footage, though? Stated differently, how many observations do the lab scientists record in each 15-minute interview?\nWhat is the measured outcome in the study described by Gladwell? How would that data be collected? And, consequently, how long did these studies take? Note that in machine learning jargon this would be called the “training dataset” since it includes outcomes that are used to teach computer models which features are useful to accurately predict the outcome. The calibrated models can then be used to predict outcomes using data that does not include results.\nDo you think that a marriage counselor working with couples for 30 years would be able to accurately predict those that will get divorced after a 15-minute session 95 percent of the time, relying on intuition from practice alone? What was unique about Gottman’s approach that allowed him to achieve that kind of accuracy?\n\nPart 2: Predicting Home Values\nThe hard part of feature selection is that it’s always fairly easy to generate a large list of candidate variables, and often the only way to know which actually work is to test them all.
It is typically hard to predict which variables might be more predictive before collecting data and testing them out.\nConsider the project to reduce harmful levels of binge drinking by youth in Iceland. To identify some key causes of binge drinking they developed literally hundreds of theories, and tested as many as they could. Some explanations were still unexpected:\nThe team has analyzed 99,000 questionnaires from places as far afield as the Faroe Islands, Malta and Romania—as well as South Korea and, very recently, Nairobi and Guinea-Bissau. Broadly, the results show that when it comes to teen substance use, the same protective and risk factors identified in Iceland apply everywhere. There are some differences: in one location (in a country “on the Baltic Sea”), participation in organized sport actually emerged as a risk factor. Further investigation revealed that this was because young ex-military men who were keen on muscle-building drugs, drinking and smoking were running the clubs. Here, then, was a well-defined, immediate, local problem that could be addressed.\nData scientists have grown very skilled at using data to predict home values before houses are listed for sale. Zillow’s median national error rate is under 4%, for example, meaning that more than half of their predictions about home values are within 4% of true selling prices. They are becoming so accurate that Zillow is experimenting with a new service of buying homes based upon their estimates and re-selling them on their platform without realtors ever being involved in order to bypass the painful process of spending 6 months in a house that is for sale.\nHow does Zillow do this? Which variables or features are the best predictors of home values? The variables (“features”) that Zillow uses in their model are reported below. 
Can you guess the three factors that are most predictive of home value just by reading the list?\nFeature Description\nLabel Description\n\n\n\n\n\n\n\nLabel\nDescription\n\n\n\n\nairconditioningtypeid\nType of cooling system present in the home (if any)\n\n\narchitecturalstyletypeid\nArchitectural style of the home (i.e. ranch, colonial, split-level, etc…)\n\n\nbasementsqft\nFinished living area below or partially below ground level\n\n\nbathroomcnt\nNumber of bathrooms in home including fractional bathrooms\n\n\nbedroomcnt\nNumber of bedrooms in home\n\n\nbuildingqualitytypeid\nOverall assessment of condition of the building from best (lowest) to worst (highest)\n\n\nbuildingclasstypeid\nThe building framing type (steel frame, wood frame, concrete/brick)\n\n\ncalculatedbathnb\nNumber of bathrooms in home including fractional bathroom\n\n\ndecktypeid\nType of deck (if any) present on parcel\n\n\nthreequarterbathnbr\nNumber of 3/4 bathrooms in house (shower + sink + toilet)\n\n\nfinishedfloor1squarefeet\nSize of the finished living area on the first (entry) floor of the home\n\n\ncalculatedfinishedsquarefeet\nCalculated total finished living area of the home\n\n\nfinishedsquarefeet6\nBase unfinished and finished area\n\n\nfinishedsquarefeet12\nFinished living area\n\n\nfinishedsquarefeet13\nPerimeter living area\n\n\nfinishedsquarefeet15\nTotal area\n\n\nfinishedsquarefeet50\nSize of the finished living area on the first (entry) floor of the home\n\n\nfips\nFederal Information Processing Standard code - see https://en.wikipedia.org/wiki/FIPS_county_code for more details\n\n\nfireplacecnt\nNumber of fireplaces in a home (if any)\n\n\nfireplaceflag\nIs a fireplace present in this home\n\n\nfullbathcnt\nNumber of full bathrooms (sink, shower + bathtub, and toilet) present in home\n\n\ngaragecarcnt\nTotal number of garages on the lot including an attached garage\n\n\ngaragetotalsqft\nTotal number of sqft of all garages on lot including an attached 
garage\n\n\nhashottuborspa\nDoes the home have a hot tub or spa\n\n\nheatingorsystemtypeid\nType of home heating system\n\n\nlatitude\nLatitude of the middle of the parcel multiplied by 10e6\n\n\nlongitude\nLongitude of the middle of the parcel multiplied by 10e6\n\n\nlotsizesquarefeet\nArea of the lot in square feet\n\n\nnumberofstories\nNumber of stories or levels the home has\n\n\nparcelid\nUnique identifier for parcels (lots)\n\n\npoolcnt\nNumber of pools on the lot (if any)\n\n\npoolsizesum\nTotal square footage of all pools on property\n\n\npooltypeid10\nSpa or Hot Tub\n\n\npooltypeid2\nPool with Spa/Hot Tub\n\n\npooltypeid7\nPool without hot tub\n\n\npropertycountylandusecode\nCounty land use code, i.e., its zoning at the county level\n\n\npropertylandusetypeid\nType of land use the property is zoned for\n\n\npropertyzoningdesc\nDescription of the allowed land uses (zoning) for that property\n\n\nrawcensustractandblock\nCensus tract and block ID combined - also contains blockgroup assignment by extension\n\n\ncensustractandblock\nCensus tract and block ID combined - also contains blockgroup assignment by extension\n\n\nregionidcounty\nCounty in which the property is located\n\n\nregionidcity\nCity in which the property is located (if any)\n\n\nregionidzip\nZip code in which the property is located\n\n\nregionidneighborhood\nNeighborhood in which the property is located\n\n\nroomcnt\nTotal number of rooms in the principal residence\n\n\nstorytypeid\nType of floors in a multi-story house (i.e. basement and main level, split-level, attic, etc.). See tab for details.\n\n\ntypeconstructiontypeid\nWhat type of construction material was used to construct the home\n\n\nunitcnt\nNumber of units of structure (i.e.
2 = duplex, 3 = triplex, etc…)\n\n\nyardbuildingsqft17\nPatio in yard\n\n\nyardbuildingsqft26\nStorage shed/building in yard\n\n\nyearbuilt\nThe Year the principal residence was built\n\n\ntaxvaluedollarcnt\nThe total tax assessed value of the parcel\n\n\nstructuretaxvaluedollarcnt\nThe assessed value of the built structure on the parcel\n\n\nlandtaxvaluedollarcnt\nThe assessed value of the land area of the parcel\n\n\ntaxamount\nThe total property tax assessed for that assessment year\n\n\nassessmentyear\nThe year of the property tax assessment\n\n\ntaxdelinquencyflag\nProperty taxes for this parcel are past due as of 2015\n\n\ntaxdelinquencyyear\nYear for which the unpaid property taxes were due\n\n\n\nZillow tracks an impressive amount of data on homes, but this dataset is far from exhaustive. If we wanted to improve their models by adding new data, which features of homes and neighborhoods would you propose?\nPart 2 Questions:\nIn order to demonstrate what a feature selection exercise might look like, part 2 of this lab asks you to come up with one variable that predicts home value that is not included in the Zillow dataset.\nThe obvious features are already in the data - square footage, number of bedrooms, whether there is a garage and a pool, etc. You need to be a little creative to come up with another feature.\nNote that features in this case might be characteristics of houses themselves, but they also might be characteristics of neighborhoods or cities. These broader characteristics of the community, positive or negative, are baked into the selling price of the home.\nYou only need to think up one other variable that impacts home values. There are hundreds more. In order to verify whether your hunch was correct, you need to identify an academic article or peer-reviewed report that supports your claim. 
I suggest using Google Scholar, but many search engines work fine.\nFor example, I might hypothesize that high-end restaurants or coffee shops increase the value of homes. After a little searching I found this study: Measuring Gentrification: Using Yelp Data to Quantify Neighborhood Change. It finds that “changes in the local business landscape is a leading indicator of housing price changes. Each additional Starbucks that enters a zip code is associated with a 0.5% increase in housing prices.”\nAfter identifying the feature and an academic source to support its selection, write a paragraph about your predictor and how you think it will impact home values. Include a citation to the article that supports your claim. Combine your response with your answers to part 1 and submit them to the Lab 2 assignment folder in iCollege by the due date.", "crumbs": [ - "Module 2 - Types of big data", - "2.1 Social Data" + "Lab 2 Open Data and Discovery" ] }, { - "objectID": "modules/module2-3.html#read", - "href": "modules/module2-3.html#read", - "title": "2.3", - "section": "Read", - "text": "Read", + "objectID": "assignments/lab4.html", + "href": "assignments/lab4.html", + "title": "Lab 4 Bias in Modeling", + "section": "", + "text": "The final lab of the semester dives deeper into bias in machine learning. It has four parts with bolded questions for you to answer in each part. It begins with an activity asking you to crop a series of photos.\nPart I\n\nYou are uploading photos to a social media site and need to crop them so the posts will upload quickly.\nThe three images below need cropping to 2” x 2”.\nCrop the images by 1) clicking on the photo; 2) selecting the “picture format” tab; and 3) selecting “crop” and moving the borders of the photo to include the part of the image that you want to upload to your social media account.
(DO NOT SHRINK the size of the image on the page before cropping.)\nAfter cropping, answer these questions for each photo:\n\nHow did you decide what to keep in the cropped image? Why?\nWhen we crop something out of a picture, it never gets seen by your audience. Look back at the photos you cropped. What or who got left out?\n\nNow imagine we recorded how everyone in the class cropped the images above. We could use that information to train a model to crop other photos being uploaded to the social media site.\n\nHow might the cropping data from our classroom be biased?\nWhat are some ways we could address the biases in our data?\n\nPart 2\nIt turns out that the issue of how to crop an image is something social media platforms have been working on for some time. A well-documented attempt was when Twitter used machine learning to train an algorithm to do this cropping. Watch the video “Are We Automating Racism?” (23 minutes) and answer the following questions:\n\nHow was the Twitter cropping algorithm trained?\nAccording to the video, what is a potential source of bias when training similar cropping algorithms?\n\nPart 3\nIt didn’t take long for users of Twitter’s autocropping feature to notice that it favored White faces over Black ones and showed gender-based biases. Read this study from Twitter researchers investigating the claims:\nKyra Yee, Uthaipon Tantipongpipat, and Shubhanshu Mishra. 2021. Image Cropping on Twitter: Fairness Metrics, their Limitations, and the Importance of Representation, Design, and Agency.\nBriefly describe the results of the first two research questions:\n\nTo what extent, if any, did Twitter’s image cropping have a disparate impact (i.e., systematically favor certain people when cropping) along racial or gender lines?\nWhat were some of the factors that caused systematic disparate impact of the Twitter image cropping model?\n\nLastly,\n\nIf you were the CEO of Twitter and found evidence of this bias in your cropping algorithm, how would you respond? What steps would you take and why?\n\nAs a review, it’s important to understand the types of bias that can result from machine learning (and many other data-driven functions). This explanation comes from How Artificial Intelligence Can Support Healthcare, University of Groningen (n.d.).\nFirst, bias is a phenomenon that occurs when the machine learning model systematically produces prejudiced results. It can be caused by bad quality or wrong example data, which is called representational bias, or by choices made in algorithm development, called procedural bias. Both of these sources of bias could result in incorrect predictions by the AI model, which in turn can lead to dangerous situations, such as patients receiving the wrong treatment.\nRepresentational bias\nIn machine learning, the general rule is: “Garbage In, Garbage Out”. This means that if your machine is trained on wrong data, the model will not be able to produce accurate results. For this reason, it’s extremely important to consider whether your data contains any possible biases. A few of the most common biases will be discussed, along with solutions to prevent them from occurring.\nHistorical bias. This type of bias is a consequence of existing biases in society and is therefore also known as cultural bias. The data is filled with stereotypes that exist in real life. For example, Google Translate learns from existing translations from the web. However, these translations were often very biased with regard to gender. For example, “doctor” would usually be assumed male, whereas “nurse” would be assumed female. This type of bias can be prevented by examining the data first and looking for existing prejudices.
If they exist, more examples could be required to reflect society more accurately. Another solution by Google for this situation was to return both a masculine and feminine translation.\nSample bias. This occurs when the collected data is unbalanced and does not accurately represent the population the machine is supposed to be used for. When a machine learning model is supposed to recognize both benign and malignant nodules in a thoracic X-ray, it’s not sufficient to only train it with X-rays containing benign nodules. A solution is to examine the data for an even distribution of the cases among features and checking if your dataset works well on an evenly distributed test set. More training examples could be required if this is not the case. This can also be done artificially with the use of data augmentation. Data augmentation consists of techniques that help to increase your dataset synthetically by adding slightly modified copies of the existing examples in your dataset.\nExclusion bias. This happens when the developer of the algorithm decides to remove features or particular instances from the dataset because they believe them to be irrelevant for the problem at hand, even though they were of value. For example, a developer might believe that a feature addressing the patient’s blood pressure is irrelevant for predicting the likelihood that the patient will develop Alzheimer’s disease. However, this actually is a good indicator, especially in combination with other factors such as cholesterol levels. Prematurely removing such valuable information can be prevented by performing a proper investigation of the features and data points and their relation to the prediction that will be made beforehand, and asking someone else to take a look at the use of the features and data points before removing them.\nMeasurement bias. This happens when the values of particular features are poorly measured. 
For example, measuring instruments might be faulty, which might result in skewed data. Solutions include calibrating the instruments before use and using multiple measuring devices.\nLabeling bias. This type of bias happens when the annotator does not label the data accurately due to subjective perceptions. For example, one might want to detect lung nodules in CT scans. Whereas one radiologist might classify a particular growth shown in these scans as a nodule, another might not classify it as such due to a different conception of the requirements of such a nodule (such as the minimum diameter). Common methodologies to solve this problem are the use of labeling guidelines and/or having multiple experts provide the labels and to have them reach a consensus when they have different opinions. When a large number of experts is available, a majority vote for the right label could also be used.\nProcedural Bias\nThe choices the developer makes during the process of algorithm development are also able to affect the output significantly.\nConfirmation bias. Developers tend to choose particular models and hyperparameters that align more closely with their preconceived beliefs or hypotheses, even though it might not be the more representative model. An example of this is when a developer previously witnessed that a decision tree was able to predict very well whether or not a doctor should apply antibiotics in case of a fever. Therefore, he decides to use such a decision tree for all the problems he must create solutions for afterwards. He does this without even considering other algorithms, which might be better suited for the data or problem at hand. This confirmation bias can be prevented by involving independent critics, or by allowing for a direct comparison of models by making the used database open source.\nAssociation bias. This occurs when a machine learning model is built to amplify an existing bias. A well-known example is PredPol’s drug crime prediction algorithm. 
This algorithm was trained on data biased by housing segregation and police bias. Because of that, it would send police more frequently to a neighborhood where a lot of minorities live, resulting in more drug arrests there. That arrest data was fed back into the algorithm, which again trained on these new examples, resulting in a positive feedback loop. Preventing this can be done by monitoring how the data is processed closely.\nThese examples cover only a small part of the full range of possible biases in machine learning. For this reason, you should always be critical about both your data and the algorithm development when implementing artificial intelligence. Several methodologies have been developed over the past years to help to critically assess the dataset used (Datasheets for Datasets) and to provide proper information to allow assessments of models by clinical end users (Model Cards). Both inject more transparency into the algorithm development process and could improve bias in machine learning and AI broadly if adopted voluntarily by organizations or required by governments.", "crumbs": [ - "Module 2 - Types of big data", - "2.3" + "Lab 4 Bias in Modeling" ] }, { - "objectID": "modules/module2-3.html#post-discussion-1", - "href": "modules/module2-3.html#post-discussion-1", - "title": "2.3", - "section": "Post Discussion 1", - "text": "Post Discussion 1\nAddress the following:\n\nDescribe at least two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data from the readings this week that I can address in my weekly announcement?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in text using APA style. 
Posts should range between 400-500 words.\nDue by: 8/27 11:59 pm EST", + "objectID": "discussions/M1-1.html", + "href": "discussions/M1-1.html", + "title": "Discussion 1", + "section": "", + "text": "Post\nAddress the following:\n\nDescribe at least two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data from the readings this week that I can address in my weekly announcement?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in text using APA style. Posts should range between 400-500 words.\nDue by: 8/27 11:59 pm EST\nSee here for Rubrics", "crumbs": [ - "Module 2 - Types of big data", - "2.3" + "Discussion 1" ] }, { - "objectID": "modules/module2-0.html", - "href": "modules/module2-0.html", - "title": "Module Overview", + "objectID": "discussions/M1-1.html#m1.1-the-value-of-big-data", + "href": "discussions/M1-1.html#m1.1-the-value-of-big-data", + "title": "Discussion 1", "section": "", - "text": "The “variety” of big data is one of its defining features. This module explores the types of data being generated and some of its use cases for the public sector.", + "text": "Post\nAddress the following:\n\nDescribe at least two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data from the readings this week that I can address in my weekly announcement?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in text using APA style. 
Posts should range between 400-500 words.\nDue by: 8/27 11:59 pm EST\nSee here for Rubrics", "crumbs": [ - "Module 2 - Types of big data" + "Discussion 1" ] }, { - "objectID": "modules/module2-0.html#introduction", - "href": "modules/module2-0.html#introduction", - "title": "Module Overview", + "objectID": "discussions/M2-2.html", + "href": "discussions/M2-2.html", + "title": "Discussion 3", "section": "", - "text": "The “variety” of big data is one of its defining features. This module explores the types of data being generated and some of its use cases for the public sector.", + "text": "Post\nAddress the following:\n\nThe readings from the last two submodules (2.1 and 2.2) describe use cases for social data, surveillance systems, satellite and aerial imagery, and other types of sensing data (e.g., biometric, IOT, etc.). Describe what promise these types of data hold for the public sector in your interest areas. What concerns do you have about the use of these data for the purposes you describe or otherwise?\nRevisit the research question that you’re exploring for your big data project proposal. Describe types of data from the last two submodules (2.1 and 2.2) that you hadn’t considered before but could incorporate into your proposal. Describe how they might improve your project and their limitations.\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words. 
\nDue by: 9/24 11:59 pm EST\nSee here for Rubrics", "crumbs": [ - "Module 2 - Types of big data" + "Discussion 3" ] }, { - "objectID": "modules/module2-0.html#content", - "href": "modules/module2-0.html#content", - "title": "Module Overview", - "section": "Content", - "text": "Content\n\n\n\nSection\nAssignment\nDue Date\n\n\n\n\n2.1 Social Data\nLab 1\n9/21\n\n\n2.2 Eyes in the Skies & Remote Sensing\nDiscussion Post\n9/24", + "objectID": "discussions/M2-2.html#m2.2-eyes-in-the-skies", + "href": "discussions/M2-2.html#m2.2-eyes-in-the-skies", + "title": "Discussion 3", + "section": "", + "text": "Post\nAddress the following:\n\nThe readings from the last two submodules (2.1 and 2.2) describe use cases for social data, surveillance systems, satellite and aerial imagery, and other types of sensing data (e.g., biometric, IOT, etc.). Describe what promise these types of data hold for the public sector in your interest areas. What concerns do you have about the use of these data for the purposes you describe or otherwise?\nRevisit the research question that you’re exploring for your big data project proposal. Describe types of data from the last two submodules (2.1 and 2.2) that you hadn’t considered before but could incorporate into your proposal. Describe how they might improve your project and their limitations.\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words. \nDue by: 9/24 11:59 pm EST\nSee here for Rubrics", "crumbs": [ - "Module 2 - Types of big data" + "Discussion 3" ] }, { @@ -1658,7 +432,7 @@ "href": "discussions/M4-1.html", "title": "Discussion 5", "section": "", - "text": "Post\nAddress the following:\n\nPoor data quality and data practices can lead to downstream bias that impacts individuals and decision-making. 
What examples from the readings (or your own reading) do you consider most harmful? Overblown?\nGovernments around the world are developing frameworks for improving data integrity and reducing bias. Based on your own experience as a user and/or analyst of data, what elements of the framework do you think are least understood or practiced by researchers/government agencies? How might they improve?\nFind an example of a recent data quality problem in the news that resulted in bias from big data. What was the origin of the bias and did the media address how to prevent it? Describe what you find surprising about the problem. (Make sure to cite the article.)\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words. \nDue by: 10/15 11:59 pm EST\nSee here for Rubrics", + "text": "Post\nAddress the following:\n\nPoor data quality and data practices can lead to downstream bias that impacts individuals and decision-making. What examples from the readings (or your own reading) do you consider most harmful? Overblown?\nGovernments around the world are developing frameworks for improving data integrity and reducing bias. Based on your own experience as a user and/or analyst of data, what elements of the framework do you think are least understood or practiced by researchers/government agencies? How might they improve?\nFind an example of a recent data quality problem in the news that resulted in bias from big data. What was the origin of the bias and did the media address how to prevent it? Describe what you find surprising about the problem. (Make sure to cite the article.)\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. 
Posts should range between 400-500 words.\nDue by: 10/15 11:59 pm EST\nSee here for Rubrics", "crumbs": [ "Discussion 5" ] @@ -1668,80 +442,122 @@ "href": "discussions/M4-1.html#m4.1-data-quality", "title": "Discussion 5", "section": "", - "text": "Post\nAddress the following:\n\nPoor data quality and data practices can lead to downstream bias that impacts individuals and decision-making. What examples from the readings (or your own reading) do you consider most harmful? Overblown?\nGovernments around the world are developing frameworks for improving data integrity and reducing bias. Based on your own experience as a user and/or analyst of data, what elements of the framework do you think are least understood or practiced by researchers/government agencies? How might they improve?\nFind an example of a recent data quality problem in the news that resulted in bias from big data. What was the origin of the bias and did the media address how to prevent it? Describe what you find surprising about the problem. (Make sure to cite the article.)\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words. \nDue by: 10/15 11:59 pm EST\nSee here for Rubrics", + "text": "Post\nAddress the following:\n\nPoor data quality and data practices can lead to downstream bias that impacts individuals and decision-making. What examples from the readings (or your own reading) do you consider most harmful? Overblown?\nGovernments around the world are developing frameworks for improving data integrity and reducing bias. Based on your own experience as a user and/or analyst of data, what elements of the framework do you think are least understood or practiced by researchers/government agencies? How might they improve?\nFind an example of a recent data quality problem in the news that resulted in bias from big data. 
What was the origin of the bias and did the media address how to prevent it? Describe what you find surprising about the problem. (Make sure to cite the article.)\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words.\nDue by: 10/15 11:59 pm EST\nSee here for Rubrics", "crumbs": [ "Discussion 5" ] }, { - "objectID": "discussions/M2-2.html", - "href": "discussions/M2-2.html", - "title": "Discussion 3", + "objectID": "discussions/M5-2.html", + "href": "discussions/M5-2.html", + "title": "Discussion 7", "section": "", - "text": "Post\nAddress the following:\n\nThe readings from the last two submodules (2.1 and 2.2) describe use cases for social data, surveillance systems, satellite and aerial imagery, and other types of sensing data (e.g., biometric, IOT, etc.). Describe what promise these types of data hold for the public sector in your interest areas. What concerns do you have about the use of these data for the purposes you describe or otherwise?\nRevisit the research question that you’re exploring for your big data project proposal. Describe types of data from the last two submodules (2.1 and 2.2) that you hadn’t considered before but could incorporate into your proposal. Describe how they might improve your project and their limitations.\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words. \nDue by: 9/24 11:59 pm EST\nSee here for Rubrics", + "text": "Post\nAddress the following:\n\nDescribe two differences between US and EU data privacy laws. 
What are the implications of these differences for safeguarding your personal information?\nAs we’ve learned this semester, your personal data can be used by different entities for economic gain as well as improvements to how we live, work, and play. If you could choose how your data are used, what would you permit versus prohibit and why?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words.\nDue by: 11/16 11:59 pm EST\nSee here for Rubrics", "crumbs": [ - "Discussion 3" + "Discussion 8" ] }, { - "objectID": "discussions/M2-2.html#m2.2-eyes-in-the-skies", - "href": "discussions/M2-2.html#m2.2-eyes-in-the-skies", - "title": "Discussion 3", + "objectID": "discussions/M5-2.html#m5.2-data-stewardship", + "href": "discussions/M5-2.html#m5.2-data-stewardship", + "title": "Discussion 7", "section": "", - "text": "Post\nAddress the following:\n\nThe readings from the last two submodules (2.1 and 2.2) describe use cases for social data, surveillance systems, satellite and aerial imagery, and other types of sensing data (e.g., biometric, IOT, etc.). Describe what promise these types of data hold for the public sector in your interest areas. What concerns do you have about the use of these data for the purposes you describe or otherwise?\nRevisit the research question that you’re exploring for your big data project proposal. Describe types of data from the last two submodules (2.1 and 2.2) that you hadn’t considered before but could incorporate into your proposal. Describe how they might improve your project and their limitations.\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words. 
\nDue by: 9/24 11:59 pm EST\nSee here for Rubrics", + "text": "Post\nAddress the following:\n\nDescribe two differences between US and EU data privacy laws. What are the implications of these differences for safeguarding your personal information?\nAs we’ve learned this semester, your personal data can be used by different entities for economic gain as well as improvements to how we live, work, and play. If you could choose how your data are used, what would you permit versus prohibit and why?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words.\nDue by: 11/16 11:59 pm EST\nSee here for Rubrics", "crumbs": [ - "Discussion 3" + "Discussion 8" ] }, { - "objectID": "discussions/M3-2.html", - "href": "discussions/M3-2.html", - "title": "Discussion 4", + "objectID": "modules/module1-0.html", + "href": "modules/module1-0.html", + "title": "Module Overview", "section": "", - "text": "Post\nAt this point in the semester, you should have identified the topic of your big data research proposal and have some idea of what data you will use. \nFor this discussion, you should create an annotated bibliography to support the Literature Review of your research proposal. Hopefully you’ve already been reading studies related to your topic and the methods used in them. The literature review requires you to describe this research and how your project using big data fills a gap in what is known.\nThese instructions for the proposal detail what sections to include and the requirements of each section. Make sure that you **upload the annotated bibliography in a Word Document in the post.**\nThe annotated bibliography should start with a short paragraph that states your research question and why it’s important. 
Next, I would like you to 1) create a list of 7-10 articles/reports that you will include in your literature review; 2) for each work write 4-6 sentences that describes its findings and how it relates to your topic; and 3) order them to demonstrate how they will relate to each other when you write the literature review section of your proposal. \nDue by: 10/1 at 11:59pm EST\nRespond\nYou will be assigned to small groups for this discussion post. Read two of your group member’s annotated bibliographies and provide constructive comments on strengths and weaknesses. Please make sure that everyone in your group gets at least one set of comments.\nDue by: 10/5 at 11:59pm\nSee here for Rubrics", + "text": "This module introduces you to the opportunities and challenges of big data for the public sector. You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.", "crumbs": [ - "Discussion 4" + "Module 1 - Introduction to Big Data" ] }, { - "objectID": "discussions/M3-2.html#m3.2-annotated-bibliography", - "href": "discussions/M3-2.html#m3.2-annotated-bibliography", - "title": "Discussion 4", + "objectID": "modules/module1-0.html#introduction", + "href": "modules/module1-0.html#introduction", + "title": "Module Overview", "section": "", - "text": "Post\nAt this point in the semester, you should have identified the topic of your big data research proposal and have some idea of what data you will use. \nFor this discussion, you should create an annotated bibliography to support the Literature Review of your research proposal. Hopefully you’ve already been reading studies related to your topic and the methods used in them. The literature review requires you to describe this research and how your project using big data fills a gap in what is known.\nThese instructions for the proposal detail what sections to include and the requirements of each section. 
Make sure that you **upload the annotated bibliography in a Word Document in the post.**\nThe annotated bibliography should start with a short paragraph that states your research question and why it’s important. Next, I would like you to 1) create a list of 7-10 articles/reports that you will include in your literature review; 2) for each work write 4-6 sentences that describes its findings and how it relates to your topic; and 3) order them to demonstrate how they will relate to each other when you write the literature review section of your proposal. \nDue by: 10/1 at 11:59pm EST\nRespond\nYou will be assigned to small groups for this discussion post. Read two of your group member’s annotated bibliographies and provide constructive comments on strengths and weaknesses. Please make sure that everyone in your group gets at least one set of comments.\nDue by: 10/5 at 11:59pm\nSee here for Rubrics", + "text": "This module introduces you to the opportunities and challenges of big data for the public sector. 
You’ll learn the characteristics that define big data and begin to think about how to apply it to your own field of study.", + "crumbs": [ + "Module 1 - Introduction to Big Data" + ] + }, + { + "objectID": "modules/module1-0.html#content", + "href": "modules/module1-0.html#content", + "title": "Module Overview", + "section": "Content", + "text": "Content\n\n\n\n\n\n\n\n\nSection\nAssignment\nDue Date\n\n\n\n\n1.1 The Value of Big Data\nDiscussion Post\n8/27\n\n\n1.2 Big Data and You\nData Certification Plan\n9/3\n\n\n1.3 The Challenges of Big Data\nDiscussion Post & Peer Response\n9/10 & 9/14", + "crumbs": [ + "Module 1 - Introduction to Big Data" + ] + }, + { + "objectID": "modules/module1-2.html#read", + "href": "modules/module1-2.html#read", + "title": "1.2 Big Data and You", + "section": "Read", + "text": "Read\n\nThe Data Science Process: A Visual Guide to Standard Procedures in Data Science by Chanin Nantasenamat\nHilary: the most poisoned baby name in US history by Hilary Parker", + "crumbs": [ + "Module 1 - Introduction to Big Data", + "1.2 Big Data and You" + ] + }, + { + "objectID": "modules/module1-2.html#watch", + "href": "modules/module1-2.html#watch", + "title": "1.2 Big Data and You", + "section": "Watch", + "text": "Watch\n\nHow I Would Learn Data Science (If I Had to Start Over) by Ken Jee", + "crumbs": [ + "Module 1 - Introduction to Big Data", + "1.2 Big Data and You" + ] + }, + { + "objectID": "modules/module1-2.html#complete", + "href": "modules/module1-2.html#complete", + "title": "1.2 Big Data and You", + "section": "Complete", + "text": "Complete\nThis week you’ll spend most of your time creating a plan to earn a data certification by the end of the semester. This assignment requires that you identify a research question and develop a plan to complete workshops over the semester that will expose you to the data science knowledge and skills needed to develop a big data project. 
By the end of the semester, you’ll submit a proposal that describes how you can answer the research question using big data and data science techniques.\nYou may be a novice to big data terms and sources at this point in the semester, but you probably have questions in your field that can be explored with data. For example, you may be interested in pollution levels of major cities and what conditions lead to poor air quality days. While structured data like measures of ozone levels, particulate matter, and nitrogen oxides can be used to identify poor air quality, big data generated from the sources of pollutants can be used to predict poor air quality days.  Vehicles, traffic lights, navigation apps, satellites, etc. generate high velocity, high volume, and unstructured data that can be harnessed to predict poor air quality days. With these data, you can explore questions like, “What impact could limiting semi-trucks driving on I-75 between 6am and 6pm have on air quality in Atlanta?” The purpose of the data certification is to learn about the types of big data and analysis techniques needed to answer the question that you formulate from your field of interest.\nFormulating a good research question takes time and is iterative as you learn about types of data and techniques. This exercise gets you started on crafting a question and finding tutorials that can help you develop a project plan to answer it. It requires that you identify five, 90 minute tutorials that can tailor your learning about big data to answer the research question that you develop. 
To complete the assignment for this week:\n\nRead the instructions for the data certification plan.\nIf you need some inspiration, Google “big data and X (your field).” It’s also a good idea to look at some academic journals in your field to see how the techniques of data science are being applied.\nWrite me an email or set up a time to talk if you need help.\nSubmit your data certification plan to corresponding assignment folder.\n\nDue by: 9/3 at 11:59pm", "crumbs": [ - "Discussion 4" + "Module 1 - Introduction to Big Data", + "1.2 Big Data and You" ] }, { - "objectID": "discussions/M5-2.html", - "href": "discussions/M5-2.html", - "title": "Discussion 6", + "objectID": "modules/module2-0.html", + "href": "modules/module2-0.html", + "title": "Module Overview", "section": "", - "text": "Post\nAddress the following:\n\nDescribe two differences between US and EU data privacy laws. What are the implications of these differences for safeguarding your personal information?\nAs we’ve learned this semester, your personal data can be used by different entities for economic gain as well as improvements to how we live, work, and play. If you could choose how your data are used, what would you permit versus prohibit and why?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words. \nDue by: 11/16 11:59 pm EST\nSee here for Rubrics", + "text": "The “variety” of big data is one of its defining features. 
This module explores the types of data being generated and some of its use cases for the public sector.", "crumbs": [ - "Discussion 6" + "Module 2 - Types of big data" ] }, { - "objectID": "discussions/M5-2.html#m5.2-data-stewardship", - "href": "discussions/M5-2.html#m5.2-data-stewardship", - "title": "Discussion 6", + "objectID": "modules/module2-0.html#introduction", + "href": "modules/module2-0.html#introduction", + "title": "Module Overview", "section": "", - "text": "Post\nAddress the following:\n\nDescribe two differences between US and EU data privacy laws. What are the implications of these differences for safeguarding your personal information?\nAs we’ve learned this semester, your personal data can be used by different entities for economic gain as well as improvements to how we live, work, and play. If you could choose how your data are used, what would you permit versus prohibit and why?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words. \nDue by: 11/16 11:59 pm EST\nSee here for Rubrics", + "text": "The “variety” of big data is one of its defining features. 
This module explores the types of data being generated and some of its use cases for the public sector.", "crumbs": [ - "Discussion 6" + "Module 2 - Types of big data" ] }, { - "objectID": "modules/module1-1.html#post-discussion-m1.1", - "href": "modules/module1-1.html#post-discussion-m1.1", - "title": "1.1 The Value of Big Data", - "section": "Post Discussion M1.1", - "text": "Post Discussion M1.1\nAddress the following:\n\nDescribe at least two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data from the readings this week that I can address in my weekly announcement?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in text using APA style. Posts should range between 400-500 words.\nDue by: 8/27 11:59 pm EST", + "objectID": "modules/module2-0.html#content", + "href": "modules/module2-0.html#content", + "title": "Module Overview", + "section": "Content", + "text": "Content\n\n\n\nSection\nAssignment\nDue Date\n\n\n\n\n2.1 Social Data\nLab 1\n9/21\n\n\n2.2 Eyes in the Skies & Remote Sensing\nDiscussion Post\n9/24", "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.1 The Value of Big Data" + "Module 2 - Types of big data" ] }, { @@ -1749,269 +565,292 @@ "href": "modules/module2-2.html#read", "title": "2.2 Eyes in the Skies & Remote Sensing", "section": "Read", - "text": "Read\n\nMcKinsey Global Institute. “Smart Cities: Digital Solutions for a More Livable Future.” 2018. **Read the executive summary only (unless you want to read more!**)\nHumanitarian OpenStreetMap Team, What We Do and Our Work. n.d. 
**Browse site for applications to your field of interest.**\nFuture of Privacy Forum, “Understanding Facial Detection, Characterization, and Recognition Technologies,” 2018.", + "text": "Read\n\nMcKinsey Global Institute. “Smart Cities: Digital Solutions for a More Livable Future.” 2018. **Read the executive summary only (unless you want to read more!)**\nHumanitarian OpenStreetMap Team, What We Do and Our Work. n.d. **Browse site for applications to your field of interest.**\nFuture of Privacy Forum, “Understanding Facial Detection, Characterization, and Recognition Technologies,” 2018.", "crumbs": [ "Module 2 - Types of big data", "2.2 Eyes in the Skies & Remote Sensing" ] }, { - "objectID": "modules/module2-2.html#post-discussion-1", - "href": "modules/module2-2.html#post-discussion-1", - "title": "2.2", - "section": "Post Discussion 1", - "text": "Post Discussion 1\nAddress the following:\n\nDescribe at least two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data from the readings this week that I can address in my weekly announcement?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in text using APA style. Posts should range between 400-500 words.\nDue by: 8/27 11:59 pm EST", + "objectID": "modules/module2-2.html#post", + "href": "modules/module2-2.html#post", + "title": "2.2 Eyes in the Skies & Remote Sensing", + "section": "Post", + "text": "Post\nAddress the following in Discussion 3:\n\nThe readings from the last two submodules (2.1 and 2.2) describe use cases for social data, surveillance systems, satellite and aerial imagery, and other types of sensing data (e.g., biometric, IoT, etc.).
Describe what promise these types of data hold for the public sector in your interest areas. What concerns do you have about the use of these data for the purposes you describe or otherwise?\nRevisit the research question that you’re exploring for your big data project proposal. Describe types of data from the last two submodules (2.1 and 2.2) that you hadn’t considered before but could incorporate into your proposal. Describe how they might improve your project and their limitations.\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in-text APA style. Posts should range between 400-500 words.\nDue by: 9/24 11:59 pm EST", "crumbs": [ "Module 2 - Types of big data", "2.2 Eyes in the Skies & Remote Sensing" ] }, { - "objectID": "modules/module5-2.html#read", - "href": "modules/module5-2.html#read", - "title": "Module 5 - Data Privacy & Stewardship", + "objectID": "modules/module3-1.html#read", + "href": "modules/module3-1.html#read", + "title": "3.1 Open Data and Discovery", "section": "Read", - "text": "Read", + "text": "Read\n\nOpen Knowledge Network, The Open Data Handbook, n.d., (**Read the first three sections, Introduction through What is Open Data?**)\nNYC OpenData, Project Gallery (**View sample projects.**)\nS. Temiz, M. Holgersson, J. Björkdahl, M.W. Wallin. Open data: Lost opportunity or unrealized potential? Technovation, Volume 114, 2022.\nBourke, D. A Gentle Introduction to Exploratory Data Analysis.
2019", "crumbs": [ - "Module 5 - Data Privacy & Stewardship", - "5.2 Data Stewardship" + "Module 3 - Discovery & Insights", + "3.1 Open Data and Discovery" ] }, { - "objectID": "modules/module5-2.html#post-discussion-1", - "href": "modules/module5-2.html#post-discussion-1", - "title": "Module 5 - Data Privacy & Stewardship", - "section": "Post Discussion 1", - "text": "Post Discussion 1\nAddress the following:\n\nDescribe at least two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data from the readings this week that I can address in my weekly announcement?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in text using APA style. Posts should range between 400-500 words.\nDue by: 8/27 11:59 pm EST", + "objectID": "modules/module3-1.html#reflect", + "href": "modules/module3-1.html#reflect", + "title": "3.1 Open Data and Discovery", + "section": "Reflect", + "text": "Reflect\nFor the past several weeks you’ve been exposed to types of big data that may be useful for your project proposal that’s due at the end of the semester. This week’s readings introduce you to open data that may be available for your project. Through the readings, you may learn about new repositories or sources of data for your project, including those made available through formal research.\nThis week you’re also being exposed to a very important part of a data project - exploratory data analysis (EDA). The blog by Bourke (2019) describes the key steps in exploratory data analysis that you’ll need to consider as you determine the focus and feasibility of your big data project.
Although you won’t be responsible for manipulating or analyzing data for your project, you do need to know where you’ll get it (source), its type, structure, and features.", "crumbs": [ - "Module 5 - Data Privacy & Stewardship", - "5.2 Data Stewardship" + "Module 3 - Discovery & Insights", + "3.1 Open Data and Discovery" ] }, { - "objectID": "modules/module5-1.html#read", - "href": "modules/module5-1.html#read", - "title": "Module 5 - Data Privacy & Stewardship", - "section": "Read", - "text": "Read", + "objectID": "modules/module3-1.html#complete", + "href": "modules/module3-1.html#complete", + "title": "3.1 Open Data and Discovery", + "section": "Complete", + "text": "Complete\nThis week you’ll complete two tasks.\n\nThe first task is Lab 2. It’s a continuation of Lab 1 and explores “feature selection” in the data project life cycle. It asks you to consider the quality and types of data that are used in research and measurement. Lab 2 is due on Thursday, 9/28.\nThe second task is to read the instructions for your big data project and begin compiling existing research on the topic you’ve chosen. Reading peer-reviewed articles in academic journals or think tank reports can help you identify big data sources and how they’ve been used to explore your topic. The data and methods sections of research articles/reports should detail the source, type, structure, and variables used in the study, which will give you insight into data that you can use for your research question. You should compile this research into an annotated bibliography that you’ll use for the literature review in your big data project proposal. The annotated bibliography should include 7 - 10 works on your research topic that you’ll annotate and share with a peer discussion group. 
This discussion post will be due on Sunday, 10/1.\n\nDue by: Lab 2 - 9/28 11:59 pm EST; Discussion Post 3.3 - 10/1 11:59 pm EST.", "crumbs": [ - "Module 5 - Data Privacy & Stewardship", - "5.1 Data Privacy" + "Module 3 - Discovery & Insights", + "3.1 Open Data and Discovery" ] }, { - "objectID": "modules/module5-1.html#post-discussion-1", - "href": "modules/module5-1.html#post-discussion-1", - "title": "Module 5 - Data Privacy & Stewardship", - "section": "Post Discussion 1", - "text": "Post Discussion 1\nAddress the following:\n\nDescribe at least two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data from the readings this week that I can address in my weekly announcement?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in text using APA style. 
Posts should range between 400-500 words.\nDue by: 8/27 11:59 pm EST", + "objectID": "modules/module4-0.html", + "href": "modules/module4-0.html", + "title": "Module 4 - Bias in Big Data", + "section": "", + "text": "This module explores bias that results from poor data quality and algorithms using big data.", "crumbs": [ - "Module 5 - Data Privacy & Stewardship", - "5.1 Data Privacy" + "Module 4 - Bias in Big Data" ] }, { - "objectID": "modules/module4-1.html#read", - "href": "modules/module4-1.html#read", + "objectID": "modules/module4-0.html#introduction", + "href": "modules/module4-0.html#introduction", "title": "Module 4 - Bias in Big Data", - "section": "Read", - "text": "Read", + "section": "", + "text": "This module explores bias that results from poor data quality and algorithms using big data.", "crumbs": [ - "Module 4 - Bias in Big Data", - "4.1 Data Quality" + "Module 4 - Bias in Big Data" ] }, { - "objectID": "modules/module4-1.html#post-discussion-1", - "href": "modules/module4-1.html#post-discussion-1", + "objectID": "modules/module4-0.html#content", + "href": "modules/module4-0.html#content", "title": "Module 4 - Bias in Big Data", - "section": "Post Discussion 1", - "text": "Post Discussion 1\nAddress the following:\n\nDescribe at least two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data from the readings this week that I can address in my weekly announcement?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in text using APA style. 
Posts should range between 400-500 words.\nDue by: 8/27 11:59 pm EST", + "section": "Content", + "text": "Content\n\n\n\nSection\nAssignment\nDue Date\n\n\n\n\n4.1 Data Quality\nDiscussion Post\n10/15\n\n\n4.2 Bias in Modeling\nDiscussion Post\n10/22\n\n\n4.2 Bias in Modeling\nDiscussion Response\n10/26\n\n\n4.2 Bias in Modeling\nLab 4\n11/2", "crumbs": [ - "Module 4 - Bias in Big Data", - "4.1 Data Quality" + "Module 4 - Bias in Big Data" ] }, { - "objectID": "modules/module3-1.html#read", - "href": "modules/module3-1.html#read", - "title": "3.1 Open Data and Discovery", + "objectID": "modules/module4-2.html#read", + "href": "modules/module4-2.html#read", + "title": "4.2 Bias in Modeling", "section": "Read", - "text": "Read\n\nOpen Knowledge Network, The Open Data Handbook, n.d., (**Read the first three sections, Introduction through What is Open Data?**)\nNYC OpenData, Project Gallery (**View sample projects.**)\nS. Temiz, M. Holgersson, J. Björkdahl, M.W. Wallin.Open data: Lost opportunity or unrealized potential? Technovation, Volume 114, 2022.\nBourke, D. A Gentle Introduction to Exploratory Data Analysis. 2019", + "text": "Read\n\nO’Neil, L. (2023, August 12). These Women Warned Of AI’s Dangers And Risks Long Before ChatGPT. Rolling Stone. **Read the article and choose one of the featured women to further explore her work (as linked throughout the article).**\nHendrycks, D. et al. 2023. An Overview of Catastrophic AI Risks. Center for AI Safety. **You can also read an overview here.**\nObermeyer et al. 2021. Algorithmic Bias Playbook. 
Chicago Booth School of Business.", "crumbs": [ - "Module 3 - Discovery & Insights", - "3.1 Open Data and Discovery" + "Module 4 - Bias in Big Data", + "4.2 Bias in Modeling" ] }, { - "objectID": "modules/module3-1.html#post-discussion-1", - "href": "modules/module3-1.html#post-discussion-1", - "title": "3.1 Open Data and Discovery", - "section": "Post Discussion 1", - "text": "Post Discussion 1\nAddress the following:\n\nDescribe at least two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data from the readings this week that I can address in my weekly announcement?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in text using APA style. Posts should range between 400-500 words.\nDue by: 8/27 11:59 pm EST", + "objectID": "modules/module4-2.html#post", + "href": "modules/module4-2.html#post", + "title": "4.2 Bias in Modeling", + "section": "Post", + "text": "Post\nAddress the following in Discussion 6:\nO’Neil’s article (2023) reports on the voices of five women from tech fields who have been exposing problems with bias and discrimination in algorithms for years. Hendrycks and colleagues (2023) released a report earlier this month warning that AI poses catastrophic risks to humanity. Answer the following:\n\nAfter reading the O’Neil article, explore further the work of Buolamwini, Chowdhury, Gangadharan, Gebru, or Noble. Describe the site/content you reviewed, why you chose it, and what you learned.\nThe report by Hendrycks and colleagues (2023) underpins the concerns of the “AI Doomers” as described in the O’Neil article (2023). One of the signers of the Statement on AI risks is Geoffrey Hinton, an Emeritus Professor of Computer Science at the University of Toronto.
He is quoted by O’Neil as saying, “I believe that the possibility that digital intelligence will become much smarter than humans and will replace us as the apex intelligence is a more serious threat to humanity than bias and discrimination, even though bias and discrimination are happening now and need to be confronted urgently.” Do you agree? Why or why not?\nObermeyer and colleagues (2021) provide practical steps that organizations can take to diminish the harmful effects of AI. Which of these overlap with calls from Buolamwini, Chowdhury, Gangadharan, Gebru, Noble, and the “AI Doomers”? Which steps do you think are most urgent and why?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words.\nDue by: 10/22 11:59 pm EST", "crumbs": [ - "Module 3 - Discovery & Insights", - "3.1 Open Data and Discovery" + "Module 4 - Bias in Big Data", + "4.2 Bias in Modeling" ] }, { - "objectID": "modules/module3-2.html#read", - "href": "modules/module3-2.html#read", - "title": "3.2 Machine Learning & Prediction", - "section": "Read", - "text": "Read\n\nBrown, S. Machine Learning, Explained. MIT Sloan School of Management. 2021.\nMaini, V. Machine Learning for Humans. 2017. *Read Introduction** and dive deeper into types of ML per your interests.\nOffice for Artificial Intelligence (UK.GOV), 2020. A guide to using AI in the public sector. *Read p. 1-20. Pay attention to the examples of uses for ML and consider how your big data project fits with the method(s).*", + "objectID": "modules/module4-2.html#respond", + "href": "modules/module4-2.html#respond", + "title": "4.2 Bias in Modeling", + "section": "Respond", + "text": "Respond\nThis week you are assigned to small groups again to learn what content your peers explored and their thoughts on the risks of AI.
Read two of your group members’ posts and describe how your viewpoints converge and diverge. Please make sure that everyone in your group gets at least one set of comments.\nDue by: 10/26 at 11:59pm EST", "crumbs": [ - "Module 3 - Discovery & Insights", - "3.2 Eyes in the Skies & Remote Sensing" + "Module 4 - Bias in Big Data", + "4.2 Bias in Modeling" ] }, { - "objectID": "modules/module3-2.html#post-discussion-1", - "href": "modules/module3-2.html#post-discussion-1", - "title": "3.2 Eyes in the Skies & Remote Sensing", - "section": "Post Discussion 1", - "text": "Post Discussion 1\nAddress the following:\n\nDescribe at least two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data from the readings this week that I can address in my weekly announcement?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in text using APA style. Posts should range between 400-500 words.\nDue by: 8/27 11:59 pm EST", + "objectID": "modules/module4-2.html#complete", + "href": "modules/module4-2.html#complete", + "title": "4.2 Bias in Modeling", + "section": "Complete", + "text": "Complete\nThe final lab of the semester explores bias in machine learning. 
Read the Lab 4 instructions and answer all bolded questions in each section.\nDue by: 11/2 at 11:59pm EST", "crumbs": [ - "Module 3 - Discovery & Insights", - "3.2 Eyes in the Skies & Remote Sensing" + "Module 4 - Bias in Big Data", + "4.2 Bias in Modeling" ] }, { - "objectID": "modules/module4-2.html#read", - "href": "modules/module4-2.html#read", - "title": "Module 4 - Bias in Big Data", + "objectID": "modules/module5-1.html#read", + "href": "modules/module5-1.html#read", + "title": "5.2 Data Privacy", "section": "Read", - "text": "Read", + "text": "Read\n\n\nRead and Listen\n\n\n\n\nHot off the press this month from the Pew Research Center. . .\n\nFirst, take this quiz to test your knowledge of digital topics relative to a nationally representative survey of 5,101 randomly selected U.S. adults in May 2023.\nNext, read the report titled How Americans View Data Privacy by Colleen McClain, Michelle Faverio, Monica Anderson, and Eugenie Park (October 2023). **Read p. 1-12 and take a deeper dive into a set of responses that interest you.**\n\nWNYC Note to Self, Privacy Paradox, 2017. Listen to at least one of the Day 1-5 Challenges AND complete the challenge it poses at the end. All five are interesting and worth your time. See this list of the actions discussed in the podcasts that you can take to protect your data privacy.\nTED Radio Hour, Edward Snowden: Why does online privacy matter?, 2020. \nCongressional Research Service, “Overview of the American Data Privacy and Protection Act (ADPPA), H.R. 8152,” 2022.\n\n\n\nPost\nAddress the following on the 5.2 Data Privacy discussion board:\n\n\n\nDescribe one statistic from the Pew survey that you feel is most urgent for policymakers to address. Why?\nDescribe which Privacy Paradox Challenge you chose, what information surprised you from the podcast, and what you learned from completing the challenge at the end.
\nThe proposed American Data Privacy and Protection Act (ADPPA) from 2022 is the most recent action that Congress has taken to protect your data privacy. Describe one of the facets of the bill that you think will address the statistic you cited from the Pew survey and how it will work. Also discuss how it may fall short.\n\n\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words. \nDue by: 11/5 11:59 pm EST\nRespond\nThis week you are assigned to small groups for the last time this semester to learn what content your peers explored and their thoughts on data privacy. Read two of your group members’ posts and describe how your viewpoints converge and diverge. Please make sure that everyone in your group gets at least one set of comments.\nDue by: 11/9 at 11:59pm EST", "crumbs": [ - "Module 4 - Bias in Big Data", - "4.2 Bias in Modeling" + "Module 5 - Data Privacy & Stewardship", + "5.1 Data Privacy" ] }, { - "objectID": "modules/module4-2.html#post-discussion-1", - "href": "modules/module4-2.html#post-discussion-1", - "title": "Module 4 - Bias in Big Data", + "objectID": "modules/module5-1.html#post-discussion-1", + "href": "modules/module5-1.html#post-discussion-1", + "title": "5.2 Data Privacy", "section": "Post Discussion 1", "text": "Post Discussion 1\nAddress the following:\n\nDescribe at least two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data from the readings this week that I can address in my weekly announcement?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in text using APA style.
Posts should range between 400-500 words.\nDue by: 8/27 11:59 pm EST", "crumbs": [ - "Module 4 - Bias in Big Data", - "4.2 Bias in Modeling" + "Module 5 - Data Privacy & Stewardship", + "5.1 Data Privacy" ] }, { - "objectID": "modules/module1-3.html#post", - "href": "modules/module1-3.html#post", - "title": "1.3 The Challenges of Big Data", - "section": "Post", - "text": "Post\nAddress the following in Discussion 2:\n\nUsing all three of this week’s readings, describe the challenges of big data that you feel are urgent to address by the public and why.\nWhat issues do the authors address that you don’t feel are urgent? Why?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words.\nDue by: 9/10 11:59 pm EST", + "objectID": "modules/module6-0.html#data-certifications", + "href": "modules/module6-0.html#data-certifications", + "title": "Course Completion Checklist", + "section": "Data Certifications", + "text": "Data Certifications\nAt the beginning of the semester you submitted a plan to complete five workshops to advance your data knowledge and skills. To complete this requirement for the course:\n\nWrite a short report after each data workshop by describing:\n\nWhat you learned (in your own words, not the workshop description);\n\nHow you’ll use the new skills in your project proposal (or elsewhere); and,\n\nWhether you recommend the workshop to other students.\n\nSubmit an example of work that you generated in the training. You can embed it in your reflection or attach it separately.\nLabel each report with the workshop title, link, and date completed.\nSubmit each report to its separate assignment folder. 
All data workshop reflections are due by 12/3 at 11:59 pm.\nDon’t forget to add your data certifications to your resume and/or LinkedIn profile!", "crumbs": [ - "Module 1 - Introduction to Big Data", - "1.3 The Challenges of Big Data" + "Module 6 - Course Conclusion", + "Course Completion Checklist" ] }, { - "objectID": "modules/module2-2.html#post", - "href": "modules/module2-2.html#post", - "title": "2.2 Eyes in the Skies & Remote Sensing", - "section": "Post", - "text": "Post\nAddress the following in Discussion 3:\n\nThe readings from the last two submodules (2.1 and 2.2) describe use cases for social data, surveillance systems, satellite and aerial imagery, and other types of sensing data (e.g., biometric, IOT, etc.). Describe what promise these types of data hold for the public sector in your interest areas. What concerns do you have about the use of these data for the purposes you describe or otherwise?\nRevisit the research question that you’re exploring for your big data project proposal. Describe types of data from the last two submodules (2.1 and 2.2) that you hadn’t considered before but could incorporate into your proposal. Describe how they might improve your project and their limitations.\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in-text using APA style. Posts should range between 400-500 words.\nDue by: 9/24 11:59 pm EST", + "objectID": "modules/module6-0.html#big-data-project-proposal", + "href": "modules/module6-0.html#big-data-project-proposal", + "title": "Course Completion Checklist", + "section": "Big Data Project Proposal", + "text": "Big Data Project Proposal\n\nEarly in the semester you identified a research question and examined existing literature on the topic of your big data project proposal, so you should have a good start on this final deliverable for the class. 
As you complete your proposal, review the instructions to make sure you include requested information in each section and follow formatting guidelines. You also can view these sample proposals from prior students in this course:\n\nSample 1\nSample 2\n\nAs you complete your proposal, use Grammarly to check for poor writing, spelling, and plagiarism.\nSubmit the proposal to the Big Data Project Proposal assignment folder by 12/10 11:59pm. **This date is slightly later than the one in the syllabus.**\nIt may take a few minutes after you upload your paper, but you will have access to a Similarity Report that checks for plagiarism in your paper. You can access the Similarity Report using these instructions. If you need to make changes to your paper, do so and upload it again to the same assignment folder.\n\nHave a wonderful winter break!", "crumbs": [ - "Module 2 - Types of big data", - "2.2 Eyes in the Skies & Remote Sensing" + "Module 6 - Course Conclusion", + "Course Completion Checklist" ] }, { - "objectID": "discussions/M1-3.html#m1.3-the-challenges-of-big-data", - "href": "discussions/M1-3.html#m1.3-the-challenges-of-big-data", - "title": "Discussion 2", + "objectID": "resources/about-course-project.html#instruction-on-how-to-use-the-course-materiterals", + "href": "resources/about-course-project.html#instruction-on-how-to-use-the-course-materiterals", + "title": "About the Big Data for Public Good Open Course", + "section": "Instructions on how to use the course materials", + "text": "Instructions on how to use the course materials" + }, { + "objectID": "resources/about-course-project.html#contact", + "href": "resources/about-course-project.html#contact", + "title": "About the Big Data for Public Good Open Course", + "section": "Contact", + "text": "Contact" + }, { + "objectID": "resources/rubrics-discussion.html", + "href": "resources/rubrics-discussion.html", + "title": "Discussion Rubrics", "section": "", - "text": "Post\nAddress the following:\n\nUsing all
three of this week’s readings, describe the challenges of big data that you feel are urgent to address by the public and why.\nWhat issues do the authors address that you don’t feel are urgent? Why?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words.\nDue by: 9/10 11:59 pm EST\nSee here for Rubrics", + "text": "Criteria\nAdvanced\nProficient\nDeveloping\nEmerging\n\n\nAssignment Specific Grading Items\n5 points\nAssignment questions and required items are addressed comprehensively, accurately and specifically. Offers strong support for arguments or points.\n4 points\nMost questions and requirements are addressed accurately and specifically. Support for arguments or points may be weak.\n3 points\nMissing answers to questions or doesn’t address many requirements. Support for arguments or points is weak.\n2 points\nDoes not address questions or requirements, or attempts are largely inaccurate.\n\n\nCritical Analysis (Understanding of Topic)\n10 points\nWriting displays an excellent understanding of the required readings and underlying concepts including correct use of terminology. Integrates lectures, readings, or relevant research, or specific real-life application (work experience, prior coursework, etc.) to support important points.\n8 points\nWriting repeats and summarizes basic, correct information, but does not apply enough lectures, readings, or relevant research or specific real-life applications\n6 points\nWriting shows little or no evidence that class lectures and readings were understood. 
Writings are largely personal opinions or feelings, or “I agree” or “Great idea,” without supporting statements with concepts from the readings, outside resources, relevant research, or specific real-life application.\n4 points\nWriting lacks evidence of critical analysis and makes poor use of supportive evidence.\n\n\nWriting & Clarity\n5 points\nAnswers are clear, well-written and free of grammatical and spelling mistakes. No errors in APA style. Scholarly style. Cites all data obtained from other sources. APA citation style is used in the text and bibliography.\n4 points\nAnswers are clear, with a few small grammatical and spelling mistakes. Rare errors in APA style that do not detract from the paper. Scholarly style. Cites most data obtained from other sources.\n3 points\nAnswers are clear but there are many spelling and grammatical mistakes. Errors in APA style are noticeable. Cites some data obtained from other sources. Citation style is either inconsistent or incorrect.\n2 points\nAnswers are unclear or difficult to understand, and/or there are many spelling and grammatical mistakes. Errors in APA style detract substantially from the paper. Word choice is informal in tone. Does not cite sources.\n\n\n\nTotal Score of Grading Rubric_Discussion Posts, / 20", "crumbs": [ - "Discussion 2" + "Discussion Rubrics" ] }, { - "objectID": "discussions/M1-3.html#respond", - "href": "discussions/M1-3.html#respond", - "title": "Discussion 2", - "section": "Respond", - "text": "Respond\nYou are assigned to a small discussion group this week. Read the posts of your peers and critique two group members’ concern/lack of concern about big data. Is the concern outweighed by the benefits of the data/its use? What’s missing from your peers’ argument?\nDue by: 9/14 11:59 pm EST", - "crumbs": [ - "Discussion 2" - ] + "objectID": "resources.html", + "href": "resources.html", + "title": "Resources", + "section": "", + "text": "Add class resources here."
}, { - "objectID": "modules/module3-0.html", - "href": "modules/module3-0.html", - "title": "Module 3 - Discovery & Insights", + "objectID": "assignments/project_proposal.html#requirements", + "href": "assignments/project_proposal.html#requirements", + "title": "Big Data Project Proposal", + "section": "Requirements", + "text": "Requirements\nFor this assignment, you’ll write a proposal of at least five pages (1.15” line spacing, 1” margins) and describe how you could conduct a big data project on your topic of interest. Your proposals should include the following sections:\n\nI. Introduction\nTreat the introduction as the initial pitch of your idea or a summary of the significance of a research problem. After reading the introduction, I should have an understanding of what you want to do and a sense of why it’s worth doing.\nThink about your introduction as a narrative written in two to four paragraphs that succinctly answers the following four questions:\n\nWhat is the central research problem?\nWhat is the topic of study related to that research problem?\nWhat methods should be used to analyze the research problem?\nWhere does this project fit in the research that’s already been done? What’s its contribution?\n\n\n\nII. Purpose\nThis section explains the context of your proposal and describes in detail why it’s important. While there are no prescribed rules, you should:\n\nProvide a more detailed explanation about the purpose of the study than what you stated in the introduction. This is particularly important if the problem is complex or multifaceted.\nDescribe the major issues or problems to be addressed by your research. This can be in the form of questions to be addressed or hypotheses.\nEnd this section with a short paragraph that describes the organization of the remainder of your proposal.\n\n\n\nIII. Literature Review\nConnected to the purpose of your study is a deliberate review and synthesis of prior studies related to your research question.
The purpose is to place your project within the larger context of what’s already been done on the topic, while demonstrating that your work is original and innovative. Describe what questions other researchers have asked, what methods they have used, and your understanding of their findings. This section should be no less than one - two pages.\nSince a literature review is information dense, it is crucial that this section is well-structured to enable a reader to grasp the key arguments underpinning your proposed study in relation to that of other researchers. You can organize it historically, by methodology, by themes within the subject, etc. You must synthesize the research, discuss what gaps remain, and state how your research fills the gap.\n\n\nIV. Research Design and Methods\nThe objective of this section is to convince the reader that your overall research design and proposed methods of analysis will correctly address the problem and that the methods will effectively interpret the potential results.\nDescribe the overall research design by building upon and drawing examples from your review of the literature. Consider not only methods that other researchers have used but methods of data gathering that have not been used but could be. Be specific about the methodological approaches you plan to undertake to obtain information, the techniques you will use to analyze the data, and the assessments of external validity that apply.\nWhen describing the methods, include the following:\n\nDescribe the data source(s) you will use. Who/what generates the data? What time frames will you use? What’s the unit of analysis (e.g., individuals, events, etc.)\nDescribe how you plan to obtain the data, or how you got it if you already have it. 
Describe the tools you need to use to get the data (if not downloading it in a structured form.)\nGive a summary of the cleaning/joining of data that you expect to do before you begin your analysis and any tools/applications you need to do it.\nDescribe the phases of the project that involve analysis. These will include exploratory data analysis and visualization. Some projects may stop here. Others will move further into the data science cycle and require an explanation of modeling using causal inference, machine learning, etc.\nDescribe the data products of your project, which may include results of statistical tests, performance analyses of learning algorithms, visualizations of the data or model parameters.\n\n\n\nV. Challenges/Limitations\nAnticipate and acknowledge any potential barriers and pitfalls in carrying out your research design and explain if you can address them. No method is perfect so you need to describe where you believe challenges may exist in obtaining data or accessing information.\n\n\nVI. Citations\nAs with any scholarly research paper, you must cite the sources you used. List only the literature that you actually used or cited in your proposal. This section does not count toward the five page minimum." + }, + { + "objectID": "modules/module5-0.html", + "href": "modules/module5-0.html", + "title": "Module 5 - Data Privacy & Stewardship", "section": "", - "text": "This model explores use cases of big data for insights and prediction. We’ll spend the first week learning about open data and its value to the public, followed by two weeks learning about algorithms and machine learning.", + "text": "Your personal data is everywhere and is being used by private and public entities for purposes you may not like. This module explores issues of data privacy and what options we have for protecting it. 
It also describes the responsibility that public entities have to collect, manage, and secure it.", "crumbs": [ - "Module 3 - Discovery & Insights" + "Module 5 - Data Privacy & Stewardship" ] }, { - "objectID": "modules/module3-0.html#introduction", - "href": "modules/module3-0.html#introduction", - "title": "Module 3 - Discovery & Insights", + "objectID": "modules/module5-0.html#introduction", + "href": "modules/module5-0.html#introduction", + "title": "Module 5 - Data Privacy & Stewardship", "section": "", - "text": "This model explores use cases of big data for insights and prediction. We’ll spend the first week learning about open data and its value to the public, followed by two weeks learning about algorithms and machine learning.", + "text": "Your personal data is everywhere and is being used by private and public entities for purposes you may not like. This module explores issues of data privacy and what options we have for protecting it. It also describes the responsibility that public entities have to collect, manage, and secure it.", "crumbs": [ - "Module 3 - Discovery & Insights" + "Module 5 - Data Privacy & Stewardship" ] }, { - "objectID": "modules/module3-0.html#content", - "href": "modules/module3-0.html#content", - "title": "Module 3 - Discovery & Insights", + "objectID": "modules/module5-0.html#content", + "href": "modules/module5-0.html#content", + "title": "Module 5 - Data Privacy & Stewardship", "section": "Content", - "text": "Content\n\n\n\n\n\n\n\n\nSection\nAssignment\nDue Date\n\n\n\n\n3.1 Open Data and Discovery\nLab 2\n9/21\n\n\n3.2 Machine Learning & Prediction\nDiscussion Post\n9/24\n\n\n3.2 Machine Learning & Prediction\nDiscussion Peer Response\n10/5", + "text": "Content\n\n\n\nSection\nAssignment\nDue Date\n\n\n\n\n5.1 Data Privacy\nDiscussion Post\n11/5\n\n\n5.1 Data Privacy\nDiscussion Peer Response\n11/9\n\n\n5.2 Data Stewardship\nDiscussion Post\n11/16", "crumbs": [ - "Module 3 - Discovery & Insights" + "Module 5 - Data Privacy 
& Stewardship" ] }, { - "objectID": "modules/module3-1.html#reflect", - "href": "modules/module3-1.html#reflect", - "title": "3.1 Open Data and Discovery", - "section": "Reflect", - "text": "Reflect\nFor the past several weeks you’ve exposed to types of big data that may be useful for your project proposal that’s due at the end of the semester. This week’s readings introduce you to open data that may be available for your project. Through the readings, you may learn about new repositories or sources of data for your project, including those made available through formal research.\nThis week you’re also being exposed to a very important part of a data project - exploratory data analysis (EDA). The blog by Bourke (2019) describes the key steps in exploratory data analysis that you’ll need to consider as you determine the focus and feasibility of your big data project. Although you won’t be responsible for manipulating or analyzing data for your project, you do need to know where you’ll get it (source), its type, structure, and features.", + "objectID": "modules/module5-1.html#read-and-listen", + "href": "modules/module5-1.html#read-and-listen", + "title": "5.1 Data Privacy", + "section": "Read and Listen", + "text": "Read and Listen\n\nHot off the press this month from the Pew Research Center. . .\n\nFirst, take this quiz to test your knowledge of digital topics relative to a nationally representative survey of 5,101 randomly selected U.S. adults in May 2023.\nNext, read the report titled “How Americans View Data Privacy” by Colleen McClain, Michelle Faverio, Monica Anderson and Eugenie Parkand (October 2023). **Read p. 1-12 and take a deeper dive into a set of responses that interest you.**\n\nWNYC Note to Self, Privacy Paradox, 2017. Listen to at least one of the Day 1-5 Challenges AND complete the challenge it poses at the end. All five are interesting and worth your time. 
See this list of the actions discussed in the podcasts that you can take to protect your data privacy.\nTed Radio Hour, Edward Snowden: Why does online privacy matter?, 2020. \nCongressional Research Service, “Overview of the American Data Privacy and Protection Act (ADPPA), H.R. 8152,” 2022.", "crumbs": [ - "Module 3 - Discovery & Insights", - "3.1 Open Data and Discovery" + "Module 5 - Data Privacy & Stewardship", + "5.1 Data Privacy" ] }, { - "objectID": "modules/module3-1.html#complete", - "href": "modules/module3-1.html#complete", - "title": "3.1 Open Data and Discovery", - "section": "Complete", - "text": "Complete\nThis week you’ll complete two tasks.\n\nThe first task is Lab 2. It’s a continuation of Lab 1 and explores “feature selection” in the data project life cycle. It asks you to consider the quality and types of data that are used in research and measurement. Lab 2 is due on Thursday, 9/28.\nThe second task is to read the instructions for your big data project and begin compiling existing research on the topic you’ve chosen. Reading peer-reviewed articles in academic journals or think tank reports can help you identify big data sources and how they’ve been used to explore your topic. The data and methods sections of research articles/reports should detail the source, type, structure, and variables used in the study, which will give you insight into data that you can use for your research question. You should compile this research into an annotated bibliography that you’ll use for the literature review in your big data project proposal. The annotated bibliography should include 7 - 10 works on your research topic that you’ll annotate and share with a peer discussion group. 
This discussion post will be due on Sunday, 10/1.\n\nDue by: Lab 2 - 9/28 11:59 pm EST; Discussion Post 3.3 - 10/1 11:59 pm EST.", + "objectID": "modules/module5-1.html#post", + "href": "modules/module5-1.html#post", + "title": "5.1 Data Privacy", + "section": "Post", + "text": "Post\nAddress the following in the 5.1 Data Privacy discussion board:\n\nDescribe at least two characteristics of big data and how they apply to data produced or used by your field of interest.\nDescribe ideas presented in two of this week’s readings that you had not considered before.\nWhat questions do you have about big data from the readings this week that I can address in my weekly announcement?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings you describe using in text using APA style. Posts should range between 400-500 words.\nDue by: 8/27 11:59 pm EST", "crumbs": [ - "Module 3 - Discovery & Insights", - "3.1 Open Data and Discovery" + "Module 5 - Data Privacy & Stewardship", + "5.1 Data Privacy" ] }, { - "objectID": "modules/module3-2.html#respond", - "href": "modules/module3-2.html#respond", - "title": "3.2 Machine Learning & Prediction", - "section": "Respond", - "text": "Respond\nYou posted an annotated bibliography for your big data project that summarizes 7-10 articles/reports related to your research topic. This exercise prepares you for the literature review section of your proposal and exposes you to data and methods used by other researchers to answer your research question (or a related one.)\nThis week you are assigned to small groups to offer feedback on your peers’ questions and progress on their big data proposals. Read two of your group members’ annotated bibliographies and provide constructive comments on strengths and weaknesses. 
Please make sure that everyone in your group gets at least one set of comments.\nDue by: 10/5 at 11:59pm EST", + "objectID": "discussions/M5-1.html", + "href": "discussions/M5-1.html", + "title": "Discussion 7", + "section": "", + "text": "Post\nAddress the following:\n\nDescribe two differences between US and EU data privacy laws. What are the implications of these differences for safeguarding your personal information?\nAs we’ve learned this semester, your personal data can be used by different entities for economic gain as well as improvements to how we live, work, and play. If you could choose how your data are used, what would you permit versus prohibit and why?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words.\nDue by: 11/16 11:59 pm EST\nSee here for Rubrics", "crumbs": [ - "Module 3 - Discovery & Insights", - "3.2 Eyes in the Skies & Remote Sensing" + "Discussion 7" ] }, { - "objectID": "modules/module3-2.html#complete", - "href": "modules/module3-2.html#complete", - "title": "3.2 Machine Learning & Prediction", - "section": "Complete", - "text": "Complete\nLab 3 unpacks an essential process of machine learning called feature engineering. Sometimes data as “features” nicely fit the construct that needs to be measured (e.g., height as measured in inches or cm). Other times, the data must be converted to a form that can be useful for prediction. For example, when you’re trying to use digital data to predict images that contain cats, the machine needs to know what features to look for to distinguish a cat from a dog. This lab pulls back the curtain on how these processes are engineered with digital data to give you insights into how machines “learn” to predict X. 
The lab is largely instructional; however, as you learn about these processes, consider how they influence the output of machine learning and how they can introduce errors.\nDue by: 10/8 at 11:59pm EST", + "objectID": "discussions/M5-1.html#m5.2-data-stewardship", + "href": "discussions/M5-1.html#m5.2-data-stewardship", + "title": "Discussion 7", + "section": "", + "text": "Post\nAddress the following:\n\nDescribe two differences between US and EU data privacy laws. What are the implications of these differences for safeguarding your personal information?\nAs we’ve learned this semester, your personal data can be used by different entities for economic gain as well as improvements to how we live, work, and play. If you could choose how your data are used, what would you permit versus prohibit and why?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. Posts should range between 400-500 words.\nDue by: 11/16 11:59 pm EST\nSee here for Rubrics", "crumbs": [ - "Module 3 - Discovery & Insights", - "3.2 Eyes in the Skies & Remote Sensing" + "Discussion 7" + ] + }, + { + "objectID": "discussions/M5-1.html#post", + "href": "discussions/M5-1.html#post", + "title": "Discussion 7", + "section": "Post", + "text": "Post\nAddress the following:\n\nDescribe two differences between US and EU data privacy laws. What are the implications of these differences for safeguarding your personal information?\nAs we’ve learned this semester, your personal data can be used by different entities for economic gain as well as improvements to how we live, work, and play. If you could choose how your data are used, what would you permit versus prohibit and why?\n\nDiscussion posts are the primary assessment of your understanding and critical assessment of readings. You must reference the readings analyzed in your posts using in-text APA style. 
Posts should range between 400-500 words.\nDue by: 11/16 11:59 pm EST\nSee here for Rubrics", + "crumbs": [ + "Discussion 7" ] } ] \ No newline at end of file diff --git a/docs/site_libs/quarto-html/quarto.js b/docs/site_libs/quarto-html/quarto.js index 91be522..3ebd49c 100644 --- a/docs/site_libs/quarto-html/quarto.js +++ b/docs/site_libs/quarto-html/quarto.js @@ -9,7 +9,7 @@ const layoutMarginEls = () => { // Find any conflicting margin elements and add margins to the // top to prevent overlap const marginChildren = window.document.querySelectorAll( - ".column-margin.column-container > * " + ".column-margin.column-container > *, .margin-caption, .aside" ); let lastBottom = 0; @@ -19,7 +19,9 @@ const layoutMarginEls = () => { marginChild.style.marginTop = null; const top = marginChild.getBoundingClientRect().top + window.scrollY; if (top < lastBottom) { - const margin = lastBottom - top; + const marginChildStyle = window.getComputedStyle(marginChild); + const marginBottom = parseFloat(marginChildStyle["marginBottom"]); + const margin = lastBottom - top + marginBottom; marginChild.style.marginTop = `${margin}px`; } const styles = window.getComputedStyle(marginChild); @@ -33,7 +35,15 @@ window.document.addEventListener("DOMContentLoaded", function (_event) { // Recompute the position of margin elements anytime the body size changes if (window.ResizeObserver) { const resizeObserver = new window.ResizeObserver( - throttle(layoutMarginEls, 50) + throttle(() => { + layoutMarginEls(); + if ( + window.document.body.getBoundingClientRect().width < 990 && + isReaderMode() + ) { + quartoToggleReader(); + } + }, 50) ); resizeObserver.observe(window.document.body); } diff --git a/docs/site_libs/quarto-nav/quarto-nav.js b/docs/site_libs/quarto-nav/quarto-nav.js index ebfc262..f6a53b1 100644 --- a/docs/site_libs/quarto-nav/quarto-nav.js +++ b/docs/site_libs/quarto-nav/quarto-nav.js @@ -237,6 +237,7 @@ window.document.addEventListener("DOMContentLoaded", function () { const links 
= window.document.querySelectorAll("a"); for (let i = 0; i < links.length; i++) { if (links[i].href) { + links[i].dataset.originalHref = links[i].href; links[i].href = links[i].href.replace(/\/index\.html/, "/"); } } diff --git a/docs/site_libs/quarto-search/quarto-search.js b/docs/site_libs/quarto-search/quarto-search.js index 4a6f7e2..5f723d7 100644 --- a/docs/site_libs/quarto-search/quarto-search.js +++ b/docs/site_libs/quarto-search/quarto-search.js @@ -98,6 +98,7 @@ window.document.addEventListener("DOMContentLoaded", function (_event) { classNames: { form: "d-flex", }, + placeholder: language["search-text-placeholder"], translations: { clearButtonTitle: language["search-clear-button-title"], detachedCancelButtonText: language["search-detached-cancel-button-title"], @@ -392,7 +393,12 @@ window.document.addEventListener("DOMContentLoaded", function (_event) { return focusedEl.tagName.toLowerCase() === tag; }); - if (kbds && kbds.includes(key) && !isFormElFocused) { + if ( + kbds && + kbds.includes(key) && + !isFormElFocused && + !document.activeElement.isContentEditable + ) { event.preventDefault(); window.quartoOpenSearch(); } @@ -669,6 +675,18 @@ function showCopyLink(query, options) { // create the index var fuseIndex = undefined; var shownWarning = false; + +// fuse index options +const kFuseIndexOptions = { + keys: [ + { name: "title", weight: 20 }, + { name: "section", weight: 20 }, + { name: "text", weight: 10 }, + ], + ignoreLocation: true, + threshold: 0.1, +}; + async function readSearchData() { // Initialize the search index on demand if (fuseIndex === undefined) { @@ -679,17 +697,7 @@ async function readSearchData() { shownWarning = true; return; } - // create fuse index - const options = { - keys: [ - { name: "title", weight: 20 }, - { name: "section", weight: 20 }, - { name: "text", weight: 10 }, - ], - ignoreLocation: true, - threshold: 0.1, - }; - const fuse = new window.Fuse([], options); + const fuse = new window.Fuse([], kFuseIndexOptions); 
// fetch the main search.json const response = await fetch(offsetURL("search.json")); @@ -1220,8 +1228,34 @@ function algoliaSearch(query, limit, algoliaOptions) { }); } -function fuseSearch(query, fuse, fuseOptions) { - return fuse.search(query, fuseOptions).map((result) => { +let subSearchTerm = undefined; +let subSearchFuse = undefined; +const kFuseMaxWait = 125; + +async function fuseSearch(query, fuse, fuseOptions) { + let index = fuse; + // Fuse.js using the Bitap algorithm for text matching which runs in + // O(nm) time (no matter the structure of the text). In our case this + // means that long search terms mixed with large index gets very slow + // + // This injects a subIndex that will be used once the terms get long enough + // Usually making this subindex is cheap since there will typically be + // a subset of results matching the existing query + if (subSearchFuse !== undefined && query.startsWith(subSearchTerm)) { + // Use the existing subSearchFuse + index = subSearchFuse; + } else if (subSearchFuse !== undefined) { + // The term changed, discard the existing fuse + subSearchFuse = undefined; + subSearchTerm = undefined; + } + + // Search using the active fuse + const then = performance.now(); + const resultsRaw = await index.search(query, fuseOptions); + const now = performance.now(); + + const results = resultsRaw.map((result) => { const addParam = (url, name, value) => { const anchorParts = url.split("#"); const baseUrl = anchorParts[0]; @@ -1238,4 +1272,15 @@ function fuseSearch(query, fuse, fuseOptions) { crumbs: result.item.crumbs, }; }); + + // If we don't have a subfuse and the query is long enough, go ahead + // and create a subfuse to use for subsequent queries + if (now - then > kFuseMaxWait && subSearchFuse === undefined) { + subSearchTerm = query; + subSearchFuse = new window.Fuse([], kFuseIndexOptions); + resultsRaw.forEach((rr) => { + subSearchFuse.add(rr.item); + }); + } + return results; } diff --git a/docs/syllabus.pdf 
b/docs/syllabus.pdf deleted file mode 100644 index 19d77b2..0000000 Binary files a/docs/syllabus.pdf and /dev/null differ diff --git a/modules/module5-0.qmd b/modules/module5-0.qmd index eec819d..a2acb6f 100644 --- a/modules/module5-0.qmd +++ b/modules/module5-0.qmd @@ -1,5 +1,15 @@ --- -title: Module 5 - Data Privacy & Stewardship +title: "Module 5 - Data Privacy & Stewardship" --- +## Introduction +Your personal data is everywhere and is being used by private and public entities for purposes you may not like. This module explores issues of data privacy and what options we have for protecting it. It also describes the responsibility that public entities have to collect, manage, and secure it. + +## Content + +| Section | Assignment | Due Date | +|----------------------|---------------------|----------| +| 5.1 Data Privacy | Discussion Post | 11/5 | +| 5.1 Data Privacy | Discussion Peer Response | 11/9 | +| 5.2 Data Stewardship | Discussion Post | 11/16 | \ No newline at end of file diff --git a/modules/module5-1.qmd b/modules/module5-1.qmd index 02120d9..64173b9 100644 --- a/modules/module5-1.qmd +++ b/modules/module5-1.qmd @@ -1,28 +1,30 @@ --- -title: Module 5 - Data Privacy & Stewardship +title: "5.1 Data Privacy" --- - - +:---------------------------:+ -| ### Data Skepticism | +| ### The Privacy Paradox | +-----------------------------+ -> This week's readings discuss some challenges of big data and how they're used. They also discuss privacy concerns about use of our digital data and why you should care about it. +> Although a majority of people claim that privacy is important to them, few take steps to protect it online or through devices that sense location, voice, images, and other identifiable features. This module explores the reasons for this paradox and what actions policy-makers (and you) can take to protect the big data that you generate. + +## Read and Listen + +1. Hot off the press this month from the Pew Research Center. . . -## Read + 1. 
First, [take this quiz to test your knowledge of digital topics](https://www.pewresearch.org/internet/quiz/digital-knowledge-quiz-2023/) relative to a nationally representative survey of 5,101 randomly selected U.S. adults in May 2023. -1. + 2. Next, read the report titled "How Americans View Data Privacy" by Colleen McClain, Michelle Faverio, Monica Anderson and Eugenie Park (October 2023). **Read p. 1-12 and take a deeper dive into a set of responses that interest you.** -2. +2. WNYC Note to Self, [Privacy Paradox](https://project.wnyc.org/privacy-paradox/), 2017. Listen to **at least one of the Day 1-5 Challenges AND complete the challenge it poses at the end**. All five are interesting and worth your time. [See this list of the actions](https://www.wnyc.org/story/privacy-paradox-tip-sheet/) discussed in the podcasts that you can take to protect your data privacy. -3. +3. Ted Radio Hour, [Edward Snowden: Why does online privacy matter?](https://www.npr.org/transcripts/818341273), 2020.  -4. +4. Congressional Research Service, "[Overview of the American Data Privacy and Protection Act (ADPPA), H.R. 8152](https://crsreports.congress.gov/product/pdf/LSB/LSB10776)," 2022. -## Post [Discussion 1](/discussions/M1-1.qmd) +## Post -Address the following: +Address the following in the [5.1 Data Privacy](/discussions/M5-1.qmd) discussion board: 1. Describe at least two characteristics of big data and how they apply to data produced or used by your field of interest. diff --git a/modules/module6-0.qmd b/modules/module6-0.qmd index 269024e..38b9a46 100644 --- a/modules/module6-0.qmd +++ b/modules/module6-0.qmd @@ -1,4 +1,42 @@ --- -title: Module 6 - Course Conclusion +title: "Course Completion Checklist" --- ++------------------------------------+ +| ### Great Job this Semester | ++------------------------------------+ + +> I've enjoyed getting to know you this semester and encourage you to stay in touch as you finish your graduate degrees.
Feel free to contact me about topics in this course or if you just want to talk about your career. + +## Data Certifications + +At the beginning of the semester you submitted a plan to complete five workshops to advance your data knowledge and skills. To complete this requirement for the course: + +- Write a short report after each data workshop by describing: + + 1. What you learned (in your own words, not the workshop description);\ + 2. How you'll use the new skills in your project proposal (or elsewhere); and,\ + 3. Whether you recommend the workshop to other students. + + **Submit an example of work that you generated in the training**. You can embed it in your reflection or attach it separately. + + Label each report with the workshop title, link, and date completed. + +- Submit each report to its separate assignment folder. **All data workshop reflections are due by 12/3 at 11:59 pm**. + +- Don't forget to add your data certifications to your resume and/or LinkedIn profile! + +## Big Data Project Proposal + +- Early in the semester you identified a research question and examined existing literature on the topic of your big data project proposal, so you should have a good start on this final deliverable for the class. As you complete your proposal, [review the instructions](https://gastate.view.usg.edu/content/enforced2/2885410-CO.090.EC.ECON8000.XLS.PZ11.20242/Files_Module%203/Big%20Data%20Project%20Proposal%20Instructions.docx?ou=2885410) to make sure you include requested information in each section and follow formatting guidelines. You also can view these sample proposals from prior students in this course: + + - Sample 1 + + - Sample 2 +- As you complete your proposal, use Grammarly to check for poor writing, spelling, and plagiarism. + +- Submit the proposal to the [Big Data Project Proposal](/assignments/project_proposal.qmd) assignment folder by **12/10 11:59pm**. 
**This is a slightly later date than the syllabus.** + +- It may take a few minutes after you upload your paper, but you will have access to a [Similarity Report](https://help.turnitin.com/feedback-studio/d2l/student/the-similarity-report/interpreting-the-similarity-report.htm) that checks for plagiarism in your paper. You can [access the Similarity Report using these instructions](https://help.turnitin.com/feedback-studio/d2l/student/the-similarity-report/accessing-the-similarity-report.htm). If you need to make changes to your paper, do so and upload it again to the same assignment folder. + +**Have a wonderful winter break!**