diff --git a/gradproject.md b/gradproject.md
index f8dfcbf..6f06c65 100644
--- a/gradproject.md
+++ b/gradproject.md
@@ -8,9 +8,6 @@ markdown: kramdown
 # Graduate Project
 {:.no_toc}
-
-
 * TOC
 {:toc}
@@ -20,11 +17,6 @@ The graduate project is **offered only to students enrolled in Data C200, CS C20
 The purpose of the project is to give students experience in both open-ended data science analysis and research in general.
-
-
-
-
-
 ## Deliverables
 The graduate project element will require the following deliverables:
@@ -69,7 +61,6 @@ The graduate project element will require the following deliverables:
 ## Datasets
 This section contains the topics we will provide to you to explore your research questions. Please choose one of the following datasets to work on. **You will be expected to complete all (2) tasks provided for your chosen dataset.**
-
 ### Accessing Datasets
@@ -246,99 +237,26 @@ Additionally, here are some example questions about the project that you are wel
 The first deliverable of your group project is just to form your group, choose a dataset, and submit your implementation plan to [this google form](https://forms.gle/DcBp3ZbM8TpTfSRD6){:target="_blank"} by 11:59 pm on 3/15. The implementation plan should consist of a series of steps for completing the project along with a timeline. You may form groups of 2 or 3 people with any Data 200/200A/200S student.
 ## Checkpoint 2:
-For the check-in we would like for you to prepare brief answers to the following questions about the modeling process
-
+The purpose of this checkpoint is to ensure you are making progress and are on schedule to submit the first draft of the project in approximately two weeks time. You will be required to submit a pdf document summarizing 1) all of your progress so far and 2) future plans. Guiding questions for the content of the document are detailed below. You will be required to submit the report to Gradescope before the meeting.
The staff member will skim the report before the meeting and give you guidance on the project as a whole. Please refer to the [rubrics](#rubrics) section for the grading breakdown.
+
+### Progress So Far
+- What type of data were you exploring?
+- What were your EDA questions?
+- What was the granularity of the data?
+- What did the distribution of the data look like? Were there any outliers? Were there any missing or invalid entries?
+- If the data was not in a featurized format, what features did you explore and why?
+- Was there any correlation between the variables you were interested in exploring?
+- How did you try to cleanly and accurately visualize the relationship among variables?
+- Did you need to perform data transformations?
+
+### Future Plans
 - What model do you plan on using and why?
-- Does your model require hyperparameter tuning? If so, how do you approach it?
-- How do you engineer the features for your model? What are the rationales behind selecting these features?
-- How do you perform cross validation on your model?
-- What loss metrics are you using to evaluate your model?
-- From a bias-variance tradeoff standpoint, how do you assess the performance of your model? How do you check if it is overfitting?
-- How would you improve your model based on the outcome?
-
-
-
-
+- Will your model require hyperparameter tuning? If so, how will you approach it?
+- How will you engineer the features for your model? What are the rationales behind selecting these features?
+- How will you perform cross validation on your model?
+- What loss metrics are you going to use to evaluate your model?
+- From a bias-variance tradeoff standpoint, how will you assess the performance of your model? How will you check if it is overfitting?
+- How will you improve your model based on the outcome?
 ## Rubrics
 This section includes a rubric for how different project deliverables are going to be graded.
This section will be updated as we get further along the project timeline.
@@ -347,84 +265,8 @@ This section includes a rubric for how different project deliverables are going
 - Short paragraph description of implementation plan and timeline (2%).
 - Forming teams by the deadline (3%).
-
-
-
-
-
+- Preliminary Results (1%).
\ No newline at end of file