2013-09-28
This week was an interesting week. After our guest lecturers, we were all supposed to come back to class on the same page. To my dismay, on Tuesday not everyone in the class was caught up, for reasons that were unclear. The expectation was that students would have completed all the installations and have access to iPython Notebook in their browsers. As a result, half the class waited for the other half to catch up. I feel that my group has done a good job of working collaboratively and supporting each other. We remind each other of the reflections and assignments, and let each other know what happened in class if someone was sick. Unfortunately, we spent a significant portion of the class debating when the deadline for reflections should be. As I pointed out in class, we should all be held responsible for our commitments and follow through on what we commit to. For the first time in a few weeks, we were assigned homework: reading up on dexy.it and working through hypothes.is/alpha to learn in-browser annotation. We were also told to browse through the online Python for Data Analysis text.
Come Thursday, I had completed the assignment and was fully ready to learn how we were going to use these tools; however, neither one was mentioned at all. Further, we were going to have an assignment on the questionnaire that we all filled out, but that assignment never came to fruition. Building on what classmates voiced during class, I think reproducibility is indeed a very interesting topic, and I realize that in the real world reproducibility is challenging and we run into roadblocks. However, I have one comment and one problem. My comment is that it would perhaps be easier to learn about reproducibility if we were more familiar with the tools and already using them. I actually walked by a Stat133 class this past week and saw that in section they were writing code in iPython Notebook. What they had likely already been working through for 2 weeks, our class had not yet caught up on by the 4th or 5th week. In my opinion, more familiarity with the tools and digging deeper into them would teach us more about reproducibility. The problem I have is that, given how much reproducibility is stressed in this course, I have trouble seeing how even this course itself is reproducible. What we do in class is rarely documented, and no instructions are ever jotted down. The steps that we follow as a class are verbally shared and then lost. Since it's better to run into trouble early than to suffer the consequences later, I would also say that it'd be better to start documenting and writing down progress as we go than to wait until the end of the semester, when we will have a fuzzier idea of what we did in weeks 1-4 and will have to spend copious time remembering. On the subject of being familiar with tools, I am disappointed that the new accounts I had to make for this class are also not being used. I am not sure why we have Twitter and Feedly accounts, and Pidgin and the IRC channel are rarely used.
Class on Thursday was also not helpful in the sense that, as Statistics majors (for the most part), we already have an understanding of how the observation, data, analysis process works. While the diagram was a helpful visual, it did not add significant value. Asking about our majors and the diversity of the group could have been a useful and enlightening topic on day one or day two, but on the ninth day of class it was very out of place. We spoke of setting weekly goals for ourselves a couple of weeks back, but I do not see us adhering to them. There has not been a clear lesson plan, and it's very difficult to see us making progress. Of reproducibility, collaboration, and data science, I feel that I have learned the most about collaboration, touched on reproducibility at a high level, and not covered data science. Currently, I primarily value my group and being able to work together as a team.
I think accountability is important, both as a student and as an instructor. I hope that moving forward the class will be more organized, focused, goal-driven, and prepared (as the announcement from September 24th states in the Notes section). We have a long way to go if we're trying to write a paper and do research on earthquake predictions 10 weeks from now. Particularly given that other universities are interested in this particular class, I hope that we will put more effort into this course.
With regard to the course title and purpose, Reproducible and Collaborative Data Science: when I first signed up for the class, I expected it to be about data science, which I associated with the analysis of large data sets, and which also matches what the data science careers email mentions. I see that while it could be tied to big data in industry, data science is also highly applicable in research. Given the backgrounds of the class, it seems that very few students are involved in research, particularly extensively. I think a course, while it should abide by what it set out to do, should also focus on the students and where their interests are.
For reproducibility, I think the major way we have learned about it is through GitHub and sharing our work with the online community. Similarly, once we get to writing code and creating something tangible, which I hope we will, it will be key to share our work with our collaborators. I think the scientific community, and in particular those in computational science disciplines, are wise in encouraging reproducibility above and beyond just replicability, to also encourage innovation and creativity. I think these logs are very important for future researchers, scientists, and professionals, and likewise in our class. I hope that, given how much we emphasize reproducibility, we will adhere to it more strongly on a daily basis, so that as students we can reproduce what the instructors and our classmates are doing, such as installation processes.