% Options for packages loaded elsewhere
\PassOptionsToPackage{unicode}{hyperref}
\PassOptionsToPackage{hyphens}{url}
\PassOptionsToPackage{dvipsnames,svgnames,x11names}{xcolor}
%
\documentclass[
letterpaper,
DIV=11,
numbers=noendperiod]{scrreprt}
\usepackage{amsmath,amssymb}
\usepackage{iftex}
\ifPDFTeX
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp} % provide euro and other symbols
\else % if luatex or xetex
\usepackage{unicode-math}
\defaultfontfeatures{Scale=MatchLowercase}
\defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1}
\fi
\usepackage{lmodern}
\ifPDFTeX\else
% xetex/luatex font selection
\fi
% Use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\IfFileExists{microtype.sty}{% use microtype if available
\usepackage[]{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\makeatletter
\@ifundefined{KOMAClassName}{% if non-KOMA class
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}}
}{% if KOMA class
\KOMAoptions{parskip=half}}
\makeatother
\usepackage{xcolor}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\setcounter{secnumdepth}{5}
% Make \paragraph and \subparagraph free-standing
\ifx\paragraph\undefined\else
\let\oldparagraph\paragraph
\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
\fi
\ifx\subparagraph\undefined\else
\let\oldsubparagraph\subparagraph
\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
\fi
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}\usepackage{longtable,booktabs,array}
\usepackage{calc} % for calculating minipage widths
% Correct order of tables after \paragraph or \subparagraph
\usepackage{etoolbox}
\makeatletter
\patchcmd\longtable{\par}{\if@noskipsec\mbox{}\fi\par}{}{}
\makeatother
% Allow footnotes in longtable head/foot
\IfFileExists{footnotehyper.sty}{\usepackage{footnotehyper}}{\usepackage{footnote}}
\makesavenoteenv{longtable}
\usepackage{graphicx}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
% Set default figure placement to htbp
\makeatletter
\def\fps@figure{htbp}
\makeatother
\KOMAoption{captions}{tableheading}
\makeatletter
\@ifpackageloaded{tcolorbox}{}{\usepackage[skins,breakable]{tcolorbox}}
\@ifpackageloaded{fontawesome5}{}{\usepackage{fontawesome5}}
\definecolor{quarto-callout-color}{HTML}{909090}
\definecolor{quarto-callout-note-color}{HTML}{0758E5}
\definecolor{quarto-callout-important-color}{HTML}{CC1914}
\definecolor{quarto-callout-warning-color}{HTML}{EB9113}
\definecolor{quarto-callout-tip-color}{HTML}{00A047}
\definecolor{quarto-callout-caution-color}{HTML}{FC5300}
\definecolor{quarto-callout-color-frame}{HTML}{acacac}
\definecolor{quarto-callout-note-color-frame}{HTML}{4582ec}
\definecolor{quarto-callout-important-color-frame}{HTML}{d9534f}
\definecolor{quarto-callout-warning-color-frame}{HTML}{f0ad4e}
\definecolor{quarto-callout-tip-color-frame}{HTML}{02b875}
\definecolor{quarto-callout-caution-color-frame}{HTML}{fd7e14}
\makeatother
\makeatletter
\@ifpackageloaded{bookmark}{}{\usepackage{bookmark}}
\makeatother
\makeatletter
\@ifpackageloaded{caption}{}{\usepackage{caption}}
\AtBeginDocument{%
\ifdefined\contentsname
\renewcommand*\contentsname{Table of contents}
\else
\newcommand\contentsname{Table of contents}
\fi
\ifdefined\listfigurename
\renewcommand*\listfigurename{List of Figures}
\else
\newcommand\listfigurename{List of Figures}
\fi
\ifdefined\listtablename
\renewcommand*\listtablename{List of Tables}
\else
\newcommand\listtablename{List of Tables}
\fi
\ifdefined\figurename
\renewcommand*\figurename{Figure}
\else
\newcommand\figurename{Figure}
\fi
\ifdefined\tablename
\renewcommand*\tablename{Table}
\else
\newcommand\tablename{Table}
\fi
}
\@ifpackageloaded{float}{}{\usepackage{float}}
\floatstyle{ruled}
\@ifundefined{c@chapter}{\newfloat{codelisting}{h}{lop}}{\newfloat{codelisting}{h}{lop}[chapter]}
\floatname{codelisting}{Listing}
\newcommand*\listoflistings{\listof{codelisting}{List of Listings}}
\makeatother
\makeatletter
\makeatother
\makeatletter
\@ifpackageloaded{caption}{}{\usepackage{caption}}
\@ifpackageloaded{subcaption}{}{\usepackage{subcaption}}
\makeatother
\ifLuaTeX
\usepackage{selnolig} % disable illegal ligatures
\fi
\usepackage{bookmark}
\IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available
\urlstyle{same} % disable monospaced font for URLs
\hypersetup{
pdftitle={Data 100 Debugging Guide},
pdfauthor={Yash Dave; Lillian Weng},
colorlinks=true,
linkcolor={blue},
filecolor={Maroon},
citecolor={Blue},
urlcolor={Blue},
pdfcreator={LaTeX via pandoc}}
\title{Data 100 Debugging Guide}
\author{Yash Dave \and Lillian Weng}
\date{}
\begin{document}
\maketitle
\renewcommand*\contentsname{Table of contents}
{
\hypersetup{linkcolor=}
\setcounter{tocdepth}{2}
\tableofcontents
}
\bookmarksetup{startatroot}
\chapter*{About}\label{about}
\addcontentsline{toc}{chapter}{About}
\markboth{About}{About}
This text offers pointers for keyboard shortcuts or common mistakes that
accompany the coursework in the Fall 2024 Edition of the UC Berkeley
course Data 100: Principles and Techniques of Data Science.
Inspiration for this guide was taken from the UC San Diego course DSC
10: Principles of Data Science and their
\href{https://dsc10.com/debugging/}{debugging guide}.
If you spot any typos or would like to suggest any changes, please email
us at \textbf{[email protected]}
\bookmarksetup{startatroot}
\chapter{Jupyter 101}\label{jupyter-101}
\begin{tcolorbox}[enhanced jigsaw, breakable, title=\textcolor{quarto-callout-note-color}{\faInfo}\hspace{0.5em}{Note}, arc=.35mm, rightrule=.15mm, coltitle=black, opacityback=0, colbacktitle=quarto-callout-note-color!10!white, opacitybacktitle=0.6, colframe=quarto-callout-note-color-frame, bottomtitle=1mm, colback=white, titlerule=0mm, leftrule=.75mm, toptitle=1mm, bottomrule=.15mm, toprule=.15mm, left=2mm]
If you're using a MacBook, replace \texttt{ctrl} with \texttt{cmd}.
\end{tcolorbox}
\section{Shortcuts for Cells}\label{shortcuts-for-cells}
For the following commands, make sure you're in command mode. You can
enter this mode by pressing \texttt{esc}.
\begin{itemize}
\tightlist
\item
\texttt{a}: create a cell above
\item
\texttt{b}: create a cell below
\item
\texttt{dd}: delete current cell
\item
\texttt{m}: convert a cell to markdown (text cell)
\item
\texttt{y}: convert a cell to code
\end{itemize}
\section{Running Cells}\label{running-cells}
For individual cells,
\begin{itemize}
\tightlist
\item
\texttt{ctrl} + \texttt{return}: run the current cell
\item
\texttt{shift} + \texttt{return}: run the current cell and move to the
next cell
\end{itemize}
To run all cells in a notebook:
\begin{itemize}
\item
In the menu bar at the top left, click \texttt{Run}. From here, you have
several options. The ones we use most commonly are:
\begin{itemize}
\tightlist
\item
\texttt{Run\ All\ Above\ Selected\ Cell}: this runs every cell above
the selected cell
\item
\texttt{Run\ Selected\ Cell\ and\ All\ Below}: this runs the
selected cell and all cells below
\item
\texttt{Run\ All}: this runs every cell in the notebook from
top-to-bottom
\end{itemize}
\end{itemize}
\section{Saving your notebook}\label{saving-your-notebook}
Jupyter autosaves your work, but there can be a delay. As such, it's a
good idea to save your work as often as you remember and especially
before submitting assignments. To do so, press \texttt{ctrl} +
\texttt{s}.
\section{Restarting Kernel}\label{restarting-kernel}
In the menu bar at the top left, click \texttt{Kernel}. From here, you have
several options. The ones we use most commonly are:
\begin{itemize}
\tightlist
\item
\texttt{Restart\ Kernel...}
\item
\texttt{Restart\ Kernel\ and\ Run\ up\ to\ Selected\ Cell}
\item
\texttt{Restart\ Kernel\ and\ Run\ All\ Cells}
\end{itemize}
\section{Automatically Closing
Brackets}\label{automatically-closing-brackets}
Many IDEs, like VSCode, automatically close brackets. For example,
typing \texttt{(}, \texttt{\{}, or \texttt{{[}} automatically adds the
matching closing bracket \texttt{)}, \texttt{\}}, or \texttt{{]}},
respectively. Datahub does not have this feature turned on by default,
but you can enable it by going to \texttt{Settings} -\textgreater{}
\texttt{Auto\ Close\ Brackets}. If you see a check mark to the left of
\texttt{Auto\ Close\ Brackets}, then it's enabled.
\bookmarksetup{startatroot}
\chapter{Jupyter / Datahub}\label{jupyter-datahub}
\section{My kernel died, restarted, or is very
slow}\label{my-kernel-died-restarted-or-is-very-slow}
Jupyterhub connects you to an external container to run your code. That
connection could be slow/severed because:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
you haven't made any changes to the notebook for a while
\item
a cell took too much time to run
\item
a cell took up too many resources to compute
\end{enumerate}
When you see a message like this:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
Either press the ``Ok'' button or reload the page
\item
\href{https://ds100.org/debugging-guide/jupyter101/jupyter101.html\#restarting-kernel}{Restart
your kernel}
\item
\href{https://ds100.org/debugging-guide/jupyter101/jupyter101.html\#running-cells}{Rerun
your cells}
\end{enumerate}
Note that you may lose some recent work if your kernel restarted when
you were in the middle of editing a cell. As such, we recommend
\href{https://ds100.org/debugging-guide/jupyter101/jupyter101.html\#saving-your-notebook}{saving
your work} as often as possible.
If this does not fix the issue, it could be a problem with your code,
usually the last cell that executed before your kernel crashed. Double
check your logic, and feel free to make a private post on Ed if you're
stuck!
\section{I can't edit a cell}\label{i-cant-edit-a-cell}
We set some cells to read-only mode to prevent accidental modification.
To make the cell writeable,
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
Click the cell
\item
Click the settings icon in the top right corner
\item
Under ``Common Tools'', you can toggle between ``Editable'' (can edit
the cell) and ``Read-Only'' (cannot edit the cell)
\end{enumerate}
\section{My text cell looks like
code}\label{my-text-cell-looks-like-code}
If you double-click on a text (markdown) cell, it'll appear in its raw
format. To fix this, simply run the cell. If this doesn't fix the
problem, check out the commonly asked question below.
\section{My text cell changed to a code cell / My code cell changed to a
text
cell}\label{my-text-cell-changed-to-a-code-cell-my-code-cell-changed-to-a-text-cell}
Sometimes a text (markdown) cell gets changed to a code cell, or a code
cell can't be run because it has been changed to a text (markdown) or raw
cell. To fix this, select the desired cell type in the top bar.
\section{Why does running a particular cell cause my kernel to
die?}\label{why-does-running-a-particular-cell-cause-my-kernel-to-die}
If one particular cell seems to cause your kernel to die, this is likely
because the computer is trying to use more memory than it has available.
For instance: your code is trying to create a gigantic array. To prevent
the entire server from crashing, the kernel will ``die''. This is an
indication that there is a mistake in your code that you need to fix.
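
As a rough illustration of how quickly an array can outgrow the
available memory, the sketch below estimates the size of a dense
\texttt{NumPy} array before creating it. The helper name
\texttt{estimated\_megabytes} and the example shapes are hypothetical,
chosen only for this illustration.

\begin{verbatim}
import numpy as np

def estimated_megabytes(shape, dtype=np.float64):
    """Estimate the memory (in MB) a dense NumPy array of this shape needs."""
    n_elements = 1
    for size in shape:
        n_elements *= size
    # itemsize is the number of bytes used by a single element
    return n_elements * np.dtype(dtype).itemsize / 1e6

small = estimated_megabytes((1_000, 1_000))      # 8.0 MB
huge = estimated_megabytes((100_000, 100_000))   # 80000.0 MB, about 80 GB

print(small, huge)
\end{verbatim}

If the estimate is far larger than you expected, the shape you are
asking for is probably not the one you intended.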
\section{I accidentally deleted something in a cell that was provided to
me -- how do I get it
back?}\label{i-accidentally-deleted-something-in-a-cell-that-was-provided-to-me-how-do-i-get-it-back}
Suppose you're working on Lab 5. One solution is to go directly to
DataHub and rename your lab05 folder to something else, like lab05-old.
Then, click the Lab 5 link on the course website again, and it'll bring
you to a brand-new version of Lab 5. You can then copy your work from
your old Lab 5 to this new one, which should have the original version
of the assignment.
Alternatively, you can access this
\href{https://github.com/DS-100/fa24-student}{public repo} and navigate
to a blank copy of the assignment you were working on. In the case of
Lab 5 for example, the notebook would be located at
\texttt{lab/lab05/lab05.ipynb}. You can then check and copy over the
contents of the deleted cell into a new cell in your existing notebook.
\section{``Click here to download zip file'' is not
working}\label{click-here-to-download-zip-file-is-not-working}
When this happens, you can download the zip file through the menu on the
left.
Right click on the generated zip file and click ``Download''.
\section{\texorpdfstring{I can't export my assignment as a PDF due to a
\texttt{LatexFailed}
error}{I can't export my assignment as a PDF due to a LatexFailed error}}\label{i-cant-export-my-assignment-as-a-pdf-due-to-a-latexfailed-error}
Occasionally when running the \texttt{grader.export(run\_tests=True)}
cell at the end of the notebook, you run into an error where the PDF
failed to generate:
Converting a Jupyter notebook to a PDF involves formatting some of the
markdown text in \href{https://www.latex-project.org/}{LaTeX}. However,
this process will fail if your free response answers have (unresolved)
LaTeX characters like \texttt{\textbackslash{}n}, \texttt{\$}, or
\texttt{\$\$}. There are several ways to resolve this:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
\textbf{Export the notebook as a PDF}: In the upper left hand menu, go
to \texttt{File} -\textgreater{}
\texttt{Save\ and\ Export\ Notebook\ As} -\textgreater{} \texttt{PDF}.
Upload this file to Gradescope under the ``Submit PDF'' option.
\item
\textbf{Print the notebook from HTML}: In the upper left hand menu, go
to \texttt{File} -\textgreater{}
\texttt{Save\ and\ Export\ Notebook\ As} -\textgreater{}
\texttt{HTML}. In the new tab that will open up, print the website by
typing \texttt{ctrl} + \texttt{p} (Windows) or \texttt{cmd} +
\texttt{p} (Mac).
\item
\textbf{Take screenshots}: If you're short on time, your best bet is
to take screenshots of your free response answers. When submitting to
Gradescope, choose the ``Submit Images'' option instead of the
``Submit PDF'' option.
\item
\textbf{Removing special LaTeX characters}: If you have more time and
would like the Datahub-generated PDF, please remove any special LaTeX
characters from your free response answers.
\end{enumerate}
If you use an alternate form of submission listed above, you don't need
to worry if you can't select pages or if the selection doesn't align.
We'll manually look through your submission when grading, and will
account for that.
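
If you would like to hunt for problematic characters before exporting,
a small heuristic check along the following lines may help. This is only
a sketch: the function \texttt{find\_latex\_risks} is hypothetical, it
is not part of the grader, and it flags candidates for you to inspect by
hand rather than proving that the export will fail or succeed.

\begin{verbatim}
import re

def find_latex_risks(text):
    """Return warnings about characters that often break PDF export."""
    warnings = []
    # Dollar signs not escaped with a backslash should come in pairs.
    unescaped_dollars = re.findall(r"(?<!\\)\$", text)
    if len(unescaped_dollars) % 2 == 1:
        warnings.append("unmatched $")
    # A backslash followed by a letter is read as a LaTeX command.
    if re.search(r"\\[A-Za-z]", text):
        warnings.append("backslash command")
    return warnings

print(find_latex_risks("The ticket costs $5"))      # ['unmatched $']
print(find_latex_risks("Use \\n to end a line"))    # ['backslash command']
print(find_latex_risks("A plain sentence."))        # []
\end{verbatim}

Note that intentional math such as \texttt{\textbackslash{}alpha} is
also flagged as a backslash command, so treat the output as a list of
places to look.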
\section{\texorpdfstring{I can't open Jupyter:
\texttt{HTTP\ ERROR\ 431}}{I can't open Jupyter: HTTP ERROR 431}}\label{i-cant-open-jupyter-http-error-431}
If this happens, try
\href{https://support.google.com/accounts/answer/32050?hl=en&co=GENIE.Platform\%3DDesktop}{clearing
your browser cache} or opening Datahub in an incognito window.
\section{Datahub is not loading}\label{datahub-is-not-loading}
If your link to Datahub is not loading, go to
\url{https://data100.datahub.berkeley.edu/hub/home} and restart your
server.
\bookmarksetup{startatroot}
\chapter{Autograder and Gradescope}\label{autograder-and-gradescope}
\textbf{Citation.}
Many of these common questions were taken and modified from the UC San
Diego course DSC 10: Principles of Data Science and their
\href{https://dsc10.com/debugging/}{debugging guide}.
\section{Autograder}\label{autograder}
\subsection{Understanding autograder error
messages}\label{understanding-autograder-error-messages}
When you pass a test, you'll see a nice message and a cute emoji!
When you don't, however, the message can be a little confusing.
The best course of action is to find the test case that failed and use
that as a starting point to debug your code.
In the example above, we see that the test case in green,
\texttt{max\_swing\ in\ set(bus{[}\textquotesingle{}name\textquotesingle{}{]})},
is not passing. The actual output (in blue) is often hard to parse, so
the best course of action is to:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
Make a new (temporary) cell after the \texttt{grader.check(...)} cell.
Please do not make a new cell in between the given code cell and the
\texttt{grader.check(...)} cell, as it could mess with the results.
\item
Copy and paste the failing test case into your temporary cell and run
it.
\begin{enumerate}
\def\labelenumii{\alph{enumii}.}
\tightlist
\item
If it's giving you an error like in the example above, look at the
last line of the error and use the Debugging Guide's search
functionality in the top left menu to find the corresponding guide.
\item
If it's not giving you an error, it'll likely give you an output
like \texttt{False}. This means that your code does not cause an
error (yay!), but it returns an incorrect output. In these cases,
inspect each individual element of the test case. The example above
checks if \texttt{max\_swing} is in
\texttt{set(bus{[}\textquotesingle{}name\textquotesingle{}{]})}, so
it might be a good idea to display both variables and do a visual
check.
\item
If you're still having issues, post on Ed!
\end{enumerate}
\item
After your \texttt{grader.check(...)} passes, feel free to delete the
temporary cell.
\end{enumerate}
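
The temporary cell described in the steps above might look like the
following sketch. The contents of \texttt{bus} and the value of
\texttt{max\_swing} are made up for illustration; in your notebook they
would already be defined by earlier cells.

\begin{verbatim}
import pandas as pd

# Hypothetical stand-ins for the variables used by the failing test.
bus = pd.DataFrame({"name": ["Route 51B", "Route 52", "Route 79"]})
max_swing = "route 52"   # wrong capitalization, on purpose

# 1. Run the failing test case on its own.
print(max_swing in set(bus["name"]))   # False

# 2. Display each piece of the test case and compare them by eye.
print(repr(max_swing))
print(sorted(set(bus["name"])))
\end{verbatim}

Using \texttt{repr} makes stray spaces and capitalization differences
easier to spot than a plain \texttt{print}.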
\subsection{\texorpdfstring{Why do I get an error saying
``\texttt{grader\ is\ not\ defined}''?}{Why do I get an error saying ``grader is not defined''?}}\label{why-do-i-get-an-error-saying-grader-is-not-defined}
If it has been a while since you've worked on an assignment, the kernel
will shut itself down to preserve memory. When this happens, all of your
variables are forgotten, including the grader. That's OK. The easiest
way to fix this is by
\href{https://ds100.org/debugging-guide/jupyter101/jupyter101.html\#restarting-kernel}{restarting
your kernel and rerunning all the cells}. To do this, in the top left
menu, click \texttt{Kernel} -\textgreater{}
\texttt{Restart\ Kernel\ and\ Run\ All\ Cells}.
\subsection{I'm positive I have the right answer, but the test fails. Is
there a mistake in the
test?}\label{im-positive-i-have-the-right-answer-but-the-test-fails.-is-there-a-mistake-in-the-test}
While you might see the correct answer displayed as the result of the
cell, chances are your solution isn't being stored in the answer
variable. Make sure you are assigning the result to the answer variable
and that there are no typos in the variable name. Finally,
\href{https://ds100.org/debugging-guide/jupyter101/jupyter101.html\#restarting-kernel}{restart
your kernel and run all the cells in order}: \texttt{Kernel}
-\textgreater{} \texttt{Restart\ Kernel\ and\ Run\ All\ Cells}.
\subsection{\texorpdfstring{Why does the last \texttt{grader.export}
cell fail if all previous tests
passed?}{Why does the last grader.export cell fail if all previous tests passed?}}\label{why-does-the-last-grader.export-cell-fail-if-all-previous-tests-passed}
This can happen if you ``overwrite'' a variable that is used in a
question. For instance, say Question 1 asks you to store your answer in
a variable named \texttt{stat} and, later on in the notebook, you change
the value of \texttt{stat}; the test right after Question 1 will pass,
but the test at the end of the notebook will fail. It is good
programming practice to give your variables informative names and to
avoid repeating the same variable name for more than one purpose.
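
A minimal sketch of this situation is shown below. The values and the
two \texttt{check\_...} variables are invented for illustration and
only mimic what a grader test does.

\begin{verbatim}
# Question 1: the answer is stored in `stat`.
stat = sum([2, 4, 6]) / 3          # 4.0
check_after_q1 = (stat == 4.0)     # passes right after Question 1

# Later in the notebook, the same name is reused for something else.
stat = max([2, 4, 6])              # now 6
check_at_export = (stat == 4.0)    # the same check now fails

print(check_after_q1, check_at_export)   # True False
\end{verbatim}

Giving the second quantity its own name (for example
\texttt{largest\_value}) would keep both checks passing.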
\subsection{Why does a notebook test fail now when it passed before, and
I didn't change my
code?}\label{why-does-a-notebook-test-fail-now-when-it-passed-before-and-i-didnt-change-my-code}
You probably ran your notebook out of order.
\href{https://ds100.org/debugging-guide/jupyter101/jupyter101.html\#running-cells}{Re-run
all previous cells} in order, which is how your code will be graded.
\section{Gradescope}\label{gradescope}
When submitting to Gradescope, there are often unexpected errors that
make students lose more points than expected. Thus, it is imperative
that you \textbf{stay on the submission page until the autograder
finishes running}, and the results are displayed.
\subsection{Why did a Gradescope test fail when all the Jupyter
notebook's tests
passed?}\label{why-did-a-gradescope-test-fail-when-all-the-jupyter-notebooks-tests-passed}
This can happen if you're running your notebook's cells out of order.
The autograder runs your notebook from top-to-bottom. If you're defining
a variable at the bottom of your notebook and using it at the top, the
Gradescope autograder will fail because it doesn't recognize the
variable when it encounters it.
This is why we recommend going into the top left menu and clicking
\texttt{Kernel} -\textgreater{}
\texttt{Restart\ Kernel\ and\ Run\ All\ Cells}. The kernel ``forgets''
all of its variables and the notebook runs from top to bottom, just as
it does under the Gradescope autograder. This will highlight any issues.
Find the first cell that raises an error. Make sure that all of the
variables used in that cell have been defined above that cell, and not
below.
\subsection{\texorpdfstring{Why do I get a
\texttt{NameError:\ name\ \_\_\_\ is\ not\ defined} when I run a grader
check?}{Why do I get a NameError: name \_\_\_ is not defined when I run a grader check?}}\label{why-do-i-get-a-nameerror-name-___-is-not-defined-when-i-run-a-grader-check}
This happens when you try to access a variable that has not been defined
yet. Since the autograder runs all the cells in-order, if you happened
to define a variable in a cell further down and accessed it before that
cell, the autograder will likely throw this error. Another possible
cause is that the notebook was not saved before the autograder
tests were run. When in doubt, it is good practice to restart your
kernel, run all the cells again, and save the notebook before running
the cell that causes this error.
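
The effect of running cells strictly in order can be imitated with the
sketch below. It is only a toy model: each string stands in for a
notebook cell, and \texttt{run\_top\_to\_bottom} is a hypothetical
helper, not how the autograder is actually implemented.

\begin{verbatim}
cells = [
    "total = price * 2",   # cell 1 uses `price` ...
    "price = 10",          # ... but `price` is only defined in cell 2
]

def run_top_to_bottom(cells):
    """Run each cell in order in a fresh namespace, like a restarted kernel."""
    namespace = {}
    for number, source in enumerate(cells, start=1):
        try:
            exec(source, namespace)
        except NameError as err:
            return number, str(err)
    return None

print(run_top_to_bottom(cells))
# (1, "name 'price' is not defined")
\end{verbatim}

Swapping the two cells so that \texttt{price} is defined first makes
the error go away.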
\subsection{My autograder keeps running/timed
out}\label{my-autograder-keeps-runningtimed-out}
If your Gradescope submission page has been stuck running on this page
for a while:
or if it times out:
it means that the Gradescope autograder failed to execute in the
expected amount of time. This could be due to an inefficiency in your
code or a problem on Gradescope's end, so we recommend resubmitting and
letting the autograder rerun. \textbf{It is your responsibility to
ensure that the autograder runs properly}, and, if it still fails, to
follow up by making a private Ed post.
\bookmarksetup{startatroot}
\chapter{Pandas}\label{pandas}
\section{\texorpdfstring{Understanding \texttt{pandas}
errors}{Understanding pandas errors}}\label{understanding-pandas-errors}
\texttt{pandas} errors can look red, scary, and very long. Fortunately,
we don't need to understand the entire thing! The most important parts
of an error message are at the \textbf{top}, which tells you which line
of code is causing the issue, and at the \textbf{bottom}, which tells
you exactly what the error message is.
This note is (mostly) structured around the error messages that show up
at the bottom.
\section{My code is taking a really long time to
run}\label{my-code-is-taking-a-really-long-time-to-run}
It is normal for a cell to take a few seconds -- sometimes a few minutes
-- to run. If it's taking too long, however, you have several
options:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
Try restarting the kernel. Sometimes, Datahub glitches or lags,
causing the code to run slower than expected.
\href{https://ds100.org/debugging-guide/jupyter101/jupyter101.html\#restarting-kernel}{Restarting
the kernel} should fix this problem, but if the cell is still taking a
while to run, it is likely a problem with your code.
\item
Scrutinize your code. Are you using too many \texttt{for} loops? Is
there a repeated operation that you can replace with a \texttt{pandas}
function?
\end{enumerate}
\section{Why is it generally better to avoid using loops or list
comprehensions when
possible?}\label{why-is-it-generally-better-avoid-using-loops-or-list-comprehensions-when-possible}
In one word: performance. \texttt{NumPy} and \texttt{pandas} functions
are optimized to handle large amounts of data in an efficient manner.
Even for simple operations, like the elementwise addition of two arrays,
\texttt{NumPy} arrays are much faster and scale better (feel free to
experiment with this yourself using \texttt{\%\%time}). This is why we
encourage you to \textbf{vectorize} your code (i.e., use \texttt{NumPy}
arrays, \texttt{Series}, or \texttt{DataFrames} instead of Python lists)
and use built-in \texttt{NumPy}/\texttt{pandas} functions wherever
possible.
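
One way to try this comparison outside of a notebook is sketched below
using the standard library's \texttt{timeit} module. The array size and
the function names are arbitrary choices for this illustration, and the
exact timings will vary from machine to machine.

\begin{verbatim}
import timeit
import numpy as np

values = np.arange(100_000)

def add_with_loop(a, b):
    # Elementwise addition, one pair of elements at a time.
    return [x + y for x, y in zip(a, b)]

def add_vectorized(a, b):
    # Elementwise addition handled by NumPy in one call.
    return a + b

loop_seconds = timeit.timeit(
    lambda: add_with_loop(values, values), number=5)
vector_seconds = timeit.timeit(
    lambda: add_vectorized(values, values), number=5)

print(f"loop: {loop_seconds:.4f}s, vectorized: {vector_seconds:.4f}s")
\end{verbatim}

Both functions return the same numbers; only the time taken to compute
them differs.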
\section{KeyErrors}\label{keyerrors}
\subsection{\texorpdfstring{\texttt{KeyError:\ \textquotesingle{}column\_name\textquotesingle{}}}{KeyError: \textquotesingle column\_name\textquotesingle{}}}\label{keyerror-column_name}
This error usually happens when we have a \texttt{DataFrame} called
\texttt{df}, and we're trying to do an operation on a column
\texttt{\textquotesingle{}column\_name\textquotesingle{}} that does not
exist. If you encounter this error, double check that you're operating
on the right column. It might be a good idea to display \texttt{df} and
see what it looks like. You could also call \texttt{df.columns} to list
all the columns in \texttt{df}.
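
For example, in the sketch below (with a small made-up
\texttt{DataFrame}), the column is requested with the wrong
capitalization, and listing \texttt{df.columns} reveals the correct
spelling.

\begin{verbatim}
import pandas as pd

# A made-up DataFrame for illustration.
df = pd.DataFrame({"name": ["Ada", "Grace"], "score": [90, 85]})

try:
    df["Score"]              # column names are case-sensitive
except KeyError as err:
    missing = err.args[0]    # the key that could not be found
    print("Missing column:", missing)

available = list(df.columns)
print("Available columns:", available)
\end{verbatim}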
\section{TypeErrors}\label{typeerrors}
\subsection{\texorpdfstring{\texttt{TypeError:\ \textquotesingle{}\_\_\_\textquotesingle{}\ object\ is\ not\ callable}}{TypeError: \textquotesingle\_\_\_\textquotesingle{} object is not callable}}\label{typeerror-___-object-is-not-callable}
This often happens when you use a built-in name (like \texttt{str},
\texttt{list}, \texttt{range}, \texttt{bool}, \texttt{sum}, or
\texttt{max}) as a variable name, for instance:
\begin{verbatim}
sum = 1 + 2 + 3
\end{verbatim}
These errors can be tricky because they don't error on their own but
cause problems when we try to use the name \texttt{sum} (for example)
later on in the notebook.
To fix the issue, identify any such lines of code (Ctrl+F on
``\texttt{sum\ =}'' for example), change your variable names to be
something more informative, and
\href{https://ds100.org/debugging-guide/jupyter101/jupyter101.html\#restarting-kernel}{restart
your notebook}.
Built-in names like \texttt{str} and \texttt{list} appear in green
text, so be on the lookout if any of your variable names appear in
green!
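
The sketch below reproduces the error in a few lines. It reaches the
original function through the standard \texttt{builtins} module only to
show that Python's own \texttt{sum} still exists; in a notebook, the
recommended fix remains renaming your variable and restarting the
kernel.

\begin{verbatim}
import builtins

sum = 1 + 2 + 3             # the name `sum` now refers to the integer 6

try:
    sum([4, 5, 6])          # tries to call the integer 6
except TypeError as err:
    message = str(err)
    print(message)          # 'int' object is not callable

# The built-in function itself is untouched.
total = builtins.sum([4, 5, 6])
print(total)                # 15
\end{verbatim}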
\subsection{\texorpdfstring{\texttt{TypeError:\ could\ not\ convert\ string\ to\ a\ float}}{TypeError: could not convert string to a float}}\label{typeerror-could-not-convert-string-to-a-float}
This error often occurs when we try to do math operations (e.g.,
\texttt{sum}, \texttt{average}, \texttt{min}, \texttt{max}) on a
\texttt{DataFrame} column or \texttt{Series} that contains strings
instead of numbers (note that we can do math operations with booleans;
Python treats \texttt{True} as 1 and \texttt{False} as 0).
Double check that the column you're interested in is a numerical type
(\texttt{int}, \texttt{float}, or \texttt{double}). If it looks like a
number, but you're still getting this error, you can use
\texttt{.astype(...)}
(\href{https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html}{documentation})
to change the datatype of a \texttt{DataFrame} or \texttt{Series}.
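
As a small sketch of this fix, assume a made-up \texttt{Series} called
\texttt{prices} whose values look like numbers but are stored as
strings:

\begin{verbatim}
import pandas as pd

# Made-up data: numbers stored as strings.
prices = pd.Series(["1.5", "2.5", "3.0"])
print(prices.dtype)                     # not a numeric dtype

numeric_prices = prices.astype(float)   # convert every value to a float
total = numeric_prices.sum()
print(numeric_prices.dtype, total)      # float64 7.0
\end{verbatim}

If a value cannot be read as a number (for example
\texttt{\textquotesingle{}N/A\textquotesingle{}}),
\texttt{.astype(float)} will itself raise an error, which tells you
the column needs cleaning first.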
\subsection{\texorpdfstring{\texttt{TypeError:\ Could\ not\ convert\ \textless{}string\textgreater{}\ to\ numeric}}{TypeError: Could not convert \textless string\textgreater{} to numeric}}\label{typeerror-could-not-convert-string-to-numeric}
Related to the above (but distinct), you may run into this error when
applying a numeric aggregation function (like \texttt{mean} or
\texttt{sum}, which expect numeric values) after doing a
\texttt{groupby} operation on a \texttt{DataFrame} with non-numeric
columns.
Working with the \texttt{elections} dataset for example,
\begin{verbatim}
elections.groupby('Year').agg('mean')
\end{verbatim}
would error because \texttt{pandas} cannot compute the mean of the names
of presidents. There are three ways to get around this:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
Select only the numeric columns you are interested in before applying
the aggregation function. In the above case, either
\texttt{elections.groupby(\textquotesingle{}Year\textquotesingle{}){[}\textquotesingle{}Popular\ vote\textquotesingle{}{]}}
or
\texttt{elections{[}{[}\textquotesingle{}Year\textquotesingle{},\ \textquotesingle{}Popular\ vote\textquotesingle{}{]}{]}.groupby(\textquotesingle{}Year\textquotesingle{})}
would work as the object to aggregate.
\item
Set the \texttt{numeric\_only} argument to \texttt{True} in the
\texttt{.agg} call, thereby applying the aggregation function only to
numeric columns. For example,
\texttt{elections.groupby(\textquotesingle{}Year\textquotesingle{}).agg(\textquotesingle{}mean\textquotesingle{},\ numeric\_only=True)}.
\item
Pass a dictionary to \texttt{.agg} that maps each column to the
aggregation function you want to apply to it. Continuing the
same example, this looks like
\texttt{elections.groupby(\textquotesingle{}Year\textquotesingle{}).agg(\{\textquotesingle{}Popular\ vote\textquotesingle{}\ :\ \textquotesingle{}mean\textquotesingle{}\})}.
\end{enumerate}
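The three approaches can be compared side by side on a toy stand-in for \texttt{elections}. The rows and vote counts below are invented for illustration and are not the real dataset; only the column names \texttt{Year} and \texttt{Popular vote} follow the examples above.

```python
import pandas as pd

# Toy stand-in for the elections table (made-up values).
elections = pd.DataFrame({
    "Year": [2016, 2016, 2020, 2020],
    "Candidate": ["A", "B", "C", "D"],
    "Popular vote": [100, 200, 300, 500],
})

# 1. Select the numeric column before aggregating.
by_selection = elections.groupby("Year")["Popular vote"].agg("mean")

# 2. Ask the aggregation to skip non-numeric columns.
by_numeric_only = elections.groupby("Year").agg("mean", numeric_only=True)

# 3. Map each column of interest to its aggregation function.
by_dict = elections.groupby("Year").agg({"Popular vote": "mean"})

print(by_selection)
print(by_numeric_only)
print(by_dict)
```

Note that the first approach returns a \texttt{Series}, while the other two return a \texttt{DataFrame}.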
\subsection{\texorpdfstring{\texttt{TypeError:\ \textquotesingle{}NoneType\textquotesingle{}\ object\ is\ not\ subscriptable}
/
\texttt{AttributeError:\ \textquotesingle{}NoneType\textquotesingle{}\ object\ has\ no\ attribute\ \textquotesingle{}shape\textquotesingle{}}}{TypeError: \textquotesingle NoneType\textquotesingle{} object is not subscriptable / AttributeError: \textquotesingle NoneType\textquotesingle{} object has no attribute \textquotesingle shape\textquotesingle{}}}\label{typeerror-nonetype-object-is-not-subscriptable-attributeerror-nonetype-object-has-no-attribute-shape}
This usually occurs when you assign a \texttt{None} value to a variable,
then try to either index into or access some attribute of that variable.
List methods like \texttt{append} and \texttt{extend} mutate the list
in place and return \texttt{None}, so you do not need to assign their
result to anything. Assigning \texttt{None} tends to
happen as a result of code like:
\begin{verbatim}
some_list = some_list.append(element)
\end{verbatim}
In contrast, an operation like \texttt{np.append} does not mutate the
variable in place and, instead, returns a copy. In these cases,
(re)assignment is necessary:
\begin{verbatim}
some_array = np.append(some_array, element)
\end{verbatim}
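The contrast between the two kinds of \texttt{append} can be checked directly. This is a small sketch with made-up values; \texttt{result} is a hypothetical name used only to capture the return value.

```python
import numpy as np

# list.append mutates in place and returns None.
some_list = [1, 2]
result = some_list.append(3)
print(result)      # the return value, not the list
print(some_list)   # the list itself was updated

# np.append leaves the original array alone and returns a new one,
# so the result has to be assigned back.
some_array = np.array([1, 2])
some_array = np.append(some_array, 3)
print(some_array)
```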
\subsection{\texorpdfstring{\texttt{TypeError:\ \textquotesingle{}int\textquotesingle{}/\textquotesingle{}float\textquotesingle{}\ object\ is\ not\ subscriptable}}{TypeError: \textquotesingle int\textquotesingle/\textquotesingle float\textquotesingle{} object is not subscriptable}}\label{typeerror-intfloat-object-is-not-subscriptable}
This occurs when you try to index into an integer or other numeric
\texttt{Python} data type. It can be confusing to debug amidst a muddle
of code, but you can use the error message to identify which variable is
causing this error. Using \texttt{type(var\_name)} to check the data
type of the variable in question can be a good starting point.
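For instance, the following sketch (with a hypothetical variable \texttt{total}) reproduces the error and shows the \texttt{type} check suggested above:

```python
# A plain Python integer cannot be indexed.
total = 42

try:
    total[0]
except TypeError as err:
    message = str(err)

# Checking the type confirms we are holding a number, not a list or array.
kind = type(total)

print(message)
print(kind)
```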
\section{IndexErrors}\label{indexerrors}
\subsection{\texorpdfstring{\texttt{IndexError:\ invalid\ index\ to\ scalar\ variable.}}{IndexError: invalid index to scalar variable.}}\label{indexerror-invalid-index-to-scalar-variable.}
This error is similar to the last \texttt{TypeError} in the previous
section. The difference is that it arises when you index into a
\texttt{NumPy} scalar (a single value pulled out of an array) rather
than a built-in Python number.
For a concrete example, if you defined
\begin{verbatim}
numpy_arr = np.array([1])
\end{verbatim}
and indexed into it twice (\texttt{numpy\_arr{[}0{]}{[}0{]}}), you would
get the above error. Unlike a Python integer whose type is \texttt{int},
\texttt{type(numpy\_arr{[}0{]})} returns the \texttt{NumPy} version of
an integer, \texttt{numpy.int64}. Additionally, you can check the data
type by accessing the \texttt{.dtype} attribute of a \texttt{NumPy} array
(\texttt{numpy\_arr.dtype}) or scalar variable
(\texttt{numpy\_arr{[}0{]}.dtype}).
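A runnable version of this example is sketched below. The exact integer type (for example \texttt{numpy.int64}) can vary by platform, so the code only checks that the value is some \texttt{NumPy} integer; \texttt{first} is a hypothetical helper name.

```python
import numpy as np

numpy_arr = np.array([1])

# Indexing once gives a NumPy scalar, not a Python int.
first = numpy_arr[0]
is_numpy_integer = isinstance(first, np.integer)

# Indexing a second time fails because a scalar has nothing to index into.
try:
    first[0]
except IndexError as err:
    message = str(err)

print(type(first), numpy_arr.dtype)
print(message)
```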
\subsection{\texorpdfstring{\texttt{IndexError:\ index\ \_\ is\ out\ of\ bounds\ for\ axis\ \_\ with\ size\ \_}}{IndexError: index \_ is out of bounds for axis \_ with size \_}}\label{indexerror-index-_-is-out-of-bounds-for-axis-_-with-size-_}
This error usually happens when you try to access an index that's greater
than or equal to the size of the array/list/\texttt{DataFrame}/\texttt{Series}. For
example,
\begin{verbatim}
some_list = [2, 4, 6, 8]
\end{verbatim}
\texttt{some\_list} has a length of 4. Trying \texttt{some\_list{[}6{]}}
will error because index 6 is past the end of the list. Note
that \texttt{some\_list{[}4{]}} will also cause an \texttt{IndexError}
because Python and \texttt{pandas} use zero indexing, which means that
the first element has index 0, the second element has index 1, etc.;
\texttt{some\_list{[}4{]}} would grab the fifth element, which is
impossible when the list only has 4 elements.
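The same off-by-one behavior can be seen with a \texttt{NumPy} array, which produces the exact message in this section's title. A minimal sketch, where \texttt{some\_array} is a hypothetical array holding the same values as \texttt{some\_list}:

```python
import numpy as np

some_array = np.array([2, 4, 6, 8])

# The last valid position is len - 1 because indexing starts at 0.
last = some_array[len(some_array) - 1]

try:
    some_array[4]
except IndexError as err:
    message = str(err)

print(last)
print(message)
```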
\section{ValueErrors}\label{valueerrors}
\subsection{\texorpdfstring{\texttt{ValueError:\ Truth\ value\ of\ a\ Series\ is\ ambiguous}}{ValueError: Truth value of a Series is ambiguous}}\label{valueerror-truth-value-of-a-series-is-ambiguous}
This error can occur when you apply Python logical operators
(\texttt{or}, \texttt{and}, \texttt{not}), which only operate on a
single boolean value, to \texttt{NumPy} arrays or \texttt{Series}
objects, which can contain multiple values. The fix is to use the bitwise
operators \texttt{\textbar{}}, \texttt{\&}, \texttt{\textasciitilde{}},
respectively, to apply the logic element-wise across the values in
arrays or \texttt{Series}.
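Here is a small sketch of the element-wise fix, using a hypothetical \texttt{ages} \texttt{Series} with made-up values. Each comparison is wrapped in parentheses because \texttt{\&} binds more tightly than \texttt{\textgreater{}} and \texttt{\textless{}}.

```python
import pandas as pd

ages = pd.Series([15, 30, 70])

# `and` asks for a single True/False from each whole Series, which is ambiguous.
try:
    mask = (ages > 18) and (ages < 65)
except ValueError as err:
    message = str(err)

# `&` combines the two boolean Series element by element.
mask = (ages > 18) & (ages < 65)

print(message)
print(mask.tolist())
```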
Alternatively, these errors could emerge due to overwriting built-in
names like \texttt{bool} and \texttt{sum} that may be used in the
autograder tests, similar to what's described
\href{https://ds100.org/debugging-guide/pandas/pandas.html\#typeerror-___-object-is-not-callable}{here}.
You should follow a similar procedure of identifying the line of code
erroring, checking if you've overwritten any built-in names using
Ctrl+F, and renaming those variables to something more informative
before restarting your kernel and running the erroring tests again.
\subsection{\texorpdfstring{\texttt{ValueError:\ Can\ only\ compare\ identically-labeled\ Series\ objects}}{ValueError: Can only compare identically-labeled Series objects}}\label{valueerror-can-only-compare-identically-labeled-series-objects}
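As the message suggests, this error occurs when comparing two
\texttt{Series} objects whose index labels do not match exactly, which
includes the case where they have different lengths. You can double
check the lengths of the \texttt{Series} using
\texttt{len(series\_name)} or \texttt{series\_name.size}, and inspect
their labels with \texttt{series\_name.index}.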
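The sketch below shows that this error can appear even when two \texttt{Series} have the same length, as long as their index labels differ. The names \texttt{left} and \texttt{right} are hypothetical, and resetting the index is only one possible fix, appropriate when you want to compare values by position.

```python
import pandas as pd

# Same length and same values, but the index labels differ.
left = pd.Series([1, 2, 3], index=[0, 1, 2])
right = pd.Series([1, 2, 3], index=[1, 2, 3])

try:
    left == right
except ValueError as err:
    message = str(err)

# Dropping the old labels lets pandas compare position by position.
same_values = left.reset_index(drop=True) == right.reset_index(drop=True)

print(message)
print(same_values.tolist())
```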
\subsection{\texorpdfstring{\texttt{ValueError:\ -1\ is\ not\ in\ range}
/
\texttt{KeyError:\ -1}}{ValueError: -1 is not in range / KeyError: -1}}\label{valueerror--1-is-not-in-range-keyerror--1}
This error occurs when you try to index into a \texttt{Series} or
\texttt{DataFrame} as you would a Python list. Unlike a list, where
passing an index of -1 gives the last element, \texttt{pandas}
treats the -1 inside square brackets as a label rather than a
position, so it looks for an index or column labeled -1 (much as
\texttt{df.loc{[}-1{]}} looks for a row labeled -1). If your
intention is to pick out the last row in \texttt{df}, consider using
integer-position based indexing by doing \texttt{df.iloc{[}-1{]}}. In
general, to avoid ambiguity in these cases, it is also good practice to
write out both the row and column indices you want with
\texttt{df.iloc{[}-1,\ :{]}}.
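As a minimal sketch, assume a hypothetical \texttt{DataFrame} with a default integer index and a single made-up column \texttt{a}:

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30]})
ser = df["a"]

# Square brackets look for the label -1, which does not exist here.
try:
    ser[-1]
    lookup_failed = False
except (KeyError, ValueError):
    lookup_failed = True

# iloc works by position, so -1 means "last", as it does for a list.
last_value = ser.iloc[-1]
last_row = df.iloc[-1, :]

print(lookup_failed)
print(last_value)
print(last_row)
```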
\bookmarksetup{startatroot}
\chapter{RegEx}\label{regex}
RegEx syntax can be incredibly confusing, so we highly encourage using
sources like the Data 100 Exam reference sheet (you can find this under
the ``Exam Resources'' section on our
\href{https://ds100.org/sp24/resources/}{Resources page}) or websites
like \href{https://regex101.com/}{regex101.com} to help build your
understanding.
\section{How to Interpret regex101}\label{how-to-interpret-regex101}
\href{https://regex101.com/}{Regex101} is a great tool that helps you
visually interact with RegEx patterns. Let's take a look at its
components with a simple example.
\subsection{Example 1: Basic}\label{example-1-basic}
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\setcounter{enumi}{-1}
\tightlist
\item
\textbf{Flavor}: Regular expressions work slightly differently
depending on the programming language you use. In Data 100, we only
use the \texttt{Python} flavor. By default, regex101 opens on the
PCRE2 flavor, so make sure to change to \texttt{Python} before
experimenting.
\item
\textbf{Regular Expression}: This is where the RegEx expression goes.
For this example, our pattern is \texttt{Data\ 100}. In
\texttt{Python}, we write it as a raw string \texttt{r"Data\ 100"}; the
prefix \texttt{r} tells \texttt{Python} not to treat backslashes as
string escapes, so the pattern reaches the RegEx engine unchanged. In
regex101, because we changed to
the \texttt{Python} flavor, we don't need to type out the \texttt{r"}
at the start or the \texttt{"} at the end, as that's already set up
for us.
\item
\textbf{Explanation}: This portion of the website explains each
component of the pattern above. Since it does not contain any special
characters, \texttt{Data\ 100} will match any portion of a string
containing \texttt{Data\ 100}.
\item
\textbf{Test String}: This is where you can try out different inputs
and see if they match the RegEx pattern. Of the 4 example sentences,
we see that only the first sentence contains characters that match the
pattern, highlighted in blue. (Note that while sentence 3 does contain
\texttt{data\ 100}, RegEx is case-sensitive by default: \texttt{d}
and \texttt{D} are different characters.)
\item
\textbf{Match Information}: Each match between the RegEx expression
and test strings is shown here.
\end{enumerate}
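The same check can be reproduced in \texttt{Python} with the standard \texttt{re} module. The two test sentences below are hypothetical stand-ins, not the ones used on regex101.

```python
import re

pattern = r"Data 100"

# Hypothetical test strings; the second uses a lowercase "d".
sentences = [
    "I am taking Data 100 this semester.",
    "I am taking data 100 this semester.",
]

# re.search returns a match object if the pattern appears anywhere, else None.
found = [re.search(pattern, s) is not None for s in sentences]

print(found)
```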
\subsection{Example 2: Greedy}\label{example-2-greedy}
For this example, let's replace the \texttt{100} in our original
expression with \texttt{\textbackslash{}d+} so that our pattern is
\texttt{Data\ \textbackslash{}d+}.

\texttt{\textbackslash{}d} and \texttt{+} are both special operators,
and the explanation on the top right (boxed in red) tells us what they
do:
\begin{itemize}
\tightlist
\item
\texttt{\textbackslash{}d} matches any single digit from 0 to
9. It's equivalent to \texttt{{[}0-9{]}}.
\item
\texttt{+} matches the previous token \(\geq 1\) times. It is a
\emph{greedy operation}, meaning it will match as many characters as
possible.
\end{itemize}
Altogether, the expression \texttt{\textbackslash{}d+} will match a run
of one or more consecutive digits. Look at each match under ``Match Information''.
Can you see why they align with \texttt{Data\ \textbackslash{}d+}?
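To see the greedy behavior outside of regex101, here is a short sketch using \texttt{re}. The test string is made up, and the lazy variant \texttt{\textbackslash{}d+?} is included only as a contrast; it is not part of the example above.

```python
import re

text = "Data 100, Data 8, and Data C100"

# "Data C100" is skipped because "Data " is not followed directly by a digit.
matches = re.findall(r"Data \d+", text)

# Greedy: + grabs as many digits as it can.
greedy = re.search(r"\d+", "Data 100").group()

# Lazy (for contrast): +? stops after the fewest digits that still match.
lazy = re.search(r"\d+?", "Data 100").group()

print(matches)
print(greedy, lazy)
```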
\subsection{Example 3: Capturing
Groups}\label{example-3-capturing-groups}
Let's say we're given a body of text with dates formatted as
\texttt{DD/Month/YYYY} (e.g. \texttt{04/Jan/2014}), and we're interested in
extracting the dates. An expression like
\texttt{r"\textbackslash{}d+\textbackslash{}/\textbackslash{}w+\textbackslash{}/\textbackslash{}d+"}
would match any string with the \texttt{DD/Month/YYYY} format:
\begin{itemize}
\tightlist
\item
the first \texttt{\textbackslash{}d+} matches \texttt{DD} patterns
(e.g. \texttt{04})
\item
\texttt{\textbackslash{}/} matches the \texttt{/} separator. In the
\texttt{Python} flavor \texttt{/} is not a special character, so the
\texttt{\textbackslash{}} is optional here, but escaping it is harmless
and still gives the literal character.
\item
\texttt{\textbackslash{}w+} in the middle matches \texttt{Month}
patterns we're interested in (e.g. \texttt{Jan}, \texttt{January})
\item
lastly, \texttt{\textbackslash{}d+} matches \texttt{YYYY} patterns
(e.g. \texttt{2014})
\end{itemize}
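The pattern broken down above can be tried in \texttt{Python} as follows. The sample text is invented for illustration.

```python
import re

pattern = r"\d+\/\w+\/\d+"

# Hypothetical log-style text containing two dates.
text = "Logged in on 04/Jan/2014 and again on 15/March/2015."

# findall returns every non-overlapping match of the whole pattern.
dates = re.findall(pattern, text)

print(dates)
```

Keep in mind that \texttt{\textbackslash{}w} also matches digits and underscores, so this pattern is more permissive than a strict date format.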
That's great! This pattern will match the entirety of
\texttt{DD/Month/YYYY}, but what if we want to access \texttt{DD}
individually? What about \texttt{YYYY}? This is where \textbf{capturing
groups} come in handy. Capturing groups are RegEx expressions
surrounded by parentheses \texttt{()} that are used to remember the text
they match so that it can be referenced later. Putting capturing groups
around \texttt{\textbackslash{}d+} and \texttt{\textbackslash{}w+} to
get
\texttt{r"(\textbackslash{}d+)\textbackslash{}/(\textbackslash{}w+)\textbackslash{}/(\textbackslash{}d+)"}
gives us the following:
\begin{itemize}
\tightlist
\item
The ``Explanation'' section now shows an explanation for each of the 3
capturing groups.
\item
In our test strings, the portion matching the RegEx expression is