added more

seashello · Dec 10, 2024 · 1076c2e
1 parent 201ef74
commit 1076c2e
Showing 8 changed files with 56 additions and 16 deletions.
diff --git a/README.md b/README.md
@@ -1,6 +1,7 @@
 # Exploring Power Outages + Intentional Attacks
 
 Michelle Hong
+
 This is a project from DSC 80: Practice and Application of Data Science, exploring data regarding power outages.
 
 
@@ -9,15 +10,15 @@ In this project, we will be exploring a power outage dataset and focusing on int
 
 This dataset contains information on 1500+ power outages across the US from 2000 to 2016. It includes fields such as the state the outage occurred in, the cause of the outage, and the state's real GSP that year. This project focuses on predicting whether the cause category of an outage was an intentional attack. We also explore other aspects of power outages, such as missingness in the dataset and the distributions of interesting variables. This dataset and question matter because power outages affect entire communities, and as a Californian, I have experienced many. Additionally, intentional attacks that cause power outages can harm millions of people, and different areas may be affected differently. There are 1534 rows of data and 56 columns (but we will be keeping 8 and creating a new one).
 
-- `US State`: The state that the power outage occurred in
-- `Climate Category`: The climate category during the outage (normal, cold, or warm) (str)
-- `Cause Category`: The reason for the power outage (str)
-- `Outage Duration`: Duration of the power outage in minutes (int)
-- `Demand Loss`: The amount of peak demand loss in Megawatts (int)
-- `Customers Affected`: The number of customers affected by power outage (int)
-- `Population`: Population at the given US state in a year (int)
-- `Popden Rural`: Population density of the urban areas (persons per square mile) (float)
-- `Attack`: Whether the cause of the outage was an intentional attack (bool)
+- `US State:` The state that the power outage occurred in
+- `Climate Category:` The climate category during the outage (normal, cold, or warm) (str)
+- `Cause Category:` The reason for the power outage (str)
+- `Outage Duration:` Duration of the power outage in minutes (int)
+- `Demand Loss:` The amount of peak demand loss in Megawatts (int)
+- `Customers Affected:` The number of customers affected by the power outage (int)
+- `Population:` Population of the given US state in that year (int)
+- `Popden Rural:` Population density of the rural areas (persons per square mile) (float)
+- `Attack:` Whether the cause of the outage was an intentional attack (bool)
 
 ## Data Cleaning and Exploratory Data Analysis
 ### Data Cleaning
@@ -101,6 +102,14 @@ Empirical Distribution:
   frameborder="0"
 ></iframe>
 
+Separated by State:
+<iframe
+  src="assets/missingness_by_state.html"
+  width="800"
+  height="600"
+  frameborder="0"
+></iframe>
+
 ## Hypothesis Testing
 Are the demand losses from California and Washington the same? Do they come from the same population, or is one bigger than the other?
 Null: The "Demand Loss Mw" values for California and Washington come from the same distribution
@@ -116,7 +125,8 @@ We fail to reject that they come from the same population, with a p-value of 0.1
 ></iframe>
 
 ## Framing a Prediction Problem
-I will predict if a power outage is an intentional attack or not. We will be performing binary classification. I chose to predict if a power outage is an intentional attack or not because there were a lot of outages classified as intentional attacks, and I think it is an interesting topic. I will be using F-score instead of accuracy because only 27% of the cause categories in my dataset are intentional attacks. Accuracy wouldn't be a good measure due to the disparity of the groups sizes. F-score is useful when data is imbalanced and takes into consideration False Negatives and False Positives (and not just True Positives).
+I will predict whether a power outage was caused by an intentional attack, which is a binary classification problem. I chose this target because a large share of the outages in the dataset are classified as intentional attacks, and I think it is an interesting topic. I will evaluate the model with F-score rather than accuracy because only 27% of the outages in my dataset are intentional attacks, and accuracy is a poor measure when the group sizes are this imbalanced. F-score is useful for imbalanced data because it takes False Negatives and False Positives into consideration (and not just True Positives).
+- Even though we will focus on F-score, I will still report accuracy. We want it to be above 73%, because a trivial model that predicts "not an intentional attack" 100% of the time would already reach 73% accuracy while identifying no attacks at all.
 
 ## Baseline Model
 To predict if a power outage is an intentional attack, we will fit a `DecisionTreeClassifier()`. My model uses the columns:
@@ -146,4 +156,20 @@ Using a RandomForestClassifier() over a DecisionTreeClassifier() was beneficial
 
 
 ## Fairness Analysis
-Does this model perform fairly for 
+Does this model perform fairly for higher/lower values of `Popden Rural`? We will use the mean of the entire dataset's `Popden Rural` column as a threshold to split values into high and low `Popden Rural` groups.
+- Group X: Popden Rural > `outages["Popden Rural"].mean()` (39.47349081364819)
+- Group Y: Popden Rural <= 39.47349081364819
+- Null Hypothesis: Our model is fair. Its accuracy for lower and higher rural population densities is roughly the same, and any difference is due to random chance.
+- Alternative Hypothesis: Our model is unfair. The accuracy for rural communities with higher population densities is higher than in rural areas with less dense populations.
+- Evaluation Metric: Accuracy
+- Test statistic: Difference in accuracy between the two groups
+- Significance level: 0.05
+- P-value: 0.412
+- **Conclusion**: With a p-value greater than 0.05, we fail to reject the null hypothesis. We do not have sufficient evidence that our model's accuracy differs between the higher and lower `Popden Rural` groups, so the result is consistent with the model achieving accuracy parity across these groups.
+
+<iframe
+  src="assets/fairness.html"
+  width="800"
+  height="600"
+  frameborder="0"
+></iframe>
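The fairness analysis added above describes a permutation test on the difference in accuracy between the high and low `Popden Rural` groups. A minimal sketch of that kind of test follows. It is not the project's actual code: the data are made-up toy values, the helper names `accuracy_gap` and `permutation_p_value` are hypothetical, and the one-sided direction (high group minus low group) is an assumption based on the stated alternative hypothesis.

```python
import random


def accuracy_gap(correct, is_high):
    """Accuracy of the high group minus accuracy of the low group."""
    high = [c for c, h in zip(correct, is_high) if h]
    low = [c for c, h in zip(correct, is_high) if not h]
    return sum(high) / len(high) - sum(low) / len(low)


def permutation_p_value(correct, is_high, n_repetitions=1000, seed=0):
    """Share of shuffled gaps that are at least as large as the observed gap."""
    rng = random.Random(seed)
    observed = accuracy_gap(correct, is_high)
    shuffled = list(is_high)
    at_least_as_large = 0
    for _ in range(n_repetitions):
        # Shuffling the group labels breaks any link between group and correctness.
        rng.shuffle(shuffled)
        if accuracy_gap(correct, shuffled) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_repetitions


# Toy values only (assumption): rural population densities, true labels, predictions.
popden_rural = [12.0, 55.3, 40.1, 8.7, 61.2, 30.4, 45.0, 20.9]
y_true = [0, 1, 0, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 1, 1, 0, 0, 0]

# Split on the mean of the column, as the README does for Group X and Group Y.
threshold = sum(popden_rural) / len(popden_rural)
is_high = [value > threshold for value in popden_rural]
correct = [int(t == p) for t, p in zip(y_true, y_pred)]

observed, p_value = permutation_p_value(correct, is_high)
print(f"observed gap: {observed:.3f}, p-value: {p_value:.3f}")
```

With real data, `correct` would come from comparing the fitted model's predictions with the true `Attack` labels, and the resulting p-value would be compared against the 0.05 significance level.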
diff --git a/assets/attack_purity.html b/assets/attack_purity.html
diff --git a/assets/cali_wash.html b/assets/cali_wash.html
diff --git a/assets/fairness.html b/assets/fairness.html
diff --git a/assets/missingness_by_state.html b/assets/missingness_by_state.html
@@ -0,0 +1,7 @@
+<html>
+<head><meta charset="utf-8" /></head>
+<body>
+    <div>                        <script type="text/javascript">window.PlotlyConfig = {MathJaxConfig: 'local'};</script>
+        <script charset="utf-8" src="https://cdn.plot.ly/plotly-2.35.2.min.js"></script>                <div id="0a85aa48-6803-4db9-9609-3fa0d5f335a2" class="plotly-graph-div" style="height:100%; width:1000px;"></div>            <script type="text/javascript">                                    window.PLOTLYENV=window.PLOTLYENV || {};                                    if (document.getElementById("0a85aa48-6803-4db9-9609-3fa0d5f335a2")) {                    Plotly.newPlot(                        "0a85aa48-6803-4db9-9609-3fa0d5f335a2",                        [{"alignmentgroup":"True","hovertemplate":"Missing Demand Loss=False\u003cbr\u003eUS State=%{x}\u003cbr\u003evalue=%{y}\u003cextra\u003e\u003c\u002fextra\u003e","legendgroup":"False","marker":{"color":"#636efa","pattern":{"shape":""}},"name":"False","offsetgroup":"False","orientation":"v","showlegend":true,"textposition":"auto","x":["Alabama","Alaska","Arizona","Arkansas","California","Colorado","Connecticut","Delaware","District of Columbia","Florida","Georgia","Hawaii","Idaho","Illinois","Indiana","Iowa","Kansas","Kentucky","Louisiana","Maine","Maryland","Massachusetts","Michigan","Minnesota","Mississippi","Missouri","Montana","Nebraska","Nevada","New Hampshire","New Jersey","New Mexico","New York","North Carolina","North Dakota","Ohio","Oklahoma","Oregon","Pennsylvania","South Carolina","South Dakota","Tennessee","Texas","Utah","Vermont","Virginia","Washington","West Virginia","Wisconsin","Wyoming"],"xaxis":"x","y":[4.0,1.0,11.0,12.0,124.0,5.0,10.0,16.0,6.0,20.0,8.0,4.0,5.0,29.0,22.0,3.0,5.0,6.0,24.0,14.0,41.0,10.0,53.0,8.0,3.0,11.0,2.0,4.0,3.0,7.0,21.0,1.0,30.0,19.0,2.0,20.0,12.0,16.0,32.0,4.0,null,17.0,74.0,20.0,3.0,19.0,51.0,3.0,10.0,4.0],"yaxis":"y","type":"bar"},{"alignmentgroup":"True","hovertemplate":"Missing Demand Loss=True\u003cbr\u003eUS 
State=%{x}\u003cbr\u003evalue=%{y}\u003cextra\u003e\u003c\u002fextra\u003e","legendgroup":"True","marker":{"color":"#EF553B","pattern":{"shape":""}},"name":"True","offsetgroup":"True","orientation":"v","showlegend":true,"textposition":"auto","x":["Alabama","Alaska","Arizona","Arkansas","California","Colorado","Connecticut","Delaware","District of Columbia","Florida","Georgia","Hawaii","Idaho","Illinois","Indiana","Iowa","Kansas","Kentucky","Louisiana","Maine","Maryland","Massachusetts","Michigan","Minnesota","Mississippi","Missouri","Montana","Nebraska","Nevada","New Hampshire","New Jersey","New Mexico","New York","North Carolina","North Dakota","Ohio","Oklahoma","Oregon","Pennsylvania","South Carolina","South Dakota","Tennessee","Texas","Utah","Vermont","Virginia","Washington","West Virginia","Wisconsin","Wyoming"],"xaxis":"x","y":[2.0,null,17.0,13.0,86.0,10.0,8.0,25.0,4.0,25.0,9.0,1.0,4.0,17.0,21.0,5.0,4.0,7.0,16.0,5.0,17.0,8.0,42.0,7.0,1.0,6.0,1.0,null,4.0,7.0,14.0,7.0,41.0,21.0,null,23.0,12.0,10.0,25.0,4.0,2.0,17.0,53.0,21.0,6.0,18.0,46.0,1.0,10.0,2.0],"yaxis":"y","type":"bar"}],                        
{"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"
],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinec
olor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":tr
ue,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"}}},"xaxis":{"anchor":"y","domain":[0.0,1.0],"title":{"text":"US State"}},"yaxis":{"anchor":"x","domain":[0.0,1.0],"title":{"text":"value"}},"legend":{"title":{"text":"Missing Demand Loss"},"tracegroupgap":0},"title":{"text":"Missing Demand Loss Counts by State"},"barmode":"group","width":1000},                        {"responsive": true}                    )                };                            </script>        </div>
+</body>
+</html>
diff --git a/assets/missingness_tvd.html b/assets/missingness_tvd.html
diff --git a/assets/states_outage_nums.html b/assets/states_outage_nums.html
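The README's "Framing a Prediction Problem" section argues that a model which always predicts "not an intentional attack" would reach 73% accuracy, which is why F-score is the primary metric. The arithmetic can be checked with a small sketch. The labels below are synthetic (27 attacks out of 100 outages, chosen only to match the 27% figure), `f1_score_binary` is a hypothetical helper, and returning 0.0 when there are no true positives is a convention assumed here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score_binary(y_true, y_pred):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denominator = 2 * tp + fp + fn
    # Assumed convention: report 0.0 when F1 is otherwise undefined.
    return 2 * tp / denominator if denominator else 0.0


# Synthetic labels (assumption): 27 intentional attacks, 73 other outages.
y_true = [1] * 27 + [0] * 73
# Trivial baseline: always predict "not an intentional attack".
y_pred = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, y_pred)
baseline_f1 = f1_score_binary(y_true, y_pred)
print(f"accuracy: {baseline_accuracy:.2f}, F1: {baseline_f1:.2f}")
```

The baseline looks reasonable on accuracy while its F1 for the attack class is zero, which is the gap the README's choice of metric is meant to expose.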