<!DOCTYPE HTML>
<!--
Forty by HTML5 UP
html5up.net | @ajlkn
Free for personal and commercial use under the CCA 3.0 license (html5up.net/license)
-->
<html>
<head>
<title>Machine learning module</title>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no" />
<link rel="stylesheet" href="assets/css/main.css" />
<noscript><link rel="stylesheet" href="assets/css/noscript.css" /></noscript>
</head>
<body class="is-preload">
<!-- Wrapper -->
<div id="wrapper">
<!-- Header -->
<!-- Note: The "styleN" class below should match that of the banner element. -->
<header id="header" class="alt style2">
<a href="index.html" class="logo"><span>PGDip AI ePortfolio </span><strong>Rory Maclean</strong></a>
<nav>
<a href="#menu">Menu</a>
</nav>
</header>
<!-- Menu -->
<nav id="menu">
<ul class="links">
<li><a href="index.html">Home</a></li>
<li><a href="about-me.html">About me</a></li>
<li><a href="#">One: induction</a></li>
<li><a href="#">Two: understanding artificial intelligence</a></li>
<li><a href="#">Three: numerical analysis</a></li>
<li><a href="ml.html">Four: machine learning</a></li>
<li><a href="#">Five: knowledge representation and reasoning</a></li>
<li><a href="intell-agents.html">Six: intelligent agents</a></li>
<li><a href="#">Seven: research methods and professional practice</a></li>
</ul>
</nav>
<!-- Banner -->
<!-- Note: The "styleN" class below should match that of the header element. -->
<section id="banner" class="style2">
<div class="inner">
<span class="image">
<img src="images/pic07.jpg" alt="" />
</span>
<header class="major">
<h1>Machine learning module</h1>
</header>
<div class="content">
<p>Reflections &amp; artefacts created during the module</p>
</div>
</div>
</section>
<!-- Main -->
<div id="main">
<!-- One -->
<section id="one">
<div class="inner">
<header class="major">
<h2>Learning outcomes</h2>
</header>
<ol>
<li>Learn about the key paradigms and algorithms in machine learning.</li>
<li>Get an understanding of data analytics based on machine learning and using modern programming tools, such as Python or R.</li>
<li>Experience how machine learning and data analytics can be used in real-world applications.</li>
<li>Acquire the ability to gather and synthesise information from multiple sources to aid in the systematic analysis of complex problems using machine learning tools and algorithms.</li>
</ol>
</div>
</section>
<!-- Two -->
<section id="two" class="spotlights">
<section>
<a href="artefact_mno.html" class="image">
<img src="images/pic08.jpg" alt="" data-position="center center" />
</a>
<div class="content">
<div class="inner">
<header class="major">
<h3>Exploratory Data Analysis</h3>
</header>
<p>After the introduction to machine learning, one of the most useful exercises was the EDA tutorial,
the focus of Unit 2. I have used EDA on many medical datasets, focusing in particular on missing
values, but I had never used the Python package missingno before. This was very instructive,
especially the heatmap method for examining nullity correlation (Fig. 1), which I will use again in future EDA.</p>
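<p>As a minimal sketch of what nullity correlation measures, the example below computes it directly with pandas rather than with missingno itself. The small DataFrame and its column names are hypothetical, chosen only for illustration. A value near 1 means two columns tend to be missing together, and a value near -1 means one tends to be present when the other is missing.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; the column names and values are illustrative only.
df = pd.DataFrame({
    "age": [34, np.nan, 51, np.nan, 47, 29],
    "blood_pressure": [120, np.nan, 135, np.nan, 128, 118],
    "cholesterol": [np.nan, 5.2, np.nan, 4.8, np.nan, np.nan],
    "weight": [70, 82, 65, 90, 77, 68],
})

# True where a value is missing, False where it is present.
nullity = df.isnull()

# Columns that are always present (or always missing) have no variance,
# so their correlation is undefined; keep only columns that vary.
varying = [col for col in nullity.columns if nullity[col].nunique() == 2]

# Pearson correlation between the missingness indicators.
nullity_corr = nullity[varying].astype(int).corr()
print(nullity_corr.round(2))
```

<p>Here "age" and "blood_pressure" are missing in the same rows, while "cholesterol" is missing exactly where they are present. "weight" has no missing values and is left out of the matrix.</p>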
<p><a href="artefact_mno.html">Click to see details about <code>missingno</code></a></p>
</div>
</div>
</section>
<section>
<a href="artefact_ml_kmeans.html" class="image">
<img src="images/pic09.jpg" alt="" data-position="top center" />
</a>
<div class="content">
<div class="inner">
<header class="major">
<h3>K-means</h3>
</header>
<p>The next notable technique introduced in the module was clustering, with a focus on K-means. This is a very useful and informative technique, which I used in coursework for a previous module to develop a prototype product that segmented the customers of a hypothetical bank. It was nonetheless useful to return to this technique, as I learned how to choose the number of clusters. Going into the unit, I was aware that the number of clusters, k, is a parameter of the algorithm that must be chosen for each run. Through the workbook and seminar, I learned that methods such as silhouette analysis can be used to check for the optimal (or approximately optimal) number of clusters.</p>
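<p>A minimal sketch of choosing k by silhouette analysis, assuming scikit-learn and synthetic data (not the data used in the module). The helper name <code>silhouette_by_k</code> and the four hand-placed cluster centres are assumptions made for this illustration.</p>

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: four well-separated groups (an assumption for illustration).
centers = [[-5, -5], [-5, 5], [5, -5], [5, 5]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.6, random_state=0)


def silhouette_by_k(X, k_values, random_state=0):
    """Fit K-means for each candidate k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


scores = silhouette_by_k(X, range(2, 8))
best_k = max(scores, key=scores.get)  # k with the highest mean silhouette

for k, score in scores.items():
    print(f"k={k}: mean silhouette = {score:.3f}")
print("best k:", best_k)
```

<p>The mean silhouette score summarises how close each point is to its own cluster compared with the nearest other cluster. On real data the peak is often less clear-cut than on this toy example, so it is worth inspecting the per-cluster silhouette plot as well.</p>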
<p><a href="artefact_ml_kmeans.html">Click to see details about K-means silhouette plot</a></p>
</div>
</div>
</section>
<section>
<a href="artefact_shap.html" class="image">
<img src="images/pic09.jpg" alt="" data-position="top center" />
</a>
<div class="content">
<div class="inner">
<header class="major">
<h3>AirBnB Team Task</h3>
</header>
<p>The first team task to be undertaken was the AirBnB task. Here the problem setup was to define an important business question using the AirBnB listing dataset from Kaggle. Piotr and I discussed this over video chat and came up with a range of hypotheses. However, we both agreed that the most important business question was which factor affected the listing price the most. We moved from linear regression to XGBoost regression, a tree-based method. Whilst XGBoost has built-in feature importance, we decided instead to use permutation importance, which is model-agnostic. I had heard about SHapley Additive exPlanations (SHAP) as a useful ‘explainable AI’ tool and explained this method to the team. Piotr thought it was great, especially the visually interesting plots that show which variables are most influential.</p>
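<p>The sketch below illustrates permutation importance under stated assumptions. It uses scikit-learn's <code>GradientBoostingRegressor</code> as a stand-in for XGBoost, and synthetic data with hypothetical feature names rather than the AirBnB listings. Because the method only needs a fitted model and a scoring function, the same call should work with other regressors that follow the scikit-learn interface.</p>

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (illustrative only): the target depends strongly on the
# first feature, weakly on the second, and not at all on the third.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)
feature_names = ["feature_a", "feature_b", "feature_c"]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stand-in tree-based model; the team task itself used XGBoost.
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn on held-out data and record the drop in score.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)

# Rank features from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
for i in ranking:
    mean = result.importances_mean[i]
    std = result.importances_std[i]
    print(f"{feature_names[i]}: {mean:.3f} +/- {std:.3f}")
```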
<p>The key reflections from the AirBnB task concerned remote collaboration on the project. We wanted to use a shared Jupyter notebook but were not yet using Google Colab to share the code (we used this later in the CIFAR task). In terms of collaboration, we met regularly and were able to keep to our deadlines. This meant that getting the document prepared on time was straightforward.</p>
<p><a href="artefact_shap.html">Click to see details about SHAP analysis</a></p>
</div>
</div>
</section>
<section>
<a href="#" class="image">
<img src="images/pic09.jpg" alt="" data-position="top center" />
</a>
<div class="content">
<div class="inner">
<header class="major">
<h3>Learning about neural networks</h3>
</header>
<p>As the module progressed, we moved on to neural networks. I was particularly looking forward to this phase of the module, as this was new territory for me. I was keen to use the workbooks and seminars to get a grounding in how neural networks, starting with a simple feedforward model, function and are constructed, as I knew that we would have to build and test a neural network in the team task. Using the workbooks, I was able to examine the basic unit of the neural network, the perceptron, which with a sigmoid activation functions a lot like a logistic regression: multiple inputs, a coefficient (weight) for each one, and a sigmoid function. However, there are differences too, in that the neurons in a more complex neural network can have differing activation functions such as rectified linear or linear.</p>
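<p>The comparison with logistic regression can be sketched as a single neuron in NumPy. This is an illustration rather than code from the module workbooks; the function name <code>neuron</code> and the input, weight and bias values are arbitrary assumptions.</p>

```python
import numpy as np


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    """Rectified linear unit: pass positive values, clip negatives to zero."""
    return np.maximum(0.0, z)


def linear(z):
    """Identity activation: return the weighted sum unchanged."""
    return z


def neuron(x, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    return activation(np.dot(x, weights) + bias)


# Arbitrary example values (assumptions for illustration).
x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.4, 0.3, -0.2])   # one weight (coefficient) per input
b = 0.1                          # bias, like the intercept in a regression

print("sigmoid:", neuron(x, w, b))            # same form as logistic regression
print("relu:   ", neuron(x, w, b, relu))
print("linear: ", neuron(x, w, b, linear))
```

<p>With the sigmoid activation the output has the same form as the predicted probability of a logistic regression. Swapping the activation changes only the final step, not the weighted sum.</p>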
</div>
</div>
</section>
<section>
<a href="artefact_activationmap.html" class="image">
<img src="images/pic09.jpg" alt="" data-position="top center" />
</a>
<div class="content">
<div class="inner">
<header class="major">
<h3>Building an image recognition neural network (CIFAR-10)</h3>
</header>
<p>In the CIFAR-10 neural network project, we met initially to determine a strategy for the project. This was a difficult task, as neither of us had experience in building a neural network. We decided to use the same Python package, Keras, so that we could combine our code. We split the task so that I would perform the initial experiments: first trying an ANN structure on the image data, and then building in a single convolutional layer, which clearly showed the advantage of a convolutional architecture for image data. We then added layers such as pooling, and finally plotted the activation maps for the convolutional layers. This was a very interesting process, and we really got to grips with how the CNN works 'under the hood'.</p>
<p><a href="artefact_activationmap.html">Click to see details of the CNN activation map</a></p>
</div>
</div>
</section>
<section>
<a href="#" class="image">
<img src="images/pic09.jpg" alt="" data-position="top center" />
</a>
<div class="content">
<div class="inner">
<header class="major">
<h3>Cause for concern?</h3>
</header>
<p>Another cause for concern is deepfakes. The Unit 12 reading list included an independent report published by the Centre for Data Ethics and Innovation on the risks they pose (‘Snapshot Paper - Deepfakes and Audiovisual Disinformation’, ND). In particular, deepfakes can pose a threat both to society (e.g. political disinformation) and to the individual (e.g. identity theft). Strategies to combat deepfakes, including detection tools and greater public awareness, may help.</p>
<p><a href="https://www.gov.uk/government/publications/cdei-publishes-its-first-series-of-three-snapshot-papers-ethical-issues-in-ai/snapshot-paper-deepfakes-and-audiovisual-disinformation">Government Paper on the risk of Deepfakes and Disinformation</a></p>
</div>
</div>
</section>
<!--
<section>
<a href="href" class="image">
<img src="images/pic09.jpg" alt="" data-position="top center" />
</a>
<div class="content">
<div class="inner">
<header class="major">
<h3>title</h3>
</header>
<p>content</p>
<p><a href="href">Click for details</a></p>
</div>
</div>
</section>
-->
</section>
</div>
<!-- Footer -->
<footer id="footer">
<div class="inner">
<ul class="icons">
<li><a href="https://www.linkedin.com/in/rory-maclean/" class="icon brands alt fa-linkedin-in"><span class="label">LinkedIn</span></a></li>
</ul>
<ul class="copyright">
<li>© Rory Maclean 2022</li><li>Design: <a href="https://html5up.net">HTML5 UP</a></li>
</ul>
</div>
</footer>
</div>
<!-- Scripts -->
<script src="assets/js/jquery.min.js"></script>
<script src="assets/js/jquery.scrolly.min.js"></script>
<script src="assets/js/jquery.scrollex.min.js"></script>
<script src="assets/js/browser.min.js"></script>
<script src="assets/js/breakpoints.min.js"></script>
<script src="assets/js/util.js"></script>
<script src="assets/js/main.js"></script>
</body>
</html>