---
title: The Institute for Ethical AI & Machine Learning
description: The Institute for Ethical AI & Machine Learning is a Europe-based research centre that brings together technologists, academics and policy-makers to develop industry frameworks that support the responsible development, design and operation of machine learning systems.
image-banner: https://ethical.institute/images/flyer.jpg
---
<html>
<head>
{% include header.html %}
</head>
<body>
<div id="page-wrapper">
{% include navbar.html %}
<!-- Main -->
<div id="main" class="wrapper style1">
<div class="container">
<header class="major">
<h2>AI-RFX Procurement Framework v1.0</h2>
<p>Machine Learning Maturity Model, AI & Machine Learning Solutions</p>
</header>
<!-- Text -->
<section>
<h2 class="western"><a name="docs-internal-guid-bdf78ad6-7fff-9f92-90bf-6b88ae31b755"></a>
0 - Introduction</h2>
<hr>
<h3 class="western">0.1 - Overview</h3>
<p>This “Machine Learning Maturity Model v1.0” is part of the
AI-RFX Procurement Framework, and it is the core of all the templates
including the <a href="https://ethical.institute/rfx.html">”AI
Request for Proposal Template”</a> & the <a href="https://ethical.institute/rfx.html">“AI
Tender Competition Template”</a>.
</p>
<p>The Machine Learning Maturity Model is an extension of <a href="https://ethical.institute/principles.html">The
Principles for Responsible Machine Learning</a>, which aims to
convert the high-level Responsible ML Principles into practical,
checklist-style assessment criteria. This “checklist” goes beyond
the machine learning algorithms themselves, providing assessment
criteria to evaluate the maturity of the infrastructure
and processes around the algorithms. The concept of “Maturity” is
not just defined as a matter of technical excellence, scientific
rigor, and robust products. It also essentially involves responsible
innovation and development processes, with sensitivity to the
relevant domains of expert knowledge and consideration of all
relevant direct and indirect stakeholders.</p>
<p>The Machine Learning Maturity Model should be a subset of the
overall assessment criteria required to evaluate a proposed solution,
and it is specific to the machine learning piece. It should be
complemented with a traditional assessment of other areas such as the
specific features requested, services needed, and more
domain-specific areas.</p>
<p>Each criterion is linked to one of the
<a href="https://ethical.institute/principles.html">Principles for
Responsible Machine Learning</a>, and consists of the following:</p>
<div class="table-wrapper">
<table >
<tbody><tr>
<td >
<p>#</p>
</td>
<td >
<p>Assessment Criteria</p>
</td>
<td >
<p>Responsible ML Principle</p>
</td>
</tr>
<tr>
<td >
<p>#1</p>
</td>
<td >
<p>Practical benchmarks</p>
</td>
<td >
<p><a href="https://ethical.institute/principles.html#commitment-6">Principle
#6: Practical accuracy</a></p>
</td>
</tr>
<tr>
<td >
<p>#2</p>
</td>
<td >
<p>Explainability by justification</p>
</td>
<td >
<p><a href="https://ethical.institute/principles.html#commitment-3">Principle
#3: Explainability by justification</a></p>
</td>
</tr>
<tr>
<td >
<p>#3</p>
</td>
<td >
<p>Data and model assessment processes</p>
</td>
<td >
<p><a href="https://ethical.institute/principles.html#commitment-2">Principle
#2: Bias Evaluation</a></p>
</td>
</tr>
<tr>
<td >
<p>#4</p>
</td>
<td >
<p>Infrastructure for reproducible operations</p>
</td>
<td >
<p><a href="https://ethical.institute/principles.html#commitment-4">Principle
#4: Reproducible operations</a></p>
</td>
</tr>
<tr>
<td >
<p>#5</p>
</td>
<td >
<p>Privacy enforcing infrastructure</p>
</td>
<td >
<p><a href="https://ethical.institute/principles.html#commitment-7">Principle
#7: Trust by privacy</a></p>
</td>
</tr>
<tr>
<td >
<p>#6</p>
</td>
<td >
<p>Operational process design</p>
</td>
<td >
<p><a href="https://ethical.institute/principles.html#commitment-1">Principle
#1: Human Augmentation</a></p>
</td>
</tr>
<tr>
<td >
<p>#7</p>
</td>
<td >
<p>Change management capabilities</p>
</td>
<td >
<p><a href="https://ethical.institute/principles.html#commitment-5">Principle
#5: Displacement strategy</a></p>
</td>
</tr>
<tr>
<td >
<p>#8</p>
</td>
<td >
<p>Security risk processes</p>
</td>
<td >
<p><a href="https://ethical.institute/principles.html#commitment-8">Principle
#8: Security risks</a></p>
</td>
</tr>
</tbody></table></div>
<hr>
<h3 class="western">0.2 - About Us</h3>
<p>The Institute for Ethical AI & Machine Learning is a Europe-based
research centre that carries out world class research into
responsible machine learning systems. We are formed by cross
functional teams of applied STEM researchers, philosophers, industry
experts, data scientists and software engineers.</p>
<p>Our vision is to mitigate risks of AI and unlock its full
potential through frameworks that ensure ethical and conscientious
development of intelligent systems across industrial sectors. We are
building the Bell Labs of the 21st Century by delivering breakthrough
contributions through applied AI research. You can find more
information about us at <a href="https://ethical.institute/">https://ethical.institute</a>.</p>
<hr>
<h3 class="western">0.3 - Motivation
<img src="images/MLENG-3.png" name="Image1" width="500" height="500" border="0" style="float:right" align="bottom">
</h3>
<p>A growing number of companies are working towards introducing
machine learning systems to automate critical processes at scale.
This has required the “productisation” of machine learning models,
which introduces new complexities. That complexity revolves around a
new set of roles that fall under the umbrella of “Machine Learning
Engineering”, at the intersection of DevOps, data science and
software engineering.</p>
<br style="clear: both">
<img src="images/MLENG-4.png" name="Image2" width="500" height="500" style="float:left" border="0" align="bottom">
<p><br>
</p>
<p>To make things harder, the deployment of machine learning
solutions in industry introduces an even bigger complexity. This
involves the intersection of the new abstract “Machine Learning
Engineering” roles, together with the industry domain experts and
policy makers.</p>
<p>Because of this, there is a strong need to set AI & ML
standards, so practitioners are empowered to raise the bar for
safety, quality and performance of AI solutions. The AI-RFX
Procurement Framework aims to take the first steps towards
this.
</p>
<br style="clear: both">
<hr>
<h3 class="western">0.4 - How to use this document</h3>
<h4 class="western">0.4.1 - Using as reference</h4>
<p>Many procurement managers may already own internally-approved
assessment criteria. If that is the case, this document can be
treated as a reference to obtain insights on key areas that should be
taken into consideration when procuring and evaluating an AI /
Machine Learning solution.</p>
<h4 class="western">0.4.2 - Structure</h4>
<p>Each subsection below consists of a detailed explanation of the
criteria. It is followed by a summary overview of the requirements
expected from suppliers. Finally, it contains a set of detailed
questions that the supplier is expected to answer, whether explicitly
or implicitly, in their proposal, together with red flags to look out
for in each of the detailed questions.</p>
<h4 class="western">0.4.3 - Example</h4>
<p>The Machine Learning Maturity Model was used to build the <a href="https://ethical.institute/rfx.html">”AI
Request for Proposal Template”</a> & the <a href="https://ethical.institute/rfx.html">“AI
Tender Competition Template”</a>, which are part of the AI-RFX
Procurement Framework.</p>
<h4 class="western">0.4.4 - When to use</h4>
<p>This template is relevant only for the procurement of machine
learning systems, and hence it is only suitable when looking to
automate a process that involves data analysis that is too complex to
be tackled using simple RPA tools or rule-based systems.
</p>
<hr>
<h3 class="western">0.5 - Template vs Reality</h3>
<p>This document should serve as a guide, and doesn’t require
everything to be completed exactly as it’s stated. Especially for
smaller projects, the level of detail required may vary
significantly, and some sections can be left out as required. This
template attempts to provide a high-level overview of each chapter
(and respective sections) so the procurement manager and suppliers
can provide as much content as reasonable.
</p>
<hr>
<h3 class="western">0.6 - Open Source License - Free as in freedom</h3>
<h4 class="western">0.6.1 - Open source License</h4>
<p>This document is open source, which means that it can be updated
by the community. The motivation for releasing it as open source is
so that it is continuously improved. This will ensure
that the standards for safety, quality and performance of what is
expected in machine learning systems will keep increasing, whilst
being kept in check on a realistic level by both suppliers and
companies.</p>
<h4 class="western">0.6.2 - Contributing.md</h4>
<p>The Institute for Ethical AI & Machine Learning’s AI-RFX
committee is in charge of the contributing community for all of the
templates under the AI-RFX Procurement Framework. Anyone who
would like to contribute, add suggestions, or provide examples and
practical uses of this template can contact us through the
website, or send us an email via <a href="mailto:[email protected]">[email protected]</a>.</p>
<h4 class="western">0.6.3 - License</h4>
<p>This document is registered under <a href="https://github.com/EthicalML/ai-rfx-procurement-framework">this
MIT License</a> (<a href="https://raw.githubusercontent.com/EthicalML/ai-rfx-procurement-framework/master/LICENSE">raw
file</a>), which means that anyone can re-use, modify or enhance this
document as long as credit is given to The Institute for Ethical AI &
Machine Learning. It also includes an “as is” disclaimer. Please
read the license before using this template.</p>
<h2 class="western" >Machine
Learning Maturity Model
</h2>
<hr>
<h3 class="western">1 - Practical benchmarks</h3>
<p>This Machine Learning Maturity Model assessment criteria is
directly aligned with the <a href="https://ethical.institute/principles.html#commitment-6">Responsible
Machine Learning Principle #6 - Practical accuracy</a>.
</p>
<h4 class="western">Explanation</h4>
<ul>
<li>
<p >Having the right benchmark
metrics is one of the most important points to consider during the
evaluation of machine learning solutions. Relevant benchmarks that
are considered in this section include accuracy, time,
time-to-accuracy, and computational resources.</p>
</li><li>
<p >The criteria for what makes a
good benchmark can vary significantly depending on the task
complexity, dataset size, etc. However, the objective here is to
assess whether suppliers are able to follow best
practices in data science, and to make sure these are aligned with the
use-case requirements.</p>
</li></ul>
<h4 class="western">Requirements</h4>
<ul>
<li>
<p >Suppliers must be able to
demonstrate best practices in software development, data science and
industry-specific knowledge when presenting benchmarks. These
benchmarks include:</p>
<ul>
<li>
<p >Time - Supplier must
provide estimated processing times</p>
</li><li>
<p >Accuracy - Supplier must
provide metrics beyond accuracy as relevant</p>
</li><li>
<p >Time-to-accuracy -
Supplier must provide information on the estimated time and
resources it takes to train new models to a reasonable accuracy
</p>
</li><li>
<p >Computational resources -
Supplier must provide insight on computational resources required
for efficient use of their system</p>
</li></ul>
</li></ul>
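<p>As an illustration only (this framework does not prescribe any
tooling), the “metrics beyond accuracy” that the Accuracy requirement
above refers to can be computed from first principles; the labels
below are synthetic:</p>

```python
# Illustrative sketch only: synthetic labels, plain Python.
# Computes the statistical metrics beyond simple accuracy (true/false
# positive rates, precision, confusion matrix) that suppliers are
# expected to make available.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # synthetic ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # synthetic model predictions

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

metrics = {
    "accuracy":            (tp + tn) / len(y_true),
    "precision":           tp / (tp + fp),   # of items flagged, how many were right
    "recall":              tp / (tp + fn),   # true positive rate
    "false_positive_rate": fp / (fp + tn),
    "confusion_matrix":    [[tn, fp], [fn, tp]],
}
print(metrics)
```

<p>In practice a supplier would report these per class and per data
slice, alongside learning curves and error bars.</p>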
<div class="table-wrapper">
<table id="1-checklist">
<tbody><tr>
<td >
<p>#</p>
</td>
<td >
<p>Question</p>
</td>
<td >
<p>Red flags</p>
</td>
</tr>
<tr>
<td >
<p>1.1</p>
</td>
<td >
<p>Does the supplier have a process and/or infrastructure to make
available statistical metrics beyond accuracy?</p>
</td>
<td >
<ul>
<li>
<p >Supplier doesn’t have
a process and/or infrastructure to provide statistical metrics
beyond simple accuracy (e.g. true positive rate, false positive
rate, precision, etc).</p>
</li><li>
<p >Supplier doesn’t
provide reasonable insights (i.e. confusion matrix, learning
curves, error bars, etc)</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>1.2</p>
</td>
<td >
<p>Does the supplier have a process to ensure their machine
learning evaluation metrics (i.e. cost functions & benchmarks)
are aligned to the objective of the use-case?</p>
</td>
<td >
<ul>
<li>
<p >Supplier doesn’t have
a process to ensure that the cost functions they selected reflect
the objectives of the use-case</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>1.3</p>
</td>
<td >
<p>Does the supplier have a process to validate the way they
evaluate predictions as correct or incorrect?</p>
</td>
<td >
<ul>
<li>
<p >Supplier doesn’t have
a process to ensure the methods / function(s) they use to
evaluate a prediction as correct or incorrect is aligned to the
way the relevant domain expert would.</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>1.4</p>
</td>
<td >
<p>Does the supplier use reasonable statistical methods when
comparing performance of different models?</p>
</td>
<td >
<ul>
<li>
<p >Supplier does not use
standard comparison methods such as t-tests, ROC curves, or
relevant metrics when comparing different solutions proposed.</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>1.5</p>
</td>
<td >
<p>Does the supplier provide comprehensible information on the
time performance of their solution?</p>
</td>
<td >
<ul>
<li>
<p >Supplier doesn’t
provide reasonable time benchmarks for tasks, and how the time
behaves as other variables change (data instance size, batch
volumes, etc)
</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>1.6</p>
</td>
<td >
<p>Does the supplier provide comprehensible estimates on time and
resources required to develop a model from scratch to a reasonable
accuracy?</p>
</td>
<td >
<ul>
<li>
<p >Supplier doesn’t have
reasonable estimates for time/resources required to build new
models/capability that is of a reasonable or required accuracy</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>1.7</p>
</td>
<td >
<p>Does the supplier provide minimum and recommended system
requirements?</p>
</td>
<td >
<ul>
<li>
<p >Supplier doesn’t have
reasonable guidance on minimum system requirements to operate the
platform efficiently with regards to number of cores, RAM required
based on load, storage, etc., unless not relevant (e.g. hosted in an
external cloud)</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>1.8</p>
</td>
<td >
<p>Does the supplier provide comprehensible documentation around
their benchmarks?</p>
</td>
<td >
<ul>
<li>
<p >Supplier doesn’t have
a reasonable level of documentation provided with information
about performance metrics
</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>1.9</p>
</td>
<td >
<p>Does the supplier ensure the staff involved in the benchmark
processes have the right experience?</p>
</td>
<td >
<ul>
<li>
<p >Supplier is not able to
show that the staff involved in setting the benchmarks have a
reasonable level of statistical expertise, and that domain experts
are involved in decisions where reasonable</p>
</li></ul>
</td>
</tr>
</tbody></table></div>
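<p>For question 1.4, the following is a minimal sketch (with synthetic
fold scores) of one reasonable statistical method for comparing two
models: a paired t-test on per-fold cross-validation scores.</p>

```python
# Illustrative sketch only: paired t-test on synthetic cross-validation
# scores of two candidate models, using just the standard library.
import math
from statistics import mean, stdev

model_a = [0.81, 0.79, 0.84, 0.80, 0.82]   # synthetic per-fold accuracies
model_b = [0.77, 0.76, 0.80, 0.78, 0.79]

# Paired t-statistic: mean of per-fold differences over its standard error.
diffs = [a - b for a, b in zip(model_a, model_b)]
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
print(t_stat)
```

<p>A large t-statistic (here well above the critical value for 4
degrees of freedom) suggests the difference between the models is
unlikely to be noise; in a real evaluation the supplier would also
report the p-value and effect size.</p>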
<hr>
<h3 class="western">2 - Explainability by justification</h3>
<p>This Machine Learning Maturity Model assessment criteria is
directly aligned with the <a href="https://ethical.institute/principles.html#commitment-3">Responsible
Machine Learning Principle #3 - Explainability by justification</a>.
</p>
<h4 class="western">Explanation</h4>
<ul>
<li>
<p >When domain experts are
asked how they came to a specific conclusion, they don’t answer by
pointing to the neurons that fired in their brains. Instead domain
experts provide a “justifiable” explanation of how they came to
that conclusion.</p>
</li><li>
<p >Similarly, with a machine
learning model the objective is not to demand an explanation for
every single weight in the algorithm. Instead, we look for a
justifiable level of reasoning on the end-to-end process around and
within the algorithm.</p>
</li><li>
<p >The level of scrutiny for
an explanation to be “justifiable” will most certainly vary
depending on the critical nature of the use-case, as well as the
level of feedback that can be analysed by humans.</p>
</li></ul>
<h4 class="western">Requirements</h4>
<ul>
<li>
<p >This criteria is heavily
dependent on <a href="https://docs.google.com/document/d/1BmlL-bFJ7nQinnGyseilPEsIZYm0HW6xmA0e4wmXrNU/edit#heading=h.ayj8sjbrkt4q">Criteria
1 - Practical benchmarks</a>, as it requires suppliers to have the
right processes and capabilities around their accuracy metrics.</p>
</li><li>
<p >Suppliers must make a
reasonable case about how their solution (or solution + human) will
be able to provide at least the same level (or higher) of
justification when making a final decision on an instance of data
analysis as a domain expert would.
</p>
</li><li>
<p >In order for suppliers to
propose at least the same level of justification, they must also
provide the current level of justification as a benchmark, from a
quantitative perspective.</p>
</li></ul>
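<p>A minimal sketch of the model-agnostic explainability techniques
this section refers to (SHAP, LIME and permutation importance all
follow this spirit): measure how much a model’s accuracy drops when a
single feature is shuffled. The model and data below are synthetic.</p>

```python
# Illustrative sketch only: permutation feature importance on a toy model.
# Real solutions would use tools such as SHAP or LIME; this shows the idea.
import random

def model(row):
    # toy classifier: with these binary inputs the decision depends
    # only on feature 0
    return 1 if 2 * row[0] + 0.1 * row[1] > 1 else 0

data = [[1, 0], [0, 1], [1, 1], [0, 0]] * 10   # synthetic inputs
labels = [model(r) for r in data]              # labels the model fits perfectly

def accuracy(rows):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(feature, seed=0):
    # shuffle one feature column and measure the accuracy drop
    rng = random.Random(seed)
    column = [r[feature] for r in data]
    rng.shuffle(column)
    permuted = [r[:feature] + [v] + r[feature + 1:]
                for r, v in zip(data, column)]
    return accuracy(data) - accuracy(permuted)

print(permutation_importance(0), permutation_importance(1))
```

<p>Feature 0 shows a large accuracy drop while feature 1 shows none,
matching how the toy model actually behaves; this is the kind of
input/feature importance evidence questions 2.2 and 2.4 look for.</p>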
<div class="table-wrapper">
<table id="2-checklist">
<tbody><tr>
<td >
<p>#</p>
</td>
<td >
<p>Question</p>
</td>
<td >
<p>Red flags</p>
</td>
</tr>
<tr>
<td >
<p>2.1</p>
</td>
<td >
<p>Does the supplier provide audit trails to assess the data that
went through the models?</p>
</td>
<td >
<ul>
<li>
<p >Supplier doesn’t have
capabilities to provide human-readable audit trails where
reasonable</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>2.2</p>
</td>
<td >
<p>Does the supplier have a process and/or infrastructure to
explain input/feature importance?</p>
</td>
<td >
<ul>
<li>
<p >Supplier doesn’t have
a process and/or infrastructure in place to assess how
inputs/features interact to result in specific predictions</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>2.3</p>
</td>
<td >
<p>Does the supplier provide capabilities to explain how
input/features affect results?</p>
</td>
<td >
<ul>
<li>
<p >Supplier doesn’t
provide ways to explain how inputs/features result in the
inference outcomes where justification is required (e.g. when
there’s a lack of human review, or the use-case is of a critical
nature)</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>2.4</p>
</td>
<td >
<p>Does the supplier have the process and/or infrastructure to use
model explainability techniques when developing deep learning /
more complex models?</p>
</td>
<td >
<ul>
<li>
<p >Supplier doesn’t have
processes and/or infrastructure to use explainability techniques
(such as SHAP, LIME, aLIME, etc) to increase explainability of
models where required</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>2.5</p>
</td>
<td >
<p>Does the supplier have process and/or infrastructure to work
with domain experts to abstract their knowledge into models?</p>
</td>
<td >
<ul>
<li>
<p >Supplier doesn’t have
a process and/or infrastructure to work with relevant domain
experts and convert key knowledge into inputs/features that can
introduce more levels of explainability to the machine learning
process where reasonable</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>2.6</p>
</td>
<td >
<p>Does the supplier provide comprehensible information around
their explainability processes?</p>
</td>
<td >
<ul>
<li>
<p >Supplier doesn’t have
a reasonable level of documentation provided with information
about the processes they involve around explainability</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>2.7</p>
</td>
<td >
<p>Does the supplier ensure the staff involved in the
explainability processes have the right experience?</p>
</td>
<td >
<ul>
<li>
<p >Supplier is not able to
show the staff involved in the analysis of machine learning
models have a reasonable understanding of machine learning</p>
</li><li>
<p >Supplier is not able to
show the processes ensure they involve domain experts where
reasonable</p>
</li></ul>
</td>
</tr>
</tbody></table></div>
<hr>
<h3 class="western" >3 - Data and
model assessment processes</h3>
<p>This Machine Learning Maturity Model assessment criteria is
directly aligned with the <a href="https://ethical.institute/principles.html#commitment-2">Responsible
Machine Learning Principle #2: Bias evaluation</a>.
</p>
<h4 class="western">Explanation</h4>
<ul>
<li>
<p >Any non-trivial
decision (defined as having more than one option) always carries an
inherent bias, without exception</p>
</li><li>
<p >Hence the objective is not
to remove bias from a machine learning model completely. Instead, the
objective is to ensure that the "desired bias" is aligned
with our accuracy/objectives, and "undesired bias" is
identified and mitigated.</p>
</li><li>
<p >To be more specific, bias
in machine learning boils down to the error between development and
production. As a result of this, all machine learning models start
to “degrade” as soon as they are put in production. The reasons
for this include:</p>
<ul>
<li>
<p >Unseen data is not
representative of the data used in development</p>
</li><li>
<p >Temporal data changes as
time goes on (e.g. inflation affects price)</p>
</li><li>
<p >Human-generated data
changes as people and projects change</p>
</li></ul>
</li><li>
<p >Bias in machine learning is
a challenge that can be tackled by ensuring there are processes in
place to identify, document and mitigate bias</p>
</li></ul>
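<p>The degradation described above can be caught with simple
distribution checks. A minimal sketch, with synthetic prices and a
hand-rolled two-sample Kolmogorov-Smirnov statistic so the example is
self-contained:</p>

```python
# Illustrative sketch only: detecting drift between the data a model was
# developed on and the data it sees in production, using the two-sample
# Kolmogorov-Smirnov statistic (the maximum gap between empirical CDFs).

def ks_statistic(sample_a, sample_b):
    a, b = sorted(sample_a), sorted(sample_b)

    def cdf(sorted_sample, x):
        return sum(v <= x for v in sorted_sample) / len(sorted_sample)

    return max(abs(cdf(a, x) - cdf(b, x)) for x in a + b)

dev_prices  = [100, 102, 98, 101, 99, 103, 97, 100]     # seen in development
prod_prices = [110, 113, 108, 112, 109, 114, 107, 111]  # inflation shifted prices

drift = ks_statistic(dev_prices, prod_prices)
print(drift)   # 1.0: the two distributions no longer overlap at all
```

<p>A monitoring process would run such a check per feature on a
schedule, and alert (or trigger re-training) when the statistic
crosses a threshold.</p>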
<h4 class="western">Requirements</h4>
<ul>
<li>
<p >This assessment criteria is
heavily dependent on <a href="https://docs.google.com/document/d/1BmlL-bFJ7nQinnGyseilPEsIZYm0HW6xmA0e4wmXrNU/edit#heading=h.ayj8sjbrkt4q">Criteria
1 - Practical benchmarks</a>, and <a href="https://docs.google.com/document/d/1BmlL-bFJ7nQinnGyseilPEsIZYm0HW6xmA0e4wmXrNU/edit#heading=h.l4u8kdy5b7qq">Criteria
2 - Explainability by justification</a> being in place.</p>
</li><li>
<p >Suppliers must be able to
demonstrate processes and infrastructure they have to identify
undesired bias through best practices in data science as well as
awareness of domain-specific considerations</p>
</li></ul>
<p>The Institute for Ethical AI & Machine Learning is working
with the IEEE P7003 working group to develop the <a href="https://standards.ieee.org/project/7003.html">P7003
Algorithmic Bias Considerations standard</a>, which will facilitate
this assessment criteria once released: suppliers that obtain the
certification will have verified that they have the relevant processes
for data and model assessment.</p>
<div class="table-wrapper">
<table id="3-checklist">
<tbody><tr>
<td >
<p>#</p>
</td>
<td >
<p>Question</p>
</td>
<td >
<p>Red flags</p>
</td>
</tr>
<tr>
<td >
<p>3.1</p>
</td>
<td >
<p>Does the supplier have a process to assess representability of
datasets?</p>
</td>
<td >
<ul>
<li>
<p >Supplier doesn’t have
a process in place to assess representability of training data
</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>3.2</p>
</td>
<td >
<p>Does the supplier have a process to identify and document
undesired biases during the development of their models?</p>
</td>
<td >
<ul>
<li>
<p >No process in place to
analyse input/feature importance during the development of a
model</p>
</li><li>
<p >No process in place to
obtain a breakdown of accuracy metrics on an input/feature level
to identify undesired bias where reasonable</p>
</li><li>
<p >No process in place to
identify wanted/unwanted correlations within the input/features
where reasonable</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>3.3</p>
</td>
<td >
<p>Does the supplier have capabilities to track performance
metrics in production to identify and mitigate new bias?</p>
</td>
<td >
<ul>
<li>
<p >No process and/or
infrastructure in place to identify metrics that should be
tracked in production to alert when a model drops under certain
thresholds where reasonable</p>
</li><li>
<p >If metrics are tracked,
there is no explicit awareness of why they need to be tracked
where required or where not obvious</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>3.4</p>
</td>
<td >
<p>Does the supplier provide comprehensible information around
their data and model evaluation processes?</p>
</td>
<td >
<ul>
<li>
<p >Supplier doesn’t have
a reasonable level of documentation provided with information
about their data and model evaluation processes</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>3.5</p>
</td>
<td >
<p>Does the supplier demonstrate the team they have allocated has
the right expertise to perform the data and model assessment
efficiently?</p>
</td>
<td >
<ul>
<li>
<p >Supplier is not able to
show the staff involved in the analysis of machine learning
models have a strong background in statistics and/or machine
learning</p>
</li><li>
<p >Supplier is not able to
show the processes ensure they involve domain experts where
reasonable</p>
</li></ul>
</td>
</tr>
</tbody></table></div>
<hr>
<h3 class="western">4 - Infrastructure for reproducible operations</h3>
<p>This Machine Learning Maturity Model assessment criteria is
directly aligned with the <a href="https://ethical.institute/principles.html#commitment-4">Responsible
Machine Learning Principle #4 - Reproducible operations</a>.
</p>
<h4 class="western">Explanation</h4>
<ul>
<li>
<p >Similar to production
software, machine learning requires infrastructure to ensure
reliable and robust service offerings</p>
</li><li>
<p >Unlike traditional
software, however, machine learning introduces complexities beyond
the code, such as the versioning and orchestration of models</p>
</li><li>
<p >These requirements demand
that suppliers be conscious of this, and ensure their
infrastructure is able to cope with these challenges</p>
</li></ul>
<h4 class="western">Requirements</h4>
<ul>
<li>
<p >Suppliers must also be able
to demonstrate their capabilities to version, roll-back, diagnose
and/or deploy models to production</p>
</li><li>
<p >Suppliers must have the
processes and/or infrastructure to be able to separate the
development of new models (i.e. new capabilities) from the serving
in production of the models</p>
</li><li>
<p >Suppliers must demonstrate
the ability to scale their services as required by the use-case.
</p>
</li></ul>
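<p>A minimal sketch of the versioning and roll-back bookkeeping these
requirements describe. All names (versions, dataset references) are
hypothetical; a production system would use a proper model registry
and artefact store.</p>

```python
# Illustrative sketch only: tracking enough metadata per model version to
# reproduce, re-train or roll back a deployment.
import hashlib
import json

registry = {}   # version -> metadata needed to reproduce or roll back

def register_model(version, weights, training_data_ref, metrics):
    registry[version] = {
        # hash of the serialised weights pins the exact artefact
        "weights_hash": hashlib.sha256(json.dumps(weights).encode()).hexdigest(),
        # pointer to the training data snapshot enables re-training old versions
        "training_data_ref": training_data_ref,
        "metrics": metrics,
    }

def rollback(previous_version):
    # serving flips back to an earlier version without retraining
    assert previous_version in registry, "unknown version"
    return previous_version

# hypothetical versions and dataset references
register_model("1.0.0", [0.1, 0.2], "s3://datasets/v1", {"accuracy": 0.91})
register_model("1.1.0", [0.3, 0.1], "s3://datasets/v2", {"accuracy": 0.88})

serving = rollback("1.0.0")   # 1.1.0 regressed, so serve 1.0.0 again
print(serving, registry[serving]["metrics"]["accuracy"])
```

<p>Keeping the training-data reference alongside each version is what
makes re-training a previous version (question 4.2) possible at all.</p>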
<div class="table-wrapper">
<table id="4-checklist">
<tbody><tr>
<td >
<p>#</p>
</td>
<td >
<p>Question</p>
</td>
<td >
<p>Red flags</p>
</td>
</tr>
<tr>
<td >
<p>4.1</p>
</td>
<td >
<p>Does the supplier have process and/or infrastructure to version
models?</p>
</td>
<td >
<ul>
<li>
<p >No infrastructure and/or
processes to version different machine learning models where
reasonable</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>4.2</p>
</td>
<td >
<p>Does the supplier have process/infrastructure to re-train
previous version of models?</p>
</td>
<td >
<ul>
<li>
<p >No infrastructure and/or
processes to re-train previous versions of models where
reasonable</p>
</li></ul>
</td>
</tr>
<tr>
<td >
<p>4.3</p>