<!DOCTYPE html>
<html>
<head>
<!-- Meta -->
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0">
<meta name="description" content="Making Voices Heard | A study by the Centre for Internet and Society, India, supported by Mozilla Corporation" />
<!-- Title + CSS + Favicon -->
<title>Making Voices Heard</title>
<link rel="stylesheet" type="text/css" href="css/semantic.min.css">
<link rel="stylesheet" type="text/css" href="css/style.css">
<link rel="shortcut icon" type="image/x-icon" href="img/favicon.ico" />
<!-- Font Awesome -->
<script src="https://kit.fontawesome.com/4c415b9185.js" crossorigin="anonymous"></script>
</head>
<body>
<!-- Header -->
<div>
<div class="ui fluid container banner">
<div class="banner-image" aria-label="Cats are shown as people using various devices including voice interfaces in shops and houses, with a central banner that shows the title ‘Making Voices Heard’."></div>
</div>
</div>
<!-- Top Navigation Bar -->
<div class="blue nav">
<div class="ui container">
<div class="nav-entries">
<a href="index.html">Home</a>     <a href="design-brief.html">Design Brief</a> <a href="policy-brief.html">Policy Brief</a> <a href="mapping-actors.html">Mapping Actors</a> <a href="index.html#case-studies">Case Studies</a> <a href="index.html#literature-surveys">Literature Surveys</a> <a href="index.html#resources">Resources</a> <span id="report"><a href="docs/MakingVoicesHeard_FullReport.pdf"><i class="fas fa-arrow-circle-down"></i> Get Full Report</a></span>
</div>
</div>
</div>
<!-- Title -->
<div class="grey">
<div class="ui container four column stackable grid">
<div class="one wide column empty">
</div>
<div class="fourteen wide column text">
<h2>Evolution and Typology of Voice Interfaces</h2>
</div>
<div class="one wide column empty">
</div>
<div class="one wide column empty">
</div>
<div class="nine wide column text">
<h3 id="Background">Background</h3>
<p> The availability of multiple modes of interaction such as voice and gesture makes devices accessible to a wide variety of people. Voice interfaces (VI), in particular, create a level playing field for those who are limited by single-language, text-based interfaces.</p>
<p> Schnelle-Walka defines VIs as “user interfaces using speech input through a speech recognizer and speech output through speech synthesis or prerecorded audio”.<sup class="superscript"><a href="#fn1">1</a></sup><a name="ref1"></a> In essence, VI technologies involve two processes: one converts spoken language into code that a computer understands, and the other converts the computer’s output back into language that a human understands. Because speech is the predominant means of input for VIs, they are also known as natural language interfaces.<sup class="superscript"><a href="#fn2">2</a></sup><a name="ref2"></a></p>
<h3 id="Tracing the evolution of VIs">Tracing the evolution of VIs</h3>
<p> Before Siri and Alexa, we had ‘Audrey’, created by Bell Laboratories’ Harvey Fletcher and Homer Dudley, who are considered the pioneers of VIs for their groundbreaking research on speech synthesis and human speech modelling.<sup class="superscript"><a href="#fn3">3</a></sup><a name="ref3"></a> In 1952, Audrey was used for number recognition through spoken input.<sup class="superscript"><a href="#fn4">4</a></sup><a name="ref4"></a> A decade later, IBM’s ‘Shoebox’ could not only recognise digits from zero to nine but also comprehend 16 words.<sup class="superscript"><a href="#fn5">5</a></sup><a name="ref5"></a></p>
<p> In 1992, AT&amp;T and Telefónica developed a speech-to-speech prototype, VESTS (Voice English/Spanish Translator), which relied heavily on spoken language translation.<sup class="superscript"><a href="#fn6">6</a></sup><a name="ref6"></a> VESTS, a speaker-trained system that could process over 450 words, was exhibited at the Seville World's Fair in Spain. VIs have come a long way from these early prototypes to modern voice assistants, such as Alexa, Siri, Cortana, and the Google Assistant, which are now accessible to consumers the world over.<sup class="superscript"><a href="#fn7">7</a></sup><a name="ref7"></a></p>
<p> One of the main reasons for the proliferation of VIs today is that smartphones have come with a built-in VI since 2012. According to a 2018 PwC survey, among a wide range of voice-enabled devices, consumers most commonly issued voice commands on smartphones.<sup class="superscript"><a href="#fn8">8</a></sup><a name="ref8"></a> Mobile phones now operate almost like ‘shrunken desktops’ because of their inherent operational versatility. However, the reduced screen size is the primary structural limitation of these devices. To overcome this limitation, voice has become an important mode of input, letting people complete tasks without having to touch or type on their phones.<sup class="superscript"><a href="#fn9">9</a></sup><a name="ref9"></a> Hence, developers have now integrated cloud-based voice technologies into devices, as in the case of Amazon Echo and Google Home, and through open-source initiatives such as Mozilla’s DeepSpeech, a speech-to-text engine.<sup class="superscript"><a href="#fn10">10</a></sup><a name="ref10"></a></p>
<h3 id="Features of VIs">Features of VIs</h3>
<p>In the early 90s, researchers identified the five basic elements<sup class="superscript"><a href="#fn11">11</a></sup><a name="ref11"></a> of voice processing technologies:</p>
<ol>
<li><p><b>Voice coding:</b> the compression of the information carried in a voice signal so that it can be transmitted or stored economically in lower-capacity systems.</p></li>
<li><p><b>Voice synthesis:</b> the synthetic replication of voice signals to facilitate the transmission of information from machine to human.</p></li>
<li><p><b>Speech recognition:</b> the extraction of the information contained in a voice signal to control the actions the device takes in response to spoken commands.<sup class="superscript"><a href="#fn12">12</a></sup><a name="ref12"></a></p></li>
<li><p><b>Speaker recognition:</b> the identification of voice characteristics to verify the speaker’s identity.</p></li>
<li><p><b>Spoken language translation:</b> the recognition of the language a person is speaking, followed by the translation of their message into another language. Through this process, two individuals who do not speak the same language can communicate.<sup class="superscript"><a href="#fn13">13</a></sup><a name="ref13"></a></p></li>
</ol>
<p> Voice output falls into two distinct categories: pre-recorded speech and synthetic speech.<sup class="superscript"><a href="#fn14">14</a></sup><a name="ref14"></a> Pre-recorded speech is natural speech that is recorded and stored for future use. In contrast, synthetic speech employs natural language processing (NLP) for the automatic generation of appropriate natural-language responses or output in the form of written text.<sup class="superscript"><a href="#fn15">15</a></sup><a name="ref15"></a></p>
<p>NLP involves the conversion of textual information into speech and vice versa, which enables a device to discern and process natural language data. The system processes this data by standardising text inputs and splitting them into words and sentences. The device can then ascertain the syntax of the input provided. NLP comprises two main natural-language principles:<sup class="superscript"><a href="#fn16">16</a></sup><a name="ref16"></a></p>
<ol>
<li><p><b>Natural language understanding (NLU):</b> NLU is a branch of NLP that deals with reading comprehension, synonyms, themes, and lexical semantics. It is used to construct the responses of VIs through algorithms.<sup class="superscript"><a href="#fn17">17</a></sup><a name="ref17"></a></p></li>
<li><p><b>Natural language generation (NLG):</b> The first step of NLG involves processing relevant content from databases. This is followed by sentence planning, which involves the formation of natural-language responses through text realisation. As a consequence, the NLG process delivers a meaningful and personalised response, as opposed to a pre-scripted one.<sup class="superscript"><a href="#fn18">18</a></sup><a name="ref18"></a></p></li>
</ol>
<p>Synthetic speech employs NLP for its characteristically high ‘segmental intelligibility’, that is, how clearly each segment of speech can be understood. However, pre-recorded speech output tends to be preferred for its human voice and pronunciation characteristics. These characteristics hold only if the pre-recorded speech maintains the delicate balance between natural prosody<sup class="superscript"><a href="#fn19">19</a></sup><a name="ref19"></a> and the recorded elements. Because it preserves the quality of natural speech, pre-recorded speech output has more natural prosody than synthetic speech.<sup class="superscript"><a href="#fn20">20</a></sup><a name="ref20"></a></p>
<h3 id="Types of VIs">Types of VIs</h3>
<p>Many developers are creating VIs that perform various functions, which has led to a wide array of definitions for similar interfaces. Interactive voice response (IVR), voice channels, voice bots, and voice assistants are variations of voice-based customer service solutions.<sup class="superscript"><a href="#fn21">21</a></sup><a name="ref21"></a> Although these terms are sometimes used interchangeably, some authors opine that there are nuanced differences that set them apart.<sup class="superscript"><a href="#fn22">22</a></sup><a name="ref22"></a></p>
<h4 id="Interactive voice response (IVR)">Interactive voice response (IVR)</h4>
<p>IVR systems are among the oldest VIs in public use. They do not require a smartphone and are still used in several domains. Corkrey and Parkinson (2002) define IVR as “a telephone interviewing technique in which the human speaker is replaced by a high-quality recorded interactive script to which the respondent provides answers by pressing the keys of a touch telephone (touch-phone).”<sup class="superscript"><a href="#fn23">23</a></sup><a name="ref23"></a> The recorded scripts used single voices,<sup class="superscript"><a href="#fn24">24</a></sup><a name="ref24"></a> combinations of male and female voices,<sup class="superscript"><a href="#fn25">25</a></sup><a name="ref25"></a> combinations of many female voices speaking in different languages,<sup class="superscript"><a href="#fn26">26</a></sup><a name="ref26"></a> or synthetic voices.<sup class="superscript"><a href="#fn27">27</a></sup><a name="ref27"></a></p>
<h4 id="Chatbots">Chatbots</h4>
<p>The terms voice bots, chatbots, and automated conversational interfaces are used synonymously. They are enhanced by AI, NLP, and machine learning.<sup class="superscript"><a href="#fn28">28</a></sup><a name="ref28"></a> The term ‘voice bot’ is shorthand for ‘voice robot’.<sup class="superscript"><a href="#fn29">29</a></sup><a name="ref29"></a> Here, voice is the primary medium of input.<sup class="superscript"><a href="#fn30">30</a></sup><a name="ref30"></a> Voice bots use automatic speech recognition (ASR) technology to convert spoken input into text. ‘Chatbot’ has a wider connotation, as it allows people to provide inputs in the form of text, gesture, touch, and voice. In this section, we use the term chatbot in the context of voice-enabled chatbots. The chatbot’s output may be in the form of written text or voice, for which it uses text-to-speech (TTS) technology.<sup class="superscript"><a href="#fn31">31</a></sup><a name="ref31"></a> Voice chatbots can be further classified into two major categories: task-oriented (declarative) chatbots and data-driven (predictive or conversational) chatbots.<sup class="superscript"><a href="#fn32">32</a></sup><a name="ref32"></a></p>
<ol>
<li><p><b>Task-oriented chatbots:</b> Task-oriented chatbots, also referred to as ‘linguistic-based’ or ‘rule-based’ chatbots, employ VIs that focus on a single purpose.<sup class="superscript"><a href="#fn33">33</a></sup><a name="ref33"></a> Because of this, they are considered to lack flexibility. They generate automated, conversational responses using NLP and logic. Their functions are fairly limited, and hence they are used for specific purposes. A common example is the interactive FAQ.</p></li>
<li><p><b>Data-driven chatbots:</b> Data-driven chatbots, also known as machine-learning or AI chatbots,<sup class="superscript"><a href="#fn34">34</a></sup><a name="ref34"></a> are enhanced with AI, NLP, NLU, and machine learning to deliver personalised and meaningful responses. They are considered more interactive and contextually aware than rule-based chatbots, as their functioning is more complex and predictive.<sup class="superscript"><a href="#fn35">35</a></sup><a name="ref35"></a></p></li>
</ol>
<p>Data-driven chatbots achieve this by learning individual preferences and building a profile of the person from the data they receive. Some refer to these types of bots as ‘virtual assistants’.<sup class="superscript"><a href="#fn36">36</a></sup><a name="ref36"></a> However, other literature argues that these bots can be distinguished from virtual assistants.</p>
<h4 id="Virtual assistants">Virtual assistants</h4>
<p>According to scholars, there is no standardised definition of virtual personal assistants.<sup class="superscript"><a href="#fn37">37</a></sup><a name="ref37"></a> They list several names that other scholars have given to these systems, such as virtual assistants; vocal social agents or digital assistants; voice assistants; intelligent agents; and interactive personal assistants. Virtual assistants such as Siri process the speaker’s voice and the content of their speech to respond in different contexts, such as performing a task or directing an action towards the person.<sup class="superscript"><a href="#fn38">38</a></sup><a name="ref38"></a> Virtual assistants are now increasingly used in several areas of everyday life; some common names are Siri, Google Now, Microsoft Cortana, Amazon Echo, and Google Home. These assistants interact with people in a conversational manner, thereby providing them with a wide range of functionalities.<sup class="superscript"><a href="#fn39">39</a></sup><a name="ref39"></a></p>
<p>The conundrum in using the terms ‘chatbot’ and ‘virtual assistant’ interchangeably comes from the lack of universally accepted definitions. Some opine that virtual assistants come under the umbrella term ‘chatbots’, and specifically ‘data-driven chatbots’; the opposing view is that a virtual assistant is a completely different branch in the typology of VIs. These differing views arise because chatbots are characterised as data-obtaining interfaces.<sup class="superscript"><a href="#fn40">40</a></sup><a name="ref40"></a> In contrast, ‘virtual assistant’ is treated as a distinct classification, as it is considered better than a chatbot with respect to understanding the context and the request, proficiency, nature of responses, and the rendering of a personalised experience.<sup class="superscript"><a href="#fn41">41</a></sup><a name="ref41"></a></p>
<h3 id="future">The future of VIs</h3>
<p>VIs are slowly becoming more accessible as they are being integrated into cheaper mobile phones. The next stage is the development of smart devices for homes that can work with voice assistants, such as Google Home and Amazon Echo. On the business side, voice bots could be used for more complex customer questions. Interestingly, researchers have now also built a prototype linked with Alexa to provide farmers with a ‘smart irrigation voice assistant’.<sup class="superscript"><a href="#fn42">42</a></sup><a name="ref42"></a> Similarly, a voice application named ‘Avaaj Otalo’ was launched by the UC Berkeley School of Information, the Stanford HCI Group, IBM India Research Laboratory, and Development Support Center (DSC), an NGO in Gujarat, to help farmers with agriculture-related queries.<sup class="superscript"><a href="#fn43">43</a></sup><a name="ref43"></a> Lastly, another significant use of VIs, according to Joshi and Patki (2015), is in improving the security of computer systems. Passwords set for systems via keyboards can be duplicated. However, when systems are secured via VIs, duplication becomes far more difficult.<sup class="superscript"><a href="#fn44">44</a></sup><a name="ref44"></a></p>
<h3 id="conclusion">Conclusion</h3>
<p>The reduction in the prices of smartphones and data, as well as the increase in the functions that smartphones can perform, has enabled the integration of VIs far more complex than IVR systems. One can hope that with further data and research, there will be an increase not just in their variety, but also in their ability to communicate with people who speak different languages.</p>
</div>
<div class="one wide column empty">
</div>
<div class="five wide column meta">
<p><span id="grey">Research and Writing by</span> <br />Deepika Nandagudi Srinivasa <span id="grey">and</span> Shweta Mohandas
<br />
<span id="grey">Review and Editing by</span> <br /> Saumyaa Naidu, Puthiya Purayil Sneha, <br /><span id="grey">and</span> Pranav M.B <br />
<span id="grey">Research Inputs by</span> <br />Sumandro Chattapadhyay<br />
<br />
<a href="docs/MozVoice_LitSurvey_Evolution_01.pdf"><i class="fas fa-arrow-circle-down" style="color: black;" ></i> Download Voice Interfaces and Language Literature Survey
</a></p>
<br />
<hr />
<br />
<p><span style="line-height: 3em;">CONTENTS</span></p>
<p><a href="#Background"><strong>Background</strong></a></p>
<p><a href="#Tracing the evolution of VIs"><strong>Tracing the evolution of VIs</strong></a></p>
<p><a href="#Features of VIs"><strong>Features of VIs</strong></a></p>
<p><a href="#Types of VIs"><strong>Types of VIs</strong></a></p>
<p><a href="#future"><strong>The future of VIs</strong></a></p>
<p><a href="#conclusion"><strong>Conclusion</strong></a></p>
</div>
<div class="one wide column empty">
</div>
<div class="nine wide column text">
<div class="ten wide column content">
</div>
<div class="ten wide column content">
<br />
<h3>Notes</h3>
<table class="footnote">
<tr>
<td class="number">1</td>
<td class="reference"><a name="fn1"></a>Schnelle-Walka, D., “I Tell You Something,” <em> Proceedings of the 16th European Conference on Pattern Languages of Programs - EuroPLoP ’11, </em> 2011. <span class="internal-nav"><a href="#ref1">↑</a></span></td>
</tr>
<tr>
<td class="number">2</td>
<td class="reference"><a name="fn2"></a>Miller, L., “Natural Language Interfaces,” <em> Journal of the Washington Academy of Sciences 80, </em> no. 3 (1990): 91–115, accessed on 3 June 2020, <a href="https://www.jstor.org/stable/24531256" target="_blank">www.jstor.org/stable/24531256.</a> <span class="internal-nav"><a href="#ref2">↑</a></span></td>
</tr>
<tr>
<td class="number">3</td>
<td class="reference"><a name="fn3"></a>Bhowmik, A. K., <em> Interactive Displays: Natural Human-Interface Technologies </em> (John Wiley & Sons, Incorporated, 2014). <span class="internal-nav"><a href="#ref3">↑</a></span></td>
</tr>
<tr>
<td class="number">4</td>
<td class="reference"><a name="fn4"></a>Carbone, C., “Audrey, Sibyl, and Alice in the Technical Information Libraries,” <em> STWP Review 9, </em> no. 1 (1962): 14–15, accessed on 19 June 2020, <a href="https://www.jstor.org/stable/43091178" target="_blank">www.jstor.org/stable/43091178</a> <span class="internal-nav"><a href="#ref4">↑</a></span></td>
</tr>
<tr>
<td class="number">5</td>
<td class="reference"><a name="fn5"></a>“IBM Shoebox,” <em> IBM Archives, </em> accessed on 2 November 2021, <a href="https://www.ibm.com/ibm/history/exhibits/specialprod1/specialprod1_7.html" target="_blank">https://www.ibm.com/ibm/history/exhibits/specialprod1/specialprod1_7.html</a> <span class="internal-nav"><a href="#ref5">↑</a></span></td>
</tr>
<tr>
<td class="number">6</td>
<td class="reference"><a name="fn6"></a>IBM Archives, IBM Shoebox. Retrieved from <a href="https://www.ibm.com/ibm/history/exhibits/specialprod1/specialprod1_7.html" target="_blank">https://www.ibm.com/ibm/history/exhibits/specialprod1/specialprod1_7.html</a> <span class="internal-nav"><a href="#ref6">↑</a></span></td>
</tr>
<tr>
<td class="number">7</td>
<td class="reference"><a name="fn7"></a>Tank, N., “Voice User Interface (VUI) – A Definition,” <em> Bot Society Blog, </em>2018, <a href="https://botsociety.io/blog/2018/04/voice-user-interface/" target="_blank">https://botsociety.io/blog/2018/04/voice-user-interface/</a> <span class="internal-nav"><a href="#ref7">↑</a></span></td>
</tr>
<tr>
<td class="number">8</td>
<td class="reference"><a name="fn8"></a>“Consumer Intelligence Series: Prepare for the Voice Revolution,” <em> PwC Survey</em>, 2018. <a href="https://www.pwc.com/us/en/services/consulting/library/consumer-intelligence-series/voice-assistants.html" target="_blank">https://www.pwc.com/us/en/services/consulting/library/consumer-intelligence-series/voice-assistants.html</a> <span class="internal-nav"><a href="#ref8">↑</a></span></td>
</tr>
<tr>
<td class="number">9</td>
<td class="reference"><a name="fn9"></a>Breen, A., et al., “Voice in the User Interface,” in <em> Interactive Displays: Natural Human-Interface Technologies, </em> ed. Bhowmik, A. K. (John Wiley & Sons, Incorporated, 2014): 107. <span class="internal-nav"><a href="#ref9">↑</a></span></td>
</tr>
<tr>
<td class="number">10</td>
<td class="reference"><a name="fn10"></a>Lawrence, H. M. “Beyond the Graphic User Interface,” In Rhetorical Speculations: <em> The Future of Rhetoric, Writing, and Technology,</em> ed. Sundvall, S., (Logan: University Press of Colorado, 2019). <span class="internal-nav"><a href="#ref10">↑</a></span></td>
</tr>
<tr>
<td class="number">11</td>
<td class="reference"><a name="fn11"></a>Rabiner, L. R., “Voice Communication Between Humans and Machines – An Introduction,” in <em> Voice Communication Between Humans and Machines, </em> ed. D. B. Roe and J. G. Wilpon (The National Academies Press, 1994), <a href="https://doi.org/10.17226/2308" target="_blank">https://doi.org/10.17226/2308.</a> <span class="internal-nav"><a href="#ref11">↑</a></span></td>
</tr>
<tr>
<td class="number">12</td>
<td class="reference"><a name="fn12"></a>Rabiner, “Voice Communication between Humans and Machines.” <span class="internal-nav"><a href="#ref12">↑</a></span></td>
</tr>
<tr>
<td class="number">13</td>
<td class="reference"><a name="fn13"></a>“Voice User Interfaces,” Interaction Design Foundation, <a href="https://www.interaction-design.org/literature/topics/voice-user-interfaces" target="_blank">https://www.interaction-design.org/literature/topics/voice-user-interfaces</a> <span class="internal-nav"><a href="#ref13">↑</a></span></td>
</tr>
<tr>
<td class="number">14</td>
<td class="reference"><a name="fn14"></a>Candace Kamm, “User Interface for Voice Applications”, in <em> Voice Communication Between Humans and Machines, </em> eds. David B. Roe and Jay G. Wilpon (The National Academies Press, 1995), 428-429. <span class="internal-nav"><a href="#ref14">↑</a></span></td>
</tr>
<tr>
<td class="number">15</td>
<td class="reference"><a name="fn15"></a>Androutsopoulos, I., <em> Exploring Time, Tense and Aspect in Natural Language Database Interfaces, </em>(John Benjamins Publishing Company, 2002). <span class="internal-nav"><a href="#ref15">↑</a></span></td>
</tr>
<tr>
<td class="number">16</td>
<td class="reference"><a name="fn16"></a>“AI – Natural Language Processing”, <em> Tutorials Point </em> accessed on 11 November 2021,
<a href="https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_natural_language_processing.htm" target="_blank">https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_natural_language_processing.htm.</a> <span class="internal-nav"><a href="#ref16">↑</a></span></td>
</tr>
<tr>
<td class="number">17</td>
<td class="reference"><a name="fn17"></a>“Chatbots: The Definitive Guide (2020)”, Artificial Solutions, 24 December, 2019, <a href="http://marketing.artificial-solutions.com/rs/177-TDV-970/images/Chatbots-the-definitive-guide-2020.pdf" target="_blank">http://marketing.artificial-solutions.com/rs/177-TDV-970/images/Chatbots-the-definitive-guide-2020.pdf</a> <span class="internal-nav"><a href="#ref17">↑</a></span></td>
</tr>
<tr>
<td class="number">18</td>
<td class="reference"><a name="fn18"></a>“Chatbots: The Definitive Guide (2020)”, Artificial Solutions. <span class="internal-nav"><a href="#ref18">↑</a></span></td>
</tr>
<tr>
<td class="number">19</td>
<td class="reference"><a name="fn19"></a>Lauren Applebaum, et al. (2014) note that “Prosody, the intonation, rhythm, or ‘music’ of language, is an important aspect of all natural languages. Prosody can convey structural information that, at times, affects the meaning we take from a sentence.” In “Prosody In a Communication System Developed without a Language Model,” <em> Sign language and linguistics vol. 17, no. 2 </em> (2014): 181–212, doi:10.1075/sll.17.2.02app. <span class="internal-nav"><a href="#ref19">↑</a></span></td>
</tr>
<tr>
<td class="number">20</td>
<td class="reference"><a name="fn20"></a>Kamm, “User Interface.” <span class="internal-nav"><a href="#ref20">↑</a></span></td>
</tr>
<tr>
<td class="number">21</td>
<td class="reference"><a name="fn21"></a>Caile, C., “Keto or Atkins? IVR or Voice bots?” <em> Nuance, </em> 2019, accessed on 4 January 2022. <a href="https://whatsnext.nuance.com/enterprise/voice-bots-and-ivr-similarities/" target="_blank">https://whatsnext.nuance.com/enterprise/voice-bots-and-ivr-similarities/</a> <span class="internal-nav"><a href="#ref21">↑</a></span></td>
</tr>
<tr>
<td class="number">22</td>
<td class="reference"><a name="fn22"></a>Ghanchi, J., “Chatbots vs Virtual Assistants: Right Solution for Customer Engagement,” <em> Medium, </em>22 October 2019, accessed on 4 January 2022, <a href="https://theconversation.com/amazon-echos-privacy-issues-go-way-beyond-voice-recordings-130016" target="_blank">https://theconversation.com/amazon-echos-privacy-issues-go-way-beyond-voice-recordings-130016.</a>; Joshi, N., “Yes, Chatbots and Virtual Assistants are Different!” <em>Forbes</em>, 23 December 2018, <a href="https://www.forbes.com/sites/cognitiveworld/2018/12/23/yes-chatbots-and-virtual-assistants-are-different/#6b41450b6d7" target="_blank">https://www.forbes.com/sites/cognitiveworld/2018/12/23/yes-chatbots-and-virtual-assistants-are-different/#6b41450b6d7</a> <span class="internal-nav"><a href="#ref22">↑</a></span></td>
</tr>
<tr>
<td class="number">23</td>
<td class="reference"><a name="fn23"></a>Corkrey, R., Parkinson, L., “Interactive Voice Response: Review of Studies 1989–2000,” <em>Behaviour Research Methods, Instruments, & Computers </em>34 (2002): 342–353, <a href="https://doi.org/10.3758/BF03195462" target="_blank">https://doi.org/10.3758/BF03195462.</a> <span class="internal-nav"><a href="#ref23">↑</a></span></td>
</tr>
<tr>
<td class="number">24</td>
<td class="reference"><a name="fn24"></a>Piette, J. D., Weinberger, M., and McPhee, S. J., “The Effect of Automated Calls with Telephone Nurse Follow-Up on Patient-Centered Outcomes of Diabetes Care: A Randomised, Controlled Trial,” <em> Medical Care </em> 38 (2000): 218–230. <span class="internal-nav"><a href="#ref24">↑</a></span></td>
</tr>
<tr>
<td class="number">25</td>
<td class="reference"><a name="fn25"></a>Baer, L., Jacobs, D. G., Cukor, P., O’Laughlen, J., Coyle, J. T., and Magruder, K. M., “Automated Telephone Screening Survey for Depression,” <em>Journal of the American Medical Association, </em>273 (1995): 1943–1944. <span class="internal-nav"><a href="#ref25">↑</a></span></td>
</tr>
<tr>
<td class="number">26</td>
<td class="reference"><a name="fn26"></a>Tanke, E. D., and Leirer, V. O., “Automated Telephone Reminders in Tuberculosis Care,” <em> Medical Care </em>32 (1994): 380–389. <span class="internal-nav"><a href="#ref26">↑</a></span></td>
</tr>
<tr>
<td class="number">27</td>
<td class="reference"><a name="fn27"></a>Meneghini, L. F., Albisser, A. M., Goldberg, R. B., and Mintz, D. H., “An Electronic Case Manager for Diabetes Control,” <em>Diabetes Care </em>21 (1998): 591–596. <span class="internal-nav"><a href="#ref27">↑</a></span></td>
</tr>
<tr>
<td class="number">28</td>
<td class="reference"><a name="fn28"></a>“Chatbots: The Definitive Guide (2020)”, <em>Artificial Solutions </em>. <span class="internal-nav"><a href="#ref28">↑</a></span></td>
</tr>
<tr>
<td class="number">29</td>
<td class="reference"><a name="fn29"></a>Middlebrook, S., and Muller, J. “Thoughts on Bots: The Emerging Law of Electronic Agents,” <em> The Business Lawyer </em>56, no. 1 (2000): 341–373, accessed on 12 June 2020, <a href="https://www.jstor.org/stable/40687980" target="_blank">www.jstor.org/stable/40687980</a> <span class="internal-nav"><a href="#ref29">↑</a></span></td>
</tr>
<tr>
<td class="number">30</td>
<td class="reference"><a name="fn30"></a>“Chatbots: The Definitive Guide (2020)”, <em> Artificial Solutions.</em> <span class="internal-nav"><a href="#ref30">↑</a></span></td>
</tr>
<tr>
<td class="number">31</td>
<td class="reference"><a name="fn31"></a>“Chatbots: The Definitive Guide (2020)”, <em> Artificial Solutions.</em> <span class="internal-nav"><a href="#ref31">↑</a></span></td>
</tr>
<tr>
<td class="number">32</td>
<td class="reference"><a name="fn32"></a>“What Is a Chatbot?”, <em> Oracle,</em> accessed on 21 June 2020, <a href="https://www.oracle.com/solutions/chatbots/what-is-a-chatbot/" target="_blank">https://www.oracle.com/solutions/chatbots/what-is-a-chatbot/</a> <span class="internal-nav"><a href="#ref32">↑</a></span></td>
</tr>
<tr>
<td class="number">33</td>
<td class="reference"><a name="fn33"></a>“Chatbots: The Definitive Guide (2020)”, <em> Artificial Solutions.</em> <span class="internal-nav"><a href="#ref33">↑</a></span></td>
</tr>
<tr>
<td class="number">34</td>
<td class="reference"><a name="fn34"></a>“Chatbots: The Definitive Guide (2020)”, <em> Artificial Solutions.</em> <span class="internal-nav"><a href="#ref34">↑</a></span></td>
</tr>
<tr>
<td class="number">35</td>
<td class="reference"><a name="fn35"></a>“What Is a Chatbot?”, <em>Oracle.</em> <span class="internal-nav"><a href="#ref35">↑</a></span></td>
</tr>
<tr>
<td class="number">36</td>
<td class="reference"><a name="fn36"></a>“What Is a Chatbot?”, <em>Oracle.</em> <span class="internal-nav"><a href="#ref36">↑</a></span></td>
</tr>
<tr>
<td class="number">37</td>
<td class="reference"><a name="fn37"></a>Timo Strohmann, et al., “Virtual Moderation Assistance: Creating Design Guidelines for Virtual Assistants Supporting Creative Workshops”, <em> PACIS 2018 Proceedings, </em>no. 80 (2018), <a href="https://aisel.aisnet.org/cgi/viewcontent.cgi?article=1079&amp;context=pacis2018" target="_blank">https://aisel.aisnet.org/cgi/viewcontent.cgi?article=1079&amp;context=pacis2018</a> <span class="internal-nav"><a href="#ref37">↑</a></span></td>
</tr>
<tr>
<td class="number">38</td>
<td class="reference"><a name="fn38"></a>Sirbi, K., Patankar, A. J., “Personal Assistant with Voice Recognition Intelligence”, <em> International Journal of Engineering Research and Technology </em> 10, no. 1 (2017): 416–419. <span class="internal-nav"><a href="#ref38">↑</a></span></td>
</tr>
<tr>
<td class="number">39</td>
<td class="reference"><a name="fn39"></a>Breen, et al., “Voice in the User Interface.” <span class="internal-nav"><a href="#ref39">↑</a></span></td>
</tr>
<tr>
<td class="number">40</td>
<td class="reference"><a name="fn40"></a>Ghanchi, “Chatbots vs Virtual Assistants.” <span class="internal-nav"><a href="#ref40">↑</a></span></td>
</tr>
<tr>
<td class="number">41</td>
<td class="reference"><a name="fn41"></a>Joshi, “Yes, Chatbots and Virtual Assistants Are Different!” <span class="internal-nav"><a href="#ref41">↑</a></span></td>
</tr>
<tr>
<td class="number">42</td>
<td class="reference"><a name="fn42"></a>Ramakrishnan, V., “How Mindmeld Is Used to Conserve Agricultural Water (... and Win Hackathons in the Process)”, 2019, accessed on 2 November 2021, <a href="https://www.mindmeld.com/20190828-how-mindmeld-is-used-to-conserve-agricultural-water.html" target="_blank">https://www.mindmeld.com/20190828-how-mindmeld-is-used-to-conserve-agricultural-water.html</a> <span class="internal-nav"><a href="#ref42">↑</a></span></td>
</tr>
<tr>
<td class="number">43</td>
<td class="reference"><a name="fn43"></a>Patel, N., et al., “Avaaj Otalo – A Field Study of an Interactive Voice Forum for Small Farmers in Rural India.” Conference on Human Factors in Computing Systems – Proceedings 2 (2010): 733–742, doi:10.1145/1753326.1753434. <span class="internal-nav"><a href="#ref43">↑</a></span></td>
</tr>
<tr>
<td class="number">44</td>
<td class="reference"><a name="fn44"></a>Joshi, P. and Patki, R., “Voice User Interface Using Hidden Markov Model for Word Formation,” <em> International Journal of Computer Science and Mobile Computing </em> 4, no. 3 (2015): 720–724, <a href="https://ijcsmc.com/docs/papers/March2015/V4I3201599a81.pdf" target="_blank">https://ijcsmc.com/docs/papers/March2015/V4I3201599a81.pdf</a>. <span class="internal-nav"><a href="#ref44">↑</a></span></td>
</tr>
</table>
</div>
</div>
<div class="six wide column empty">
</div>
</div>
</div>
</div>
<!-- Footer -->
<div class="footer">
<div class="ui container four column stackable grid">
<div class="one wide column empty">
</div>
<div class="five wide column">
<h3>About the Study</h3>
<p>We believe that voice interfaces have the potential to democratise the use of the internet by addressing limitations related to reading and writing on digital text-only platforms and devices. This report examines the current landscape of voice interfaces in India, with a focus on concerns related to privacy and data protection, linguistic barriers, and accessibility for persons with disabilities (PwDs). This project was undertaken with support from the Mozilla Corporation.</p>
</div>
<div class="five wide column">
<h3>Research Team</h3>
<p><em>Research</em> Shweta Mohandas, Saumyaa Naidu, Deepika Nandagudi Srinivasa, Divya Pinheiro, Sweta Bisht</p>
<p><em>Conceptualisation, Planning, and Research Inputs</em> Sumandro Chattapadhyay, Puthiya Purayil Sneha</p>
<p><em>Illustration</em> Kruthika NS (Instagram @theworkplacedoodler)</p>
<p><em>Website Design</em> Saumyaa Naidu</p>
<p><em>Website Development</em> Sumandro Chattapadhyay, Pranav M Bidare</p>
<p><em>Review and Editing</em> Puthiya Purayil Sneha, Divyank Katira, Pranav M Bidare, Torsha Sarkar, Pallavi Bedi, Divya Pinheiro</p>
<p><em>Copy Editing</em> The Clean Copy</p>
</div>
<div class="four wide column">
<h3>Copyright and Credits</h3>
<p>Copyright: <a href="http://cis-india.org/" target="_blank">CIS, India</a>, 2021<br />License: <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank">CC BY 4.0 International</a></p>
<p>Built using <a href="https://semantic-ui.com/" target="_blank">Semantic UI</a><br/><a href="https://fonts.google.com/specimen/Barlow" target="_blank">Barlow</a> and <a href="https://fonts.google.com/specimen/Open+Sans" target="_blank">Open Sans</a> by <a href="https://fonts.google.com/" target="_blank">Google Fonts</a><br/>Social media icons by <a href="https://fontawesome.com/" target="_blank">Font Awesome</a><br/>Hosted on <a href="https://github.com/cis-india/mozvoice" target="_blank">GitHub</a></p>
</div>
<div class="one wide column empty">
</div>
<div class="sixteen wide column">
<div style="float: center; clear: both;">
<a href="https://cis-india.org/" target="_blank" style="border-bottom: 0px solid"><img src="img/logo.png" alt="The Centre for Internet and Society, India" class="logo" /></a>
</div>
<div class="icons" style="float: center; clear: both;">
<a href="https://www.instagram.com/cis.india/" target="_blank"><i class="fab fa-instagram fa-lg"></i></a> <a href="https://twitter.com/cis_india" target="_blank"><i class="fab fa-twitter fa-lg"></i></a> <a href="https://www.youtube.com/channel/UC0SLNXQo9XQGUE7Enujr9Ng" target="_blank"><i class="fab fa-youtube fa-lg"></i></a>
</div>
</div>
</div>
</div>
</body>
</html>