Speech Widgets

Dan R. Olsen Jr., S. Travis Nielsen, Matt Reimann
Computer Science Department, Brigham Young University, Provo, Utah, 84602, USA
{olsen, nielsent, ratt}@cs.byu.edu

ABSTRACT
Spoken language interfaces are difficult to develop. We have developed a set of widgets for building speech interfaces by composition, similar to toolkits for graphical user interfaces. Our speech widgets presume that users will learn an artificial language that can be universally applied to all applications built with the widgets. We describe lessons learned from having naïve users use the widgets. Experiments with naïve users show that speech widget interfaces have performance times comparable to physical button interfaces and that knowledge of the speech widgets transfers from application to application. This learning transfer results in reduced performance times.

Keywords
spoken language interfaces, toolkits, widgets

INTRODUCTION
One of the important future directions for interactive systems is to push human-computer interaction into more physical situations than the simple desktop workstation. One of the most promising of these technologies is speech. In recent years commercial speech recognition software with usable accuracy has become available. Though significant work remains to increase recognition accuracy, it is now possible to do more extensive work on exactly how spoken language interfaces should work.

Spoken language has some significant advantages over visual interfaces. It can be used when either the hands or the eyes are otherwise engaged. Most people can speak desired values faster than they can type them or enter them with a mouse. Speech also has a decided advantage in terms of low power requirements and very small physical form factors. One of our scenarios for the use of interactive speech is a very small device whose entire interface is a single "push-to-talk" button along with a speaker and microphone. If Moore's law makes computing cheap and speech recognition becomes highly accurate, this can become a model for highly functional, unobtrusive interactive devices.

In this paper we explicitly separate spoken language interaction from natural language interaction. One of the arguments in favor of speech interaction has been the naturalness of speech, which has tended to equate spoken language interfaces with natural language understanding. In our work we have separated these two concepts and set aside the natural language portion. We have done this for
1/8 1/28/2002
two reasons. The first is the difficulty of the natural language interaction problem; the second is a series of lessons learned from developing tools for graphical user interfaces.

One of the problems with spoken language interfaces is that they are invisible. Unlike graphical user interfaces, they contain no external affordances to tell users what they can say so that the system will understand them. Natural language offers the solution of "users will say whatever they want." This seems very attractive, but it has several problems. Natural language, as used when people speak to each other, contains many contextual references to mutually understood world knowledge. Even people speaking the same language but not sharing the same knowledge base have trouble understanding each other. Filling a small interactive device with enough world context to support such an interface remains problematic. When encountering a new device or interface, a user must always enter a discovery process to understand what that interface can do. This is true even when using natural language and human assistance. Our strategy is to provide a standard discovery process that teaches users our restricted interface language while they work with the interface.

Our work is focused on creating tools that designers can use to rapidly develop new applications, and it is strongly biased by the XWeb project's cross-modal interaction goals [8]. Many speech-based tools and development approaches use grammar-based techniques from natural language understanding; work on Air Travel Information Systems (ATIS) has this characteristic [11]. Simpler tools such as the CSLU Toolkit [9] and Suede [3] use a flowchart model of a conversation for designing sequences of prompts, questions, answers, and conversational recovery. The focus of these tools is on designing a conversation for a specific application.
Our approach is to build a generalized interactive client that can browse and edit information for a huge variety of applications. This is very different from the application-specific efforts of Chatter [5], SpeechActs [12], and Wildfire [13].

In developing speech tools we are guided by our experience in building tools for graphical user interfaces. In the very early days of interactive graphics all of our models for interaction were based on linguistic models such as automata or grammars. A driving assumption of the early days of User Interface Management Systems (UIMS) was that tools needed the power to express any interactive dialog, because that would allow the designer to craft dialogs that were most appropriate and most effective for a particular need. In graphical user interfaces, this universal dialog approach has been almost completely replaced by the concept of components or widgets. In the widget model, fragments of predefined interaction are packaged in such a manner that they can easily be composed to form an interface to virtually any application. The widget approach allows many programmers to completely ignore the design of input event syntax by using prepackaged pieces. Unlike linguistic parsing, the widget approach tightly integrates system feedback with user input. Users also benefit because all applications are composed of the same syntactic pieces; the transfer of learning between applications is enhanced because similar pieces behave in similar ways. This component-based approach is the one we have pursued in our development of speech tools.

Our speech widgets are explicitly not conversational. Our dialog model is that each device has a set of information that the user is either trying to understand or to modify. There is a fixed set of commands that is universal for all applications. Our focus on a small fixed command set was inspired by Arons' Hyperspeech [1] work, which reported increased user satisfaction and performance when the command set was uniform for all nodes. As such, our dialogs are strictly user-initiated rather than the system-initiated or mixed-initiative styles studied by Walker [10].

The paper proceeds by first outlining the forms of interaction imposed by the XWeb model and briefly discussing how such interfaces are specified. We then discuss how the abstract interfaces of XWeb are implemented in a speech-only interactive client.
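The component model described above can be sketched in code. The following is a minimal illustration only: the class names, command phrases, and dispatch logic are assumptions chosen for clarity, not the authors' actual implementation. It shows the key property the paper argues for: every widget answers the same fixed command vocabulary, so what a user learns on one widget transfers to every other.

```python
# Hypothetical sketch of the universal-command widget model.
# All names and phrases here are illustrative, not XWeb's real API.

class SpeechWidget:
    """Base class: every widget shares one fixed, universal command set."""
    COMMANDS = ("what is this", "what can i say", "set <value>")

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def handle(self, utterance):
        # Dispatch on the fixed vocabulary; unknown input triggers
        # the standard discovery response instead of failing silently.
        if utterance == "what is this":
            return f"{self.name} is {self.value}"
        if utterance == "what can i say":
            return "You can say: " + ", ".join(self.COMMANDS)
        if utterance.startswith("set "):
            self.value = self.parse(utterance[4:])
            return f"{self.name} set to {self.value}"
        return "I did not understand. Say 'what can i say' for help."

    def parse(self, text):
        # Subclasses override this to interpret the spoken value.
        return text

class NumberWidget(SpeechWidget):
    """A primitive interactor for numeric values."""
    def parse(self, text):
        return int(text)

w = NumberWidget("temperature", 68)
print(w.handle("what is this"))  # -> temperature is 68
print(w.handle("set 72"))        # -> temperature set to 72
```

Because the dispatch lives in the base class, a new widget type only redefines how values are parsed and presented; the interaction syntax itself never varies between applications.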
This is followed by a discussion of our formative evaluations with users and the lessons we learned in making our speech widgets usable. We conclude with the results of a limited user study comparing our speech-based techniques against the physical interfaces of common automated devices.

XWEB INTERACTORS
The XWeb architecture is intended to distribute interactive services over the Internet to a wide variety of interactive clients [8]. All information to be manipulated is represented as XML trees. Interactive control of processes and devices is modeled as interactive editing of control state information. For example, a home automation thermostat has several settings that are presented to XWeb via a special server implementation. The user controls the thermostat using an XWeb client to modify those settings.

Interfaces in XWeb are specified in an XView. An XView is also encoded in XML and consists of a set of abstract
interactors. The primitive interactors that manage atomic values are enumeration, number, time, date, and text. These can be composed into larger interfaces using the group, list, and link interactors. These eight interactors are used to build any XWeb interface. Recently we have added interactors for 2D and 3D spaces, but no speech implementations for these have been attempted. Our earlier work on speech interaction in 2D spaces [7] has not yet been incorporated into XWeb.

Each interactor is tied to a specific data object in an XWeb server's XML tree. The interactor's purpose is to present that data to the user and to make modifications to that data as requested by the user. Consider the example of a simple automated thermostat that uses different temperature settings for different times of the day. Such a thermostat can be represented as a small XML tree of schedule entries, each pairing a time of day with a target temperature, and a highly abbreviated XView for it would bind a time interactor and a number interactor to each entry.
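As a concrete illustration, a thermostat of this kind might be encoded as below. This is a hedged sketch only: the element and attribute names (Thermostat, Setting, XView, and so on) are assumptions made for illustration, not the actual XWeb schema.

```xml
<!-- Hypothetical data tree; element names are illustrative, not XWeb's schema -->
<Thermostat>
  <Setting period="wake"  time="06:30" temp="70"/>
  <Setting period="day"   time="08:00" temp="65"/>
  <Setting period="night" time="22:00" temp="62"/>
</Thermostat>

<!-- Hypothetical abbreviated XView: a time and a number interactor per setting -->
<XView>
  <List ref="Thermostat/Setting">
    <Group>
      <Time   ref="@time"/>
      <Number ref="@temp" min="50" max="90"/>
    </Group>
  </List>
</XView>
```

The point of the pairing is that the XView never describes modality-specific presentation; a speech client, a graphical client, or any other client can render the same interactor tree in its own way.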