VR Theremin Deliverable 1

Final Year Dissertation
Author: James Redfern
Supervisor: Ruth Aylett

“Imagine a virtuoso soprano with an unlimited upper range. Imagine a violin with the lower range of a cello. Imagine an instrument that allows for every nuance, for every slight embellishment, for every dynamic flourish imaginable.” - Nancy Moran on the theremin.

1. Introduction

The project aim is to develop an application which will provide a stereoscopic 3d graphical interface with which the user can interact using virtual reality (VR) technologies. The application will generate an audio output based on the user's interaction. The graphical interface should provide the user with feedback relating to the audio output: the visual feedback of the virtual reality interface replaces the tactile feedback received from physical instruments.

This document first discusses non-haptic1 and VR instruments. It then presents a concept for the VR instrument that will be created in the course of this project. Finally, it presents a feasibility study of creating the instrument with the hardware and software options available.

2. Background

The theremin or thereminvox (invented by Léon Theremin, 1896–1993) is the ancestor of all VR music interfaces, as it employs a non-haptic gestural interface. Its sound became widely recognised when it was used to great effect by the Beach Boys in their track “Good Vibrations”. The interface simply consists of two

1 Haptic - Of or relating to the sense of touch; tactile.

antennas, which sense the proximity of the player's hands through changes in capacitance. The instrument plays a simple sine tone, whose pitch and amplitude are controlled by the proximity of the player's hands to the antennas (right hand for pitch, left for amplitude) in a continuous fashion (notes which don't correspond to the chromatic scale may be played). The only feedback the player receives is the change in audio output. Because of this, the instrument is notoriously difficult to play, and there are few highly skilled theremin players. Notable players include the Lithuanian Clara Rockmore (1911–1998) with her seminal album “The Art of the Theremin”, the Russian virtuoso Lydia Kavina (1967–) and the contemporary artist Pamelia Kurstin [1].

One of the first VR musical interfaces was the Mandala Virtual Reality System (MVRS) [2], created by Vincent John Vincent in the late 80s. Reminiscent of Sony's EyeToy, it used a video capture system and overlaid the player's image with computer-created graphics. The instrument is played by moving the body over predefined zones of the computer graphic, which trigger sounds to be played. Some zones may also change the mode of the instrument, switching between different graphic and sound sets. Like the theremin, the MVRS does not provide any haptic feedback to the player. However, unlike the theremin, there is a form of visual feedback.

Figure 1 – Mandala Virtual Reality System.

The player can see themselves apparently hitting the instruments on screen, providing a disembodied type of visual feedback. A criticism of this type of percussive virtual instrument interface is the lack of haptic feedback, which is central to the playing of physical percussive instruments.

One of the first “embodied” virtual reality musical instrument interfaces (that is, one the user interacts with from a first-person view) was created by Jaron Lanier (who is also credited with coining the term “virtual reality”). He used a single dataglove as the input device and wore a head mounted display (HMD) to see the interface. Entitled “The Sound of One Hand” [3], his performance involved interacting with several different physically modelled virtual objects. The first virtual object played in this performance was the Rhythm Gimbal. It resembled a gyroscope and would make no sound when still. When Lanier picked up and moved the Rhythm Gimbal it would emit sound, created by the rings rubbing against each other. The rings would also change colour on contact. Once in motion the Rhythm Gimbal would take a long time to come to rest, and the timbre of the sound would change as it slowed.

Figure 2 – A gimbal.

The Rhythm Gimbal was an effective VR instrument interface in several ways. It did not require a level of dexterity that the VR input device could not afford, and it produced a complex musical output based on a simple user input.

Since 2001 the International Conference on New Interfaces for Musical Expression (NIME) [4] has been held annually, and is dedicated to scientific research on the development of new technologies for musical expression and artistic performance. Several virtual reality instruments were presented by Mäki-Patola, Laitinen, Kanerva and Takala [5] at NIME05. The first instrument presented was the Virtual Xylophone (see figure 3). The interface was projected in stereoscopic 3d, and consisted of a user-definable number of virtual xylophone plates and two virtual mallets. The performer held two magnetic sensors, from which the virtual mallet visualisations extended. The performer could place the plates anywhere in 3d space by grabbing and dragging them with a dataglove, and could play the notes by striking the virtual plates with the virtual mallets. This enabled the performer to build up sequences of notes to be played by placing the plates side by side, or layering the plates to build chords.

Figure 3 – The Virtual Xylophone.

Based on feedback from user tests, it was determined that the configurable interface considerably increased the appeal of the instrument. Making chords and sequences was seen to be very rewarding, and it excited and inspired the people testing the instrument.

The second instrument presented by Mäki-Patola, Laitinen, Kanerva and Takala was the Gestural FM Synthesizer, an evolution of the theremin. The pitch of the sound produced was controlled by moving the right hand up and down, with the amplitude being controlled by opening and closing the fingers of the right hand. The instrument offered a visualisation of the current note by drawing a thin line from the performer's hand to a vertical keyboard, striking the key corresponding to the current pitch. The pitch was continuous; the keyboard was only a visual aid. As a result of the visual feedback, users found it much easier to find particular notes with the Gestural FM Synthesizer, compared to the original theremin.

3. Interface Design

The application should provide a stereoscopic 3d graphical interface with which the user can interact, and which will generate audio output based on the user's interaction. Many VR input devices do not offer the fine-grained control of a physical instrument, thus the instrument should not require the user to provide a fine level of control2. The positions of 3d computer-created objects are not as easily determined by the user as those of actual physical 3d objects, and the lack of tactile feedback means that the user can't learn and memorise the interface through his/her hands. Because of this the interface should require less spatial accuracy from the user than a physical instrument. The application should have a high musical efficiency (defined by Jordà 2004 [6]). This means that it should have a relatively high musical output complexity compared to the control input complexity: unsophisticated actions from the user would generate sophisticated sonic output.

Super-Theremin

The interface that I have conceived mimics that of a theremin. The instrument produces a tone whose amplitude is increased by the user raising their left hand. Moving the right hand to the left raises the pitch (matching the theremin's frequency range of 4 octaves). The fine spatial resolution of a theremin allows vibrato and tremolo effects to be performed by the user. The spatial resolution of the Polhemus Liberty user input device (described in section 4) is less than that offered by the theremin, hampering the user's ability to create these effects. A solution to this problem is to implement automatic vibrato and tremolo effects, controlled by extending the user's fingers in the data gloves (this solution was used in the Gestural FM Synthesizer [5]).
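As a rough illustration of this mapping, the sketch below converts normalised hand positions into frequency and amplitude, with an automatic vibrato scaled by finger extension. The axis conventions, base note and parameter values are illustrative assumptions, not final design decisions.

    import math

    BASE_FREQ = 130.81   # assumed lowest note of the 4-octave range (C3)
    OCTAVES = 4.0

    def pitch_from_x(x_norm):
        """Map the right hand's horizontal position (0.0-1.0) to a
        continuous frequency spanning four octaves."""
        return BASE_FREQ * 2.0 ** (x_norm * OCTAVES)

    def amplitude_from_y(y_norm):
        """Map the left hand's height (0.0-1.0) to output amplitude."""
        return max(0.0, min(1.0, y_norm))

    def apply_vibrato(freq, finger_extension, t, rate_hz=6.0, depth=0.01):
        """Automatic vibrato whose depth is scaled by how far the glove's
        fingers are extended (0.0 = closed fist, 1.0 = fully open)."""
        lfo = math.sin(2.0 * math.pi * rate_hz * t)
        return freq * (1.0 + depth * finger_extension * lfo)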

As discussed above, playing specific notes on a theremin requires an acute musical ear, due to the singular

2 This constraint is likely to be overcome as virtual reality input devices become more sensitive.

medium of feedback (auditory) and the continuous nature of its control. In order to assist the user, the Super-Theremin displays a piano keyboard with a line being projected from the user's hand to the note on the keyboard which corresponds to the current pitch (this enhancement was also used in the Gestural FM Synthesizer [5]). A trail may be left behind the user's hand as they play the Super-Theremin, which displays a record of the user's input (pitch history). This can help the user to repeat phrases and to use gesture in order to play the instrument (see figure 4). There are several possible ways to implement the pitch history trail. The trail may be recorded in three dimensions, being drawn as the user moves their hand and fading out over time. Alternatively, the horizontal position of the trail may be controlled by the user's horizontal hand movement, with the vertical position of the trail denoting time. The trail scrolls vertically, beginning from a horizontal line which represents the current time slice (see figure 5).
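A minimal sketch of the first variant (the fading 3d trail) follows; the data structure and the fade-out period are assumptions for illustration only.

    import time
    from collections import deque

    TRAIL_SECONDS = 4.0  # assumed fade-out period

    class PitchTrail:
        def __init__(self):
            self.points = deque()  # (timestamp, x, y, z)

        def record(self, x, y, z):
            """Record the hand's current position as the user plays."""
            self.points.append((time.time(), x, y, z))

        def visible_points(self):
            """Return (x, y, z, alpha) tuples for the renderer, dropping
            points older than the fade-out period."""
            now = time.time()
            while self.points and now - self.points[0][0] > TRAIL_SECONDS:
                self.points.popleft()
            return [(x, y, z, 1.0 - (now - t) / TRAIL_SECONDS)
                    for (t, x, y, z) in self.points]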

The user may choose to lock the pitch of the Super-Theremin to a scale (forcing the notes to snap into tune). The scale may be the chromatic scale, or other scales such as C major or C minor. Locking the notes to a scale will enable the novice musician to create a satisfying musical output. The user may select several scales to play from – the scales will be located in zones which are stacked vertically (and labelled). The user can play notes from any of the scales by moving their hand into the appropriate zone (see figure 6). A possible extension of this mode is a scale import function, which would let the user import custom scales from an .scl file. These files are created in Scala; a sketch of the snapping itself follows the quotation below.

Scala is a powerful software tool for experimentation with musical tunings, such as just intonation scales, equal and historical temperaments, microtonal and macrotonal scales, and non-Western scales. It supports scale creation, editing, comparison, analysis, storage, tuning of electronic instruments, and MIDI file generation and tuning conversion. (from the Scala website [7])
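As a sketch of the scale-lock behaviour, assuming scales are represented as semitone offsets from a tonic (a simplification of the full .scl format, which also supports ratios and cent values), the continuous frequency could be snapped as follows; the names and ranges are illustrative:

    import math

    A4 = 440.0
    C_MAJOR = [0, 2, 4, 5, 7, 9, 11]  # semitone degrees above the tonic

    def snap_to_scale(freq, scale=C_MAJOR, tonic_midi=60):
        """Snap a continuous frequency to the nearest note of the scale."""
        # Convert to a continuous MIDI note number...
        midi = 69 + 12 * math.log2(freq / A4)
        # ...then find the nearest allowed degree in any nearby octave.
        candidates = [tonic_midi + 12 * octave + degree
                      for octave in range(-2, 3) for degree in scale]
        nearest = min(candidates, key=lambda n: abs(n - midi))
        return A4 * 2 ** ((nearest - 69) / 12)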

A tutor mode may be enabled that displays lines which the user can follow to play melodies (see figure 7). A possible extension of this mode is a melody import function, which would let the user import melodies from a MIDI file. Nodes are displayed on the lines which indicate when the user should raise their left hand to increase the volume and emphasise notes. If the hardware allows the Super-Theremin to reproduce the characteristics of a theremin accurately, then this mode may be used as a learning tool for playing the traditional theremin.
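Generating the guide lines amounts to inverting the pitch mapping: each note of the melody becomes a target hand position at a given time. A sketch under the same assumptions as the earlier mapping (the melody is hard-coded here; the proposed extension would read it from a MIDI file):

    import math

    BASE_FREQ = 130.81   # must match the pitch mapping sketch above
    OCTAVES = 4.0

    def x_for_midi_note(note):
        """Horizontal hand position (0.0-1.0) at which a MIDI note sounds,
        inverting freq = BASE_FREQ * 2 ** (x * OCTAVES)."""
        freq = 440.0 * 2.0 ** ((note - 69) / 12.0)
        return math.log2(freq / BASE_FREQ) / OCTAVES

    # e.g. a guide line for C4, E4, G4 played one beat apart
    melody = [(60, 0.0), (64, 1.0), (67, 2.0)]       # (MIDI note, beat)
    guide_line = [(x_for_midi_note(n), beat) for (n, beat) in melody]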

The success of the different modes that have been specified for the Super-Theremin cannot truly be known until the functions have been tested. As such the requirements for these modes are tentative and may be revised after some user testing has been performed. One such unknown variable is whether it is better to have the pitch controlled vertically or horizontally, with the tutor mode and pitch history mode 2 scrolling horizontally.

Functional Architecture Diagrams3

Figure 4 – Super-Theremin pitch history mode 1.

3 Dataglove image from http://www.generativedesign.com/, keyboard image from http://www.harmony-central.com/

Figure 5 – Super-Theremin pitch history mode 2.

Figure 6 – Super-Theremin multi scale mode.

Figure 7 – Super-Theremin tutor mode.

4. Hardware and software feasibility and solutions

This section describes the technical issues that arise in this project, and puts forward initial ideas to address them. The hardware devices that are available for this project include:

• A Polhemus Liberty 3d tracker with two wired trackers. These trackers relay their position and rotation in three dimensions.
• Several wired data gloves. The gloves record the level of extension of each finger and thumb.
• A wireless data glove.
• A passive (orthogonal polarising lens based) stereoscopic projection system.
• A PC with an Nvidia Quadro 3d graphics card.

The technical issues arising can mainly be classified as relating to:

• 3d graphics and user input.
• Audio generation.

3d graphics and user input

The application should create an artificial 3d interface for the user, mapping actual space to virtual space. This means that the user can directly interact with the interface in front of them as if it were a real but untouchable object.

In order to map actual to virtual space, the application requires the eye position of the user in order to locate the virtual camera. Three possible solutions are as follows. A tracker may be placed on the user's head, relaying its position to the application and modifying the virtual camera position. Alternatively, the user can be instructed to stay in a fixed pre-calculated position. Finally, there may be several pre-determined positions that the user can move to, with the application switching the virtual camera position to match the user's position.
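The first solution reduces to a simple update loop: read the head tracker, apply a calibration offset, and reposition the virtual camera. The offset value and the set_camera_position callback below are placeholders, not the actual tracker or rendering APIs:

    # Assumed tracker-to-eye calibration offset in metres; the real value
    # would be measured for the headband used.
    TRACKER_TO_EYE = (0.0, -0.08, 0.0)

    def update_camera(tracker_pos, set_camera_position):
        """tracker_pos: (x, y, z) of the head tracker in room coordinates.
        set_camera_position: placeholder callback into the rendering layer."""
        eye = tuple(p + o for p, o in zip(tracker_pos, TRACKER_TO_EYE))
        set_camera_position(eye)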

Software is required to display the 3d interface as well as to record the user's input. There are many different systems that could be used to create this software. An evaluation of the available systems was conducted (see appendix). It was determined that the VR Juggler [8] system is the most suited to this project. It is a middleware system which allows the use of high-level languages (including Java and Python), and contains features such as a scene graph (Open Scene Graph) for creating the 3d visuals, and a standard interface (Gadgeteer) for the VR input devices. The Polhemus Liberty tracker is not natively supported; however, an older Polhemus tracker is, and it should be possible to modify that driver (the Liberty comes with a C-based API).

A factor to consider in the interface design is the latency introduced by the IO hardware (see Table 1 for details). Assuming 5 ms processing time, using the Liberty tracker would theoretically incur 4.16 + 5 + 20 = 29.16 ms of visual latency and 4.16 + 5 + 5 = 14.16 ms of audio latency. The gloves would incur 16.66 + 5 + 20 = 41.66 ms of visual latency and 16.66 + 5 + 5 = 26.66 ms of audio latency. Dahl and Bresin [9] suggest that latencies of over 55 ms degrade the performance of playing a percussion instrument to a metronome, when that percussion instrument does not have tactile feedback. Mäki-Patola and Hämäläinen [10] report that a player of a continuous sound instrument such as the theremin just notices the difference between no latency and about 25 ms. They also report that playing style affects the detection of latency: when playing slow passages with vibrato, even high latencies (tested up to 100 ms) were not noticed. However, they also note that latencies as high as several hundred milliseconds are not unusual for church organs, and that they can still be played with practice.

Device                      Operating Frequency (Hz)                                Latency (ms)
Dataglove                   60                                                      16.66
Liberty Tracker             240                                                     4.16
3d Display                  50 fps (based on a 3d scene of reasonable complexity)   20
Audio output ASIO driver    44100 (with 256 byte buffer)                            5

Table 1 – IO Device Latency
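The estimates above can be reproduced directly from Table 1: each input device's latency is the period of its operating frequency, and the 5 ms processing time is an assumption.

    PROCESSING_MS = 5.0                      # assumed processing time

    def total_latency(input_ms, output_ms):
        """Total IO latency: input device + processing + output device."""
        return input_ms + PROCESSING_MS + output_ms

    LIBERTY_MS = 1000.0 / 240                # 4.16 ms (240 Hz tracker)
    GLOVE_MS = 1000.0 / 60                   # 16.66 ms (60 Hz glove)
    DISPLAY_MS, AUDIO_MS = 20.0, 5.0

    print(total_latency(LIBERTY_MS, DISPLAY_MS))  # ~29.16 ms visual
    print(total_latency(LIBERTY_MS, AUDIO_MS))    # ~14.16 ms audio
    print(total_latency(GLOVE_MS, DISPLAY_MS))    # ~41.66 ms visual
    print(total_latency(GLOVE_MS, AUDIO_MS))      # ~26.66 ms audio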

Consideration must be given to the ratio of the distance between the virtual cameras (corresponding to the left and right eyes) and the distance between the cameras and the closest object. Increasing this ratio requires the user's eyes to converge more strongly to view the close object. There is a range of ratios which is comfortable for the user to view, beyond which eye-strain results. It may be necessary to display the pitch history trail of the Super-Theremin a short distance away from the user's hand (depending on how close the virtual 3d graphics can be to the user's eyes).
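This ratio translates into a vergence angle that can be checked at design time. A sketch of the geometry, with an illustrative comfort threshold that is an assumption to be tuned by user testing, not a measured value:

    import math

    EYE_SEPARATION = 0.065    # metres; typical interocular distance
    MAX_COMFORT_DEG = 10.0    # illustrative threshold only

    def vergence_angle_deg(object_distance_m, eye_sep=EYE_SEPARATION):
        """Angle between the two eyes' lines of sight when fixating an
        object at the given distance."""
        return math.degrees(2.0 * math.atan((eye_sep / 2.0) / object_distance_m))

    # e.g. is a trail 0.4 m from the eyes still comfortable to fuse?
    print(vergence_angle_deg(0.4) <= MAX_COMFORT_DEG)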

It is not known beforehand precisely how well the hardware will suit the requirement specification. Because of this the application will be created using an exploratory development model. The functional requirements that are the most unpredictable factors in the project will be developed first, using a mock-up/prototype method. This will establish their feasibility and show whether the requirements need to be modified earlier rather than later in the project.

Audio generation

The application should output an audio stream based on the user's interaction, as specified above. There are two approaches to the audio synthesis that can be taken: using a library to generate the sound, or using an external sound generator which is controlled via messaging. The first approach lets the application directly control the sound card. The library may control the Windows MIDI synthesiser, or it may generate the audio digitally and send it to the wave output. The second approach involves using an external audio generator and sending note information to it for processing. A hardware synthesiser could be used, or alternatively a software synthesiser running in parallel with the application. The benefit of using a library to generate the sound is that the final application would be free-standing, not dependent on another tool to generate the audio. The benefit of using an external device to generate the audio is that the device need not be known at compile time; it only needs to be compliant with the messaging protocol used by the application. The traditional standard messaging protocol for electronic musical instruments is MIDI. Conceived in 1981, MIDI uses 7-bit data words, which may prove to be of limited use for modelling a continuous input device such as the theremin. In many modern systems MIDI has been superseded by OSC (OpenSound Control) [11]. OSC has been implemented in many languages including Java, Python and even Perl. It commonly uses UDP as a transport layer.
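To illustrate the second approach, the sketch below hand-encodes a minimal OSC message (a null-padded address pattern, a “,f” type tag and one big-endian float) and sends it over UDP. The address pattern /theremin/pitch and the port number are assumptions; a receiving synthesiser such as Pd would be configured to match:

    import socket
    import struct

    def osc_pad(b):
        """Null-terminate and pad a byte string to a multiple of 4 bytes,
        as the OSC encoding requires."""
        b += b"\x00"
        return b + b"\x00" * (-len(b) % 4)

    def osc_message(address, value):
        """Build an OSC message carrying one 32-bit float argument."""
        return (osc_pad(address.encode("ascii"))
                + osc_pad(b",f")
                + struct.pack(">f", value))

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(osc_message("/theremin/pitch", 440.0), ("127.0.0.1", 9001))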

Pure Data (Pd) is a graphical programming language developed by Miller Puckette in the 1990s for the creation of interactive computer music and multimedia works. It is open source, and available for Windows, Linux and Macintosh. Pd has a large user base, many pre-constructed software synthesiser configurations are available for free on the net, and it is compatible with the OSC communication protocol.

5. Testing strategy

There are two aspects to the testing strategy of this project: system testing and evaluation testing (performance assessment). System testing is aimed at ensuring that the system functions correctly. The strategy that will be used is to separate the system into components which may be tested individually, before being integrated and tested as a whole. Evaluation testing is a difficult problem with applications whose value is subjective, such as a musical instrument. An evaluation framework has been developed by Jordà [6] which can be applied to the Super-Theremin. It attempts to objectively evaluate whether an instrument is “good”. It evaluates the possibilities and diversity of musical instruments as well as the expressive freedom of human music performers. Its evaluation is performed on the following criteria:

• Balance (challenge, frustration, boredom...)
• Playability, progression and learnability
• The learning curve
• Efficiency of a musical instrument - the ratio of musical output complexity to control input complexity.
• Playing [with] music - the extent to which playing a musical instrument is “playing” in the game sense; an instrument can be fun.

User testing of the Super-Theremin will also be performed. Users will be allowed to play with the system, and given a questionnaire to fill in. The questionnaire will ask the user to comment on their experience with the instrument, under headings such as:

• How useful they found the visual aids.
• What they generally liked/disliked about it.
• Suggestions for further developing the functionality of the instrument.

6. Timetable

(a 1-page timetable for the whole year (including term 1), agreed with your supervisor, and specifying activities, deliverables and deadlines (ask your supervisor for an example).)

References

[1] Pamelia Kurstin - http://www.pameliakurstin.com
[2] Vincent John Vincent - http://www.vjvincent.com/index.htm
[3] Lanier, J. Music From Inside Virtual Reality: The Sound of One Hand. http://www.advanced.org/jaron/vr.html
[4] International Conference on New Interfaces for Musical Expression (NIME) - http://www.nime.org/
[5] Mäki-Patola, T., Laitinen, J., Kanerva, A., Takala, T. Experiments with Virtual Reality Instruments. Proceedings of the Conference on New Interfaces for Musical Expression (NIME05). http://hct.ece.ubc.ca/nime/2005/proc/nime2005_011.pdf
[6] Jordà, S. Digital Instruments and Players: Part I - Efficiency and Apprenticeship. Proceedings of the Conference on New Interfaces for Musical Expression (NIME04), Hamamatsu, Japan, 2004. http://hct.ece.ubc.ca/nime/2004/NIME04/paper/NIME04_1D03.pdf
[7] Scala home page - http://www.xs4all.nl/~huygensf/scala/
[8] VR Juggler - Open Source Virtual Reality Tools - http://www.vrjuggler.org/
[9] Dahl, S., Bresin, R. Is the Player More Influenced by the Auditory than the Tactile Feedback from the Instrument? http://www.csis.ul.ie/dafx01/proceedings/papers/dahl.pdf
[10] Mäki-Patola, T., Hämäläinen, P. Latency Tolerance for Gesture Controlled Continuous Sound Instrument without Tactile Feedback. http://www.tml.tkk.fi/~tmakipat/pubs/icmcarticlefinal10.pdf
[11] OpenSound Control - http://www.cnmat.berkeley.edu/OpenSoundControl/