Gestural Melodic Interface
Mid-year Report for IITG
January 10, 2013
Jim McElwaine, Paul Thayer, Keith Landa

Initial Concepts

The Gestural Melodic Interface (GMI) is custom music software designed to capture melodies quickly and clearly in a digital audio domain, using hand gestures on a digital tablet. It also allows editing of the initial captures: motivic repetition and variation using conventional melodic procedures. All melodic files will be editable and will allow overdub and compilation. The GMI will be capable of capturing melodies in a variety of tonalities and temperaments: frequency can either be "corrected" to conventional scales or left "uncorrected." All melodic files will be exportable, both in file types suitable for further audio processing (MP3) and in file types suitable for digital notation software and timbral reassignment (MIDI, with duration-quantizing options). The GMI is written in open code for iOS, suitable for both tablet and smartphone devices; eventually, it could evolve to the Android OS.

Initially, the GMI will have three operating modes: Capture, Edit, and Export. Capture will acquire melody from screen translations of hand motions. Edit will allow recall and change of captures. Export will provide file export of edits and captures via email as MIDI files. Although GMI sounds will be compatible with the General MIDI Level 2 (GM2) specification, they will not use GM2 sounds. Instead, captures will be purposely limited in their timbral options: only two or three sustained musical sounds will be available. Pitches will have simple graphic representations but will not immediately be adapted to conventional clefs and notations. The screen output will follow common visual conventions for music software.

Frequency and Time

The GMI human user interface is a conventional grid, consonant with most other digital audio software: the X-axis equals time, and the Y-axis equals frequency. Outer portions of each grid member retain microtonal pitch variation. Initially, the GMI has a movable pitch compass (high-to-low) of a perfect 12th (19 semitones), approximating the human vocal span and providing for several different stepwise cadential approaches from above or below cadential points or pitches. Melodies that exceed the GMI compass will first require an expansion of the pitch compass, probably via the Edit mode. It is hoped that amplitude variation can be incorporated in a later version.
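To make the grid concrete, the following is a minimal sketch, in the project's Objective-C idiom, of how a vertical touch coordinate might be mapped to a frequency, including the "corrected"/"uncorrected" choice described under Initial Concepts. The function name, the fixed compass constant, and the top-of-grid orientation are illustrative assumptions, not taken from the GMI source; correction here snaps to the nearest equal-tempered semitone, and a diatonic scale correction would add a further lookup.

    #import <Foundation/Foundation.h>
    #import <math.h>

    // Compass of a perfect 12th (19 semitones); movable via baseMidiNote.
    static const double kCompassSemitones = 19.0;

    // Map a vertical touch coordinate (0 = top of grid) to frequency in Hz.
    double gmiFrequencyForY(double y, double gridHeight,
                            double baseMidiNote, BOOL corrected) {
        // Invert y so the top of the grid is the highest pitch.
        double semitones = (1.0 - y / gridHeight) * kCompassSemitones;
        if (corrected) {
            semitones = round(semitones);  // snap to the nearest semitone
        }
        // "Uncorrected" keeps the microtonal offset contributed by the
        // outer portion of the grid cell.
        double midiNote = baseMidiNote + semitones;
        return 440.0 * pow(2.0, (midiNote - 69.0) / 12.0);  // MIDI-to-Hz
    }

For example, with baseMidiNote = 48 (C3), a touch at the very top of the grid in corrected mode yields MIDI note 67 (G4, roughly 392 Hz), a perfect 12th above the bottom of the compass.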
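The Export mode reduces, at its simplest, to serializing captured events as a Standard MIDI File and attaching the result to an email. Below is a hedged sketch of the file-writing half under stated assumptions: a bare note list (the GMINote struct is invented for illustration), format 0, a single channel, a fixed velocity, and durations below 0x4000 ticks.

    #import <Foundation/Foundation.h>

    typedef struct { UInt8 note; UInt32 ticks; } GMINote;  // pitch + duration

    static void appendBE16(NSMutableData *d, UInt16 v) {
        UInt8 b[2] = { (UInt8)(v >> 8), (UInt8)(v & 0xFF) };
        [d appendBytes:b length:2];
    }

    static void appendBE32(NSMutableData *d, UInt32 v) {
        UInt8 b[4] = { (UInt8)(v >> 24), (UInt8)(v >> 16),
                       (UInt8)(v >> 8),  (UInt8)(v & 0xFF) };
        [d appendBytes:b length:4];
    }

    NSData *gmiMIDIData(const GMINote *notes, int count) {
        NSMutableData *track = [NSMutableData data];
        for (int i = 0; i < count; i++) {
            UInt8 on[] = { 0x00, 0x90, notes[i].note, 0x60 };  // delta 0, note-on
            [track appendBytes:on length:4];
            UInt32 t = notes[i].ticks;  // duration as a variable-length delta
            if (t > 0x7F) {
                UInt8 hi = (UInt8)(0x80 | (t >> 7));
                [track appendBytes:&hi length:1];
            }
            UInt8 off[] = { (UInt8)(t & 0x7F), 0x80, notes[i].note, 0x00 };
            [track appendBytes:off length:4];  // note-off closes the event
        }
        UInt8 eot[] = { 0x00, 0xFF, 0x2F, 0x00 };  // end-of-track meta event
        [track appendBytes:eot length:4];

        NSMutableData *midi = [NSMutableData dataWithBytes:"MThd" length:4];
        appendBE32(midi, 6);    // header chunk length
        appendBE16(midi, 0);    // format 0
        appendBE16(midi, 1);    // one track
        appendBE16(midi, 96);   // 96 ticks per quarter note
        [midi appendBytes:"MTrk" length:4];
        appendBE32(midi, (UInt32)track.length);
        [midi appendData:track];
        return midi;
    }

The resulting NSData could then be attached to a message with MFMailComposeViewController's addAttachmentData:mimeType:fileName: from the MessageUI framework; the duration-quantizing options mentioned above would adjust the tick values before serialization.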
Initial Design

Initial planning and conferences between the music specialist (J. McElwaine), software engineer (P. Thayer), and project manager (K. Landa) began in late August. Conversations revealed many shared concerns: fidelity of hand gesture to resultant sound; note connections (portamento); pitch resolution (conventional scales?); silence as a recorded element. A modest pace in our development process allowed for better discussion of initial objectives and of the technological paths toward them, and eventually a reduction in programming overhead. By late October 2012, we had arrived at a functioning alpha model that responded to screen data entry via a grid generated by the musical requirements above. By December 2012, the alpha was 80% functional. Its look and feel remained a simple approximation of what would result from a subsequent beta refactoring; that allowed for large-scale changes with quick turnarounds and further reductions in programming overhead. We decided not to incorporate a metronome. We also agreed that a moving grid (actually a moving background along the X-axis beneath a static grid image) would eventually solve our extended time-grid needs. So, for now, time is generated only by event entry, with no accompanying event divisibility.

Current Challenges and Solutions

We now have a clear idea of how we want the grid to work, and we have established an optimal path for achieving it. The current grid is entirely event-oriented. We have not yet addressed the issue of changing duration values (real-time mensuration), since this variability seems to require an invariant: a comparative clock. The current event-entry mode also precludes multiple rhythmic modalities, or prolations. Sometimes beats are divided in twos; other times in threes and fours, and their subgroupings. Prolations can also fall outside these simple ratios, as in the "swing" feel of jazz. Once the grid is fully functional, we will face the problem of devising a method for defining consecutive notes as tied together into one long note, as opposed to a row of distinct short notes (one possible approach is sketched below). Not exactly a "problem" but a "solution" nonetheless: instead of programming audio features in Objective-C [3], the application uses libpd [4], which allows for easy incorporation of the Pure Data interactive audio programming environment, reducing programming overhead considerably (a minimal embedding sketch also follows below).
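On the tie question, one candidate approach is to collapse a run of adjacent grid events at the same pitch into a single longer event at edit time. The sketch below assumes events live on integer grid columns and are sorted by column; the GMIEvent type and the adjacency rule are illustrative, not the project's actual data model.

    // A captured grid event: starting column, pitch row, length in cells.
    typedef struct { int column; int pitch; int length; } GMIEvent;

    // Merge runs of same-pitch, edge-adjacent events in place;
    // returns the new event count.
    int gmiMergeTies(GMIEvent *events, int count) {
        int merged = 0;
        for (int i = 0; i < count; i++) {
            GMIEvent *last = merged ? &events[merged - 1] : NULL;
            if (last && last->pitch == events[i].pitch &&
                last->column + last->length == events[i].column) {
                last->length += events[i].length;  // extend the tied note
            } else {
                events[merged++] = events[i];      // begin a new note
            }
        }
        return merged;
    }

A user-facing gesture (say, a drag across existing notes) would still be needed to distinguish an intended tie from a deliberate row of repeated short notes; this function covers only the data-model half.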
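For readers unfamiliar with libpd, the embedding pattern the report refers to looks roughly like the following. The patch name ("gmi-voice.pd") and the receiver symbols ("pitch", "gate") are hypothetical, since the report does not describe the GMI's actual Pd patch; only the PdAudioController and PdBase calls reflect the libpd iOS API of this period.

    #import <Foundation/Foundation.h>
    #import "PdAudioController.h"
    #import "PdBase.h"

    @interface GMIAudioEngine : NSObject
    @property (nonatomic, strong) PdAudioController *audioController;
    @end

    @implementation GMIAudioEngine

    - (id)init {
        if (self = [super init]) {
            self.audioController = [[PdAudioController alloc] init];
            // 44.1 kHz stereo output; no audio input needed for playback.
            [self.audioController configurePlaybackWithSampleRate:44100
                                                   numberChannels:2
                                                     inputEnabled:NO
                                                    mixingEnabled:YES];
            // Load the Pd patch bundled with the application.
            [PdBase openFile:@"gmi-voice.pd"
                        path:[[NSBundle mainBundle] resourcePath]];
            self.audioController.active = YES;
        }
        return self;
    }

    // Send a frequency to the patch; a [receive pitch] object picks it up.
    - (void)playFrequency:(float)hz {
        [PdBase sendFloat:hz toReceiver:@"pitch"];
        [PdBase sendFloat:1 toReceiver:@"gate"];
    }

    @end

The synthesis itself (the two or three sustained timbres mentioned under Initial Concepts) then lives in the Pd patch, where it can be revised without recompiling the application; this is the overhead reduction the report describes.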
Next Steps

Next on our agenda is decommissioning the alpha model and constructing a new beta model for release to selected master-class faculty by January 25, and then to students in those master classes by February 10. The alpha version of the application has been an exploration of the most efficient methods for achieving the musical goals of the project. It has served its purpose: arriving at a more concrete vision of an intuitive and gratifying interface. Since it has been experimental from the start, it contains remnants of code that will not be needed, as well as redundant code that will be refactored to optimize the application and make the code more accessible to others; existing code from the alpha version will be optimized in the beta version. In the iOS development environment (Apple's Xcode), several elements need to be settled when a project is created, and the alpha work has clarified many of those considerations. Beginning the beta version as a new Xcode project will simplify the integration of these ideas. The most important are the need for a window layout that allows for interface elements outside of the gestural grid, and the need for a "paged" application as opposed to a "single view" application.

Remaining Technical Challenges

* Grid spacing
* Appearance of grid markers, if any
* Retaining on-screen highlights of the melodic curve

Aesthetic Challenges

The aesthetic challenges have been revelatory. Is the event (individual note) duration really important to the intuitive capture of a melodic idea? Humans seem to remember rhythms more quickly than pitches [1]. If rhythm is short-term and more immediately recollectable than a pitch sequence, can the user retain the combined rhythm-pitch model (the "melody") without reference to rhythm? Melodies are also very often defined by their phrases (or pauses, or "breaths"). What happens if silence is not recorded? Is the inclusion of a referent clock a necessity? Will the exclusion of duration and event rhythms yield the same recollections, or will we adopt a dichotomized solution similar to the isorhythmic training of musicians in the 12th through the 14th centuries [2]? What needs to be recorded? What are the minimum components of a melody? Capturing these should ensure a reasonable facsimile of the initial human idea.

Evaluation

The GMI will be tested first by a few qualified professional composers and performers familiar with digital sketching software. That evaluation is scheduled for late January 2013. It will then be offered to master classes. Master classes are small curricular groups of three to four students, all engaged in music composition and production at Purchase College, SUNY. These groups will take part in a variety of evaluative and comparative studies of effectiveness and accuracy of capture, speed of completion, and artistic development in the incorporation of melody and counterpoint into their compositions. Group evaluations will begin after the faculty evaluation in mid-February and will be completed in April. Data collected from beta testing will be gathered in a database. Evaluative instruments, with queries common to both groups, are being written.

Sources

1. Candace Brower, "Music and the Perception of Rhythm" (1993).
2. Michael Spitzer, Metaphor and Musical Thought (2003).
3. B. Cox and T. Love, Objective-C (1983).
4. M. Puckette, libpd (layered on Pd) (1991).