Talking Sheet Music - app to describe sheet music to blind musicians - with speech recognition


If the music can talk to me - then I should be able to talk to the music! Yes - it makes an impressive video - but I believe it has genuine practical use too - such as helping blind musicians interact with sheet music.

I was born quite prematurely, so my eyes never formed properly. This means I need music to be enlarged in order to see it easily. To solve this problem - I wrote some software to enlarge music / lyrics and scroll them with a pedal.

As a visually impaired musician with a Computing Science degree and lots of experience writing MIDI software, I was intrigued to read that the RNIB was working on ways to automate the description of sheet music.

Since I can only work on these projects in my spare time - progress is rather slow. The Android app in the above videos uses a MIDI parsing library that I began writing as part of my final year project at Staffordshire University - simply because I was already familiar with it! However, MIDI is very ambiguous, and MusicXML is a far more suitable input source for sheet music that is to be described. I have therefore begun contributing to an open source, web based Talking Scores project - along with other members of the Music Subject Area of the UK Association for Accessible Formats. This is based on Music21, a far more extensive open source music library which parses MusicXML files. I think my ultimate aim is to integrate some of the features in the above videos into a project such as MuseScore.
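By way of illustration - and this is only a minimal sketch of the principle, not how Talking Scores actually works - the few lines of Python below use music21 to parse a score and print a very basic spoken-style description of each bar. The Bach chorale is bundled with music21's corpus, so the example runs as-is:

    # Minimal sketch: turn parsed notation into a bar-by-bar text description.
    from music21 import corpus

    score = corpus.parse('bach/bwv66.6')   # example score from music21's corpus
    part = score.parts[0]                  # describe the top (soprano) part only

    for measure in part.getElementsByClass('Measure'):
        names = [n.nameWithOctave for n in measure.notes]
        print(f"Bar {measure.number}: " + ", ".join(names))

A useful description obviously needs rhythm, dynamics, fingering and so on - but the principle of turning parsed notation into text is the same.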

My Initial Thoughts

  • Interactive - both the music and the description should adapt to the user's requirements as they work through the music. This could include selecting a phrase (or repeating it), changing the speed, adding a metronome, arpeggiating chords, and separating the rhythm from the pitch - eg playing just the top notes to learn the tune.
  • Hands free - it can be detrimental for a musician to move their hands off an instrument to start a phrase again. Hands free use can be achieved with pedals (short / long / double presses etc) and speech recognition - with a limited set of commands that are aware of the context (see the sketch after this list).
  • Describe and play simultaneously - either so the user can play along - or as a way of helping the user understand a phrase.
  • Respond to user input - either to give feedback on what the user is playing correctly or incorrectly, or to prompt the user with what is coming next whilst they play.
  • Be selective - give the user only the information they want - this will aid navigation, comprehension and musical learning.
  • Patterns - the user may wish to know eg 'this bar is the same as bar X but with a B in the left hand', 'the rhythm is repeated' or 'the same chord shape but starting on a D'.
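To make the 'limited set of commands, aware of the context' idea concrete, here is a hypothetical Python sketch - the command words and actions are invented for illustration and are not taken from the app:

    # Hypothetical sketch: the same spoken word can mean different things
    # depending on what the app is currently doing (the 'context').
    COMMANDS = {
        "playing": {"stop": "pause playback",
                    "again": "repeat the current phrase",
                    "slower": "reduce the tempo"},
        "stopped": {"play": "start playback",
                    "describe": "describe the current bar",
                    "next": "move to the next bar"},
    }

    def match_command(heard: str, context: str) -> str | None:
        """Return the action for a recognised word, or None if the word
        is not a valid command in the current context."""
        for word, action in COMMANDS.get(context, {}).items():
            if word in heard.lower():
                return action
        return None

    # eg the recogniser hears "play that again" while music is playing:
    print(match_command("play that again", "playing"))  # repeat the current phrase

Keeping the vocabulary small and context-dependent should make recognition more reliable than free-form dictation.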

Talking Music Android App

The above video is an early version - but it demonstrates some of my initial ideas. The app was supplied with a standard MIDI file (track 1 = right hand, track 2 = left hand) - the metronome, speed, arpeggiation and description are dynamically generated based on spoken input.
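For anyone curious what the processing might look like, the sketch below does something similar in Python using music21 rather than my own MIDI library - the file name is hypothetical, and real arpeggiation would involve timed playback rather than print statements:

    # Rough sketch: load a two-track MIDI file and arpeggiate right-hand chords.
    from music21 import converter, chord, note

    score = converter.parse('example.mid')  # hypothetical file: track 1 = RH, track 2 = LH
    right_hand = score.parts[0]

    for element in right_hand.flatten().notes:
        if isinstance(element, chord.Chord):
            # sound the chord one pitch at a time, lowest first
            for p in sorted(element.pitches, key=lambda q: q.midi):
                print(f"arpeggiate: {p.nameWithOctave}")
        elif isinstance(element, note.Note):
            print(f"note: {element.nameWithOctave}")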

If you would like to try this app then you can download the Alpha Version (very unfinished) from Google Play. Please send me your feedback - perhaps on the general concept as opposed to the current state! This prototype was developed for Android devices due to personal ease of development and reuse of code. If the idea is successful then it could be developed for other platforms - eg iPhone / Windows / Mac.

Currently, speech recognition may well be quite inaccurate - especially for pronunciations other than my own - but a PDF of all the commands it understands (dated 14th January 2018) can be found here.

I welcome comments and feedback on my suggestions for Talking Music, the current app and how the principles behind it could be applied to solve other problems.

Privacy Policy

About Me:

I am an individual developer working on this app (and others) in my spare time. Thanks for using / considering this app - I hope you find it useful.

Google Firebase Analytics:

This app uses Google Firebase Analytics to give me an idea of how many users the app has, what features are popular and how long the app is used for.

Google Firebase Crashlytics:

This app uses Crashlytics crash reporting (part of Google Firebase). This provides me with a stack trace (details of what the app was doing when it crashed), which I will use to understand the problem, try to replicate it and hopefully fix it.

Along with the stack trace, I also receive some device information - eg device model, language, OS version, app version etc. This helps me understand the reason for the crash.

Android Advertising ID:

This is a unique (but user-resettable) string - similar to a cookie.

This app uses the Android Advertising ID for Firebase Analytics and Firebase Crashlytics.

The Android Advertising ID is user-resettable from Google Settings.

Speech To Text:

This app uses the Android Speech To Text API - which is built into Android. Audio may be sent to Google servers. I do not have access to any audio - only the possible text suggestions returned by Google.

Internet Connection:

The app works without an internet connection - but speech recognition accuracy may well be improved when a connection is available.

Communication:

I welcome comments and suggestions - it is encouraging to hear from users - thank you. If you choose to contact me, I will try to reply in a timely manner. I will most likely contact you again some time later when I have released an update or to let you know about progress relating to your original question. I will not pass your details on to anyone else.