Talking out of the box III
By esj on Friday 11 March 2011, 01:05
In the last segment, the discussion covered the conceptual jump from traditional Natural Language Commands and direct dictation to working with data not normally manipulated using these techniques. The distance between speech-recognition-friendly data and non-friendly data is not as far as it may seem at first.
The starting point is an application with data in a form that cannot be spoken. As we've described, code, XML, and JSON are all great candidates for something that hurts to speak. I believe a key concept in manipulating something unspeakable is removing it from the context it is in and transforming it into something that can be spoken. This approach should work because it shortens the conceptual distance between what you say and what's on the screen.
From a speech manipulation perspective, the following example is quite ugly,
You can pronounce rename, encode, and OS (but you get the case wrong). You can spell out 'f' and 'n' most of the time, as well as 'u', 't', 'f', '-', and '8'. With modern speech recognition technology, if you are really lucky, you should be able to say everything correctly in two or three tries. In other words, speech recognition is being used very inefficiently.
Using the classic embedded dictation model complicates the user interface, and dictating the entire line is fraught with misrecognition risk and instills significant anxiety in the user. A common-sense solution would break the entire line into multiple utterances. The implications of this will be discussed in a later post, but once we have a way of stringing together individual utterances, we've made significant progress bridging the gap between what can be spoken and what is written.
The first argument of the first term, fn.encode('utf-8'), contains three things that are hard to say and one that is annoying. The first is the name fn. A programmer acquainted with the code would know that it's an alias for "filename", but it could also be an alias for "function name" or any number of alternatives. The problem with speaking the name directly is that the user has no clue as to what they need to say and no way to correct a misrecognition.
One solution is performing certain operations outside of the context of the application; in this case, translating a name into something a user can speak. Translating a spoken name to a written name is problematic because there are so many transformations people have developed for this purpose. Some are algorithmic in nature; others are more ad hoc, motivated by events such as discovering that a name generation pattern created a name collision.
Until now, the descriptions of what to do have been abstract. Now it's time to get concrete using the example of a code symbol generator.
Normally, data transformations from and to the application are symmetric: converting in one direction and then back returns what you started with. Unfortunately, it's not possible to maintain that symmetry under circumstances like this one. Transforming plain text to a code name is easy... sort of.
credit account --> Credit_Account
credit account --> credit_account
credit account --> crdtAcnt
credit account --> crdtAccnt
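The four variants above can be produced by a small generator. What follows is only a minimal sketch: the function names (abbreviate, camel, generate_names) and the style labels are hypothetical, and the abbreviation rule (keep the first letter, drop later vowels, optionally collapse doubled letters) is an assumption inferred from the examples, not a description of any real naming standard.

```python
import re

VOWELS = set("aeiou")


def abbreviate(word, collapse_doubles=True):
    """Keep the first letter, drop later vowels, optionally collapse repeated letters."""
    kept = word[0] + "".join(c for c in word[1:] if c not in VOWELS)
    if collapse_doubles:
        # "accnt" becomes "acnt" when runs of the same letter are collapsed
        kept = re.sub(r"(.)\1+", r"\1", kept)
    return kept


def camel(parts):
    """Join parts as lowerCamelCase."""
    return parts[0] + "".join(p[0].upper() + p[1:] for p in parts[1:])


def generate_names(phrase):
    """Return one candidate code name per (hypothetical) naming style."""
    words = phrase.lower().split()
    if not words:
        raise ValueError("empty phrase")
    return {
        "title_snake": "_".join(w.capitalize() for w in words),
        "snake": "_".join(words),
        "abbrev_camel": camel([abbreviate(w) for w in words]),
        "abbrev_camel_doubles": camel(
            [abbreviate(w, collapse_doubles=False) for w in words]
        ),
    }


for style, name in generate_names("credit account").items():
    print(f"{style}: {name}")
```

Each style is a one-way street: the generator throws away vowels, case, and word boundaries, which is exactly why going back is the hard part.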
Once you've made your selection, the reverse transform is difficult. A symbol, once modified to fit coding standards, frequently loses information about its original meaning. Humans are pretty good at guessing the original meaning, but it's a tricky problem for software to do the same thing. Using the examples in the list above, try working out the algorithms for going back to the original English text. It's important to remember there is no crime in a bad re-forming of a code name into a text name, as long as transforming that text name again produces the same code name. For example:
crdtAcnt --> cardtype a count
crdtAcnt --> card read type account
crdtAcnt --> credit a count
It's important that when the reverse transform is applied to a code name, it generates a text name that will successfully regenerate the original code name. Sometimes the only way to enforce this behavior is by using a name cache that remembers the original name and the name generated from it by one of the name generation schemes.
In practice, simple name generation is not always sufficient. Experience in other domains has shown that one can increase the chances of getting an appropriate name by using environment-dependent name scope information.
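One way to picture the name cache is as a pair of dictionaries, one for each direction. The sketch below is an assumption about how such a cache might look, not a description of an existing tool: NameCache, snake_case, and guess_spoken are hypothetical names, and snake_case stands in for whichever generation scheme is actually in use. A guessed spoken name is accepted only if it regenerates the same code name.

```python
import re


def snake_case(spoken):
    """Stand-in generator: 'credit account' -> 'credit_account'."""
    return "_".join(spoken.lower().split())


def guess_spoken(code):
    """Naive reverse: split on underscores and lower-to-upper case boundaries."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", code.replace("_", " "))
    return " ".join(spaced.lower().split())


class NameCache:
    """Remembers spoken/code name pairs so the round trip stays stable."""

    def __init__(self, generator):
        self._generator = generator
        self._to_code = {}
        self._to_spoken = {}

    def remember(self, spoken, code):
        """Record a pair explicitly, e.g. for names that already exist in the code."""
        self._to_code[spoken] = code
        self._to_spoken[code] = spoken

    def to_code(self, spoken):
        key = " ".join(spoken.lower().split())
        if key not in self._to_code:
            self.remember(key, self._generator(key))
        return self._to_code[key]

    def to_spoken(self, code):
        if code in self._to_spoken:
            return self._to_spoken[code]
        guess = guess_spoken(code)
        # Accept a guess only if it regenerates the same code name.
        if self._generator(guess) == code:
            self.remember(guess, code)
            return guess
        return None  # no safe spoken form; the user has to supply one


cache = NameCache(snake_case)
cache.remember("filename", "fn")
print(cache.to_spoken("fn"))
print(cache.to_code("Credit Account"))
print(cache.to_spoken("crdtAcnt"))
```

The last lookup returns None because the naive guess does not survive the round trip under this generator, which is the case where a remembered pair is the only reliable answer.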
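As a rough illustration of scope-dependent lookup, the sketch below layers per-scope name tables using Python's standard collections.ChainMap, so the innermost scope wins. The scope names, the table contents, and the resolve helper are all hypothetical; a real editor would have to populate these tables from the code being edited.

```python
from collections import ChainMap

# Hypothetical spoken-name tables, from outermost to innermost scope.
project_names = {"credit account": "credit_account"}
module_names = {"filename": "file_name"}
function_names = {"filename": "fn"}


def resolve(spoken, scopes):
    """Return the code name for a spoken name, or None if no scope knows it."""
    key = " ".join(spoken.lower().split())
    return scopes.get(key)


# Inside the function, the local alias shadows the module-level name.
inside_function = ChainMap(function_names, module_names, project_names)
# Elsewhere in the module, only the module and project tables apply.
module_level = ChainMap(module_names, project_names)

print(resolve("filename", inside_function))
print(resolve("filename", module_level))
print(resolve("Credit Account", inside_function))
```

The same spoken name maps to different symbols depending on where the cursor is, which is the extra information plain name generation lacks.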
The next level of concreteness comes from applying this name generator in an application, namely the code editor, and that comes in the next entry.