Voice2IM 0.1 README Moshe Yudkowsky speech@pobox.com

Introduction
------------

Voice2IM is a demonstration of how to create a multimodal user interface. The
two modes are speech and text: speech recognition ("ASR") input and
text-to-speech ("TTS") output for speech, and instant messaging ("IM") for
text input and output.

This package also assumes that the user is speaking over a telephone. The demo
has only been tested to date with old-fashioned "Public Switched Telephone
Network" telephones, but could easily accommodate SIP telephones.

To make this work, you will need a telephony server to connect between the
telephone network and the Internet. You will need a speech server that has the
appropriate ASR/TTS resources. Telephony and speech servers based on the
VoiceXML language are freely accessible to developers from several sources; I
used Voxeo's servers to create and test this package (www.voxeo.com).

You will also need an IM server running Jabber. Several sources make these
servers freely accessible; I used the ones at jabber.org for my testing.


Requirements
------------

* Software

1. Expat

Expat provides XML capabilities to Python. You will need a working copy of
Python with the expat package installed. The expat package is available from
http://expat.sourceforge.net (*not* the URL given in the Python
Setup/Modules file in versions 2.1 and 2.2!). If you use Linux, you may have to
rebuild Python; as far as I can tell, for Python 2.1 and later, expat support
is included automatically if you install expat on your computer and then
rebuild Python.
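
A quick way to confirm that your Python build has working expat support is to
parse a small XML fragment. The sketch below assumes a current Python 3, where
expat is exposed as xml.parsers.expat; the function name and the sample
fragment are made up for illustration and are not part of this package.

```python
import xml.parsers.expat


def list_elements(xml_text):
    """Return the element names found in xml_text, in document order."""
    names = []

    def start_element(name, attrs):
        # Called by expat once for every opening tag.
        names.append(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return names


if __name__ == "__main__":
    # Hypothetical fragment, not taken from top.xml.
    sample = '<vxml version="2.0"><form><block>Hello</block></form></vxml>'
    print(list_elements(sample))
```

If the import fails, expat support is missing from your Python build.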

2. Jabber.py

The Jabber.py package is available from http://jabberpy.sourceforge.net. This
package provides basic IM.

* Hardware/Network

1. Telephone and Speech Servers

Voice2IM uses VoiceXML to operate the speech technology resources (ASR and
TTS). You will need to find a suitable server to run VoiceXML. Please note that
aside from the VoiceXML/telephony servers available to developers on the
Internet, IBM offers (through Alphaworks) a VoiceXML package, Voice Toolkit,
for use on desktops. I haven't tried it lately, but I expect that if you don't
want to use telephony-based applications, Voice Toolkit can be made to serve.

2. IM Server

Voice2IM uses the Jabber protocol for its IM messages. You will need a Jabber
server to pass messages between the pseudo-client the application uses and the
user's IM client. I used the server at jabber.org, but any server will do.


Installation
------------

The VoiceXML script "top.xml" goes on the speech server. Since VoiceXML is a
young language, you may need to tweak the script to achieve full compatibility
with your particular server. For example, the header of top.xml includes a
reference to "Nuance," the vendor who supplies ASR and TTS to my speech server.

All of the CGI scripts build VoiceXML files, and therefore they too include some
text which must be changed for a new system. This text will be in the
endProgram() functions -- look there for the "Nuance" line, for example.
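
One way to make the vendor-specific text easier to change is to keep it in a
single constant that every generated page uses. The following is only a
sketch, assuming a current Python 3: build_response() and VENDOR_HEADER are
hypothetical names, not the package's endProgram(); the placeholder comment
stands in for whatever line your speech server expects; and the VoiceXML
version string is likewise an assumption.

```python
# Placeholder only: replace with the vendor-specific line your server needs.
VENDOR_HEADER = "<!-- vendor-specific header goes here -->"


def build_response(body, vendor_header=VENDOR_HEADER, vxml_version="2.0"):
    """Return a CGI response: HTTP header, blank line, then a VoiceXML page."""
    lines = [
        "Content-Type: text/xml",
        "",  # blank line separates the HTTP header from the document
        '<?xml version="1.0"?>',
        vendor_header,
        '<vxml version="%s">' % vxml_version,
        body,
        "</vxml>",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical page body; a real script would build this from its data.
    print(build_response("<form><block>Goodbye.</block></form>"))
```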

The other files are all installed on a CGI server that can be accessed by the
VoiceXML server. While installing a CGI script isn't all that complicated,
instructions are beyond the scope of this document.

The VoiceXML scripts have <subdialog> tags, and these subdialog tags point to
the scripts on the CGI server "example.com". Of course, you'll need to modify
the VoiceXML script to have these tags point to your CGI server.
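
If you would rather not edit each tag by hand, the host name can be replaced
with a short script. This is a rough sketch, assuming a current Python 3 and
assuming that each <subdialog> tag carries its URL in a double-quoted src
attribute; the function name and the sample tag are hypothetical.

```python
import re

# Matches a <subdialog ...> tag up to and including its src="..." attribute.
SUBDIALOG_SRC = re.compile(r'(<subdialog\b[^>]*\bsrc=")([^"]*)(")')


def retarget_subdialogs(vxml_text, old_host, new_host):
    """Replace old_host with new_host inside <subdialog> src attributes only."""

    def swap(match):
        url = match.group(2).replace(old_host, new_host)
        return match.group(1) + url + match.group(3)

    return SUBDIALOG_SRC.sub(swap, vxml_text)


if __name__ == "__main__":
    # Hypothetical tag; the real names and paths in top.xml may differ.
    tag = '<subdialog name="lookup" src="http://example.com/cgi-bin/lookup.py"/>'
    print(retarget_subdialogs(tag, "example.com", "cgi.example.org"))
```

Text outside the <subdialog> tags is left alone, so a mention of example.com
in a prompt or comment is not rewritten.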

On your CGI server, you will need to create the databases "dbLog", "dbUser", and
"dbInfo" for use with this package. The Python script "initsystem.py" will do
this for you, but you will need to make these files accessible to the CGI
scripts by giving them proper permissions. The log file must be writeable,
although Voice2IM does not yet use it.

Before you run initsystem.py, you *must modify it* to include the login,
password, and server name of the Jabber server you will use. In other words,
use a regular Jabber client to create a login and password for use by this CGI
script, and place them in the script. I realize that this should be driven by
the command line...
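
For what it's worth, a command-line front end could look something like the
sketch below. It assumes a current Python 3 with argparse; the option names
and the parse_args() function are hypothetical and are not part of
initsystem.py as shipped.

```python
import argparse


def parse_args(argv=None):
    """Parse the Jabber account details from the command line."""
    parser = argparse.ArgumentParser(
        description="Create the Voice2IM databases (hypothetical interface).")
    parser.add_argument("--login", required=True, help="Jabber login")
    parser.add_argument("--password", required=True, help="Jabber password")
    parser.add_argument("--server", default="jabber.org",
                        help="Jabber server name (default: jabber.org)")
    return parser.parse_args(argv)


# Example: parse an explicit argument list instead of sys.argv.
example = parse_args(["--login", "voice2im", "--password", "secret"])
print("login=%s server=%s" % (example.login, example.server))
```

The parsed values would then replace the login, password, and server name
that are currently edited into the script by hand.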

The file "adduser.py" lets you add a user's phone number and IM address to the
database. It also records the user's preference for how text information is
received. The preference does not currently work, but I do have some test
versions that use the "text" formatting, which works nicely. I am currently
looking for an HTML-capable IM client to test the "xhtml" setting.

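To make the shape of a user record concrete, here is a minimal sketch of
storing and retrieving one. It assumes a current Python 3 and uses the
standard shelve module purely for illustration; the actual storage format
used by adduser.py is not described here, and add_user(), lookup_user(), and
the field names are hypothetical.

```python
import shelve

# The two settings mentioned above.
VALID_FORMATS = ("text", "xhtml")


def add_user(db_path, phone, im_address, text_format="text"):
    """Store one user record, keyed by phone number."""
    if text_format not in VALID_FORMATS:
        raise ValueError("unknown format: %r" % (text_format,))
    with shelve.open(db_path) as db:
        db[phone] = {"im": im_address, "format": text_format}


def lookup_user(db_path, phone):
    """Return the record for phone, or None if the number is not registered."""
    with shelve.open(db_path) as db:
        return db.get(phone)
```

A caller would register a user with something like
add_user("dbUser", "+15555550100", "someone@jabber.org") and look the record
up by phone number when a call arrives.
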
Use/Limitations/Bugs
--------------------

To use this package, activate the VoiceXML script (usually by calling a phone
number). If all's well you will be able to receive flight information on your IM
client, and select your flight number either by speaking or by typing.

Voice2IM has many limitations, not least of which is that it doesn't actually
tie into a database of flight information.

No doubt there are bugs -- but some problems are actually "features" that this
package is meant to explore. For example, there's a delay between when I ask for
flight information and it shows up on my IM, due to the nature of Internet
communications. Voice2IM was written to explore these problems in multimodal
access.

Public Demo
-----------

There ought to be a publicly-accessible demo of this package available via my
web site at http://www.Disaggregate.com. Check and see.


Documentation
-------------

Well, you're reading the documentation...

The system will be covered in an article in Dr. Dobb's Journal; more information
as it becomes available.

Security
--------

This package does not currently incorporate any security. For example, if I know
your cell phone number and I suspect that you've registered with this package,
it's possible that I could trigger hundreds of IM messages to your IM client.

I do have some fixes for this problem, most notably using a token to assure that
the request for the IM was generated by a known package; look for these fixes
in a subsequent release.
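
The token scheme itself is not described here. As one common way to do this
kind of check, and not necessarily the fix planned for this package, a
keyed hash (HMAC) over the request can show that it came from a party holding
a shared secret. The sketch assumes a current Python 3; make_token(),
is_valid(), and the choice of the phone number as the signed value are
hypothetical.

```python
import hashlib
import hmac


def make_token(secret, phone):
    """Return a hex HMAC-SHA256 token for phone, keyed by secret (bytes)."""
    return hmac.new(secret, phone.encode("utf-8"), hashlib.sha256).hexdigest()


def is_valid(secret, phone, token):
    """Check token against the expected value using a constant-time compare."""
    return hmac.compare_digest(make_token(secret, phone), token)


if __name__ == "__main__":
    secret = b"replace-with-a-real-secret"  # placeholder value
    token = make_token(secret, "+15555550100")
    print(is_valid(secret, "+15555550100", token))
```

The CGI script would refuse to send an IM unless the request carried a token
that passes is_valid().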

Licensing
---------

This package is released according to the LGPL and the Artistic License. See
the LICENSE file for details.

Disclaimers
-----------

Read the disclaimers in the LGPL license very carefully. This is a
limited demonstration package, not a working piece of software. THIS PACKAGE
CARRIES NO WARRANTIES AND USE IS ENTIRELY AT THE RISK OF THE USER.

Contact Information
-------------------

Disaggregate provides consulting for speech technology. You may reach
us at:

	Disaggregate
	2952 W Fargo
	Chicago IL 60645 USA

	tel: +1 773 764 8727

	speech@pobox.com

	http://www.Disaggregate.com

Regards,
 Moshe Yudkowsky

