Speech Synthesis & Speech Recognition
Using SAPI 4 High Level Interfaces
Brian Long (www.blong.com)
Click here to download the files associated with this article.
Introduction
This article looks at adding speech capabilities to Microsoft Windows
applications written in Delphi, using the Microsoft Speech API version 4 (SAPI
4). For an overview of the subject of speech technology please click
here. For information on using SAPI 5.1 in Delphi applications click
here.
The older SAPI 4 interfaces are defined in two ways. There are high level interfaces,
intended to make implementation easier but which sacrifice some control.
These are intended for quick results but can be quite effective. There are also
low level interfaces, which give full
control but involve more work to get going. These are intended for the serious
programmer to work with.
The high level interfaces are implemented by Microsoft in COM objects that call
the lower level interfaces, taking
care of all the nitty-gritty. The low level
interfaces themselves are implemented by the TTS and SR engines that you
obtain and install.
We will look at the high level interfaces available for TTS and SR in this
article. You can find coverage of the low level interfaces by clicking
here.
Grammars
Part of the process of speech recognition involves deciding what words have
actually been spoken. Recognisers use a grammar to decide what has been said,
where possible.
In the case of dictation, a grammar can be used to indicate some words that
are likely to be spoken. It is not feasible to try and represent the entire
spoken English language as a grammar, so the recogniser does its best and uses
the grammar to help out. The recogniser tries to use context information from
the text to work out which words are more likely than others. At its simplest,
the Microsoft SR engine can use a dictation grammar like this:
[Grammar]
LangID=2057
;2057 = $809 = UK English
type=dictation
With Command and Control, the permitted words are limited to the supported
commands. The grammar defines various rules that dictate what will be said and
this makes the recogniser's job much easier. Rather than trying to understand
anything spoken, it only needs to recognise speech that follows the supplied
rules. A Command and Control grammar is typically referred to as a Context-Free
Grammar (CFG). A simple CFG that recognises three colours might look like this:
[Grammar]
LangID=2057
;UK English - 2057 = $809
Type=cfg
[<Start>]
<Start> = colour red
<Start> = colour green
<Start> = colour blue
<Start> is the root rule of the grammar.
Grammars support lists to make implementing many similar commands easy. For
example:
[Grammar]
LangID=2057
;UK English - 2057 = $809
Type=cfg
[<Start>]
<Start> = colour <Colour>
[<Colour>]
<Colour> = red
<Colour> = green
<Colour> = blue
You can find more details about the supported grammar syntax in the SAPI documentation.
High Level Interfaces
The interfaces are made available to programmers as true COM interfaces, which
Delphi's Object Pascal and C++ are both more than able to use. However they
are also made available in a simplified form as Automation interfaces for less
able languages such as VBA and also through ActiveX controls for use as visual
controls in development environments that support them. Additionally, C++ wrapper
classes are supplied for Visual C++ programmers familiar with MFC, but we won't
need to pay any attention to those in this paper.
Most of the details of how the interfaces work will be covered whilst looking
at the COM support and so will not be focused on quite so much during the Automation
and ActiveX sections.
Speech recognition with the high level interfaces does
not require a formal grammar to be supplied. The COM objects that implement
these interfaces deal with setting up suitable grammars.
COM
The high level COM APIs are described as the Voice Text API, Voice
Command API, Voice Dictation API and Voice Telephony API (we
won't be looking at the last one in this article). As mentioned earlier, these
high level APIs are implemented to call the lower level APIs and this code resides
in an out-of-process COM/Automation server.
This server resides, along with the other main redistributable SAPI elements, in the
speech directory under the main Windows directory (for example C:\WINNT\Speech). It
is called VCmd.exe and is described (by its version information) as Microsoft
Voice Commands.
Voice Text API
Let's first look at TTS; the high level COM support for TTS is referred to
as the Voice Text API. This API involves working with some interfaces
implemented by a single COM object, referred to as the Voice Text Object
by the SAPI 4 documentation but described as the Voice Text Manager in
the Windows registry.
To get access to the object you can call CreateComObject from the ComObj unit,
passing in the ClassID CLSID_VTxt
from the speech unit. The created object supports the IVoiceText
interface, which is what you use for the most common tasks. It also supports
IVTxtAttributes, which allows
you to control attributes such as the audio device and the speaking speed and
find out if speech is in progress, and IVTxtDialogs,
which allows you to invoke dialogs to configure the TTS engine (the dialogs
are implemented by the engine).
Making Your Computer Talk
At its simplest level, all you need to do to get your program to speak is to
create the COM object, extract the IVoiceText
interface, register your application to allow you to use Voice Text and then
ask it to speak. A trivial application that does this can be found in the VoiceTextAPISimple.dpr
project in the files associated with this paper
in the COM directory. The code looks like this:
uses
  Speech, ...

type
  TfrmVoiceTextAPI = class(TForm)
    ...
  private
    VoiceText: IVoiceText;
  end;
...

uses
  ComObj, ActiveX;

procedure TfrmVoiceTextAPI.FormCreate(Sender: TObject);
begin
  VoiceText := CreateComObject(CLSID_VTxt) as IVoiceText;
  OleCheck(VoiceText.Register(nil, PChar(Application.ExeName),
    nil, GUID_NULL, 0, nil));
end;

procedure TfrmVoiceTextAPI.Button1Click(Sender: TObject);
begin
  OleCheck(VoiceText.Speak(PChar(memText.Text), 0, nil));
end;
And there you have it: a speaking application. The call to Register
takes a number of parameters that we should examine:
- The first one is a string (PChar)
that represents the site you want to use Voice Text on. Passing nil
or 'Local PC' means to use
the computer, but you can also pass in 'Line1'
for the first telephone line in a telephony-based application. Information
about each site is stored in the registry as a key under HKCU\Software\Voice\VoiceText.
- The next parameter is a unique identifying string for the application being
registered.
- The following three parameters are to do
with notifications, which are just like events triggered by the Voice Text
API (we'll look more at these later). Passing
nil, GUID_NULL
and 0 for these parameters results in your application not being notified
of anything that happens during the speech synthesis. However you might need
to know when the speech starts and stops or if any of the speech attributes
get changed.
Another notification tells you about each phoneme
that gets spoken. This includes the phoneme itself, both in an Engine-specific
form and also as an IPA (International Phonetic Alphabet) phoneme. It also
includes information that describes the mouth position for the current phoneme
(such as mouth width and height, tongue position and lip tension) to aid in
rendering a visual representation of the phoneme being spoken.
- The last registration argument allows you to customise attributes of this
site, such as the voice or talking speed, by passing the address of a TVTSiteInfo
record.
When requesting some speech the Speak
method takes three parameters:
- The first is the text to speak, passed as a PChar.
- The second parameter represents some flags that indicate the type and priority
of the text. The default flags are VTXTST_STATEMENT
or VTXTSP_NORMAL but you can
indicate a very high priority question (VTXTST_QUESTION
or VTXTSP_VERYHIGH) or a high
priority warning (VTXTST_WARNING
or VTXTSP_HIGH), for example.
- The last parameter lets you pass in textual control tags that can affect
the voice, language or context of the text to be spoken.
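For instance, using the flags from the list above, a high priority warning could be requested like this (a minimal sketch; the message text is made up):
//A high priority warning jumps ahead of any normal
//priority text that is already queued up
OleCheck(VoiceText.Speak(PChar('Low disk space'),
  VTXTST_WARNING or VTXTSP_HIGH, nil));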
When the program executes it lets you type in some text in a memo and a button
renders it into the spoken word.

That's the simple example out of the way, but what can we achieve if we dig
a little deeper and get our hands a little dirtier? The next
project, which holds the answers to these questions, can be found as VoiceTextAPI.dpr
in the COM directory.
This makes use of the notification support and also uses a few more methods
of the IVoiceText and IVTxtAttributes
interfaces to control the generated speech. As you can see below there are buttons
to play, pause and stop the speech as well as to invoke some TTS engine configuration
dialogs. These are joined by a memo where a phonetic equivalent (using the TTS
engine's phoneme representation) of the spoken text is inserted and also a listbox
where the notifications are recorded.

Let's start with the simple stuff. We have a couple of routines that support
logging information to the listbox (one takes a string parameter and the other
takes the same parameters as Format).
procedure TfrmVoiceTextAPI.Log(const Msg: String);
begin
  if not Assigned(lstProgress) then
    Exit;
  lstProgress.Items.Add(Msg);
  lstProgress.ItemIndex := lstProgress.Items.Count - 1
end;

procedure TfrmVoiceTextAPI.Log(const Msg: String; const Args: array of const);
begin
  Log(Format(Msg, Args))
end;
The form's OnCreate event handler
connects to the COM object as before but this time it passes in a freshly created
object that will receive the notifications. The object (which we will come
back to later) implements the IVTxtNotifySink
interface and expects to receive all notifications (as opposed to just two of
them).
Additionally the IVTxtAttributes
interface is extracted in order to set up the checkbox that shows you whether
Voice Text is enabled or not (you can see the checkbox event handler below),
and the IVTxtDialogs interface
is extracted to allow access to the engine dialogs available through buttons
on the form.
The other task performed here is to add a horizontal scrollbar to the logging
listbox to ensure strings longer than its current width can be viewed.
procedure TfrmVoiceTextAPI.FormCreate(Sender: TObject);
var
  Enabled: DWord;
begin
  SendMessage(lstProgress.Handle, LB_SETHORIZONTALEXTENT, Width, 0);
  VoiceText := CreateComObject(CLSID_VTxt) as IVoiceText;
  OleCheck(VoiceText.Register(nil, PChar(Application.ExeName),
    TVTxtNotifySink.Create(Self), IVTxtNotifySink, VTXTF_ALLMESSAGES, nil));
  TxtAttrs := VoiceText as IVTxtAttributes;
  OleCheck(TxtAttrs.EnabledGet(Enabled));
  chkEnabled.Checked := Bool(Enabled);
  TxtDlgs := VoiceText as IVTxtDialogs;
end;

procedure TfrmVoiceTextAPI.chkEnabledClick(Sender: TObject);
begin
  OleCheck(TxtAttrs.EnabledSet(DWord(chkEnabled.Checked)))
end;
The buttons that play, pause and stop simply call IVoiceText
methods. The play button can start the speech and the pause button can suspend
it. The play button can then continue the speech but it has to do so using a
different method than it used to initiate the speech (the BeenPaused
flag is used to help manage this). The stop button completely stops the current
speech.
procedure TfrmVoiceTextAPI.btnPlayClick(Sender: TObject);
begin
  if not BeenPaused then
    OleCheck(VoiceText.Speak(PChar(reText.Text),
      VTXTST_STATEMENT or VTXTSP_NORMAL, nil))
  else
  begin
    OleCheck(VoiceText.AudioResume);
    BeenPaused := False
  end;
end;

procedure TfrmVoiceTextAPI.btnPauseClick(Sender: TObject);
begin
  if VoiceText.AudioPause = NOERROR then
    BeenPaused := True
end;

procedure TfrmVoiceTextAPI.btnStopClick(Sender: TObject);
begin
  OleCheck(VoiceText.StopSpeaking);
  BeenPaused := False;
end;
Engine Dialogs
The buttons that invoke the various dialogs each use much the same code. Each
one makes a call to an appropriate method of the dialogs interface, passing
the form's window handle and a nil caption (so the default caption is used).
procedure TfrmVoiceTextAPI.btnAboutClick(Sender: TObject);
begin
  OleCheck(TxtDlgs.AboutDlg(Handle, nil))
end;

procedure TfrmVoiceTextAPI.btnGeneralClick(Sender: TObject);
begin
  OleCheck(TxtDlgs.GeneralDlg(Handle, nil))
end;

procedure TfrmVoiceTextAPI.btnLexiconClick(Sender: TObject);
begin
  OleCheck(TxtDlgs.LexiconDlg(Handle, nil))
end;

procedure TfrmVoiceTextAPI.btnTranslateClick(Sender: TObject);
begin
  OleCheck(TxtDlgs.TranslateDlg(Handle, nil))
end;
In the case of the Microsoft TTS engine, the About and General dialogs are
both the same:

The Lexicon dialog offers the user a wizard to add new words to the internal
dictionary and specify their correct pronunciation:

The Translation dialog is not implemented by the MS TTS engine.
Voice Text Notifications
The rest of the code is made up of the class that is designed to receive the
notifications. Such a class is called a notification sink and must implement
the IVTxtNotifySink notification
interface as shown below.
type
  TVTxtNotifySink = class(TInterfacedObject, IVTxtNotifySink)
  private
    FForm: TfrmVoiceTextAPI;
  protected
    function AttribChanged(dwAttribute: DWORD): HResult; stdcall;
    function Visual(cIPAPhoneme: WideChar; cEnginePhoneme: AnsiChar;
      dwHints: DWORD; pTTSMouth: PTTSMOUTH): HResult; stdcall;
    function Speak(pszText: PAnsiChar; pszApplication: PAnsiChar;
      dwType: DWORD): HResult; stdcall;
    function SpeakingStarted: HResult; stdcall;
    function SpeakingDone: HResult; stdcall;
  public
    constructor Create(Form: TfrmVoiceTextAPI);
  end;

constructor TVTxtNotifySink.Create(Form: TfrmVoiceTextAPI);
begin
  inherited Create;
  FForm := Form
end;

function TVTxtNotifySink.AttribChanged(dwAttribute: DWORD): HResult;
var
  S: String;
begin
  Result := S_OK;
  case dwAttribute of
    TTSNSAC_REALTIME: S := 'Realtime';
    TTSNSAC_PITCH: S := 'Pitch';
    TTSNSAC_SPEED: S := 'Speed';
    TTSNSAC_VOLUME: S := 'Volume';
  else
    S := 'unknown'
  end;
  FForm.Log('Engine Event AttribChanged: %s changed', [S]);
end;

function TVTxtNotifySink.Speak(pszText, pszApplication: PAnsiChar;
  dwType: DWORD): HResult;
begin
  Result := S_OK;
  FForm.Log('Engine Event Speak');
  FForm.memEnginePhonemes.Clear
end;

function TVTxtNotifySink.SpeakingDone: HResult;
begin
  Result := S_OK;
  FForm.Log('Engine Event SpeakingDone');
end;

function TVTxtNotifySink.SpeakingStarted: HResult;
begin
  Result := S_OK;
  FForm.Log('Engine Event SpeakingStarted');
end;

function TVTxtNotifySink.Visual(cIPAPhoneme: WideChar;
  cEnginePhoneme: AnsiChar; dwHints: DWORD; pTTSMouth: PTTSMOUTH): HResult;
var
  Hint: String;
begin
  Result := S_OK;
  Hint := '';
  if dwHints <> 0 then
  begin
    if dwHints and TTSNSHINT_QUESTION <> 0 then
      Hint := 'Question ';
    if dwHints and TTSNSHINT_STATEMENT <> 0 then
      Hint := Hint + 'Statement ';
    if dwHints and TTSNSHINT_COMMAND <> 0 then
      Hint := Hint + 'Command ';
    if dwHints and TTSNSHINT_EXCLAMATION <> 0 then
      Hint := Hint + 'Exclamation ';
    if dwHints and TTSNSHINT_EMPHASIS <> 0 then
      Hint := Hint + 'Emphasis';
  end
  else
    Hint := 'none';
  FForm.Log('Engine Event Visual: hint = %s', [Hint]);
  if cEnginePhoneme <> #32 then
    FForm.memEnginePhonemes.Text :=
      FForm.memEnginePhonemes.Text + cEnginePhoneme
end;
As you can see, the interface defines a total of five methods that are used
to notify your program of various things such as a speech attribute changing,
a speech request and indications that speech has either started or stopped.
There is also the aforementioned notification that
describes the phoneme being spoken and allows the creative programmer to animate
the speech (the Visual method). Don't fret
if you lack the creative touch, as later options
provide animated mouths anyway.
All five of these notification methods are likely to be
called if you pass VTXTF_ALLMESSAGES
as the penultimate parameter when registering to
use voice text. However if you pass 0 instead, only the SpeakingStarted
and SpeakingDone notification
methods will be called.
As you can see, most of these notification methods just log a string into the
listbox, although the Speak notification
clears the phoneme memo. The Visual
notification offers the most information. In this case the dwHints
parameter is examined to see if it tells us anything and the engine phoneme
is added to the memo, but everything else is ignored.
Speaking Dialogs
As an example of using the Voice Text API you can make all your VCL dialogs
talk to you using this small piece of code.
uses
  VTxtAuto_TLB;

var
  Voice: IVTxtAuto;

procedure TForm1.FormCreate(Sender: TObject);
begin
  Screen.OnActiveFormChange := ScreenFormChange;
end;

procedure TForm1.ReadVCLDialog(Form: TCustomForm);
var
  I: Integer;
  ButtonCaptions, LabelCaption, DialogText: string;
begin
  try
    if not Assigned(Voice) then
    begin
      Voice := CoVTxtAuto_.Create;
      Voice.Register('', Application.ExeName);
    end;
    for I := 0 to Form.ComponentCount - 1 do
      if Form.Components[I] is TLabel then
        LabelCaption := TLabel(Form.Components[I]).Caption
      else
      if Form.Components[I] is TButton then
        ButtonCaptions := Format('%s%s, ',
          [ButtonCaptions, TButton(Form.Components[I]).Caption]);
    ButtonCaptions := StringReplace(ButtonCaptions, '&', '', [rfReplaceAll]);
    DialogText := Format('%s.%s%s.%s%s',
      [Form.Caption, sLineBreak, LabelCaption, sLineBreak, ButtonCaptions]);
    Memo1.Text := DialogText;
    Voice.Speak(DialogText, 0)
  except
    //pretend everything is okay
  end
end;

procedure TForm1.ScreenFormChange(Sender: TObject);
begin
  if Assigned(Screen.ActiveForm) and
    (Screen.ActiveForm.ClassName = 'TMessageForm') then
    ReadVCLDialog(Screen.ActiveForm)
end;
The form's OnCreate event handler
sets up an OnActiveFormChange
event handler for the screen object. This is triggered each time a new form
is displayed, which includes VCL dialogs. Any call to ShowMessage,
MessageDlg or related routines
causes a TMessageForm to be displayed
so the code checks for this. If the form type is found, a textual version of
what's on the dialog is built up and then spoken through the Voice Text API
Automation component.
A statement such as:
MessageDlg('Save changes?', mtConfirmation, mbYesNoCancel, 0)
causes the ReadVCLDialog routine to build up and say this text:
Confirm.
Save changes?.
Yes, No, Cancel,
Notice the full stops at the end of each line to briefly pause the speech engine
at that point before moving on.
Voice Command API
The Voice Command API allows you to implement command-and-control SR in your
application and operates through a COM object referred to as the Voice Command
Object by the SAPI 4 documentation but described as the Voice Command
Manager in the Windows registry.
You use the ClassID CLSID_VCmd
from the Speech unit to initialise it and the created object supports the IVoiceCmd,
IVCmdDialogs and IVCmdAttributes
interfaces. The application must register itself through the IVoiceCmd
interface Register method and
must then create a command menu, represented by the IVCmdMenu
interface. This menu contains details of all the commands the user can utter
and is analogous to a Windows menu in that regard.
Menus And Commands
The concepts of command and control SR support are similar to the commands
in normal Windows menus. You define menus containing commands and can then control
which are active and which are not. All Voice Command client applications share
the Microsoft Voice Commands Automation server and so there is a central "repository"
of all the defined command menus.
Depending how the menus are created, the server may store the menu in a database
file (a file matching the spec vcmd*.vcd in the speech directory beneath your
main Windows directory). Also, the menu can be specified as local to a
particular window (it is automatically activated and deactivated as the window
gains and loses focus) or global (it is always active).
When a command from an active menu is recognised you are told about it through
another notification interface, IVCmdNotifySink.
The primary notification of interest is the CommandRecognize
method that tells you which of your commands was spoken. If another application's
command is recognised by the Voice Commands Automation server, the CommandOther
notification is fired.
Also available is a command menu enumerator interface, IVCmdEnum,
which allows you to enumerate all the voice menus in the Voice Menu database.
Speech recognition can be either enabled or disabled for an entire site, but
it can also be awake or asleep (temporarily paused). Information about these
states can be obtained from the IVCmdAttributes
interface. You can send Voice Commands to sleep if you want it to ignore almost
everything you say for a while, except for a dedicated sleep menu (that contains
a command such as Start Listening or Wake Up). If you disable
Voice Commands you cannot reactivate it by voice.
A sample Voice Command API application can
be found as VoiceCommandAPI.dpr in the COM directory.
The OnCreate event handler connects
to the Voice Command Object and extracts the IVCmdAttributes
interface. This is used to ascertain whether speech recognition is awake and
enabled for the Voice Command site. Two checkboxes on the form are used to record
this information (and they have OnClick event handlers to update these states
as well). The IVCmdDialogs interface
is also extracted to enable access to the engine dialogs through buttons on
the form.
uses
  Speech, ...

type
  TfrmVoiceCommandAPI = class(TForm)
    ...
  private
    VoiceCmd: IVoiceCmd;
    CmdAttrs: IVCmdAttributes;
    CmdMenu: IVCmdMenu;
    CmdDlgs: IVCmdDialogs;
    ...
  end;
...

procedure TfrmVoiceCommandAPI.FormCreate(Sender: TObject);
var
  Enabled, Awake: DWord;
begin
  VoiceCmd := CreateComObject(CLSID_VCmd) as IVoiceCmd;
  CmdAttrs := VoiceCmd as IVCmdAttributes;
  OleCheck(VoiceCmd.Register(nil, TVCmdNotifySink.Create(Self),
    IVCmdNotifySink, VCMDRF_ALLMESSAGES, nil));
  CreateCommandMenu;
  CmdAttrs.EnabledGet(Enabled);
  chkEnabled.Checked := Bool(Enabled);
  CmdAttrs.AwakeStateGet(Awake);
  chkAwake.Checked := Bool(Awake);
  CmdDlgs := VoiceCmd as IVCmdDialogs;
end;

procedure TfrmVoiceCommandAPI.chkEnabledClick(Sender: TObject);
begin
  OleCheck(CmdAttrs.EnabledSet(Integer(chkEnabled.Checked)))
end;

procedure TfrmVoiceCommandAPI.chkAwakeClick(Sender: TObject);
begin
  OleCheck(CmdAttrs.AwakeStateSet(Integer(chkAwake.Checked)))
end;
Requesting Notifications
The next job is to register the application in order to use Voice Commands.
Part of the registration process involves passing a reference to a callback
object and (in this case) requesting that all notifications be sent to it.
There are various flags to control how many notifications are sent:
- VCMDRF_NOMESSAGES means no
notifications will be sent (only sensible if you do not create a callback
object, though you cannot do an awful lot without one)
- VCMDRF_ALLBUTVUMETER means
all notifications except the VU meter notification
- VCMDRF_VUMETER gives just
the VU meter
notification
- VCMDRF_ALLMESSAGES requests
all notifications
Before looking at the callback object we should see what is involved in setting
up a command menu. This is all wrapped up in the form's CreateCommandMenu
method.
procedure TfrmVoiceCommandAPI.CreateCommandMenu;

  procedure AddCommand(ID: Integer; const Command, Category,
    Description: String);
  var
    MemReq: Integer;
    CmdCommand: PVCmdCommand;
    SData: TSData;
    Dest: PChar;
    CmdStart: DWord;
  begin
    //The three strings are stored after the fixed part of the
    //structure, with their offsets recorded in the dw... fields
    MemReq := SizeOf(TVCmdCommand) + Succ(Length(Command)) +
      Succ(Length(Category)) + Succ(Length(Description));
    CmdCommand := AllocMem(MemReq);
    try
      SData.pData := CmdCommand;
      SData.dwSize := MemReq;
      CmdCommand.dwSize := MemReq;
      CmdCommand.dwID := ID;
      Dest := PChar(CmdCommand) + SizeOf(TVCmdCommand);
      CmdCommand.dwCommand := SizeOf(TVCmdCommand);
      StrCopy(Dest, PChar(Command));
      Inc(Dest, Succ(Length(Command)));
      CmdCommand.dwCategory :=
        CmdCommand.dwCommand + DWord(Succ(Length(Command)));
      StrCopy(Dest, PChar(Category));
      Inc(Dest, Succ(Length(Category)));
      CmdCommand.dwDescription :=
        CmdCommand.dwCategory + DWord(Succ(Length(Category)));
      StrCopy(Dest, PChar(Description));
      //Add this one command to the menu
      OleCheck(CmdMenu.Add(1, SData, CmdStart));
    finally
      FreeMem(CmdCommand);
    end;
  end;

var
  VCMDName: TVCMDName;
begin
  StrPCopy(VCMDName.szApplication, ExtractFileName(Application.ExeName));
  StrPCopy(VCMDName.szState, 'Main');
  OleCheck(VoiceCmd.MenuCreate(
    @VCMDName, nil, VCMDMC_CREATE_TEMP, CmdMenu));
  AddCommand(1, 'Red', 'FormColour', 'Change form colour to red');
  AddCommand(2, 'Green', 'FormColour', 'Change form colour to green');
  AddCommand(3, 'Blue', 'FormColour', 'Change form colour to blue');
  OleCheck(CmdMenu.Activate(Handle, 0));
end;
A command menu is created by first setting up a TVCMDName record with a unique
application name (the actual application name is used here) and "state" to describe
the menu. In this case the state is Main, to indicate that the menu relates
to the application's main form.
The menu is then created with the IVoiceCmd.MenuCreate
method. The command menu interface is returned in the last parameter (CmdMenu)
and the menu is specified as being temporary (it won't be stored in the Voice
Menu database). Having got a menu, commands are then added using a helper routine
called AddCommand, which expects
the command's ID, name, category and description to be passed.
The details of what AddCommand
does are not too important, but it will suffice to say that it allocates enough
memory for a TVCmdCommand structure
plus the strings that specify the command's name, category and description.
The results are added to the command menu and the allocated memory is freed.
There are reports of issues with permanent Voice Command
menus due to the underlying database sometimes disappearing without warning.
Therefore it is recommended to create temporary menus (using the VCMDMC_CREATE_TEMP
flag), which may take slightly more time than reusing an existing menu from
the database.
Command Options
One option not taken advantage of in AddCommand
is command verification. If a command warrants verification (such as Format
Disk or Delete File) you should add in the appropriate flag, as in:
CmdCommand.dwFlags := VCMDCMD_VERIFY;
This causes the same flag to be passed along to the CommandRecognize
notification method so the verification can be actioned. However,
there is a problem here. The JEDI SAPI import
unit has an error in one of the parameter definitions in the Voice Command notification
sink interface. Unless you need command verification you can ignore it, but
we'll look at what the issue is when we get down to the notification methods.
Another possible flag is VCMDCMD_CANTRENAME,
which ensures the command cannot be renamed by applications supporting that
feature (such as Microsoft Voice).
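If a command warrants both behaviours, the flags combine in the usual way inside a helper such as AddCommand above (a small sketch):
//Ask for spoken command verification and also prevent
//supporting applications from renaming the command
CmdCommand.dwFlags := VCMDCMD_VERIFY or VCMDCMD_CANTRENAME;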
Voice Commands supports both simple commands (such as Red)
and commands that support lists. This allows you to define one command, such
as Start <App>, where <App> represents
a value from the App list, which itself could be a large list of potential programs
to start.
The support of lists allows many similar commands to be set up in two steps:
add the command with IVCmdMenu.Add,
then add the list with IVCmdMenu.ListSet.
At the COM level ListSet is a
little involved and so, like Add,
warrants a helper function. You can use this one if you need to add list commands:
procedure AddList(Menu: IVCmdMenu; const ListName: String;
  const ListItems: array of String);
var
  List: String;
  I: Integer;
  Data: TSData;
begin
  List := '';
  if High(ListItems) < Low(ListItems) then
    Exit;
  //Build a sequence of null-terminated strings
  for I := Low(ListItems) to High(ListItems) do
    List := List + ListItems[I] + #0;
  Data.pData := @List[1];
  Data.dwSize := Succ(Length(List));
  Menu.ListSet(PChar(ListName), Succ(High(ListItems) - Low(ListItems)), Data);
end;
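As a sketch of how this might be used from within CreateCommandMenu (the command, category and list items here are made up for illustration):
//One command covers starting any application in the App list
AddCommand(4, 'Start <App>', 'Apps', 'Start the named application');
AddList(CmdMenu, 'App', ['Notepad', 'Calculator', 'WordPad']);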
Once the menu is complete it is activated by calling the Activate method. This
takes two arguments:
- the first argument dictates that the menu will be global (always available
no matter what window has focus) by passing a zero, or local to a window (only
active when that window has focus) by passing the appropriate window handle
- the second argument lets you optionally specify the menu is a sleep menu
(activated when Voice Commands is asleep and deactivated when it is awake)
using the VWGFLAG_ASLEEP flag.
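As a minimal sketch, a global sleep menu holding a single wake-up command might be built like this, assuming a SleepMenu: IVCmdMenu form field and a variant of the earlier AddCommand helper (here called AddSleepCommand) that targets it:
StrPCopy(VCMDName.szApplication, ExtractFileName(Application.ExeName));
StrPCopy(VCMDName.szState, 'Asleep');
OleCheck(VoiceCmd.MenuCreate(
  @VCMDName, nil, VCMDMC_CREATE_TEMP, SleepMenu));
//The only command available whilst Voice Commands is asleep
AddSleepCommand(100, 'Wake Up', 'Sleep', 'Resume listening for commands');
//A zero window handle makes the menu global;
//VWGFLAG_ASLEEP marks it as a sleep menu
OleCheck(SleepMenu.Activate(0, VWGFLAG_ASLEEP));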
Voice Command Notifications
The callback object simply logs any information passed to its methods when
they are called by the Voice Command object. The only notable methods are listed
below and show how to identify which speech recognition attribute has changed
(if the awake or enabled attributes have changed, the checkboxes are updated
accordingly), what type of interference was encountered, and a simple way to
display the VU meter information (using a progress bar).
type
  TVCmdNotifySink = class(TInterfacedObject, IVCmdNotifySink)
  private
    FForm: TfrmVoiceCommandAPI;
  protected
    function CommandRecognize(dwID: DWORD; pvCmdName: PVCmdNameA;
      pdwFlags: PDWORD; dwActionSize: DWORD; pAction: pointer;
      dwNumLists: DWORD; pszListValues: PAnsiChar;
      pszCommand: PAnsiChar): HResult; stdcall;
    function CommandOther(pName: PVCmdNameA;
      pszCommand: PAnsiChar): HResult; stdcall;
    function CommandStart: HResult; stdcall;
    function MenuActivate(pName: PVCmdNameA;
      bActive: BOOL): HResult; stdcall;
    function UtteranceBegin: HResult; stdcall;
    function UtteranceEnd: HResult; stdcall;
    function VUMeter(wLevel: WORD): HResult; stdcall;
    function AttribChanged(dwAttribute: DWORD): HResult; stdcall;
    function Interference(dwType: DWORD): HResult; stdcall;
  public
    constructor Create(Form: TfrmVoiceCommandAPI);
  end;
...

function TVCmdNotifySink.AttribChanged(dwAttribute: DWORD): HResult;
var
  S: String;
  Enabled, Awake: DWord;
begin
  Result := S_OK;
  if dwAttribute <> 0 then
  begin
    if dwAttribute and IVCNSAC_AUTOGAINENABLE > 0 then
      S := 'automatic gain, ';
    if dwAttribute and IVCNSAC_ENABLED > 0 then
    begin
      S := S + 'enabled, ';
      OleCheck(FForm.CmdAttrs.EnabledGet(Enabled));
      FForm.chkEnabled.Checked := Bool(Enabled);
    end;
    if dwAttribute and IVCNSAC_AWAKE > 0 then
    begin
      S := S + 'awake, ';
      OleCheck(FForm.CmdAttrs.AwakeStateGet(Awake));
      FForm.chkAwake.Checked := Bool(Awake);
    end;
    if dwAttribute and IVCNSAC_DEVICE > 0 then
      S := S + 'audio device, ';
    if dwAttribute and IVCNSAC_MICROPHONE > 0 then
      S := S + 'current microphone, ';
    if dwAttribute and IVCNSAC_SPEAKER > 0 then
      S := S + 'speaker, ';
    if dwAttribute and IVCNSAC_SRMODE > 0 then
      S := S + 'SR mode, ';
    if dwAttribute and IVCNSAC_THRESHOLD > 0 then
      S := S + 'threshold, ';
    if dwAttribute and IVCNSAC_ORIGINAPP > 0 then
      S := S + 'from this app';
  end
  else
    S := 'none';
  FForm.Log('Attribute changed: ' + S)
end;

function TVCmdNotifySink.CommandRecognize(dwID: DWORD;
  pvCmdName: PVCmdNameA; pdwFlags: PDWORD; dwActionSize: DWORD;
  pAction: pointer; dwNumLists: DWORD; pszListValues,
  pszCommand: PAnsiChar): HResult;
begin
  Result := S_OK;
  FForm.Log('Command: app = %s, state = %s, cmd = %s, id = %d',
    [pvCmdName.szApplication, pvCmdName.szState, pszCommand, dwId]);
  case dwID of
    1: FForm.Color := clRed;
    2: FForm.Color := clGreen;
    3: FForm.Color := clBlue;
  end
end;

function TVCmdNotifySink.Interference(dwType: DWORD): HResult;
var
  S: String;
begin
  Result := S_OK;
  case dwType of
    SRMSGINT_NOISE: S := 'background noise too high';
    SRMSGINT_NOSIGNAL:
      S := 'engine cannot detect a signal (mic unplugged?)';
    SRMSGINT_TOOLOUD:
      S := 'speaker is too loud; recognition results may be degraded';
    SRMSGINT_TOOQUIET:
      S := 'speaker is too quiet; recognition results may be degraded';
    SRMSGINT_AUDIODATA_STOPPED,
    SRMSGINT_IAUDIO_STOPPED:
      S := 'engine has stopped receiving audio data from the audio source';
    SRMSGINT_AUDIODATA_STARTED,
    SRMSGINT_IAUDIO_STARTED:
      S := 'engine has resumed receiving audio data from the audio source';
  else
    S := Format('type %d', [dwType])
  end;
  FForm.Log('Interference: %s', [S])
end;

function TVCmdNotifySink.VUMeter(wLevel: WORD): HResult;
begin
  Result := S_OK;
  FForm.ProgressBar.Position := wLevel;
  FForm.lblVU.Caption := IntToStr(wLevel);
end;
The most important method is, of course, CommandRecognize.
This fires when the user speaks a command that is recognised by the Voice Command
object. As you can see, it gets passed a number of parameters, but the most
important one is ID, which allows you to respond to the commands as you like
using a case statement.
If you request command verification,
as described earlier, you will need to be aware
of a problem in the JEDI SAPI 4 import unit.
The CommandRecognize method (defined
in the IVCmdNotifySinkA and IVCmdNotifySinkW
interfaces) declares the flags parameter as:
pdwFlags: PDWORD;
where it should actually be:
dwFlags: DWORD;
This has been reported and so should be fixed at some point. In the meantime,
the simplest way to overcome this is to check DWord(pdwFlags).
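Inside CommandRecognize a verification check based on this workaround might then look like the following sketch (the confirmation dialog is made up for illustration):
//The pdwFlags parameter actually holds the flags value
//itself rather than a pointer to it, hence the cast
if DWord(pdwFlags) and VCMDCMD_VERIFY <> 0 then
  if MessageDlg(Format('Execute command "%s"?', [pszCommand]),
      mtConfirmation, [mbYes, mbNo], 0) <> mrYes then
    Exit; //skip the command without actioning it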
CommandRecognize
does not get passed the category or description of the command, so there is
little point in setting them up here (although some of the other mechanisms
for using the Voice Command object do utilise this information).
If a command is recognised and that command is defined
in terms of a list, pszCommand
will have the full command as spoken by the user, pszListValues
refers to the list item on its own and dwNumLists
tells you how many bytes the list item takes up (including the null terminator).
Since we are looking at an ANSI notification interface, dwNumLists
will be one greater than the number of characters in the list item.
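Within CommandRecognize the list item might be logged like this (a small sketch using the sink's Log helper):
//pszListValues is only meaningful when the recognised
//command was defined in terms of a list
if (pszListValues <> nil) and (dwNumLists > 0) then
  FForm.Log('List item: %s (%d bytes)', [pszListValues, dwNumLists]);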
You can see how this application looks after a few commands have been spoken
below.

Engine Dialogs
As you can see above there are five potential dialogs available from an SR
engine, each being invoked much as with the TTS dialogs.
procedure TfrmVoiceCommandAPI.btnAboutClick(Sender: TObject);
begin
  OleCheck(CmdDlgs.AboutDlg(Handle, nil))
end;

procedure TfrmVoiceCommandAPI.btnGeneralClick(Sender: TObject);
begin
  OleCheck(CmdDlgs.GeneralDlg(Handle, nil))
end;

procedure TfrmVoiceCommandAPI.btnLexiconClick(Sender: TObject);
begin
  OleCheck(CmdDlgs.LexiconDlg(Handle, nil))
end;

procedure TfrmVoiceCommandAPI.btnTrainGeneralClick(Sender: TObject);
begin
  OleCheck(CmdDlgs.TrainGeneralDlg(Handle, nil))
end;

procedure TfrmVoiceCommandAPI.btnTrainMicClick(Sender: TObject);
begin
  OleCheck(CmdDlgs.TrainMicDlg(Handle, nil))
end;
In the case of the Microsoft SR engine, the About dialog gives version information:

The General dialog allows you to set the accuracy of speech recognition:

The Lexicon dialog is much the same as with the TTS engine but the training
dialog allows you to read various passages of text to train the SR engine to
your voice:

Voice Dictation API
The Voice Dictation API allows you to implement dictation SR in your application
and operates through a COM object referred to as the Voice Dictation Object
by the SAPI 4 documentation but described as the Voice Dictation Manager
in the Windows registry.
You use the ClassID CLSID_VDct
from the Speech unit to initialise it and the created object supports numerous
interfaces, including IVoiceDictation
(which you use to register your application) as well as IVDctAttributes
and IVDctDialogs.
You identify what has been spoken by the user through another notification
interface, IVDctNotifySink. The
primary notification of interest is the PhraseFinish
method, which tells you which phrase was spoken (or the engine's best guess).
A sample Voice Dictation API application can
be found as VoiceDictationAPI.dpr in the COM directory.
The OnCreate event handler connects
to the Voice Dictation Object and registers with it. Then the IVDctDialogs
interface is extracted in order to allow access to the dialogs through buttons
on the form and the IVDctAttributes
interface is also extracted. This is used to set a speaker so the speech recognition
training for that speaker can be used. It would be better to store the speaker
name with your application state data (registry or an INI file) than to hardcode
it as in this example.
The Voice Dictation session is then activated for whenever the form is focused.
Finally the SR mode is set to support voice commands and dictation, meaning
this program will work and any Voice Command applications will also continue
to function. Note that this will only work if the engine supports simultaneous
command and control and dictation.
uses
  Speech, ...

type
  TfrmVoiceDictationAPI = class(TForm)
    ...
  private
    VoiceDct: IVoiceDictation;
    DctAttrs: IVDctAttributes;
    DctDlgs: IVDctDialogs;
    ...
  end;
...

procedure TfrmVoiceDictationAPI.FormCreate(Sender: TObject);
begin
  VoiceDct := CreateComObject(CLSID_VDct) as IVoiceDictation;
  OleCheck(VoiceDct.Register(
    PChar(ExtractFileName(Application.ExeName)), 'My Topic', nil, nil,
    TVDctNotifySink.Create(Self), IVDctNotifySink, VCMDRF_ALLMESSAGES));
  DctDlgs := VoiceDct as IVDctDialogs;
  DctAttrs := VoiceDct as IVDctAttributes;
  OleCheck(DctAttrs.SpeakerSet('blong'));
  OleCheck(VoiceDct.Activate(Handle));
  OleCheck(DctAttrs.ModeSet(VSRMODE_CMDANDDCT));
end;
The callback object logs information passed to the notification methods.
type
  TVDctNotifySink = class(TInterfacedObject, IVDctNotifySink)
  private
    FForm: TfrmVoiceDictationAPI;
    FPhraseDone: Boolean;
    function PhraseToStr(pSRPhrase: PSRPhraseA): String;
  protected
    function CommandBuiltIn(pszCommand: PAnsiChar): HResult; stdcall;
    function CommandOther(pszCommand: PAnsiChar): HResult; stdcall;
    function CommandRecognize(dwID: DWord; pdwFlags: PDWord;
      dwActionSize: DWORD; pAction: Pointer;
      pszCommand: PAnsiChar): HResult; stdcall;
    function TextSelChanged: HResult; stdcall;
    function TextChanged(dwReason: DWORD): HResult; stdcall;
    function TextBookmarkChanged(dwID: DWORD): HResult; stdcall;
    function PhraseStart: HResult; stdcall;
    function PhraseFinish(dwFlags: DWORD;
      pSRPhrase: PSRPhraseA): HResult; stdcall;
    function PhraseHypothesis(dwFlags: DWORD;
      pSRPhrase: PSRPhraseA): HResult; stdcall;
    function UtteranceBegin: HResult; stdcall;
    function UtteranceEnd: HResult; stdcall;
    function VUMeter(wLevel: WORD): HResult; stdcall;
    function AttribChanged(dwAttribute: DWORD): HResult; stdcall;
    function Interference(dwType: DWORD): HResult; stdcall;
    function Training(dwTrain: DWORD): HResult; stdcall;
    function Dictating(pszApp: PAnsiChar;
      fDictating: BOOL): HResult; stdcall;
  public
    constructor Create(Form: TfrmVoiceDictationAPI);
  end;
The main method is PhraseFinish,
which is called when the SR engine has decided what has been spoken. Whilst
working it out, it will likely call the PhraseHypothesis
method several times.
function TVDctNotifySink.PhraseFinish(dwFlags: DWORD;
  pSRPhrase: PSRPhraseA): HResult;
begin
  Result := S_OK;
  FForm.Log('PhraseFinish: %s', [PhraseToStr(pSRPhrase)]);
  FForm.memText.SelText := PhraseToStr(pSRPhrase);
  //Since PhraseStart never seems to trigger, flag that the next
  //hypothesis should clear the old hypothesis list
  FPhraseDone := True
end;

function TVDctNotifySink.PhraseHypothesis(dwFlags: DWORD;
  pSRPhrase: PSRPhraseA): HResult;
begin
  Result := S_OK;
  //Since PhraseStart never seems to trigger, this clears the
  //old hypothesis list on the first new hypothesis
  if FPhraseDone then
  begin
    FForm.lstHypotheses.Clear;
    FPhraseDone := False
  end;
  FForm.lstHypotheses.Items.Add(PhraseToStr(pSRPhrase));
  FForm.lstHypotheses.ItemIndex := FForm.lstHypotheses.Items.Count - 1
end;
Both these methods are passed a pointer to a TSRPhrase
record that contains the words in the phrase being reported. A helper routine
is used to turn this into a normal string. Finished phrases are added to a memo
on the form and whilst a phrase is being worked out the hypotheses are added
to a list box so you can see how the SR engine made its decision. Each time
a new phrase is started, the hypothesis list is cleared.
The hypothesis list is cleared in the PhraseHypothesis
notification method, if the FPhraseDone flag is True.
It would be more sensible to clear it in the PhraseStart
method but that notification method is never called.
function TVDctNotifySink.PhraseToStr(pSRPhrase: PSRPhraseA): String;
var
  ToGo: Integer;
  PSRW: PSRWord;
begin
  Result := '';
  if pSRPhrase = nil then
    Exit;
  ToGo := pSRPhrase.dwSize - SizeOf(pSRPhrase.dwSize);
  PSRW := @pSRPhrase.abWords;
  while ToGo > 0 do
  begin
    Result := Result + PChar(@PSRW.szWord) + #32;
    Dec(ToGo, PSRW.dwSize);
    Inc(PChar(PSRW), PSRW.dwSize)
  end;
end;
You can see the list of hypotheses building up in this screenshot of the program
running.

The API routines for invoking the dialogs have a slight
issue. The IVDctDialogs.TrainGeneralDlg
method actually invokes the microphone training dialog and IVDctDialogs.TrainMicDlg
actually invokes the general speech recognition training dialog.
For dictation to work acceptably you should spend the time
doing several voice training sessions. You should also invest in a quality microphone
(a close-talk headset microphone is best).
Built-In Commands
The application has a popup menu that shows if you right-click on the form.
The only menu item causes the Voice Dictation Object's built-in command grammar
to be dumped into the memo.
procedure TfrmVoiceDictationAPI.Showbuiltincommands1Click(Sender: TObject);
var
  Buf: PChar;
  BufSize: DWord;
begin
  //Get built-in commands
  with VoiceDct as IVDctCommandsBuiltIn do
  begin
    TextGet(Buf, BufSize);
    memText.Text := Buf;
    CoTaskMemFree(Buf);
  end
end;
You can see a small portion of the grammar that sets up these commands here:

Automation
The high level SAPI objects are also available for control through
Automation. The Automation objects themselves are implemented in the Microsoft
Voice Commands Automation server along with the high level COM objects. Being
Automation objects, their capabilities are described in type libraries.
The Voice Text Object Automation interface is described in the vtxtauto.tlb
type library whilst the Voice Command Object Automation interface is described
by vcauto.tlb. Both these type libraries can be found in the Windows speech
directory.
There is no Automation interface to the Voice Dictation
Object.
The speech notifications are still available when using Automation, although
they are set up in a different way. Rather than implementing an internal notification
sink object and passing that along to a registration method, you must implement
a registered Automation object and assign its ProgID to the speech object's
Callback string property.
There is an important point about these callback objects
regarding application shutdown. Since the speech Automation object is a client
to your voice-enabled application (it instantiates the Automation callback object
specified by the ProgID), something important happens when you try to shut down
your application.
Because the speech object still has a reference to your Automation object you
will get a warning dialog invoked:

You can usually rectify the problem by assigning Unassigned to the Variant
that represents the speech object (or objects) in the main form's OnClose event
handler.
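In the late bound Voice Text example that might look like this sketch:
procedure TfrmVTxtAutoLateBound.FormClose(Sender: TObject;
  var Action: TCloseAction);
begin
  //Drop our reference so the speech object in turn
  //drops its reference to our callback object
  VTxt := Unassigned;
end;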
An alternative is to prevent the warning dialog from being displayed in the
first place. This can be achieved by adding this line to the initialisation
section of the unit that implements your Automation callback object:
ComServer.UIInteractive := False
This is perfectly safe, as the normal destruction of your main form will drop
the reference to the speech object, which will then drop the reference to your
callback object, allowing it to be destroyed.
Voice Text Automation
Late Bound Automation
To use Automation against the Voice Text Object through a Variant (late bound
access) you access it through the ProgID Speech.VoiceText. As with the
Voice Text API you must register your application before you can use the TTS
functionality.
uses
  ComObj,
  ...

type
  TfrmVTxtAutoLateBound = class(TForm)
    ...
  private
    VTxt: Variant;
    ...
  end;
...

VTxt := CreateOleObject('Speech.VoiceText');
VTxt.Register('', Application.ExeName);
The Automation object can notify you when speaking has started and when it
stops through an Automation object that you implement and register. It must
implement two parameterless methods: SpeakingStarted
and SpeakingDone. You assign
its ProgID to the Voice Text object's Callback
property.
The sample project VoiceTextAutoVar.dpr in
the Automation directory contains an Automation object that implements these
methods and its ProgID is VoiceTextAutoVar.VoiceCallBack.
VTxt.Callback := 'VoiceTextAutoVar.VoiceCallback';
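The callback class itself is the kind of thing Delphi's Automation Object wizard generates. Stripped down, it looks something like this sketch (IVoiceCallBack, Class_VoiceCallBack and the _TLB unit name come from the sample project's own type library, and the form's Log helper is assumed):
uses
  ComObj, ComServ, VoiceTextAutoVar_TLB;

type
  TVoiceCallBack = class(TAutoObject, IVoiceCallBack)
  protected
    procedure SpeakingStarted; safecall;
    procedure SpeakingDone; safecall;
  end;

procedure TVoiceCallBack.SpeakingStarted;
begin
  frmVTxtAutoLateBound.Log('SpeakingStarted')
end;

procedure TVoiceCallBack.SpeakingDone;
begin
  frmVTxtAutoLateBound.Log('SpeakingDone')
end;

initialization
  //Register the class factory so the speech server can
  //instantiate the callback object from its ProgID
  TAutoObjectFactory.Create(ComServer, TVoiceCallBack, Class_VoiceCallBack,
    ciMultiInstance, tmApartment);
end.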
The methods available for controlling the speech progress are much the same
as with the Voice Text API but there is an additional property, IsSpeaking,
which is useful for working out if speech is currently in progress. You can
get this information with the Voice Text API, but it involves calling a method
of the IVTxtAttributes interface
so this Automation object clearly surfaces parts of at least two of the Voice
Text API interfaces (IVoiceText
and IVTxtAttributes).
procedure TfrmVTxtAutoLateBound.btnPlayClick(Sender: TObject);
begin
  if not BeenPaused then
    VTxt.Speak(memText.Text, 0)
  else
  begin
    VTxt.AudioResume;
    BeenPaused := False
  end
end;

procedure TfrmVTxtAutoLateBound.btnPauseClick(Sender: TObject);
begin
  if VTxt.IsSpeaking then
  begin
    VTxt.AudioPause;
    BeenPaused := True
  end
end;

procedure TfrmVTxtAutoLateBound.btnStopClick(Sender: TObject);
begin
  VTxt.StopSpeaking;
end;
The callback object simply logs messages to the listbox on the main form when
its two notification methods are called.
There is another demo of late bound Voice
Text Automation in the same directory in the project VoiceTextAutoVarReadWordDoc.dpr.
As the name suggests, this sample reads out loud from a Word document. It uses
Automation to control Microsoft Word and also to control the Voice Text object.
The demo is inspired by a sample VB application from Reference
1. However the original VB code used the WordBasic Automation interface,
which did not work so well with more recent versions of Word, so it has been
re-written to use the Word VBA interface. Other changes have also been made.
type
  TfrmVTxtAutoLateBound = class(TForm)
    ...
  private
    VTxt, MSWord: Variant;
  end;
...

procedure TfrmVTxtAutoLateBound.FormCreate(Sender: TObject);
begin
  VTxt := CreateOleObject('Speech.VoiceText');
  VTxt.Register('', Application.ExeName);
  MSWord := CreateOleObject('Word.Application');
end;

procedure TfrmVTxtAutoLateBound.btnReadDocClick(Sender: TObject);
const
  // Constants for enum WdUnits
  wdCharacter = $00000001;
  wdParagraph = $00000004;
  // Constants for enum WdMovementType
  wdExtend = $00000001;
var
  Moved: Integer;
  Txt: String;
begin
  (Sender as TButton).Enabled := False;
  if dlgOpenDoc.Execute then
  begin
    MSWord.Documents.Open(FileName := dlgOpenDoc.FileName);
    Moved := 2;
    while Moved > 1 do
    begin
      //Select next paragraph
      Moved := MSWord.Selection.EndOf(Unit := wdParagraph, Extend := wdExtend);
      if Moved > 1 then
      begin
        MSWord.Selection.Copy;
        Txt := Trim(Clipboard.AsText);
        if Length(Txt) > 0 then
          VTxt.Speak(pszBuffer := Txt, dwFlags := 0);
        Application.ProcessMessages;
        //Move to start of next paragraph
        MSWord.Selection.MoveRight(Unit := wdCharacter);
      end
    end;
    //Only close the document if one was actually opened
    MSWord.ActiveDocument.Close;
  end;
  TButton(Sender).Enabled := True;
end;

procedure TfrmVTxtAutoLateBound.btnStopClick(Sender: TObject);
begin
  if VTxt.IsSpeaking then
    VTxt.StopSpeaking
end;

procedure TfrmVTxtAutoLateBound.FormDestroy(Sender: TObject);
begin
  btnStop.Click;
  MSWord.Quit;
  MSWord := Unassigned;
end;
The example uses the clipboard to copy each paragraph from
the document to be read. It would be more sensible to simply read the selected
text directly, but that causes strange hanging problems with long paragraphs.
Early Bound Automation
To use Automation against the Voice Text Object through interfaces (early bound)
requires you to import its type library to get a Pascal representation of all
its interfaces and supporting constants and types. In Delphi you do this with
Project | Import Type Library...,
but since the sought library is not registered you will need to press the Add...
button and locate it manually (vtxtauto.tlb in the Windows speech directory).

Pressing the Create Unit button
generates a type library import unit called VTxtAuto_TLB.pas.
You might normally press Install...
to ensure any generated component wrappers for exposed Automation objects are
installed on the Component Palette. However these examples all work with the
Automation objects using normal Automation coding and don't make use of component
wrapper classes, so there is little point making the components (they save you
very little).
Ready made packages for Delphi 5, 6 and 7 containing the type library import
unit can be found in appropriately named subdirectories under SAPI 4 in the
accompanying files.
You access the Automation object using the ClassID CLASS_VTxtAuto_
and the implemented interface is IVTxtAuto.
Both the ClassID and the interface are defined in the type library import unit.
The following code comes from the sample project
VoiceTextAuto.dpr in the Automation directory.
uses
  VTxtAuto_TLB, ComObj,
  ...

type
  TfrmVTxtAutoEarlyBound = class(TForm)
    ...
  private
    VTxt: IVTxtAuto;
    BeenPaused: Boolean;
    ...
  end;
...

procedure TfrmVTxtAutoEarlyBound.FormCreate(Sender: TObject);
begin
  SendMessage(lstProgress.Handle, LB_SETHORIZONTALEXTENT, Width, 0);
  //VTxt := CreateOleObject('Speech.VoiceText') as IVTxtAuto;
  //VTxt := CoVTxtAuto_.Create;
  VTxt := CreateComObject(CLASS_VTxtAuto_) as IVTxtAuto;
  VTxt.Register('', Application.ExeName);
  //The callback object specified by the ProgID
  //below implements the notification interface
  VTxt.Callback := 'VoiceTextAuto.VoiceCallback';
end;
As well as using CreateComObject
and passing the ClassID (which requires you to query for the appropriate interface)
the code above shows two alternatives. You can use the helper class, CoVTxtAuto_,
defined in the type library import unit. This does exactly the same as the code
that is being used (but involves less typing). Alternatively you can call CreateOleObject,
passing the ProgID, and query for the interface.
The code for the buttons and the callback object is just the same as for the
late bound version.
Speaking Dialogs
As an example of automating the Voice Text API you can make all your
VCL dialogs talk to you using this small piece of code.
var
  Voice: Variant;

procedure TForm1.FormCreate(Sender: TObject);
begin
  Screen.OnActiveFormChange := ScreenFormChange;
end;

procedure TForm1.ReadVCLDialog(Form: TCustomForm);
var
  I: Integer;
  ButtonCaptions, LabelCaption, DialogText: string;
begin
  try
    if VarType(Voice) <> varDispatch then
    begin
      Voice := CreateOleObject('Speech.VoiceText');
      Voice.Register('', Application.ExeName);
    end;
    for I := 0 to Form.ComponentCount - 1 do
      if Form.Components[I] is TLabel then
        LabelCaption := TLabel(Form.Components[I]).Caption
      else
      if Form.Components[I] is TButton then
        ButtonCaptions := Format('%s%s, ',
          [ButtonCaptions, TButton(Form.Components[I]).Caption]);
    ButtonCaptions := StringReplace(ButtonCaptions, '&', '', [rfReplaceAll]);
    DialogText := Format('%s.%s%s.%s%s',
      [Form.Caption, sLineBreak, LabelCaption, sLineBreak, ButtonCaptions]);
    Memo1.Text := DialogText;
    Voice.Speak(DialogText, 0)
  except
    //pretend everything is okay
  end
end;

procedure TForm1.ScreenFormChange(Sender: TObject);
begin
  if Assigned(Screen.ActiveForm) and
    (Screen.ActiveForm.ClassName = 'TMessageForm') then
    ReadVCLDialog(Screen.ActiveForm)
end;
The form's OnCreate event handler
sets up an OnActiveFormChange
event handler for the screen object. This is triggered each time a new form
is displayed, which includes VCL dialogs. Any call to ShowMessage,
MessageDlg or related routines
causes a TMessageForm to be displayed
so the code checks for this. If the form type is found, a textual version of
what's on the dialog is built up and then spoken through the Voice Text API
Automation component.
A statement such as:
MessageDlg('Save changes?', mtConfirmation, mbYesNoCancel, 0)
causes the ReadVCLDialog routine
to build up and say this text:
Confirm.
Save changes?.
Yes, No, Cancel,
Notice the full stops at the end of each line to briefly pause the speech engine
at that point before moving on.
Voice Command Automation
Late Bound Automation
To use Automation against the Voice Command Object through a Variant (late
bound access) you access it through the ProgID Speech.VoiceCommand. A
sample project called VoiceCommandAutoVar.dpr
can be found in the Automation directory.
This example project shows how you can take advantage of the similarities between
voice command menus and Windows menus by allowing voice control of your menu
system. The COM example above, by comparison,
showed how to define arbitrary commands.
Just as with the Voice Text Automation object, the Voice Command Automation
object is instantiated and then the SR state is examined. You do not get supplied
with the global enabled state when using Automation, but you do have access
to the awake state through the Awake
property, although there is no notification to tell you when it changes (you
can use a timer to keep track of changes if needed).
type
  TfrmVoiceCommandAutomation = class(TForm)
    ...
  private
    VCmd, VMenu: Variant;
    ...
  end;
...

procedure TfrmVoiceCommandAutomation.FormCreate(Sender: TObject);
begin
  SendMessage(lstProgress.Handle, LB_SETHORIZONTALEXTENT, Width, 0);
  VCmd := CreateOleObject('Speech.VoiceCommand');
  chkAwake.Checked := VCmd.Awake;
  VCmd.Register('');
  VCmd.Callback := 'VoiceCommandAutoVar.ListenCallback';
  CreateCommandMenu;
end;

procedure TfrmVoiceCommandAutomation.chkAwakeClick(Sender: TObject);
begin
  VCmd.Awake := chkAwake.Checked;
end;
The next step is to register for use of Voice Commands on the default site.
Whilst this example uses a callback, you don't strictly need to in order to
respond to voice commands. Instead you can regularly check the VCmd.CommandSpoken
property. If a command has been recognised this will give you the command ID,
otherwise it returns 0.
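A polling sketch, assuming a TTimer named tmrPoll dropped on the form, might look like this:
procedure TfrmVoiceCommandAutomation.tmrPollTimer(Sender: TObject);
var
  ID: Integer;
begin
  ID := VCmd.CommandSpoken;
  if ID <> 0 then
    Perform(WM_COMMAND, ID, 0) //act on the recognised command
end;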
In this case a callback object is set up by assigning the relevant ProgID to
the Callback property. The Automation
object chosen as the callback will receive two notifications and so must implement
two methods declared as follows:
procedure CommandRecognize(const sCommand: WideString; ID: Integer); safecall;
procedure CommandOther(const sCommand, sApp, sState: WideString); safecall;
The important method here is CommandRecognize,
which is triggered when one of this application's commands is recognised.
Back to the OnCreate event handler
now; this goes on to call a helper routine, CreateCommandMenu,
to set up the command menu. You will see that the Automation interface makes
it much simpler to add commands than with the COM interface; just pass the information
directly to a single method (you cannot request command verification through
the Automation interface). You are able to set up list commands using the ListSet
method but the callback doesn't give you any specific information about them.
CreateCommandMenu creates a
temporary Voice Menu and then adds in commands corresponding to each menu item
on the main form's main menu. It is careful to recurse down menus and submenus
and only adds commands for proper menu items (it ignores separators and submenus),
however it pays no attention to whether menus are enabled or not.
For a menu item such as Help | About...,
the command string in the Voice Menu is set to Help About. This means
any menu can be invoked by reading out the path through the menu hierarchy necessary
to reach it.
procedure TfrmVoiceCommandAutomation.CreateCommandMenu;

  procedure AddMenuCommands(Item: TMenuItem; const ParentPath: String);
  var
    I: Integer;
    Path: String;
  begin
    Path := ParentPath + StripHotKey(Item.Caption);
    //Recurse through subitems, if any
    if Item.Count > 0 then
      for I := 0 to Item.Count - 1 do
        AddMenuCommands(Item.Items[I], Path + #32)
    //Otherwise add this item, if appropriate
    else
      if (Item.Caption <> '') and (Item.Caption <> '-') then
        VMenu.Add(Item.Command, Path, ' ', Item.Hint);
  end;

begin
  //Must pass non-blank strings for all WideString parameters
  VMenu := VCmd.MenuCreate(ExtractFileName(Application.ExeName),
    'Main Menu', LANG_ENGLISH, ' ', vcmdmc_CREATE_TEMP);
  AddMenuCommands(Menu.Items, '');
  VMenu.hWndMenu := Handle;
  VMenu.Active := True;
end;
These Automation methods are very sensitive to blank strings.
The menu is created with a LANG_ENGLISH
parameter to specify the language and the following parameter is an optional
string you can use to specify the dialect. If you pass a blank string the Voice
Command object will throw an exception so it is important to pass non-blank
strings for these string parameters. The same can be seen with the call to add
a command to the menu; the category parameter is omitted but a non-empty string
is passed to keep things working smoothly.
The command ID for each Voice Menu command is extracted from the menu item's
Command property, which is something
we can use to trigger the menu item via a message if the command is recognised.
The description is taken from the menu item's Hint
property.
Once all menu items have had commands added for them the menu is told to restrict
itself to the current form (in other words it is only active if the form has
focus) and then activated.
The important code now is in the callback object's CommandRecognize
method. As you can see, if a command is recognised a message is sent to the
form to emulate a menu selection (and the command is logged in the message
log).
procedure TListenCallback.CommandRecognize(const sCommand: WideString;
  ID: Integer);
begin
  frmVoiceCommandAutomation.Log('Our command: %s, id: %d', [sCommand, ID]);
  frmVoiceCommandAutomation.Perform(WM_COMMAND, ID, 0)
end;
The menu items on this form allow the user to close the application, minimise,
maximise and restore the form, clear the message log, change the form colour
and invoke an About dialog. You can check the project for their implementations
but the following screenshot shows the application after a few commands have
been spoken.

The last job for the OnCreate
handler is to use the Voice Command Object's Awake
property to set up the checkbox that tells the user if speech recognition is
enabled or not. The checkbox OnClick
event handler toggles this property for full user control.
Early Bound Automation
To use Automation against the Voice Command Object through interfaces (early
bound), you must import its type library, vcauto.tlb, found in the Windows speech
directory. This generates a type library import unit called VCmdAuto_TLB.pas.
Ready-made packages for Delphi 5 and Delphi 6 containing the type library import
unit can be found in appropriately named subdirectories under SAPI 4 in the
accompanying files.
You access the Automation object using the ClassID CLASS_VCmdAuto_
and the implemented interface is IVCmdAuto.
Both the ClassID and the interface are defined in the type library import unit.
Much as with the COM APIs, commands are set up using a menu and the IVMenuAuto
interface supports this.
A sample project called VoiceCommandAuto.dpr
can be found in the Automation directory. The logic is much the same as for
the late bound version, other than the types used to access the Voice Command
and Voice Menu objects.
uses
  VCmdAuto_TLB, ...

type
  TfrmVoiceCommandAutomation = class(TForm)
    ...
  private
    VCmd: IVCmdAuto;
    VMenu: IVMenuAuto;
    ...
  end;
...
procedure TfrmVoiceCommandAutomation.FormCreate(Sender: TObject);
begin
  SendMessage(lstProgress.Handle, LB_SETHORIZONTALEXTENT, Width, 0);
  //VCmd := CreateOleObject('Speech.VoiceCommand') as IVCmdAuto;
  //VCmd := CoVCmdAuto_.Create;
  VCmd := CreateComObject(CLASS_VCmdAuto_) as IVCmdAuto;
  VCmd.Register('');
  VCmd.Callback := 'VoiceCommandAuto.ListenCallback';
  CreateCommandMenu;
  chkAwake.Checked := VCmd.Awake;
end;

procedure TfrmVoiceCommandAutomation.chkAwakeClick(Sender: TObject);
begin
  VCmd.Set_Awake(chkAwake.Checked);
end;
As well as using CreateComObject and passing the ClassID (which requires you
to query for the appropriate interface), the code above shows two alternatives
in the commented-out lines. You can use the helper class CoVCmdAuto_,
defined in the type library import unit; this does exactly the same as the code
being used (but involves less typing). Alternatively you can call CreateOleObject,
passing the ProgID, and query for the interface. Other than that, the rest of
the code is the same as in the late bound Automation
example.
ActiveX
When SAPI 4 is installed it performs a two-step installation. First it installs
all the normal COM/Automation support along with the help files and so on. When
that is all done it installs the ActiveX controls that can simplify the process
of building SAPI applications. The following sections look at how these ActiveX
controls can be used.
Ready-made packages for Delphi 5 and Delphi 6 containing the ActiveX units
can be found in appropriately named subdirectories under SAPI 4 in the accompanying
files.
TextToSpeech Control
The Microsoft TextToSpeech control is an ActiveX that wraps up the high level
Voice Text API (and does more besides as we shall see). To use it you must first
import the ActiveX into Delphi with Component
| Import ActiveX... This presents you with a list of all the registered
ActiveX controls on your system. The one you are looking for is described as
Microsoft Voice Text (Version 1.0).

Pressing Install... will take
you through the process of adding the generated type library import unit (HTTSLib_TLB.pas)
to a package and having it compiled and installed. The import unit contains
the Object Pascal component wrapper for the ActiveX, called TTextToSpeech;
by default this component is installed on the ActiveX page of
the Component Palette.
The ActiveX is implemented in Vtext.dll in the Windows speech directory (whose
version information describes it as the High-Level Text To Speech Module)
and the primary interface implemented is ITextToSpeech.
You can programmatically work with this ActiveX control using the ProgID TextToSpeech.TextToSpeech
or the ClassID CLASS_TextToSpeech
from the HTTSLib_TLB unit. The Windows registry describes this class as the
TextToSpeech Class.
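As a minimal sketch of that programmatic route (the field name TTS, the form
name and the greeting text are my own inventions; the Speak method appears in
the sample code below), you could create the wrapper component at run time:
var
  TTS: TTextToSpeech;  //hypothetical form field
...
procedure TForm1.FormCreate(Sender: TObject);
begin
  //Create the wrapper at run time rather than at design time; the
  //control needs a parent window in which to draw its animated mouth
  TTS := TTextToSpeech.Create(Self);
  TTS.Parent := Self;
  TTS.Speak('Hello from SAPI 4');
end;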
Normally you will want to simply place the ActiveX on a form for use and that
is what the sample project VoiceTextControl.dpr
in the ActiveX directory does. The pleasant surprise you get when you place
the ActiveX component on the form is that it shows up as a colourful mouth.

When the ActiveX is asked to speak, the mouth animates
in sync with the spoken phonemes. The effect is rather difficult to get across
through the written word and in screenshots so I encourage you to try using
this ActiveX control. It's pretty cool (and saves you the trouble of working
out how to do it yourself)!
In this project the play, pause and stop buttons do much the same as before,
although the methods they call have slightly different names. Just as with the
Automation interface, the ActiveX surfaces parts of the IVoiceText
and IVTxtAttributes interfaces
from the Voice Text API and also the IVTxtDialogs
interface (methods are exposed to invoke the four dialogs).
procedure TfrmTextToSpeechControl.btnPlayClick(Sender: TObject);
begin
  if not BeenPaused then
    TextToSpeech.Speak(memText.Text)
  else
  begin
    TextToSpeech.Resume;
    BeenPaused := False
  end
end;

procedure TfrmTextToSpeechControl.btnPauseClick(Sender: TObject);
begin
  if Bool(TextToSpeech.IsSpeaking) then
  begin
    TextToSpeech.Pause;
    BeenPaused := True
  end
end;

procedure TfrmTextToSpeechControl.btnStopClick(Sender: TObject);
begin
  TextToSpeech.StopSpeaking;
end;

procedure TfrmTextToSpeechControl.btnAboutClick(Sender: TObject);
begin
  TextToSpeech.AboutDlg(Handle, '');
end;

procedure TfrmTextToSpeechControl.btnGeneralClick(Sender: TObject);
begin
  TextToSpeech.GeneralDlg(Handle, '');
end;

procedure TfrmTextToSpeechControl.btnLexiconClick(Sender: TObject);
begin
  TextToSpeech.LexiconDlg(Handle, '');
end;

procedure TfrmTextToSpeechControl.btnTranslateClick(Sender: TObject);
begin
  TextToSpeech.TranslateDlg(Handle, '');
end;
The ActiveX also delivers the same notifications as on offer in the IVTxtNotifySink
interface, although responding to them is now a no-brainer thanks to them being
exposed as normal Delphi events.
procedure TfrmTextToSpeechControl.TextToSpeechAttribChanged(Sender: TObject;
  attrib: Integer);
var
  S: String;
begin
  case attrib of
    TTSNSAC_REALTIME: S := 'Realtime';
    TTSNSAC_PITCH: S := 'Pitch';
    TTSNSAC_SPEED: S := 'Speed';
    TTSNSAC_VOLUME: S := 'Volume';
  end;
  Log('OnAttribChanged: %s changed', [S]);
end;

procedure TfrmTextToSpeechControl.TextToSpeechSpeak(Sender: TObject; const Text,
  App: WideString; thetype: Integer);
begin
  Log('OnSpeak');
  memEnginePhonemes.Clear
end;

procedure TfrmTextToSpeechControl.TextToSpeechSpeakingStarted(Sender: TObject);
begin
  Log('OnSpeakingStarted');
end;

procedure TfrmTextToSpeechControl.TextToSpeechSpeakingDone(Sender: TObject);
begin
  Log('OnSpeakingDone')
end;

procedure TfrmTextToSpeechControl.TextToSpeechVisual(Sender: TObject;
  Phoneme, EnginePhoneme: Smallint; hints: Integer; MouthHeight,
  bMouthWidth, bMouthUpturn, bJawOpen, TeethUpperVisible,
  TeethLowerVisible, TonguePosn, LipTension: Smallint);
var
  Hint: String;
begin
  Hint := '';
  if hints <> 0 then
  begin
    if hints and TTSNSHINT_QUESTION <> 0 then
      Hint := 'Question ';
    if hints and TTSNSHINT_STATEMENT <> 0 then
      Hint := Hint + 'Statement ';
    if hints and TTSNSHINT_COMMAND <> 0 then
      Hint := Hint + 'Command ';
    if hints and TTSNSHINT_EXCLAMATION <> 0 then
      Hint := Hint + 'Exclamation ';
    if hints and TTSNSHINT_EMPHASIS <> 0 then
      Hint := Hint + 'Emphasis';
  end
  else
    Hint := 'none';
  Log('OnVisual: hint = %s', [Hint]);
  if Char(EnginePhoneme) <> #32 then
    memEnginePhonemes.Text := memEnginePhonemes.Text + Char(EnginePhoneme)
end;
Note that the OnVisual event
offers plenty of information that is already used by the ActiveX mouth animation.
The only use I see for much of this is if you were to hide the ActiveX in order
to perform your own animation.
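If you did take that route, a crude sketch (assuming a hypothetical TPaintBox
named pbMouth plus two integer form fields, FMouthWidth and FMouthHeight, none
of which exist in the sample project) could cache the metrics in OnVisual and
repaint:
//In the OnVisual handler, after the logging shown above:
FMouthWidth := bMouthWidth;
FMouthHeight := MouthHeight;
pbMouth.Invalidate;  //trigger a repaint using the new metrics

procedure TfrmTextToSpeechControl.pbMouthPaint(Sender: TObject);
begin
  //Draw a crude ellipse sized by the cached mouth metrics
  pbMouth.Canvas.Brush.Color := clMaroon;
  pbMouth.Canvas.Ellipse(
    (pbMouth.Width - FMouthWidth) div 2,
    (pbMouth.Height - FMouthHeight) div 2,
    (pbMouth.Width + FMouthWidth) div 2,
    (pbMouth.Height + FMouthHeight) div 2);
end;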
Voice Commands Control
The Microsoft Voice Commands control is an ActiveX that wraps up the high level
Voice Command API. To use it you must first import the ActiveX into Delphi;
you will find it described as Microsoft Voice Commands (Version 1.0).
This will generate and install a type library import unit called HSRLib_TLB.pas.
The import unit contains the ActiveX component wrapper class called TVcommand.
The ActiveX is implemented in Xcommand.dll in the Windows speech directory
(whose version information describes it as the High-Level Speech Recognition
Module) and the primary interface implemented is IVcommand.
However it also surfaces parts of other interfaces such as IVCmdAttributes
and IVCmdDialogs.
You can programmatically work with this ActiveX control using the ProgID Vcommand.Vcommand
or the ClassID CLASS_Vcommand
from the HSRLib_TLB unit. The Windows registry describes this class as the VCommand
Class.
Alternatively (and more typically) you can simply drop the ActiveX component
on a form. This is done in the sample project
VoiceCommandsControl.dpr in the ActiveX directory. This project does
much the same job as the Voice Command Automation examples (although we have
access to both the enabled and awake states with the ActiveX):
procedure TfrmVoiceCommandsControl.FormCreate(Sender: TObject);
begin
  Vcommand.Initialized := Integer(True);
  chkEnabled.Checked := Bool(Vcommand.Enabled);
  chkAwake.Checked := Bool(Vcommand.AwakeState);
  CreateCommandMenu;
end;

procedure TfrmVoiceCommandsControl.chkEnabledClick(Sender: TObject);
begin
  Vcommand.Enabled := Integer(chkEnabled.Checked);
end;

procedure TfrmVoiceCommandsControl.chkAwakeClick(Sender: TObject);
begin
  Vcommand.AwakeState := Integer(chkAwake.Checked);
end;

procedure TfrmVoiceCommandsControl.CreateCommandMenu;

  procedure AddMenuCommands(Item: TMenuItem; const ParentPath: String);
  var
    I: Integer;
    Path: String;
  begin
    Path := ParentPath + StripHotKey(Item.Caption);
    //Recurse through subitems, if any
    if (Item.Count > 0) then
      for I := 0 to Item.Count - 1 do
        AddMenuCommands(Item.Items[I], Path + #32)
    //Otherwise add this item, if appropriate
    else
      if (Item.Caption <> '') and (Item.Caption <> '-') then
        //Must pass non-blank strings for all WideString parameters
        Vcommand.AddCommand(CmdMenu, Item.Command, Path, Item.Hint,
          'Menu', VCMDCMD_CANTRENAME, ' ');
  end;

begin
  CmdMenu := Vcommand.MenuCreate[ExtractFileName(Application.ExeName),
    'Main', VCMDMC_CREATE_TEMP];
  AddMenuCommands(Menu.Items, '');
  Vcommand.Activate(CmdMenu);
end;
The call to Vcommand.AddCommand
passes a command flag as the penultimate parameter. The VCMDCMD_CANTRENAME
flag means that this command cannot be renamed by applications that let users
customise Voice Commands (such as Microsoft Voice). Since none of these commands
require verification, the VCMDCMD_VERIFY
flag is not passed.
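For illustration, if one of the commands did warrant confirmation, the flags
could be combined when adding it. A hedged one-line sketch, following the
AddCommand parameter order used above (the ID constant and the strings here
are invented):
//Hypothetical: request verification before this command takes effect
Vcommand.AddCommand(CmdMenu, CM_CLEARLOG, 'clear the log', 'Clears the message log',
  'Menu', VCMDCMD_CANTRENAME or VCMDCMD_VERIFY, ' ');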
You can set up list commands with the ActiveX control, but
when a list command is interpreted the OnCommandRecognize
event will not tell you which list item was said (the ListValues
parameter will be blank). However, the NumLists
parameter is set up correctly to tell you the number of bytes taken up by the
list item. Since the Voice Commands Control implements the Unicode notification
interface, you must divide this by SizeOf(WideChar)
and then subtract one (for the null terminator) to get the number of characters.
If your list command is simple enough, this may be enough for you to work out
which list item was spoken.
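As a sketch of that arithmetic, inside the OnCommandRecognize handler (the
NumLists parameter name follows the description above; the handler signature
is abbreviated here):
var
  CharCount: Integer;
begin
  //NumLists holds the byte length of the Unicode list item, including
  //its null terminator, so convert bytes to characters like this
  CharCount := NumLists div SizeOf(WideChar) - 1;
  Log('List item length: %d characters', [CharCount]);
end;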
The ActiveX control sends more notifications so we have the chance to reinstate
the VU meter this time.
procedure TfrmVoiceCommandsControl.VcommandVUMeter(Sender: TObject; Level: Integer);
begin
  ProgressBar.Position := Level;
  lblVU.Caption := IntToStr(Level);
end;
There appears to be no way to specify whether the Voice
Menus should be local or global. Since this Voice Menu should be local, the
program itself disables the menu when the application loses focus and re-enables
it when the application gains focus. To do this it uses a TApplicationEvents
component (a component that surfaces the Application
object's events to the Object Inspector) and its OnActivate
and OnDeactivate event handlers.
procedure TfrmVoiceCommandsControl.ApplicationEvents1Activate(
  Sender: TObject);
begin
  Vcommand.Activate(CmdMenu);
end;

procedure TfrmVoiceCommandsControl.ApplicationEvents1Deactivate(
  Sender: TObject);
begin
  Vcommand.Deactivate(CmdMenu);
end;

Voice Dictation Control
The Microsoft Voice Dictation control is an ActiveX that wraps up the high
level Voice Dictation API. To use it you must first import the ActiveX into
Delphi; you will find it described as Microsoft Voice Dictation (Version
1.0).
This will generate and install a type library import unit called DICTLib_TLB.pas.
The import unit contains the ActiveX component wrapper class called TVdict.
The ActiveX is implemented in Vdict.dll in the Windows speech directory (whose
version information describes it as the Voice Dictation Module) and the
primary interface implemented is IVdict.
However it also surfaces parts of other interfaces such as IVDctAttributes
and IVDctDialogs.
You can programmatically work with this ActiveX control using the ProgID Vdict.Vdict
or the ClassID CLASS_Vdict from
the DICTLib_TLB unit. The Windows registry describes this class as the Vdict
Class.
Alternatively (and more typically) you can simply drop the ActiveX component
on a form. A sample project using this control can be found in the accompanying
files, VoiceDictationControl.dpr in the ActiveX directory. It offers the
same functionality as the Voice Dictation API sample from
earlier.
Until I had used the Voice Dictation API successfully, this
control caused Delphi to hang as soon as it was dropped on a form.
It starts up a copy of the Microsoft Voice Commands Automation server and then
both that and the Delphi IDE start consuming apparently endless chunks of virtual
memory. I never got to the bottom of why this happened, but it no longer does.
The OnPhraseStart
event is not triggered with this control (just as it was not with the Voice
Command API).
Speech Recognition Troubleshooting
If SR stops (or fails to start) unexpectedly, or you run into other strange
SR behaviour, check that your recording settings have the microphone enabled.
- Double-click the Volume icon in your Task Bar's System Tray. If no Volume
icon is present, choose Start | Programs | Accessories | Entertainment |
Volume Control.
- If you see a Microphone column, ensure it has its Mute checkbox checked.
- Choose Options | Properties, click Recording, ensure the Microphone option
is checked and press OK.
- Now ensure the Microphone column has its Select checkbox enabled, if it has
one, or that its Mute checkbox is unchecked, if it has one.
SAPI 4 Deployment
When distributing SAPI 4 applications you will need to supply the redistributable
components (available as spchapi.exe from http://www.microsoft.com/speech/download/old).
It is also advisable to deploy the Speech Control Panel application (available
as spchcpl.exe from http://www.microsoft.com/msagent/downloads.htm);
however, this Control Panel applet will not install on any version of Windows
later than Windows 2000.

The Microsoft SAPI 4 compliant TTS engine can be downloaded from various sites
(although not Microsoft's), such as http://misterhouse.net:81/public/speech
or http://www.cs.cofc.edu/~manaris/SUITEKeys.
As well as the Microsoft TTS engine, you can download additional TTS engines
from Lernout & Hauspie (including one that uses a British English voice)
at http://www.microsoft.com/msagent/downloads.htm.
If you plan to use any of these engines from applications running
under user accounts without administrative privileges, you need to do some
registry tweaking, as described in http://www.microsoft.com/msagent/detail/tts3000deploy.htm.
You can download the Microsoft Speech Recognition engine for use with SAPI
4 from http://www.microsoft.com/msagent/downloads.htm.
References/Further Reading
The following is a list of useful articles and papers that I found on SAPI 4
development during my research on this subject.
- Using Microsoft OLE Automation Servers to Develop Solutions by Ken Lassesen,
MSDN Office Development (General) Technical Articles, October 1995.
This shows VB Automation against the Voice Text Object and Voice Command Object
(no callbacks used).
- A High-Level Look at Text-to-Speech via the Microsoft Voice Text Object
by Robert Coleridge, MSDN Windows User Interface Technical Articles, October 1995.
Shows a VB example of using Automation against the Voice Text Object.
- An Overview of the Microsoft Speech API by Mike Rozak, November 1998.
Looks briefly at the high level and low level SR and TTS interfaces in the
SAPI 4 SDK.
- Talk to Your Computer and Have It Answer Back with the Microsoft Speech API
by Mike Rozak, Microsoft Systems Journal, January 1996.
Uses the Voice Command and Voice Text APIs to implement a clock that tells
the time when asked.
- Making Delphi Talk: Using Speech Technology with your Delphi Apps by Glenn
Stephens, Unofficial Newsletter of Delphi Users, January 1999.
Uses late-bound Automation against the Voice Text Object available through
the Speech.VoiceText ProgID, including setting up the callback object.
- Making Delphi Listen by Glenn Stephens, Unofficial Newsletter of Delphi
Users, January 1999.
Uses early-bound Automation against the Voice Command Object to implement
command and control.
About Brian Long
Brian Long used to work at Borland UK, performing a number of duties including
Technical Support on all the programming tools. Since leaving in 1995, Brian
has been providing training and consultancy on Borland's RAD products, and is
now moving into the .NET world.
Besides authoring a Borland Pascal problem-solving book published in 1994,
Brian is a regular columnist in The Delphi Magazine and has had numerous articles
published in Developer's Review, Computing, Delphi Developer's Journal and
EXE Magazine. He was nominated for the Spirit of Delphi 2000 award and was
voted Best Speaker at Borland's BorCon 2002 conference in Anaheim, California
by the conference delegates.
There are a growing number of conference papers and articles available on
Brian's Web site, so feel free to have a browse.
In his spare time (and while waiting for his C++ programs to compile) Brian
has learnt the art of juggling and making inflatable origami paper frogs.
Go to the speech capabilities overview
Go back to the top of this SAPI 4 High Level Interfaces article
Go to the SAPI 4 Low Level Interfaces article
Go to the SAPI 5.1 article