Empirical Answer: 256
When I put more than 256 rules in a grammar and try to load it, I get:
System.OutOfMemoryException: Insufficient memory to continue the execution of the program.
   at SpeechLib.ISpeechRecoGrammar.CmdSetRuleIdState(Int32 RuleId, SpeechRuleState State)
   at SwitchSapi.Form1.OnLoad(EventArgs e)
(SAPI 5.1)
The good news is that you can still make a larger grammar; you just can't have more than 256 top-level rules.
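If you need more than that, one way to structure the workaround is to keep most rules non-top-level and reference them from a top-level rule. A minimal sketch in the SAPI XML grammar format (the rule names here are made up for illustration):
<GRAMMAR LANGID="409">
<RULE NAME="toplevel" TOPLEVEL="ACTIVE">
<L>
<RULEREF NAME="colors"/>
<RULEREF NAME="shapes"/>
</L>
</RULE>
<RULE NAME="colors">
<L>
<P>red</P>
<P>green</P>
</L>
</RULE>
<RULE NAME="shapes">
<L>
<P>circle</P>
<P>square</P>
</L>
</RULE>
</GRAMMAR>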
Thursday, May 24, 2007
Blah Blah Blah...
In the speech properties panel (Start -> Control Panel -> Speech -> Speech Recognition tab -> Language) you may have two options: the Microsoft English Recognizer v5.1 and the SAPI developer sample engine.
I use the Microsoft English Recognizer v5.1, but occasionally it gets accidentally switched. I think this has caused the following behaviour:
* actually outputting "blah blah blah", much to my confusion (because it wasn't in the grammar)
* suddenly recognizing much faster but much more poorly
If you are experiencing any of these things, check your speech properties.
Wednesday, February 21, 2007
Using the SpFileStreamClass Seek Method
After you have opened your file for reading:
ISpeechFileStream sfs = new SpFileStreamClass();
sfs.Open(@"C:\dev\Speech Data\sapiWav\wavfile.wav", SpeechStreamFileMode.SSFMOpenForRead, false);
the Seek method can be used to:
1. return the current audio stream position in bytes:
decimal position = (decimal)sfs.Seek(0, SpeechStreamSeekPositionType.SSSPTRelativeToCurrentPosition);
2. move the current audio stream position backward or forward in bytes:
// I have a stereo wav file at 16000 Hz with 16-bit samples, so I use 64000 bytes per second
const int stereoWavBytesPerSecond = 64000;
const int monoWavBytesPerSecond = 32000;
// move forward 4 seconds for a stereo file
sfs.Seek(4 * stereoWavBytesPerSecond, SpeechStreamSeekPositionType.SSSPTRelativeToCurrentPosition);
SpeechStreamSeekPositionType values include: SSSPTRelativeToEnd, SSSPTRelativeToStart, SSSPTRelativeToCurrentPosition
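Rather than hardcoding the byte rate, you can derive it from the file's own wave format. A sketch, assuming the stream's Format property and its GetWaveFormatEx method from the SpeechLib automation interfaces:
// derive bytes-per-second from the stream's wave format instead of hardcoding it
ISpeechWaveFormatEx fmt = sfs.Format.GetWaveFormatEx();
// move forward 4 seconds regardless of channel count or sample rate
sfs.Seek(4 * fmt.AvgBytesPerSec, SpeechStreamSeekPositionType.SSSPTRelativeToCurrentPosition);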
Monday, February 19, 2007
C# starter code for SAPI 5.3 speech recognition from microphone under WPF
So despite the indications here and here to the contrary, it appears that it is possible to successfully create a SAPI 5.3 application to run on Windows XP. Here are the steps for getting a bare-bones C# program (using .NET 3.0 Windows Presentation Foundation to run as a Window application) up and running to recognize speech input from the microphone using SAPI 5.3.
Caveat: When using SAPI 5.3 on Windows XP in a WPF Window application, it appears that you cannot use the more complex SpeechRecognitionEngine class, but have to resort to using the SpeechRecognizer class. One of the limitations that this entails is that you cannot specify an audio file as an input into the recognizer.
- Install .NET Framework 3.0 from here.
- (Optional) If you want the additional tools for Visual Studio 2005 to facilitate development using .NET Framework 3.0, install the following two components:
- Microsoft® Windows® Software Development Kit for Windows Vista™ and .NET Framework 3.0 Runtime Components (only the Documentation is needed for installing the next component)
- Visual Studio 2005 extensions for .NET Framework 3.0 (WCF & WPF), November 2006 CTP (provides support for visually editing XAML files)
- Create a new C# Windows Application (WPF) in Visual Studio 2005.
- In the Solution Explorer, right click on References under your project node, and select Add Reference....
- In the .NET tab, select System.Speech (verify it's version 3.0.0.0), and click OK.
- Double click Window1.xaml (if you didn't do Step 2 above, then you'll have to right click on it, select Open with..., and choose XML editor), and add the following snippet inside the <Grid> </Grid> element:
<ScrollViewer>
<TextBox x:Name="result_textBox" TextWrapping="WrapWithOverflow"
ScrollViewer.CanContentScroll="True"></TextBox>
</ScrollViewer>
- Change your Window1.xaml.cs code to the following:
using System;
using System.Speech;
using System.Speech.Recognition;
namespace SimpleSAPI_5_3
{
public partial class Window1 : System.Windows.Window
{
// whether to use the command and control grammar or the dictation grammar
bool commandAndControl = false;
SpeechRecognizer _speechRecognizer;
public Window1()
{
InitializeComponent();
// set up the recognizer
_speechRecognizer = new SpeechRecognizer();
_speechRecognizer.Enabled = false;
_speechRecognizer.SpeechRecognized +=
new EventHandler<SpeechRecognizedEventArgs>(_speechRecognizer_SpeechRecognized);
// set up the dictation grammar
DictationGrammar dictationGrammar = new DictationGrammar();
dictationGrammar.Name = "dictation";
dictationGrammar.Enabled = true;
// set up the command and control grammar
Grammar commandGrammar = new Grammar(@"grammar.xml");
commandGrammar.Name = "main command grammar";
commandGrammar.Enabled = true;
// activate one of the grammars if we don't want both at the same time
if (commandAndControl)
_speechRecognizer.LoadGrammar(commandGrammar);
else
_speechRecognizer.LoadGrammar(dictationGrammar);
// now that a grammar is loaded, enable the recognizer
_speechRecognizer.Enabled = true;
}
protected override void OnClosing(System.ComponentModel.CancelEventArgs e)
{
_speechRecognizer.UnloadAllGrammars();
_speechRecognizer.Dispose();
base.OnClosing(e);
}
void _speechRecognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
result_textBox.AppendText(e.Result.Text + "\n");
}
}
}
Note that the dictation grammar and the command-and-control grammar can both be active at the same time.
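If you do want both grammars active at once, a minimal sketch that replaces the if/else in the constructor above:
// load both grammars instead of picking one
_speechRecognizer.LoadGrammar(dictationGrammar);
_speechRecognizer.LoadGrammar(commandGrammar);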
SAPI 5.3 uses W3C's Speech Recognition Grammar Specification (SRGS) Version 1.0 for its grammar files (see here for the grammar specification). To use a command-and-control grammar, set commandAndControl to true, and save the following file as grammar.xml in the same directory as the executable:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN"
"http://www.w3.org/TR/speech-grammar/grammar.dtd">
<!-- the default grammar language is US English -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/06/grammar
http://www.w3.org/TR/speech-grammar/grammar.xsd"
xml:lang="en-US" version="1.0" root="command">
<rule id="command" scope="public">
<one-of>
<item>selected</item>
<item>interface</item>
<item>default</item>
</one-of>
</rule>
</grammar>
Sunday, February 18, 2007
Working with grammars and recognition contexts
Here are several other places in the SAPI 5.1 SDK documentation that clarify the relationship between grammars and recognition contexts and their appropriate usage:
Automation -> Sp[InProc/Shared]RecoContext
- "An application may have several recognition contexts open at the same time, each controlling a different part of the application."
- "Applications may have more than one recognition context. In fact, it is recommended to have as many as makes sense."
- "A recognizer may have more than one grammar associated with it although they are usually limited to one each of two types: dictation and context free grammar (CFG)."
- "Each ISpRecoGrammar object can contain both a context-free grammar (CFG) and a dictation grammar simultaneously."
Thursday, February 15, 2007
C# starter code for SAPI 5.1 speech recognition from audio file
Here's how you would change the code in the previous post to recognize speech input from an audio file instead of the microphone.
First, create a wav file with some utterance using audio recording software. Most formats should be supported, but a good choice would be a 44,100 Hz 16-bit mono wav file.
Simply replace the part of the code between /****** BEGIN: set up recognition context *****/ and /****** END: set up recognition context *****/ with the following snippet:
Caveat: Recognizing from an audio file only works with an in-process recognizer.
/****** BEGIN: set up recognition context *****/
result_textBox.AppendText("File mode\n");
// create an audio file stream
ISpeechFileStream sfs = new SpFileStreamClass();
sfs.Open(@"recording.wav", SpeechStreamFileMode.SSFMOpenForRead, false);
// create the recognition context
recoContext = new SpeechLib.SpInProcRecoContext();
recoContext.Recognizer.AudioInputStream = sfs;
((SpInProcRecoContext)recoContext).Recognition +=
new _ISpeechRecoContextEvents_RecognitionEventHandler(RecoContext_Recognition);
/****** END: set up recognition context *****/
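If you also want to know when the file has been fully processed, the recognition context raises an EndStream event. A sketch of hooking it up, assuming the event delegate naming that the SpeechLib interop assembly generates:
((SpInProcRecoContext)recoContext).EndStream +=
new _ISpeechRecoContextEvents_EndStreamEventHandler(RecoContext_EndStream);
// and elsewhere in the class:
void RecoContext_EndStream(int StreamNumber, object StreamPosition, bool StreamReleased)
{
result_textBox.AppendText("end of audio stream\n");
}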
Wednesday, February 14, 2007
C# starter code for SAPI 5.1 speech recognition from microphone
Here are the steps for getting a bare-bones C# program up and running to recognize speech input from the microphone using SAPI 5.1.
Caveat: SAPI 5.1 does not work under a C# console application, due to the Automation API's dependence on Windows' Message Pump, so you have to create a Form-based application.
- Create a new C# Windows Application in Visual Studio 2005.
- In the Solution Explorer, right click on References under your project node, and select Add Reference....
- Click on the COM tab, select Microsoft Speech Object Library (verify it's version 5.0), and click OK.
- Double click Form1.cs, and add a TextBox control, set its Multiline behavior property to True, change the Name design property to "result_textBox", and resize the control on the Form to an appropriate size (this will be where the recognized text will be output).
- Change your Form1.cs code to the following:
using System.Windows.Forms;
using SpeechLib;
namespace SimpleSAPI
{
public partial class Form1 : Form
{
// whether to use the command and control grammar or the dictation grammar
bool commandAndControl = false;
ISpeechRecoContext recoContext;
ISpeechRecoGrammar grammar;
public Form1()
{
InitializeComponent();
}
protected override void OnLoad(System.EventArgs e)
{
/****** BEGIN: set up recognition context *****/
result_textBox.AppendText("Dictation mode\n");
// create the recognition context
recoContext = new SpeechLib.SpSharedRecoContext();
((SpSharedRecoContext)recoContext).Recognition +=
new _ISpeechRecoContextEvents_RecognitionEventHandler(RecoContext_Recognition);
/****** END: set up recognition context *****/
// set up the grammar
grammar = recoContext.CreateGrammar(0);
// set up the dictation grammar
grammar.DictationLoad("", SpeechLoadOption.SLOStatic);
grammar.DictationSetState(SpeechRuleState.SGDSInactive);
// load the command and control grammar
grammar.CmdLoadFromFile(@"grammar.xml", SpeechLoadOption.SLOStatic);
grammar.CmdSetRuleIdState(0, SpeechRuleState.SGDSInactive);
// activate one of the grammars if we don't want both at the same time
if (commandAndControl)
grammar.CmdSetRuleIdState(0, SpeechRuleState.SGDSActive);
else
grammar.DictationSetState(SpeechRuleState.SGDSActive);
}
protected override void OnClosing(System.ComponentModel.CancelEventArgs e)
{
recoContext.State = SpeechRecoContextState.SRCS_Disabled;
base.OnClosing(e);
}
void RecoContext_Recognition(int StreamNumber, object StreamPosition,
SpeechRecognitionType RecognitionType, ISpeechRecoResult Result)
{
result_textBox.AppendText(Result.PhraseInfo.GetText(0, -1, true) + "\n");
}
}
}
Note that the dictation grammar and the command-and-control grammar can both be active at the same time within the same speech recognition context.
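If you do want both active at once, a minimal sketch that replaces the if/else in OnLoad above:
// activate both the dictation and the command-and-control grammars
grammar.DictationSetState(SpeechRuleState.SGDSActive);
grammar.CmdSetRuleIdState(0, SpeechRuleState.SGDSActive);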
To use a command-and-control grammar, set commandAndControl to true, and save the following file as grammar.xml in the same directory as the executable:
<GRAMMAR LANGID="409">
<RULE NAME="toplevel" TOPLEVEL="ACTIVE">
<L>
<P>selected</P>
<P>interface</P>
<P>default</P>
</L>
</RULE>
</GRAMMAR>
Note that the LANGID should be set to 409, the hexadecimal LANGID for US English.
Tuesday, February 13, 2007
Understanding SAPI 5.1
The SAPI 5.1 SDK comes with a documentation file (or you can download just the documentation from here) in the form of Windows Help, but it's not easily navigable.
Some good places to start in the Contents tab (after opening Start -> All Programs -> Microsoft Speech SDK 5.1 -> Microsoft Speech SDK 5.1 Help) are:
- Automation -> Sp[Shared/InProc]Recognizer: Description of the interface to the underlying speech recognition engine and its different types (shared versus in-process).
- Automation -> Sp[Shared/InProc]RecoContext: A nice description of what "Recognition Contexts" are, and how one should create as many of them as appropriate for the application.
- Automation -> ISpeechPhraseRule -> Code Example: Lists the properties that can be queried on a phrase rule that was recognized, including rule name and confidence values.
- Automation -> ISpeechPhraseElement -> Code Example: Lists the properties that can be queried on a phrase element, including confidence values.
- Automation -> ISpeechPhraseProperty -> Confidence: Example of how the confidence values can be extracted along with their corresponding property names.
- Automation -> ISpeechAlternate: A way to get at a list of alternate phrase candidates for dictation mode recognition.
- Automation -> Sp[Shared/InProc]RecoContext (Events): The list of events that the recognition context can receive, and thus that clients can listen for.
- Application-Level Interfaces -> Grammar Compiler Interfaces -> Text Grammar Format: Description of the context-free grammar format used for command and control (as opposed to dictation) recognition.
- White Papers -> SAPI 5.0 SR Properties White Paper: The list of recognition engine properties that can be queried and set using the SetPropertyNumber method of the Sp[Shared/InProc]Recognizer class, including the confidence thresholds (see the sketch below).
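As a concrete illustration of that last item, here is a sketch of setting one such property, assuming the engine supports the CFGConfidenceRejectionThreshold property described in the white paper:
// raise the command-and-control confidence rejection threshold;
// SetPropertyNumber returns false if the engine does not support the property
SpSharedRecoContext ctx = new SpSharedRecoContext();
bool supported = ctx.Recognizer.SetPropertyNumber("CFGConfidenceRejectionThreshold", 70);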
Tuesday, February 6, 2007
Speech Recognition Profile Manager Tool
Once you have gone through the trouble of training a speech profile (Control Panel -> Speech -> Train Profile), you can save it and load it on another machine using the Speech Recognition Profile Manager Tool. It is also useful if you are running user studies and want to make sure to back up your training sets for analysis.
http://www.microsoft.com/downloads/details.aspx?FamilyID=cd72250f-2e02-430e-8f99-e1acae760564&DisplayLang=en
Where to seek help
A good place to look for help from the developer community is the microsoft.public.speech_tech.sdk newsgroup:
http://groups.google.com/group/microsoft.public.speech_tech.sdk/topics