Updated Aug 11, 2008 by gundlach
Labels: Featured
Discussion  
Discussion or questions about the speech module.

The wiki makes for a pretty simple forum. Feel free to discuss or ask questions below. I'll be happy to help you out.


Comment by drfrog666, Aug 14, 2008

I'm getting

ImportError: No module named win32com.client

Is there other stuff that needs to be installed? Speech SDK or COM stuff?

Comment by gundlach, Aug 14, 2008

Right on both counts. From the Installation notes on the homepage:

If you don't already have it, you'll also need pywin32 (for Python 2.5 or for Python 2.4) and the Microsoft Speech kit (installer here).
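
A quick way to see which Python prerequisites are still missing, before trying import speech, is to ask the interpreter directly. This is only a minimal sketch and not part of pyspeech: missing_modules is a hypothetical helper, and it relies on importlib.util, so it assumes a modern Python 3 rather than the 2.4/2.5 interpreters mentioned above.

```python
import importlib.util


def missing_modules(names):
    """Return the entries of `names` that this interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised when a parent package (e.g. win32com) is itself absent.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Calling missing_modules(["win32com.client"]) on a machine without pywin32 returns the name back in the list. Note that this only covers Python modules; the Microsoft Speech kit is not a Python package and cannot be detected this way.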

Could you let me know how you found my module and how you went about getting it onto your system, so I can know better how to make those instructions less missable?

Michael

Comment by drfrog666, Aug 14, 2008

winxp

I think I just read too fast

I found out about it on reddit.com

everything seems good after installing those two modules

thanks

Comment by Da.Dekudude, Aug 17, 2008

Think you could add an optional parameter to speech.input() that allows text, as well as speech? That would be very useful, particularly when it is having trouble recognizing voice.

Comment by gundlach, Aug 18, 2008

Hi,

It might be doable, but I'm not sure -- the builtin raw_input() function blocks the thread while you enter text, so if speech.input() wanted to allow text input as well, I think it would be forced to call raw_input(), thus forcing the user to type something in order to continue. Maybe it could be done by listening for individual keystroke events, or running things on multiple threads, but no guarantees.

As this would also complicate speech.input()'s interface, I'm leaving it out for now. If more people clamor for this, or if someone sent a working patch, I'd be more inclined to support it.

Be aware that speech.input() can be cancelled with Ctrl-C, so if you're having trouble with speech recognition, you could try something like

try:
  answer = speech.input("Say something, or press Ctrl-C if frustrated.")
except KeyboardInterrupt:
  answer = raw_input("OK, just type your answer: ")

Good luck, Michael

Comment by Da.Dekudude, Aug 20, 2008

You have a function that checks for voice input, but does other stuff until that happens, right? What if you made a raw_input, but if voice comes, skip it? Would that be possible?

Comment by gundlach, Aug 21, 2008

Right, that's the "multiple thread" approach I mentioned above -- not sure if it could be made to work, but possible. If you do get it to work, by all means post it here for everyone else's benefit! :)
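
For anyone who wants to try it, the general shape of that idea can be sketched as a race between blocking calls, each on its own thread. This is an untested assumption as far as speech.py goes: first_answer is a hypothetical helper, it is written for Python 3, and whether speech.input() behaves when called off the main thread has not been verified.

```python
import queue
import threading


def first_answer(sources, timeout=None):
    """Run each blocking callable in `sources` (a dict of name -> callable)
    on its own thread and return (name, value) from whichever finishes first."""
    results = queue.Queue()

    def run(name, func):
        results.put((name, func()))

    for name, func in sources.items():
        # Daemon threads, so a still-blocked loser cannot keep the program alive.
        threading.Thread(target=run, args=(name, func), daemon=True).start()

    # Blocks until the first result arrives; raises queue.Empty on timeout.
    return results.get(timeout=timeout)
```

In principle this would be called with something like {"voice": lambda: speech.input("Say something"), "keyboard": input}. The losing thread is not cancelled, so a pending keyboard prompt would still be sitting there waiting for Enter, which is exactly the awkward part of this approach.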

Comment by Da.Dekudude, Aug 24, 2008

I wouldn't be able to do it-- I'm no good with Python as of yet. :P

Is it possible to have this work even if the Python Window is minimized? Meaning, the window is minimized to the task bar, but if I talk to the mic, it still listens, and acts like a normal Python script? Thanks! I'm coding myself a laptop robot (because I'm kind of lazy, and I'd love to have her help me out :P) and it would be great if I could have the window minimized, and still be able to talk to her.

Comment by gundlach, Aug 25, 2008

Yep, talking to a minimized window works fine.

Comment by arunkakorp, Aug 31, 2008

I have Python 2.5 on Win XP Prof SP 2. I have installed the Microsoft Speech SDK and pywin32 linked in your install page.

When I first tried 'import speech' in IDLE I got:

File "C:\Python25\Lib\site-packages\win32com\client\gencache.py", line 554, in AddModuleToCache
  dict = mod.CLSIDToClassMap
AttributeError: 'module' object has no attribute 'CLSIDToClassMap'

After a little googling, I found a post regarding deleting the 'gen_py' directory in %WINDIR%\Temp. I did that and am now getting this error for which I can't seem to find any solution:

>>> import speech

Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    import speech
  File "C:\Python25\lib\site-packages\speech-0.5.1-py2.5.egg\speech.py", line 112, in <module>
TypeError: Error when calling the metaclass bases
    cannot create 'NoneType' instances

Comment by gundlach, Aug 31, 2008

I'm sorry that you're having trouble! I'm using the same setup as you (except XP Home) and can't reproduce this.

A couple of suggestions:

  1. Type import speech in the regular Python shell (Start -> Run -> "cmd" -> "python.exe" or maybe "python25.exe") and see if you still have a problem. If not, IDLE is somehow playing badly with pywin32.
  2. If you feel like doing some digging, get the .tar.gz version of pyspeech (also available on PyPI for download), unzip it into a directory, and try import speech again from within that directory. Aka Start -> Run -> "cmd", then "cd <whatever unzipped directory has the file 'speech.py' in it>", then "python.exe". When you import speech it will use that local speech.py file rather than the installed egg, and your traceback will have more info in it.
  3. If all else fails, ask on the pywin32 mailing list, as this seems to be a problem with pywin32 as opposed to speech.py.

Good luck and please follow up with your findings,

Michael

Comment by arunkakorp, Aug 31, 2008

A quick follow up. The problem turned out to be linked to the first error in my post above. There are two 'gen_py' directories, one in %WINDIR%\Temp, and the other in %PYTHON%\Lib\site-packages\win32com\. Deleting the one in site-packages and re-importing the speech module fixed the problem for me.
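
The cleanup described above can be sketched as a small script. clear_gen_py is a hypothetical helper, written for Python 3; the two parent directories are the ones named in this comment, and it assumes pywin32 will rebuild its generated cache the next time it is needed.

```python
import os
import shutil


def clear_gen_py(parent_dirs):
    """Delete the 'gen_py' folder under each directory in `parent_dirs`.

    Returns the list of folders that were actually removed.
    """
    removed = []
    for parent in parent_dirs:
        cache = os.path.join(parent, "gen_py")
        if os.path.isdir(cache):
            shutil.rmtree(cache)
            removed.append(cache)
    return removed
```

On the setup described here the parents would be os.path.join(os.environ["WINDIR"], "Temp") and the win32com folder under site-packages. Directories that have no gen_py folder are simply skipped.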

Arun

Comment by gundlach, Aug 31, 2008

Thanks, Arun. I should have paid more attention and noticed you mentioning the Temp directory... glad you set things right.

Michael

Comment by grauwelf, Sep 23, 2008

What about non-standard voices? How can I choose the voice?

Grauwelf

Comment by gundlach, Sep 24, 2008

I only support very simple text-to-speech. Check out pyTTS for lots of flexibility in choosing voices. My module is more focused on speech-to-text.

Comment by tang.jiyu, Oct 20, 2008

Hi,

It's very kind of you to provide this speech.py. It's very helpful.

I want to write an application that runs in the background, grabs the user's voice input, and does the speech-to-text work.

But I don't know how to do this. Is it possible to write a Python app that runs in the background and still gets the user's voice input as normal? Do you have any ideas about this?

Any hint would be much appreciated. Thanks in advance.

Comment by gundlach, Oct 21, 2008

Hi Jiyu,

I've just responded to your private email as well -- sorry to take so many days to get back to you.

I believe that installing the Microsoft Speech SDK will fix the problem you were having, at which point the example code will work as intended. It is the basic building block for a background python app that gets user voice input and runs a callback function on the text.

I wrote http://musicbutler.googlecode.com as a test application, which runs in the background and controls your stereo in response to speech input. It's still in alpha, but it's a proof-of-concept for you if the example.py doesn't provide enough detail.

Thanks and good luck!

Comment by Da.Dekudude, Jan 02, 2009

I'm not sure if you are still supporting this, but if so, would it be possible for you to write a function that adds words to the dictionary (temporarily?)

For example, if I say "I have a bad case of the heebie-geebies", the system goes "what?" and comes back with something completely different. Would it be possible for you to create a function that takes the phrase "heebie-geebies" as a parameter (perhaps in an array), so that it is recognized when no better match can be found?

I recall playing with: http://surguy.net/articles/speechrecognition.xml

And though I couldn't get it to work correctly, there is one snippet that could prove quite useful:

if __name__ == '__main__':
    wordsToAdd = "One", "Two", "Three", "Four"
    speechReco = SpeechRecognition(wordsToAdd)
    while 1:
        pythoncom.PumpWaitingMessages()

Any chance of this happening?

Comment by gundlach, Jan 03, 2009

AFAIK, speech either works in dictation mode, in which it recognizes words based off of an unmodifiable dictionary, or in command mode, in which it tries to match your string of text to what you said.

The snippet you refer to (specifying words, then pumping messages) is done under the hood in speech.py by

listener = speech.listenfor(["one", "two", "three", "four"])

which pumps messages as long as listener.islistening().

So the short answer is, I don't think it's doable. The best you can hope for is specifying specific phrases that you expect it to hear, or else work with the standard dictation dictionary.

Good luck, Michael

Comment by titusz.pan, Feb 03, 2009

is it possible to use an audio file as input for speech recognition?

Comment by gundlach, Feb 05, 2009

titusz.pan, I haven't tried it, but I would love to know if it worked! I'm thinking about making a Skype audio bot, and that would almost certainly involve feeding speech.py the audio from a call as a file.

If you can get it to work, please post here for the good of the community, and I'll roll your work into the module!

Comment by Da.Dekudude, Feb 07, 2009

And there is no way, as far as you know, to add new phrases to the standard dictionary?

Comment by gundlach, Feb 08, 2009

Correct.

Comment by titusz.pan, Feb 10, 2009

Here is a hint that Vista speech recognition can be fed audio files: http://www.mymsspeech.com/microphones/prod_details.asp?prodID=228 (see WSRToolkit feature number 7).

Comment by Da.Dekudude, Feb 20, 2009

I appear to be having trouble importing a Python file that has already imported speech.

http://code.google.com/p/pyspeech/issues/detail?id=17

Comment by gundlach, Mar 02, 2009

Thanks for your bug report. See http://code.google.com/p/pyspeech/issues/detail?id=17 for the solution.

Comment by chiragjain1989, Jul 05, 2009

While writing my Python code, how can I find out which voices are installed? On my Windows Vista machine, Microsoft Anna is the default, but if I install eSpeak for Windows, three more voices appear automatically. What I want is to discover in code which voices are available, so that I can show them in my application and let the user select one directly.

I want something like:

speak = win32com.client.Dispatch('Sapi.SpVoice')

voices = speak.get_all_voices()

so that voices contains a list of all the currently available voices. Please note that get_all_voices() is an imaginary function, used only for the sake of clarity; I want something like it. Is it available or not?

Thanks and regards

Chirag

Comment by gundlach, Jul 10, 2009

Hi Chirag, sorry for the delay in responding. pyspeech does not give you any control or insight over the installed voices; it just uses the one that you have currently selected for the system.

pyspeech is more focused on speech input, where you talk to the computer; it has minimal speech output support just as a convenience. You might check out the pyTTS package for more control over speech output.
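
Outside of pyspeech, the SAPI voice object itself can be asked for its voices through pywin32. The following is a rough sketch rather than anything this module provides: both function names are hypothetical, it is written for Python 3, and the COM part has not been run against a real Windows installation here.

```python
def voice_descriptions(sp_voice):
    """Return the description string of every voice token `sp_voice` exposes."""
    tokens = sp_voice.GetVoices()
    return [tokens.Item(i).GetDescription() for i in range(tokens.Count)]


def installed_voice_descriptions():
    """Windows only: query the real SAPI voice object through pywin32."""
    # Imported lazily so voice_descriptions() stays usable without pywin32.
    import win32com.client
    return voice_descriptions(win32com.client.Dispatch("SAPI.SpVoice"))
```

The first function only needs an object with a GetVoices() method, so it can be exercised with a stand-in object; the second is the one an application would call on Windows to fill a list of voice names for the user.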

Good luck!

Comment by ChrisM6794, Jul 10, 2009

This module is great! Do you know if there's any way to run two instances at one time on separate microphones? I'm thinking of having two people talking on their own USB headsets at once (different voice profiles) and it being able to listen to both of them.

Comment by gundlach, Jul 10, 2009

Man, that is a neat idea. Windows lets you do that -- have one profile trained on a male voice and the other on a female voice, or something? How do you normally tell it which profile to use for which microphone, when interacting directly with Windows?

I haven't heard of that and haven't any idea what the COM interop commands might be that would need to be implemented to get pyspeech to support that. You might ask over at the dragonfly project, as they are doing similar work and may have thought about this feature.

Comment by ChrisM6794, Jul 10, 2009

Windows lets you train separate profiles and select which one is active through the Speech Properties dialog in the Control Panel; but it looks like maybe you can only have one audio device and profile selected at any given time. But, I don't know if that's a limitation of the engine itself or if they just didn't put more options in the dialog.

Comment by gundlach, Jul 10, 2009

If Windows doesn't expose it in the UI, I'm going to guess that you can't do it through COM. But I'd be pleased to be proven wrong if you want to dig through their (poor) COM docs and find a way to do it! :)

Comment by ChrisM6794, Jul 14, 2009

I am going to take a crack at running two audio devices with separate profiles, although my head may explode in the process. I know nothing whatsoever about COM so it will be interesting, but looking at this page suggests that maybe you can set the AudioInput and the Profile properties... or something:

http://msdn.microsoft.com/en-us/library/ms722071(VS.85).aspx

If I ever get something going, I'll let you know and contribute it back.

Comment by ChrisM6794, Jul 15, 2009

I've got a basic demo up and working! It seemed incredibly complicated at first, but turned out to be pretty convenient:

- For the recognizer we use a SAPI.SpInProcRecognizer instead, which is an instance bound to a specific process instead of one shared between all processes. The interface is the same though, so all your existing code still works.

- We can iterate over the profiles with recognizer.GetProfiles() and set one for the recognizer.Profile property. If you don't pick a profile, it will default to "Default Speech Profile".

- The same goes for recognizer.GetAudioInputs() and the recognizer.AudioInput property. Note that you must specify an audio device here; it won't automatically use the default device like the shared one does.

That's actually all it takes, now you can run one process with a given audio device and profile, and a second process with a second audio and profile. It's not clear to me whether you can run both of them in the same process (probably not) but that shouldn't be a huge problem for me (we can use Popen or multiprocessing to spawn a process for each device).
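
A minimal sketch of those steps, assuming the recognizer object was created with something like win32com.client.Dispatch("SAPI.SpInProcRecognizer"). configure_recognizer is a hypothetical helper, not the code in the repository linked below; it is written for Python 3, and assigning the properties this way through pywin32 is an assumption that has not been tested here.

```python
def configure_recognizer(recognizer, input_index=0, profile_name=None):
    """Bind `recognizer` to one audio input and, optionally, one named profile."""
    inputs = recognizer.GetAudioInputs()
    # The in-process recognizer has no default device, so one must be set.
    recognizer.AudioInput = inputs.Item(input_index)

    if profile_name is not None:
        profiles = recognizer.GetProfiles()
        for i in range(profiles.Count):
            token = profiles.Item(i)
            if token.GetDescription() == profile_name:
                recognizer.Profile = token
                break
        else:
            # The loop finished without finding a matching profile.
            raise ValueError("no speech profile named %r" % profile_name)
    return recognizer
```

Each spawned process would call this once with its own input_index and profile_name. When profile_name is left as None the Profile property is not touched, so the recognizer keeps its default profile.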

You can see my code here: http://bitbucket.org/kiv/speech/ Please do whatever you want with it.

Comment by ChrisM6794, Jul 15, 2009

Wow, the wiki markup mangled my post a lot. Ah well.

Comment by gundlach, Aug 28, 2009

ChrisM6794: That's awesome! I've been busy since you wrote and haven't had a chance to check out your code until today. Way to go! I hope to have time some day to add a simple API to speech.py to optionally support multiple profiles. It would default to one, but you could specify a second if needed. I'll add a feature request for the future, pointing to your bitbucket code. (Of course, patches are very welcome :)

Comment by gundlach, Aug 28, 2009

ChrisM6794: http://code.google.com/p/pyspeech/issues/detail?id=20 . Also, I got emailed your original wiki post, so no worries, I saw it in the original form :)

Comment by lpogorman, Nov 03, 2009

This works great in XP, but in Vista the speech recognizer recognizes not only the words in my grammar (e.g., the dwarfs in one of the example programs), but also system commands. Is there any way to turn off the Microsoft Speech Recognizer from accepting system commands? I think if it were in "dictation-only mode", this would work, but I don't see the functionality to do this.

Comment by gundlach, Nov 04, 2009

lpogorman: Sorry, but I don't own a Vista machine; all my Vista testing has been through users reporting their experience.

Does this happen both when you use listenfor(phrases) and when you use listenforanything()? Or only in one of these modes?

Comment by lpogorman, Nov 04, 2009

It happens for both listenfor() and listenforanything() modes. It seems that the system first tests recognition against voiced system commands. So if, for instance, you say "Open WordPad", then WordPad pops up, and this is not echoed in the program containing listenforanything(). I think if the program were able to specify dictation-only mode, then system commands would be disregarded. However, I can't see if this is possible from the Python program, and I can't figure out if there is the ability to turn off recognition of the system commands.

As I say, the XP speech was designed better to enable dictation and command-and-control separation. Maybe I should upgrade Vista to Windows 7 and hope it's better than Vista speech.

Comment by whitecro...@bak.rr.com, Nov 15, 2009

Hi Gundlach! It's Loren from Daniweb! Thought I'd FINALLY get to telling you how great your code is!

I'm using it on my robot, NINA, and it's fabulous. My robot is pretty much completed and I'm working on a complex chat-kind of feature. Last Halloween, people were pretty impressed by my project (I stood outside with the robot and helped serve candy). They were especially excited that it was speech-commanded!

Great work, Gundlach! Many, many, many thanks!

By the way, any idea how your code might work on a Windows 7 machine?

Comment by gundlach, Dec 03, 2009

lpogorman: Sorry you've had trouble. I just bought a Windows 7 machine so will hopefully be able to diagnose this at some point in the next few months (unfortunately somewhat loaded with work ATM so not a lot of time for fun projects!)

Comment by gundlach, Dec 03, 2009

Loren: hi! You made my day. I'm so glad you've enjoyed the project and made such an awesome project of your own!

No idea how it will work on Windows 7, except for the comments you see in this thread re: Windows Vista (as Windows 7 has a lot of Vista under the covers.) I just got a Windows 7 machine so hope to try it out myself in the next few months.

Is NINA's code on the web? Does she physically move? How the heck did you do it?

