Discussion
Discussion or questions about the speech module.
The wiki makes for a pretty simple forum. Feel free to discuss or ask questions below. I'll be happy to help you out.
I'm getting
ImportError: No module named win32com.client
Is there other stuff that needs to be installed? Speech SDK or COM stuff?
Right on both counts. From the Installation notes on the homepage:
If you don't already have it, you'll also need pywin32 (for Python 2.5 or for Python 2.4) and the Microsoft Speech kit (installer here).
Could you let me know how you found my module and how you went about getting it onto your system, so I can know better how to make those instructions less missable?
Michael
winxp
i think i just read too fast
i found out about it on reddit.com
everything seems good after installing those two modules
thanks
Think you could add an optional parameter to speech.input() that allows text, as well as speech? That would be very useful, particularly when it is having trouble recognizing voice.
Hi,
It might be doable, but I'm not sure -- the builtin raw_input() function blocks the thread while you enter text, so if speech.input() wanted to allow text input as well, I think it would be forced to call raw_input(), thus forcing the user to type something in order to continue. Maybe it could be done by listening for individual keystroke events, or running things on multiple threads, but no guarantees.
As this would also complicate speech.input()'s interface, I'm leaving it out for now. If more people clamor for this, or if someone sent a working patch, I'd be more inclined to support it.
Be aware that speech.input() can be cancelled with Ctrl-C, so if you're having trouble with speech recognition, you could try something like
    try:
        answer = speech.input("Say something, or press Ctrl-C if frustrated.")
    except KeyboardInterrupt:
        answer = raw_input("OK, just type your answer: ")
Good luck, Michael
You have a function that checks for voice input, but does other stuff until that happens, right? What if you made a raw_input, but if voice comes, skip it? Would that be possible?
Right, that's the "multiple thread" approach I mentioned above -- not sure if it could be made to work, but possible. If you do get it to work, by all means post it here for everyone else's benefit! :)
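For anyone who wants to try the multiple-thread approach, here is a minimal sketch of the general idea: run every blocking input source on its own thread and take whichever one answers first. first_answer() is a hypothetical helper, not part of speech.py, and it is written for Python 3 (on Python 2 the module is named Queue and input is raw_input). Whether speech.input() behaves correctly off the main thread is an open question that this discussion does not settle.

```python
import queue
import threading


def first_answer(sources, timeout=None):
    """Run each blocking source on its own thread.

    sources maps a label to a zero-argument callable that blocks until it
    has an answer. Returns (label, value) for whichever finishes first.
    Raises queue.Empty if a timeout is given and nothing answers in time.
    """
    results = queue.Queue()

    def worker(label, source):
        try:
            results.put((label, source()))
        except Exception:
            # A source that fails simply never answers.
            pass

    for label, source in sources.items():
        thread = threading.Thread(target=worker, args=(label, source))
        thread.daemon = True  # do not keep the program alive for the loser
        thread.start()

    return results.get(timeout=timeout)


# Intended use on Windows (untested assumption, see the note above):
#
#   import speech
#   label, answer = first_answer({
#       "voice": lambda: speech.input("Say or type something: "),
#       "keyboard": input,
#   })
```

The losing thread stays blocked until its own source returns, so a console prompt would still be waiting for Enter after a spoken answer wins. That limitation is the main reason this remains a sketch rather than a patch.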
I wouldn't be able to do it-- I'm no good with Python as of yet. :P
Is it possible to have this work even if the Python Window is minimized? Meaning, the window is minimized to the task bar, but if I talk to the mic, it still listens, and acts like a normal Python script? Thanks! I'm coding myself a laptop robot (because I'm kind of lazy, and I'd love to have her help me out :P) and it would be great if I could have the window minimized, and still be able to talk to her.
Yep, talking to a minimized window works fine.
I have Python 2.5 on Win XP Prof SP 2. I have installed the Microsoft Speech SDK and pywin32 linked in your install page.
When I first tried 'import speech' in IDLE I got:
    File "C:\Python25\Lib\site-packages\win32com\client\gencache.py", line 554, in AddModuleToCache
    AttributeError: 'module' object has no attribute 'CLSIDToClassMap'
After a little googling, I found a post regarding deleting the 'gen_py' directory in %WINDIR%\Temp. I did that and am now getting this error for which I can't seem to find any solution:
    >>> import speech
    Traceback (most recent call last):
    TypeError: Error when calling the metaclass bases
I'm sorry that you're having trouble! I'm using the same setup as you (except XP Home) and can't reproduce this.
A couple of suggestions:
Good luck and please follow up with your findings,
Michael
A quick follow up. The problem turned out to be linked to the first error in my post above. There are two 'gen_py' directories, one in %WINDIR%\Temp, and the other in %PYTHON%\Lib\site-packages\win32com\. Deleting the one in site-packages and re-importing the speech module fixed the problem for me.
Arun
Thanks, Arun. I should have paid more attention and noticed you mentioning the Temp directory... glad you set things right.
Michael
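Arun's fix can be scripted. The sketch below assumes the two locations named in his post and, as a safety measure, only deletes directories that are actually called gen_py. Both function names are made up for illustration; nothing like them ships with speech.py or pywin32.

```python
import os
import shutil


def gen_py_candidates(python_home, windir):
    """Return the two gen_py locations mentioned in Arun's post."""
    return [
        os.path.join(windir, "Temp", "gen_py"),
        os.path.join(python_home, "Lib", "site-packages", "win32com", "gen_py"),
    ]


def remove_gen_py_caches(candidates):
    """Delete every candidate that exists and is named gen_py.

    Returns the list of directories that were removed.
    """
    removed = []
    for path in candidates:
        is_gen_py = os.path.basename(os.path.normpath(path)) == "gen_py"
        if is_gen_py and os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
    return removed


# Example call, assuming the install path from the traceback above:
#
#   remove_gen_py_caches(
#       gen_py_candidates(r"C:\Python25", os.environ["WINDIR"]))
```

Per Arun's report, re-importing the speech module after the cleanup was enough to get going again.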
What about non-standard voices? How can I choose the voice?
Grauwelf
I only support very simple text-to-speech. Check out pyTTS for lots of flexibility in choosing voices. My module is more focused on speech-to-text.
Hi,
It's very kind of you to provide this speech.py. It's very helpful.
I want to write an application that runs in the background, grabs the user's voice input, and does the speech-to-text work.
But I don't know how to do this. Is it possible to write a Python app that runs in the background and still gets user voice input as normal? Do you have any ideas about this?
Any hint would be much appreciated. Thanks in advance.
Hi Jiyu,
I've just responded to your private email as well -- sorry to take so many days to get back to you.
I believe that installing the Microsoft Speech SDK will fix the problem you were having, at which point the example code will work as intended. It is the basic building block for a background python app that gets user voice input and runs a callback function on the text.
I wrote http://musicbutler.googlecode.com as a test application, which runs in the background and controls your stereo in response to speech input. It's still in alpha, but it's a proof-of-concept for you if the example.py doesn't provide enough detail.
Thanks and good luck!
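As a rough illustration of the "callback on the text" building block, the sketch below maps recognized phrases to actions. It assumes that the callback is invoked as callback(phrase, listener) and that the listener has a stoplistening() method next to the islistening() mentioned elsewhere on this page; check both against your copy of speech.py. make_callback() and run_in_background() are hypothetical names.

```python
import time


def make_callback(commands, stop_phrase="turn off"):
    """Build a callback that dispatches recognized phrases.

    commands maps a lowercase phrase to a zero-argument function.
    Hearing stop_phrase stops the listener instead.
    """
    def callback(phrase, listener):
        text = phrase.strip().lower()
        if text == stop_phrase:
            listener.stoplistening()  # assumed listener method
            return
        action = commands.get(text)
        if action is not None:
            action()
    return callback


def run_in_background(commands):
    """Listen until the stop phrase is heard (Windows only, untested here)."""
    import speech  # imported lazily so the rest of the file loads anywhere
    listener = speech.listenforanything(make_callback(commands))
    while listener.islistening():
        time.sleep(0.5)
```

Keeping the dispatch logic separate from the speech module means it can be exercised with typed text before a microphone is involved.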
I'm not sure if you are still supporting this, but if so, would it be possible for you to write a function that adds words to the dictionary (temporarily?)
For example, if I say, "I have a bad case of the heebie-geebies", the system would be like, "what?" and come back with something completely different. Would it be possible for you to create a function such that, if I pass it the phrase "heebie-geebies" as a parameter (perhaps in an array?), it will recognize that phrase whenever it can't find a better match?
I recall playing with: http://surguy.net/articles/speechrecognition.xml
And though I couldn't get it to work correctly, there is one snippet that could prove quite useful:
if __name__ == '__main__':
Any chance of this happening?
AFAIK, speech either works in dictation mode, in which it recognizes words based off of an unmodifiable dictionary, or in command mode, in which it tries to match your string of text to what you said.
The snippet you refer to (specifying words, then pumping messages) is done under the hood in speech.py by
which pumps messages as long as listener.islistening().
So the short answer is, I don't think it's doable. The best you can hope for is specifying specific phrases that you expect it to hear, or else work with the standard dictation dictionary.
Good luck, Michael
is it possible to use an audio file as input for speech recognition?
titusz.pan, I haven't tried it, but I would love to know if it worked! I'm thinking about making a Skype audio bot, and that would almost certainly involve feeding speech.py the audio from a call as a file.
If you can get it to work, please post here for the good of the community, and I'll roll your work into the module!
And there is no way, as far as you know, to add new phrases to the standard dictionary?
Correct.
Here is a hint that Vista speech recognition can be fed audio files: http://www.mymsspeech.com/microphones/prod_details.asp?prodID=228 (see WSRToolkit feature number 7).
I appear to be having trouble importing a Python file that has already imported speech.
http://code.google.com/p/pyspeech/issues/detail?id=17
Thanks for your bug report. See http://code.google.com/p/pyspeech/issues/detail?id=17 for the solution.
While writing my Python code, how can I find out which voices are installed? For example, on my Windows Vista machine Microsoft Anna is the default, but if I install espeak for Windows, three more voices are added automatically. I want my code to discover which voice options are available, so that I can show them in my application and let the user select one directly.
I want something like:
    speak = win32com.client.Dispatch('Sapi.SpVoice')
    voices = speak.get_all_voices()
so that voices contains a list of all the voices currently available. Please note that get_all_voices() is an imaginary function, used only for the sake of clarity; I want something like it. Is anything like this available?
Thanks and regards
Chirag
Hi Chirag, sorry for the delay in responding. pyspeech does not give you any control or insight over the installed voices; it just uses the one that you have currently selected for the system.
pyspeech is more focused on speech input, where you talk to the computer; it has minimal speech output support just as a convenience. You might check out the pyTTS package for more control over speech output.
Good luck!
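Outside pyspeech, the SAPI automation object that Chirag's snippet creates has a GetVoices() method that returns a collection of voice tokens, each with a GetDescription() method. The sketch below relies on that interface; the function names are made up for illustration, and the Windows-only part has not been run as part of this discussion.

```python
def voice_descriptions(voice):
    """Return the description of every voice token the object reports.

    voice is expected to behave like a SAPI SpVoice object: GetVoices()
    returns a collection with a Count property and an Item(index) method.
    """
    tokens = voice.GetVoices()
    return [tokens.Item(i).GetDescription() for i in range(tokens.Count)]


def installed_voices():
    """List the voices installed on this machine (needs pywin32, Windows)."""
    import win32com.client  # imported lazily so the file loads anywhere
    return voice_descriptions(win32com.client.Dispatch("SAPI.SpVoice"))
```

Calling installed_voices() on a machine with extra voices installed should give a list of names suitable for a selection menu.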
This module is great! Do you know if there's any way to run two instances at one time on separate microphones? I'm thinking of having two people talking on their own USB headsets at once (different voice profiles) and it being able to listen to both of them.
Man, that is a neat idea. Windows lets you do that -- have one profile trained on a male voice and the other on a female voice, or something? How do you normally tell it which profile to use for which microphone, when interacting directly with Windows?
I haven't heard of that and haven't any idea what the COM interop commands might be that would need to be implemented to get pyspeech to support that. You might ask over at the dragonfly project, as they are doing similar work and may have thought about this feature.
Windows lets you train separate profiles and select which one is active through the Speech Properties dialog in the Control Panel; but it looks like maybe you can only have one audio device and profile selected at any given time. But, I don't know if that's a limitation of the engine itself or if they just didn't put more options in the dialog.
If Windows doesn't expose it in the UI, I'm going to guess that you can't do it through COM. But I'd be pleased to be proven wrong if you want to dig through their (poor) COM docs and find a way to do it! :)
I am going to take a crack at running two audio devices with separate profiles, although my head may explode in the process. I know nothing whatsoever about COM so it will be interesting, but looking at this page suggests that maybe you can set the AudioInput and the Profile properties... or something:
http://msdn.microsoft.com/en-us/library/ms722071(VS.85).aspx
If I ever get something going, I'll let you know and contribute it back.
I've got a basic demo up and working! It seemed incredibly complicated at first, but turned out to be pretty convenient:
- For the recognizer we use a SAPI.SpInProcRecognizer instead, which is an instance bound to a specific process instead of one shared between all processes. The interface is the same though, so all your existing code still works.
- We can iterate over the profiles with recognizer.GetProfiles() and set one for the recognizer.Profile property. If you don't pick a profile, it will default to "Default Speech Profile".
- The same goes for recognizer.GetAudioInputs() and the recognizer.AudioInput property. Note that you must specify an audio device here; it won't automatically use the default device like the shared one does.
That's actually all it takes: now you can run one process with a given audio device and profile, and a second process with a second audio device and profile. It's not clear to me whether you can run both of them in the same process (probably not), but that shouldn't be a huge problem for me (we can use Popen or multiprocessing to spawn a process for each device).
You can see my code here: http://bitbucket.org/kiv/speech/ Please do whatever you want with it.
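Independently of the linked code, the three steps above might look roughly like the sketch below. It picks tokens by matching their description text, which is an assumption about how you would want to choose them; pick_token(), configure_recognizer() and make_recognizer() are hypothetical names, and the Windows-only part is untested here.

```python
def pick_token(tokens, wanted):
    """Return the first token whose description contains wanted, else None.

    tokens is expected to behave like a SAPI token collection:
    a Count property plus an Item(index) method.
    """
    for i in range(tokens.Count):
        token = tokens.Item(i)
        if wanted.lower() in token.GetDescription().lower():
            return token
    return None


def configure_recognizer(recognizer, profile_name, device_name):
    """Attach a profile and an audio device to an in-process recognizer."""
    device = pick_token(recognizer.GetAudioInputs(), device_name)
    if device is None:
        # The in-process recognizer has no default device, so fail loudly.
        raise LookupError("no audio input matching %r" % device_name)
    profile = pick_token(recognizer.GetProfiles(), profile_name)
    if profile is not None:
        recognizer.Profile = profile  # otherwise the default profile is used
    recognizer.AudioInput = device
    return recognizer


def make_recognizer(profile_name, device_name):
    """Create and configure a recognizer (needs pywin32, Windows only)."""
    import win32com.client  # imported lazily so the file loads anywhere
    recognizer = win32com.client.Dispatch("SAPI.SpInProcRecognizer")
    return configure_recognizer(recognizer, profile_name, device_name)
```

Each headset would then get its own process that calls make_recognizer() with its own profile and device names.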
Wow, the wiki markup mangled my post a lot. Ah well.
ChristM6794: That's awesome! I've been busy since you wrote and haven't had a chance to check out your code until today. Way to go! I hope to have time some day to add a simple API to speech.py to optionally support multiple profiles. It would default to one, but you could specify a second if needed. I'll add a feature request for the future, pointing to your bitbucket code. (Of course, patches are very welcome :)
ChrisM6794: http://code.google.com/p/pyspeech/issues/detail?id=20 . Also, I got emailed your original wiki post, so no worries, I saw it in the original form :)
This works great in XP, but in Vista the speech recognizer recognizes not only the words in my grammar (e.g., the dwarfs in one of the example programs), but also system commands. Is there any way to turn off the Microsoft Speech Recognizer from accepting system commands? I think if it were in "dictation-only mode", this would work, but I don't see the functionality to do this.
Ipogorman: Sorry, but I don't own a Vista machine; all my Vista testing has been through users reporting their experience.
Does this happen both when you use listenfor(phrases) and when you use listenforanything()? Or only in one of these modes?
It happens for both listenfor() and listenforanything() modes. It seems that the system first tests recognition against voiced system commands. So if, for instance, you say, "Open WordPad", then WordPad pops up, and this is not echoed in the program containing listenforanything(). I think if the program were able to specify dictation-only mode, then system commands would be disregarded. However, I can't see whether this is possible from the Python program, and I can't figure out whether there is a way to turn off recognition of the system commands.
As I say, the XP speech was designed better to enable dictation and command-and-control separation. Maybe I should upgrade Vista to Windows 7 and hope it's better than Vista speech.
Hi Gundlach! It's Loren from Daniweb! Thought I'd FINALLY get to telling you how great your code is!
I'm using it on my robot, NINA, and it's fabulous. My robot is pretty much completed and I'm working on a complex chat-kind of feature. Last Halloween, people were pretty impressed by my project (I stood outside with the robot and helped serve candy). They were especially excited that it was speech-commanded!
Great work, Gundlach! Many, many, many thanks!
By the way, any idea how your code might work on a Windows 7 machine?
lpogorman: Sorry you've had trouble. I just bought a Windows 7 machine so will hopefully be able to diagnose this at some point in the next few months (unfortunately somewhat loaded with work ATM so not a lot of time for fun projects!)
Loren: hi! You made my day. I'm so glad you've enjoyed the project and made such an awesome project of your own!
No idea how it will work on Windows 7, except for the comments you see in this thread re: Windows Vista (as Windows 7 has a lot of Vista under the covers.) I just got a Windows 7 machine so hope to try it out myself in the next few months.
Is NINA's code on the web? Does she physically move? How the heck did you do it?