Yesterday I released the first version of Robosapien Dance Machine with Voice Control (Version Alpha - 2.0.1). You can now control a Robosapien with just your voice. The software still has all the original powerful scripting abilities for creating Robosapien movies, dances, and performances.
What's really great is that the voice recognition is being provided by the wonderful CMU Sphinx 3.5 speech recognition engine.
You can see a short movie of my Robosapien responding to my voice commands here:
formatting link
You can find the Robosapien Dance Machine files here:
formatting link
Technical support for the program can be received here:
formatting link
It's an Alpha version so it's most likely still got some wrinkles in it. I'd appreciate a bug report if you find any problems. Currently it requires the superb USB UIRT from Jon Rhees:
formatting link
I'm going to devote the month of April to supporting more infrared transmitters, especially some cheaper ones, as many of you have requested.
There's a simpler proof: it's an open source project. Download the program and try it. You don't need an infrared transceiver or even a Robosapien to test the voice recognition (it will just print the recognized text at the bottom of the screen).
Detailed hardware interfacing and vision software description, please!
I have some great articles on vision that I put together, with tons of drawings, charts, diagrams, and pseudo-code, so it's easily translatable to any language.
I did the same for the navigation program. If only I could find a publisher....
The hardware for machine vision is fairly cheap. No problem there. It's all in the software!
I am currently working with commercially-available DirectShow filters that are intended for machine vision in factory automation-type applications. Beats reinventing the wheel, and I can apply all of my time in working with the data itself. Obviously not a Linux or Mac solution, though.
My immediate application isn't actually for robotics, but for doing certain image analysis of motion pictures, in realtime or better (preferably faster than 24 fps). But many of the same techniques can be used for robotics. Curiously, most people who have applied machine vision ideas to video/film stopped at shot change detection, or limit their systems to highly controlled studio environments for motion tracking (CG stuff). There's a whole lot more out there.
Gord> The hardware for machine vision is fairly cheap.
And those that know how to do it aren't telling.
What we need is ROBOT BASIC (or ROBOT C). This could be written in C++ using DirectX etc and compile for Windows and Linux but allow those without a degree in programming to tailor the hardware to their own ideas.
Actually I think Linux would be the most suitable from what I have read. A robotic interface to the Linux kernel?
Essentially provide a simple means of reading USB ports and grabbing images from webcams at a reasonable speed.
In other words make it as easy to program a MB as it is to program a PIC using BASIC or C.
For native Linux, I imagine the typical DirectShow filter could be revised, as it's pretty much just standard C, but it's the idea of building "graphs" out of multiple filters that makes DirectShow so useful for this application (and the fact that DirectShow will automatically interpose necessary colorspace and decompression filters as needed, saving you the hassle of hand-building each graph from scratch). Does Linux offer a similar architecture?
Actually, DirectShow is kinda kludgy, and hard to use in VB or even C# (its interfaces aren't automation-friendly), but I hear Longhorn will use a new architecture, and will rely on managed code through .NET. Performance issues and OS dogma notwithstanding, this ought to bring the world of machine vision closer to mere mortals, but OS hooks through existing DirectShow filters will have to be revised. I figure a 3-5 year timeline.
Personally I feel a high frame rate is very useful. Low frame rates, especially at low shutter speeds, just create blurs. Hard to do anything meaningful with those. The work that I'm doing relies on reasonably high resolution (but still standard def) video, at full 24fps or 30fps speeds. The limiting factor I'm up against is that the bulk of the video processing is on film that's been transferred to video. For most scenes, a film camera takes a fairly long exposure for each frame, so motion blur is common. That adds a layer of complexity in trying to figure out what's happening. OTOH, you can sometimes use the length and direction of the motion blur to determine velocity, etc., given the right circumstances.
Webcams are okay, but I think a better approach is a fairly good camera with an excellent lens, and full video speeds, over USB 2.0 or Firewire. Some of the mini-ITX boards have built-in video inputs, as well. Some interesting things can be done with a high resolution BW medical camera. (A really cool arrangement might be two cameras pointing at the same scene, or through a beam splitter. One could be color, and the other high res BW.)
The direction I am coming from is, "what is possible as regards using vision in a hobby robot"?
This would require something like a mini-ITX board, even if the i/o board uses its own uCs, and the web cam is the cheapest easily available option. It is sufficient for the needs of a simple robot.
So the next question is "what is the cheapest setup that would be accessible to the widest range of hobbyists?"
Keep in mind that a hobbyist may not have a degree in computer programming, and it is not reasonable to expect one. The only choices I see are VB, despite its cost, or Java. Even with these languages they need access to routines to grab images from a webcam and to use the USB port to communicate with the i/o card, **and to be shown how to use these routines** in their own programs. Even those who have played with VB may only have reached a certain level with "How to Learn Visual Basic in 21 Days".
When I read "Teach Yourself Visual C++ 5 in 21 Days", it didn't really explain anything: just a set of recipes to follow for doing a limited number of things using the wizards.
Of course there is no reason why a professional programmer could not provide a C++ shell in which a C programmer could access the camera and USB port. I have been using such a shell with VC++, but it is not suitable for general use, as it limits you to Windows and requires that you buy VC++; it is also very slow...
Video for Linux is basically a typical unix device interface, based on ioctl() calls to set various parameters, and read() to get frames. This makes it several orders of magnitude easier to use than DirectShow, but it doesn't offer the directed graph architecture that DirectShow does.
Really, I never cared much for the DirectShow architecture -- I usually end up doing any required color mapping to some kind of useful YUV or RGB format close to the source, and any processing happens subsequently. If I want a piping architecture, it's easy enough to do it myself, which has the added benefit of making the code significantly more portable. It also doesn't lock me in to all the indirection that the DirectShow API requires. Basically, I usually have a single FrameProducer class, an optionally ref-counted Frame class, and I just run with it from there.
One advantage that DirectShow has over VFL is that if you want a configuration UI for a device, you can just invoke the dialogs associated with the device. VFL doesn't offer this, since it generally adheres to the "interface not bound to implementation" philosophy (which is overall a good thing).
The only significant drawback to video capture on linux really has to do with setting up devices; there's support for a lot of capture/webcam hardware for Linux, but the hardware usually tends to be older, since very little ships with linux drivers. You need to take some care when choosing a capture card or webcam, and be wary of newer hardware. It's generally worth spending some quality time with Google before making a purchase.
Have you downloaded the code and/or precompiled executable yet from Sourceforge?
The speech recognition engine is Microsoft's -- comes free with Windows
-- and IR transceivers aren't exactly new technology. I'm not sure what it is you're thinking is faked. I just like how Robert has put everything together.
That doesn't mean anything. Just because the source code is there doesn't mean he did it. Come on now.
If he has the Sapien with VR, why doesn't he just make another video showing the Sapien, panning the camera 360 degrees to show that no one else is around, then pointing the camera at an angle that shows only him and the Sapien while he does the VR thing again?