Yesterday I released the first version of Robosapien Dance Machine with Voice Control (Version Alpha - 2.0.1). You can now control a Robosapien with just your voice. The software still has all the original powerful scripting abilities for creating Robosapien movies, dances, and performances.
What's really great is that the voice recognition is being provided by the wonderful CMU Sphinx 3.5 speech recognition engine.
You can see a short movie of my Robosapien responding to my voice commands here:
formatting link
You can find the Robosapien Dance Machine files here:
formatting link
Technical support for the program can be received here:
formatting link
It's an Alpha version so it's most likely still got some wrinkles in it. I'd appreciate a bug report if you find any problems. Currently it requires the superb USB UIRT from Jon Rhees:
formatting link
I'm going to devote the month of April to supporting more infrared transmitters, especially some cheaper ones, as many of you have requested.
There's a simpler proof: it's an open source project. Download the program and try it. You don't need an infrared transceiver or even a Robosapien to test the voice recognition (it will just print the recognized text at the bottom of the screen).
Detailed hardware interfacing and vision software description, please!
I have some great articles on vision that I put together, with tons of drawings, charts, diagrams, and pseudo-code, so it's easily translatable to any language.
I did the same for the navigation program. If only I could find a publisher....
The hardware for machine vision is fairly cheap. No problem there. It's all in the software!
I am currently working with commercially-available DirectShow filters that are intended for machine vision in factory automation-type applications. Beats reinventing the wheel, and I can apply all of my time in working with the data itself. Obviously not a Linux or Mac solution, though.
My immediate application isn't actually for robotics, but for doing certain image analysis of motion pictures, in realtime or better (preferably faster than 24 fps). But many of the same techniques can be used for robotics. Curiously, most people who have applied machine vision ideas to video/film stopped at shot change detection, or limit their systems to highly controlled studio environments for motion tracking (CG stuff). There's a whole lot more out there.
Gord> The hardware for machine vision is fairly cheap.
And those that know how to do it aren't telling.
What we need is ROBOT BASIC (or ROBOT C). This could be written in C++ using DirectX etc and compile for Windows and Linux but allow those without a degree in programming to tailor the hardware to their own ideas.
Actually I think Linux would be the most suitable from what I have read. A robotic interface to the Linux kernel?
Essentially provide a simple means of reading USB ports and grabbing images from webcams at a reasonable speed.
In other words make it as easy to program a MB as it is to program a PIC using BASIC or C.
For native Linux, I imagine the typical DirectShow filter could be revised, as it's pretty much just standard C, but it's the idea of building "graphs" out of multiple filters that makes DirectShow so useful for this application (and the fact that DirectShow will automatically interpose necessary colorspace and decompression filters as needed, saving you the hassle of hand-building each graph from scratch). Does Linux offer a similar architecture?
Actually, DirectShow is kinda kludgy, and hard to use in VB or even C# (its interfaces aren't automation-friendly), but I hear Longhorn will use a new architecture, and will rely on managed code through .NET. Performance issues and OS dogma notwithstanding, this ought to bring the world of machine vision closer to mere mortals, but OS hooks through existing DirectShow filters will have to be revised. I figure a 3-5 year timeline.
Personally I feel a high frame rate is very useful. Low frame rates, especially at low shutter speeds, just create blurs. Hard to do anything meaningful with those. The work that I'm doing relies on reasonably high resolution (but still standard def) video, at full 24fps or 30fps speeds. The limiting factor I'm up against is that the bulk of the video processing is on film that's been transferred to video. For most scenes, a film camera takes a fairly long exposure for each frame, so motion blur is common. That adds a layer of complexity in trying to figure out what's happening. OTOH, you can sometimes use the length and direction of the motion blur to determine velocity, etc., given the right circumstances.
Webcams are okay, but I think a better approach is a fairly good camera with an excellent lens, and full video speeds, over USB 2.0 or Firewire. Some of the mini-ITX boards have built-in video inputs, as well. Some interesting things can be done with a high resolution BW medical camera. (A really cool arrangement might be two cameras pointing at the same scene, or through a beam splitter. One could be color, and the other high res BW.)
The direction I am coming from is, "what is possible as regards using vision in a hobby robot"?
This would require something like a mini-ITX board, even if the i/o board uses its own uCs, and the web cam is the cheapest easily available option. It is sufficient for the needs of a simple robot.
So the next question is "what is the cheapest setup that would be accessible to the widest range of hobbyists?"
Keep in mind that a hobbyist may not have a degree in computer programming, and it is not reasonable to expect one. The only choices I see are VB, despite its cost, or Java. Even with these languages they need access to routines to grab images from a webcam and to use the USB port to communicate with the i/o card, **and to be shown how to use these routines** in their own programs. Even those who have played with VB may only have reached a certain level with "How to Learn Visual Basic in 21 Days".
When I read "Teach Yourself Visual C++ 5 in 21 Days", it didn't really explain anything: just a set of recipes to follow for doing a limited number of things using the wizards.
Of course there is no reason why a professional programmer could not provide a C++ shell in which a C programmer could access the camera and USB port. I have been using such a shell with VC++, but it is not suitable for general use, as it limits you to Windows and requires that you buy VC++; it is also very slow...
Video for Linux is basically a typical unix device interface, based on ioctl() calls to set various parameters, and read() to get frames. This makes it several orders of magnitude easier to use than DirectShow, but it doesn't offer the directed graph architecture that DirectShow does.
Really, I never cared much for the DirectShow architecture -- I usually end up doing any required color mapping to some kind of useful YUV or RGB format close to the source, and any processing happens subsequently. If I want a piping architecture, it's easy enough to do it myself, which has the added benefit of making the code significantly more portable. It also doesn't lock me in to all the indirection that the DirectShow API requires. Basically, I usually have a single FrameProducer class, an optionally ref-counted Frame class, and I just run with it from there.
One advantage that DirectShow has over VFL is that if you want a configuration UI for a device, you can just invoke the dialogs associated with the device. VFL doesn't offer this, since it generally adheres to the "interface not bound to implementation" philosophy (which is overall a good thing).
The only significant drawback to video capture on linux really has to do with setting up devices; there's support for a lot of capture/webcam hardware for Linux, but the hardware usually tends to be older, since very little ships with linux drivers. You need to take some care when choosing a capture card or webcam, and be wary of newer hardware. It's generally worth spending some quality time with Google before making a purchase.
Have you downloaded the code and/or precompiled executable yet from Sourceforge?
The speech recognition engine is Microsoft's -- comes free with Windows
-- and IR transceivers aren't exactly new technology. I'm not sure what it is you're thinking is faked. I just like how Robert has put everything together.
That doesn't mean anything. Just because the source code is there doesn't mean he did it. Come on now.
If he has the Sapien with VR, why doesn't he just make another video showing the Sapien, panning the camera 360 degrees to show that no one else is around, then pointing the camera at an angle that shows only him and the Sapien while he does the VR thing again?