Robot vision - which algorithms do you recommend?

I have been planning a robotics project and will base my system on a Mini-ITX board running a 1 GHz VIA CPU. I will also use a single USB webcam (most probably the Logitech QuickCam 4000 Pro) for its vision. Everything will be programmed in Java using JMF. But before I start building the thing, I want to work on the "cognitive" tasks of the robot first.

My main task for the first part of the project is to have the robot respond to a speech command (using IBM ViaVoice 10) asking it to identify the items it can see and read them out loud (speech by AT&T Natural Voices). I can basically develop all of this on a standard computer today.

I have looked into some algorithms for vision, and the ones that seem to have a good degree of accuracy are those based on feature learning and recognition using the AdaBoost algorithm (which I do not have deep knowledge of yet). These seem to be based on training a classifier where you teach it what e.g. a face looks like using simple and tiny primitives (contrasts in the image). However, the papers I have read lack the "hands-on" details I'd like to see. If anyone has code examples for these algorithms (and hopefully a better explanation), I would be very happy to get some references.
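To make my current understanding concrete, here is a toy Java sketch of a single AdaBoost round, using simple threshold "stumps" over one-dimensional feature values as the weak classifiers. The class and method names are my own invention, not from any library, and I may well have details wrong:

```java
// Toy AdaBoost sketch: one boosting round over 1-D features with threshold
// "stumps" as weak classifiers. Labels y are +1/-1; weights w sum to 1.
public class AdaBoostSketch {

    // One boosting round: find the stump (threshold + polarity) with the lowest
    // weighted error, return its vote alpha, and update the sample weights in
    // place so the next round concentrates on the mistakes.
    public static double boostRound(double[] x, int[] y, double[] w, double[] bestThreshOut) {
        double bestErr = Double.MAX_VALUE, bestThresh = 0;
        int bestPol = 1;
        for (double t : x) {                       // candidate thresholds at sample values
            for (int pol = -1; pol <= 1; pol += 2) {
                double err = 0;
                for (int i = 0; i < x.length; i++) {
                    int pred = (pol * (x[i] - t) >= 0) ? 1 : -1;
                    if (pred != y[i]) err += w[i]; // weighted error of this stump
                }
                if (err < bestErr) { bestErr = err; bestThresh = t; bestPol = pol; }
            }
        }
        // Classifier vote: big when the stump is accurate, small when near chance.
        double alpha = 0.5 * Math.log((1 - bestErr) / Math.max(bestErr, 1e-10));
        double z = 0;
        for (int i = 0; i < x.length; i++) {
            int pred = (bestPol * (x[i] - bestThresh) >= 0) ? 1 : -1;
            w[i] *= Math.exp(-alpha * y[i] * pred); // up-weight mistakes, down-weight hits
            z += w[i];
        }
        for (int i = 0; i < w.length; i++) w[i] /= z; // renormalize weights to sum 1
        bestThreshOut[0] = bestThresh;
        return alpha;
    }
}
```

The final detector would be a weighted vote of the stumps chosen over many rounds.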

I have previously worked with neural nets using back-propagation learning, which worked quite well (used for image compression work by Prof. Munroe at Univ. of Pittsburgh), and that approach too learns features in an image. I might try this method as well and see how I can adapt it to vision. The learning process can take a long time, but it is important that the identification is close to real-time. The work around AdaBoost seems to address this (using an integral image to quickly calculate features). I am just not sure how to go about training the classifier and identifying the features found. More details about sliding the recognition window across the image at different scales are also needed. I have some ideas, but why re-invent the wheel?
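The integral-image part at least seems tractable. Here is my own sketch of it (class names are mine): build the summed-area table once, and then the sum over any rectangle, at any window position or scale, costs just four table lookups:

```java
// Integral image (summed-area table): ii[y][x] holds the sum of all pixels
// above and to the left, so any rectangle sum needs only four lookups. This is
// what makes sliding a detection window across scales fast.
public class IntegralImage {

    public static long[][] build(int[][] img) {
        int h = img.length, w = img[0].length;
        long[][] ii = new long[h + 1][w + 1];   // one-cell zero border avoids edge cases
        for (int y = 1; y <= h; y++)
            for (int x = 1; x <= w; x++)
                ii[y][x] = img[y - 1][x - 1] + ii[y - 1][x] + ii[y][x - 1] - ii[y - 1][x - 1];
        return ii;
    }

    // Sum of pixels in the rectangle [x, x+w) x [y, y+h), in four lookups.
    public static long rectSum(long[][] ii, int x, int y, int w, int h) {
        return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x];
    }
}
```

The contrast-style features are then just differences of two or three such rectangle sums.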

Also, it seems that some parameters about the environment context could assist the recognizer. E.g. if it can see a face, there is a higher probability that there is a body underneath it. Coffee cups are most probably upright, not on their sides. If other items indicate that it's indoors (or the robot might already know this), then it is not likely a dolphin it observes on the table (even if that got a high match). Of course these rules are quite manual, in that we need to weight them, and there are all sorts of exceptions (there might be a picture of a dolphin on the wall). A context graph of the relations between objects seems to be a way to go. E.g. the rule "Cup - lies on top of - Table" might indicate that if the robot has seen a cup, the "blob" it observed underneath might be a table. A mental memory map could be built from these relations too.
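As a toy illustration of the kind of weighting I mean (all names hypothetical), one could simply multiply raw detector scores by hand-set context priors before picking the winner:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: rescore raw detector outputs with hand-set context
// priors, e.g. an "indoors" context makes "dolphin" very unlikely.
public class ContextRescorer {

    private final Map<String, Double> prior = new HashMap<>();

    // Set a context prior for a label; labels without one default to 1.0.
    public void setPrior(String label, double p) { prior.put(label, p); }

    // Return the label whose prior-weighted score is highest.
    public String best(Map<String, Double> detectorScores) {
        String bestLabel = null;
        double bestScore = -1;
        for (Map.Entry<String, Double> e : detectorScores.entrySet()) {
            double s = e.getValue() * prior.getOrDefault(e.getKey(), 1.0);
            if (s > bestScore) { bestScore = s; bestLabel = e.getKey(); }
        }
        return bestLabel;
    }
}
```

So with an "indoors" prior of 0.05 on "dolphin", a cup scoring 0.7 beats a dolphin scoring 0.9. A real context graph would derive those priors from the observed relations instead of hard-coding them.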

Now, the use-case needs to be limited to certain tasks at first, though. If I can get it to respond to the voice command "Tell me what you see" with the spoken answer "I see two faces, a ball and a cup," I will be very satisfied!
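Assembling that answer from the recognizer's output should at least be the easy part. A sketch (hypothetical names, my own code) that turns a list of recognized labels into that kind of sentence:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: turn a list of recognized labels, e.g.
// [face, face, ball, cup], into the sentence the speech synth should say.
public class SceneDescriber {

    private static final String[] WORDS = {"zero", "one", "two", "three", "four", "five"};

    public static String describe(List<String> labels) {
        if (labels.isEmpty()) return "I see nothing.";
        // Count occurrences, preserving first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String l : labels) counts.merge(l, 1, Integer::sum);
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int n = e.getValue();
            if (n == 1)   // naive article and plural handling, fine for a demo
                parts.add(("aeiou".indexOf(e.getKey().charAt(0)) >= 0 ? "an " : "a ") + e.getKey());
            else
                parts.add(WORDS[Math.min(n, 5)] + " " + e.getKey() + "s");
        }
        if (parts.size() == 1) return "I see " + parts.get(0) + ".";
        String head = String.join(", ", parts.subList(0, parts.size() - 1));
        return "I see " + head + " and " + parts.get(parts.size() - 1) + ".";
    }

    public static void main(String[] args) {
        System.out.println(describe(Arrays.asList("face", "face", "ball", "cup")));
    }
}
```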

As for "intelligent" conversation, I have been looking at Alice, which uses AIML, an XML-based language for simulating intelligent conversation, with which the robot can learn about its surroundings from input (which I hope to be able to do with speech recognition). For instance, it should be able to recognize a face in its vision and ask "Hello, what is your name?", to which the reply would be memorized and used in later conversation. It should also act like a simple intelligent agent that I could ask for information like "What time is it?" and "What is the weather forecast?" (for which it would use its WLAN access to get weather information).
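From what I have seen of AIML, remembering a name might look something like the following fragment (written in the style of the standard examples; the predicate name "username" is my own choice):

```xml
<!-- Hypothetical AIML categories: store a name on first contact, reuse it later. -->
<category>
  <pattern>MY NAME IS *</pattern>
  <template>Nice to meet you, <set name="username"><star/></set>.</template>
</category>
<category>
  <pattern>WHO AM I</pattern>
  <template>You are <get name="username"/>.</template>
</category>
```

The vision system would just need to trigger the "Hello, what is your name?" prompt when a new face is detected, and feed the recognized speech into the interpreter.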

I guess there are many ways I could take this, but as a starter I'd like to do as much as possible using a normal computer and a webcam, and then work more on the hardware and building the robot. Most people seem to do it the other way around, though.

Best regards, JC


Check out OpenCV

I suggest you play around a little with the demos it ships with; you might get a better idea of what's out there and what your needs are. A stereo pair of cheap USB cameras would be a good hardware setup to start with.

-kert


----- Original Message -----
From: "Jeceel"
Newsgroups: comp.robotics.misc
Sent: Wednesday, January 28, 2004 2:51 AM
Subject: Robot vision - which algorithms do you recommend?

Most people do it the other way because they can get their "robot" up and running around. The "brain" part rapidly becomes a task that would make writing your own version of Windows seem like a breeze! Thus not many backyard robotics enthusiasts, who make up the bulk of those using comp.robotics.misc, delve into vision. There is a simple CMUCam for tiny bots.

I did an introductory course on Java last year but don't have any experience with the Java Media Framework or how to use it. My take on Java is that it might be a bit slow.

Like you, I wanted to do as much as possible on a normal computer and a webcam. I started with an old monochrome Connectix QuickCam (now Logitech). This was great because I could run it from DOS and use the languages I knew best, assembler and C. I also had full control of the camera, so I didn't need Windows drivers or, for that matter, any of the complications of Windows programming. I have since used a Logitech color webcam with VC++ to grab and process images (movement detection) using the Logitech SDK. Movement is a very good way to detect and outline people. The face will be the blob on the top :-)
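In Java (since that's what you plan to use; my own code is C, and the names here are made up), a minimal frame-differencing step looks like this: compare consecutive grey-level frames and flag pixels that changed by more than a threshold.

```java
// Movement detection by frame differencing: flag pixels whose grey level
// changed by more than a threshold between consecutive frames. Sketch only;
// real frames would come from JMF or the camera SDK as grey-level arrays.
public class MotionDetector {

    public static boolean[][] diff(int[][] prev, int[][] curr, int threshold) {
        int h = curr.length, w = curr[0].length;
        boolean[][] moved = new boolean[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                moved[y][x] = Math.abs(curr[y][x] - prev[y][x]) > threshold;
        return moved;
    }
}
```

Pixels flagged true can then be grouped into blobs, and the topmost blob of a person-shaped cluster is a good candidate for the face.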

I haven't cared much for neural networks for robot brains, although I suspect they are closer to how our brains work. Rather, I would hard-code everything. For example, if I wanted to recognize a set of handwritten characters, I would select a set of features and then write routines to extract those features and their spatial relationships. Although I haven't tackled face recognition, I would do it the same way.
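As a made-up example of what I mean by hard-coding: for characters I might extract simple numbers like ink density and bounding-box aspect ratio, then compare those against stored ranges per character (names here are invented for illustration):

```java
// Hand-coded features for a binary character image: numbers a rule-based
// recognizer could compare against stored per-character ranges.
public class SimpleFeatures {

    // Fraction of pixels that are "ink".
    public static double inkDensity(boolean[][] img) {
        int on = 0, total = 0;
        for (boolean[] row : img) {
            total += row.length;
            for (boolean p : row) if (p) on++;
        }
        return (double) on / total;
    }

    // Width/height ratio of the tight bounding box around the ink.
    public static double aspectRatio(boolean[][] img) {
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE, maxX = -1, maxY = -1;
        for (int y = 0; y < img.length; y++)
            for (int x = 0; x < img[y].length; x++)
                if (img[y][x]) {
                    minX = Math.min(minX, x); maxX = Math.max(maxX, x);
                    minY = Math.min(minY, y); maxY = Math.max(maxY, y);
                }
        if (maxX < 0) return 0;  // blank image
        return (double) (maxX - minX + 1) / (maxY - minY + 1);
    }
}
```

A tall thin "1" and a round "0" already separate on these two numbers; spatial relationships between strokes would be the next features to add.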

Did you have any luck with the suggestion to check out the OpenCV library?

-- John Casey


The best alternative I have (personally) found for vision is the ER1 from Evolution Robotics. I bought one of these kits a couple of years ago, and the vision recognition is awesome! If you want to build everything from scratch, then it is not the kit for you; but if you want a robot that can be thrown together quickly and taught to recognize your face, magazine covers, or icons placed around the house, then it is worth checking out.

I started out using the very laptop I am typing this post on as the brains. Then I purchased a mini-book PC for the brains.

They also have the ERSP available for Windows and for Linux. It's their development kit, and it is in Java. I'm not sure if it is still available to all new purchasers of the kit, but sometimes they are listed on eBay and you might be able to pick up the ERSP as well. Their customer service is incredible, so you might want to contact them up front and ask about it. I spoke with them for almost six months before purchasing my 'bot, and they were always very polite, patient, candid, and informative.

Well, now you have something else to consider! JCD


