Robot vision - which algorithms do you recommend?

I have been planning a robotics project and will base my system on a
Mini-ITX board running a 1 GHz Via CPU. I will also use a single USB
webcam (most probably the Logitech QuickCam 4000 Pro) for its
vision. Everything will be programmed in Java using JMF. But before I
start building the thing, I want to work on the "cognitive" tasks
of the robot first.
My main task for the first part of the project is to have the robot
respond to a spoken command (using IBM ViaVoice 10) asking it to
identify the items it can see and read them out loud (speech by AT&T
Natural Voices). I can basically develop all of this on a standard
computer today.
I have looked into some algorithms for vision, and the ones I see as
having a good degree of accuracy are those based on feature learning
and recognition using the AdaBoost algorithm (which I do not yet have
deep knowledge of). These seem to be based on learning a classifier by
teaching it what, e.g., a face looks like from simple, tiny primitives
(contrasts in the image). However, the papers I have read lack the
"hands-on" details I'd like to see. If anyone has code examples for
these algorithms (and hopefully a better explanation), I would be very
happy to get them.
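For what it's worth, the core AdaBoost loop is small enough to sketch. Below is a toy Java version (Java being the project language here) that uses one-feature decision stumps as weak learners; the class names, the exhaustive threshold search, and the scalar features are all made up for illustration, not taken from any of the face-detection papers:

```java
// Toy AdaBoost with one-feature decision stumps as weak learners.
class AdaBoostSketch {
    static class Stump {
        int feature; double threshold; int polarity; double alpha;
        int predict(double[] x) {
            return (polarity * x[feature] < polarity * threshold) ? 1 : -1;
        }
    }

    final java.util.List<Stump> stumps = new java.util.ArrayList<>();

    // xs[i] = feature vector of sample i, ys[i] = +1 or -1.
    void train(double[][] xs, int[] ys, int rounds) {
        int n = xs.length;
        double[] w = new double[n];
        java.util.Arrays.fill(w, 1.0 / n);            // start with uniform weights
        for (int t = 0; t < rounds; t++) {
            Stump best = null;
            double bestErr = Double.MAX_VALUE;
            // Exhaustively try every (feature, threshold, polarity) combination.
            for (int f = 0; f < xs[0].length; f++)
                for (double[] s : xs)
                    for (int pol : new int[]{-1, 1}) {
                        Stump c = new Stump();
                        c.feature = f; c.threshold = s[f]; c.polarity = pol;
                        double err = 0;
                        for (int i = 0; i < n; i++)
                            if (c.predict(xs[i]) != ys[i]) err += w[i];
                        if (err < bestErr) { bestErr = err; best = c; }
                    }
            bestErr = Math.max(bestErr, 1e-10);        // guard against log(0)
            best.alpha = 0.5 * Math.log((1 - bestErr) / bestErr);
            double sum = 0;                            // re-weight: mistakes get heavier
            for (int i = 0; i < n; i++) {
                w[i] *= Math.exp(-best.alpha * ys[i] * best.predict(xs[i]));
                sum += w[i];
            }
            for (int i = 0; i < n; i++) w[i] /= sum;
            stumps.add(best);
        }
    }

    // Final strong classifier: sign of the alpha-weighted vote.
    int predict(double[] x) {
        double score = 0;
        for (Stump s : stumps) score += s.alpha * s.predict(x);
        return score >= 0 ? 1 : -1;
    }
}
```

A real detector would use many thousands of rectangle-contrast features computed from an integral image instead of these raw scalars, but the boosting loop itself does not change.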
I have previously worked with neural nets using back-propagation
learning, which worked quite well (used for image-compression work by
Prof. Munroe at Univ. of Pittsburgh), and it too learns features in an
image. I might try this method as well and see how I can adapt it to
vision. The learning process can take a long time, but it is important
that the identification is close to real-time. The work around AdaBoost
seems to have some pointers to this (using an integral image to quickly
calculate features). I am just not sure how to go about training the
classifier and identifying the features found. I also need more details
about sliding the recognition window across the image at different
scales. I have some ideas, but why re-invent the wheel?
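The integral-image trick and the multi-scale window scan are mechanical enough to sketch in Java. Everything below (class names, the stride heuristic, the stand-in "bright patch" classifier) is a made-up illustration of the two ideas, not the implementation from the papers:

```java
// Integral image: ii[y][x] = sum of all pixels above and left of (x, y),
// so any rectangular sum costs just four lookups regardless of its size.
class IntegralImage {
    final long[][] ii;   // one extra row/column of zeros simplifies the lookups

    IntegralImage(int[][] img) {
        int h = img.length, w = img[0].length;
        ii = new long[h + 1][w + 1];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x];
    }

    // Sum of pixels over the rectangle [x, x+w) x [y, y+h).
    long rectSum(int x, int y, int w, int h) {
        return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x];
    }
}

// Slide a square window over the image at several scales; 'Classifier'
// stands in for the real boosted detector.
class WindowScanner {
    interface Classifier { boolean matches(IntegralImage ii, int x, int y, int size); }

    static java.util.List<int[]> scan(IntegralImage ii, int imgW, int imgH,
                                      int baseSize, double scaleStep, int maxScales,
                                      Classifier c) {
        java.util.List<int[]> hits = new java.util.ArrayList<>();
        double size = baseSize;
        for (int s = 0; s < maxScales && size <= Math.min(imgW, imgH);
             s++, size *= scaleStep) {
            int win = (int) size;
            int step = Math.max(1, win / 4);   // arbitrary: stride grows with scale
            for (int y = 0; y + win <= imgH; y += step)
                for (int x = 0; x + win <= imgW; x += step)
                    if (c.matches(ii, x, y, win)) hits.add(new int[]{x, y, win});
        }
        return hits;
    }
}
```

The key point is that the classifier is evaluated at every window position and scale, which is why the constant-time rectangle sums matter so much for real-time identification.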
Also, it seems that some parameters from the environment context can
assist the recognizer. E.g., if it can see a face, there is a higher
probability that there is a body underneath it. Coffee cups are most
probably upright, not on their sides. If other items indicate that it's
indoors (or the robot might already know this), then it is not likely
that the thing it observes on the table is a dolphin (even if that got
a high match). Of course these rules are quite manual, in that we need
to weight them, and there are all sorts of exceptions (there might be a
picture of a dolphin on the wall). A context graph of the relations
between objects seems to be the way to go. E.g., the rule "cup - lies
on top of - table" might indicate that if it has seen a cup, the "blob"
observed underneath might be a table. A mental memory map could be
built from these relations too.
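A context graph of that sort can start as nothing more than hand-weighted "seen A supports candidate B" edges that re-score a detector's raw output. This Java sketch is a made-up illustration of the idea; the relation names and boost factors are arbitrary:

```java
// Tiny context-graph sketch: hand-weighted "A supports B" relations that
// boost or penalize a detector's raw score for a candidate label.
class ContextGraph {
    // supports.get("face").get("body") = multiplier a seen face applies to "body".
    final java.util.Map<String, java.util.Map<String, Double>> supports =
            new java.util.HashMap<>();

    void addRelation(String seen, String candidate, double boost) {
        supports.computeIfAbsent(seen, k -> new java.util.HashMap<>())
                .put(candidate, boost);
    }

    // Re-score a candidate label given everything identified so far.
    double rescore(String candidate, double rawScore, java.util.Set<String> seen) {
        double score = rawScore;
        for (String s : seen) {
            Double b = supports.getOrDefault(s, java.util.Map.<String, Double>of())
                               .get(candidate);
            if (b != null) score *= b;   // >1 boosts, <1 penalizes
        }
        return score;
    }
}
```

The exceptions mentioned above (the dolphin picture on the wall) are exactly why these multipliers should nudge the match score rather than veto it outright.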
Now, the use case needs to be limited to certain tasks at first,
though. If I can get it to respond to the voice command "Tell me what
you see" with the spoken answer "I see two faces, a ball and a cup,"
I will be very satisfied!
As for "intelligent" conversation, I have been looking at ALICE, which
has an XML-based language called AIML for simulating intelligent
conversation, where the robot can learn about its surroundings from
input (which I hope to do via speech recognition). For instance, it
should be able to recognize a face in its vision and ask "Hello, what
is your name?", and your reply would be memorized and used in later
conversation. It should also act like a simple intelligent agent that
I could ask for information like "What time is it?" and "What is the
weather forecast?" (for which it would use its WLAN access to fetch
weather information).
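The memorize-and-reuse behavior can be mocked up without real AIML. This Java sketch is only a stand-in for AIML's predicate mechanism (the phrasing and class name are invented), but it shows the shape of remembering a name from one utterance and using it later:

```java
// Stand-in for the AIML "predicate" idea: remember a fact stated in one
// utterance ("my name is X") and reuse it in a later reply. Not real AIML.
class TinyChat {
    final java.util.Map<String, String> memory = new java.util.HashMap<>();

    String respond(String input) {
        String s = input.trim().toLowerCase();
        if (s.startsWith("my name is ")) {
            // Store everything after the matched prefix as the name.
            memory.put("name", input.trim().substring("my name is ".length()));
            return "Nice to meet you, " + memory.get("name") + ".";
        }
        if (s.equals("what is my name?"))
            return memory.containsKey("name")
                    ? "Your name is " + memory.get("name") + "."
                    : "I do not know your name yet.";
        return "I do not understand.";
    }
}
```

In real AIML the same thing is done declaratively with pattern/template pairs and set/get predicate tags, which is what makes ALICE attractive here: the conversation rules stay out of the Java code.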
I guess there are many ways I could take this, but as a starter I'd
like to do as much as possible using a normal computer and a webcam,
and then work on the hardware and building the robot. Most people seem
to do it the other way around, though.
Best regards,
Check out OpenCV
formatting link
I suggest you play around a little with the demos it ships with; you might get a better idea of what's out there and what your needs are. A stereo pair of cheap USB cameras would be a good hardware setup to start with.
Kaido Kert
----- Original Message ----- From: "Jeceel" Newsgroups: comp.robotics.misc Sent: Wednesday, January 28, 2004 2:51 AM Subject: Robot vision - which algorithms do you recommend?
Most people do it the other way because they can get their "robot" up and running around. The "brain" part rapidly becomes a task that would make writing your own version of Windows a breeze! Thus not many backyard robotics enthusiasts, that make up the bulk of those using comp.robotics.misc, delve into vision. There is a simple CMUCam for tiny bots.
I did an introductory course on Java last year but don't have any experience with the Java Media Framework or how to use it. My take on Java is that it might be a bit slow?
Like you, I wanted to do as much as possible on a normal computer and a webcam. I started with an old monochrome Connectix QuickCam (now Logitech). This was great because I could run it from DOS and use the languages I knew best, assembler and C. I also had full control of the camera, so I didn't need Windows drivers or, for that matter, any of the complications of Windows programming. I have since used a Logitech color webcam with VC++ to grab and process images (movement detection) using the Logitech SDK. Movement is a very good way to detect and outline people. The face will be the blob on the top :-)
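For anyone wanting to try the movement-detection idea without an SDK, plain frame differencing is only a few lines. This Java sketch (class name and threshold are arbitrary) flags pixels that changed between two grayscale frames:

```java
// Movement detection by frame differencing: pixels that changed by more
// than a threshold between two grayscale frames are flagged, and the
// fraction of changed pixels indicates whether something moved.
class MotionDetector {
    final int threshold;
    MotionDetector(int threshold) { this.threshold = threshold; }

    // Returns a mask of changed pixels; frames are 8-bit grayscale arrays.
    boolean[][] diff(int[][] prev, int[][] cur) {
        int h = cur.length, w = cur[0].length;
        boolean[][] mask = new boolean[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                mask[y][x] = Math.abs(cur[y][x] - prev[y][x]) > threshold;
        return mask;
    }

    double changedFraction(boolean[][] mask) {
        int changed = 0, total = 0;
        for (boolean[] row : mask)
            for (boolean b : row) { total++; if (b) changed++; }
        return (double) changed / total;
    }
}
```

Grouping the flagged pixels into connected blobs then gives the person outline, with the face as the top blob, as described above.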
I haven't cared much for neural networks for robot brains, although I suspect they are closer to how our brains work. Rather, I would hard-code everything. For example, if I wanted to recognize a set of handwritten characters, I would select a set of features and then write routines to extract those features and their spatial relationships. Although I haven't tackled facial recognition, I would do it the same way.
Did you have any luck with the suggestion to check out opencvlibrary?
-- John Casey
The best alternative I have (personally) found for vision is the ER1 from Evolution Robotics. I bought one of these kits a couple of years ago, and the vision recognition is awesome! If you want to build everything from scratch, then it is not the kit for you; but if you want a robot that can be thrown together quickly and taught to recognize your face, magazine covers, or icons placed around the house, then it is worth checking out.
Here is the web site:
formatting link
I started out using the very laptop I am typing this post on as the brains. Then I purchased a mini-book PC for the brains.
They also have an ERSP available for Windows and for Linux. It's their development kit, and it is in Java. I'm not sure whether it is still available to all new purchasers of the kit, but sometimes they are listed on eBay, so you might be able to pick up the ERSP as well. Their customer service is incredible, so you might want to contact them up front and ask about it. I spoke with them for almost six months before purchasing my 'bot, and they were always very polite, patient, candid, and informative.
Well, now you have something else to consider! JCD
