Visual recognition and the SIFT algorithm

Last night at the HomeBrew Robotics Club here in Silicon Valley, four of our members put on an amazing demonstration of visual recognition using Lowe's SIFT algorithm.

How they were able to put on the demonstration in the first place was amazing in itself. They first searched the web for any code that would help them, and when that failed, they plowed through the mathematical paper on the algorithm and then implemented it themselves. That was a very tedious task.

It was a great presentation. First, one of the fellows showed his computer (with webcam attached) training on objects and then recognizing them, speaking aloud what they were: a doll, a dollar bill, etc. What was amazing was that these objects could be rotated, partially obscured, distorted, shown at varying distances, etc., and still be recognized by the algorithm. For example, the computer was trained on both a one-dollar bill and a twenty-dollar bill. The bills were then shown to the webcam rotated, crumpled, with a finger overlaying them, and at varying distances, and the computer said what the object was each and every time. The kicker is that the computer can be trained on many, many objects and recognize any or all of them. The training data is stored in a database, so how fast objects are recognized really comes down to the speed of searching that database.
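For anyone curious what that database search looks like, here is a toy sketch (my own illustration, not their code): each trained object contributes feature vectors to a database, and recognition is a nearest-neighbor search over them, with Lowe's ratio test to reject ambiguous matches. The data here is synthetic; real SIFT descriptors are 128-dimensional histograms.

```python
import numpy as np

def match_descriptors(query, database, ratio=0.8):
    """Brute-force nearest-neighbor matching with Lowe's ratio test.

    query:    (Q, D) array of descriptors from the new image
    database: (N, D) array of stored training descriptors
    Returns database indices, or -1 where the match is ambiguous.
    """
    matches = []
    for q in query:
        dists = np.linalg.norm(database - q, axis=1)  # distance to every stored descriptor
        order = np.argsort(dists)
        best, second = order[0], order[1]
        # Accept only if the best match is clearly better than the runner-up
        if dists[best] < ratio * dists[second]:
            matches.append(best)
        else:
            matches.append(-1)
    return np.array(matches)

rng = np.random.default_rng(0)
db = rng.normal(size=(100, 128))                        # pretend stored training descriptors
query = db[[5, 42]] + 0.01 * rng.normal(size=(2, 128))  # noisy views of two known features
print(match_descriptors(query, db))
```

The brute-force loop is O(N) per query descriptor, which is exactly why search speed becomes the bottleneck once the database holds many objects; real systems swap in approximate nearest-neighbor structures such as k-d trees.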

Next, there was a gentle introduction to the algorithm. The guy showed a few slides and then a program he had written in Visual Basic that took you visually through the steps of the algorithm and explained what was going on in each step.

Next, there was a slightly more technical explanation.

But what really blew the crowd away (not that the crowd of about 50 members wasn't blown away already) was this: the next presenter had taken a CMUcam and attached it to a cheap FPGA on which he had implemented part of the algorithm. It was not yet recognizing objects (he is taking development in steps, and he has a day job at Xilinx), but he pointed the CMUcam at us, and at 60 FPS, there the crowd was in outlined form. A guy in the back of the crowd started throwing and spinning a hat in the air. No problem. The crowd started moving around to see how robust it was. No problem. Truly amazing.

There now are plans to finish implementing Lowe's SIFT algorithm in an FPGA, attach a camera lens to it, and sell the boards to HBRC members to play with (read: debug). The expected cost was somewhere under $50. That's right. I'll type it again: $50. I would think it worth twice that much or more.

Reply to
Randall P. Hootman

Really interesting stuff. From my weak knowledge of classification, it seems that the SIFT algorithm is really useful for extracting a feature vector from the image at hand. Do you know what type of algorithm they used for classification?

Cheers

Padu

Reply to
Padu

I agree with Padu, really interesting stuff.

I googled with the subject line,

Lowe's SIFT algorithm

to get more information.

-- JC

Reply to
JGCASEY

what is the point of this paragraph? "no problem" ?

are you saying that the CMUcam had no problem outlining objects at 60 FPS?

Rich

Reply to
aiiadict

David G. Lowe has a US patent, 6,711,293, that describes a visual apparatus and method whereby images are blurred and then subtracted from the original image. Individual pixels that are maximal or minimal, called extrema in the patent, are picked out of the image, and then a calculation involving radial zones and the summation of difference vectors in those zones is performed for the area around each extremum. Once these numbers are determined, Lowe uses a generalized Hough transform to correlate observed objects with a pre-trained set of objects.
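A rough sketch of that blur-and-subtract step (a difference of Gaussians), with extrema flagged wherever a pixel beats all eight of its neighbors in the difference image. This is my own toy illustration of the idea, not the patent's method; the real algorithm also compares across scales.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur built from two 1-D convolutions."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, rows)

def dog_extrema(img, sigma1=1.0, sigma2=1.6):
    """Blur at two scales, subtract, and flag pixels strictly larger
    (or smaller) than all 8 neighbors in the difference image."""
    dog = gaussian_blur(img, sigma2) - gaussian_blur(img, sigma1)
    maxima = np.ones(dog.shape, dtype=bool)
    minima = np.ones(dog.shape, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == dx == 0:
                continue
            shifted = np.roll(np.roll(dog, dy, axis=0), dx, axis=1)
            maxima &= dog > shifted
            minima &= dog < shifted
    return dog, maxima | minima

# A single bright spot on a dark background yields an extremum at its center.
img = np.zeros((32, 32))
img[16, 16] = 1.0
dog, extrema = dog_extrema(img)
print(extrema[16, 16])
```

The subtraction acts as a bandpass filter, so the surviving extrema mark blob-like structures at a particular scale; those are the keypoints the radial-zone descriptors are then computed around.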

Hough has an early US patent, 3,069,654 (1962), which describes a method of classifying images by charting the slopes and intercepts of line segments found in images and then finding correlations in those patterns. Interesting reading, especially for the not-so-mathematically inclined, since the entire apparatus is designed without a computer.

Since then, I know that many researchers have adopted this technique and developed what is called the Generalized Hough Transform. And I know that some researchers have used it around pixel extrema. I haven't yet deciphered what Lowe is doing in his patent that makes it novel. Perhaps it is the incorporation of scanning at different image scales, which accommodates recognizing an object at varying distances.

I believe this is the same technology marketed by Evolution in its ViPR product. Please tell me if I am wrong, because this is the first time I've heard the name SIFT. At RoboNexus I picked up a demo copy of ViPR, and it seems to work reasonably well. I didn't train it on 10,000 images, as it claims to be able to accommodate.

Who gave the talk at Homebrew? Wish I had been there.

Thanks,

Brad Smallridge aivision dot com

Reply to
Brad Smallridge

Just checking into this group after being gone for a while, thought I would weigh in on SIFT. I took a computer vision class last spring and while we didn't implement SIFT (we did implement Hough), we spent quite some time studying it. It's not for the faint of heart. If anybody is interested in it, however, I can probably find some notes and post them here. The technology is currently being used in some photostitching software which takes a collection of photos and automatically stitches them together (instead of manually creating a mapping between photos). It works *really* well, and I'd be shocked if the inventor isn't a millionaire some day.

Reply to
Mark H

I think it is the same as what Evolution passed out at RoboNexus. At least they had the Evolution disk with them last night. However, in their demo they did not use Evolution's stuff; they used their own software/firmware/hardware.

There were four people giving the talk and demonstration: Dave, Ingolf, John, and Brandon. Dave and John gave talks, Ingolf showed the walkthrough he had done in Visual Basic, and Brandon showed the implementation he had done on the FPGA (which was not fully complete yet).

Reply to
Randall P. Hootman

Hi Brad. If you look at Lowe's page, you'll see he is on the advisory committee of ER.

Regarding what makes this, or any other, patent unique: that is always a moot distinction. Certainly, the first part of this is old hat ... "... images are blurred and then subtracted ..." is just lowpass/bandpass filtering, inverse filtering, etc. And as you point out, some of the other aspects are also old hat or covered by other patents. Some of us think that people with money file patents simply to keep others from "using" the "obvious", after the obvious has been "pointed out". :-)

Reply to
dan michaels

I hope I didn't say that the patent wasn't novel. I meant that I haven't identified what is novel, because I haven't studied the claims well enough, and I don't have a good knowledge of correlation algorithms. I was hoping someone might fill in the blanks.

Reply to
Brad Smallridge
