Computer vision, OpenCV: relating real-world object size to image pixel size

- willydlw
  
Tue, Apr 30, 2013 5:34 PM

My goal is to mount a web camera on a ground vehicle robot and have it recognize lanes and drive between them. To look for appropriately sized lines (lanes) in an image, I am trying to learn how to convert the size of a real-world object to its size in image pixels.

If I mount a USB camera at a known, fixed height above the ground, with its line of sight at a fixed angle to the horizontal, how do I relate the size of a real-world object to the number of pixels it covers in the image?

Example: my camera is two feet above the floor. If I define the y axis as pointing upward from the floor, the camera sits on the y axis at real-world coordinates (x = 0, y = 2 feet, z = 0). It is tilted 15 degrees below the horizontal xz plane, meaning the camera points down toward the floor.

I take an image of some bright green tape on the floor. The tape is 3 feet long and 2 inches wide. How do I translate an object 3 feet long and 2 inches wide into pixel dimensions? How many pixels represent 3 feet in my image? I am trying to recognize lines of approximately that size.

Thank you in advance for your help.

- Curt Welch
  
Mon, May 6, 2013 8:09 PM

Well, if you assume you will only be operating on a flat surface, you can do the math to translate this. But if the robot tilts at all (even a little), or the surface is not totally flat, you can't do a direct translation from an image to a size on the ground.
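For the flat-floor case, the math is the standard pinhole-camera projection. Here is a minimal Python sketch of it under stated assumptions: an ideal pinhole camera with no lens distortion, a 640x480 image and a 60-degree horizontal field of view. Those camera numbers are hypothetical (not from the thread); substitute your own calibrated values. The mounting (24 in high, tilted 15 degrees down) comes from the question, while the tape's position on the floor (centred, near end 36 in ahead) is an invented choice. `ground_to_pixel` is a hypothetical helper name.

```python
import math

# Hypothetical camera (NOT from the thread): 640x480, 60-degree horizontal
# field of view, ideal pinhole with no lens distortion.
IMG_W, IMG_H = 640, 480
HFOV_DEG = 60.0
F_PX = (IMG_W / 2) / math.tan(math.radians(HFOV_DEG) / 2)  # focal length in pixels
CX, CY = IMG_W / 2, IMG_H / 2  # principal point assumed at the image centre

# Mounting from the question: 24 in above the floor, tilted 15 degrees down.
CAM_HEIGHT_IN = 24.0
TILT_DEG = 15.0


def ground_to_pixel(x, z, h=CAM_HEIGHT_IN, tilt_deg=TILT_DEG):
    """Project the floor point (x, 0, z), in inches, to image pixel (u, v).

    World: y up, camera at (0, h, 0), x to the robot's right, z straight
    ahead along the floor. Returns None for points behind the camera.
    """
    t = math.radians(tilt_deg)
    # Offset camera->point is (x, -h, z); take its components along the
    # camera's right (1, 0, 0), down (0, -cos t, -sin t) and
    # forward (0, -sin t, cos t) axes.
    xc = x
    yc = h * math.cos(t) - z * math.sin(t)
    zc = h * math.sin(t) + z * math.cos(t)
    if zc <= 0:
        return None
    return CX + F_PX * xc / zc, CY + F_PX * yc / zc


# The tape: 2 in wide, 36 in long. Its placement (centred, near end 36 in
# ahead of the camera) is a hypothetical choice for illustration.
near, far = 36.0, 72.0
nl, nr = ground_to_pixel(-1.0, near), ground_to_pixel(1.0, near)
fl, fr = ground_to_pixel(-1.0, far), ground_to_pixel(1.0, far)

length_px = nl[1] - fl[1]      # vertical extent of the tape in the image
near_width_px = nr[0] - nl[0]  # width of the near end, in pixels
far_width_px = fr[0] - fl[0]   # width of the far end, in pixels
print(f"tape: ~{length_px:.0f} px long, {near_width_px:.1f} px wide near, "
      f"{far_width_px:.1f} px wide far")
```

With these made-up numbers the 3-foot tape spans roughly 154 pixels vertically and narrows from about 27 pixels wide at its near end to about 15 at its far end, so there is no single "pixels per foot" figure. Any pitch or roll of the robot changes the effective tilt and invalidates the numbers.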

A rectangle on the ground will show up as a trapezoid in the image. Or, if you do the projection backwards, a square block of pixels maps to a trapezoid-shaped area on the ground.
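The backwards direction can be sketched by intersecting the ray through each pixel with the floor plane. This is a minimal sketch, again assuming a flat floor and an ideal pinhole camera with hypothetical parameters (640x480, 60-degree horizontal field of view, 24 in high, tilted 15 degrees down); `pixel_to_ground` is a hypothetical name. It back-projects two identical 40x40-pixel squares, one low in the image and one higher up, and prints the size of the floor patch each one covers.

```python
import math

# Hypothetical camera (NOT from the thread): 640x480 pixels, 60-degree
# horizontal field of view, ideal pinhole, mounted 24 in above a flat floor
# and tilted 15 degrees down as in the question.
IMG_W, IMG_H = 640, 480
F_PX = (IMG_W / 2) / math.tan(math.radians(60.0) / 2)
CX, CY = IMG_W / 2, IMG_H / 2
CAM_HEIGHT_IN = 24.0
TILT = math.radians(15.0)


def pixel_to_ground(u, v):
    """Intersect the ray through pixel (u, v) with the floor.

    Returns (x, z) in inches (x to the right, z straight ahead), or None
    when the pixel looks at or above the horizon.
    """
    a = (u - CX) / F_PX  # ray slope to the right, in camera coordinates
    b = (v - CY) / F_PX  # ray slope downward, in camera coordinates
    denom = b * math.cos(TILT) + math.sin(TILT)  # how fast the ray descends
    if denom <= 0:
        return None
    s = CAM_HEIGHT_IN / denom  # distance along the optical axis to the floor
    return a * s, s * (math.cos(TILT) - b * math.sin(TILT))


# Two identical 40x40-pixel squares: one low in the image, one higher up.
for name, (u0, v0) in {"low square": (300, 400), "high square": (300, 200)}.items():
    corners = [(u0, v0), (u0 + 40, v0), (u0 + 40, v0 + 40), (u0, v0 + 40)]
    g = [pixel_to_ground(u, v) for u, v in corners]
    far_w = g[1][0] - g[0][0]   # top edge of the square (farther away)
    near_w = g[2][0] - g[3][0]  # bottom edge of the square (closer)
    depth = g[0][1] - g[3][1]
    print(f"{name}: far edge {far_w:.1f} in, near edge {near_w:.1f} in, "
          f"depth {depth:.1f} in")
```

In both cases the far edge covers more floor than the near edge (the trapezoid), and the higher square covers a much larger patch than the lower one.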

To do the math, you need to know the exact optical characteristics of the camera. For practical reasons, it will probably be easier to ignore all that and just build the translation from empirical measurements. That is, draw an accurate grid on the ground, position the robot at an exact location on that grid, take an image, and then use the grid in the image as the basis for mapping between pixels and grid locations on the ground. So in the middle, the image might be 12" high on the ground; at the edges, it might be 14.5"; the bottom edge of the pixels might translate to 6" on the ground, and the top edge to 60", etc.
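One common way to encode that grid-based mapping for a flat floor is a plane-to-plane homography: a 3x3 matrix fitted from at least four pixel/floor correspondences read off the calibration image. This is a swapped-in technique, not necessarily the exact procedure described in the post. OpenCV provides it as `cv2.getPerspectiveTransform` (exactly four points) and `cv2.findHomography` (four or more points). The pure-Python sketch below solves the four-point case directly so the math is visible. The "measured" pixel and floor coordinates are invented, and the helper names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate points (e.g. three on one line)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_4(src, dst):
    """3x3 homography H (with H[2][2] = 1) mapping each src (u, v) to dst (x, z)."""
    a, b = [], []
    for (u, v), (x, z) in zip(src, dst):
        # x = (h0 u + h1 v + h2) / (h6 u + h7 v + 1), rearranged to be linear in h
        a.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        b.append(x)
        a.append([0, 0, 0, u, v, 1, -u * z, -v * z])
        b.append(z)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(H, u, v):
    """Map image pixel (u, v) to floor coordinates (x, z)."""
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return ((H[0][0] * u + H[0][1] * v + H[0][2]) / w,
            (H[1][0] * u + H[1][1] * v + H[1][2]) / w)


# Invented calibration data: four grid corners taped on the floor
# (x right, z ahead, inches) and where they appeared in the image (pixels).
pixels = [(157.72, 427.50), (482.28, 427.50), (232.21, 273.27), (407.79, 273.27)]
ground = [(-12.0, 36.0), (12.0, 36.0), (-12.0, 72.0), (12.0, 72.0)]

H = homography_from_4(pixels, ground)
x, z = apply_homography(H, 320, 240)  # any other pixel, e.g. the image centre
print(f"pixel (320, 240) -> floor x={x:.1f} in, z={z:.1f} in")
```

With more than four grid points, a least-squares fit (for example `cv2.findHomography`) averages out measurement error. Like any flat-floor mapping, it only holds while the camera's height and tilt match the calibration setup.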

What might be useful is to take the pixel data and run it through a resizing and distorting (warping) operation to create a ground-pixel map with constant-size ground pixels (like 100 per inch or something). Then do the processing on the ground pixels instead of the image pixels.
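That ground-pixel map is what is often called an inverse perspective mapping, or bird's-eye view. Here is a minimal pure-Python sketch, assuming a flat floor and an ideal pinhole camera with hypothetical parameters (640x480, 60-degree horizontal field of view, 24 in high, tilted 15 degrees down). It first renders a synthetic camera image of the 2 in by 36 in tape. It then resamples that image onto a grid of 1-inch floor cells (coarser than 100 per inch, just to keep it quick) by projecting each cell centre into the image and taking the nearest pixel. All function names are hypothetical. In OpenCV the resampling step is normally done with `cv2.warpPerspective` and a homography.

```python
import math

# Hypothetical camera (NOT from the thread): 640x480, 60-degree horizontal FOV,
# ideal pinhole, 24 in above a flat floor, tilted 15 degrees down.
IMG_W, IMG_H = 640, 480
F_PX = (IMG_W / 2) / math.tan(math.radians(60.0) / 2)
CX, CY = IMG_W / 2, IMG_H / 2
H_IN, T = 24.0, math.radians(15.0)


def ground_to_pixel(x, z):
    """Floor point (x right, z ahead, inches) -> image (u, v); None if behind camera."""
    yc = H_IN * math.cos(T) - z * math.sin(T)
    zc = H_IN * math.sin(T) + z * math.cos(T)
    return None if zc <= 0 else (CX + F_PX * x / zc, CY + F_PX * yc / zc)


def pixel_to_ground(u, v):
    """Image (u, v) -> floor point (x, z); None if the ray never hits the floor."""
    a, b = (u - CX) / F_PX, (v - CY) / F_PX
    denom = b * math.cos(T) + math.sin(T)
    if denom <= 0:
        return None
    s = H_IN / denom
    return a * s, s * (math.cos(T) - b * math.sin(T))


def render_tape_image():
    """Synthetic camera image: 255 where a pixel sees the 2 x 36 in tape, else 0."""
    img = []
    for v in range(IMG_H):
        row = []
        for u in range(IMG_W):
            g = pixel_to_ground(u, v)
            hit = g is not None and abs(g[0]) <= 1.0 and 36.0 <= g[1] <= 72.0
            row.append(255 if hit else 0)
        img.append(row)
    return img


def ground_map(img, x_min, x_max, z_min, z_max, cells_per_inch=1):
    """Resample img onto constant-size floor cells; row 0 is the far edge."""
    n_x = round((x_max - x_min) * cells_per_inch)
    n_z = round((z_max - z_min) * cells_per_inch)
    out = []
    for i in range(n_z):
        z = z_max - (i + 0.5) / cells_per_inch  # cell centre, far rows first
        row = []
        for j in range(n_x):
            x = x_min + (j + 0.5) / cells_per_inch
            p = ground_to_pixel(x, z)
            u, v = (round(p[0]), round(p[1])) if p else (-1, -1)
            inside = 0 <= u < IMG_W and 0 <= v < IMG_H
            row.append(img[v][u] if inside else None)  # nearest-neighbour sample
        out.append(row)
    return out


img = render_tape_image()
print("tape width in image:", sum(p == 255 for p in img[420]), "px near,",
      sum(p == 255 for p in img[280]), "px far")
gm = ground_map(img, -6, 6, 36, 72)
print("tape cells in each ground-map row:", sorted({row.count(255) for row in gm}))
```

In the camera image the tape narrows with distance, but in the ground map every row should contain the same two tape cells. That means lane width can be checked directly in inches.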

- casey
  
Mon, May 20, 2013 1:01 AM

Have you looked at RoboRealm?
