Sylvia has been working with 123D Catch, so I wanted to try some other software alternatives to see if I could get better results. I came across VisualSFM and decided to give it a try. Normally I prefer to handle these sorts of things with my own code, but I wanted something anyone could easily make use of.
Golum is going to be my model for this project. I had used a different object earlier that didn’t have much surface detail, and the algorithms couldn’t find enough matches between images to create a model.
Structure from Motion
One way our brain produces a three-dimensional picture of the world around us is stereoscopic vision. Each eye receives light rays from a slightly different direction, and the brain uses that difference to triangulate the position of the object.
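This triangulation can be written as a one-line formula: with a pinhole camera model, depth is focal length times baseline divided by the disparity (the difference in the point's horizontal image position between the two views). A toy sketch, with made-up illustrative numbers that are not from this project:

```python
# Toy stereo-depth calculation: two "eyes" a fixed baseline apart see the
# same point at slightly different horizontal image positions (disparity).
# Depth follows from similar triangles: Z = f * B / d.
# The focal length, baseline, and disparities below are made-up values.

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Return depth in metres for a pinhole stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive; zero means the point is at infinity")
    return focal_px * baseline_m / disparity_px

# A nearby point produces a large disparity, a distant point a small one.
near = depth_from_disparity(focal_px=800.0, baseline_m=0.065, disparity_px=52.0)
far = depth_from_disparity(focal_px=800.0, baseline_m=0.065, disparity_px=4.0)
print(near, far)  # the small-disparity point comes out much farther away
```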
Another way to determine position is to observe how an object appears to move as we move. If you go for a walk in a straight line, you will notice that objects close to you change their position relative to you much faster than objects farther away, and you use this fact to judge relative distances as you move. Structure from Motion works the same way: it looks at how easily identifiable points on a model shift within the image as the position of the camera changes. If you change the camera position and one point on the model shifts quite a bit while another barely moves, the part of the model that moved the most is closer to the camera.
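The parallax effect described above is easy to demonstrate numerically: project two points at different depths through a simple pinhole camera, slide the camera sideways, and compare how far each point's image coordinate moves. This is a minimal sketch with made-up coordinates, not VisualSFM's actual math:

```python
# Parallax sketch: project two 3D points with a pinhole camera, translate the
# camera sideways, and compare how far each point's image coordinate moves.
# The nearer point moves more. All coordinates are illustrative values.

def project_x(point, cam_x, focal=1.0):
    """Horizontal pinhole projection of an (X, Z) point for a camera at cam_x."""
    X, Z = point
    return focal * (X - cam_x) / Z

near_pt = (0.0, 2.0)    # 2 units in front of the camera
far_pt = (0.0, 20.0)    # 20 units away

# Move the camera 0.5 units to the right and measure the image-space shifts.
shift_near = abs(project_x(near_pt, 0.5) - project_x(near_pt, 0.0))
shift_far = abs(project_x(far_pt, 0.5) - project_x(far_pt, 0.0))
print(shift_near, shift_far)  # the near point shifts 10x more than the far one
```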
The way programs like VisualSFM work is to compare each possible pair of images and find clumps of pixels that look similar in both. The software determines the position of each camera relative to the scene and then uses this information to work out where a clump of pixels (a point on the model) is located in space.
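A toy version of this matching step: slide a small patch from one image over a second image and pick the location with the highest normalized cross-correlation. Real pipelines like VisualSFM use scale- and rotation-invariant features (SIFT) rather than brute-force patch search, so this only illustrates the "find the most similar clump of pixels" idea:

```python
import numpy as np

# Toy feature matching: exhaustively slide a patch from image A over image B
# and pick the window with the highest normalized cross-correlation (NCC).
# The images here are synthetic random textures, not real photographs.

def best_match(patch: np.ndarray, image: np.ndarray) -> tuple:
    ph, pw = patch.shape
    p = (patch - patch.mean()) / (patch.std() + 1e-9)
    best, best_score = (0, 0), -np.inf
    for r in range(image.shape[0] - ph + 1):
        for c in range(image.shape[1] - pw + 1):
            win = image[r:r + ph, c:c + pw]
            w = (win - win.mean()) / (win.std() + 1e-9)
            score = float((p * w).mean())
            if score > best_score:
                best, best_score = (r, c), score
    return best

rng = np.random.default_rng(0)
img_a = rng.random((30, 30))
patch = img_a[10:16, 12:18]                   # a distinctive clump of pixels in image A
img_b = np.roll(img_a, (2, 3), axis=(0, 1))   # image B: same texture, shifted
print(best_match(patch, img_b))               # finds the patch at the shifted location (12, 15)
```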
VisualSFM actually shows you the locations of the cameras around the model, which comes in handy for finding gaps in your camera coverage. Using the matching points found in each pair of images, the software determines the relative positions of the cameras and builds a sparse point cloud (the point cloud is the set of x, y, and z coordinates that define each point found on the model). The sparse point cloud gives a rough outline of the model.
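Once two camera positions are known, a point matched in both images defines two viewing rays, and the 3D point sits at their (least-squares) intersection. This is a simplified sketch of that triangulation step with made-up coordinates, not VisualSFM's actual implementation:

```python
import numpy as np

# Triangulation sketch: a matched point seen from two known camera positions
# defines two rays; their least-squares intersection is the 3D point.
# The point and camera coordinates below are illustrative values.

def triangulate(cam_a, cam_b, dir_a, dir_b):
    """Least-squares intersection of the rays cam_a + t*dir_a and cam_b + t*dir_b."""
    # Solve for t_a, t_b minimizing |(cam_a + t_a*dir_a) - (cam_b + t_b*dir_b)|
    A = np.column_stack([dir_a, -np.asarray(dir_b)])
    t, *_ = np.linalg.lstsq(A, cam_b - cam_a, rcond=None)
    p_a = cam_a + t[0] * np.asarray(dir_a)
    p_b = cam_b + t[1] * np.asarray(dir_b)
    return (p_a + p_b) / 2  # midpoint of closest approach

point = np.array([1.0, 0.5, 4.0])     # ground-truth model point
cam_a = np.array([0.0, 0.0, 0.0])
cam_b = np.array([1.5, 0.0, 0.0])     # second camera shifted sideways

# Viewing directions, as if derived from the matched pixel in each image.
ray_a = (point - cam_a) / np.linalg.norm(point - cam_a)
ray_b = (point - cam_b) / np.linalg.norm(point - cam_b)

print(triangulate(cam_a, cam_b, ray_a, ray_b))  # recovers ~[1.0, 0.5, 4.0]
```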
Another routine (CMVS in the case of VisualSFM) then uses the sparse point cloud and the camera positions to build a dense point cloud, which gives a much higher-quality reconstruction. Once the point cloud is created you can use another software package to generate the mesh that defines the surface. I used Meshlab to clean up the point cloud and then generate the mesh. You can see the results below.
Tips
- Lighting, lighting, lighting. The camera needs good, consistent lighting to identify common points in pairs of images.
- Only make small movements between pictures. The more points in common between pairs of images the better the results.
- Make sure your model has a lot of detail. The matching algorithms can’t find matching points in the images without detail on the model.
- Look at the camera positions in VisualSFM. If you see an area where cameras are not present, go back and take more pictures to fill in the gaps.
It really helps if you use the dense point cloud to make the model and not the sparse point cloud. The model looks much better and you can even make out the facial features. I think with some more tweaking it could look pretty good.