Below are the images used for this project. I took Street View 1 and Street View 2 in Queens, NYC for warping and mosaicing by standing firmly in the middle of the street and rotating my body about 30 degrees between shots. The Freedom Tower and buildings images were taken with a similar method, but rotating vertically.
Correspondence points were selected on image pairs (shown in green). These points were used to calculate the homography matrix.
The homography matrix H is calculated to map coordinates from image 1 to image 2 via least squares; the bottom-right entry is fixed to 1 so the remaining eight entries can be solved from the correspondences. The resulting matrix is shown below (rounded for visual pleasure).
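Roughly, the least-squares setup looks like the sketch below (simplified; the name `computeH` and the convention of fixing the bottom-right entry to 1 are illustrative, not necessarily the exact code used here). Each correspondence contributes two rows to the system.

```python
import numpy as np

def computeH(pts1, pts2):
    """Least-squares homography mapping pts1 -> pts2.
    pts1, pts2: (N, 2) arrays of corresponding (x, y) points, N >= 4."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # Two equations per correspondence, with the last entry h33 fixed to 1.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```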
Image rectification on the street would be a little boring, so I chose a more interesting photo. This image is from an Egyptian restaurant in SoHo. On the ceiling are a series of LED signs. I thought it would be cool to rectify that section to get a clearer view of them. I annotated the corners of the square sign with the hexagonal pattern.
The green points correspond to the sign corners, and the orange points correspond to a manually designated square whose side length equals the longest distance between the green points.
For this situation it made the most sense to fix one image and shift the other to align the corresponding points; the second image was chosen as the fixed one. The bounding box of the mosaic is also expanded to accommodate the other image. This is calculated by warping the corners of the first image, expanding the bounding box, calculating the overlap with the second image, and padding the canvas by the non-overlapping extent in each direction.
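The corner-warping and bounding-box logic is sketched below (illustrative names; it assumes the `computeH` convention above, with H mapping image 1 into image 2's coordinate frame).

```python
import numpy as np

def warp_points(H, pts):
    """Apply homography H to (N, 2) points and return (N, 2) Cartesian points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])   # to homogeneous coordinates
    warped = pts_h @ H.T
    return warped[:, :2] / warped[:, 2:3]              # divide out w

def mosaic_bounds(H, im1_shape, im2_shape):
    """Bounding box (in image-2 coordinates) containing warped image 1 and image 2."""
    h1, w1 = im1_shape[:2]
    h2, w2 = im2_shape[:2]
    corners = np.array([[0, 0], [w1, 0], [w1, h1], [0, h1]], float)
    warped = warp_points(H, corners)
    # Grow image 2's own bounds by however far the warped corners of image 1 extend.
    x_min = min(0, warped[:, 0].min()); x_max = max(w2, warped[:, 0].max())
    y_min = min(0, warped[:, 1].min()); y_max = max(h2, warped[:, 1].max())
    return x_min, x_max, y_min, y_max
```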
Averaging the warped street view images works, but the results are very strange: the colors vary depending on whether a pixel lies in the overlap region.
I generated masks from the warped images by thresholding where the pixel values were greater than 0.
I multiply these masks and their inverses to create 4 sections of the image.
The overlapping sections are averaged together, while the mutually exclusive pieces are taken at full magnitude.
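That masking and compositing step looks roughly like the sketch below (illustrative names; it assumes both images have already been placed on the shared expanded canvas, and uses logical ANDs in place of mask multiplication, which is equivalent for binary masks).

```python
import numpy as np

def blend_average(warped1, im2_padded):
    """Average the overlap region; keep the mutually exclusive regions at full magnitude."""
    mask1 = warped1.sum(axis=-1) > 0       # where warped image 1 has content
    mask2 = im2_padded.sum(axis=-1) > 0    # where image 2 has content
    overlap = mask1 & mask2
    only1 = mask1 & ~mask2
    only2 = ~mask1 & mask2
    out = np.zeros_like(warped1, dtype=float)
    out[only1] = warped1[only1]
    out[only2] = im2_padded[only2]
    out[overlap] = 0.5 * (warped1[overlap] + im2_padded[overlap])
    return out
```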
We do have slight ghosting here. This is likely because I had some accidental translation along with the rotation when taking the images (what can I say, there were cars coming).
I try to mitigate this a bit by implementing a 5-level Laplacian stack. Instead of averaging the overlapping regions, I select View 2 to be on top (Piece 3). Aside from the abrupt end of the white line in the bottom-left corner, these results look pretty good.
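A 5-level Laplacian stack blend of this kind can be sketched as below (simplified; it assumes images in [0, 1] and a binary mask that is 1 where View 2 should sit on top, and the per-level mask smoothing is one choice among several).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def laplacian_stack(im, levels=5, sigma=2):
    """Laplacian stack: differences of successively blurred copies (no downsampling)."""
    stack, prev = [], im.astype(float)
    for _ in range(levels - 1):
        blurred = gaussian_filter(prev, sigma=(sigma, sigma, 0))
        stack.append(prev - blurred)
        prev = blurred
    stack.append(prev)                       # low-frequency residual
    return stack

def blend_stacks(im_top, im_bottom, mask, levels=5, sigma=2):
    """Blend im_top over im_bottom using an (H, W) binary mask, softened per level."""
    s_top = laplacian_stack(im_top, levels, sigma)
    s_bot = laplacian_stack(im_bottom, levels, sigma)
    m = mask.astype(float)
    out = np.zeros_like(im_top, dtype=float)
    for lt, lb in zip(s_top, s_bot):
        m = gaussian_filter(m, sigma=sigma)  # progressively softer seam at each level
        out += m[..., None] * lt + (1 - m[..., None]) * lb
    return np.clip(out, 0, 1)
```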
A similar approach was implemented for the tower and buildings images, where the better piece was selected to be on top.
Doing this assignment made it very clear how significant an effect very small off-axis motions can have on image alignment. While I was not aware of the small translations my hand made, they were enough to shift the center of projection and introduce parallax that a single homography cannot fully correct. Over several iterations I was able to align different parts of the image better by biasing the fit in that direction (adding more points so the SSE favors alignment in that region), but I was unable to get rid of the doubling globally.
4-Point RANSAC was used to evaluate points for autostitching. The following parameters were used for each image pair. A larger number of points was evaluated for the road image because the top 500 points with the largest radii were located in non-overlapping parts of the image.
Road
Top Corresponding Harris Points Evaluated: 5000
Minimum Harris Distance: 25
Feature Matching Threshold NN1/NN2: 0.2
RANSAC $\epsilon = 5 \times 10^3$
Freedom Tower
Top Corresponding Harris Points Evaluated: 500
Minimum Harris Distance: 50
Feature Matching Threshold NN1/NN2: 0.2
RANSAC $\epsilon = 1 \times 10^4$
Buildings
Top Corresponding Harris Points Evaluated: 500
Minimum Harris Distance: 50
Feature Matching Threshold NN1/NN2: 0.2
RANSAC $\epsilon = 5 \times 10^3$
For the feature extraction portion, 40x40 patches are extracted centered at each point of interest. Each patch is convolved with a 5x5 Gaussian kernel, after which every 5th pixel is sampled, giving an 8x8 descriptor. The descriptors are then normalized to N(0, 1). A few extracted features are shown below.
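A sketch of that descriptor pipeline is below (illustrative; it assumes a grayscale image and (x, y) interest points, and blurs the whole image once with a small Gaussian as a stand-in for the per-patch 5x5 kernel).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_features(im_gray, points, size=40, spacing=5):
    """Axis-aligned descriptors: blur, take a 40x40 window around each interest
    point, sample every 5th pixel (8x8), and normalize to zero mean, unit variance."""
    half = size // 2
    blurred = gaussian_filter(im_gray.astype(float), sigma=1)  # stand-in for the 5x5 kernel
    feats = []
    for x, y in points.astype(int):
        patch = blurred[y - half:y + half, x - half:x + half]
        if patch.shape != (size, size):      # skip points too close to the border
            continue
        desc = patch[::spacing, ::spacing]   # 40x40 -> 8x8
        feats.append((desc - desc.mean()) / (desc.std() + 1e-8))
    return np.array(feats)
```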
New homographies were computed using the RANSAC-selected points.
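The ratio-test matching and 4-point RANSAC loop can be sketched as below, reusing `computeH` and `warp_points` from the earlier sketches. I am treating the epsilon values listed above as squared-pixel error thresholds and `ratio` as the NN1/NN2 threshold, which is an assumption about their units.

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.2):
    """Keep a match only when the nearest-neighbor distance is less than
    `ratio` times the second-nearest distance (NN1/NN2 test)."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.sqrt(np.sum((desc2 - d) ** 2, axis=(1, 2)))
        nn1, nn2 = np.argsort(dists)[:2]
        if dists[nn1] / dists[nn2] < ratio:
            matches.append((i, nn1))
    return np.array(matches)

def ransac_homography(pts1, pts2, n_iters=1000, eps=5e3):
    """4-point RANSAC: fit H to random 4-point samples, keep the largest inlier
    set under squared-error threshold eps, then refit H on those inliers."""
    best = np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iters):
        idx = np.random.choice(len(pts1), 4, replace=False)
        H = computeH(pts1[idx], pts2[idx])
        err = np.sum((warp_points(H, pts1) - pts2) ** 2, axis=1)
        inliers = err < eps
        if inliers.sum() > best.sum():
            best = inliers
    return computeH(pts1[best], pts2[best]), best
```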
Mosaic images were created using these new homographies. The automatically stitched images are SIGNIFICANTLY better than the hand-stitched ones; the ghosting is completely eliminated. The same 5-level Laplacian stack method, using the masks generated in Part 1, was used for blending.
There are usually overlap issues on the edges of the individual images. This is likely a result of lens distortion toward the edges of the frame. It would be wiser in the future to take images with a larger degree of overlap so I could simply crop these artifacts out.
It was very cool to see the Harris points in action. I learned the importance of setting an epsilon criterion for feature matching. It was also very nice to see how much better the computer is at computing these statistics than my feeble human mind and hands.
I implemented rotational invariance. To do this, I calculated the gradient angle across the image via the inverse tangent, rotated the patch in the opposite direction of this angle, and then blurred and subsampled as before. I don't think this was a particularly successful attempt: I needed to increase the NN1/NN2 threshold from 0.2 to 0.3 to get a reasonable number of points for RANSAC.
Even so, I get a lot of false positives among the correspondence points; there are clearly issues, but I'm not sure exactly what they are.
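One way to read that orientation step as code is sketched below (my interpretation; it takes the dominant angle from the mean gradient of a blurred window around each point, which may not match the exact implementation).

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

def rotation_invariant_descriptor(im_gray, x, y, size=40, spacing=5):
    """Rotate each patch by the negative of its dominant gradient angle before
    blurring and subsampling, so descriptors share a canonical orientation."""
    half = size  # grab a larger window so the rotation does not clip the 40x40 patch
    window = im_gray[y - half:y + half, x - half:x + half].astype(float)
    if window.shape != (2 * half, 2 * half):
        return None                                        # too close to the border
    dy, dx = np.gradient(gaussian_filter(window, sigma=1))
    angle = np.degrees(np.arctan2(dy.mean(), dx.mean()))   # dominant gradient angle
    aligned = rotate(window, -angle, reshape=False)        # undo the rotation
    patch = aligned[half - size // 2:half + size // 2,
                    half - size // 2:half + size // 2]
    desc = gaussian_filter(patch, sigma=1)[::spacing, ::spacing]   # blur + subsample
    return (desc - desc.mean()) / (desc.std() + 1e-8)
```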