For this part, I chose scenic shots to create mosaics. The first image I chose was near Moffit Library, near the entrance to a building. The second image I chose was near Berkeley Way West. The third image I chose was of my living room. To take these images, I first took a shot in a fixed direction, then rotated my phone camera directly right or left, so that the two images are related only by a rotation of the camera. To rectify images, I chose a picture of my roommate's Spongebob figure, since its shape is overall rectangular. I also chose a random textbook lying around. Below are the images I took:
For this part, I used the same label selection tool provided in the previous project. For each image, I selected more than 4 points, usually 6-8, to create an overdetermined system of equations. I stacked the selected points into two lists, one for image 1 and one for image 2. For the x, y values of a single point, the equation is H * [x_1, y_1, 1].T = [x_2, y_2, 1].T (up to scale). To solve this, we apply SVD. We move all of the terms of the equation to one side, then take the coefficients of the entries of H and stack them together to create a matrix A: for each point, the first row holds the coefficients from the x equation and the second row holds those from the y equation. Because everything has been moved to one side, we want a solution such that A*h = 0, where h is the vector of all entries of H stacked together. From SVD, we know that A = U * S * V.T, and the last column of V (the one corresponding to the smallest singular value) is the least-squares solution to A*h = 0. After getting that column, we normalize it so that the bottom-right element of H is 1. To reverse this homography, we simply take the inverse of H, which is what computeH() will return.
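As a concrete reference, here is a minimal sketch of this SVD setup, assuming a hypothetical helper compute_H that takes (N, 2) point arrays; the actual computeH() may differ in details (for example, in whether it returns H or its inverse).

```python
import numpy as np

def compute_H(pts1, pts2):
    """Least-squares homography mapping pts1 -> pts2 via SVD (sketch).

    pts1, pts2: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    """
    A = []
    for (x1, y1), (x2, y2) in zip(pts1, pts2):
        # Two rows per correspondence, with all terms moved to one side:
        # the x equation and the y equation of H [x1, y1, 1]^T ~ [x2, y2, 1]^T.
        A.append([x1, y1, 1, 0, 0, 0, -x2 * x1, -x2 * y1, -x2])
        A.append([0, 0, 0, x1, y1, 1, -y2 * x1, -y2 * y1, -y2])
    A = np.array(A)

    # A h = 0: h is the right-singular vector of the smallest singular value,
    # i.e. the last row of Vt (equivalently, the last column of V).
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so the bottom-right element is 1
```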
Using computeH() from the previous part, I implemented warpImage() to warp the images. First, I got the homography H of the image to be warped and passed it in as a parameter. Then, I got the corners of the image, since I need to warp the corners to get the dimensions of the warped image. Using this information, I initialized a new image with dimensions large enough to store the warped result. Then, I got the polygon of the warped image and mapped it back to the source image using the inverse warp, i.e. by applying the inverse of H. Finally, I copied the corresponding pixels onto this canvas, producing the final warped image. To test this part out, I tried rectifying the images below. Because of Spongebob's shape, the roughly rectangular region has been rectified, so the face now appears to point more toward the front. The book also appears to face straight up. However, because the book was a bit worn out, its edges were not perfectly straight. The plushie's edges were also a bit rounded, since it is a plushie rather than a figurine with a rigid frame. Regardless, the results look good and are rectangular, with their faces pointing straight out of the screen. Below are the before and after images:
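Below is a rough sketch of that inverse-warp procedure, assuming skimage.draw.polygon for the polygon fill and simple nearest-neighbor sampling; the actual warpImage() may handle interpolation and canvas offsets differently.

```python
import numpy as np
import skimage.draw

def warp_image(im, H):
    """Inverse-warp im by homography H (sketch of the procedure above)."""
    h, w = im.shape[:2]
    # Warp the four corners (x, y, 1) to find the output bounds.
    corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                        [w - 1, h - 1, 1], [0, h - 1, 1]]).T
    wc = H @ corners
    wc = wc[:2] / wc[2]                              # dehomogenize -> 2 x 4
    min_xy = np.floor(wc.min(axis=1)).astype(int)
    max_xy = np.ceil(wc.max(axis=1)).astype(int)
    out_w, out_h = (max_xy - min_xy) + 1
    out = np.zeros((out_h, out_w) + im.shape[2:], dtype=im.dtype)

    # Pixels inside the warped-corner polygon, in output coordinates.
    rr, cc = skimage.draw.polygon(wc[1] - min_xy[1], wc[0] - min_xy[0],
                                  out.shape[:2])

    # Inverse-map those pixels back into the source image (nearest neighbor).
    Hinv = np.linalg.inv(H)
    src = Hinv @ np.vstack([cc + min_xy[0], rr + min_xy[1], np.ones_like(rr)])
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out[rr[valid], cc[valid]] = im[sy[valid], sx[valid]]
    return out, min_xy                               # offset useful for mosaicking
```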
For this part, I used computeH and warpImage for the majority of the logic. First, I used warpImage to get the warped image. Next, I created a mask for each image, from which I could build a mask of the region where the two images overlap. This is crucial, because I don't want to naively blend the two images together: neither image fills the entire canvas, so there is a lot of black space that would ruin such a naive blend. By using the mask of the overlapping region, I can blend just that part, which creates a seamless result. For this, I used alpha blending. For the photos near Moffit, the turn between shots was sharper, giving an overlap of close to 40% of the images. Because of this, the warped image had to stretch its pixels much more, which yielded a lower-resolution result near the left side. The living room blend, whose photos were taken much closer together, shows less stretching. Below are the results:
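As an illustration, here is a minimal sketch of this mask-based alpha blend, assuming both warped images have already been placed on canvases of the same size; the constant alpha and the helper name are assumptions (a feathered, distance-based alpha is a common refinement).

```python
import numpy as np

def blend_pair(canvas1, canvas2, alpha=0.5):
    """Blend two same-size canvases, each holding one warped image on black.

    Only the overlap region is alpha-blended; elsewhere, whichever canvas
    has content wins, so the black borders never bleed into the mosaic.
    """
    mask1 = np.any(canvas1 > 0, axis=-1)
    mask2 = np.any(canvas2 > 0, axis=-1)
    overlap = mask1 & mask2

    out = np.zeros_like(canvas1, dtype=float)
    out[mask1] = canvas1[mask1]
    out[mask2] = canvas2[mask2]
    out[overlap] = (alpha * canvas1[overlap].astype(float)
                    + (1 - alpha) * canvas2[overlap].astype(float))
    return out.astype(canvas1.dtype)
```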
This part of the project is an extension of the previous part, where now we want to automatically stitch images together instead of having to manually choose the feature points. To do this, we follow the MOPS paper. Mainly, we split this into 5 steps:
For this part, we start with the Harris Interest Point Detector. All related images for this part are shown below. This part of the code was provided for us as part of the project. However, this naive approach for collecting feature points is not ideal. As shown below, far too many points are captured, some of which are due to noise. This might also be because my camera lens isn't great, so shots such as the sky can contain a pixelated gradient that introduces incorrect "edges" in what should be a smooth section of the photo. As a result, the image is covered with feature points when using the naive Harris corner detector. Below are the images for the H matrix and the points detected in the sample image:
Again, notice how there are way too many points detected. Most are also simply noise, such as all the ones in the sky, which we don't want affecting our results. However, the Harris matrix H shows that certain areas are "important": their corner scores are higher, so those areas appear brighter in the matrix. To remove the noise, we can filter the detected points to keep only the ones with a high enough corner strength, which is given by the values in H. Each value in H is the score for the corresponding pixel of the original sample image. I filtered using a threshold of 0.1, which worked well to keep only the most important points. After filtering the points, we get the improvement shown below:
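In code, this filtering step amounts to something like the snippet below, assuming the provided detector returns a strength map h and corner coordinates coords as (row, col) pairs (the variable names are assumptions, and the 0.1 threshold presumes h is scaled to roughly [0, 1]):

```python
import numpy as np

# h:      Harris strength map from the provided detector (same size as the image)
# coords: (N, 2) array of detected corner coordinates as (row, col)
threshold = 0.1
strengths = h[coords[:, 0], coords[:, 1]]   # corner score at each detected point
coords = coords[strengths > threshold]      # keep only sufficiently strong corners
```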
Notice how the images above show significantly fewer captured points. Compared to the Harris matrix, the points do align with "important" features, such as certain areas in the trees or edges of the buildings. We now do this for both images, so we have two sets of points, one for each image:
We can already see that some points can potentially match to each other. However, there are still way too many points, currently well above 1k. So, we use ANMS (Adaptive Non-Maximal Suppression) to only retain key points that are well spread out across the image.
As shown above, the number of points has been significantly reduced while still keeping the most important ones. The points remain at important locations, such as the windows and other edges and features. A quick visual check against the Harris matrix confirms this: the final points all fall within its bright regions. These points are good enough, so we move on to the next step.
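For reference, here is a minimal sketch of ANMS as described above, assuming a robustness factor and a target point count (both values are assumptions and may differ from what I actually used):

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive Non-Maximal Suppression (sketch).

    coords:    (N, 2) corner coordinates
    strengths: (N,) Harris strengths for those corners
    For each corner, compute the distance to the nearest corner that is
    "sufficiently stronger" (scaled by c_robust); keep the n_keep corners
    with the largest such radius, so the survivors are spread out.
    """
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # stronger[i, j] is True if corner j dominates corner i.
    stronger = strengths[None, :] * c_robust > strengths[:, None]
    dists = np.where(stronger, dists, np.inf)
    radii = dists.min(axis=1)                 # suppression radius per corner
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]
```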
Now, we use MOPS (Multiscale Oriented Patches) to extract a feature descriptor for each point. We take the coordinates from the previous step, and for each point we first create a 40x40 pixel patch around it. Next, to save computation, I downsized each patch to 8x8 pixels and normalized the values in this small patch. Finally, I flattened each descriptor into a 1x64 vector and returned the results. Below are some of the 8x8 patches extracted from the right image:
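Here is a sketch of that descriptor extraction, assuming axis-aligned patches on a grayscale image and skimage's anti-aliased resize for the 40x40 to 8x8 downsampling (which stands in for the paper's blur-then-subsample step):

```python
import numpy as np
import skimage.transform

def extract_descriptors(im_gray, coords, patch_size=40, out_size=8):
    """Axis-aligned MOPS-style descriptors (sketch, no rotation handling).

    For each corner (row, col): cut a 40x40 patch, downsample it to 8x8,
    bias/gain normalize it, and flatten it to a length-64 vector.
    Corners too close to the image border are skipped.
    """
    half = patch_size // 2
    descriptors, kept = [], []
    for r, c in coords:
        if (r - half < 0 or c - half < 0 or
                r + half > im_gray.shape[0] or c + half > im_gray.shape[1]):
            continue
        patch = im_gray[r - half:r + half, c - half:c + half]
        small = skimage.transform.resize(patch, (out_size, out_size),
                                         anti_aliasing=True)
        small = (small - small.mean()) / (small.std() + 1e-8)  # bias/gain normalize
        descriptors.append(small.ravel())
        kept.append((r, c))
    return np.array(descriptors), np.array(kept)
```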
After extracting the feature descriptors for both images, we now need to match them.
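One common way to do this, following the MOPS paper, is a nearest-neighbor distance ratio test (Lowe's trick); the sketch below assumes that approach and an example ratio threshold, which may not match my exact implementation:

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_descriptors(desc1, desc2, ratio_thresh=0.6):
    """Match descriptors with a nearest-neighbor ratio test (sketch).

    A descriptor in image 1 is matched to its nearest neighbor in image 2
    only if that neighbor is clearly closer than the second-nearest one.
    Returns (i, j) index pairs into desc1 and desc2.
    """
    d = cdist(desc1, desc2, metric='euclidean')   # pairwise distances
    nn = np.argsort(d, axis=1)[:, :2]             # two nearest neighbors per row
    best = d[np.arange(len(desc1)), nn[:, 0]]
    second = d[np.arange(len(desc1)), nn[:, 1]]
    keep = best / (second + 1e-12) < ratio_thresh
    return np.stack([np.where(keep)[0], nn[keep, 0]], axis=1)
```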
Now, the points above still aren't perfect. On closer inspection, we can already spot some outliers. For example, the image on the right includes part of the biology building that is not present in the first image, yet the matching algorithm still captured some points in that region. So, we now use RANSAC to find the best homography between the two images, while also further filtering the matches down to a better list of points.
After this step, we can see that the number of points has shrunk even further, and the outliers from the matching step have been removed. Each point from one image has an exact match in the other, and only these points are retained.
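A minimal sketch of the RANSAC loop described above, assuming 4-point random samples, a fixed iteration count, and a pixel-distance inlier threshold (the specific values are assumptions), and reusing the compute_H helper sketched earlier:

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=2000, thresh=2.0):
    """4-point RANSAC for a homography (sketch).

    pts1, pts2: (N, 2) matched point coordinates in image 1 and image 2.
    Returns the homography refit on the largest inlier set and the inlier mask.
    """
    N = len(pts1)
    ones = np.ones((N, 1))
    best_inliers = np.zeros(N, dtype=bool)
    for _ in range(n_iters):
        idx = np.random.choice(N, 4, replace=False)
        H = compute_H(pts1[idx], pts2[idx])
        # Project all pts1 through H and measure distance to pts2.
        proj = H @ np.hstack([pts1, ones]).T
        proj = (proj[:2] / proj[2]).T
        err = np.linalg.norm(proj - pts2, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers from the best sample for the final homography.
    return compute_H(pts1[best_inliers], pts2[best_inliers]), best_inliers
```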
For this step, we simply reuse code from the first part in the project. The results are basically the same, since for the manually selected points, I was careful to only click certain prominent corners in each image. Below are the results of the three mosaics, comparing them with the first part where the points are manually selected:
This project was really interesting and fun. The first part was a good introduction to homographies and warping images. The theory was not as challenging, but seeing the results was really cool. However, the coolest thing was the second part of the project. I hadn't had the chance to replicate a paper before, and implementing such an automated algorithm from scratch was really cool. Specifically, I thought that implementing RANSAC was the most interesting part. I played around with the parameters a lot, but past a certain point the resulting mosaics would converge to the same good result. Overall, this project deepened my understanding of homographies and taught me how to implement a paper from scratch.