A post by Yoni Yisraeli and Eyal Hashimshoni.
As a part of an assignment regarding image processing in our ‘Advanced Matlab’ course, we were interested in answering the everlasting question of our childhood – Where’s Waldo?
We set out to use state-of-the-art computer-vision algorithms to locate Waldo in real Waldo riddles. The challenge is considerable: even humans are often perplexed when trying to locate Waldo in his typical colorful scenes, which are cluttered with details designed to distract the eye.
So, dare we try to automate the process and intelligently pinpoint Waldo?
Step 1.0 - SURF
Our first attempt was to match SURF features between a prototypical Waldo and the scene image. Unfortunately, this worked poorly. SURF matching is resilient to scaling and rotation, but the Waldos we were searching for were not only mirrored or stretched; they often wore extra items, or were missing items that the prototypical Waldo had (hat, waving hands). Waldo was also often occluded, and SURF did not handle these cases well.
Putatively matched points
Detected object (red lines enclose the poorly detected Waldo)
Step 1.1 - Normalized Cross Correlation
Our next idea for finding Waldo was relatively simple as well. We used Normalized Cross Correlation (NCC), a method often used in image processing to measure the similarity of images despite differences in brightness. Again, we compared a generic prototype of Waldo to a scene image containing Waldo, this time using the normxcorr2 function. Given the normalized cross-correlation between the two images, we examined the extrema of the absolute cross-correlation.
The NCC technique worked poorly as well, since the appearance of Waldo varies greatly between scenes. So Waldo was still beyond our reach.
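As a rough illustration of what the NCC score measures, here is a minimal pure-Python sketch. It is not the MATLAB code used for this post: the function names `ncc_map` and `strongest_match` and the toy data are assumptions made for the example, and unlike normxcorr2 it only scores placements where the template lies fully inside the scene.

```python
import math

def ncc_map(scene, template):
    """Score every placement of `template` fully inside `scene` (both 2-D lists)."""
    th, tw = len(template), len(template[0])
    sh, sw = len(scene), len(scene[0])

    # Subtracting the mean removes any constant brightness offset.
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))

    scores = []
    for y in range(sh - th + 1):
        row_scores = []
        for x in range(sw - tw + 1):
            w_vals = [scene[y + dy][x + dx] for dy in range(th) for dx in range(tw)]
            w_mean = sum(w_vals) / len(w_vals)
            w_dev = [v - w_mean for v in w_vals]
            w_norm = math.sqrt(sum(d * d for d in w_dev))
            denom = t_norm * w_norm
            # Flat windows have zero variance; score them 0 instead of dividing by zero.
            score = sum(a * b for a, b in zip(t_dev, w_dev)) / denom if denom > 0 else 0.0
            row_scores.append(score)
        scores.append(row_scores)
    return scores

def strongest_match(scores):
    """Return (row, col, score) where the absolute correlation is largest."""
    return max(
        ((r, c, s) for r, row in enumerate(scores) for c, s in enumerate(row)),
        key=lambda item: abs(item[2]),
    )

# Toy data: a brighter, higher-contrast copy of the template hidden at row 1, column 2.
template = [[1, 5], [7, 2]]
scene = [[0] * 6 for _ in range(5)]
for dy in range(2):
    for dx in range(2):
        scene[1 + dy][2 + dx] = 2 * template[dy][dx] + 10

row, col, score = strongest_match(ncc_map(scene, template))
print(row, col, round(score, 6))
```

Because both the window and the template are mean-centered and divided by their norms, a copy that differs only in brightness and contrast still scores close to 1. A Waldo who changes pose or clothing between scenes gets no such help, which is consistent with the poor results described above.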
Step 1.5 - Solving an easier task
First, we went back to feature matching, this time searching each scene for a Waldo cropped from that same scene. To match as many features as possible between source and target, we allowed a relatively low threshold for considering two features matched, so the resulting matches included some outliers. To exclude these non-informative matches, we applied several iterations of RANSAC, chose the consensus set that contained Waldo-only matches, and contentedly framed the matched area.
Putatively matched points (with outliers)
Detected box (red lines enclose the real area!)
Second, we re-applied the Normalized Cross Correlation (NCC) approach, comparing each cropped Waldo with its corresponding scene image. This method was successful as well.
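The outlier-rejection idea can be sketched in a few lines of plain Python. This is an illustration only, not the code behind the figures: it assumes the simplest possible motion model (a pure translation, which is plausible when the template was cropped from the same scene), and the names `ransac_translation` and `bounding_box`, the tolerance, and the match coordinates are all made up for the example.

```python
import random

def ransac_translation(matches, tol=2.0, iterations=50, seed=0):
    """Estimate a translation from (template_point, scene_point) pairs with outliers.

    Returns ((dx, dy), inliers), where inliers is the largest consensus set found.
    """
    if not matches:
        raise ValueError("need at least one match")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    best_inliers = []
    for _ in range(iterations):
        # A translation is fully determined by a single match.
        (sx, sy), (tx, ty) = rng.choice(matches)
        dx, dy = tx - sx, ty - sy
        inliers = [
            (s, t) for s, t in matches
            if abs(t[0] - s[0] - dx) <= tol and abs(t[1] - s[1] - dy) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the whole consensus set by averaging its offsets.
    n = len(best_inliers)
    dx = sum(t[0] - s[0] for s, t in best_inliers) / n
    dy = sum(t[1] - s[1] for s, t in best_inliers) / n
    return (dx, dy), best_inliers

def bounding_box(points):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around the given points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

# Hypothetical matches: six consistent ones (offset near (100, 40)) and three outliers.
matches = [
    ((0, 0), (100, 40)), ((10, 0), (110.5, 40)), ((0, 20), (100, 60.5)),
    ((10, 20), (110, 60)), ((5, 10), (104.5, 50)), ((3, 7), (103, 47)),
    ((2, 3), (300, 10)), ((8, 15), (20, 200)), ((1, 18), (250, 250)),
]
offset, inliers = ransac_translation(matches)
print(offset, bounding_box([t for _, t in inliers]))
```

The bounding box of the inlier scene points plays the role of the red frame in the figure. A real pipeline would typically fit a richer transform (similarity or affine), which needs more than one match per sample.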
Step 2 - Cascade object detector
Following our success with a slightly easier problem, we were encouraged to pursue a more complex goal. We didn't want to compare Waldo riddles (target images) with Waldos that were cropped from the same images. We wanted the real thing: finding any Waldo inside a real, challenging Waldo riddle. So we pulled out the heavy guns.
A cascade object detector (COD) slides a window across the image and applies a sequence (a cascade) of tests to each window. The tests come from a learned model of the desired object (Waldo), built from both positive examples (that should be found) and negative examples (that should not be found). To build this cascade, the COD must first be trained on a labeled (positive/negative) set of images. We created a modest database of training images: about 100 positive images containing Waldo were manually labeled with ROIs (regions of interest) via the Cascade Trainer GUI.
In addition, roughly 400 non-Waldo scenes were loaded into the GUI as negative images (most of them scene images from which Waldo had been cropped).
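To make the "sliding window plus cascade" idea concrete, here is a toy sketch in plain Python. It is not a trained detector: the two hand-written stages (`bright_enough` and `striped`), their thresholds, and the tiny synthetic image are assumptions chosen for illustration, whereas a real COD learns its stages from the labeled positive and negative images.

```python
def sliding_windows(image, size, step=1):
    """Yield ((x, y), window) for every size-by-size window of a 2-D list image."""
    height, width = len(image), len(image[0])
    for y in range(0, height - size + 1, step):
        for x in range(0, width - size + 1, step):
            yield (x, y), [row[x:x + size] for row in image[y:y + size]]

def mean(values):
    return sum(values) / len(values)

def bright_enough(window):
    """Cheap first stage: reject windows that are mostly dark background."""
    return mean([v for row in window for v in row]) >= 160

def striped(window):
    """Second stage: even rows must be much brighter than odd rows (a striped shirt)."""
    even = mean([v for row in window[0::2] for v in row])
    odd = mean([v for row in window[1::2] for v in row])
    return even - odd >= 100

def detect(image, stages, size=4):
    """Return top-left (x, y) of every window that passes all stages in order."""
    # all() stops at the first failing stage, so most windows are rejected cheaply.
    return [pos for pos, window in sliding_windows(image, size)
            if all(stage(window) for stage in stages)]

# Synthetic 8x8 scene: dark background with one 4x4 striped patch at x=2, y=3.
image = [[50] * 8 for _ in range(8)]
for y, value in zip(range(3, 7), [250, 120, 250, 120]):
    for x in range(2, 6):
        image[y][x] = value

print(detect(image, [bright_enough, striped]))
```

The ordering matters: cheap stages that reject most windows come first, so the more expensive tests run only on the few windows that survive.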
We then trained our COD to extract Waldo (and non-Waldo) features. After some experimenting with several feature types (mainly local binary patterns), we found that the histogram of oriented gradients (HOG) yielded the best results. And we were good to go.
A trained HOG-COD was applied to a new test set of Waldo riddles and yielded magnificent results. Even in extremely cluttered scenes (where the authors needed up to 10 minutes to find Waldo; see the example below), the COD reliably detected Waldo's smiling face.
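For readers unfamiliar with HOG, the core building block is a histogram of gradient directions, weighted by gradient strength, computed over a small cell of pixels. The sketch below is a simplified illustration in plain Python, not the feature extractor used by the detector: the function name `hog_cell_histogram` is an assumption, and a full HOG descriptor also interpolates between bins and normalizes over blocks of cells, both of which are omitted here.

```python
import math

def hog_cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0 to 180 degrees).

    `cell` is a 2-D list of gray levels; border pixels are skipped so that
    central differences are always defined.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation: opposite gradient directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# A horizontal ramp has gradients along x only; a vertical ramp along y only.
horizontal_ramp = [[10 * x for x in range(6)] for _ in range(6)]
vertical_ramp = [[10 * y for _ in range(6)] for y in range(6)]
print(hog_cell_histogram(horizontal_ramp))
print(hog_cell_histogram(vertical_ramp))
```

In the two toy cells, all of the gradient energy lands in a single bin, and a different one for each ramp direction. That is the kind of shape information the detector can use regardless of the exact colors in the scene.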
COD detection (blue transparent box)
However, in some of the riddles the COD still produced false positives, or failed to detect Waldo at all.
COD false positives (left) and false negative (right)
A final touch
As mentioned, the COD's identification was not perfect. Mainly, our COD's set of parameters resulted in many false positives. The majority of the wrongly identified items were non-Waldo faces in the scene image. In an attempt to separate real Waldos from his imposters, we cropped all the COD-identified objects and tried several matching techniques.
We tried re-feeding the cropped regions into the COD, which eliminated the false positives in only some of the cases. We then tried our initial SURF matching between the prototypical Waldo and the cropped regions, but the results were still far from satisfactory. Another disappointment came from the NCC method: searching for the maximum cross-correlation between a generic Waldo and the cropped images brought us no closer to our goal.
Finally, we put our human intelligence to work and arrived at an elegant solution. We decided to cheat and use a very prominent hint to classify Waldo. The glasses. Waldo's beautiful, perfectly round glasses. We resized all the cropped detected regions to a standard size, applied a circle Hough transform to detect circles, and constrained the radii of the discovered circles. Finally, we classified instances that had two circles of the appropriate size as Waldo and trashed the others.
Constraining the radii of bright circles
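The final classification rule is simple enough to sketch directly. The plain-Python example below assumes the circles have already been found by some circle detector (such as a circle Hough transform) in crops resized to a common size. The function names, the radius range, the crop labels, and the circle coordinates are hypothetical values for illustration, and "two circles" is read here as exactly two.

```python
def looks_like_waldo(circles, r_min=4.0, r_max=8.0):
    """Return True if exactly two detected circles have a glasses-sized radius.

    `circles` is a list of (x, y, radius) tuples from a resized crop.
    The radius range is a made-up example, not a value from the post.
    """
    glasses_sized = [c for c in circles if r_min <= c[2] <= r_max]
    return len(glasses_sized) == 2

def keep_waldos(detections):
    """Filter a {crop_name: circles} mapping down to the crops classified as Waldo."""
    return [name for name, circles in detections.items() if looks_like_waldo(circles)]

# Hypothetical detector output for four cropped COD detections.
detections = {
    "crop_a": [(20, 30, 6), (44, 30, 6.5)],                # two lens-sized circles
    "crop_b": [(10, 10, 6)],                               # only one circle
    "crop_c": [(20, 30, 6), (44, 30, 6), (30, 50, 15)],    # two lenses plus a large circle
    "crop_d": [(20, 30, 2), (44, 30, 2)],                  # circles too small
}
print(keep_waldos(detections))
```

Resizing every crop to the same size first is what makes a single fixed radius range meaningful across detections of different scales.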
Conclusion. By applying a trained COD to Waldo riddles, we were able to successfully detect Waldo in cluttered scenes. To ensure reliable detection, we accepted a high rate of false positives, which we then trimmed in a subsequent stage by applying problem-specific knowledge.