Ohad Dan

(Let some computer-vision algorithms) Find Waldo

4/22/2016

 
A post by Yoni Yisraeli and Eyal Hashimshoni.
As a part of an assignment regarding image processing in our ‘Advanced Matlab’ course, we were interested in answering the everlasting question of our childhood – Where’s Waldo?

We set out to use state-of-the-art computer-vision algorithms to locate Waldo in real Waldo riddles. The challenge is great: even humans are often perplexed when trying to locate Waldo in his typically colorful scenes, which are cluttered with details designed to distract the eye.
So, dare we try to automate the process and intelligently pinpoint Waldo?

Step 1.0 - SURF
In a first attempt, trying to keep things simple, we tried to find Waldo by matching speeded-up robust features (SURF). We chose a prototypical Waldo image and extracted a list of SURF features from it, then did the same for each of our cluttered scenes. We then matched features between the two images by comparing each feature of the prototypical image with all the features of the cluttered scene and choosing as its match the scene feature at the smallest Euclidean distance between feature vectors.
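For readers who want the matching rule spelled out, here is a minimal Python sketch of nearest-neighbor matching by Euclidean distance. It is not the MATLAB code used for this post: the function names, the toy three-dimensional descriptors and the `max_dist` acceptance threshold are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_features(proto_desc, scene_desc, max_dist):
    """For each prototype descriptor, find the closest scene descriptor.

    Returns (proto_index, scene_index, distance) for every pair whose
    distance is at most max_dist (a hypothetical acceptance threshold).
    """
    if not scene_desc:
        return []
    matches = []
    for i, p in enumerate(proto_desc):
        # Test this prototype feature against every scene feature.
        j, d = min(((j, euclidean(p, s)) for j, s in enumerate(scene_desc)),
                   key=lambda pair: pair[1])
        if d <= max_dist:
            matches.append((i, j, d))
    return matches

# Toy 3-dimensional descriptors; real SURF descriptors have 64 or 128 entries.
prototype = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
scene = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.1, 0.9, 0.0]]
matches = match_features(prototype, scene, max_dist=0.5)
for i, j, d in matches:
    print(f"prototype feature {i} -> scene feature {j} (distance {d:.3f})")
```

Note that every prototype feature gets a nearest neighbor whether or not the object is actually present, which is why some threshold on the distance is needed, and why a prototype that looks different from the Waldo in the scene produces mostly meaningless matches.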

Unfortunately, this worked poorly. SURF matching is resilient to moderate geometric changes such as scaling and rotation, but the Waldos we were searching for were not only mirrored or stretched: they often had extra items on them or, rather, were missing several items that the prototypical Waldo had (hat, waving hands). Waldo was also often occluded, and SURF did not seem to handle these cases very well.
[Figures: SURF feature matching, showing putatively matched points; SURF-identified match, showing the detected object (red lines enclose the poorly detected Waldo)]
Step 1.1 - Normalized Cross Correlation
Our next idea for finding Waldo was relatively simple as well. We used normalized cross-correlation (NCC), a method often used in image processing to measure the similarity of images even when their brightness differs. Again, we compared a generic prototype of Waldo to a scene image containing Waldo, this time using the normxcorr2 function. Given the normalized cross-correlation between the two images, we examined the extrema of its absolute value.
The NCC technique worked poorly as well, as we found that the appearance of Waldo varies greatly between scenes. So Waldo was still beyond our reach.
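To make the NCC idea concrete, here is a small pure-Python sketch of template matching by normalized cross-correlation. It is an illustration, not a reimplementation of normxcorr2: unlike normxcorr2, which returns a padded map covering partial overlaps, this version only scores offsets where the template fits entirely inside the image. The function names and the toy image are assumptions.

```python
import math

def ncc_map(image, template):
    """Normalized cross-correlation at every fully overlapping offset.

    image and template are lists of rows of grayscale values.
    Returns a 2-D list of scores in [-1, 1]; flat patches score 0.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for r in range(ih - th + 1):
        row_scores = []
        for c in range(iw - tw + 1):
            patch = [image[r + dr][c + dc] for dr in range(th) for dc in range(tw)]
            p_mean = sum(patch) / len(patch)
            p_dev = [v - p_mean for v in patch]
            p_norm = math.sqrt(sum(d * d for d in p_dev))
            denom = p_norm * t_norm
            num = sum(a * b for a, b in zip(p_dev, t_dev))
            row_scores.append(num / denom if denom > 0 else 0.0)
        scores.append(row_scores)
    return scores

def best_match(scores):
    """Return (row, col, score) of the largest absolute correlation."""
    return max(((r, c, s) for r, row in enumerate(scores) for c, s in enumerate(row)),
               key=lambda t: abs(t[2]))

template = [[1, 2],
            [3, 4]]
# The scene contains a brighter, higher-contrast copy (10 + 2 * template) at row 1, column 2.
image = [[5, 5, 5, 5],
         [5, 5, 12, 14],
         [5, 5, 16, 18],
         [5, 5, 5, 5]]
row, col, score = best_match(ncc_map(image, template))
print(row, col, round(score, 6))
```

Subtracting the means and dividing by the norms is what makes the score insensitive to brightness and contrast. It does nothing for changes in shape, pose or clothing, which is consistent with the poor results on Waldos that differ from the prototype.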
Step 1.5 - Solving an easier task
To convince ourselves the problem was solvable, we decided to take a step back and first handle an easier task: finding in a scene (target) the exact same Waldo (source) that we had cropped from that very image after locating him manually. To our great delight, this worked out great. First, our previous SURF matching procedure easily detected Waldo.
To match numerous features between source and target, we allowed a relatively low (lenient) threshold for considering two features matched, and hence the resulting matches included some outliers. To exclude these non-informative matches we applied several iterations of RANSAC, chose the consensus set that included Waldo-only matches, and framed the matching area with contentment.
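The outlier-rejection step can be sketched in a few lines of Python. The post does not say which geometric model was fitted, so this sketch assumes the simplest one, a pure translation between source and target; the function name, the tolerance, the iteration count and the toy matches are all hypothetical.

```python
import random

def ransac_translation(matches, tol=2.0, iterations=50, seed=0):
    """Estimate a translation (dx, dy) from matched point pairs, ignoring outliers.

    matches: list of ((x_src, y_src), (x_dst, y_dst)) pairs.
    Returns (dx, dy, inlier_indices) for the largest consensus set found.
    """
    if not matches:
        raise ValueError("need at least one matched pair")
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: one pair fully determines a translation.
        (sx, sy), (tx, ty) = rng.choice(matches)
        dx, dy = tx - sx, ty - sy
        # Consensus set: every pair that agrees with this translation within tol.
        inliers = [i for i, ((ax, ay), (bx, by)) in enumerate(matches)
                   if abs((bx - ax) - dx) <= tol and abs((by - ay) - dy) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the consensus set by averaging the inlier displacements.
    n = len(best_inliers)
    mean_dx = sum(matches[i][1][0] - matches[i][0][0] for i in best_inliers) / n
    mean_dy = sum(matches[i][1][1] - matches[i][0][1] for i in best_inliers) / n
    return mean_dx, mean_dy, best_inliers

# Five matches that agree on a shift of roughly (100, 40), plus three outliers.
matches = [
    ((10, 10), (110, 50)),
    ((20, 15), (121, 55)),
    ((30, 40), (130, 81)),
    ((12, 33), (111, 73)),
    ((25, 5), (125, 45)),
    ((18, 22), (300, 10)),
    ((40, 40), (5, 200)),
    ((7, 30), (60, 61)),
]
dx, dy, inliers = ransac_translation(matches)
print(dx, dy, inliers)
```

A model with more degrees of freedom (for example an affine transform) would need more pairs per random sample, but the loop keeps the same shape: sample, fit, count the consensus set, keep the best.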
[Figures: SURF matching with outliers, showing putatively matched points (with outliers); identified object, showing the detected box (red lines enclose the real area!)]
Second, we re-applied the normalized cross-correlation (NCC) approach and compared images of the cropped Waldo with their corresponding scene images. This method was successful as well.
[Figure: cluttered scene match]
Step 2 - Cascade object detector
Following our success with a slightly easier problem, we were encouraged to pursue a more complex goal. We didn't want to compare Waldo riddles (target images) with Waldos that were cropped from the same images; we wanted the real thing: finding any Waldo inside a real, challenging Waldo riddle. So we pulled out the heavy guns.
A cascade object detector (COD) slides a window across the image and passes each window through a sequence (a cascade) of classification stages, built from features of both positive examples (that should be found) and negative examples (that should not be found) of the desired object (Waldo). To consolidate the cascade of positive/negative features, the COD must first be trained on a labeled (positive/negative) set of images. We created a modest database of training images: in a set of about 100 positive images containing Waldo, we manually labeled the ROIs (regions of interest) via the Cascade Trainer GUI.
[Figure: cascade object detector training ROI]
In addition, roughly 400 non-Waldo scenes were uploaded to the GUI as negative images (most of them scene images from which Waldo had been cropped out).
[Figure: cascade object detector negative samples]
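The sliding-window cascade described above can be illustrated with a toy Python sketch. In a trained detector each stage is learned from the positive and negative images; here the two stages are hand-written stand-ins (a brightness check and a red-and-white-shirt-style stripe check), and every name and threshold is an assumption made for the example.

```python
def window_at(image, r, c, size):
    """Cut a size x size window whose top-left corner is at (r, c)."""
    return [row[c:c + size] for row in image[r:r + size]]

def stage_bright_enough(win):
    """Cheap first stage: reject windows that are almost entirely dark."""
    vals = [v for row in win for v in row]
    return sum(vals) / len(vals) >= 0.3

def stage_striped(win):
    """Stricter second stage: rows must alternate bright, dark, bright, dark."""
    means = [sum(row) / len(row) for row in win]
    return all(m >= 0.9 if i % 2 == 0 else m <= 0.1 for i, m in enumerate(means))

def cascade_detect(image, size, stages):
    """Slide a window over the image and keep the offsets that pass every stage."""
    detections = []
    for r in range(len(image) - size + 1):
        for c in range(len(image[0]) - size + 1):
            win = window_at(image, r, c, size)
            # all() stops at the first failing stage, so most windows are
            # rejected by the cheap early stages and never reach the later ones.
            if all(stage(win) for stage in stages):
                detections.append((r, c))
    return detections

# A dark 6 x 6 scene with a 4 x 4 striped patch whose top-left corner is at (1, 1).
image = [[0.0] * 6 for _ in range(6)]
for r in (1, 3):
    for c in range(1, 5):
        image[r][c] = 1.0

detections = cascade_detect(image, size=4, stages=[stage_bright_enough, stage_striped])
print(detections)
```

The ordering of the stages is the design choice that matters: cheap, permissive tests go first so that the expensive, selective ones only run on the few windows that survive.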
We then trained our COD to extract Waldo (and non-Waldo) features. After some experimenting with several feature types (mainly local binary patterns), we found that histogram of oriented gradients (HOG) features yielded the best results. And we were good to go.
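As a rough illustration of what a HOG feature measures, the Python sketch below builds the orientation histogram for a single cell. It is deliberately simplified and is an assumption-laden toy, not the feature extractor used in training: it uses central differences, nine unsigned orientation bins, hard binning without interpolation, and no block normalization.

```python
import math

def hog_cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    cell is a list of rows of grayscale values; border pixels are skipped
    because central differences need a neighbor on each side.
    """
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]   # horizontal gradient
            gy = cell[r + 1][c] - cell[r - 1][c]   # vertical gradient
            mag = math.hypot(gx, gy)
            # Unsigned orientation: directions 180 degrees apart share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += mag
    return hist

# A vertical edge (intensity changes along x) and a horizontal edge (changes along y).
vertical_edge = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
horizontal_edge = [[0] * 6] * 3 + [[1] * 6] * 3

vertical = hog_cell_histogram(vertical_edge)
horizontal = hog_cell_histogram(horizontal_edge)
print(vertical)
print(horizontal)
```

The two edges put all of their weight into different bins, which is the property a detector can exploit: the histogram describes local shape (the direction of edges) rather than color or absolute brightness.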
A trained HOG-COD was applied to a new test set of Waldo riddles and yielded magnificent results. Even in extremely cluttered scenes (where the authors needed up to 10 minutes to find Waldo; see the example below), the COD reliably detected Waldo's smiling face.
[Figure: COD detection (blue transparent box)]
However, in some of the riddles the COD still produced false positive detections of Waldo, or failed to detect Waldo at all.
[Figure: COD false positives (left) and false negative (right)]
A final touch
As mentioned, the COD's identification was not perfect. Mainly, our COD's set of parameters resulted in a lot of false positives. The majority of the wrongly identified items were non-Waldo faces detected in the scene image. In an attempt to separate the real Waldos from the impostors, we systematically cropped all the COD-identified objects and tried several matching techniques.
[Figure: false positive detection]
We tried re-feeding the cropped regions into the COD, which eliminated the false positives only in some cases. We then tried our initial SURF matching between the prototypical Waldo and the cropped regions, but the results were still far from satisfactory. Another disappointment came from the NCC method: searching for the maximum cross-correlation between a generic Waldo and the cropped images did not bring us any closer to our goal.
Finally, we put our human intelligence to work and came up with an elegant solution. We decided to cheat and use a very prominent hint to classify Waldo. The glasses. Waldo's beautiful, perfectly round glasses. We resized all the cropped detected regions to a standard size, applied a circle Hough transform to detect circles, and constrained the radii of the discovered circles. Finally, we classified instances which had two circles of the appropriate size as Waldo and trashed the others.
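The glasses test can be sketched as a tiny circle Hough transform in Python. This is a toy under stated assumptions, not the code behind the post: it works on a ready-made set of edge pixels rather than a real image, and the radius range, vote threshold, minimum distance between centres and all function names are hypothetical.

```python
import math
from collections import defaultdict

def ring_pixels(cx, cy, radius):
    """Integer pixels within half a pixel of a circle outline (a toy edge map)."""
    pts = set()
    for x in range(cx - radius - 1, cx + radius + 2):
        for y in range(cy - radius - 1, cy + radius + 2):
            if abs(math.hypot(x - cx, y - cy) - radius) <= 0.5:
                pts.add((x, y))
    return pts

def hough_circles(edge_points, radii, min_votes, min_dist):
    """Vote for circle centres; return [(cx, cy, r, votes)] peaks, strongest first."""
    acc = defaultdict(int)
    for (x, y) in edge_points:
        for r in radii:
            # Each edge pixel votes for every centre that lies about r away from it.
            for dx in range(-r - 1, r + 2):
                for dy in range(-r - 1, r + 2):
                    if abs(math.hypot(dx, dy) - r) <= 0.5:
                        acc[(x + dx, y + dy, r)] += 1
    ranked = sorted(((v, cx, cy, r) for (cx, cy, r), v in acc.items() if v >= min_votes),
                    key=lambda t: (-t[0], t[1], t[2], t[3]))
    peaks = []
    for votes, cx, cy, r in ranked:
        # Keep a peak only if it is not a near-duplicate of a stronger one.
        if all(math.hypot(cx - px, cy - py) >= min_dist for px, py, _, _ in peaks):
            peaks.append((cx, cy, r, votes))
    return peaks

def has_two_lenses(edge_points, radii=(4, 5, 6), min_votes=24, min_dist=4):
    """Classify a crop as Waldo when exactly two circles of allowed radius are found."""
    return len(hough_circles(edge_points, radii, min_votes, min_dist)) == 2

# Two lenses of radius 5 side by side, versus a crop with a single circle.
left = ring_pixels(10, 10, 5)
right = ring_pixels(25, 10, 5)
print(hough_circles(left | right, (4, 5, 6), 24, 4))
print(has_two_lenses(left | right), has_two_lenses(left))
```

Restricting the radii only makes sense because the crops were first resized to a standard size, so a lens always spans roughly the same number of pixels.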

[Figures: circle Hough transform; constraining the radii of bright circles]
Conclusion. By applying a trained COD to Waldo riddles we were able to successfully detect Waldo in cluttered scenes. To ensure reliable detection, we accepted a high rate of false positives, which we then trimmed in a subsequent stage by applying problem-specific knowledge.
