HowSignfinderWorks
System Overview

The following image describes the different modules in signFinder and the data flow between them. On this wiki page, each of the modules, its internal working, and its location in the code base is described.

Color-histogram matching

The signFinder street-sign detection is largely based on color-histogram matching. The principle is simple: from the images in a trainset, the colors that occur inside a street sign (positive sample) and the colors that occur outside a street sign (negative sample) are counted. If a specific color occurs sufficiently more often in the positive sample than in the negative sample, pixels with that color are marked as belonging to a street sign.

The color-histogram matcher reads the positive and negative color histograms from the postHist.hist and negHist.hist files as generated by the trainer executable (see also TrainingOtherCountries). These binned histograms contain color counts: how often a specific color has occurred inside a street sign (positive sample), and how often it has occurred outside a street sign (negative sample).

Each pixel is compared against the color histograms. It is marked positive if its color occurs at least THRESHOLD times as often in the positive sample as in the negative sample. The current threshold (as of 17/08) is 0.19. This is on the low side, but can be explained by the fact that the areas outside the street signs in the training images are often much larger than the areas inside the street signs. Moreover, the histogram matching is intentionally biased toward positives: the Blob classification step deals with an excess of positively marked pixels.

This results in a binary image with the same size as the original image, in which positively marked pixels are white and negatively marked pixels are black.

The histogram matcher is accessed in the SignFinder::histMatch() function in the signFinder class (signFinder.cpp).
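The per-pixel decision described above can be sketched as follows. This is a minimal illustration and not the code in ./lib/histogramtool: the bin count (BINS), the Pixel and Histogram types, and the function names are assumptions made for the example, as is the extra rule that a color never seen in a street sign is always marked negative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: a 3-D RGB histogram with BINS bins per channel,
// flattened into one vector of counts. The real file format may differ.
constexpr std::size_t BINS = 16;
constexpr double THRESHOLD = 0.19;  // value quoted in the text

struct Pixel { std::uint8_t r, g, b; };
using Histogram = std::vector<double>;  // BINS * BINS * BINS counts

// Map a color to the index of its histogram bin.
inline std::size_t binIndex(Pixel p) {
    const std::size_t step = 256 / BINS;
    return ((p.r / step) * BINS + p.g / step) * BINS + p.b / step;
}

// A color is positive if it occurs at least THRESHOLD times as often
// in the positive sample as in the negative sample.
// Assumption: colors with a zero positive count are never positive.
inline bool isSignColor(const Histogram& pos, const Histogram& neg, Pixel p) {
    const std::size_t i = binIndex(p);
    return pos[i] > 0.0 && pos[i] >= THRESHOLD * neg[i];
}

// Produce the binary image: 255 (white) for positive pixels, 0 (black) otherwise.
std::vector<std::uint8_t> histMatchSketch(const std::vector<Pixel>& image,
                                          const Histogram& pos,
                                          const Histogram& neg) {
    assert(pos.size() == BINS * BINS * BINS && neg.size() == pos.size());
    std::vector<std::uint8_t> mask;
    mask.reserve(image.size());
    for (Pixel p : image) {
        mask.push_back(isSignColor(pos, neg, p) ? 255 : 0);
    }
    return mask;
}
```

With a threshold of 0.19, a color counted 20 times inside signs and 100 times outside is still marked positive, which illustrates the intentional positive bias.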
The histograms are loaded into the _posHist and _negHist variables on object creation. This function uses the histogram matcher in ./lib/histogramtool.

Blob Detection

The binary image from the color-histogram matching stage contains a number of connected areas (areas in which every pixel can be reached from every other pixel by crossing only white pixels), each of which could potentially contain a street sign. Detecting these areas, or 'blobs', is non-trivial and computationally expensive. The blobs are detected with 'bloblib', residing in ./lib/bloblib. The blob detector is accessed in the SignFinder::readSigns() main entry function by calling the CBlobResult() constructor.

Blob Classification

The blob detector offers a list of connected areas or blobs, each of which could potentially be a street sign. However, many blobs are just misdetected patches of sky, other misdetected objects, or blobs of a few pixels marking an artifact that happens to have the right color. From all these blobs, we have to select the ones that are likely to contain a street sign. We classify the blobs based on the following characteristics:
The code that computes these characteristics, displays them, and filters the blobs can be found in the SignFinder::classifyBlobs() function in signFinder.cpp. This function also compares detected blobs with known-correct labels in order to measure the performance of the sign-detecting portion of the system. More about that in the last section of this document. Consider the following image:
This image shows the areas that have been matched by the histogram matcher in blue. Blobs that have been classified as containing street signs are colored. These images are written to <file>_blob.jpg by the signFinder executable if the -v command-line option is used.

Convex Hull

From the blob classifier, we receive a list of blobs that we assume to contain a street sign. However, a blob doesn't provide a clear image segmentation: it's not much more than a bunch of connected pixels, which may contain large holes or have a piece 'bitten out' where the lighting on the sign is different. This is rectified by calculating the convex hull over the positively classified pixels in the blob. A convex hull is the smallest convex outline around the outer pixels of the detected blob: a closed line that never bends inward, so holes and bitten-out parts are bridged. If the blob is of sufficient quality, the convex hull forms a clean line around the segmented street sign. This convex hull is the colored line drawn around the street signs in the result image. The convex hull is retrieved and drawn by the SignFinder::drawConvexHull() function in signFinder.cpp.

Corner Finder

The convex hull consists of a counter-clockwise ordered array of points describing the outline of the blob. In order to cut out the image and perform perspective correction, we need to find the corners of this convex hull. This is done as follows: a new image is created, consisting of nothing but a filled convex hull. The image is smoothed to remove any artificial corners created by diagonal borders. An OpenCV good-features-to-track function then finds the corners. We won't go into too much technical detail here, but 'good features to track' is a method that finds characteristic corners in an image. Not all corners are sharp. A round corner of a street sign actually consists of many shallow corners.
To prevent the good-features-to-track function from finding many corners in one round corner (and none of the others, since it has then already found four corners), we use the constraint that two corners should be separated by at least 0.75 times the height of the sign. As the corners arrive from the good-features-to-track function, they're not ordered in any way. In order to organize them counter-clockwise and separate, say, the upper-left corner from the lower-right corner, we use the convex hull once again. We iterate over the points of the convex hull, adding each of the found corners to an ordered array as we encounter its closest counterpart in the convex hull. The corner finder resides in the ./modules/CornerFinder.cpp file.

Cut & Perspective Correction

To be able to read the street sign as reliably as possible, we want the street sign to look as if we're standing right in front of it, even though the picture might have been taken at an angle. This can be done with perspective correction. A new image with the size of the street sign is created. OpenCV has functionality to perform perspective correction, given the old and the new coordinates of the four corners. In our case, the old coordinates are given by the corner finder, and the new coordinates are simply the corners of the new image. The image source for the cutting operation is the original RGB picture. The street-sign cutter resides in the cutSign() function in the ./modules/SignHandler.cpp file.

Tesseract OCR

The Tesseract OCR engine is an open-source Optical Character Recognition engine. It accepts greyscale .tif images and writes the results to a file. In order to incorporate the Tesseract engine in the source, a wrapper was used. This wrapper can be found in ./modules/OCRWrapper.cpp. If the tesseract executable cannot be found, or doesn't execute correctly, a warning message is displayed and text recognition isn't performed. Tesseract is configured to use the Dutch (NLD) language file.
Moreover, the Tesseract configuration file in ./modules/signOCR.conf limits the character set to the upper- and lowercase western alphabet. While Tesseract uses greyscale images, a color RGB image is what's received from the sign cutter. Since we're detecting Dutch street signs, which have a blue background, the images have a relatively bright background in the blue color channel, and a relatively dim background in the red and green color channels. Contrast can be enhanced greatly by converting to greyscale using just the red channel, and this has proven to significantly improve OCR performance.

Street-name heuristics

The OCR wrapper receives plain text as detected by the Tesseract OCR engine. However, we know a number of things about the structure of Dutch street names that can help us improve performance. For example, on Dutch street signs, words either have only their first character in uppercase, or they're entirely uppercase. If we encounter a capital I in the middle of a word, it should probably be a lowercase l. Moreover, we don't expect to see words consisting of just one character, and names often start with 'De ' or 'Het '. The software employs these heuristics to improve OCR performance:
These simple heuristics manage to catch the majority of common OCR mistakes. They live in the removeOneCharWords(), detectWrongCapitalI() and correctLidWoorden() functions in the ./modules/OCRWrapper.cpp file.
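Two of these heuristics, dropping one-character words and replacing a capital I inside a mixed-case word with a lowercase l, can be sketched as follows. This is only an illustration of the idea, not the code in OCRWrapper.cpp: the function names splitWords, isAllUpper, fixCapitalI and cleanStreetName are hypothetical, and the 'De '/'Het ' correction is left out because its exact behavior isn't described here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split OCR output into whitespace-separated words.
std::vector<std::string> splitWords(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w;) {
        words.push_back(w);
    }
    return words;
}

// True if the word contains no lowercase letters (e.g. "KERKPLEIN").
bool isAllUpper(const std::string& w) {
    for (unsigned char c : w) {
        if (std::isalpha(c) && !std::isupper(c)) return false;
    }
    return true;
}

// In a mixed-case word, a capital I after the first character is
// most likely a misread lowercase l.
std::string fixCapitalI(std::string w) {
    assert(!w.empty());
    if (isAllUpper(w)) return w;  // fully uppercase words are left alone
    for (std::size_t i = 1; i < w.size(); ++i) {
        if (w[i] == 'I') w[i] = 'l';
    }
    return w;
}

// Apply both heuristics to a line of OCR output.
std::string cleanStreetName(const std::string& ocrText) {
    std::string out;
    for (const std::string& w : splitWords(ocrText)) {
        if (w.size() <= 1) continue;  // drop one-character words
        if (!out.empty()) out += ' ';
        out += fixCapitalI(w);
    }
    return out;
}
```

For example, cleanStreetName("a De WaIIen") yields "De Wallen", while a fully uppercase name such as "KERKPLEIN" is returned unchanged.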
Image showing the colored convex hull, the yellow-marked found corners, the cut-out signs, and the OCRed text next to the upper-right found corner of the street signs.

Performance Measurement

If known-correct labels are present, signFinder automatically measures and reports its own performance. This is useful, as it allows us to quickly ascertain performance after changes in method, trainset, or testset. All code specific to performance measurement can be found in the ./modules/TestHandler.cpp source file. On termination, the signFinder executable shows performance metrics if available. This behavior can be disabled with the -p command-line option.

Street-sign detection

The performance measurement for street-sign detection uses the same <file>_mask.png files that the trainer uses. These masks can be generated with the maskMaker tool as described in the TrainingOtherCountries document. The checkLabeledBlobs() function in the TestHandler.cpp source file is the entry point for the performance measurement of the street-sign detection system. It calculates true positives (signs that the software has detected and that are actually signs), false positives (signs that the software has detected but that aren't actually signs), and multiple detections (detected signs that lie within a bigger detection). It generates this information given the blob list of detected street signs and the filename of the original image, so it can find the labels.

Judging whether a detected street sign is in fact a correct detection is a complex process. The function compares each detected street sign with each present label. If a detection overlaps at least 90% of the known-correct label, and the detection doesn't extend more than 25% beyond the area that the known-correct label describes, it's considered a correct detection. Otherwise it's considered a false detection.
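The overlap criterion can be sketched on two binary masks of equal size. This is a hedged illustration, not the code in TestHandler.cpp: the Mask type and the isCorrectDetection name are hypothetical, and the sketch assumes that the 25% limit is measured relative to the area of the label, which the text doesn't state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A binary mask: non-zero means "inside", zero means "outside".
using Mask = std::vector<std::uint8_t>;

// Returns true if the detection covers at least 90% of the label and
// extends beyond the label by at most 25% of the label's area.
// Assumption: both fractions are relative to the label area.
bool isCorrectDetection(const Mask& detection, const Mask& label) {
    assert(detection.size() == label.size());
    std::size_t labelArea = 0;  // pixels inside the label
    std::size_t overlap = 0;    // pixels inside both detection and label
    std::size_t outside = 0;    // detection pixels outside the label
    for (std::size_t i = 0; i < label.size(); ++i) {
        const bool d = detection[i] != 0;
        const bool l = label[i] != 0;
        if (l) ++labelArea;
        if (d && l) ++overlap;
        if (d && !l) ++outside;
    }
    if (labelArea == 0) return false;  // nothing to match against
    return overlap >= 0.90 * labelArea && outside <= 0.25 * labelArea;
}
```

A detection that covers 95 of 100 label pixels and spills over by 10 pixels passes both tests, while one that covers only 80 label pixels, or spills over by 30 pixels, is counted as a false detection.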
Street-sign reading

Performance measurement for the street-sign reading is performed if <file>.txt files are available, describing the text on each of the street signs, one per line. Examples are available here. The percentage of detected signs that are read successfully is returned. Moreover, the average number of edits required to get to the correct word, i.e. the Levenshtein distance per word, is provided.
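For reference, the Levenshtein distance and its average over a set of words can be computed as in the sketch below. It uses the standard dynamic-programming formulation and compares bytes, not Unicode characters; the function names levenshtein and averageEditDistance are chosen for this example and are not taken from TestHandler.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Minimum number of single-character insertions, deletions and
// substitutions needed to turn a into b.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            cur[j] = std::min({prev[j] + 1,          // deletion
                               cur[j - 1] + 1,       // insertion
                               prev[j - 1] + cost}); // substitution or match
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Average edit distance over pairs of (recognized word, correct word).
double averageEditDistance(const std::vector<std::string>& recognized,
                           const std::vector<std::string>& correct) {
    assert(recognized.size() == correct.size());
    if (recognized.empty()) return 0.0;
    std::size_t total = 0;
    for (std::size_t i = 0; i < recognized.size(); ++i) {
        total += levenshtein(recognized[i], correct[i]);
    }
    return static_cast<double>(total) / recognized.size();
}
```

For instance, levenshtein("MoIenstraat", "Molenstraat") is 1, since a single substitution fixes the misread character.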