Matt Bamberger - Pattern Recognition, Part 1

Pattern Recognition, Part 1

Tue, 02/07/2006 at 11:25

What I'm working on

I've been promising for a while now that I'd talk a bit about my recent Optical Character Recognition project. I spent a few months last fall working on a simple OCR project, and although I never finished it, I learned a great deal from it (which was the point of the whole exercise). I chose OCR for a number of reasons:

  • 1. I wanted to understand why AI is hard. On the face of it, OCR doesn't seem like a particularly hard problem, and yet it remains only partly solved, despite the fact that a great deal of effort has been put into solving it. Working on a popular problem has the added benefit that it's easy to benchmark your work against that of many other people.
  • 2. I wanted to work on a core AI problem. For reasons that I'll get into later on, I believe that pattern recognition is a fundamental part of human intelligence, and OCR is all about pattern recognition.
  • 3. OCR is a very tractable problem. It's small enough that one person can make substantial progress on it in a short period of time, and it deals with a kind of input (scanned images) that's very easy to produce, understand and reproduce.
  • 4. OCR very quickly subjects you to all kinds of real-world problems. It's easy to come up with sweeping theories of how AI works, but such theories tend to disintegrate when faced with the noisy, messy world in which we actually live. OCR subjects you to a lot of those problems up front, which helps quickly weed out promising but useless approaches. In general, I wanted a project where I could fail quickly.

Before we get started, let me just say that although I'm covering well-trodden ground, I'm almost completely ignorant of the extensive body of literature that exists on this topic. I don't see that as an entirely bad thing: given the rather dismal track record of traditional AI research, it's pretty clear that success in building AI is going to come from developing new approaches, rather than understanding existing research. With that out of the way, let's get on with it.

Let's begin with a definition of "pattern recognition". I claim that pattern recognition boils down to answering the question "How similar is thing A to thing B?" There are several different flavors of that question, but they all boil down to the same thing:

  • "Which known thing is thing A most similar to?" For example, "This is a picture of Bob."
  • "How is thing A different from thing B?" For example, "Bob has a new haircut."
  • "Here's a new thing. Remember it." For example, "Hello, Bob."

Jeff Hawkins, among others, has argued that pattern recognition is the core algorithm underlying human intelligence. His basic theory is the human neocortex is composed of a set of general-purpose pattern-recognition units arranged hierarchically. The basic idea is that raw sensory data enters the lowest level in the system, which recognizes certain patterns in it, and passes those patterns up to the next level. That level, in turn, recognizes certain higher-level patterns, which it passes to the next level.

My OCR project was essentially an application of a Hawkins-like pattern recognition system to the problem of recognizing printed or hand-written text. At a very high level, the architecture looks something like this:

  • Raw visual input (i.e., a bitmap) is fed into a special-purpose visual system which identifies lines.
  • Data about the lines in the image is passed up to the character recognition system, which uses a generic pattern recognizer to identify characters composed of the input lines.
  • Data about the characters in the image is passed up to the word recognition system, which uses the same generic pattern recognizer to identify words composed of the input characters.

A key feature of the system is that information flows up and down the system. The lower levels pass abstracted patterns up to the upper levels, and the upper levels query the lower levels and give them feedback. For example, the character recognizer might tell the word recognizer "I see the letters 'C', 'A', 'T', 'E', 'I', 'S', 'H'". The word recognizer might in turn ask the character recognizer "That doesn't seem very likely. Could the 'E' actually be an 'F'?", or even "Hey-- that 'E' is really an 'F'. Remember that the next time you see a character that looks like that."

Similarly, the character recognizer would have a similar conversation with the line recognizer: "Hey, those two short horizontal lines that almost meet end to end: could they really be a single long line?"

The idea behind an architecture like this is that it's adaptable to a wide range of problems. For example, a music recognition system might look like:

  • A special-purpose auditory system takes .wav files from the ears and breaks them into low-level sound units (for example, tone & volume information).
  • A generic pattern recognizer takes that information, and recognizes notes and intervals played by specific instruments.
  • A generic pattern recognizer takes the note & interval information and recognizes known melodies.

A face recognition system might look like:

  • A generic pattern recognizer takes basic geometric data from the visual system (which likely has already been through several layers of processing), and identifies specific facial features like noses and eyes.
  • A generic pattern recognizer takes the information about facial features and recognizes specific individuals.
  • A different pattern recognizer takes the same information about facial features and recognizes specific moods / expressions.

Obviously, the theory is seductive. Write a single generic pattern recognizer, and with the application of a little glue, you can solve a wide range of interesting problems. You can potentially apply the recognizer to much higher-level problems also:

  • "Does this plant look like something that's good to eat?"
  • "Are the animals at this watering hole behaving the way they did right before that lion sprang out of the grass and ate Bob last week?"
  • "What disease does this set of symptoms most remind me of?"

Super-cool! The AI problem is practically solved if we can just write a generic pattern recognizer. How hard can it be?

Next up: defining the OCR problem.