Patent No. 6581048
3-brain architecture for an intelligent decision and control system (Werbos, Jun 17, 2003)
Abstract
A method and system for intelligent control of external devices using a mammalian brain-like structure having three parts. The method and system include a computer-implemented neural network system which is an extension of the model-based adaptive critic design and is applicable to real-time control (e.g., robotic control) and real-time distributed control. Additional uses include data visualization, data mining, and other tasks requiring complex analysis of inter-relationships between data.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is directed to a neural network control system including,
in one embodiment, a computer-implemented method and apparatus using a computer-readable
medium to control a general-purpose computer to perform intelligent control.
2. Description of the Background
Science has long been fascinated by the capabilities of the human mind, and many
have hypothesized about the process by which mammalian brains (and human brains
in particular) learn. When NSF first set up the Neuroengineering program in
1987, it was not motivated by any kind of desire to learn more about the brain
for its own sake. The program was set up as an exercise in engineering, as an
effort to develop more powerful information processing technology. The goal
was to understand what is really required to achieve brain-like capabilities
in solving real and difficult engineering problems, without imposing any constraints
on the mathematics and designs except for some very general constraints related
to computational feasibility. In a sense, this could be characterized as abstract,
general mathematical theory; however, these designs have been subjected to very
tough real-world empirical tests, proving that they can effectively control
high-speed aircraft, chemical plants, cars and so on--empirical tests which
a lot of "models of learning" have never been confronted with.
More precisely, the Neuroengineering program began as an offshoot of the Lightwave
Technology (LWT) program at NSF. LWT was and is one of the foremost programs
in the U.S. supporting the most advanced research in optical technology. It
furthers the development and use of advanced optical fibers, lasers, holography,
optical interface technology, and so on, across a wide range of engineering
applications--communication, sensing, computing, recording, etc. Years ago,
several of the most advanced engineers in this field came to NSF and argued
that this kind of technology could be used to generate computing systems far
more powerful than conventional electronic computers.
The desktop computer has advanced remarkably over the computers of twenty years
ago. It is called a "fourth generation" computer, and its key is its Central
Processing Unit (CPU), the microchip inside which does all the real substantive
computing, one instruction at a time. A decade or two ago, advanced researchers
pursued a new kind of computer--the fifth generation computer, or "massively
parallel processor" (MPP) or "supercomputer." The MPP may contain hundreds or
thousands of CPU chips, all working in parallel, in one single box. In theory,
this permits far more computing horsepower per dollar; however, it requires
a new style of computer programming, different from the one-step-at-a-time FORTRAN
or C programming that most people know how to use. The U.S. government has spent
many millions of dollars trying to help people learn how to use the new style
of computer programming needed to exploit the power of these machines.
In the late 1980s, optical engineering seemed to be a viable basis for
developing a sixth generation of computing, as far beyond the MPP as the MPP
is beyond the ordinary PC. Using lasers and holograms and such, it was believed
that a thousand to a million times more computing horsepower per dollar could
be produced compared to the best MPP. However, even skeptics who agreed that
optical computing might be able to increase computing horsepower as claimed
pointed out that it would come at a price. Using holograms, huge throughput can
be achieved, but only very simple operations can be performed at each pixel of
the holograms. The result is a machine that replicates very simple operations
over and over again in a stereotyped kind of way, and whose program cannot be
swapped out as easily as a FORTRAN program can be replaced or changed.
Carver Mead, from CalTech, then pointed out that the human brain itself uses
billions and billions of very simple units--like synapses or elements of a hologram--all
working in parallel. But the human brain is not a niche machine. It seems to
have a fairly general range of computing capability. Thus the human brain becomes
an existence proof, to show that one can indeed develop a fairly general range
of capabilities, using sixth generation computing hardware. The Neuroengineering
program was set up to follow through on this existence proof, by developing
the designs and programs needed to realize those capabilities. In developing these
designs, advances in neuroscience are used, but they are coupled to basic principles
of control theory, statistics and operations research.
However, sometimes terminology clouds advances in one area that are applicable
in another area. Some computational neuroscientists have built very precise
models that look like neural nets and use little circles and boxes representing
differential equations, local processing and so on. Other people use artificial
neural nets to accomplish technological goals. Further, other scientists, including
psychologists, use yet another set of terminology. What is going on is that
there are three different validation criteria. In computational neuroscience,
people are asking, "Does it fit the circuit?" In connectionist cognitive science
they are asking, "Does it fit the behavior?" In neuroengineering, people
are asking, "Does it work? Can it produce solutions to very challenging tasks?"
But in actuality, whatever really goes on in the brain has to pass all three
tests, not just one. Thus logic suggests a combination of all three validation
criteria is needed.
Present models must go beyond the typical test of whether or not a model can
produce an associative memory. The bottom line is that a new combination of
mathematics is needed.
Most of the engineering applications of artificial neural nets today are applications
of a very simple idea called supervised learning, shown in FIG. 2. Supervised
learning is a very simple idea: some inputs (X), which are really independent
variables, are plugged into a neural network, which produces outputs that are
compared against a desired response or target (Y). Some weights in the network,
similar to synapse strengths, are adapted in such a way that the actual outputs
match the desired outputs,
across some range of examples. If properly trained, good results are obtained
in the future, when new data is applied to the network. These systems do have
practical applications, but they do not explain all the functioning of the brain.
To make things work in engineering a few components have to be added, above
and beyond cognition. A robot that does not move is not a very useful robot.
But even supervised learning by itself does have its uses.
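Mathematically, supervised learning of this kind can be sketched (in purely
illustrative notation) as choosing the weights W of a network f so as to
minimize the mismatch between actual and desired outputs over the training
examples:

$$ \min_{W}\;\sum_{p=1}^{P}\left\|Y^{(p)}-f\!\left(X^{(p)};W\right)\right\|^{2}, $$

where (X^{(p)}, Y^{(p)}) are the P recorded input/target pairs and f(X; W) is
the network's actual output. Once the weights are adapted, the same f is applied
to new inputs in the hope that it generalizes.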
For historical reasons, a majority of ANN applications today are based on the
old McCulloch-Pitts model of the neuron, shown in FIG. 3. According to this
model, the voltage in the cell membrane ("net") is just a weighted sum of the
inputs to the cell. The purpose of learning is simply to adjust these weights
or synapse strengths. The output of the cell is a simple function ("s") of the
voltage, a function whose graph is S-shaped or "sigmoidal." (For example, most
people now use the hyperbolic tangent function, tanh.) Those ANN applications
which are not based on the McCulloch-Pitts neuron are usually based on neuron
models which are even simpler, such as radial basis functions (Gaussians) or
"CMAC" (as described in D. White and D. Sofge, eds., "Handbook of Intelligent
Control," published by Van Nostrand, 1992; and W. T. Miller, R. Sutton &
P. Werbos (eds), "Neural Networks for Control," published by MIT Press, 1990).
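In the notation of FIG. 3, the McCulloch-Pitts neuron just described can be
written (as an illustrative sketch) as

$$ \mathrm{net}_{j}=\sum_{i}W_{ji}\,x_{i},\qquad \mathrm{output}_{j}=s(\mathrm{net}_{j})=\tanh(\mathrm{net}_{j}), $$

where the weights W_{ji} play the role of synapse strengths and s is the
sigmoidal squashing function.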
Although in most applications today, the McCulloch-Pitts neurons are linked
together to form a "three-layered" structure, as shown in FIG. 4, where the
first (bottom) layer is really just the set of inputs to the network, it is
known that the brain is not so limited. But even this simple structure has a
lot of value in engineering. Further, there are some other concepts that have
arisen based on the study of neural networks: (1) all neural networks approximate
"nice" functions, (2) a four-layer MLP can be used for limited tracking control,
(3) as the number of inputs grows, the MLP does better, and (4) there is a speed
versus generalization dilemma. In "Universal approximation bounds for superpositions
of a sigmoidal function," IEEE Trans. Info. Theory 39(3) 930-945, 1993, A. R.
Barron showed that a simple three-layered MLP can approximate any smooth function
in an efficient way. Most people in engineering today will say that is the end
of the story: any smooth function can be approximated, so nothing else is needed. However, this structure
is not powerful enough to do all jobs. A broader concept of reinforcement learning
is needed.
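Before turning to reinforcement learning, it may help to make the three-layered
structure of FIG. 4 concrete. The following is a purely illustrative sketch in
Python/NumPy; the layer sizes, variable names and random weights are assumptions
made only for illustration, not part of any particular design:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a simple three-layered MLP.

    The first (bottom) layer is just the input vector x; the second layer is a
    hidden layer of McCulloch-Pitts neurons with tanh outputs; the third layer
    is a linear read-out.
    """
    net_hidden = W1 @ x + b1        # weighted sums ("net") for the hidden cells
    hidden = np.tanh(net_hidden)    # sigmoidal output of each hidden cell
    return W2 @ hidden + b2         # output layer

# Example: 3 inputs, 5 hidden neurons, 2 outputs, with random weights.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)
W2, b2 = rng.normal(size=(2, 5)), np.zeros(2)
y = mlp_forward(np.array([0.1, -0.4, 0.7]), W1, b1, W2, b2)
```

Barron's result says that, given enough hidden neurons, a structure of this form
can approximate any sufficiently smooth function efficiently; the limitation
discussed above is not approximation power but the kind of problem a purely
supervised structure can be asked to solve.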
Reinforcement learning has been a controversial idea in psychology. The reasons
for this are very strange. Back in the days of Skinner, he used to say that
this idea is too anthropomorphic, that it ascribes too much intelligence to
human beings and other animals. Nowadays many people are saying just the opposite--that
it's not purely cognitive enough (because it has motivation in there) and that
it's also too mechanistic. But in reality, it may be a good thing to pursue
an idea which is halfway between these two extremes. In any case, the problem
here for an engineer is straightforward. Assume there is a little person who
has a bunch of levers (labeled u.sub.l to u.sub.n) to control. The set of n
numbers forms a vector. Likewise, the person sees a bunch of light bulbs labeled
X.sub.l through X.sub.m, representing sensory input. Finally, there is something
that looks like a big thermometer which measures utility, U (not temperature).
The problem to be solved is as follows: find a computer program or neural net
design which can handle the job of the little person in this hypothetical. The
little person starts out knowing nothing at all about the connection between
the lights, the levers and the thermometer. He must somehow learn how these
things work, enough to come up with a strategy that maximizes the utility function
U over the long term future. This kind of reinforcement learning is not the
same as self-gratification. Although the function U can be thought of as a measure
of gratification, the problem here is more like a problem in delayed gratification.
The essence of the problem is not just to maximize this in the next instant.
The problem is to find a strategy over time to achieve whatever goals are built
into this U; these could be very sophisticated goals.
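Stated slightly more formally (the discount factor and expectation here are
illustrative additions), the little person's task is to choose the levers u(t),
on the basis of the observed lights X(t), so as to maximize the expected
long-term utility

$$ \max_{u(\cdot)}\;E\!\left[\sum_{k=0}^{\infty}\frac{U(t+k)}{(1+r)^{k}}\right], $$

where r is an interest or discount rate that weights the near future somewhat
more heavily than the far future.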
Almost any planning or policy management problem can be put into this framework.
An economist would say that this connection is very straightforward. If U is
chosen to represent net profits, then the learning task here--to maximize profits
over the long-term--encompasses quite a lot. The hypothetical may not be a good
higher order description of the brain, but it has been extremely productive
as a good first order motivator of engineering research.
There are a few other aspects of reinforcement learning of some importance to
understanding the brain. It turns out that a really powerful reinforcement learning
system can't be built if there is only one simple neural net. Modules within
modules within modules are needed, which is exciting, because that is also the
way the brain is believed to work. This is not like the AI systems where you
have an arbitrary kind of hierarchy. Instead, you have a lot of modules because
there are a lot of pieces that need to do this kind of task effectively over
time. Further, if a real engineering system is built that tries to learn how
to do this maximization task over time, then in order to make it work, human-style
control has to be added. For example, exploratory behavior appears necessary.
Without exploratory behavior, the system is going to get stuck; and it will
be a whole lot less than optimal. So there is a lot of behavior that people
do which is exploratory. Exploratory behavior is often called irrational, but
it appears useful if a human-like control system is to be built.
Another issue is that human beings sometimes get stuck in a rut. There are many
names for the ruts that humans get stuck in. Humans get stuck in less than optimal
patterns of behavior. Unfortunately, the same thing happens to ANNs as well.
They get stuck in things called local minima. If there were a mathematical way
to avoid local minima, in all situations, then it would be used. If there were
a mathematical way or a circuit way to keep the human brain from getting stuck
in a rut, nature would have implemented it too, but there isn't. It's just the
nature of complex nonlinear systems that in the real world have a certain danger
of falling into a local minimum, a rut. A certain amount of exploratory behavior
reduces that danger.
The bottom line here is that nobody needs to worry about an engineer building
a model so optimal that it is more optimal than the human brain could be. That's
the last thing to worry about, even though reinforcement learning may still
be a plausible first-order description of what the brain is doing, computationally.
The term neurocontroller will be used hereinafter to mean a well-defined mathematical system
containing a neural network whose outputs are actions designed to achieve results
over time. Whatever else is known about the brain as an information processing
system, clearly its outputs are actions. And clearly the function of the brain
as a whole system is to output actions.
For the brain as a computer, control is its function. To understand the components
of a computer, one must understand how they contribute to the function of the
whole system. In this case, the whole system is a neurocontroller. Therefore
the mathematics required to understand the brain are in fact the mathematics
of neurocontrol. Neurocontrol is a subset both of neuroengineering and of control
theory--the intersection of the two fields. The book, "Neural Networks for Control",
discussed supra, came from a workshop back in 1990 and really was the start
of this now organized field called neurocontrol. Later followed "Handbook of
Intelligent Control," discussed supra, which is still the best place to go to
find the core, fundamental mathematics, including all the equations. Also useful
as an introduction is "The Roots of Backpropagation: From Ordered Derivatives
to Neural Networks and Political Forecasting," by P. Werbos and published by
Wiley, 1994. Basically, it includes tutorials in the back explaining what backpropagation
is and what it really does. Backpropagation is a lot more general than the popularized
stuff. The book can help explain the basis for designs which use backpropagation
in a very sophisticated way. (Also, an abbreviated version of some of this material
appears in the chapter on back propagation in P. Werbos, Backpropagation, in
M. Arbib (ed) Handbook of Brain Theory and Neural Networks, MIT Press, 1995.)
Since 1992, there has been great progress in applying and extending these ideas.
See E. Fiesler and R. Beale, eds, Handbook of Neural Computation, Oxford U.
Press and IOP, 1996 for some of the developments in neurocontrol in general.
See P. Werbos, Intelligent control: Recent progress towards more brain-like
designs, Proc. IEEE, special issue, E. Gelenbe ed., 1996, for a current overview
of the more brain-like designs (and of some typographic errors in "Handbook
of Intelligent Control").
Neural networks have found three major uses: (1) copying an expert using supervised
control, (2) following a path, setpoint, or reference model using direct inverse
control or neural adaptive control, and (3) providing optimal control over time
using backpropagation of utility (direct) or adaptive critics. Thus cloning, tracking
and optimization make up the trilogy. Those are the kinds of capabilities that
can be used in engineering.
Cloning means something like cloning a preexisting expert, but this is not what
the brain does. There is some kind of learning in the brain based on imitating
other people, but it's nothing like the simple cloning designs used in engineering.
In fact, imitative behavior in human beings depends heavily on a lot of other
more fundamental capabilities which need to be understood first.
Tracking is the most popular form of control in engineering today. In fact,
many classical control engineers think that control means tracking, that they
are the same thing. This is not true. But a narrowly trained control specialist
thinks that control means tracking. A familiar example of tracking is a thermostat:
there is a desired temperature, and you want to control the
furnace to make the real temperature in the room track the desired setpoint.
(The "setpoint" is the desired value for the variable which you are trying to
control.) Or you could have a robot arm, and a desired path that you want the
arm to follow. You want to control the motors so as to make the arm fit (track) the
desired path. A lot of engineering work goes into tracking. But the human brain
as a whole is not a tracking machine. We don't have anyone telling us where
our finger has to be every moment of the day. The essence of human intelligence
and learning is that we decide where we want our finger to go. Thus tracking
designs really do not make sense as a model of the brain.
FIG. 5 gives a simple-minded example of what is called direct inverse control--direct
tracking. The idea here is very simple: you want the robot hand to go to some
point in space, defined by the coordinates x.sub.1 and x.sub.2. You have control
over .theta..sub.1 and .theta..sub.2. You know that x.sub.1 and x.sub.2 are
functions of .theta..sub.1 and .theta..sub.2. If the function happens to be
invertible--and that's a big assumption!--then .theta..sub.1 and .theta..sub.2
are a function of x.sub.1 and x.sub.2. So what some robot people have done is
as follows: they will take a robot, and flail the arm around a little bit. They
will measure the x variables and the .theta. variables, and then they try to
use simple supervised learning to learn the mapping from the x's to the .theta.'s.
This approach does work--up to a point. If you do it in the obvious way, you
get errors of about 3%--too much for anybody to accept in real-world robotics.
If you are sophisticated, you can get the error down a lot lower. There are
a few robots out there that use this approach. But the approach has some real
limitations. One limitation is this assumption that the function has to be invertible;
among other things, this requires that the number of .theta. variables (degrees
of freedom) has to be exactly the same as the number of x variables. The other
thing is that there is no notion of minimizing pain or energy use. There have
been lots of studies by people like Kawato and Uno, and also work on biomechanics
by Mahoney from Cambridge University. There is a great deal of evidence showing
that the human arm movement system does have
some kind of optimization capability.
There are lots of degrees of freedom in the human arm, and nature does not throw
them out. Nature tries to exploit them to minimize pain, collision damage, whatever.
The point is that direct tracking models are simply not rich enough to explain
even the lowest level of arm control.
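For concreteness, the direct inverse (direct tracking) scheme described above
can be sketched as follows; the two-link toy arm, the data-gathering step
("flailing" the arm with random joint angles), and the particular model class
are hypothetical choices made only for illustration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Step 1: "flail the arm" -- issue random joint angles theta and record the
# resulting hand coordinates x for a toy two-link arm with unit link lengths.
rng = np.random.default_rng(1)
theta = rng.uniform(0.0, np.pi / 2, size=(2000, 2))
x = np.column_stack([np.cos(theta[:, 0]) + np.cos(theta[:, 0] + theta[:, 1]),
                     np.sin(theta[:, 0]) + np.sin(theta[:, 0] + theta[:, 1])])

# Step 2: supervised learning of the *inverse* mapping x -> theta.
# This only makes sense if the mapping is invertible over the sampled region.
inverse_model = MLPRegressor(hidden_layer_sizes=(50,), max_iter=2000)
inverse_model.fit(x, theta)

# Step 3: to reach a target hand position, read off the predicted joint angles.
theta_command = inverse_model.predict(np.array([[1.2, 0.8]]))
```

Even when this works, nothing in the procedure minimizes pain, energy use, or
anything else along the way, which is exactly the limitation noted above.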
An interesting aspect of this is that there are lots of papers still out there
in the biology literature talking about learning the mapping from spatial coordinates
to motor coordinates. What I am saying is that this is only a metaphor. It is
not a workable system. Perhaps it is useful at times in descriptive analysis,
but it would be totally misleading to incorporate it into any kind of model
of learning.
In actuality, in neuroengineering, most people do not use direct inverse control,
even when they are trying to solve very simple tracking problems. There is another
approach called indirect adaptive control, where you try to solve a tracking
problem by minimizing tracking error in the next time period. This myopic approach
is now extremely popular in neuroengineering. But this approach tends to lead
to instabilities in complex real-world situations (using either ANNs or classical
nonneural designs). There are lots of theorems to prove that such designs are
stable, but the theorems require a lot of conditions that are hard to satisfy.
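Concretely (in illustrative notation), indirect adaptive control picks the
action at each step by solving only the one-step problem

$$ \min_{u(t)}\;\bigl\|x^{\mathrm{ref}}(t+1)-x(t+1)\bigr\|^{2}, $$

where x^{ref} is the reference trajectory; nothing in this objective looks
beyond the next time period or penalizes behavior that stores up trouble for
later steps.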
Because of these instability problems, I don't think that indirect adaptive
control is a plausible model of arm movement either. Furthermore, it still doesn't
account for the work of Kawato and Mahoney and such, who show some kind of optimization
capability over time. Therefore, I would claim that optimization over time is
the right way to model even the lowest level of motor control.
If you look back at the list of uses for neural networks, you will see that
there are two forms of optimization over time which have been used in practice
for reasonably large-scale problems in neuroengineering. (There are also a few
brute-force approaches used on much smaller-scale problems; these are obviously
not relevant here.) One of them is a direct form of optimization based entirely
on backpropagation. Direct optimization over time leads to a very stable, high-performance
controller. It has been used a whole lot in both classical engineering and neuroengineering.
For example, I suspect that you will see it in ANNs in some Ford cars
in a couple of years. Nevertheless, the kind of stuff that you can do in the
brain is a little different from what you can do with microchips in a car. The
direct form of optimization requires calculations which make no sense at all
as a model of the brain. This leaves us with only one class of designs of real
importance to neuroscience--a class of designs which has sometimes been called
reinforcement learning, sometimes called adaptive critics, and sometimes called
approximate dynamic programming (ADP). Actually, these three terms do have different
histories and meanings; in a strict sense, the designs of real relevance are
those which can be described either as adaptive critics or as ADP designs.
The kind of optimization over time that I believe must be present in the brain
is a kind that I would call approximate dynamic programming (ADP). There is
only one other kind of optimization over time that anybody uses (the direct
approach), and that's not very brain-like. So this is the only thing we have
left. But what is dynamic programming?
Dynamic programming is the classic control theory method for maximizing utility
over time. Any control theorist will tell you that there is only one exact and
efficient method for maximizing utility over time in a general problem and that
is dynamic programming. FIG. 6 illustrates the basic idea of dynamic programming.
The incoming arrows represent the two things that you have to give to dynamic
programming before you can use it. First, you must give it the basic utility
function U. In other words, you must tell it what you want to maximize over
the long-term future. This is like a primary reinforcement signal, in psychology.
Second, you have to feed it a stochastic model of your environment. And then
it comes up with another function called a strategic utility function, J.
The basic theorem in dynamic programming is that this J function will always
exist if you have a complete state model. Maximizing J in the short term will
give you the strategy which maximizes U in the long term. Thus dynamic programming
translates a difficult problem in planning or optimization over time into a
much more straightforward problem in short term maximization.
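The relation between J and U described here is the Bellman equation of dynamic
programming; in a common discounted form (the notation is illustrative, with
x(t) the state, u(t) the action, r a discount or interest rate, and the angle
brackets an expectation over the stochastic model) it reads

$$ J\bigl(x(t)\bigr)=\max_{u(t)}\;\Bigl[\,U\bigl(x(t),u(t)\bigr)+\frac{1}{1+r}\,\bigl\langle J\bigl(x(t+1)\bigr)\bigr\rangle\Bigr]. $$

Maximizing the bracketed short-term quantity at each step then yields the
strategy that maximizes U over the long term.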
If dynamic programming can solve any optimization problem over time, and account
for all kinds of noise and random disturbance, then why don't we use it all
the time? The real answer is very simple: it costs too much to implement in
most practical applications. It requires too many calculations. To run dynamic
programming on a large problem is too expensive. It just won't work. But there
is a solution to that problem, called approximation.
In Approximate Dynamic Programming (ADP), we build a neural net or a model to
approximate this function J. Thus instead of considering all possible functions
J, we do what you do if you are an economist building a prediction model. You
build a structure with some parameters in it and you try to adapt the parameters
to make it work. You specify a model or a network with weights in it, and you
try to adapt the weights to make this a good approximation to J. A neural network
which does that is called a Critic network. And if it adapts over time, if it
learns, we call it an adaptive critic. So right now in engineering we have almost
three synonyms: approximate dynamic programming, adaptive critics, and reinforcement
learning--those are almost the same thing.
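A minimal sketch of what adapting a Critic network can look like in practice is
given below. This is a toy illustration only: the linear critic, the state
feature vectors, the discount factor and the learning rate are all assumptions,
not the specific design claimed in this patent.

```python
import numpy as np

def critic(x, w):
    """Linear Critic: J_hat(x) = w . x, where x is a feature vector for the state."""
    return w @ x

def adapt_critic(w, x_t, U_t, x_next, gamma=0.95, lr=0.01):
    """One adaptive-critic (temporal-difference style) weight update.

    Move J_hat(x_t) toward the target U(t) + gamma * J_hat(x_next), i.e. toward
    the short-term quantity that dynamic programming says should be consistent
    with the long-term value.
    """
    target = U_t + gamma * critic(x_next, w)
    error = target - critic(x_t, w)      # temporal-difference error
    return w + lr * error * x_t          # gradient step for a linear critic

# Example: adapt the critic along one short observed trajectory of
# (state features, observed utility U) pairs.
w = np.zeros(3)
trajectory = [(np.array([1.0, 0.0, 0.5]), 0.2),
              (np.array([0.8, 0.1, 0.4]), 0.1),
              (np.array([0.5, 0.3, 0.2]), 1.0)]
for (x_t, U_t), (x_next, _) in zip(trajectory, trajectory[1:]):
    w = adapt_critic(w, x_t, U_t, x_next)
```

If the approximation of J becomes good enough, acting to maximize the short-term
quantity U plus the discounted critic estimate approximates the strategy that
maximizes U over the long term.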
Based on all of this logic, I would conjecture that the human brain itself must
essentially be an adaptive critic system. At first glance, this may sound pretty
weird. How could there be dynamic programming going on inside the brain? What
would this idea mean in terms of folk psychology, our everyday experience of
what it feels like to be human? A good model of the brain should fit with our
personal experience of how the brain really works. That's part of the empirical
data. We don't want to ignore it. So does this theory make sense in terms of
folk psychology? I will argue that it does. I would like to give you a few examples
of where this J versus U duality comes in, in different kinds of intelligent
behavior.
Those of you who have followed artificial intelligence (AI) or chess playing
probably are aware that in computer chess the basic goal, the U, is to win the
game, and not to lose it. This is in computer chess, not necessarily in real
chess. But there is a little heuristic they teach beginners. They teach you
that a queen is worth 9 points, a castle is worth 5, and so on. You can compute
this kind of score on every move. This score has nothing to do with the rules
of the game. But people have learned that if you maximize your score in the
short term, that's the way to win in the long term.
When you get to be a good chess player, you learn to make a more accurate evaluation
of how well you are doing. For example, you learn to account for the value of
controlling the center of the board, regardless of how many pieces you have.
Studies suggest that the very best chess players are people who perform a really
sophisticated, high-quality strategic analysis of how good their position is
one move ahead. Those are the studies I've seen. So basically, this evaluation
score is like a J function. It's a measure of how well you are doing.
In animal learning, U is like primary reinforcement, the inborn kind of stuff.
It reminds me of the hypothalamus and the epithalamus. And J is like secondary
reinforcement, the learned stuff, learned reinforcers. U is like pleasure or
pain, an automatic kind of response, while J is like hope and fear. And in a
way all of this fancy theory is just saying hey, I think hope and fear are hard-wired
into the brain. We respond to hopes and fears from day one. Hopes and fears
drive everything we do and learn.
It turns out that this model also has parallels in physics. In fact, the Bellman
equation we use in dynamic programming is exactly what is called the Hamilton-Jacobi
equation in physics. If you read Bryson and Ho, Applied Optimal Control, Ginn,
1969, they even call it the Hamilton-Jacobi-Bellman equation. In physics, they
would say that the universe is maximizing a Lagrangian function instead of calling
it a utility function; thus they use the letter L instead of the letter U, but
it's the same equation. And it turns out that our J refers to something they
call "action." And the things we call "forces" in physics turn out to be the
gradient of the J function. (See F. Mandl, Introduction to Quantum Field Theory,
published by Wiley, 1959; and V. G. Makhankov, Yu. P. Rybakov and V. I. Sanyuk,
The Skyrme Model: Fundamentals, Methods, Applications, published by Springer-Verlag
(800-777-4643), 1993.)
SUMMARY OF THE INVENTION
It is an object of the present invention to address at least one deficiency
in the intelligent control of external devices by using a new brain-like control
system.