My understanding is that he got VW to do all the hardware work (they built
the car and added all the sensors and computers). His focus was on the
software and project management. From a comment in some video I saw him
make, it sounded like the standard VW Touareg was a drive-by-wire car to
start with. That would mean the hardware work was adding the sensors and
extra computing power and interfacing to the control systems already in the
car. He made it clear in the video that his contribution to the project
was the software design. How much the VW group helped with the software I
have no clue.
Sensors and sensory processing are part and parcel of the same thing.
Sensor integration comes after that.
Yes, this is exactly the same idea I had 2 years or so ago [never
implemented!!] regarding teaching my walkers how to walk. A simple sonar
"critic" feeds into a learning algorithm to evolve the gaits. E.g., the
gaits on my walkers are generated by lookup tables of timed servo
movements. It would be fairly easy to have a higher-level program patch
new values into the lookup tables, based on some genetic algorithm or
other learning technique, in order to modify the servo movements. The
critic to guide the GA/learner would be the sonar input.
For instance, to learn how to stand up, you point the sonar towards the
ceiling, and set up the learning algorithm to fiddle the lookup table
values such that the desired goal is to reduce the distance to the
ceiling. Quite simple, really. Similar protocol for learning to walk,
once the bot has stood up successfully. This time, point the sonar
towards the wall ahead, and tell the learning algorithm the goal is to
reduce the distance, once again. With 2 sonars, one pointed upwards and
the other ahead, you could get the bot to learn to walk forwards while
keeping the top of the bot fairly level at the same time. On and on.
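The scheme above can be sketched in a few lines. Everything here is a
stand-in I invented for illustration: the lookup table is just 36 neutral
servo values, and the "sonar critic" is simulated as a distance-to-goal
score (smaller is better), standing in for the real sensor reading.

```python
import random

def make_table(n_entries=36):
    # Lookup table of timed servo positions; start at a neutral pose.
    return [90] * n_entries

def sonar_critic(table):
    # Simulated critic: pretend the goal pose is all-60s, and score by
    # how far the table is from it. The real critic would be a sonar
    # distance reading (e.g. distance to the ceiling or wall).
    target = [60] * len(table)
    return sum(abs(a - b) for a, b in zip(table, target))

def learn(table, critic, steps=2000, rng=None):
    # Simple hill-climber that "patches" one lookup-table entry at a
    # time, keeping changes the critic approves and reverting the rest.
    rng = rng or random.Random(42)
    best = critic(table)
    for _ in range(steps):
        i = rng.randrange(len(table))
        delta = rng.choice((-5, 5))
        table[i] += delta
        score = critic(table)
        if score < best:
            best = score          # the critic liked it; keep the patch
        else:
            table[i] -= delta     # revert a patch that made things worse
    return best

table = make_table()
final = learn(table, sonar_critic)
```

A real GA would keep a population of tables rather than mutating one, but
the critic-guided loop is the same idea.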
No. This is just the basics we roboticists deal with every day.
It might well make use of some short-term memory to keep it directed at
the proper goal, i.e. the original light source, and not get distracted by
extraneous light sources that come along [e.g., light coming through the
doorway it's passing by]. If it's too dumb, i.e. purely reactive, it'll
turn and go through the door, instead of following up on its original
goal. Here is the death knell of too-simplistic BBR.
I know from experience, because I implemented simple photovore behavior
in my hexapod, and it turns cutely and tracks light very nicely [thank
you], but it simply homes in on any "local maximum" it encounters, and
forgets which one specifically it was previously tracking.
Well, this is the challenge for small bots. Current computer vision
systems work best with supercomputer processing power to tap into. But
this thread is about directions for research.
You write that it would be fairly easy to do what you suggest
above, so why haven't you done it, Dan?
Of course the "critic" has to evolve as well in response to
the higher critic, reproductive success.
Stable mechanisms will not evolve, they have no motivation.
Unstable mechanisms have the potential to evolve but that is
only if by chance they have the right "goals" (requirements
for stability) that lead in that direction.
The advantage of having a motivating critic to get a robot
walking is that should it lose a leg while trekking mars it
will then start adapting its gait. Should it come across
a new kind of terrain it would adapt to that as well.
Of course a "critic" will never guarantee a solution is
possible for any particular mechanism.
For those that can't afford a real robot there is always the
simulated robot to test your theories.
Well, there are only so many hours in a day that I can assign for
play. Were I an academic, then I might be doing this on a paycheck :).
As it is, it's just for fun on the side.
This is a good point that Jordan Pollack discovered in his games with
evolutionary robotics, or whatever he calls it. If you just let the
algorithm crank, there is no telling what you might get as a solution.
E.g., if I were to use the algorithm plus sonar-critic to teach the
hexapod to walk forwards FIRST, instead of to stand up first, then it's
highly likely the gait would involve something like 1 or 2 front legs
literally dragging the rest of the passive and prone bot towards the
wall. Like an inch-worm. Unlikely this scheme would produce a nice
hexapod tripod or metachronal-wave gait.
OTOH, if I have the algorithm learning to stand first, and then make
use of that prior-knowledge during the walking phase, I'll probably get
a much better result. To me, this is the only way to go. To me, what
Curt usually proposes would take my bot 3.5 BY [of analogous computer
time] to learn a good solution. OTOH, if I help it along a little, it
will produce a much better result many times quicker. I.e., directed, as
opposed to purely random, evolution.
Ah, but the point is that the evolution _is_ directed and not random. It's
descent with modification - not just modification. Which means, any time it
finds something good, the "good" stuff stays around (descent), and is used
as the foundation for the next round. Learning systems that don't manage
to create a good descent-with-modification system are the ones that never
get very far. To continue to grow and advance, it must be able to build a
growing base of knowledge and not have to rebuild the entire knowledge base
from the ground up each time by random luck.
For example, on your walking machine, if it did learn to use two front legs
to drag the rest, it should then move on and find ways to modify the
behavior of the back legs, without forgetting the useful behaviors it
learned for the front legs.
Preventing new experience from erasing the valuable lessons learned in the
past is part of the trick of making evolutionary learning work. The way to
do that, is to generate behavior based on the current sensory context, and
to make learning, based on the same. When the sensory context changes, the
part of the machine which is currently being "trained" will change as well
(or the effects of training will be weighted differently at least). This
allows the old learned lessons, to remain mostly unchanged, while it's
learning a new lesson. The complex emergent behavior of the system is a
combination of all these little lessons it has learned over time.
Part of the trick to making that work, is giving it a good system for
evaluating the worth of each change so that progress can be constant. An
example of a bad critic would be one that taught it to walk by waiting
for it to move a fixed distance from the start, and then giving it a single
fixed sized reward for crossing the finish line. A good critic, will
reward it for every forward motion it made, and punish it for every
backward motion it made, such that the reward was always proportional to
the distance moved. Any small evolutionary change that helped it move
forward farther, or faster, would receive a greater reward.
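A minimal sketch of the two critics described above, with an invented
position trace standing in for real odometry:

```python
def sparse_critic(positions, finish=10.0, reward=1.0):
    # Bad critic: one fixed-size reward, only for crossing the finish
    # line. Everything short of that earns exactly nothing.
    return reward if max(positions) >= finish else 0.0

def dense_critic(positions):
    # Good critic: reward every forward motion and punish every backward
    # motion, proportional to the distance moved at each step.
    return sum(b - a for a, b in zip(positions, positions[1:]))

# Invented trace: steady forward progress with one small slip backward.
trace = [0.0, 0.5, 0.4, 1.0, 1.2]
```

Note the dense critic's total over a trace telescopes to the net
displacement; its value is that every individual step gets its own
proportional feedback while learning is in progress, instead of one
all-or-nothing verdict at the end.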
In a walking machine, the behavior of each leg, needs to be a function, of
its sensory awareness of the actions of all the other legs. For example,
a simple action like pushing a leg back might move the bot forward, and
that action might be rewarded. And pushing it forward, might move the bot
backwards, and be punished. So it would learn to push the legs back, and
never bring them forward. And the bot goes nowhere (but it at least
doesn't get punished for moving backwards). But, if the behavior, and
training, is made context sensitive, then it can learn to push it backwards
when lowered, and push it forward when raised. These two behaviors are
sensitive to the context of the current position of the leg. Learning to
push it forward when raised, is not erasing the previous learning that
happened when it was lowered.
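The raised/lowered rule above can be sketched as a context-indexed policy
table (the context names, actions, and reward signal here are invented
placeholders):

```python
# One learned entry per sensory context, instead of one global action.
# Training in one context never overwrites what was learned in another.
policy = {"lowered": None, "raised": None}

def train(context, action, moved_forward):
    # Reward updates only the entry for the current context.
    if moved_forward:
        policy[context] = action

train("lowered", "push_back", moved_forward=True)      # lesson one
train("raised", "swing_forward", moved_forward=True)   # lesson two
# Repeating the second lesson leaves the first context's lesson intact.
train("raised", "swing_forward", moved_forward=True)
```

With the context keyed on the positions of all the legs instead of one,
the same table grows into a coordinated gait one small lesson at a time.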
Likewise, if the context is a function of the position of all the legs,
then it can develop a complex coordinated walking motion, one small step at
a time. One leg can learn to drag the body, and the others can learn to
respond to the motion of the first leg, in order to maximize forward motion.
The point here is that it never has to learn a complex behavior (like 6
legged walking) by one very lucky random chance. The odds of it finding a
complex behavior by random chance are near zero. It's learning it by stringing
together 1000's of very trivial behaviors, each of which have very high
odds of being found quickly.
If you want to train a machine to output a sequence of 10 digits, and you
try to do it by letting it pick random combinations of digits until it gets
all 10 right, it will take a very long time for it to learn to produce the
right answer. But, if instead, you allow it to learn one digit at a time,
then it will find it very quickly, with each digit only taking around 10
guesses on average and all 10 only taking 100 guesses.
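A quick sketch of the digit arithmetic above, learning one digit at a time
(the target digits and random seed are arbitrary):

```python
import random

def learn_digit_at_a_time(target, rng):
    # Guess each digit independently until it's right, then lock it in.
    # Each digit takes ~10 guesses on average, so ~100 total for 10 digits,
    # versus an expected 10**10 guesses for all-at-once random search.
    guesses = 0
    learned = []
    for digit in target:
        while True:
            guesses += 1
            if rng.randrange(10) == digit:
                learned.append(digit)   # keep the digit once found
                break
    return learned, guesses

rng = random.Random(0)
target = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
learned, guesses = learn_digit_at_a_time(target, rng)
```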
If you reward it based on the number of digits correct each time, you have
a problem somewhere between those two. It's not as easy as the second
because the system doesn't know which digits it's being rewarded for. It's
like a mastermind game (if you know that game). If the system tracks the
amount of reward, for each digit, in each position, it will find a
statistical correlation of more rewards for the right digit over time.
This system will be able to converge on the correct answer much faster,
than one which is only rewarded when all 10 digits are correct (but not as
fast as the one where each digit was trained separately). If the training
is able to reward closeness, or value, of a given behavior, then it can
converge on complex answers in reasonable time and not take billions of
years. It is just a matter of creating the training system (the critic)
and the learning techniques, to create these conditions where the system
can be directed to converge on a complex answer (as if it were being led by
the hand to the solution), instead of guessing it in one huge leap by
random chance. Large leaps of learning always take forever and will never
work. Learning is only workable when it can be broken up into many small
steps (descent with modification in small steps).
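The middle, mastermind-like case can also be sketched: the learner is only
told how many digits were right, so it keeps a reward tally per (position,
digit), and the correct digit accumulates the most credit over time. All
the constants here are my own choices:

```python
import random

def mastermind_learn(target, trials=3000, rng=None):
    rng = rng or random.Random(1)
    n = len(target)
    # tally[pos][digit] accumulates reward credited to guessing `digit`
    # in slot `pos`, even though the reward itself is undifferentiated.
    tally = [[0] * 10 for _ in range(n)]
    for _ in range(trials):
        guess = [rng.randrange(10) for _ in range(n)]
        reward = sum(g == t for g, t in zip(guess, target))  # digits right
        for pos, digit in enumerate(guess):
            tally[pos][digit] += reward
    # The statistical correlation emerges: the right digit in each slot
    # collects the most reward, so read out the per-slot maximum.
    return [max(range(10), key=lambda d: tally[pos][d]) for pos in range(n)]

answer = mastermind_learn([7, 2, 9])
```

This converges far faster than waiting for an all-correct guess, though
slower than training each digit separately, just as described above.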
I more or less agree with what you've written here. Good learning
strategies vs bad strategies. Now, think of all the "innate" knowledge
the system has to contain in order to successfully implement your
"good" srategies, as compared to just doing everything randomly. All of
those good strategies require non-trivial protocols.
E.g., regarding my walker, a naive solution, based upon a very
simplistic learning protocol might produce a result where only a single
leg learns to drag the walker along. This, I believe, is similar to the
kinds of solutions that Jordan Pollack was seeing. It's doubtful the
walker would learn a nice solution that would magically get all 6 legs
working together in a very efficient gait, like a tripod or metachronal
wave gait, by random chance. Similarly, for your 1-digit-at-a-time protocol.
OTOH, to implement your 1DAAT or an all-6-legs-in-synchrony protocol
will take a smart critic that oversees the results of previous learning
operations and strategically chooses the "correct" protocol for the
next learning stage. This is a non-trivial mechanism to implement, and
it does show up the weakness of raw generic learning protocols that
operate on the so-called tabula-rasa.
There are [at least] 2 approaches, as I see it. One is to implement
your temporal stage-sequencing "smart overseer". However, this thing is
already awfully intelligent itself, able to discern good results from
bad, at each stage, and then select the appropriate next stage ... and
somehow it needed to get that way. There's the rub.
The other way is the way I envision, which is to add the learning
modules atop a pre-existing mechanism that already performs the basic
operations, albeit non-optimally. IOW, use "genetically-determined"
gaits, which are very simple and probably non-optimal, and have the
learning module "fine-tune" the gaits to be more efficient and optimal.
Eg, with my controller, I can easily generate a basic gait in a couple
of minutes, by adding entries to the look-up table. However, it takes
me much longer to optimize the look-up parameters, even though there
are only 36 parameters for a 6-legged gait. Every small adjustment has
some effect on the overall result, and I can fiddle around seemingly
forever with hand-tuning.
In short, if I start with a basic gait, genetically-predetermined in
effect, and let the learning module make strategic adjustments to the
existing parameters, it should take a lot less time to produce an
efficient gait than if the system starts from zero in the first place.
This is an interesting problem that I think would be a fun challenge for a
learning system.
A lot of people seem to think that the only way learning can work is if the
"teacher" (aka critic) is as complex as the task it's trying to get the
"student" to learn. Meaning - that for a machine to learn something
complex like multileg walking, the "teacher" has to be more complex (or
nearly as complex) as the walking algorithm would be in the first place.
I think this fails to understand the power of the system learning how to
teach itself. This is the point of secondary reinforcement. It not only
learns how to behave, it also learns to train itself by making predictions
of value. As it learns more complex behaviors, it is also learning to make
more complex predictions of rewards - which it then uses to train the new
behaviors. So as it's learning complex behaviors to produce better
rewards, it's also becoming a more complex critic of itself. This is how
strong learning systems become stronger and continue to advance in
intelligence.
Nah, if done correctly, it starts out very dumb - which means there isn't a
lot of complex code that needs to be written. But as it learns behaviors
that work better, it's also becoming smarter at training itself. Its
intelligence grows, without us as programmers ever having to create a
complex overseer. You only need to give it motivation - just as you
suggested - to find ways to move forward faster.
This stuff I'm talking about is already implemented and working in some
systems, like the TD-gammon backgammon playing program. The only reward
that system was given was winning. It has no clues about how to tell if
one move in the middle of the game might be better than another. It has no
"smart overseer" to give it hints about why one move was better than
another in the middle of the game. It only had a very dumb overseer who
gave it a "good boy" at the very end of the game when it managed to win.
But yet, as the program learned to play the game, it also developed its own
complex evaluation function. Given any game position, the evaluation
function could produce an estimated reward (aka the odds of winning the
game) for that position. So, as the program learned, it was also developing a
very intelligent "overseer". This is what it was in fact learning, how to
be a better "overseer". Nobody had to write a complex overseer to teach it
to play backgammon. It learned how to teach itself.
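The TD-gammon idea can be illustrated on a toy problem (this is
Sutton-style temporal-difference learning on a small random walk, not
backgammon; the states, seed, and step size are all invented). The only
hard-coded reward is reaching the winning end, yet the system grows its own
evaluation of every mid-game position:

```python
import random

def td_random_walk(episodes=20000, alpha=0.05, rng=None):
    # States 1..5 form the "game"; state 6 is a win (reward 1), state 0
    # a loss (reward 0). Moves are random steps left or right.
    rng = rng or random.Random(2)
    value = [0.0] * 7
    value[6] = 1.0                     # the only hand-coded "good boy"
    for _ in range(episodes):
        s = 3                          # start mid-board
        while s not in (0, 6):
            s2 = s + rng.choice((-1, 1))
            # TD update: nudge this position's value toward the value of
            # the position that followed it. No mid-game hints needed.
            value[s] += alpha * (value[s2] - value[s])
            s = s2
    return value

v = td_random_walk()
```

After training, the learned values rise toward the winning end (the true
values are i/6 for state i), so the system has in effect built its own
"overseer" that can score intermediate positions it was never explicitly
taught about.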
Yes, that's a very practical solution if you don't have strong learning
systems to work with. You do as much as you can with our limited human
minds, and then let the poor learning system extend the design a bit
further by making minor adjustments to our design.
But, if you had a strong learning system to start with, then you don't need
to start with anything other than a basic goal (make the bot move forward),
and let the learning system, step by step, create and improve the design.
Your walking problem seems like a very interesting test bed for learning
systems. My claim is that a strong learning system could in fact, on its
own, learn to produce complex gaits (in a reasonable amount of time),
without us having to give it 90% of the answer to start with. I should
work on that problem and see what I can come up with....
I don't look at it like that because any single data stream from a sensor
can be decomposed into multiple data streams and once you do that, you end
up with multiple sensory signals and the same sensory signal integration
problem you get if they came from different sensors. All "sensory
processing" is done by decomposing, and recombining data in an attempt to
remove the data you don't care about, and extract the data you do care about.
In effect, every sensory signal is feeding limited information about the
environment into the robot. But every signal will include both information
the robot cares about, or could make good use of, along with information
the robot probably has no need for (noise generated in the sensor, for example).
The more sensors you have, the more paths there are for information about
the environment to flow into the robot, but the problem the robot has,
which is to find the data that is useful in those flows and ignore the data
which isn't, is the same whether you are talking about one flow, or 100.
A sonar sensor, for example, in its raw form is really two inputs. One input
that tells the robot when a ping is generated, and a return echo which
tells the robot when echoes are heard. The return echo is always a complex
temporal pattern of sounds as the ping bounces off multiple objects. But
most sonar sensors will perform some type of average or threshold detection
operation on that complex pattern and reduce all the information down to a
single distance measurement. This might all be done with analog circuitry
on the sensor before the data is even sent to a central processor. But
the hardware is already performing multiple decomposing, and combining
functions on the raw data in order to filter out most of the data from the
sensor and reduce it to one small piece of data the robot can easily use
(a single distance measurement).
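The reduction described above might look like this (all the numbers are
invented, and real sensors do this in analog hardware, but the operation
is the same: collapse a temporal echo profile to one distance by timing
the first echo over a threshold):

```python
SPEED_OF_SOUND = 343.0        # m/s, roomish temperature

def first_echo_distance(echo_profile, sample_rate=40000, threshold=0.5):
    # echo_profile[i] = echo amplitude i samples after the ping was sent.
    # Everything below the threshold (multipath, noise) is thrown away;
    # the first sample over it is treated as "the" echo.
    for i, amplitude in enumerate(echo_profile):
        if amplitude >= threshold:
            round_trip_s = i / sample_rate
            return round_trip_s * SPEED_OF_SOUND / 2   # one-way distance
    return None               # no echo above threshold

# Made-up profile: weak clutter, one strong echo at sample 80, then tail.
profile = [0.05] * 80 + [0.9] + [0.1] * 40
d = first_echo_distance(profile)
```

Everything the loop skips over is the "raw echo profile" data the post
argues a more sophisticated processing system could still make use of.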
To me, the process of extracting one useful piece of data (a single
distance measurement) from those two complex temporal signals, is the same
generic problem that has to be solved at all levels of sensory signal processing.
Ultimately, the robot must produce outputs, which are a complex function of
all the raw sensory data. Defining those functions is the problem of
how to program the robot. I don't see it as one step of sensor processing,
then another step of sensory integration, then another step of output
pattern generation. I see it ultimately as one large step. We break it up
into small steps not because that's the best way to do it (and not because
it must be done that way), but because, that's the only way our limited
minds can comprehend how the bot is working. Sonar sensors are designed to
return a very simple limited distance measurement because that's something
we as programmers can easily understand and write code to respond to.
If it instead simply returned a raw echo profile, it's not something we
could easily make use of. But that doesn't mean the raw echo profile
doesn't include a lot of good data that a more sophisticated data
processing system could make good use of.
Right. If you are coding such a system, you could think of the "memory
store" as weights that might exist in neural network. Long term memory
store are weights that are modified, and stay modified forever (creating
long term learning). Short term memory store are wights that can be
modified, but which always slowly return to their staring condition
(creating habituation). You could easily code a system that used a
combination of both. The shorter term memory would be modified quickly,
allowing the fish to learn to ignore the splash in only 10 trials. But the
long term memory would also be modified, but much slower. So if you come
back the next day, the fish would have reverted (almost) to the same
starting behavior. But, if you did this every day for a year, the long
term memory effect might eventually cause the fish to stop swimming towards
a splash all together.
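A sketch of the two-store idea with invented constants: a fast-decaying
short-term weight and a slow, sticky long-term weight, multiplied together
to give the response strength:

```python
class SplashResponse:
    def __init__(self):
        self.short = 1.0   # short-term weight: habituates fast, recovers
        self.long = 1.0    # long-term weight: moves slowly, sticks

    def respond(self):
        # Tendency to swim toward a splash.
        return self.short * self.long

    def trial_no_food(self):
        self.short *= 0.7  # fast habituation within one session
        self.long *= 0.99  # small but lasting change

    def overnight(self):
        self.short = 1.0   # short-term store returns to its start

fish = SplashResponse()
for _ in range(10):        # ten penny trials in one session
    fish.trial_no_food()
after_session = fish.respond()
fish.overnight()
next_day = fish.respond()
```

After ten penny trials the response is nearly gone; overnight the
short-term store recovers, so the next day the fish responds at roughly
90% of its original strength, with only the slow long-term dent remaining.
Run the same session every day for a year and the long-term weight alone
would eventually suppress the behavior.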
Yes, I agree. It can correctly identify the sensory clues to distinguish
how the context you set up that day, was unique from other environments the
fish had experienced in the past. Better sensors of course help, but
better sensory processing algorithms I see as more important. There is a
lot of useful information in even simple sensors that most small bots fail
to make any use of. This includes the bot sensing what it is doing and
merging that with what it is sensing externally. I suspect few small bots
make good use of what can be extracted from that data.
Interesting comment, Curt. My first reaction was, come on, you don't
know how poor these sensors are. Begrudgingly, I came around to suspect
you are right, and my recent experience supports you. I realized I have
been preparing to extract more information from my Sharp range sensors
for some time, and have not yet taken advantage of what I'm already getting.
My mini-sumo base robot used in my university class is built on a
MarkIII chassis. In the mini-sumo competition, while the Sharps we used
were the analog output, all we have used so far is to threshold the
analog to say, there is or isn't something out there between 10 and
80cm. (Essentially the same as if we just had used the digital version
of the ranger.)
Since the class is hosted by the physics department, I laid the ground
work for taking more off the sensor. By reading the ranger, I got a
current range reading (even though its only use was as a comparison
point against a there/not-there range threshold). So I added a
differencing of each reading with the last, and thereby got a current
velocity. I differenced the new velocity against the old velocity and got a
current acceleration reading. At that time I had not yet bothered to
linearize the Sharp which would have given real world significance and
scale to the numbers, so the readings were somewhat meaningless, but the
basis was there. (I just took the time to linearize the Sharp sensors on
my latest project, soccerbots for the Queen of Jordan's children's
museum, and once mastered, I think the effort was well worth it, and the
code will undoubtedly migrate to the sumo code next time I work on it.)
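The differencing chain described could be sketched like this (the 50 ms
loop time and the class name are my assumptions, not the class's actual
code): successive range readings give a velocity, and differencing
successive velocities gives an acceleration.

```python
class RangeTracker:
    def __init__(self, dt=0.05):       # assumed 50 ms sensor loop
        self.dt = dt
        self.ranges = []
        self.velocity = None           # cm/s, from first differences
        self.acceleration = None       # cm/s^2, from second differences

    def update(self, range_cm):
        self.ranges.append(range_cm)
        if len(self.ranges) >= 2:
            # Difference each reading with the last: closing velocity.
            v_new = (self.ranges[-1] - self.ranges[-2]) / self.dt
            if self.velocity is not None:
                # Difference the velocities: acceleration.
                self.acceleration = (v_new - self.velocity) / self.dt
            self.velocity = v_new

tracker = RangeTracker()
for r in (80.0, 78.0, 75.0):           # a target closing in
    tracker.update(r)
```

With the Sharp output linearized first, these numbers get real-world
units; unlinearized, the trends are still usable as the post says.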
In other routines I had begun to compare the two adjacent (slight
horizontal displacement) parallel sensors. I checked the digital
conversion (target/no-target) and counted how long the detector had been
in the current state. This value was used in the sumo code. The detector
that last saw the opponent (left or right) determined which way the
robot turned to reacquire the target. On the analog side, I computed the
difference between the two current readings and an average of the
distance both saw.
Likewise, the simple digital edge detectors, looking for the presence of
the white line at the edge, were also further analyzed. I counted how
long the detector had been in the current state. These counts could be
used to know something about the angle the edge was approached by seeing
which one went on to the white line first. In pulling back from the
edge, a similar count comparison was used in the sumo code to decide
which way to turn around, to best position for an advance on the center
of the ring.
Now given our earlier conversation about the bump switches, this might
surprise you that I've made this much use of historical information on
two digital inputs. I will point out, this was done in a function
SENSOR-SCAN which was the first to run ahead of, and completely
independent of, my state based subsumption behavior routines. I see
nothing in Brooks that would approve of the choice, quite the opposite,
but I found it processor efficient to work out these somewhat
"statistical" details and have the processed information available to
(used in one or several or even none, no matter) the later-run state
behaviors.
Yeah, those are all good examples of how additional information is in the
sensor data which many times goes unused by simple hard-coded behaviors.
And much of it is in the temporal domain - meaning that when something last
happened (aka how long it has been since it happened) yields important
information for controlling the actions of the robot.
When we hand code these solutions, we write a limited number of if
statements to test sensor data and respond to. Like your example of testing
if the distance was >10 cm or < 80 cm (two simple IF statement tests).
This produces behavior which tends to be jerky and robotic like. That is,
when something is greater than 80 cm away, the robot behavior doesn't
change at all, it's as if the robot was totally unaware the thing existed.
When it crosses that 80 cm line, the behavior suddenly changes.
Humans and animals have very fluid and dynamic reactions to their
environment because they in effect, don't use 2 tests for data like that,
they might use 2000, or maybe 200,000. Every neuron in your brain is in
effect performing a simple conditional test for some very small but somehow
important sensory condition. To equal that type of complexity and fluid
reactions to the environment, we would have to hand-code thousands of tests
against the sensory data instead of the 2 you used in the first version of
your bot. This is just something that will never be practical for a human
to do. Like you have talked about, you might extract a few more bits of
data from the sensors and use 20 tests instead of 2. But creating a system
with 2000 subtle tests based on the current reading of a sonar distance
sensor is just something that isn't productive for a human to even try
to do. But it is the type of thing learning systems, like the brain, can
evolve, based on some goal the system is trying to reach.
The limit to how far we can hand-code reactive systems is the limit of what
we can understand. That I believe is what is slowing the advance of what
has been done with such systems. We need in effect to develop tools to
program the systems for us - and that's what a learning algorithm is - we
give it a goal, and the learning system evolves a complex design to
maximize whatever metric we choose to have it maximize.
Except of course, the species as a whole learned those as well through the
slower learning process of evolution. It's learning either way. They just
happen on different time scales. One example is the species learning as a
whole and the other is an individual learning in its life time.
Only if the programmer is able to know every possible environmental
condition the robot is going to experience in its life - which they never
are. Which means, robots programmed like that only work well, in the
environments they were programmed to work in.
It seems I just duplicated a lot of what you already said in my previous post.
That is exactly why you need learning and why the programmer can't know
ahead of time how to answer these questions. If the bot is competing
against a bot the programmer has never seen before, it's unlikely that he
would have coded the correct amount of aggressiveness into his bot. But,
with context sensitive learning, the bot can learn to recognize the "blue"
bot and learn that trying to grab the ball away from it never works, so it
shouldn't bother trying. But it can learn that the red bot can't hold on to
shit and that he can grab the ball away every time. So if the red bot is
around holding a ball, it should always try to grab the ball from it. But,
it might also learn that the green bot is better than he is at grabbing the
ball from the red bot. So if the green bot is around, and is trying to
grab the ball, then he might as well go do something else that is likely to
be more productive.
All these priorities about which behavior is the most productive in
different environmental contexts, is something that must be learned (and
constantly adjusted) with experience, if you want the bot to act
"intelligently". And you will never make it work very well if that
"context" is defined by a hard-coded perception system. We need strong,
generic, statistical algorithms for merging all the sensory data, and using
that to select the best current behavior to produce.
Yes, you give them "frown" and "smile" hardware.
A strong reinforcement learning system, that had only a simple hard coded
reward like getting a ball in the goal, but which also had a strong context
sensing system could learn a large set of different behaviors that we as
humans would label as you did above. But internally, the bot only needs
one state or purpose - do whatever works best, in this context, to produce
the most expected future reward.
These systems, like I talked about in the other post, need to include a
strong internal, reward prediction system. If you take the output of that
internal reward prediction system, and wire it to an external signal, like
your lights, then you have simple smile and frown hardware that other
intelligent bots could learn to pick up on as you suggested.
Yeah, I think your example shows how good even simple animals are at
learning. Though a fish does have much better sensors than our typical
small bots (probably even better than most high end multimillion dollar
bots), that really isn't the big problem. The big problem is our bots
aren't making good use of the sensors they have - which means they aren't
making good use of the data coming from the sensors they have.
How for example, did the fish learn so quickly to stop chasing the pennies,
even when many never had the chance to try and eat the penny? I think your
answer about fish story telling was not as far off as you might have thought.
After a life time of eating with other fish, the fish have probably learned
to read the behavior of the other fish. When they all swim towards a
splash that seems like it could be food, the first fish that tastes the food
and decides it's not food, probably behaves very differently than when it
was real food. Next time, take some bread and throw in chunks and watch
how the group reacts. You will probably notice a lot of extra splashing
and fighting over the real food that never happened with the penny.
The other fish, the ones that never made it to the food, or the penny, can
no doubt sense this difference. Just like we could. If they see a lot of
splashing and fighting going on, they know it's real food, and they join to
fight. If they don't see it, they know either it's fake food, or the food's
all gone, and there is no reason to swim over there. The behavior of the
mob is telling a story to the entire group, about whether there is food,
and indirectly, about how much food there is.
So every time you throw the penny in, and the fish swim towards it, and see
no resulting mob fighting, they know the splash was not food, even though
they never got close enough to test the food for themselves.
This type of behavior is easily explained in terms of reinforcement
combined with internal systems that predict rewards. The splash is a
predictor of reward (because the fish has many times in the past received a
reward from real food after sensing a nearby splash). Once the fish senses
the splash, not only does it trigger the behavior of swimming towards the
splash, it also causes the internal systems to increase the prediction of a
future reward. That increase, by itself, acts as a reinforcer to reward
whatever behaviors the fish was doing at the time it heard the splash
(swimming near the bridge for example). So, the splash itself acts as a
reward to encourage the fish to continue to swim near the bridge.
But, as the fish swims towards the splash, and then, the expected fighting
by the mob over the food doesn't happen, this causes the internal reward
predictions to drop (oh, we probably aren't going to get any food). That
drop in the internal reward prediction, acts as a punishment for the
behavior in action - swimming towards the sound of the splash.
This is how, each time you throw a penny in, the fish are being taught a
lesson to not swim towards the splash.
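That account can be sketched as two behaviors trained by changes in an
internal reward prediction (all the numbers here are invented): the upward
jump when the splash is heard reinforces hanging around the bridge, while
the later collapse punishes swimming toward the splash.

```python
prediction = 0.0       # internal estimate that food is coming
near_bridge = 1.0      # behavior in progress when the splash is heard
toward_splash = 1.0    # behavior in progress when the letdown arrives

def splash_heard():
    # Prediction jumps up; the *rise* itself acts as a reinforcer for
    # whatever the fish was already doing (swimming near the bridge).
    global prediction, near_bridge
    error = 0.8 - prediction
    near_bridge += 0.1 * error
    prediction = 0.8

def no_food_found():
    # No mob fighting, no food: prediction collapses, and the *drop*
    # punishes the behavior in action (swimming toward the splash).
    global prediction, toward_splash
    error = 0.0 - prediction
    toward_splash = max(0.0, toward_splash + 0.1 * error)
    prediction = 0.0

for _ in range(10):    # ten penny trials
    splash_heard()
    no_food_found()
```

After ten trials, swimming near the bridge is actually *stronger* while
swimming toward that kind of splash has measurably weakened, which matches
the two-part story told above.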
But if they learn so quickly (10 examples in 10 minutes) to not swim
towards the splash, is this not making the fish forget everything it knows
about getting food by swimming towards a splash? Well, if the sensory
systems could not tell the difference between that splash, in that place,
at that time of day, and all the other splashes, then sure, it would
quickly forget everything it knew about getting food by swimming towards a
splash. But if the sensory perception is advanced enough to tell the
difference between that splash and the other splashes it's experienced in
the past, then what it's learning is to not swim towards that type of
splash, in this place (under the bridge in the afternoon, when bot
programmers tend to throw pennies at them, instead of early in the morning,
when the kids from the school waiting for the bus throw crackers to them).
This is what the "better" sensory processing you talked about is needed
for. We need automatic systems that can analyze, and combine, all the
sensory data from different sensors, and create the correct "context" for
associating the current rewards with the current environmental context.
When the fish is being thrown crackers, the context can be different in
many ways. The daylight levels might be different, the precise sound of
the splash might be different, the number of other fish around might be
different. He might be in a different part of the stream. He might have
to swim harder because there's more current in the water. The fish might
be more hungry in the morning when the crackers tend to be thrown.
Anything the fish can sense, through any sensor, is helping to establish
the current environmental context. And with some contexts, the fish gets
more food than others. In some contexts, swimming towards a splash tends
to reward the fish with food, and in other contexts (which might be very
similar, but different in some detectable way) swimming towards the splash
doesn't tend to be rewarded with food.
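One way to picture this per-context learning is to keep a separate
running-average value for the behavior in each context the senses can
discriminate. The context tuples and reward numbers below are purely
illustrative:

```python
# Sketch: per-context value estimates for one behavior
# ("swim toward splash"). A context is whatever bundle of sensor
# readings the fish can discriminate -- the keys are made up.
from collections import defaultdict

values = defaultdict(lambda: 0.5)   # neutral prior for new contexts
counts = defaultdict(int)

def observe(context, reward):
    """Running-average value of the behavior in this context."""
    counts[context] += 1
    values[context] += (reward - values[context]) / counts[context]

morning = ("bridge", "morning", "soft-splash")
afternoon = ("bridge", "afternoon", "sharp-splash")

for _ in range(10):
    observe(morning, 1.0)    # crackers from schoolkids: rewarded
    observe(afternoon, 0.0)  # pennies from bot programmers: not

print(values[morning], values[afternoon])
```

Because the two contexts are distinguishable, demoting the behavior under
the afternoon penny-splash leaves the morning cracker-splash knowledge
untouched, which is the point of the paragraph above.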
My point here is: 1) to show the habituation, the machine has to
have learning - it must be changing its behavior in response to the
current context. Most simple bots have little or no learning coded in them,
so they have no hope of acting like the fish did. 2) It must have strong
generic sensory processing that can analyze all the sensory data combined,
to identify the unique differences between which contexts lead to rewards,
and which don't. This is a complex statistical process - not something you
can make work by hand-coding behaviors (well, unless you have millions of
years to work with like evolution). And 3) the system must include reward
prediction, and those predictions will modify behavior - they will be the
true source of the changes to behavior. This is how the fish learns to
"read" the behavior of the other fish, without having to have that skill
hard coded into it by the creator (evolution for the fish, human
programmers for bots). It learns to read the behavior of the fish, just
like it learns to read everything about the environmental context. Some
contexts are predictors of rewards, and some are not. It must learn,
through experience, to tell the difference. Then, with a good reward
prediction system helping to shape behaviors, any behavior which causes the
bot to increase the prediction of future rewards, will be instantly
rewarded. No need to wait for actual food to show up to act as a hard
coded reward of the behavior.
If you want to make a bot act anything like fish (or any animal that learns
from experience), that is the type of system it must have built into it.
You can't do it by hard-coding behaviors into the machine, because unless
the programmer is there to teach it the difference between the sound of a
penny hitting the water and a cookie hitting the water, it will never
learn to ignore the splash created by a penny. You instead, as the
programmer, hard-code the result you want the bot to achieve (get real food
in its stomach) as a hard-coded reward. Then you use a statistically based
system to learn the associations between the multimodal sensory contexts,
the actions that work in each context for producing rewards, and the
behaviors that don't work.
This is how intelligent machines are going to work. The only unknown is
the implementation details of how such a machine is best able to take
multiple sensory inputs, define the "context", and map each context to
behavior. But it's going to be modality independent, because
whatever mathematical techniques work for correlating sensory data to
actions is going to work well no matter what the sensory data "means" or
what the actions "mean". At this level, the only "meaning" that's
important, is how much reward does each mapping produce - i.e., how "good"
is each behavior, for each sensory context, based on past experience.
As a footnote, let me add that I don't know anything about fish. It's possible
that fish for example are born hard wired with the ability to recognize the
activity of fish fighting over food as a reward. They might not have
learned it in their life through experience. They might also be born with
the behavior of swimming towards a splash instead of having to learn it.
And their learning to avoid the penny, might be only temporary. If you
were to stop throwing pennies for 30 minutes, you might find they had
forgotten all they learned, and were back to the old behavior of swimming
like mad towards the splash (i.e., it was a form of habituation instead of
long-term learning). These are all variations of how evolution might have
implemented learning in fish to maximize its chance of survival. But these
and many other variations would be easy to code if we simply had
better generic learning algorithms to start with. Which we don't. Yet.
And that's what we are missing to be able to create more intelligent
behaviors in bots and all our machines.
I agree with you that we are lacking strong enough sensory perception code
to make use of more complex multiple modality sensory data. But I already
know what the code needs to do. It's not just a perception problem - it's
a generic statistical learning problem that maps sensory data to actions
(aka BBR), and adjusts the priorities of those mappings (learning) through
experience. It learns to abort the swimming behavior quicker and quicker
in that special sensory context defined by the sound of a penny splashing
in the water, in that stream, on that day, in that location. And BTW, the
reason it is "aborting" the swimming behavior is simply that other
behaviors (like "hide in the rocks to protect yourself from predators") are
becoming the dominant behavior in that context instead of "swim
towards splash". Each time the behavior is used and it fails to produce
the expected rewards, it gets demoted in that context. The system is
always picking the behaviors that seem like they will produce the most
rewards. When there is no sign of food around, the behavior that is the
most likely to produce the best rewards is the "hide in rocks" behavior, or
maybe, "look for mate", etc.
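That demote-on-failure selection (always pick whichever behavior currently
looks most rewarding in this context, and knock a behavior down each time it
fails) can be sketched in a few lines; the behavior names, starting values,
and demotion rate are all made up:

```python
# Sketch of "the system always picks the behavior that seems
# most rewarding"; demoting a failing behavior lets a rival
# behavior become dominant, which reads as "aborting" it.

values = {"swim_to_splash": 0.8, "hide_in_rocks": 0.5}

def select():
    """Pick the behavior with the highest expected reward."""
    return max(values, key=values.get)

def demote(behavior, alpha=0.4):
    """Behavior ran but the expected reward never arrived."""
    values[behavior] *= (1 - alpha)

# Penny after penny: the splash behavior keeps failing...
while select() == "swim_to_splash":
    demote("swim_to_splash")

print(select())  # "hide_in_rocks" now dominates
```

Nothing ever explicitly "turns off" the splash behavior; it simply loses
the competition in that context, exactly as described above.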
As Gordon pointed out, this was probably not really learning, as the
fish probably will react the same way the next day. Rather the changed
behavior was some sort of short-term memory and/or habituation. BTW,
ever hear the one about the goldfish ...
"Goldfish are said to have such short memories that every trip around
the bowl is a new experience"
How else to live in a bowl, and not go totally crazy.
Unfortunately bread makes no noise when it hits the water, and
immediately floats down stream, in that current. OTOH, I am sure the
fish in the creek were especially tuned to the particular "class" of
sound produced by the penny hitting the surface. Sharp and strong,
maybe like a dragonfly lighting down, etc.
This is rather interesting, because it's several levels more advanced
than a single fish sensing and perceiving a good stimulus [sound
strike] from 20' away - what I have been talking about. Here the fish
are able to both perceive the behaviors of their neighbors, and also
react in some appropriate manner.
You realize that, before they can do this - engage in complex social
behavior - FIRST they need the adequate perceptual mechanisms.
No doubt, something like that.
Yes, there are several problems here to be solved. One is better
sensors, and the other is sensory integration, and the 3rd one [hiding in
the background] is the learning.
This last is why we have computers.
Yes, as mentioned. First the adequate perceptual capability, then the
other stuff. Regards evolution, one presumes this is how it happened.
IE, individual fishes needed the sensory-action mechanisms before they
could properly use these in the context of interacting with other
fishes in large schools, etc.
You can give the bot, via both mechanical design and coded routines,
the basic ability to generate actions, and then use the other stuff to
subsume control over these actions. Layer upon layer of control.
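The "layer upon layer of control" idea, where coded base behaviors generate
actions and higher layers subsume control over them, might be sketched like
this; the layer names and the sonar threshold are invented for illustration:

```python
# Subsumption-style layering sketch: highest-priority layer that
# returns an action wins; lower layers provide default behavior.

def reflex(sensors):
    """Hard-coded base behavior -- always has an answer."""
    return "walk_forward"

def avoid(sensors):
    """Mid layer: subsumes the reflex when an obstacle is near."""
    if sensors.get("sonar_cm", 999) < 20:
        return "turn_away"
    return None                      # defer to the layer below

def learned(sensors):
    """Top layer: a learning system would plug in here."""
    return None                      # untrained, so it defers

LAYERS = [learned, avoid, reflex]    # highest priority first

def act(sensors):
    for layer in LAYERS:
        action = layer(sensors)
        if action is not None:
            return action

print(act({"sonar_cm": 12}))   # turn_away
print(act({"sonar_cm": 80}))   # walk_forward
```

A learning layer dropped in at the top can then override the instincts only
where it has learned something better, which is the layering argument made
later in the thread as well.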
Frogs are born with the ability to automatically snap at flies that fly
past - courtesy of evolution. Many [or most] habitual behaviors in
lower animals are generally held to be instinctual, rather than learned.
There are many ways that habituation to unproductive stimuli can be
wired in. It may simply be that, after making several warp-speed
attacks, lactic acid builds up in the fish muscles and doesn't quickly
dissipate, and fish aggressive behavior is geared inversely to the
level of lactic acid. Not much to do with learning or "memory" as
you're positing it. Etc.
In fact, what I think you have been clearly illustrating is that it's
really a lot easier to code behavior and learning than it is to
replicate good perceptual processing. Despite what you say in the next
Yes, exactly. But I see that perceptual mechanism as the same mechanism
needed to extract useful behavior from multiple sensory signals.
I don't really understand what you are thinking when you write that. What
do you mean by better? Are you talking cheaper sensors? Lower power?
Smaller? What exactly is wrong with any of the sensors we have now? I
don't see why we have a problem in that regard. The only issue is that
they cost too much for hobby work - both the high-quality sensors and the
processing power to deal with the data. We have better sensors, in terms of
what they can sense, than just about all the sensors in animals (except
maybe chemical sensors). So I don't get what your point is.
Or are you talking about the processing of the sensor data into an easier
to use form when you say better sensors?
I believe that processing raw sensor data into a better form and sensory
integration are one and the same problem. For example, an eye
can be looked at as a million light sensors. To process that data into a
signal that represents "i see a cat", is a sensory data integration problem
to me. Automatic correlation of any two sensory signals should work the
same basic way whether it's two light sensors which are part of a bigger
eye or light sensor data and sonar distance data being integrated. It's
the same problem of correlating temporal signals and extracting the useful
information either way.
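The claim that correlating any two temporal signals is "the same problem"
regardless of modality can be illustrated with a plain Pearson correlation
over two sensor traces; the numbers below are made up, and real sensory
integration would of course involve far more than one pairwise statistic:

```python
# Sketch: the same correlation operation applies to any two
# temporal sensor streams -- two pixels in one eye, or a light
# sensor paired with a sonar range sensor.
from statistics import mean, pstdev

def correlate(a, b):
    """Pearson correlation of two equal-length sensor traces."""
    ma, mb = mean(a), mean(b)
    cov = mean((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (pstdev(a) * pstdev(b))

light = [0.1, 0.9, 0.2, 0.8, 0.1]   # hypothetical pixel trace
sonar = [10, 90, 20, 80, 10]        # hypothetical range trace

print(round(correlate(light, sonar), 3))  # 1.0: perfectly correlated
```

The function never asks what the signals "mean"; it only measures how they
co-vary over time, which is the modality-independence point above.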
If you had the type of generic learning system I talk about working, then
there's no reason you wouldn't layer it on top of hard-coded behaviors
which you knew were a good starting point for what you wanted the bot to
do. The dynamic learning system would simply be configured to adjust and
override the instinctive behaviors as needed (and there might be some it
couldn't override). This would both reduce the amount of work the learning
system had to do as well as decrease the time it takes for the system to
become good at some task. There is no end to the engineering options
available to optimize the design to fit the needs of the application. What
I feel we are most missing is that strong generic learning that can be
added at any point of a design as needed. Hard-coding behaviors is
straightforward coding that any good programmer can do. Knowing what
behaviors to hard-code is the harder part. The point of the learning
system is to do the real-time testing to discover the behaviors that work
best. But that can only work if you can produce an automated test for
success (aka the critic). And many times you can easily automate the test
for success, and that's where dynamic learning systems would be very useful.
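A critic-guided loop of this kind (an automated success test steering
trial-and-error adjustment) can be sketched as a simple hill climber.
Here `run_trial` is a hypothetical stand-in for actually running the bot
and scoring the result, and the "best" parameter value of 0.7 is invented:

```python
# Sketch of critic-guided trial and error. The critic is any
# automated test for success; here it just asks "did this trial
# score better than the best so far?"
import random

random.seed(1)   # fixed seed so the sketch is repeatable

def run_trial(params):
    """Stand-in for running the bot with these parameters and
    measuring the outcome (e.g. distance covered)."""
    return -abs(params - 0.7)        # pretend 0.7 is optimal

def critic(score, best_score):
    """Automated test for success: keep only improvements."""
    return score > best_score

best, best_score = 0.0, run_trial(0.0)
for _ in range(200):
    candidate = best + random.uniform(-0.1, 0.1)  # small mutation
    score = run_trial(candidate)
    if critic(score, best_score):
        best, best_score = candidate, score

print(round(best, 2))  # converges near 0.7
```

Swap the mutation step for a genetic algorithm or anything else; as long as
the critic can be automated, the same loop applies.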