The Fish and the Robot



My understanding is that he got VW to do all the hardware work (they built the car and added all the sensors and computers), while his focus was on the software and project management. From a comment he made in a video, it sounded like the standard VW Touareg was a drive-by-wire car to start with. That would mean the hardware work was adding the sensors and extra computing power and interfacing with the control systems already in the car. He made it clear in the video that his contribution to the project was the software design. How much the VW group helped with the software I have no clue.
--
Curt Welch http://CurtWelch.Com /
snipped-for-privacy@kcwc.com http://NewsReader.Com /
Curt Welch wrote:

Sensors and sensory processing are part and parcel of the same thing. Sensor integration comes after that.

Yes, this is exactly the same idea I had 2 years or so ago [never implemented !!] regarding teaching my walkers how to walk. A simple sonar "critic" feeds into a learning algorithm to evolve the gaits. E.g., the gaits on my walkers are generated by lookup tables of timed servo movements. It would be fairly easy to have a higher-level program patch new values into the lookup tables, based on some genetic algorithm or other learning technique, in order to modify the servo movements. The critic to guide the GA/learner would be the sonar input.
For instance, to learn how to stand up, you point the sonar towards the ceiling, and set up the learning algorithm to fiddle the lookup table values such that the desired goal is to reduce the distance to the ceiling. Quite simple, really. Similar protocol for learning to walk, once the bot has stood up successfully. This time, point the sonar towards the wall ahead, and tell the learning algorithm the goal is to reduce the distance, once again. With 2 sonars, one pointed upwards and the other ahead, you could get the bot to learn to walk forwards while keeping the top of the bot fairly level at the same time. On and on.
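Something like the following might capture that patch-and-score loop. This is a minimal sketch assuming a simple hill-climbing learner; the function names and the toy score function standing in for the real sonar reading are made up for illustration, not anyone's actual controller code.

import random

def tune_gait(table, score_fn, trials=200, step=5):
    # Hill-climb over the gait lookup table: patch one timed-servo value,
    # keep the change if the critic scores it better, revert it otherwise.
    best = score_fn(table)
    for _ in range(trials):
        i = random.randrange(len(table))
        old = table[i]
        table[i] = old + random.randint(-step, step)
        new = score_fn(table)
        if new > best:
            best = new            # keep modifications the critic likes
        else:
            table[i] = old        # revert the ones it doesn't
    return table

# Toy stand-in critic: pretend the ideal table is all 90s. On the real bot
# this would run the gait once and return the drop in forward sonar distance.
toy_score = lambda tbl: -sum(abs(v - 90) for v in tbl)
gait = tune_gait([random.randint(60, 120) for _ in range(36)], toy_score)
print(gait)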

No. This is just the basics we roboticists deal with every day.

It might well make use of some short-term memory to keep it directed at the proper goal, i.e. the original light source, and not get distracted by extraneous light sources that come along [e.g., light coming through the doorway it's passing by]. If it's too dumb, i.e. purely reactive, it'll turn and go through the door instead of following up on its original goal. Here is the death knell of too-simplistic BBR.
I know from experience, because I implemented simple photovore behavior in my hexapod, and it turns cutely and tracks light very nicely [thank you], but it simply homes in on any "local maximum" it encounters, and forgets which one specifically it was previously tracking.

Well, this is the challenge for small bots. Current computer vision systems work best with supercomputer processing power to tap into. But this thread is about directions for research.
dan michaels wrote: [...]

[...]
You write that it would be fairly easy to do what you suggest above, so why haven't you done it, Dan?
Of course the "critic" has to evolve as well in response to the higher critic, reproductive success.
Stable mechanisms will not evolve; they have no motivation. Unstable mechanisms have the potential to evolve, but only if by chance they have the right "goals" (requirements for stability) that lead in that direction.
The advantage of having a motivating critic to get a robot walking is that should it lose a leg while trekking Mars it will then start adapting its gait. Should it come across a new kind of terrain it would adapt to that as well.
Of course a "critic" will never guarantee a solution is possible for any particular mechanism.
For those that can't afford a real robot there is always the simulated robot to test your theories.
http://remi.coulom.free.fr/Thesis /
-- JC
JGCASEY wrote:

Well, there are only so many hours in a day that I can assign for play. Were I an academic, then I might be doing this on a paycheck :). As it is, it's just for fun on the side.

This is a good point that Jordan Pollack discovered in his games with evolutionary robotics, or whatever he calls it. If you just let the algorithm crank, there is no telling what you might get as a solution.
E.g., if I were to use the algorithm plus sonar-critic to teach the hexapod to walk forwards FIRST, instead of to stand up first, then it's highly likely the gait would involve something like 1 or 2 front legs literally dragging the rest of the passive and prone bot towards the wall. Like an inch-worm. Unlikely this scheme would produce a nice hexapod tripod or metachronal-wave gait.
OTOH, if I have the algorithm learn to stand first, and then make use of that prior knowledge during the walking phase, I'll probably get a much better result. To me, this is the only way to go. To me, what Curt usually proposes would take my bot 3.5 BY [of analogous computer time] to learn a good solution. OTOH, if I help it along a little, it will produce a much better result many times quicker. I.e., directed, as opposed to purely random, evolution.


Ah, but the point is that the evolution _is_ directed and not random. It's descent with modification - not just modification. Which means, any time it finds something good, the "good" stuff stays around (descent), and is used as the foundation for the next round. Learning systems that don't manage to create a good descent-with-modification system are the ones that never get very far. To continue to grow and advance, it must be able to build a growing base of knowledge and not have to rebuild the entire knowledge base from the ground up each time by random luck.
For example, on your walking machine, if it did learn to use two front legs to drag the rest, it should then move on and find ways to modify the behavior of the back legs, without forgetting the useful behaviors it learned for the front legs.
Preventing new experience from erasing the valuable lessons learned in the past is part of the trick of making evolutionary learning work. The way to do that is to generate behavior based on the current sensory context, and to make learning based on the same. When the sensory context changes, the part of the machine which is currently being "trained" will change as well (or the effects of training will be weighted differently at least). This allows the old learned lessons to remain mostly unchanged while it's learning a new lesson. The complex emergent behavior of the system is a combination of all these little lessons it has learned over time.
Part of the trick to making that work is giving it a good system for evaluating the worth of each change so that progress can be constant. An example of a bad critic would be one that taught it to walk by waiting for it to move a fixed distance from the start, and then giving it a single fixed-sized reward for crossing the finish line. A good critic will reward it for every forward motion it made, and punish it for every backward motion it made, such that the reward is always proportional to the distance moved. Any small evolutionary change that helped it move forward farther, or faster, would receive a greater reward.
In a walking machine, the behavior of each leg needs to be a function of its sensory awareness of the actions of all the other legs. For example, a simple action like pushing a leg back might move the bot forward, and that action might be rewarded. And pushing it forward might move the bot backwards, and be punished. So it would learn to push the legs back, and never bring them forward. And the bot goes nowhere (but at least it doesn't get punished for moving backwards). But, if the behavior, and training, is made context sensitive, then it can learn to push the leg backwards when lowered, and push it forward when raised. These two behaviors are sensitive to the context of the current position of the leg. Learning to push it forward when raised is not erasing the previous learning that happened when it was lowered.
Likewise, if the context is a function of the position of all the legs, then it can develop a complex coordinated walking motion, one small step at a time. One leg can learn to drag the body, and the others can learn to respond to the motion of the first leg, in order to maximize forward motion.
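A small sketch of that kind of context-sensitive learning, assuming a toy value table keyed on (context, action) and a made-up reward convention (none of these names come from a real controller):

import random
from collections import defaultdict

actions = ["push_back", "swing_forward"]
value = defaultdict(float)               # (context, action) -> learned worth

def choose(context, epsilon=0.1):
    if random.random() < epsilon:        # occasional random variation
        return random.choice(actions)
    return max(actions, key=lambda a: value[(context, a)])

def learn(context, action, reward, rate=0.2):
    key = (context, action)
    value[key] += rate * (reward - value[key])   # nudge toward observed reward

# Toy trials: reward is forward motion of the body; pushing back only helps
# while the leg is lowered, swinging forward only helps while it is raised.
for _ in range(500):
    ctx = random.choice(["leg_lowered", "leg_raised"])
    act = choose(ctx)
    fwd = 1.0 if (ctx, act) in [("leg_lowered", "push_back"),
                                ("leg_raised", "swing_forward")] else -1.0
    learn(ctx, act, fwd)

print(dict(value))   # each context keeps its own lesson; neither erases the other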
The point here is that it never has to learn a complex behavior (like 6-legged walking) by one very lucky random chance. The odds of it finding a complex behavior by random chance are near zero. It's learning it by stringing together 1000's of very trivial behaviors, each of which has very high odds of being found quickly.
If you want to train a machine to output a sequence of 10 digits, and you try to do it by letting it pick random combinations of digits until it gets all 10 right, it will take a very long time for it to learn to produce the right answer. But, if instead, you allow it to learn one digit at a time, then it will find it very quickly, with each digit only taking around 10 guesses on average and all 10 only taking 100 guesses.
If you reward it based on the number of digits correct each time, you have a problem somewhere between those two. It's not as easy as the second because the system doesn't know which digits it's being rewarded for. It's like a game of Mastermind (if you know that game). If the system tracks the amount of reward for each digit, in each position, it will find a statistical correlation of more rewards for the right digit over time. This system will be able to converge on the correct answer much faster than one which is only rewarded when all 10 digits are correct (but not as fast as the one where each digit was trained separately). If the training is able to reward closeness, or value, of a given behavior, then it can converge on complex answers in reasonable time and not take billions of years. It is just a matter of creating the training system (the critic) and the learning techniques to create these conditions where the system can be directed to converge on a complex answer (as if it were being led by the hand to the solution), instead of guessing it in one huge leap by random chance. Large leaps of learning always take forever and will never work. Learning is only workable when it can be broken up into many small steps (descent with modification in small steps).
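For what it's worth, a toy illustration of the scale difference in the digit example (made-up code, just to show the arithmetic):

import random

target = [random.randrange(10) for _ in range(10)]

def guesses_one_digit_at_a_time():
    total = 0
    for d in target:
        guess = random.randrange(10)
        total += 1
        while guess != d:                # each digit is learned independently
            guess = random.randrange(10)
            total += 1
    return total

trials = [guesses_one_digit_at_a_time() for _ in range(1000)]
print(sum(trials) / len(trials))         # averages near 100 guesses for all 10
# Guessing all 10 digits at once would need about 10**10 random guesses on
# average, which is the difference between seconds and never finishing.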
--
Curt Welch http://CurtWelch.Com /
snipped-for-privacy@kcwc.com http://NewsReader.Com /
Curt Welch wrote:

I more or less agree with what you've written here. Good learning strategies vs. bad strategies. Now, think of all the "innate" knowledge the system has to contain in order to successfully implement your "good" strategies, as compared to just doing everything randomly. All of those good strategies require non-trivial protocols.
E.g., regarding my walker, a naive solution based upon a very simplistic learning protocol might produce a result where only a single leg learns to drag the walker along. This, I believe, is similar to the kinds of solutions that Jordan Pollack was seeing. It's doubtful the walker would learn a nice solution that would magically get all 6 legs working together in a very efficient gait, like a tripod or metachronal-wave gait, by random chance. Similarly for your 1-digit-at-a-time example.
OTOH, to implement your 1DAAT or an all-6-legs-in-synchrony protocol will take a smart critic that oversees the results of previous learning operations and strategically chooses the "correct" protocol for the next learning stage. This is a non-trivial mechanism to implement, and it does show up the weakness of raw generic learning protocols that operate on the so-called tabula rasa.
There are [at least] 2 approaches, as I see it. One is to implement your temporal stage-sequencing "smart overseer". However, this thing is already awfully intelligent itself, able to discern good results from bad, at each stage, and then select the appropriate next stage ... and somehow it needed to get that way. There's the rub.
The other way is the way I envision, which is to add the learning modules atop a pre-existing mechanism that already performs the basic operations, albeit non-optimally. IOW, use "genetically-determined" gaits, which are very simple and probably non-optimal, and have the learning module "fine-tune" the gaits to be more efficient and optimal. E.g., with my controller, I can easily generate a basic gait in a couple of minutes by adding entries to the look-up table. However, it takes me much longer to optimize the look-up parameters, even though there are only 36 parameters for a 6-legged gait. Every small adjustment has some effect on the overall result, and I can fiddle around seemingly forever with hand-tuning.
In short, if I start with a basic gait, genetically predetermined in effect, and let the learning module make strategic adjustments to the existing parameters, it should take a lot less time to produce an efficient gait than if the system starts from zero in the first place.

This is an interesting problem that I think would be a fun challenge for learning systems....

A lot of people seem to think that the only way learning can work is if the "teacher" (aka critic) is as complex as the task it's trying to get the "student" to learn. Meaning that, for a machine to learn something complex like multi-leg walking, the "teacher" has to be as complex (or nearly as complex) as the walking algorithm would be in the first place.
I think this fails to appreciate the power of the system learning how to teach itself. This is the point of secondary reinforcement. It not only learns how to behave, it also learns to train itself by making predictions of value. As it learns more complex behaviors, it is also learning to make more complex predictions of rewards - which it then uses to train the new behaviors. So as it's learning complex behaviors to produce better rewards, it's also becoming a more complex critic of itself. This is how strong learning systems become stronger and continue to advance in complexity.

Nah, if done correctly, it starts out very dumb - which means there isn't a lot of complex code that needs to be written. But as it learns behaviors that work better, it's also becoming smarter at training itself. Its intelligence grows without us, as programmers, ever having to create a complex overseer. You only need to give it motivation - just as you suggested - to find ways to move forward faster.

This stuff I'm talking about is already implemented and working in some systems, like the TD-gammon backgammon playing program. The only reward that system was given was winning. It had no clues about how to tell whether one move in the middle of the game might be better than another. It had no "smart overseer" to give it hints about why one move was better than another in the middle of the game. It only had a very dumb overseer who gave it a "good boy" at the very end of the game when it managed to win. But yet, as the program learned to play the game, it also developed its own complex evaluation function. Given any game position, the evaluation function could produce an estimated reward (aka the odds of winning the game) for that position. So, as it learned the game, it was also developing a very intelligent "overseer". This is in fact what it was learning: how to be a better "overseer". Nobody had to write a complex overseer to teach it to play backgammon. It learned how to teach itself.
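The core of that self-teaching trick is temporal-difference learning. Here's a bare-bones sketch of the idea, with a toy random walk standing in for backgammon (this is not the TD-gammon code itself, just the shape of the update):

import random

values = {}                     # state -> estimated probability of winning

def td_episode(start, step_fn, is_win, alpha=0.1):
    state = start
    while True:
        nxt, done = step_fn(state)
        reward = 1.0 if (done and is_win(nxt)) else 0.0
        target = reward if done else values.get(nxt, 0.5)
        v = values.get(state, 0.5)
        values[state] = v + alpha * (target - v)    # TD(0) update
        if done:
            return
        state = nxt

# Toy "game": states 0..10, win at 10, lose at 0, only a terminal reward.
def step(s):
    nxt = s + random.choice([-1, 1])
    return nxt, nxt in (0, 10)

for _ in range(5000):
    td_episode(5, step, lambda s: s == 10)
print({s: round(v, 2) for s, v in sorted(values.items())})
# The learned values become the "overseer" that scores positions mid-game,
# even though the only reward ever given was winning at the end.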

Yes, that's a very practical solution if you don't have strong learning systems to work with. We do as much as we can with our limited human minds, and then let the poor learning system extend the design a bit further by making minor adjustments to our design.
But, if you had a strong learning system to start with, then you don't need to start with anything other than a basic goal (make the bot move forward), and let the learning system, step by step, create and improve the design.
Your walking problem seems like a very interesting test bed for learning systems. My claim is that a strong learning system could in fact, on its own, learn to produce complex gaits (in a reasonable amount of time), without us having to give it 90% of the answer to start with. I should work on that problem and see what I can come up with....
--
Curt Welch http://CurtWelch.Com /
snipped-for-privacy@kcwc.com http://NewsReader.Com /
Curt Welch wrote:

"secondary reinforcement". "learns to train itself".
I don't see any reason to believe a learning system will do this on its own.

"if done correctly". Ditto to above.

I'll have to check this out.


I don't look at it like that because any single data stream from a sensor can be decomposed into multiple data streams and once you do that, you end up with multiple sensory signals and the same sensory signal integration problem you get if they came from different sensors. All "sensory processing" is done by decomposing, and recombining data in an attempt to remove the data you don't care about, and extract the data you do care about.
In effect, every sensory signal is feeding limited information about the environment into the robot. But every signal will include both information the robot cares about, or could make good use of, along with information the robot probably has no need for (noise generated in the sensor for example).
The more sensors you have, the more paths there are for information about the environment to flow into the robot, but the problem the robot has, which is to find the data that is useful in those flows and ignore the data which isn't, is the same whether you are talking about one flow or 100.
A sonar sensor, for example, in its raw form is really two inputs: one input that tells the robot when a ping is generated, and a return echo which tells the robot when echoes are heard. The return echo is always a complex temporal pattern of sounds as the ping bounces off multiple objects. But most sonar sensors will perform some type of averaging or threshold detection operation on that complex pattern and reduce all the information down to a single distance measurement. This might all be done with analog circuitry on the sensor before the data is even sent to a central processor. But the hardware is already performing multiple decomposing and combining functions on the raw data in order to filter out most of the data from the sensor and reduce it to one small piece of data the robot can easily use (a single distance measurement).
To me, the process of extracting one useful piece of data (a single distance measurement) from those two complex temporal signals, is the same generic problem that has to be solved at all levels of sensory signal processing.
Ultimately, the robot must produce outputs which are a complex function of all the raw sensory data. Defining those functions is the problem of how to program the robot. I don't see it as one step of sensor processing, then another step of sensory integration, then another step of output pattern generation. I see it ultimately as one large step. We break it up into small steps not because that's the best way to do it (and not because it must be done that way), but because that's the only way our limited minds can comprehend how the bot is working. Sonar sensors are designed to return a very simple, limited distance measurement because that's something we as programmers can easily understand and write code to respond to. If it instead simply returned a raw echo profile, it's not something we could easily make use of. But that doesn't mean the raw echo profile doesn't include a lot of good data that a more sophisticated data processing system could make good use of.
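As a toy illustration of the reduction a sonar module does internally (my own made-up numbers, not any particular sensor's firmware), here's the "first echo over threshold" style of collapse from a raw echo profile down to one distance:

SPEED_OF_SOUND = 343.0   # m/s

def first_echo_distance(echo, sample_rate_hz, threshold=0.3):
    """echo: list of echo amplitudes sampled after the ping is emitted."""
    for i, amplitude in enumerate(echo):
        if amplitude >= threshold:
            t = i / sample_rate_hz            # round-trip time of first echo
            return t * SPEED_OF_SOUND / 2.0   # one-way distance in metres
    return None                               # no object detected

# The same profile contains later, weaker echoes (other objects, surface
# texture) that this single-number reduction simply throws away.
profile = [0.0, 0.05, 0.4, 0.2, 0.0, 0.35, 0.1]
print(first_echo_distance(profile, sample_rate_hz=40000))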
--
Curt Welch http://CurtWelch.Com /
snipped-for-privacy@kcwc.com http://NewsReader.Com /

Right. If you are coding such a system, you could think of the "memory store" as weights that might exist in a neural network. Long-term memory stores are weights that are modified and stay modified forever (creating long-term learning). Short-term memory stores are weights that can be modified, but which always slowly return to their starting condition (creating habituation). You could easily code a system that used a combination of both. The shorter-term memory would be modified quickly, allowing the fish to learn to ignore the splash in only 10 trials. But the long-term memory would also be modified, only much more slowly. So if you came back the next day, the fish would have reverted (almost) to the same starting behavior. But, if you did this every day for a year, the long-term memory effect might eventually cause the fish to stop swimming towards a splash altogether.
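A minimal sketch of that two-timescale idea, with made-up parameter values (nothing here is calibrated to real fish, obviously):

FAST_LEARN, FAST_KEEP = 0.3, 0.1   # fast memory: learns in a few trials, mostly fades overnight
SLOW_LEARN = 0.002                 # slow memory: tiny permanent change every trial

fast = slow = 0.0                  # combined "ignore the splash" strength

def penny_trial():
    global fast, slow
    fast += FAST_LEARN * (1.0 - fast)   # habituates within ~10 trials
    slow += SLOW_LEARN                  # accumulates over days

def overnight():
    global fast
    fast *= FAST_KEEP                   # short-term store decays back

def chases_splash():
    return (fast + slow) < 0.5          # still chases until memory builds up

for day in range(1, 31):
    for _ in range(10):                 # ten penny splashes per day
        penny_trial()
    overnight()
    if day in (1, 10, 30):
        print(f"after day {day}, chases next morning? {chases_splash()}")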

Yes, I agree. It can correctly identify the sensory clues to distinguish how the context you set up that day was unique from other environments the fish had experienced in the past. Better sensors of course help, but better sensory processing algorithms I see as more important. There is a lot of useful information in even simple sensors that most small bots fail to make any use of. This includes the bot sensing what it is doing and merging that with what it is sensing externally. I suspect few small bots make good use of what can be extracted from that data.
--
Curt Welch http://CurtWelch.Com /
snipped-for-privacy@kcwc.com http://NewsReader.Com /

Interesting comment, Curt. My first reaction was, come on, you don't know how poor these sensors are. Begrudgingly, I came around to suspect you are right, and my recent experience supports you. I realized I have been preparing to extract more information from my Sharp range sensors for some time, and have not yet taken advantage of what I'm already collecting.
My mini-sumo base robot used in my university class is built on a MarkIII chassis. In the mini-sumo competition, while the Sharps we used were the analog-output version, all we have used so far is a threshold on the analog signal to say there is or isn't something out there between 10 and 80 cm. (Essentially the same as if we had just used the digital version of the ranger.)
Since the class is hosted by the physics department, I laid the groundwork for taking more off the sensor. By reading the ranger, I got a current range reading (even though its only use was as a comparison point against a there/not-there range threshold). So I added a differencing of each reading with the last, and thereby got a current velocity. I differenced the new velocity against the old velocity and got a current acceleration reading. At that time I had not yet bothered to linearize the Sharp, which would have given real-world significance and scale to the numbers, so the readings were somewhat meaningless, but the basis was there. (I just took the time to linearize the Sharp sensors on my latest project, soccerbots for the Queen of Jordan's children's museum, and once mastered, I think the effort was well worth it, and the code will undoubtedly migrate to the sumo code next time I work on it.)
In other routines I had begun to compare the two adjacent (slight horizontal displacement) parallel sensors. I checked the digital conversion (target/no-target) and counted how long the detector had been in the current state. This value was used in the sumo code. The detector that last saw the opponent (left or right) determined which way the robot turned to reacquire the target. On the analog side, I computed the difference between the two current readings and an average of the distance both saw.
Likewise, the simple digital edge detectors, looking for the presence of the white line at the edge, were also further analyzed. I counted how long each detector had been in its current state. These counts could be used to know something about the angle at which the edge was approached, by seeing which one went onto the white line first. In pulling back from the edge, a similar count comparison was used in the sumo code to decide which way to turn around, to best position for an advance on the center of the ring.
Now given our earlier conversation about the bump switches, it might surprise you that I've made this much use of historical information on two digital inputs. I will point out that this was done in a function SENSOR-SCAN, which was the first to run, ahead of and completely independent of my state-based subsumption behavior routines. I see nothing in Brooks that would approve of the choice, quite the opposite, but I found it processor-efficient to work out these somewhat "statistical" details and have the processed information available to the later-run state machines (used by one or several or even none, it doesn't matter).
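A rough sketch of that linearize-and-difference chain, with placeholder constants rather than a real Sharp calibration (the inverse-law fit and sample values are invented for illustration):

def linearize(adc):
    """Convert a raw Sharp ADC value to an approximate distance in cm."""
    return 4800.0 / max(adc - 20, 1)      # placeholder inverse-law fit

class RangeTracker:
    def __init__(self):
        self.dist = self.vel = None

    def update(self, adc, dt=0.02):       # dt: sensor sample period (s)
        d = linearize(adc)
        v = a = None
        if self.dist is not None:
            v = (d - self.dist) / dt      # closing speed, cm/s
            if self.vel is not None:
                a = (v - self.vel) / dt   # acceleration, cm/s^2
        self.dist, self.vel = d, v
        return d, v, a

tracker = RangeTracker()
for raw in (300, 310, 325, 345):          # fake ADC samples of an approaching target
    print(tracker.update(raw))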
--
Randy M. Dumse
www.newmicros.com

Yeah, those are all good examples of how additional information is in the sensor data which many times goes unused by simple hard-coded behaviors. And much of it is in the temporal domain - meaning that when something last happened (aka how long it has been since it happened) yields important information for controlling the actions of the robot.
When we hand-code these solutions, we write a limited number of if statements to test sensor data and respond to it. Like your example of testing whether the distance was > 10 cm or < 80 cm (two simple IF-statement tests). This produces behavior which tends to be jerky and robot-like. That is, when something is more than 80 cm away, the robot's behavior doesn't change at all; it's as if the robot was totally unaware the thing existed. When it crosses that 80 cm line, the behavior suddenly changes.
Humans and animals have very fluid and dynamic reactions to their environment because they, in effect, don't use 2 tests for data like that; they might use 2,000, or maybe 200,000. Every neuron in your brain is in effect performing a simple conditional test for some very small but somehow important sensory condition. To equal that type of complexity and fluid reaction to the environment, we would have to hand-code thousands of tests against the sensory data instead of the 2 you used in the first version of your bot. This is just something that will never be practical for a human to do. Like you have talked about, you might extract a few more bits of data from the sensors and use 20 tests instead of 2. But creating a system with 2,000 subtle tests based on the current reading of a sonar distance sensor is just something that isn't productive for a human to even try to do. But it is the type of thing learning systems, like the brain, can evolve, based on some goal the system is trying to reach.
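As a toy contrast (my own illustration, not code from anyone's bot), compare the two-test rule with a response assembled from many tiny tests; the second tapers smoothly instead of snapping at the 80 cm line:

def hardcoded_speed(dist_cm):
    if dist_cm < 10 or dist_cm > 80:      # the two IF tests: nothing detected
        return 1.0                        # carry on at full speed
    return 0.0                            # target in the 10-80 cm band: react

def graded_speed(dist_cm, n_tests=200):
    # n_tests tiny conditions, each contributing a small vote; the response
    # changes a little with every centimetre instead of all at once.
    votes = sum(1 for t in range(n_tests) if dist_cm > 10 + t * (70 / n_tests))
    return votes / n_tests

for d in (100, 79, 60, 40, 20, 11):
    print(d, hardcoded_speed(d), round(graded_speed(d), 2))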
The limit to how far we can hand-code reactive systems is the limit of what we can understand. That I believe is what is slowing the advance of what has been done with such systems. We need in effect to develop tools to program the systems for us - and that's what a learning algorithm is - we give it a goal, and the learning system evolves a complex design to maximize whatever metric we choose to have it maximise.
--
Curt Welch http://CurtWelch.Com /
snipped-for-privacy@kcwc.com http://NewsReader.Com /

Except of course, the species as a whole learned those as well, through the slower learning process of evolution. It's learning either way; they just happen on different time scales. One example is the species learning as a whole and the other is an individual learning in its lifetime.
--
Curt Welch http://CurtWelch.Com /
snipped-for-privacy@kcwc.com http://NewsReader.Com /
John Mianowski wrote:

People behave the same way.
-- JC

Only if the programmer is able to know every possible environmental condition the robot is going to experience in its life - which they never are. Which means robots programmed like that only work well in the environments they were programmed to work in.

It seems I just duplicated a lot of what you already said in my previous post.

That is exactly why you need learning and why the programmer can't know ahead of time how to answer these questions. If the bot is competing against a bot the programmer has never seen before, it's unlikely that he would have coded the correct amount of aggressiveness into his bot. But, with context-sensitive learning, the bot can learn to recognize the "blue" bot and learn that trying to grab the ball away from it never works, so it shouldn't bother trying. But it can learn that the red bot can't hold on to shit and that he can grab the ball away every time. So if the red bot is around holding a ball, it should always try to grab the ball from it. But, it might also learn that the green bot is better than he is at grabbing the ball from the red bot. So if the green bot is around, and is trying to grab the ball, then he might as well go do something else that is likely to be more productive.
All these priorities about which behavior is the most productive in different environmental contexts is something that must be learned (and constantly adjusted) with experience, if you want the bot to act "intelligently". And you will never make it work very well if that "context" is defined by a hard-coded perception system. We need strong, generic, statistical algorithms for merging all the sensory data, and using that to select the best current behavior to produce.

Yes, you give them "frown" and "smile" hardware.

A strong reinforcement learning system that had only a simple hard-coded reward, like getting a ball in the goal, but which also had a strong context-sensing system, could learn a large set of different behaviors that we as humans would label as you did above. But internally, the bot only needs one state or purpose - do whatever works best, in this context, to produce the most expected future reward.
These systems, like I talked about in the other post, need to include a strong internal, reward prediction system. If you take the output of that internal reward prediction system, and wire it to an external signal, like your lights, then you have simple smile and frown hardware that other intelligent bots could learn to pick up on as you suggested.
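A small sketch of wiring that internal prediction signal to external "smile/frown" outputs; the class, thresholds, and print stand-in for two LEDs are all hypothetical:

class ExpressiveBot:
    def __init__(self):
        self.predicted_reward = 0.0

    def update_prediction(self, new_prediction):
        delta = new_prediction - self.predicted_reward
        self.predicted_reward = new_prediction
        self.set_lights(delta)
        return delta

    def set_lights(self, delta, deadband=0.05):
        smile = delta > deadband            # prediction just improved
        frown = delta < -deadband           # prediction just got worse
        print(f"smile={smile} frown={frown}")  # stand-in for driving two LEDs

bot = ExpressiveBot()
bot.update_prediction(0.4)   # ball came within reach -> smile
bot.update_prediction(0.1)   # another bot grabbed it  -> frown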
--
Curt Welch http://CurtWelch.Com /
snipped-for-privacy@kcwc.com http://NewsReader.Com /

Yeah, I think your example shows how good even simple animals are at learning. Though a fish does have much better sensors than our typical small bots (probably even better than most high-end multimillion-dollar bots), that really isn't the big problem. The big problem is our bots aren't making good use of the sensors they have - which means they aren't making good use of the data coming from the sensors they have.
How for example, did the fish learn so quickly to stop chasing the pennies, even when many never had the chance to try and eat the penny? I think your answer about fish story telling was not as far off as you might have thought.
After a lifetime of eating with other fish, the fish have probably learned to read the behavior of the other fish. When they all swim towards a splash that seems like it could be food, the first fish that taste the food and decide it's not food probably behave very differently than when it was real food. Next time, take some bread, throw it in chunks, and watch how the group reacts. You will probably notice a lot of extra splashing and fighting over the real food that never happened with the penny.
The other fish, the ones that never made it to the food, or the penny, can no doubt sense this difference. Just like we could. If they see a lot of splashing and fighting going on, they know it's real food, and they join the fight. If they don't see it, they know either it's fake food, or the food's all gone, and there's no reason to swim over there. The behavior of the mob is telling a story to the entire group, about whether there is food, and indirectly, about how much food there is.
So every time you throw the penny in, and the fish swim towards it, and see no resulting mob fighting, they know the splash was not food, even though they never got close enough to test the food for themselves.
This type of behavior is easily explained in terms of reinforcement combined with internal systems that predict rewards. The splash is a predictor of reward (because the fish has many times in the past received a reward from real food after sensing a nearby splash). Once the fish senses the splash, not only does it trigger the behavior of swimming towards the splash, it also causes the internal systems to increase the prediction of a future reward. That increase, by itself, acts as a reinforcer to reward whatever behaviors the fish was doing at the time it heard the splash (swimming near the bridge, for example). So the splash itself acts as a reward to encourage the fish to continue to swim near the bridge.
But, as the fish swims towards the splash, and then, the expected fighting by the mob over the food doesn't happen, this causes the internal reward predictions to drop (oh, we probably aren't going to get any food). That drop in the internal reward prediction, acts as a punishment for the behavior in action - swimming towards the sound of the splash.
This is how, each time you throw a penny in, the fish are being taught a lesson to not swim towards the splash.
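Here's a compact sketch of that mechanism with toy numbers (not a model of real fish): the change in the reward prediction is what reinforces or punishes whatever behavior is running at that moment, and the prediction attached to this kind of splash is itself revised downward each trial:

prediction = {"penny splash": 0.8}      # learned from past real-food splashes
behavior_value = {"loiter near bridge": 0.5, "swim toward splash": 1.0}

def penny_trial(rate=0.3):
    # 1. Splash heard: the prediction jumps, which reinforces whatever the
    #    fish was doing at that moment (loitering near the bridge).
    behavior_value["loiter near bridge"] += rate * prediction["penny splash"]
    # 2. The fish swims over; no feeding frenzy, no food: prediction collapses.
    td_error = 0.0 - prediction["penny splash"]
    behavior_value["swim toward splash"] += rate * td_error
    # 3. This kind of splash becomes a weaker predictor for next time.
    prediction["penny splash"] += rate * td_error

for trial in range(10):
    penny_trial()
print({k: round(v, 2) for k, v in behavior_value.items()},
      round(prediction["penny splash"], 2))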
But, if they learn so quickly (10 examples in 10 minutes) to not swim towards the splash, is this not making the fish forget everything it knows about getting food by swimming towards a splash? Well, if the sensory systems could not tell the difference between that splash, in that place, at that time of day, from all the other splashes, then sure, it would quickly forget everything it knew about getting food by swimming towards a splash. But if the sensory perception is advanced enough to tell the difference between that splash and other splashes it's experienced in the past, then what it's learning is to not swim towards that type of splash, in this place (under the bridge in the afternoon when bot programmers tend to throw pennies at them, instead of early in the morning when the kids from the school waiting for the bus throw crackers to them).
This is what the "better" sensory processing is needed for that you talked about. We need automatic systems that can analyze, and combine, all the sensory data from different sensors, and create the correct "context" for associating the current rewards, with the current environmental context. When the fish is being thrown crackers, the context can be different in many ways. The daylight levels might be different, the precise sound of the splash might be different, the number of other fish around might be different. He might be in a different part of the stream. He might have to swim harder because there's more current in the water. The fish might be more hungry in the morning when the crackers tend to be thrown. Anything the fish can sense, through any sensor, is helping to establish the current environmental context. And with some contexts, the fish gets more food than others. In some contexts, swimming towards a splash tends to reward the fish with food, and in other contexts (which might be very similar, but different in some detectable way) swimming towards the splash doesn't tend to be rewarded with food.
My point here is: 1) to show the habituation, the machine has to have learning - it must be changing its behavior in response to the current context. Most simple bots have little or no learning coded in them, so they have no hope of acting like the fish did. 2) It must have strong generic sensory processing that can analyze all the sensory data combined, to identify the unique differences between which contexts lead to rewards and which don't. This is a complex statistical process - not something you can make work by hand-coding behaviors (well, unless you have millions of years to work with like evolution). And 3) the system must include reward prediction, and those predictions will modify behavior - they will be the true source of the changes to behavior. This is how the fish learns to "read" the behavior of the other fish, without having to have that skill hard-coded into it by the creator (evolution for the fish, human programmers for bots). It learns to read the behavior of the fish just like it learns to read everything about the environmental context. Some contexts are predictors of rewards, and some are not. It must learn, through experience, to tell the difference. Then, with a good reward prediction system helping to shape behaviors, any behavior which causes the bot to increase the prediction of future rewards will be instantly rewarded. No need to wait for actual food to show up to act as a hard-coded reward of the behavior.
If you want to make a bot act anything like the fish (or any animal that learns from experience), that is the type of system it must have built into it. You can't do it by hard-coding behaviors into the machine because, unless the programmer is there to teach it the difference between the sound of a penny hitting the water and a cookie hitting the water, it will never learn to ignore the splash created by a penny. You instead, as the programmer, hard-code the result you want the bot to achieve (get real food in its stomach) as a hard-coded reward. Then you use a statistically based system to do the associations between the multimodal sensory contexts, the actions that work in each context for producing rewards, and the behaviors that don't work.
This is how intelligent machines are going to work. The only unknown, is the implementation details of how such a machine, is best able to take multiple sensory inputs, and define the "context", and map each context, to behavior. But it's going to be modality independent because whatever mathematical techniques work for correlating sensory data to actions is going to work well no matter what the sensory data "means" or what the actions "mean". At this level, the only "meaning" that's important, is how much reward does each mapping produce - i.e., how "good" is each behavior, for each sensory context, based on past experience.
As a footnote, let me add I don't know anything about fish. It's possible that fish, for example, are born hard-wired with the ability to recognize the activity of fish fighting over food as a reward. They might not have learned it in their life through experience. They might also be born with the behavior of swimming towards a splash instead of having to learn it. And their learning to avoid the penny might be only temporary. If you were to stop throwing pennies for 30 minutes, you might find they had forgotten all they learned, and were back to the old behavior of swimming like mad towards the splash (i.e., it was a form of habituation instead of long-term learning). These are all variations of how evolution might have implemented learning in fish to maximize its chance of survival. But these and many other variations would be easy to code, if we simply had better generic learning algorithms to start with. Which we don't. Yet. And that's what we are missing to be able to create more intelligent behaviors in bots and all our machines.
I agree with you that we are lacking strong enough sensory perception code to make use of more complex multiple-modality sensory data. But I already know what the code needs to do. It's not just a perception problem - it's a generic statistical learning problem that maps sensory data to actions (aka BBR), and adjusts the priorities of those mappings (learning) through experience. It learns to abort the swimming behavior quicker and quicker in that special sensory context defined by the sound of a penny splashing in the water, in that stream, on that day, in that location. And BTW, the reason it is "aborting" the swimming behavior is simply because other behaviors (like "hide in the rocks to protect yourself from predators") are becoming the more dominant behavior in that context instead of "swim towards splash". Each time the behavior is used and it fails to produce the expected rewards, it gets demoted in that context. The system is always picking the behavior that seems like it will produce the most rewards. When there is no sign of food around, the behavior that is the most likely to produce the best rewards is the "hide in rocks" behavior, or maybe "look for mate", etc.
--
Curt Welch http://CurtWelch.Com /
snipped-for-privacy@kcwc.com http://NewsReader.Com /
Curt Welch wrote:

As Gordon pointed out, this was probably not really learning, as the fish probably will react the same way the next day. Rather the changed behavior was some sort of short-term memory and/or habituation. BTW, ever hear the one about the goldfish ...
"Goldfish are said to have such short memories that every trip around the bowl is a new experience"
How else to live in a bowl, and not go totally crazy.

Unfortunately, bread makes no noise when it hits the water, and immediately floats downstream in that current. OTOH, I am sure the fish in the creek were especially tuned to the particular "class" of sound produced by the penny hitting the surface. Sharp and strong, maybe like a dragonfly lighting down, etc.

This is rather interesting, because it's several levels more advanced than a single fish sensing and perceiving a good stimulus [sound strike] from 20' away - what I have been talking about. Here the fish are able to both perceive the behaviors of their neighbors, and also react in some appropriate manner.
You realize that, before they can do this - engage in complex social behavior - FIRST they need the adequate perceptual mechanisms.

Ditto.
No doubt, something like that.

Yes, there are several problems here to be solved. One is better sensors, another is sensory integration, and the 3rd one [hiding in the background] is the learning.

This last is why we have computers.

Yes, as mentioned. First the adequate perceptual capability, then the other stuff. Regarding evolution, one presumes this is how it happened. I.e., individual fishes needed the sensory-action mechanisms before they could properly use these in the context of interacting with other fishes in large schools, etc.

You can give the bot, via both mechanical design and coded routines, the basic ability to generate actions, and then use the other stuff to subsume control over these actions. Layer upon layer of control.

Frogs are born with the ability to automatically snap at flies that fly past - courtesy of evolution. Many [or most] habitual behaviors in lower animals are generally held to be instinctual, rather than learned individually.
There are many ways that habituation to unproductive stimuli can be wired in. It may simply be that, after making several warp-speed attacks, lactic acid builds up in the fish muscles and doesn't quickly dissipate, and fish aggressive behavior is geared inversely to the level of lactic acid. Not much to do with learning or "memory" as you're positing it. Etc.

In fact, what I think you have been clearly illustrating is that it's really a lot easier to code behavior and learning than it is to replicate good perceptual processing. Despite what you say in the next few sentences.


Yes, exactly. But I see that perceptual mechanism as the same mechanism needed to extract useful behavior from multiple sensory signals.

I don't really understand what you are thinking when you write that. What do you mean by better? Are you talking cheaper sensors? Lower power? Smaller? What exactly is wrong with any of the sensors we have now? I don't see why we have a problem in that regard. The only issue is that they cost too much for hobby work - both the high-quality sensors and the processing power to deal with the data. We have better sensors, in terms of what they can sense, than just about all the sensors in animals (except maybe chemical sensors). So I don't get what your point is.
Or are you talking about the processing of the sensor data into an easier-to-use form when you say better sensor?

I believe that processing raw sensor data into a better form and sensory integration are one and the same problem. For example, an eye can be looked at as a million light sensors. To process that data into a signal that represents "I see a cat" is a sensory data integration problem to me. Automatic correlation of any two sensory signals should work the same basic way whether it's two light sensors which are part of a bigger eye, or light sensor data and sonar distance data being integrated. It's the same problem of correlating temporal signals and extracting the useful information either way.

If you had the type of generic learning system I talk about working, then there's no reason you wouldn't layer it on top of hard-coded behaviors which you knew were a good starting point for what you wanted the bot to do. The dynamic learning system would simply be configured to adjust and override the instinctive behaviors as needed (and there might be some it couldn't override). This would both reduce the amount of work the learning system had to do as well as decrease the time it takes for the system to become good at some task. There is no end to the engineering options available to optimize the design to fit the needs of the application. What I feel we are most missing is that strong generic learning that can be added at any point of a design as needed. Hard-coding behaviors is straightforward coding that any good programmer can do. Knowing what behaviors to hard-code is the harder part. The point of the learning system is to do the real-time testing to discover the behaviors that work best. But that can only work if you can produce an automated test for success (aka the critic). And many times, you can easily automate the test for success, and that's where dynamic learning systems would be very useful.
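A rough sketch of that layering, under entirely made-up interfaces: an instinctive behavior is the default, and the learned layer only overrides it where experience says the override pays off.

import random

def instinct(context):
    return "tripod_gait"                 # hand-coded default behavior

learned_value = {}                       # (context, action) -> learned score

def reinforce(context, action, reward, rate=0.2):
    key = (context, action)
    old = learned_value.get(key, 0.0)
    learned_value[key] = old + rate * (reward - old)

def act(context, actions=("tripod_gait", "drag_gait"), epsilon=0.1):
    if random.random() < epsilon:        # occasionally try something else
        return random.choice(actions)
    scored = {a: learned_value.get((context, a), 0.0) for a in actions}
    best = max(scored, key=scored.get)
    # The instinct stays in charge unless experience says an alternative is better.
    return best if scored[best] > scored[instinct(context)] else instinct(context)

# Toy use: after losing a leg, the drag gait happens to earn more reward.
for _ in range(300):
    a = act("lost_leg")
    reinforce("lost_leg", a, 1.0 if a == "drag_gait" else 0.2)
print(act("lost_leg", epsilon=0.0))      # learned override kicks in: drag_gait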
--
Curt Welch http://CurtWelch.Com /
snipped-for-privacy@kcwc.com http://NewsReader.Com /
