Where is behavior AI now?

And it had layers so that higher-level behaviors could suppress or encourage lower level behaviors.

I also prefer to call this "reactive programming" because "behaviors" rings wrong for me. This type of programming uses short sections of code that react to sensory data.

I like this. I think that the lower-level behaviors should be pure motor functions (power X to right motor), and "wall-following" would be a behavior that suppressed/encouraged the proper lower-level behaviors.

As a future modification the wall-following behavior could be programmed with a learning algorithm of some type.

...

Pardon me, but what does the abbreviation "RL" stand for? I do agree that some sort of decision tree would be useful and quicker than a neural network. ...

This is what DSPs are made for. I believe that there are special purpose chips to do limited voice-recognition. It seems to me possible to use a DSP for the purpose of identifying and remembering these clues.

Yes, algorithms tend to give a bigger improvement than better hardware, but don't overlook brute force when necessary.

Yes, there are DSP chips and ARM chips which are almost as affordable as the PICs and AVRs and such.

Agreed.

Reply to
D. Jay Newman

Yes, I can see mapping from that view. I should quote from Brooks to give you a sense of his point of view.

I've never thought of state information as short term memory (STM). Why would it be STM vs. LTM? What would LTM be, if not state information?

Agreed.

But usually you don't need to record inputs exactly as presented.

Is it consistent with _my_ use of states? Yes.

Is that consistent with his use of states? Oh, how I wish I could tell you! Brooks gives frustratingly little insight to his AFSMs.

In the first paper in "Cambrian Intelligence," the "Augmented" stands for added variables. The fullest explanation I've found is on page 13, in section 3.1 of the first paper:

"Each module, or processor, is a finite state machine, augmented with some instance variables which can actually hold LISP data structures.

"Each module has an number of input ines and a number of output lines. Input lines have single element buffers. The most recently arrived message is always available for inspection. Messages can be lost if a new on arrives on an input line before the last was inspected."

He shows a little code there, but I'm not familiar with its syntax.

By the second paper it stands for timers added to FSMs, where the messages on the communication wires between states have limited durations.
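
Reading between the lines, a module of that sort might look roughly like this (my own sketch in Python, not Brooks's code; the names, the sonar input, and the 0.5-second lifetime are invented):

    import time

    class InputLine:
        # Single-element buffer: only the most recently arrived message is kept,
        # so an uninspected message is simply lost when a newer one arrives.
        def __init__(self, lifetime=None):
            self.message = None
            self.arrived = 0.0
            self.lifetime = lifetime          # None = no timer (first paper);
                                              # a number = second-paper style timeout

        def send(self, message):
            self.message = message
            self.arrived = time.time()

        def inspect(self):
            if self.message is None:
                return None
            if self.lifetime is not None and time.time() - self.arrived > self.lifetime:
                return None                   # the message has expired
            return self.message

    class AugmentedFSM:
        # A finite state machine plus instance variables that can hold arbitrary
        # data (the "augmented" part), with named input lines.
        def __init__(self):
            self.state = 'start'
            self.vars = {}                            # instance variables
            self.inputs = {'sonar': InputLine(lifetime=0.5)}
            self.output = None

        def step(self):
            reading = self.inputs['sonar'].inspect()
            if self.state == 'start' and reading is not None:
                self.vars['last_reading'] = reading   # remembered in an instance variable
                self.state = 'responding'
                self.output = 'turn_away'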

Trying to penetrate more deeply into what an FSM is to Brooks, and whether his model resembles what I understand FSMs to be, is difficult.

To give you a flavor of what I've been saying about Brooks, two of his papers are entitled "Intelligence Without Representation" and "Intelligence Without Reason". In the latter, regarding representations, he says,

"I make it clear in the paper that I reject traditional Artifical Intelligece representation schemes... I also made it clear that I reject explicit representations of goals within the machine."

and later,

"Her representations were totally decentralized and non-manipulable, and there is certainly no central control which build, maintain, or uses the maps."

Yes. Averaging, or just remembering the last pass compared to the current one, or keeping counts, is a way of condensing state information; however, it also hides state information, and therefore should be done with care.

Well, I look at the state information in a somewhat different way. I try to pick up only the essential data to begin with, and ignore the irrelevant. So I generally don't track the inputs and then later remove the duplicate information.

Let me give an example of this. Say I am implementing a combination lock with a telephone-like keypad. The main inputs are the digits 0-9. Let's say there is a four-digit sequence necessary to open the lock, followed by a # to enter and try the combination.

The way I hear you suggesting it, I would need a 10,000-state machine to track this input (0000-9999, plus the #). Actually the * and # keys complicate the matter, so perhaps I need something like 12^5 states. I see this as an "input-centric" approach to the problem.

My approach, which I would see as a "data-centric", "content-centric" or perhaps a "minimal-necessary-historic-content" approach, would be to have 5 states. The first state would have a transition out to the second state when the first correct number was pressed. Each following state would have a similar transition to the next state for a correct press, and a transition back to the first state if the wrong number were pressed. Four correct presses is the only way through the states to the final #-key test state that would operate the bolt.

In essence, the state information would only contain a very limited range (5 states, or 5 different values only) rather than 10,000+. The wrong strokes are deliberately forgotten as irrelevant. No memory of their being made is kept. (Although we could make another machine that counted wrong attempts, if we wanted a fancier lock; but unless extra features are required, there's no sense in remembering undesired inputs. If I really needed to retain the inputs in original form, a circular rotating pointer buffer would be far more efficient than a state machine.)
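
To make that concrete, here is a rough sketch of the five-state machine I have in mind (Python; the combination 1-2-3-4 and the names are made up for illustration):

    # Sketch of the 5-state lock: the only state kept is how far along the
    # correct combination we are. Wrong keys simply reset that count.
    COMBO = ('1', '2', '3', '4')       # hypothetical combination

    def run_lock(keypresses):
        state = 0                      # 0..4: number of correct digits seen so far
        for key in keypresses:
            if state == 4:             # waiting for '#' to try the combination
                if key == '#':
                    return True        # operate the bolt
                state = 0              # anything else: start over
            elif key == COMBO[state]:
                state += 1             # correct digit: advance to the next state
            else:
                state = 0              # wrong digit: deliberately forgotten
        return False

    # The wrong first stroke below leaves no trace in the state:
    #   run_lock("51234#")  -> True
    #   run_lock("1234")    -> False   (no '#', so the bolt never operates)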

I understand this processing to be sensor fusion. Do you use the same term? There's a nice series of articles on sensor fusion and multiple sensors running in Servo magazine just now. (Note, though, that it has some terrible errors in it. For instance, the conversion rate of the Sharp sensors is stated to be 1000x what it actually is.)

Reply to
Randy M. Dumse

Why do I feel the sands shifting under my feet? Am I not expressing myself well? Or is my meaning deliberately being lost? As of yet, I cannot tell, but I do feel I'm not understood.

Note the following is indeed a paraphrase, and not a quote of mine.

Part of the shifting sands is that the argument made the transition from piano (which I didn't like because of all the modified intonations possible, but which was at least quantized in its possible tones; a harpsichord would have suited me better, for a much more consistent note with each instance of output) to a violin, with an analog range of frequencies and even more nuances possible.

In every analogy, there's the part that fits, and the part that doesn't. You wish to move me more into the part that doesn't. Please note I resist the change as obfuscatory.

I don't think you're old enough that Bach would have had an opinion on your intuition. Do you have any evidence to the contrary?

And here I think my argument has been strawmanned, since it said nothing of human creativity. Instead it was trying to pronounce on this: if there is a closed set of atomic actions a robot can take (the analog in music being the notes available on some instruments), and a closed set of inputs to trigger those atomic actions, i.e. reasons to transition from output action to output action (the analog in music being the different ways to go from note to note), then the set of transitions might (must) also be closed.

The atomic output actions I mention, I have tried to make distinct from behaviors as they are now known, and have called them atomic behaviors (although I think they should be called just behaviors, and anything with complexity to it should not be called a behavior).

None of this has reached the stage of a longer sequence of atomic behaviors, which are still called behaviors, but in music might be a thrum or a trill or a chorus or a stanza. Nor have we reached the length of combination which would rightly be a melody.

In the music analogy I am saying, given a limited set of notes, and a limited transition table from note to note, there is a closed set of note to note sequences. That, however, says nothing about limiting the amount or complexity of any possible melody. Melodies are infinitely variable. Melodies are formed by intelligence, human, or even birds, etc.

Some (most) instruments have a limited number of notes. Some (most) instruments are limited to express only one note at a time. Some (most) instruments have limited ability to go from note to note.

My "breakthrough" in behavior based robotics using the musical analogy is, the melody isn't in the note itself. It is in the manipulation and sequencing of the notes that makes the melody. In robotics terms it is, intelligence isn't in atomic behaviors, or the individual transition from atomic behaviors to atomic behaviors. It is in the whole sequencing of the atomic behaviors that intelligence arises.

Then don't bring it up, because it has no bearing on the discussion at hand?

Reply to
Randy M. Dumse

Indeed.

But you see, in behavior based robots, it is thought that the layering of the behaviors is where it is at. Really the behaviors have almost nothing to do with it. That's the departure I'm pointing out.

There everything is lumped together again, and you conclude intelligence is defined by behavior. It is not, because behavior is too broad a word. Hence the distinction is lost.

It is Brooks' subsumption that leads to intelligence, and not the layering of behavior.

But it isn't behavior at all. The structure of the subsumption is the intelligence. The layering and behaviors and all the rest are devoid of intelligence.

Emergence can happen, but the triggers and responses that cause outputs to be expressed are where the intelligence lies, emergent, or planned. That's where they all are.

To me, all behaviors are dumb rote responses/programs. Of course, I'm talking about the more exact meaning, being atomic behaviors. Composite behaviors might have some intelligence to them, but it isn't because of the behaviors; it's because there's some scheduling structure in with them, and that's where the intelligence is coming from.

Combining them into some sequence so the whole is greater than the sum is where the "greater" part (intelligence) comes from.

Brooks quotes, fyi,

"Intelligence is determined by the dynamics of interaction with the world."

"Intelligence is in the eye of the observer."

Reply to
Randy M. Dumse

I think the argument is still the same. I don't see how counting output devices or input sensors defines or limits what a clever human may deduce about the larger environment from those sensors, in some as yet un-thought-of way, and then program that into the robot, which thereafter has a new behavior. You suggest, as I understand, that all possible behaviors can be known just by counting the number of input and output devices on the robot. I disagree, and offer the absurd notion that one can as easily claim to know all violin music just by counting strings on the violin. Isn't that where we are in the discussion? What did I miss?

You also don't see what human creativity has to do with robots. Humans build the robots. The robots are as clever as the humans can make them. The humans' cleverness and creativity are intimately tied up in the behavior of the robot.

So when you say, "we can know if we have addressed all the robot can do," you are making a statement about the creativity of the humans that designed and built the robot. Its capabilities are limited by their cleverness and their vision.

I have a sonar array made of 4 sensors on one of my robots. I've been working with it for about two years now and I'm continually finding new things I can do with it, new ways of getting information from the data stream, and new ways for the robot to react to this new information. I'm certain persons more clever than myself can come up with many more. So while the I/O on the robot has not changed in a couple of years, the "interconnections" between them have changed substantially, and continue to change.

Now as I understand it, you want to count the number of sonar units and the number of drive motors and then pronounce some limitation on the number of clever ideas that can be devised for using these devices. That is where you lose me.

Your own words were so that "we can know if we have addressed all the robot can do."

All the robot can do. Seems like that requires us to be awfully clever.

Randy, you can't weasel out of it by redefining "behaviors" to be just a synonym for your beloved "states." I think most people would agree that a robot seeking high ground through a cluttered landscape is exhibiting a definable and observable "behavior" (which by the way is kind of cool to watch; remind me to show it to you).

best dpa

Reply to
dpa

I think that fits well given my premise.

I don't think so. So far I think "behavior" as a word is so overloaded in meaning that it obscures what's going on, and the subcomponents of behavior are thereby going unnamed. Basic action states (notes, in the music analogy) are what I think of as the real behaviors, or as I am now calling them, atomic behaviors. For a short fixed sequence of atomic behaviors, which might be called an escape behavior for example (a trill, or stanza, or maybe even a chorus, in the music analogy), I have no good name that distinguishes that level from atomic behaviors. Mode doesn't really fit. Then there are long and complex things called behaviors, like "search for food", or "throw a tantrum", or what-have-you at human levels of action (melody, or even opus, in the music analogy). All these are simply called behaviors, and as such the word is much too broad to be useful for the layering in behavior-based robots.

Well, don't expect a large argument from me. I like reductionism in state machines, and I like to split state machines into even smaller and more insulated machines.

I find that often, when someone designs a state machine, the problem is that they make it too complex. Often a few simple machines, with intermachine communications, will have fewer states and fewer transitions, and be more robust, as you say.

I've told the story several times before, but when I worked on a traffic light system, they wanted a state diagram of its operation. No one had ever been able to do it. I finally solved it by separating what was going on into several simple machines that interacted.

From problem too complex for anyone to solve, to problem solved with several small machines. If you didn't factor the machines like that, the number and complexity of the states just exploded.
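
A toy sketch of the kind of factoring I mean (Python; the two-direction intersection, the timings, and the names are all invented for illustration): each signal head is its own three-state machine, and a tiny coordinator just passes the right-of-way between them, so no machine ever has to enumerate the combinations.

    class LightFSM:
        # One direction's signal head: RED -> GREEN -> YELLOW -> RED.
        DURATIONS = {'GREEN': 20, 'YELLOW': 4}      # ticks; arbitrary

        def __init__(self, name):
            self.name, self.state, self.timer = name, 'RED', 0

        def grant(self):                            # message from the coordinator
            self.state, self.timer = 'GREEN', self.DURATIONS['GREEN']

        def tick(self):
            if self.state != 'RED':
                self.timer -= 1
                if self.timer == 0:
                    if self.state == 'GREEN':
                        self.state, self.timer = 'YELLOW', self.DURATIONS['YELLOW']
                    else:
                        self.state = 'RED'
            return self.state

    class Coordinator:
        # Passes the right-of-way token around; knows nothing about lamp colors.
        def __init__(self, lights):
            self.lights, self.current = lights, 0
            self.lights[0].grant()

        def tick(self):
            states = [light.tick() for light in self.lights]
            if states[self.current] == 'RED':       # current direction has finished
                self.current = (self.current + 1) % len(self.lights)
                self.lights[self.current].grant()

    # Two 3-state machines plus a tiny coordinator, instead of one machine
    # that has to name every legal combination of the two signal heads.
    intersection = Coordinator([LightFSM('NS'), LightFSM('EW')])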

Did you see my suggestion in another post, though, that a robot that does an occasional or random change of state to escape, even if done at the wrong time, in the end looks more intelligent than a robot that does not?

A robot that always cruises will get trapped forever. A robot that occasionally throws in an escape sequence does not.
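
Something as small as this is all I mean (a sketch; the 1% probability and the behavior names are arbitrary):

    import random

    def choose_behavior(cruise, escape, p_escape=0.01):
        # Occasionally interrupt cruising with the escape sequence, even when
        # nothing indicates the robot is stuck. Wasteful in the open; liberating
        # in a trap the robot has no sensor for.
        if random.random() < p_escape:
            return escape
        return cruise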

I think this is an argument of scale. While I get your point, what a curb is to nBot, a telephone pole lying on the ground is to jBot. You can build your robot for the expected environment, but it will still have to deal with those things at another scale, or it will not be robust.

So a robot with 200-foot wheels would be more robust than jBot, because there aren't many manmade structures it couldn't just roll over? What about a skyscraper? Oh. It's just a matter of scale, and the problem returns at some point.

This is a really interesting derivation, because as I mentioned in another post, it can be used to reduce the number of states or atomic behaviors. The way it does it is quite surprising. By my definition of an atomic behavior, we have two variables instead of one, each with a gain function, and they are linear. Yet the output is not linear. It is a challenge to me to sort out whether I think it is an atomic behavior or not.

I've run into a very similar thing in my walking robot. The center of turn when going straight is out at infinity. As it moves in toward the robot, the outer legs take their normal strokes, and the inner legs take shorter ones... until the center of rotation is under one leg, and it just stops and pivots. As the center of rotation moves past that leg and toward the center, the stroke of the leg reverses, and when the center of rotation is directly under the bot, the leg described is making a full backwards stroke, and the robot's body rotates about a point.
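
The relationship falls straight out of rigid-body motion: each leg's stroke is the forward command minus the rotation command scaled by that leg's lateral offset. A sketch (my own toy numbers, one lateral dimension only):

    def leg_strokes(leg_offsets, forward, rotate):
        # forward only            -> every leg takes the same full stroke (straight line)
        # forward == rotate * y   -> the leg at lateral offset y stops and just pivots
        # forward == 0            -> legs on opposite sides stroke in opposite
        #                            directions and the body rotates about a point
        return [forward - rotate * y for y in leg_offsets]

    # With legs at lateral offsets -1 and +1:
    #   leg_strokes([-1, 1], forward=1.0, rotate=0.0)  ->  [1.0, 1.0]   straight ahead
    #   leg_strokes([-1, 1], forward=1.0, rotate=1.0)  ->  [2.0, 0.0]   pivot about the leg at +1
    #   leg_strokes([-1, 1], forward=0.0, rotate=1.0)  ->  [1.0, -1.0]  rotate in place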

This is attractive, yet fanciful. There are state changes going on. They have just been extracted from the output machine, and moved elsewhere.

But there must have been a mechanism (or transfer function if you prefer) that determined forward progress had stopped, and so rotational progress must begin.

Not knowing how the 0 value in rotate went up when the forward value went down, I am not able to fully comment, but the important point is, there was some function that did that transfer.

I would suggest, what you've done here is factor and hide your state information. Like the traffic light problem, you have reduced the complexity of the output machine, by separating the control to some other simpler machine.

Are you sure you haven't just moved the decision elsewhere?

The output routine uses two vectors. The issue is what fills the two vectors and how it does it.

I don't disagree, and I don't think it is heretical. I actually think it is pivotal. (no pun deliberately intended, but there's certainly one there now I see it written).

Reply to
Randy M. Dumse

Would you equally say that those who study phonetics and phonology are trying to limit what can be said? Or are against free speech? Or put a limit on human creativity? Absurd!

They wish to identify and count the atomic speech sounds (phonemes/phones). (But they don't deny that some clever human can make some other kind of sound.) I wish to identify and count the possible "alphabet" of some particular robot's output capability. That doesn't limit what it can do; it only identifies the atomic elements of what it can use to do it.

Reply to
Randy M. Dumse

No. This is the misunderstanding I was attempting to correct. The normal navigation mode uses the differential distance returned by the sonar array to steer the robot, and the combined distance returned by the sonar to control the robot's speed. When the velocity vector is driven to zero, all that is left is the differential. Rotating in place is just normal steering with zero velocity. It need not be activated in a special circumstance because it is always active, steering the robot as a continuous, not discrete, function.

Again, there is no "mechanism that determined forward progress had stopped, and so rotational progress must begin." Indeed, "rotational progress" need not "begin" because it is already, and continuously, active.
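
Stripped down to a sketch (not the actual jBot code; the array split, gains, and scaling here are arbitrary), the idea is just:

    def drive_from_sonar(ranges, max_range=3.0, k_speed=1.0, k_turn=1.0):
        # ranges: distances from the sonar array, left half then right half.
        # Velocity comes from the combined distances; rotation comes from the
        # left/right differential. Nothing ever switches modes.
        n = len(ranges)
        left = sum(ranges[:n // 2])
        right = sum(ranges[n // 2:])

        velocity = k_speed * (left + right) / (n * max_range)   # open space -> full speed
        rotation = k_turn * (right - left) / (n * max_range)    # steer toward the longer ranges

        # When nearby obstacles drive velocity toward zero, the same rotation
        # term is all that is left, and the robot turns in place.
        return velocity, rotation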

I think the fascination with describing continuous behavior as a series of static states may make it more difficult for one to understand, or even accept the possibility of, a different paradigm.

Yeah. Nagle suggested that the IMU + Odometry navigation scheme on jBot was "straightforward." CRM is loaded with punsters.

Straightforward and pivotal. I think you have your "atomic" behaviors. As mentioned previously, it seems like to make your argument you have had to redefine "behavior" to mean "state," as you like to use the word. When I use the word behavior it has its own unique and richer meaning, not synonymous with the word "state" as you use it.

best, dpa

Reply to
dpa

Okay, but this is not what I'm asking about.

But that implies rotation is never zero itself.

I don't buy that. It doesn't fit your example.

In your example, the robot drove down the driveway straight. Otherwise it would have curved or careened into the alley. Instead there is an implication that the differential is zero as it drove straight in.

As the forward velocity is reduced to zero as the combined ranges come down, it is not obvious there would be any differential information either.

If you drove in straight, then you are likely perpendicular to the walls, or so close to it that the differential drive is not sufficient to turn the robot fast enough to get a response - unless your transfer function has weighting and is nonlinear.

Okay, then how does your robot ever go straight? (Perhaps you are saying it doesn't?)

I get that there is a continuous transfer, no problem for me in that concept.

What I don't get is what prevents them from both becoming zero and the robot stopping.

In the Navy our missile control circuits had a little noise added to them. It was called dither. The purpose of this signal was to keep the system from getting wedged into one position. In previous posts you talked about putting a random element in the escape sequence to prevent sticking - same idea.

If you don't have some sort of mechanism to prevent 0 0, then your robot is very likely to get stuck looking at a close perpendicular wall. I haven't seen it do that, so I'm assuming you've done something to prevent it. If not, I'm surprised. But it isn't that I can't understand that your transfer function is continuous; it is that I don't think, as stated, it is robust.
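
The kind of mechanism I mean can be a one-liner (a sketch; the amplitude is arbitrary):

    import random

    def with_dither(rotation, amplitude=0.05):
        # Add a little noise to the rotation command so that velocity and rotation
        # can never both sit at exactly zero - the same trick as the dither in the
        # missile control circuits.
        return rotation + random.uniform(-amplitude, amplitude)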

To me, the word "state" implies something is "static". The atomic behaviors having variable outputs are somehow a more complex concept than something static. So I am suggesting that the use of a constant linear algorithm to calculate the output is the static part. So they are not truly states as I like to use the word.

Again, I feel the overloading of the word behavior is a roadblock to our progress in robotics now, so how behavior is amplified, split, augmented or replaced is not much my choice; only that new language which enables discussion of the various aspects of behavior is my goal.

The reason I have chosen to keep the lowest level of behavior as its true meaning is not so much my choice, as I see it to be a Brooks/Jones choice. Brooks clearly prefers reactive behaviors. Jones clearly prefers servo behaviors. Both are the low level type. So I have been trying to name the low level types as the true behaviors.

If it should be that behavior maintains a meaning for complex sequences of atomic actions appearing to attain a common purpose or goal, I'm fine with that. Then the unscientific use of the word stands, and I can still spank a kid for bad behavior without assigning blame to a particular atomic action (touching the candy bar, gripping the bar, lifting the bar, unwrapping the bar, putting the bar into the mouth without paying for the bar, etc.).

But without the language to differentiate a big long behavior with variable branches from a little atomic behavior with no subparts and therefore also no branches, we aren't free to think about the problem with any precision.

Reply to
Randy M. Dumse

Yes it is. You are asking how the algorithm works, and I'm attempting to describe it. Patience, grasshopper.

Yes. When there are that many sonar reflections rotation is never zero.

It exactly fits the example. This is not conjecture. I'm describing how the algorithm works to produce the actual behaviors that you yourself have witnessed.

There is always some differential with that many sonar detections.

Rotation can only go to zero (drive straight) in the absence of sonar reflections, which is also when velocity is driven to full speed -- out in the open. They can't both be zero.

If velocity is driven to zero because of many sonar reflections, they are ALWAYS asymmetrical enough to guarantee non-zero rotation.

It's like balancing a pencil on its point. Sure it's possible, but in the real world it never happens.

(Given that this is all sonar-driven behavior, and not target acquisition of some kind also running simultaneously but subsumed, driving the rotations and velocities.)

I'm pretty certain I have described this all to you in real-time while watching the robot maneuver and herding it around with our feet. I guess I have not communicated what it's doing as well as I thought I had, which surprises me.

Or perhaps you were viewing it through the filter of "it must all be a series of discrete states," and came away with your own opinions of how it "must" be accomplishing these maneuvers?

We can try again! RBNO?

regards, dpa

Reply to
dpa

The comparison to phonetics is not apt, but I give up. I refer you only to your own posting on the "fish and robot" thread, where you discuss adding additional "phonemes" to your existing hardware sensors through the advanced application of cleverness, to suggest that your own experience militates against the existence of the "limited alphabet" of atomic elements you wish to count:

message #25.

best, dpa

Reply to
dpa

Certainly, our language behaviors are what distinguish us from other animals. I'm not sure what you mean by "reflective" however, unless this is just a reference to our power to generate language behavior internally (our thoughts), and at the same time react to it by producing more internal language (aka reflect on our thoughts). I do strongly suspect that the brain features which support this type of activity (private behaviors) are quite a bit stronger in us than in all the other animals.

I've not seen any evidence, however, to make me believe that the systems that control our language behavior are different in any significant way from the systems that control all our behavior. Producing language behavior shares all the same problems with producing any other behavior. The brain must, at all times, be constantly generating a continuous sequence of behaviors. Whether that is mouth and lung motions for the purpose of producing spoken words, or whether it's a complex orchestration of limb movements to make ourselves a sandwich and eat it, the problem is basically the same - how does the brain determine what behavior to produce next?

Very true.

Reinforcement learning is making a comeback and showing notable progress, such as the success of TD-Gammon in the early '90s (based on temporal difference learning algorithms). Algorithms like Q-learning have been developed in the past 20 years. Though I don't follow the research closely, it's reported that there's been an explosion in the field in the past 10 to 20 years.
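
For anyone who hasn't run into it, the heart of these algorithms is a one-line value update. A rough sketch of tabular Q-learning in its standard textbook form (the learning rate, discount, and exploration rate here are arbitrary):

    import random
    from collections import defaultdict

    Q = defaultdict(float)   # (state, action) -> estimated value; unseen pairs start at 0

    def q_update(state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
        # Move the estimate toward the reward plus the discounted value of the
        # best action available in the next state.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

    def choose_action(state, actions, epsilon=0.1):
        # Mostly exploit the best-looking action, occasionally explore.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])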

I came across an interesting 1-hour video today, a talk about fairly recent (the past 5 years) discoveries about animal learning using temporal difference computer models, which led to the discovery and understanding of yet another piece of the puzzle of what the brain is doing.

How Do We Predict the Future: Brains, Rewards and Addiction

greymatters.ucsd.edu

To me, reinforcement learning, the work you are known for publishing some of the first papers on in the context of computers and artificial intelligence, is the only foundation that can explain why, and how, humans use language.

Humans use language to direct all our high level behaviors. We make plans, and follow our dreams, by talking about them, either with others, or just to ourselves.

But how does the brain learn language, and how does it select what language to generate, and when? Why do we suddenly stop what we are doing and start talking to ourselves? What triggers that reaction? What controls it? Why do we suddenly generate language in the presence of another human? How does the brain select what language to generate? How does it know when to stop talking and start doing something else?

These are the high level human unique behaviors we must understand and build machines to copy.

We can record knowledge in a book by filling the book up with words. And likewise, we can fill a computer with knowledge in many different ways. But what is the purpose of the knowledge in the machine? There is only one ultimate purpose - to allow the machine to know what behavior to generate next, at all times. This is the only knowledge that exists in a human brain - the knowledge about what to do next for any given environmental context - about what "note" to play next in our lifelong symphony (to use the metaphor someone else brought up here).

When we read a book, we can absorb knowledge from the book, into our brain. But how does this happen? How is it stored?

If the ultimate goal is to build a knowledge-storing machine that can read books, and talk to other humans, and absorb knowledge through this interaction, then it's obvious you want to build a knowledge database. And you want to give it the power to learn from its interactions. And we would like it to have the power to interact with itself, to gain further understanding of its own knowledge (talk to itself to discover, and create, new knowledge) (which might be the "reflective" thing you talked about above).

So I agree with your ultimate desire to duplicate high-level language behavior in a machine, for the purpose of duplicating our most human of behaviors - with the hope of duplicating our most human of powers at the same time. If we can make a talking machine which can interact with us like a human would, one which would do much better on the Turing test, for example, than any machine has yet done, that would be fantastic. I would love to have a computer I could ask to go do research for me on the Internet and report back to me what it had learned. I don't need it to have arms and legs or vision.

But a machine with nothing but the ability to receive and generate words through a communication channel has the same fundamental behavior problems to solve that all animals (and robots - trying not to forget what group this is) need to solve - what should it do next? You can ask this question many ways - such as, what is the purpose of the machine, or what is its goal, or how does it pick its own goals? How does it demonstrate creativity? How does it demonstrate adaptability?

What makes human behavior intelligent is that all our behavior is directed towards a purpose. Without a purpose, the machine has no way to know what to do next. Without a purpose, there is no way for the machine to evaluate which behavior is "better". It wouldn't care what it did next - anything would be just as good as anything else. What makes us creative is that we can find new behaviors on our own. How do we do this? By having a system which can understand the value of a behavior never before seen.

Any computer program which is going to attempt to produce human-level intelligence, and creative language behavior, is going to need an evaluation system which assigns value to all behaviors. Without this, the machine won't know a great idea when it has it. It won't know which idea to pursue and develop further, and which to drop.

Likewise, humans don't have photographic memories. We selectively extract, and keep, the knowledge which we sense as being important from the environment. We follow ideas just like we follow bread crumbs to find food - we seek out what we believe is valuable.

No knowledge-based approach to AI that I've seen seems to understand these two fundamental issues. First, the only knowledge we have is knowledge about what is the best behavior to produce in a given context, and second, we have an intrinsic value system which is able to judge the value of all behaviors. It's this value system that allows the brain to determine what it should do next, and, when a new behavior emerges, to recognize (and reward) its value.

I agree completely that we need to build knowledge systems and solve the problems of high-level language production, but I think that most people working in that field have totally missed the mark on what they should be building. They have structured their systems more like electronic books than like brains. Books are not intelligent. They lie there and do nothing until an intelligent agent interacts with them. We need to build a machine that duplicates the function of the brain reading the book, not the book. And humans make behavior choices (do I read this book, or that other book) based on the perceived value of each behavior.

Reinforcement learning is all about the creation of value-based behavior systems. It explains how humans evolve their complex "intelligent" behaviors, and it explains what intelligent behavior is. It explains how it is possible for humans to be creative and inventive. These systems also all use a knowledge database to direct their behaviors. But that knowledge database doesn't just store associations between facts. It instead creates a knowledge database in the form of a value function which answers the question of how valuable different behaviors are in different contexts. That's the type of knowledge database you must build in order to direct a machine to act intelligently.

And it makes no difference if you choose to limit the machine to only being able to produce language behaviors, or if it's a robot with arms and legs and eyes and ears. If you want it to be intelligent, it has to be a reinforcement learning machine which directs behavior through a value system, and which also has the power to evolve its value system through experience.

The advantage to working with robots first is that we can learn to produce strong reinforcement learning systems for behavior problems which are simpler than the full human language problem. If you can't build a reinforcement learning machine that can learn to find food in a maze, then you aren't going to get a language machine to work. This is because the maze problem, like language, requires the machine to understand a long history of context (what is the history of turns I've made so far?) and produce the correct next behavior based on that long temporal context. Language is the same, but even worse, because the next word I need to speak might be based on a long context of the last 500 words I just heard. If you can't build a mouse that can learn to correctly react to the small context created by a small maze, there's no way you are going to get a machine to correctly react to a long string of language.

I'm not aware of any robot mouse that can learn how to get itself out of different mazes, or to find food in different mazes, through generic learning techniques (aka a mouse that wasn't hard-coded just to solve mazes). But yet, this is a problem you looked at over 50 years ago. It's the same problem that wasn't solved then, and hasn't yet been solved, but needs to be solved before we are ever going to make a machine use language like humans do. It's an easier version of the same problem and the type of problem we should solve first.

I've been working in this same general area for 25+ years. If you call it work - it's really more like play. I didn't pick this direction because it was popular - I picked it because it was the only one that looked like an approach that could explain how to create an intelligent machine.

I don't know what motivates the bulk of the AI community, but I never had to make a living, or build a career, so my decisions were not biased by needs like that. I simply wanted to figure out how to build an intelligent machine for the sheer intellectual challenge it presented (and OK, for the potential glory that it might bring if I succeeded in creating something significant before anyone else). But I had no downside to fear - no need to cover my bet to make sure I could keep eating.

Many that are making their living doing AI research are no doubt highly motivated by projects they believe they can get funding for, and for which they believe they can create a success - to allow them to get funding for the next project. This no doubt motivates people to work on stuff that is currently "popular" in the eyes of the people with funding power. Mostly, because of the lack of any significant breakthroughs in the past 40 years, I suspect this has caused people to set their sights far lower - to do simple things, and ignore the hard problems.

Creating significant new algorithms is hard, and very risky. But I think that's what needs to be done. Instead, people probably look for projects that are only a small step forward - let's build a faster chess machine, or let's build an autonomous car that can drive a little faster.

But trying to understand how to create a value-based knowledge database for open-ended, high-dimension problems, like making a robot that can learn to find its way through a maze, requires some new insight into how to structure the database. That insight might come after a 5-year project to look for it, or the 5-year project might produce nothing - greatly reducing the odds the researcher will get any more funding.

I've pre-ordered a copy. I don't like reading on-line all that much. :)

Ultimately, we have to bridge the gap from the top to the bottom. It really makes no difference whether we build from the top down, or the bottom up, as long as the end result is a complete bridge.

So far, I think most of the top-down approaches have been lost, not knowing where they were headed (or headed in a direction that I think was a dead end). Knowing, for example, that we need to build a knowledge database is a top-down issue. Knowing how to structure it is the problem of not knowing where we are headed. This is because when we look into ourselves, we can't see the mechanisms that create our intelligence; we only see the top-level end product - our behavior. So the top-level problem is obvious - we need to build a machine to receive, and generate, language - strings of words.

Reinforcement learning is one of the bottom up approaches that seemed hopeful very early on, but which never got very far and was given up before much of anything was built, simply because no one could see how to build the bridge any further - there were many problems, and no answers.

But I think after many different top-down and bottom-up approaches have all failed to close the gap, some people are beginning to see the light. Reinforcement learning is the only bottom-up approach that explains human behavior. No matter how many problems are left unanswered in how this is implemented, that's the path we must take from the bottom, and it's the point any top-down approach must close in on. Any top-down approach which isn't headed towards creating a reinforcement learning machine isn't creating machine intelligence - it is just creating yet another type of computer tool (like a game-playing program, or a logic reasoning engine, expert system, etc.).

Machine intelligence, requires the type of creativity that only comes from critic based learning systems that can evolve their own complex future looking value prediction functions. Machines can't be intelligently creative, if they can't recognize the value of their own behaviors.

TD-Gammon is a good example of a machine that can do this. It recognized, and learned on its own (by playing itself - a very "reflective" behavior), an opening in the backgammon game that none of the expert players used. But after seeing TD-Gammon use the opening, the expert humans analyzed it, and decided it was the best way to play that opening, and now they use it. TD-Gammon showed how machines can be intelligently creative - can create things that no human has created. Other evolution-based systems have done the same (GA or evolution-based learning systems are just another type of reinforcement learning).

At the low level, we know how to build "intelligent" machines, like TD-Gammon. At the high level, we have built some interesting word and idea manipulation machines - but none that I know of have been built with a reinforcement learning core directing their behavior - which is why the high-end solutions don't look at all "intelligent" no matter how much they seem to "know". They are just not creative, purpose-driven machines yet.

What we don't know, is how to build a high end reinforcement learning language machine - one which is able to learn open ended language behavior, instead of just playing in a very limited environment like a board game. That's the gap we have to close.

I think playing with robots is a great way to help close the gap from the bottom up. From the top down, no one is going to get anywhere unless they realize where they need to head - which means they need to figure out how to build a knowledge database structured for the purpose of producing constantly changing language behavior, based on a reinforcement learning core. The knowledge database must be structured to answer the only question the machine ever needs to answer, which is: "What behavior is most likely to produce the most value for the current situation?" It produces a constant string of intelligent behaviors by continually answering that question (and constantly learning - aka changing its predictions of values based on experience).

Reply to
Curt Welch

That's a simple example of exactly what I was talking about. Machines need to respond to not only current inputs, but to temporal sequences of sensory inputs as well. It can't just respond one way to every light level reading. It needs internal memory (what you called state) to record whatever aspect of the temporal sequence is important to it. If the machine must react differently to two light flashes than to one, it needs to record what it has recently seen so when the light flashes, it can react one way, if it's the first flash, but a different way, if it's a second flash.

How many different ways the machine can react to a string of light flashes is limited only by how much memory it has to store state information and how many behavior rules it can encode as responses to the different state values. It can have different reactions to different combinations of long and short flashes, for example - just like a human can respond differently to various long Morse code sequences. The more state it can store, and react to, the more complex its behavior can be. So no matter how few atomic behaviors exist in the machine, or how simple the sensors are (a light flash detector), the number of different choices it can make about which behavior to produce next, based on its state, is in effect unlimited - its limit is set only by how much state memory it has.
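
Concretely, the state needed for the two-flash case is tiny. A sketch (the threshold, the reaction names, and the reset policy are made up):

    def make_flash_counter(threshold=0.5):
        # The only state kept: whether the light was on last time we looked,
        # and how many off-to-on transitions (flashes) we have counted.
        state = {'was_on': False, 'flashes': 0}

        def step(light_level):
            is_on = light_level > threshold          # raw reading -> ON/OFF
            if is_on and not state['was_on']:        # off-to-on transition = one flash
                state['flashes'] += 1
            state['was_on'] = is_on
            if state['flashes'] >= 2:
                return 'reaction_to_two_flashes'
            if state['flashes'] == 1:
                return 'reaction_to_one_flash'
            return 'idle'

        return step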

Reply to
Curt Welch

Reinforcement Learning.

That was one of the things that has really excited me about that approach. It was orders of magnitude faster than the typical neural nets I had spent years playing with, and it had excellent scaling characteristics. There was no exponential explosion as the dimensions increased. At worst, it seems to be only N log N.

Reply to
Curt Welch

Both STM and LTM are state. The only difference is how far back in time the state is representing a past input.

For your example of a machine that responded differently to one light flash or two, it would have to record state information about the last flash. But if two flashes have to happen within one second for the machine to consider it two flashes instead of one, the state doesn't need to record anything more than one second old. I would call state that only records information about what happened in the last second short-term memory.

Now, on the other hand, if a machine was building up a map of the environment it was exploring, and that map was going to stay around forever, then I would call that long term state.

The distinction isn't absolute, it's fairly arbitrary. How long does the state have to stay around before you call it LTM? Pick a number you like. :)

But in the context of the learning machines I've been looking at, it takes on a slightly different meaning for me. If you want to build a general-purpose learning machine that can learn to react differently to two light flashes than to one, it must have generic state memory to work with. And, unlike us as programmers, who know what we want the machine to react to, a learning machine doesn't know ahead of time what state information it will need to store. Should it store the number of flashes? Or the length of the last flash? Or how long it's been since the last flash? Or what? A generic learning machine needs to try to store "everything" about past stimulus events to some extent, and then produce behaviors which are a function of the current state. Through experience, it must learn that only the number of flashes is important, and not, for example, the amount of time each light was on, or the exact time between the two flashes.

This type of machine needs memory to store information about past sensory conditions, but since memory is always limited, there's always a limit to how far back in time the state will cover. This state that defines the context the machine is reacting to is what I call short-term memory - because even with a very large machine, the memory is only going to store a very short span of sensory information.

But the same type of learning machine needs to learn the value of each reaction it's producing. This is information which needs to be tracked long term - meaning, for the entire life of the machine, what is the average amount of reward each behavior ended up being associated with? This long-term statistical information about rewards is what I call the long-term memory of this type of learning machine.

So a reinforcement learning machine that can learn to respond to different temporal sensory patterns will need a form of short-term memory, which is the state that defines the current sensory context the machine is responding to, and a long-term memory of the statistical value of every atomic behavior the machine can produce.
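
Putting those two memories together, the skeleton I have in mind looks roughly like this (a sketch; the window length, the reward bookkeeping, and the names are arbitrary choices):

    from collections import defaultdict, deque

    class TemporalRLAgent:
        # Short-term memory: a sliding window of recent sensor readings, which
        # *is* the state the machine reacts to.
        # Long-term memory: lifetime average reward seen for each (state, behavior).
        def __init__(self, behaviors, window=10):
            self.recent = deque(maxlen=window)       # short-term memory
            self.reward_sum = defaultdict(float)     # long-term memory
            self.reward_count = defaultdict(int)
            self.behaviors = behaviors

        def act(self, sensor_reading):
            self.recent.append(sensor_reading)
            self.last_state = tuple(self.recent)     # the temporal context being reacted to

            def average_value(behavior):
                n = self.reward_count[(self.last_state, behavior)]
                return self.reward_sum[(self.last_state, behavior)] / n if n else 0.0

            self.last_behavior = max(self.behaviors, key=average_value)
            return self.last_behavior

        def reward(self, r):
            key = (self.last_state, self.last_behavior)
            self.reward_sum[key] += r                # lifetime reward statistics
            self.reward_count[key] += 1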

Right. They are normally summarized because the behaviors you want to produce are seldom a function of all the input data, but just of some limited amount of information in the inputs. The light sensor might tell you exactly how long the light has been on, and off, and what the light level is. But if the robot only needs to count light flashes, it can turn the light level into a simple ON or OFF value by comparing it to a threshold, and it can track off-to-on transitions and count up to 2. All other information in the sensor can just be filtered out, or thrown away, and not recorded in the state.

Most systems will have more information flowing into them from the sensors than flowing out in terms of output behaviors. This is very true for humans, and I assume for most if not all animals. The data flowing into the system through the sensors is far greater than what it needs to send out - so much data is filtered in the process - we see things and just ignore them - we don't even realize we saw them.

Yeah, he's really just getting into semantics there. His approach is a different way of structuring the specification, but like object-oriented vs. structured programming, the difference is only semantic and not substantive - they are Turing machines either way.

If there are no limits on the number of states his modules can have, then they are complete Turing machines - even if it might quickly become impractical to specify a module when it has 2^1000 states (aka 1000 bits of state information).

Yes, exactly. The purpose of the states is only to allow the machine to produce the outputs it needs to produce. If you know exactly under what conditions it needs to produce the different outputs, you only need enough state information to track the sensory information which is important to those output decisions. And for many practical problems, that means very little state information is needed. But if you want to look at how much is available, then you see there is a huge amount of information in the sensory inputs which, for a given need, we don't care about, but which is there nonetheless (like the timing of the keys hit, measured to microsecond accuracy, and how long he held down each key). All that is in the sensory data - but since we don't need it for a given application, we don't save it - we filter it out.

But, as I talked about above, if you are trying to build a learning machine (like I play with), and don't know ahead of time what sensory information is going to be needed, the problem gets a bit more interesting. The system must save as much information as it can.

It's a term I'm learning to use because of my recent exposure to robotics. It's a good term.

Yeah, this is where getting into Robotics is exposing me to some useful new ideas and techniques. It's why I think robotics is good for AI because it puts some simple real world problems in front of you to solve that are important stepping stones to harder AI problems (such as how to solve generic sensor fusion problems for a generic learning machine when you don't know ahead of time what the sensory data "means" and what needs to be fused or how it needs to be fused).

Reply to
Curt Welch

Now this helps, because I can remember seeing your robot come to a point and slow down to an "almost stop" and then slowly begin twisting away from some obstacle.

Talk about "emergence" and "in the eye of the beholder" - my external world view (more assignment of anima) to its action at that point, was, it is thinking of which way to go, and taking a very long time to consider it carefully with many sonar probings.

Instead you make me now aware, it was stuck against a flat spot and waiting to fall off one side or the other. Much less romantic.

This fits with the very slow turn away I saw.

As an aside, I hate the pencil balancing on its tip example, because it is offered as an objection to something I'm working on in my GR research.

Or I saw it's actions as louder than your words. :->

Yes, I saw that comment last time you made it, that I look at everything as a state machine and can't help myself, and grumbled at the suggestion, then too. I don't think you are correct on my bias here, but I can certainly see why you might think so.

Actually, I think I'm going to start a new thread, because I found a new comment from Brooks in Flesh and Machines that seems to apply to this discussion, and also fits my preset bias to see things in the light of state.

But in this case, I heard you say it was continuous, and I am willing to believe you, but I wanted to ferret into it rather deeply to be sure there isn't some hidden trick that "tipped the pencil". So if I have a bias, it is probably completely antipodal to yours. That is, I suspect people are so biased about hiding state information that they will do it, and swear up and down they haven't. Yet if you dig a little, you find it there.

For instance, I still wouldn't be surprised to find out you have a spot of code somewhere that says: if I've been sitting still over 20 seconds, see if we can't introduce some random noise, so we turn off to one side or another.

I was at the last two looking for a discussion; didn't see you.

Randy

Reply to
RMDumse

I invite you to join the new thread, "Syntax and robot behavior." I propose that what Brooks is really saying about the differences between animals and men is exactly the ability to store and respond to state. Our syntax, like your example above of using Morse code, is our ability to store state information, which animals have a much lesser ability to do.

-- Randy M. Dumse

Caution: Objects in mirror are more confused than they appear.

Reply to
RMDumse

I hear you do not agree, and wish to "give up".

I still think it is a strong argument, and this may or may not interest you, but if anyone else has interest, I'll continue a bit why.

Learning Tagalog (a language from the Philippines) was about the hardest thing I ever tried. It should be a simple language, you'd think. It has fewer sounds than many others. Tagalog has only 21 phonemes: 16 consonants and 5 vowels. Syllable structure is relatively simple. Each syllable contains at least a consonant and a vowel. So as a result you get lots of a-ba-ca-na sounding syllables. It is a very phonetic language, pronounced very much like it is written. And while we here in the States tend to remember the place of MacArthur's leaving, and the death march, as Bataan (didn't you pronounce it Ba-tawn?), the pronunciation in the Philippines is Ba Ta An, with each syllable getting an almost equal emphasis (at least to our ears).

My point here is that to speak in Tagalog, you get a limited set of sounds you can make. You can still talk about anything you want. But the sounds you can make are limited, and you tend to make lots more of them. You use many more simple syllables in series. But you can say the same thing in both languages.

But since we think in terms of the language we know, native Tagalog speakers don't have "q's" and "th's" in their thoughts. Many of them do speak English, so many have the ability to make far more sounds, but when it comes to communicating in pure Tagalog, those sounds just aren't there. And unless they are very, very practiced, when they are speaking English... those sounds are often absent, too. "Tagalish tainted," as it were.

But! If you want to speak Tagalog, you learn a set of sounds, and that's all you need, that's all you'll think with, and that's all you'll need.

The parallel I see for robotics is we can come up with a set of atomic behaviors, and while it might be possible for some special effort to come up with a new one, there is still some basic limited set.

So was it wrong, or unfruitful, or somehow limiting to the human creativity of the Filipino people, that someone counted the basic atomic sounds as apparently expressed in their (most popular) language?

Or are Russian speakers, with their 33-character alphabet, somehow automatically smarter than either Filipinos or 'canos because they have a richer set of phonemes? Or can you still express the same concepts in all three languages by using enough syllables to come to the same information content? And in all three cases, why did they stop at 21, 26, or 33 letters/phonemes or what have you? Why a limited, loosely-packed data set?

Randy

Reply to
RMDumse

I think most of our brain is used the same way most of an animal brain is used. The state is stored in the activity of the neurons and for the most part, the higher animals have very similar brains with fairly similar state.

Just learning how to walk around and do simple things, like pick up objects or follow something, or to see something we want (like food) and navigate over to it, takes up a huge amount of our brain power - yet dogs and monkeys do this as well.

What I think we have that's important is this ability to have memories of past events (and to fabricate new ideas - which I think is nothing more than the brain merging together past memories). Plus, our language skills. But I think the part of the brain that gives us our language skill is really no different than the parts which allow us to do all the same things the animals do. It's just because we have a small section isolated in the correct way that it can be used for language processing that gives us all these extra language powers. I think the idea that we have "syntax" and the animals don't is about as far off the mark as you can get. They have all the same syntax powers we have and they use it all the time in order to interact with their environment. In all cases, the problems are the same, the system must recognize temporal patterns, and learn to produce different behaviors to different temporal patterns. To learn to walk we must process temporal patterns in our vision and sensory data and produce leg motions to keep us from running into things.

The difference is only in the length, and the nature, of the temporal syntactic patterns the system is optimized to respond to. When we move, we need enough temporal pattern memory to be able to predict what the things around us are doing - how they are moving relative to us. You don't need 10 seconds of temporal pattern memory. Instead, you need more like 0.5 seconds of temporal pattern memory. But it needs to be a fairly high-resolution memory to be able to do things like jump over a rock and land without losing your balance, etc.

Yet, to understand and react correctly to language, you need temporal pattern memory of many seconds. The context created by the words spoken a few seconds ago changes the meaning of the words we hear. That means the language processing part of the brain is creating different reactions to temporal patterns that go back many seconds.

What Brooks is talking about as syntax, I think, is nothing more than the same basic temporal pattern reaction skill that exists throughout the brain. But for the language section of our brain, it's configured to give us many seconds of low-resolution pattern matching, instead of the normal configuration of a very high-resolution, but very short, temporal pattern matching system used for driving all our normal physical behaviors.

Animals can't understand language not because they don't have syntax, but only because their syntax scope is tuned for very short intervals.

They can't understand time like we do, because they don't have the memory we do - the ability to call up past brain states. If you can't call up past brain states to compare side by side with current state, then you have no way to understand time like we understand it.

I suspect both of these skills are just minor tweaks to the basic brain design which is why our brains don't seem very different from that of a chimp for example.

I think the brain is just a temporal reaction machine trained by reinforcement learning. It creates brain state from sensory data which represents the recent temporal patterns that have happened in the sensory data and it produces behaviors as a direct reaction to that state. The abilities we have over the animals are not because we have special hardware in our brain, we just have the same temporal reaction hardware the chimps have except some of it has been configured and tuned slightly differently to allow it to be used for this specialized function of language which requires a much longer short term temporal memory.

Reply to
Curt Welch

1 and 0 works nicely. :)

It was probably optimized to best fit the structure of the brains and their mouths and their ears. I suspect, if you could find people in each of these cultures who haven't had their DNA too mixed with other cultures yet, you could probably do some testing and find that people from cultures with languages with higher phoneme counts had better phoneme discrimination skills.

It would be hard to test since someone growing up with that language will have their brain tuned to the needs of the language - so you would have to figure out a way to do testing before the brain had been trained by their language. Maybe adopted babies from one culture to another where they never learned the natural language of their culture would be a way to find some data.

Speech and language skills certainly can vary from person to person so it would make sense to me that an isolated culture might have developed different speech and language skills and that the language they created would match that.

The advantage to using more phonemes is that communication can be faster - you need to produce fewer syllables to communicate the same idea, i.e., you can communicate faster. We assign meaning to all the shortest sounds first, and as they get used up, new ideas are assigned to longer sequences. Words change over time so that their length is adjusted to match their frequency of use, to keep the communication channel optimized (as per Huffman encoding, to maximize information flow). Cellular phone becomes cell phone, becomes cell, as in, "give me a call on my cell," as its frequency of use in our life rises.
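
The arithmetic behind that tradeoff is just a change of base: to keep V words distinct you need roughly log(V)/log(N) phonemes per word, so a bigger phoneme inventory N buys shorter words. A toy calculation (the 50,000-word vocabulary is a made-up number, and it ignores which sequences are actually pronounceable):

    import math

    def min_word_length(vocabulary_size, phoneme_count):
        # Minimum phonemes per word needed to keep vocabulary_size words
        # distinct, assuming every phoneme sequence were usable.
        return math.log(vocabulary_size) / math.log(phoneme_count)

    # For a 50,000-word vocabulary:
    #   21 phonemes (Tagalog-like) -> about 3.6 phonemes per word, minimum
    #   33 phonemes (Russian-like) -> about 3.1
    #    2 symbols  (binary)       -> about 15.6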

So for the same reason, there is pressure to use as many phonemes as the body can produce, and the ear/brain can distinguish, without increasing the error rate. You would then expect the language used by a culture to drift to optimal usage patterns based on the limits of their brain and sound producing hardware.

I suspect if you were to test people, you could actually see this difference.

I know I, for example, have a hell of a hard time telling a short e from a short i (as one of many issues I have with language). I suffered through grade school with speech problems, and spelling has likewise been a struggle my whole life. I suspect most of it has been created by a limited sound discrimination ability (on pitch discrimination tests I score way below average, even though I test normal on standard hearing tests based on volume). And likewise, my tonal memory skills are way below average, whereas my visual memory skills are way above average. I can't hear the difference between pin and pen, yet when someone pronounces them correctly side by side quickly, I can hear a difference. If more people were like me, then English would have changed by now.

Maybe my brain is better fit for a language like Tagalog? :)

Reply to
Curt Welch
