Generate emergent behavior in AI characters using Unreal Engine and the free MindMaker machine learning plugin

In the following article I discuss how to generate emergent behavior in AI characters using Unreal Engine, Reinforcement Learning, and the free machine learning plugin MindMaker. The aim is that the interested reader can use this as a guide for creating emergent behavior in their own game project or embodied AI character.

What is Emergent Behavior?

First, a little primer on emergent behavior. Emergent behavior refers to behaviors that are not pre-programmed but develop organically in response to environmental stimuli. Emergent behavior is common to many if not all forms of life, being a function of evolution itself. It has also, more recently, become a feature of embodied artificial agents. When one employs emergent behavior methods, one does not rigidly program specific actions for the AI but instead allows its behavior to “evolve” through some adaptive algorithm such as genetic programming, reinforcement learning, or Monte Carlo methods. In such a setup, the behavior is not foretold at the start but allowed to “emerge” through a cascading series of events that rely, to some degree, on chance. This has the advantage of creating AI agents that behave more like carbon-based life forms, are less predictable, and show a greater diversity of strategic behaviors.

A wide variety of embodied AI agents designed for entertainment or human interaction can benefit from emergent behavior, appearing less static and more engaging to their users. A recent journal article in Nature demonstrated that robots showing increased variability in response times and motion patterns were perceived as more human.

Emergent behavior can also overcome obstacles that other forms of AI, created with traditional programming methods, cannot. By allowing the AI to discover solutions that were not foreseen by the developer, the agent can explore a much broader solution space than would be possible with traditional programming methods. It is even possible that emergent behavior methods may lead to the creation of Artificial General Intelligence, an AI that rivals the diversity of skills humans possess.

There is a caveat to using emergent behavior methods, however. If you want to create a very specific behavior, for instance mimicking precisely and repeatedly the actions of a specific animal, emergent behavior techniques may not be the best option. Emergent behavior is not well suited to repetitive, high-fidelity tasks where no variation is desired. This is one of the reasons why humans and other organic lifeforms may struggle at highly repetitive tasks: we are variable by nature. Furthermore, when trying to replicate specific behaviors, it can be difficult to recreate the same chance events that led to that action pattern emerging in the real world. For instance, there may be a wide variety of solutions to the problem that a certain biological adaptation or behavior evolved to fulfill, with the outcome modulated by chance occurrences. Therefore, not all emergent methods will necessarily arrive at the same solution. An example is the way convergent evolution recreated the eye on several occasions, but not necessarily with exactly the same form.

Emergent behavior methods are good for creating a diversity of behaviors, some of which might be similar to the behaviors of real animals but are unlikely to be EXACTLY the same. On the other hand, one can expect plenty of interesting and unique behaviors to be generated with these methods. For example, one can look at the variety of locomotion methods discovered by a reinforcement learning agent to traverse a landscape.

The first step in creating emergent behavior is to decide which behavior one would like to make emergent and what the goal of that behavior will be. This in some sense is the easy part. The next portion is slightly more complicated and requires an overview of some theory on emergence.

Different Types Of Emergent Behavior

When one addresses topics of emergence, there is a distinction to be made between open-ended emergence and static or “fixed point” emergence. In fixed point emergence, a variety of behaviors may emerge at the outset, but they will steadily converge to a single solution or strategy, and the amount of emergence will decrease over time. In essence, this happens when there is a static global solution to the problem the behavior is employed to solve. Consider the game of tic-tac-toe: while an AI employing emergent techniques such as reinforcement learning or genetic algorithms might initially display a wide variety of strategies, it would fairly quickly converge to the single dominant solution for the game. At this point no further emergence or new strategies would appear.

What I mean when I speak of emergent behavior, and I believe what people generally think of when they approach the topic, is “open-ended emergence”. In this form of emergence there is no fixed solution, and an agent will continue generating new behaviors perpetually. Creating such open-ended emergence can be more complicated than creating fixed point emergence, and one must consider the structures necessary to produce it. Thankfully, there has recently been some progress in understanding and codifying the requirements for open-ended emergence, in particular the work done by Joel Leibo and others on autocurricula. In a seminal paper, “Autocurricula and the Emergence of Innovation from Social Interaction,” the authors lay out some of the conditions under which one can expect open-ended emergence to occur. The following tabulation will help in understanding when to expect open-ended emergence.

Recipes for Open-Ended Emergence

1. Where the solution set to the behavior or goal is extremely large and therefore, for all intents and purposes, the amount of emergence will see no top end during the time period under examination. This is not true open-ended emergence, in that there may exist global solutions which, given enough time, the agent would discover and then cease to adapt. In many cases, however, this could be beyond the time span of a human life, so from the perspective of an observer the emergence would be open-ended. An example might be the game of chess, where a global solution is thought to exist, as in tic-tac-toe, but due to the complexity of the game it has not yet been discovered and is unlikely to be in our lifetime.

2. Another recipe for creating open-ended emergence is to employ a behavior or goal state that depends on an environment which is continually changing. Think of the carbon-based life forms on Earth: the environment is continually shifting due to planetary climate conditions, ensuring that animals must always adapt to survive, and there is no upper boundary on the level of emergence.

3. A third recipe for open-ended emergence is multi-agent scenarios where cooperation or competition is involved and agents are employing some adaptive learning strategy. In such a situation, an agent must adapt to the strategy of its companion or competitor, and this in turn ensures that the other must adapt as well, creating a feedback loop of continual adaptation, a kind of evolutionary arms race. While this can lead to emergence, it can also lead to cycles of repeating behavior which loop indefinitely until some element in the environment nudges them out of the cycle. It is not a true equilibrium, in that the behavior or strategy isn't fixed, but the cycle itself becomes a kind of equilibrium. Ensuring that the environment is sufficiently complex and dynamic is one way to avoid these cyclical behaviors or strategies.

Demonstrating Open-Ended Emergence

In the following example created with Unreal Engine, I chose to use the third recipe listed above: a multi-agent scenario involving populations from two virtual species. We will call one of them rhinos and the other tigers. Both groups explore their environment looking for “berries”, which take the form of large circular objects in the Unreal Engine game environment. To make matters more complex, a random assortment of these berry patches is populated by wild birds which can screech, scaring off tigers and rhinos alike. However, if a tiger or rhino teams up with another individual to approach the berries together, they scare the birds into silence. They can now access the berries, though they will have to split them between each other.

In addition to this cooperative behavior, they have a competitive behavior in which they can issue a roar (tigers) or start stomping (rhinos), scaring off an agent from the opposing species but also costing themselves valuable energy. In many ways this scenario replicates some of the cooperative and competitive trade-offs available to animals in nature and, for that matter, human corporations.
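To make those trade-offs concrete, the sketch below shows one way the scenario's payoffs could be encoded as a reward function. The specific numbers (berry value, energy cost of roaring or stomping, the penalty for being scared off) are hypothetical assumptions rather than the project's actual tuning; they simply illustrate how cooperation halves the berry payoff while removing the bird risk, and how aggression imposes an energy cost.

```python
# A hypothetical reward function for the tiger/rhino berry scenario.
# All numeric values are illustrative assumptions, not the project's actual tuning.

BERRY_VALUE = 10.0        # reward for eating an unguarded berry patch alone
AGGRESSION_COST = 3.0     # energy spent roaring (tiger) or stomping (rhino)
SCARED_OFF_PENALTY = 2.0  # cost of fleeing a screeching bird or an aggressor

def step_reward(action, patch_has_birds, has_partner, was_scared_off):
    """Return the reward an agent receives for one interaction with a berry patch."""
    if was_scared_off:
        return -SCARED_OFF_PENALTY

    if action == "eat":
        if patch_has_birds and not has_partner:
            # Birds screech and drive the lone agent away.
            return -SCARED_OFF_PENALTY
        if has_partner:
            # Cooperating silences the birds, but the berries are split.
            return BERRY_VALUE / 2.0
        return BERRY_VALUE

    if action in ("roar", "stomp"):
        # Aggression scares off a competitor but burns energy.
        return -AGGRESSION_COST

    return 0.0  # idle / move actions carry no immediate reward
```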

To compete or to cooperate is one of the oldest questions, and by simulating this among virtual agents, we can gain valuable insights into the conditions that lead to various competitive or cooperative outcomes. Using such virtual agents, we can also engender much of the same behavioral complexity that one might observe in the animal kingdom or perhaps even in humans. A recent paper published by OpenAI demonstrated the emergence of tool use as a function of multi-agent competition among AI characters playing the game of hide-and-seek.

Something similar can be done with any population of adaptive virtual agents playing a game with each other in which the payoffs depend on the other players' strategies and the environment is affected by their behaviors. The goal can be anything; the only requirement is that there be some means for the agents to receive feedback about their environment, including actions taken by the competitor, and the ability to vary their own actions in response. Whether this goal is set by the player or by the programmer does not make much of a difference in terms of complexity. Choosing to have it set by the player could invite interesting options regarding the narrative of a game. The important thing to note is that it doesn't really matter who is selecting the goal of the behavior, only that it meets the requirements above for competition or cooperation.
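In Gym terms (the format used later in this article), that feedback requirement amounts to defining an observation space that exposes the relevant environment state, including what the competitor just did, and an action space the agent can vary in response. The sketch below is a hypothetical illustration of such spaces for the berry scenario; the exact fields and ranges are assumptions, not the project's actual definitions.

```python
import numpy as np
from gym import spaces

# Hypothetical observation: [distance to nearest berry patch,
#                            patch has birds (0/1),
#                            a partner is nearby (0/1),
#                            competitor's last action (0=idle, 1=eat, 2=roar/stomp),
#                            own remaining energy]
observation_space = spaces.Box(
    low=np.array([0.0, 0, 0, 0, 0.0], dtype=np.float32),
    high=np.array([1000.0, 1, 1, 2, 100.0], dtype=np.float32),
)

# Hypothetical actions the agent can vary in response:
# 0 = move toward berries, 1 = eat, 2 = roar/stomp, 3 = team up with a partner
action_space = spaces.Discrete(4)
```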

But how does the emergence actually “emerge”? One way is by using reinforcement learning or genetic algorithms. In our case we have a population of virtual agents employing random changes in their behavior to see if the goal state is affected. Agents then prioritize those behaviors that lead to good results. In genetic algorithms this is done by a kind of crossover function that occurs within the population, where randomly occurring good adaptations are conserved and the comparatively not-so-useful ones are discarded, as determined by a fitness function. This doesn't need to look like sexual reproduction or anything of the kind, since the existing agents can be seamlessly replaced by their adapted counterparts without the user seeing any of the crossover that is occurring. In our example we will be using reinforcement learning, an algorithmic technique I have covered extensively in another series of articles.
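For readers unfamiliar with that mechanism, here is a minimal, generic sketch of a genetic algorithm's select-crossover-mutate loop. It is not the project's code (the article uses reinforcement learning instead), and the fitness function and parameter vectors are placeholders standing in for whatever scores the behavior a genome produces.

```python
import random

def fitness(genome):
    """Placeholder fitness function: higher is better.
    In a real project this would score the behavior the genome produces."""
    return -sum((g - 0.5) ** 2 for g in genome)

def evolve(pop_size=50, genome_len=8, generations=100, mutation_rate=0.1):
    # Start from a random population of parameter vectors ("genomes").
    population = [[random.random() for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half, discard the rest.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Crossover: children mix parameters from two surviving parents.
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, genome_len)
            child = a[:cut] + b[cut:]
            # Mutation: occasionally perturb a parameter to keep exploring.
            if random.random() < mutation_rate:
                child[random.randrange(genome_len)] = random.random()
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
```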

Similar to genetic algorithms, reinforcement learning relies upon the law of large numbers; that is, given enough random behavior, good actions will emerge and can be prioritized in the future. Initially the agent chooses a random series of actions, but if they lead to the agent receiving a reward, the value of that reward gets attributed to the actions the agent took to receive it, and the agent is therefore more likely to repeat those actions in the future. When this process of random actions is repeated sufficiently, those actions that are causally related to the agent receiving a reward will be distinguished from those that were mere chance events. This is why reinforcement learning is, at heart, a causal algorithm and can be used to detect cause and effect relationships.
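In its simplest tabular form, that credit-assignment step can be written as a one-line value update. The sketch below is a generic Q-learning update, shown only to make the "reward gets attributed to the action" idea concrete; the actual project uses the neural-network-based algorithms from Stable Baselines rather than a lookup table.

```python
from collections import defaultdict
import random

Q = defaultdict(float)   # estimated value of each (state, action) pair
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.2

def choose_action(state, actions):
    # Mostly pick the highest-valued action, sometimes explore at random.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    # Attribute the observed reward (plus discounted future value)
    # back to the action that produced it.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```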

One can visualize this process like so.

[Figure omitted: visualization of the reinforcement learning process. Image by Author]

For this project I employed the free MindMaker plugin and the Stable Baselines reinforcement learning suite of algorithms. With MindMaker, we can easily deploy a variety of RL algorithms in the Unreal Engine game environment using the OpenAI Gym format. This provides a universal structure for deploying reinforcement learning algorithms on a variety of simulated environments.
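To illustrate the Gym format the plugin builds on, here is a minimal sketch of a custom environment trained with a Stable Baselines algorithm. It is a stand-alone toy written against the classic Gym API (as used with stable-baselines3 1.x), not MindMaker's actual code: in the real setup the plugin relays observations, actions, and rewards between Unreal Engine and the Python learning process, whereas here the environment logic is simulated directly in Python.

```python
import gym
import numpy as np
from gym import spaces
from stable_baselines3 import PPO

class ToyBerryEnv(gym.Env):
    """Toy stand-in for the Unreal scenario: eat only when the patch is safe."""

    def __init__(self):
        super().__init__()
        # Observation: [patch has birds (0/1), partner nearby (0/1)]
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(2,), dtype=np.float32)
        # Actions: 0 = wait, 1 = eat
        self.action_space = spaces.Discrete(2)

    def reset(self):
        self.state = np.random.randint(0, 2, size=2).astype(np.float32)
        return self.state

    def step(self, action):
        birds, partner = self.state
        if action == 1:  # try to eat the berries
            reward = -1.0 if (birds and not partner) else (0.5 if partner else 1.0)
        else:
            reward = 0.0
        self.state = np.random.randint(0, 2, size=2).astype(np.float32)
        return self.state, reward, True, {}  # one-step episodes for simplicity

model = PPO("MlpPolicy", ToyBerryEnv(), verbose=0)
model.learn(total_timesteps=10_000)
```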

Summary and Future Developments

The simple setup described above results in a situation where the behavior of the two digital species is unlikely to reach a stable equilibrium. The random distribution of birds at the berry patches ensures that the environment is always changing, and this in turn keeps changing the potential benefits of cooperation or competition for the agents. The activities of the agents themselves also affect the ratio of berry patches with birds to those without: as more agents cooperate with each other, the proportion of berry patches without birds increases, which in turn reduces the incentive to cooperate.
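That push and pull can be sketched with a toy simulation, shown below. The update rules and constants are hypothetical assumptions, intended only to show how the fraction of bird-occupied patches and the cooperation rate can keep nudging each other around rather than settling into a fixed point.

```python
# Toy simulation of the feedback loop described above (all constants are assumptions).
birds_fraction = 0.5   # fraction of berry patches currently occupied by birds
coop_rate = 0.5        # fraction of agents choosing to cooperate

for step in range(20):
    # Cooperating pairs silence birds, lowering the occupied fraction;
    # birds also randomly resettle patches each step.
    birds_fraction = max(0.0, min(1.0, birds_fraction - 0.3 * coop_rate + 0.2))
    # The more bird-free patches there are, the less reason to team up.
    coop_rate = max(0.0, min(1.0, coop_rate + 0.3 * (birds_fraction - 0.5)))
    print(f"step {step:2d}: birds={birds_fraction:.2f}, cooperation={coop_rate:.2f}")
```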

The idea is that these interacting forces create an environment that is always fluctuating and unstable, driving strategic adaptation. Given that our agents have minimal ways to interact with their environment (they cannot trap their competition or hide the berries using objects in the environment), our paradigm is unlikely to result in something as interesting as emergent tool use. It does, however, avoid a fixed equilibrium and creates a continually shifting assortment of cooperative and competitive strategies. This can provide a far more interesting set of behaviors than typically appears in open-world video games. I believe it is likely, therefore, that emergent behavior techniques such as the ones outlined here will dominate the next generation of video game AI, creating ever more interesting and alluring characters and behavioral repertoires.

Aaron Krumins is the author of “Outsmarted — The Promise and Peril of Reinforcement Learning” and has a background in web applications and machine learning. He currently works as a freelance machine learning consultant.
