# Battle
| Import | `from magent2.environments import battle_v4` |
|---|---|
| Actions | Discrete |
| Parallel API | Yes |
| Manual Control | No |
| Agents | `agents= [red_[0-80], blue_[0-80]]` |
| Num Agents | 162 |
| Action Shape | (21) |
| Action Values | Discrete(21) |
| Observation Shape | (13,13,5) |
| Observation Values | [0,2] |
| State Shape | (45, 45, 5) |
| State Values | (0, 2) |
A large-scale team battle. Agents are rewarded for their individual performance, not for the performance of their neighbors, so coordination is difficult. Agents slowly regain HP over time, so it is best to kill an opposing agent quickly. Specifically, each agent has 10 HP, takes 2 HP of damage from each attack, and recovers 0.1 HP every turn.
As in all MAgent2 environments, agents can either move or attack each turn. An attack against another agent on the same team is not registered.
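To see why killing quickly matters, here is a rough back-of-the-envelope sketch of the HP arithmetic above. It assumes attackers each hit once per turn and that recovery is applied between turns; the exact within-turn ordering of damage and recovery is not specified on this page.

```python
def turns_to_kill(attackers: int, hp: float = 10.0,
                  damage: float = 2.0, recovery: float = 0.1) -> int:
    """Rough estimate of turns needed to kill one target when
    `attackers` agents all hit it once per turn (assumption:
    recovery is applied between turns, not mid-turn)."""
    turns = 0
    while hp > 0:
        hp -= attackers * damage
        turns += 1
        if hp > 0:
            hp += recovery  # target regains 0.1 HP per turn
    return turns

print(turns_to_kill(1))  # 6 turns for a lone attacker
print(turns_to_kill(2))  # 3 turns when two agents focus fire
```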
## Arguments
```python
battle_v4.env(map_size=45, minimap_mode=False, step_reward=-0.005,
dead_penalty=-0.1, attack_penalty=-0.1, attack_opponent_reward=0.2,
max_cycles=1000, extra_features=False)
```
`map_size`
: Sets the dimensions of the (square) map. Increasing the size increases the number of agents. Minimum size is 12.

`minimap_mode`
: Turns on global minimap observations. These observations include your and your opponents' piece densities binned over the 2D grid of the observation space. Also includes `agent_position`, the absolute position on the map (rescaled from 0 to 1).

`step_reward`
: Reward added after every step.

`dead_penalty`
: Reward added when an agent is killed.

`attack_penalty`
: Reward added when attacking anything.

`attack_opponent_reward`
: Reward added for attacking an opponent.

`max_cycles`
: Number of frames (a step for each agent) until the game terminates.

`extra_features`
: Adds additional features to the observation (see tables below). Default False.
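A minimal usage sketch, assuming a standard MAgent2 installation: it constructs the environment with the default arguments above and steps it with random actions via the usual PettingZoo AEC loop.

```python
from magent2.environments import battle_v4

# Defaults shown explicitly for clarity; they match the signature above.
env = battle_v4.env(map_size=45, minimap_mode=False, step_reward=-0.005,
                    dead_penalty=-0.1, attack_penalty=-0.1,
                    attack_opponent_reward=0.2, max_cycles=1000,
                    extra_features=False)

env.reset(seed=42)
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        action = None  # terminated/truncated agents must be stepped with None
    else:
        action = env.action_space(agent).sample()  # random placeholder policy
    env.step(action)
env.close()
```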
## Action space

Key: `move_N` means N separate actions, one to move to each of the N nearest squares on the grid.

Action options: `[do_nothing, move_12, attack_8]`
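Assuming the three groups are laid out contiguously in the listed order (index 0 = do_nothing, indices 1-12 = moves, indices 13-20 = attacks), the flat Discrete(21) index can be decoded as in this hypothetical helper, which is illustrative only and not part of the magent2 API:

```python
def describe_action(a: int) -> str:
    """Hypothetical decoder for a Discrete(21) Battle action, assuming
    the documented ordering [do_nothing, move_12, attack_8]."""
    assert 0 <= a < 21
    if a == 0:
        return "do_nothing"
    if a <= 12:
        return f"move_{a - 1}"   # one of the 12 nearest squares
    return f"attack_{a - 13}"    # one of the 8 attackable squares

print(describe_action(0))   # do_nothing
print(describe_action(5))   # move_4
print(describe_action(20))  # attack_7
```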
## Reward

Reward is given as:

- 5 reward for killing an opponent
- -0.005 reward every step (`step_reward` option)
- -0.1 reward for attacking (`attack_penalty` option)
- 0.2 reward for attacking an opponent (`attack_opponent_reward` option)
- -0.1 reward for dying (`dead_penalty` option)

If multiple options apply, rewards are added.
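For instance, with the default options, an agent whose attack kills an opponent on a given step accrues the step reward, the attack penalty, the attack-opponent reward, and the kill reward all at once:

```python
# Worked example with the default reward options.
step_reward = -0.005
attack_penalty = -0.1
attack_opponent_reward = 0.2
kill_reward = 5  # fixed reward for killing an opponent

total = step_reward + attack_penalty + attack_opponent_reward + kill_reward
print(total)  # 5.095
```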
## Observation space

The observation space is a 13x13 map with the channels below (in order):
| feature | number of channels |
|---|---|
| obstacle/off the map | 1 |
| my_team_presence | 1 |
| my_team_hp | 1 |
| my_team_minimap (minimap_mode=True) | 1 |
| other_team_presence | 1 |
| other_team_hp | 1 |
| other_team_minimap (minimap_mode=True) | 1 |
| binary_agent_id (extra_features=True) | 10 |
| one_hot_action (extra_features=True) | 21 |
| last_reward (extra_features=True) | 1 |
| agent_position (minimap_mode=True) | 2 |
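The channel counts above account for the shapes quoted at the top of the page: with both flags off, only the first five single-channel features are present, giving the default (13,13,5) observation. A small bookkeeping sketch, derived purely from this table:

```python
def obs_channels(minimap_mode: bool = False, extra_features: bool = False) -> int:
    """Illustrative channel count, derived from the observation table."""
    channels = 5                 # obstacle + my presence/hp + other presence/hp
    if minimap_mode:
        channels += 1 + 1 + 2    # my/other team minimaps + agent_position
    if extra_features:
        channels += 10 + 21 + 1  # binary_agent_id + one_hot_action + last_reward
    return channels

print(obs_channels())                     # 5  -> shape (13, 13, 5)
print(obs_channels(extra_features=True))  # 37
```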
## State space

The state space is a 45x45 map. It contains the following channels (in order):
| feature | number of channels |
|---|---|
| obstacle map | 1 |
| team_0_presence | 1 |
| team_0_hp | 1 |
| team_1_presence | 1 |
| team_1_hp | 1 |
| binary_agent_id (extra_features=True) | 10 |
| one_hot_action (extra_features=True) | 21 |
| last_reward (extra_features=True) | 1 |
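With the default flags, the five single-channel features above give the (45, 45, 5) state shape quoted at the top of the page. Assuming the environment exposes the optional PettingZoo `state()` accessor (worth verifying against your installed magent2 version), the global state can be inspected like this:

```python
from magent2.environments import battle_v4

env = battle_v4.env(map_size=45)
env.reset(seed=0)

# state() is PettingZoo's optional global-state accessor; with default
# flags we expect shape (45, 45, 5), per the table above.
print(env.state().shape)
env.close()
```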
## Version History

- v0: Initial MAgent2 release (0.3.0)