We previously learned how to create simple custom Gym environments and how to build robotics simulations with the Pybullet engine. We will now combine these two skills to implement custom robotics environments that can be used to train RL agents.
I implemented some custom environments here. Please follow the installation instructions. Some environments require ROS, a set of software libraries for building robot applications. We will learn how to use it in a future post.
The Pybullet environments require an XML file (generally in URDF, SDF or MJCF format) that describes the robot geometry and physical properties.
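To make "geometry and physical properties" concrete, here is a minimal sketch that parses a tiny URDF string with Python's standard library and lists its links and joints. The robot (`toy_bot`), its dimensions and masses, and the helper `summarize_urdf` are all made up for illustration; they are not taken from any of the environments below.

```python
import xml.etree.ElementTree as ET

# Hypothetical URDF: a cubic body and one wheel attached by a continuous joint.
TOY_URDF = """
<robot name="toy_bot">
  <link name="body">
    <inertial>
      <mass value="1.0"/>
      <inertia ixx="0.01" ixy="0" ixz="0" iyy="0.01" iyz="0" izz="0.01"/>
    </inertial>
    <visual>
      <geometry><box size="0.2 0.2 0.2"/></geometry>
    </visual>
    <collision>
      <geometry><box size="0.2 0.2 0.2"/></geometry>
    </collision>
  </link>
  <link name="wheel">
    <inertial>
      <mass value="0.1"/>
      <inertia ixx="0.001" ixy="0" ixz="0" iyy="0.001" iyz="0" izz="0.001"/>
    </inertial>
    <visual>
      <geometry><cylinder radius="0.05" length="0.02"/></geometry>
    </visual>
  </link>
  <joint name="body_to_wheel" type="continuous">
    <parent link="body"/>
    <child link="wheel"/>
    <origin xyz="0 0.11 -0.1" rpy="0 0 0"/>
    <axis xyz="0 1 0"/>
  </joint>
</robot>
"""


def summarize_urdf(urdf_text):
    """Return (robot name, {link: mass}, {joint: (type, parent, child)})."""
    root = ET.fromstring(urdf_text.strip())
    links = {}
    for link in root.findall("link"):
        mass = link.find("inertial/mass")
        links[link.get("name")] = float(mass.get("value")) if mass is not None else None
    joints = {}
    for joint in root.findall("joint"):
        joints[joint.get("name")] = (
            joint.get("type"),
            joint.find("parent").get("link"),
            joint.find("child").get("link"),
        )
    return root.get("name"), links, joints


name, links, joints = summarize_urdf(TOY_URDF)
print(name, links, joints)
```

Links carry the geometry (visual and collision shapes) and the inertial properties, while joints describe how links move relative to each other. In Pybullet, a file like this would typically be loaded with `pybullet.loadURDF`.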
## Environments description
Name | Action space | Observation space | Rewards |
---|---|---|---|
balancebot-v0 | Discrete(9): used to define wheel target velocity | Box(3,): [cube orientation, cube angular velocity, wheel velocity] | 0.1 - abs(self.vt - self.vd) * 0.005 |
particle-v0 | Box(2,): [force_x, force_y] | Dict(“achieved_goal”: [coord_x, coord_y], “desired_goal”: [coord_x, coord_y], “observation”: [pos_x, pos_y, vel_x, vel_y]) | - dist (dense) or bool(dist <= distance_threshold) (sparse) |
Reacher2Dof-v0 | Box(2,): [0.05 * torque_1, 0.05 * torque_2] | Box(8,): [target_x, target_y, dist_to_target_x, dist_to_target_y, joint0_angle, joint0_vel, joint1_angle, joint1_vel] | [change in dist to target, electricity_cost, stuck_joint_cost] |
Reacher2Dof-v1 | Box(2,): [0.05 * torque_1, 0.05 * torque_2] | Dict(“achieved_goal”: [tip_x, tip_y], “desired_goal”: [target_x, target_y], “observation”: same as above ) | - dist |
widowx_reacher-v5 | Box(6,): [angle_change_joint1, angle_change_joint2, angle_change_joint3, angle_change_joint4, angle_change_joint5, angle_change_joint6] | Box(9,): [target_x, target_y, target_z, joint_angle1, joint_angle2, joint_angle3, joint_angle4, joint_angle5, joint_angle6] | - dist ^ 2 |
widowx_reacher-v7 | Box(6,): [angle_change_joint1, angle_change_joint2, angle_change_joint3, angle_change_joint4, angle_change_joint5, angle_change_joint6] | Dict(“achieved_goal”: [tip_x, tip_y, tip_z], “desired_goal”: [target_x, target_y, target_z], “observation”: same as above ) | - dist ^ 2 |
ReachingJaco-v1 | Box(7,): [joint1_angle + 0.05 * action1, joint2_angle + 0.05 * action2, joint3_angle + 0.05 * action3, joint4_angle + 0.05 * action4, joint5_angle + 0.05 * action5, joint6_angle + 0.05 * action6, joint7_angle + 0.05 * action7] | Box(17,): [gripper_x - torso_x, gripper_y - torso_y, gripper_z - torso_z, gripper_x - target_x, gripper_y - target_y, gripper_z - target_z, joint_angle1, joint_angle2, joint_angle3, joint_angle4, joint_angle5, joint_angle6, joint_angle7, gripper_orient_x, gripper_orient_y, gripper_orient_z, gripper_orient_w] | - dist |
CartPoleStayUp-v0 | Discrete(2): 0 = “move cart to position - pos_step (move left)” or 1 = “move cart to position + pos_step (move right)” | Box(4,): [base_position, base_velocity, pole_angle, pole_velocity] | if not done: reward = reward_pole_angle + reward_for_effective_movement else reward = -2000000 |
MyTurtleBot2Maze-v0 | Discrete(3): 0 = “move forward”, 1 = “turn left”, 2 = “turn right” | Box(6,): [laser_scan array] | if not done: reward = +5 (forward) or +1 (turn) else reward = -200 |
MyTurtleBot2Wall-v0 | Discrete(3): 0 = “move forward”, 1 = “turn left”, 2 = “turn right” | Box(7,): [discretized_laser_scan, odometry_array] | if not done: reward = +5 (forward) or +1 (turn) ; if distance_difference < 0: reward = +5 ; if done and in desired_position: reward = +200 else reward = -200 |
JacoReachGazebo-v1 | Box(6,): [joint_angle_array] | Box(12,): [joint_angle_array, joint_angular_velocity_array] | - dist |
JacoReachGazebo-v2 | Box(1,): [angle1_increment] | Box(4,): [joint1_angle, target_x, target_y, target_z] | - dist |
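
Several rows above use a distance-based reward, and particle-v0 offers both a dense and a sparse variant. The sketch below shows how those two variants could be computed from the goal entries of a Dict observation. The function name `compute_reward`, its signature, and the default `distance_threshold` of 0.05 are assumptions for illustration, not values read from the repository.

```python
import math


def compute_reward(achieved_goal, desired_goal, reward_type="dense",
                   distance_threshold=0.05):
    """Distance-based reward, following the particle-v0 row of the table.

    distance_threshold=0.05 is a hypothetical default.
    """
    dist = math.dist(achieved_goal, desired_goal)  # Euclidean distance
    if reward_type == "dense":
        # Dense: the closer to the goal, the higher (less negative) the reward.
        return -dist
    # Sparse: only signals whether the goal has been reached.
    return bool(dist <= distance_threshold)


print(compute_reward([0.0, 0.0], [3.0, 4.0]))             # dense reward
print(compute_reward([0.0, 0.0], [0.01, 0.0], "sparse"))  # within threshold
print(compute_reward([0.0, 0.0], [1.0, 0.0], "sparse"))   # too far
```

A dense reward gives the agent a learning signal at every step, whereas a sparse one is easier to specify but usually harder to learn from.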
## Balance Bot (Pybullet)
A simple Pybullet robot. The goal is to keep the cube upright for as long as possible. Adapted from this repo.
## Particle
A Goal Env (for testing Hindsight Experience Replay) where a red particle must reach the green target in a 2D plane. The particle is controlled by force. Adapted from here.
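The reason a Goal Env exposes `achieved_goal` and `desired_goal` separately is that Hindsight Experience Replay can then rewrite the goal of past transitions. Below is a rough, dependency-free sketch of the "final" relabeling strategy, where the position actually reached at the end of an episode is treated as if it had been the target. The helper names, the episode format, and the 0.05 threshold are assumptions for illustration, not the implementation of any particular library.

```python
import math


def sparse_reward(achieved_goal, desired_goal, distance_threshold=0.05):
    # Hypothetical threshold; returns 1.0 when the goal is reached, else 0.0.
    return float(math.dist(achieved_goal, desired_goal) <= distance_threshold)


def relabel_with_final_goal(episode):
    """Relabel an episode using the last achieved position as the goal.

    episode: list of (obs, action, next_obs) tuples, where obs and next_obs
    are dicts with "observation", "achieved_goal" and "desired_goal" keys.
    """
    new_goal = episode[-1][2]["achieved_goal"]
    relabeled = []
    for obs, action, next_obs in episode:
        new_obs = dict(obs, desired_goal=new_goal)
        new_next_obs = dict(next_obs, desired_goal=new_goal)
        reward = sparse_reward(next_obs["achieved_goal"], new_goal)
        relabeled.append((new_obs, action, reward, new_next_obs))
    return relabeled


def make_obs(pos, goal):
    # Observation layout of particle-v0: [pos_x, pos_y, vel_x, vel_y]
    # (velocities set to zero here for brevity).
    return {"observation": [pos[0], pos[1], 0.0, 0.0],
            "achieved_goal": list(pos),
            "desired_goal": list(goal)}


# A failed episode: the particle never gets near the real target (1, 1).
goal = (1.0, 1.0)
episode = [
    (make_obs((0.0, 0.0), goal), [1.0, 0.0], make_obs((0.1, 0.0), goal)),
    (make_obs((0.1, 0.0), goal), [1.0, 0.0], make_obs((0.2, 0.0), goal)),
]
for transition in relabel_with_final_goal(episode):
    print(transition[2], transition[3]["desired_goal"])
```

After relabeling, the last transition counts as a success, so even an episode that missed the real target yields a useful sparse-reward signal.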
## Reacher2D (Pybullet)
An articulated arm in a 2D plane composed of 1 to 6 joints. The goal is to bring the tip as close as possible to the target sphere. Adapted from [this repo](https://github.com/benelot/pybullet-gym).
## WidowX arm (Pybullet)
The WidowX robotic arm in Pybullet. The goal is to bring the tip as close as possible to the target sphere. Adapted from [this repo](https://github.com/bhyang/replab).
## Jaco arm (Pybullet)
The Jaco arm in Pybullet. The goal is to bring the tip as close as possible to the target sphere. Adapted from [this repo](https://github.com/Healthcare-Robotics/assistive-gym).
## Cartpole3D (ROS / Gazebo)
The Cartpole in ROS / Gazebo. The goal is to keep the pole balanced upright for as long as possible. Adapted from [this repo](https://bitbucket.org/theconstructcore/openai_examples_projects/src/master/).
## Turtlebot2 Maze (ROS / Gazebo)
The Turtlebot2 robot in ROS / Gazebo. The goal is to avoid touching the walls. Adapted from [this repo](https://bitbucket.org/theconstructcore/openai_examples_projects/src/master/).
## Turtlebot2 Wall (ROS / Gazebo)
The Turtlebot2 robot in ROS / Gazebo. The goal is to avoid touching the wall. Adapted from [this repo](https://bitbucket.org/theconstructcore/openai_examples_projects/src/master/).
## Jaco arm (ROS / Gazebo)
The Jaco arm in ROS / Gazebo. The goal is to bring the tip as close as possible to the target sphere.
## Minimal Working Example: foo-v0
A minimal environment to illustrate how custom environments are implemented.
## Tic-Tac-Toe environment
The classic game made as a Gym environment.
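To give a feel for what such an environment looks like, here is a small Tic-Tac-Toe sketch that follows the classic Gym interface (`reset` returns an observation, `step` returns observation, reward, done and info). It is not the repository's implementation: the class name, the board encoding, the rewards and the illegal-move penalty are all assumptions, and the class deliberately does not subclass `gym.Env` so that it runs without any dependency.

```python
class TicTacToeSketch:
    """Dependency-free sketch following the classic Gym reset/step interface."""

    # The eight winning lines, as indices into the flat 3x3 board.
    WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),
                 (0, 4, 8), (2, 4, 6)]

    def reset(self):
        self.board = [0] * 9  # 0 = empty, 1 = player X, -1 = player O
        self.player = 1       # X always starts
        return tuple(self.board)

    def step(self, action):
        """Place the current player's mark on cell `action` (0 to 8)."""
        if self.board[action] != 0:
            # Hypothetical rule: an illegal move ends the episode with a penalty.
            return tuple(self.board), -1.0, True, {"illegal_move": True}
        self.board[action] = self.player
        won = any(all(self.board[i] == self.player for i in line)
                  for line in self.WIN_LINES)
        draw = not won and 0 not in self.board
        reward = 1.0 if won else 0.0  # reward goes to the player who just moved
        self.player = -self.player    # alternate turns
        return tuple(self.board), reward, won or draw, {}


env = TicTacToeSketch()
obs = env.reset()
for move in [0, 3, 1, 4, 2]:  # X plays 0, 1, 2 and completes the top row
    obs, reward, done, info = env.step(move)
print(obs, reward, done)
```

In an actual Gym environment, the class would inherit from `gym.Env` and also declare its `action_space` (for instance `Discrete(9)`) and `observation_space`, so that RL libraries know what to expect.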