Data Analytics and AI
Build a custom Gym environment
Yizhe Zhang · May 6, 2021

Suppose you have tried all the built-in Gym games. You may then ask how to design a custom environment for a problem you are interested in.

First, just as in a built-in Gym environment, you need to design your own reward function, which returns a score each time your agent completes an action. In this article, I walk through the details using the game of Gomoku as an example.

Gomoku is a game between two players, Black and White, who take turns placing pieces on a board. According to the rules, Black moves first. The first player to link five of their pieces in an unbroken line wins the game, whether the line is vertical, horizontal, or diagonal.

So now we can start to design reward rules:

Action             Score
A normal step      1 / (Rows * Columns)
Win the game       1
Lose the game      -1
An invalid step    -10

If the environment finds that an action wins the game or is an invalid step, the game ends.
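The reward rules can be written down as a small lookup, shown here as a minimal sketch. The function name `reward_for` and the outcome labels are hypothetical, and treating a loss as the end of the game is an assumption (the rules above only state that a win or an invalid step ends it).

```python
ROW, COLUMN = 15, 15  # board size used throughout this article

def reward_for(outcome):
    """Map a step outcome to (reward, done), following the reward rules.

    `outcome` is one of the hypothetical labels below.
    """
    table = {
        "normal": (1 / (ROW * COLUMN), False),  # small bonus for a legal move
        "win": (1, True),
        "loss": (-1, True),       # assumption: a lost game is also finished
        "invalid": (-10, True),   # heavy penalty, and the game ends
    }
    return table[outcome]

print(reward_for("normal"))
print(reward_for("invalid"))
```

The normal-step reward is deliberately tiny (1/225 on a 15 x 15 board), so even a full board of legal moves never adds up to more than the reward for a win.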

Next, we can start to code the custom environment.

You need to import the necessary modules:

import gym
from gym import spaces
import numpy as np

A skeleton of a Gym environment looks like this:

class CustomEnv(gym.Env):
  metadata = {'render.modes': ['human']}

  def __init__(self):
    super(CustomEnv, self).__init__()
    # Set the initial variables, action_space and observation_space here.

  def step(self, action):
    # Apply the action and compute its reward.
    # Must return: observation, reward, done, info
    raise NotImplementedError

  def reset(self):
    # Reset the environment and return the initial observation.
    raise NotImplementedError

  def render(self, mode='human', close=False):
    # Print or draw the current state for the user.
    pass

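You can set some constants for the game. In the full environment they are class attributes of GomokuEnv, which is why the code below refers to them as self.ROW and self.COLUMN.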

BLACK = 1
WHITE = 2

ROW = 15
COLUMN = 15
WINNUM = 5

Then we write the __init__ function:

def __init__(self, player=BLACK):
  super(GomokuEnv, self).__init__()

  # Set the Gomoku current player
  self.current_player = player
  # Set the Gomoku chessboard (dtype matches observation_space)
  self.chessboard = np.zeros((self.ROW, self.COLUMN, 1), dtype=np.uint8)

  self.action_space = spaces.Box(low = 0, high = self.ROW - 1, shape=(2,), dtype=np.uint8)

  self.observation_space = spaces.Box(low = 0, high = 2, shape=(self.ROW, self.COLUMN, 1), dtype=np.uint8)

  # StableBaselines throws error if these are not defined
  self.spec = None
  self.metadata = None

In the code above, the most important parts are action_space and observation_space. action_space defines the shape and range of the parameters of each step: here, a (row, column) pair. observation_space defines the shape and range of the game state: here, the board, where each cell is 0 (empty), 1 (black), or 2 (white).

Next, you need to decide how the environment responds to a new step: with a reward or a penalty.
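To make the two spaces concrete, the sketch below uses plain NumPy (not Gym itself) to build one action and one observation with the shapes and ranges described above, and checks them by hand. The helper `in_box` is hypothetical and written only for this illustration; Gym's own space objects provide `sample()` and `contains()` for the same purpose.

```python
import numpy as np

ROW, COLUMN = 15, 15

def in_box(x, low, high, shape):
    """Hypothetical helper: True if x has the given shape and low <= x <= high."""
    x = np.asarray(x)
    return x.shape == shape and bool(np.all((x >= low) & (x <= high)))

# An action is a (row, column) pair; this one is the centre of the board.
action = np.array([7, 7], dtype=np.uint8)

# An observation is the whole board: 0 = empty, 1 = black, 2 = white.
board = np.zeros((ROW, COLUMN, 1), dtype=np.uint8)
board[7, 7, 0] = 1

print(in_box(action, 0, ROW - 1, (2,)))        # fits the action space
print(in_box(board, 0, 2, (ROW, COLUMN, 1)))   # fits the observation space
print(in_box([15, 0], 0, ROW - 1, (2,)))       # row 15 is off the board
```

Note that the action space uses ROW - 1 as the upper bound for both coordinates, which works here only because the board is square.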

def step(self, action):

  # You need to define rules to reward or punish this step.
  ....

  # Return steps information
  return self.chessboard, reward, done, info

The step function returns four values. The first is the game state (the observation). The second is the reward. The third, done, indicates whether the game has finished or continues. The last, info, lets you return custom messages.

Remember that if your game has more than one player, you need to switch current_player to the next one before step returns.
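For a two-player game, switching the player is a one-line toggle. The sketch below assumes the BLACK and WHITE constants defined earlier; the function name `next_player` is hypothetical.

```python
BLACK, WHITE = 1, 2

def next_player(current_player):
    """Return the colour that moves after current_player."""
    return WHITE if current_player == BLACK else BLACK

# Black moves first, then the two colours alternate.
player = BLACK
order = []
for _ in range(4):
    order.append(player)
    player = next_player(player)

print(order)  # [1, 2, 1, 2]
```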

Then you can write the reset function. When the current game is finished, it lets your environment start a new game.

  def reset(self, player=BLACK):
    # Reset the state of the environment to an initial state

    # Set the Gomoku current player
    self.current_player = player
    # Set the Gomoku chessboard (dtype matches observation_space)
    self.chessboard = np.zeros((self.ROW, self.COLUMN, 1), dtype=np.uint8)

    return self.chessboard

Finally, if you want to display something to monitor the current state, you need to implement the render function:

def render(self, mode='human', close=False):
  # Render the environment to the screen
  print(self.chessboard)
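Printing the raw array works, but a 15 x 15 x 1 array is hard to read. As one possible alternative, the sketch below turns the board into text with one character per cell. The function name `board_to_text` and the symbols are arbitrary choices made for this example.

```python
import numpy as np

# Arbitrary symbols: empty, black, white
SYMBOLS = {0: ".", 1: "X", 2: "O"}

def board_to_text(chessboard):
    """Format a (rows, columns, 1) board as one line of symbols per row."""
    lines = []
    for row in chessboard[:, :, 0]:
        lines.append(" ".join(SYMBOLS[int(cell)] for cell in row))
    return "\n".join(lines)

# A small 3 x 3 board keeps the output short.
board = np.zeros((3, 3, 1))
board[1, 1, 0] = 1  # black in the centre
board[0, 2, 0] = 2  # white in the top-right corner
print(board_to_text(board))
# . . O
# . X .
# . . .
```

Inside the environment, render could simply call print(board_to_text(self.chessboard)).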

After finishing all the steps above, you have designed a custom environment for your own game and can use it to train an agent.

# Create GomokuEnv environment
env = GomokuEnv()
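Before training a real agent, it can help to drive the environment with random moves to see the reset/step cycle in action. The sketch below is self-contained, so it uses `TinyEnv`, a hypothetical stand-in with the same reset/step interface as GomokuEnv (3 x 3 board, no win check); `run_episode` and `random_policy` are also names invented for this example.

```python
import random

class TinyEnv:
    """Hypothetical stand-in for GomokuEnv: same reset/step interface,
    3 x 3 board, no win check, so the loop below runs on its own."""
    SIZE = 3

    def reset(self):
        self.board = [[0] * self.SIZE for _ in range(self.SIZE)]
        self.moves = 0
        return self.board

    def step(self, action):
        row, column = action
        if self.board[row][column] != 0:
            return self.board, -10, True, {}         # invalid step ends the game
        self.board[row][column] = 1
        self.moves += 1
        done = self.moves == self.SIZE * self.SIZE   # board is full
        return self.board, 1 / (self.SIZE * self.SIZE), done, {}

def run_episode(env, choose_action, max_steps=1000):
    """Play one episode and return (total_reward, steps)."""
    env.reset()
    total, steps, done = 0.0, 0, False
    while not done and steps < max_steps:
        _, reward, done, _ = env.step(choose_action(env))
        total += reward
        steps += 1
    return total, steps

rng = random.Random(0)

def random_policy(env):
    # Pick any cell, occupied or not; occupied cells trigger the -10 penalty.
    return rng.randrange(env.SIZE), rng.randrange(env.SIZE)

total, steps = run_episode(TinyEnv(), random_policy)
print(total, steps)
```

A random policy usually ends the episode early by hitting an occupied cell, which is exactly the behaviour the -10 penalty is meant to discourage in a trained agent.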

The entire code of the Gomoku environment:

class GomokuEnv(gym.Env):
    metadata = {'render.modes': ['human']}

    BLACK = 1
    WHITE = 2

    ROW = 15
    COLUMN = 15
    WINNUM = 5

    def __init__(self, player=BLACK):
        super(GomokuEnv, self).__init__()

        # Set the Gomoku current player
        self.current_player = player
        # Set the Gomoku chessboard (dtype matches observation_space)
        self.chessboard = np.zeros((self.ROW, self.COLUMN, 1), dtype=np.uint8)

        self.action_space = spaces.Box(low = 0, high = self.ROW - 1, shape=(2,), dtype=np.uint8)

        self.observation_space = spaces.Box(low = 0, high = 2, shape=(self.ROW, self.COLUMN, 1), dtype=np.uint8)

        # StableBaselines throws error if these are not defined
        self.spec = None
        self.metadata = None

    def step(self, action):
        # Execute one time step within the environment
        row = 0
        column = 0
        if action[0] is not None:
            row = int(action[0])
            column = int(action[1])

        # Invalid action: the point is already occupied
        if self.chessboard[row, column, 0] != 0:
            reward = -10
            done = True
        else:
            # Drop the piece
            self.chessboard[row, column, 0] = self.current_player

            # Check whether this move wins the game
            if self.isWin(row, column, self.current_player):
                reward = 1
                done = True
            else:  # Reward 1/225
                reward = 1 / (self.ROW * self.COLUMN)
                done = False
                # Hand the turn to the other player
                self.current_player = self.BLACK if self.current_player == self.WHITE else self.WHITE

        info = {}

        # Return the observation, reward, done flag and extra info
        return self.chessboard, reward, done, info

    def reset(self, player=BLACK):
        # Reset the state of the environment to an initial state

        # Set the Gomoku current player
        self.current_player = player
        # Set the Gomoku chessboard (dtype matches observation_space)
        self.chessboard = np.zeros((self.ROW, self.COLUMN, 1), dtype=np.uint8)

        return self.chessboard

    def render(self, mode='human', close=False):
        # Render the environment to the screen
        print(self.chessboard)

    def isWin(self, row, column, piece):
        # A piece at (row, column) wins if it belongs to an unbroken run of
        # at least WINNUM pieces of the same colour in any of four directions:
        # horizontal, vertical, diagonal "/" and anti-diagonal.
        directions = [(0, 1), (1, 0), (-1, 1), (1, 1)]

        for d_row, d_col in directions:
            count = 1  # the piece at (row, column) itself

            # Walk away from (row, column) both ways along this direction
            for sign in (1, -1):
                r = row + sign * d_row
                c = column + sign * d_col
                while (0 <= r < self.ROW and 0 <= c < self.COLUMN
                       and self.chessboard[r, c, 0] == piece):
                    count += 1
                    r += sign * d_row
                    c += sign * d_col

            if count >= self.WINNUM:
                return True

        return False

# Create GomokuEnv environment
env = GomokuEnv()