February 16, 2023
13 min read

Building Chess Scanner

A high-level overview of how I trained a deep learning model to detect chess positions from images.

The API is currently deployed at https://chess-scanner.sumi.re, and is free to use. Please do not abuse it.

Intro

I enjoy watching chess content on YouTube. Sometimes I see an interesting position in a video, and wish to try my own variations. Now, how can I open the same position in Lichess’ analysis tool or Stockfish? I could install a browser extension like Chessvision.ai, but where would the fun be in that? In this post I’ll outline how to train your own chess position detector.

The Idea

  1. As an input, the program receives a cropped image of a chessboard.
  2. The image is evenly divided into an 8×8 grid, i.e. each cell is one square on the chessboard.
  3. A multiclass classifier is run on each square, determining the piece (if any) on that square.
  4. From this, a chessboard is constructed, and the position is returned as a FEN string (an example is shown below), which can then be pasted into any chess analysis tool.
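For reference, the full FEN string for the standard starting position looks like this. The first field is the piece placement, which is the part the scanner produces.

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1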

As there are 64 squares on the board, the classifier needs to be extremely accurate. A classifier that is 99% accurate is not enough, as the probability that it will correctly classify all 64 squares is only $0.99^{64} \approx 53\%$.
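A quick back-of-the-envelope check in Python:

p = 0.99 ** 64
print(f"{p:.2%}")  # 52.56%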

Step 1: The Data

It’s easy to think that all chess pieces look pretty much the same, but there’s actually a wide variety of different piece set designs, even in digital chessboards. In order for the model to be able to generalize, it must be trained on various piece sets and boards. Since Lichess and Chess.com are the most popular online chess websites, I decided to use their boards and piece sets for training.

White knights in various styles.

Instead of creating a predetermined number of training and testing images, I decided to generate them on the fly. This gives us several advantages:

  • We can generate an almost endless amount of training data with random piece sets and boards, without having to store it on disk.
  • We can also do data augmentation on the fly.
  • We can control the distributions of the training data, e.g. the proportion of empty squares.
  • The model is less likely to overfit due to regularization from the data augmentation and randomness of the training data.

Let’s first read the images of the pieces and boards into lists. training.zip contains two folders, boards and pieces. boards holds 45 PNG images of empty chessboards, and pieces holds 59 subfolders, one per piece set, each containing one PNG image per piece. The naming follows the convention <piece>_<color>.png, where <piece> is one of k, q, r, b, n, or p, and <color> is either w for white or b for black.

import cv2
import numpy as np
import glob
import os

tile_images = []
piece_images = []

n_boards = 45
n_piece_sets = 59

# Make sure board_size is a multiple of 8
board_size = 400
square_size = board_size // 8

# Process boards: cut each empty board into 64 squares
for i in range(1, n_boards + 1):
  img = cv2.imread(f"boards/{i}.png")
  img = cv2.resize(img, (board_size, board_size), interpolation=cv2.INTER_AREA)

  # Iterate over each 50×50 square on the board
  for n in range(8):
    for m in range(8):
      square = img[n * square_size : (n + 1) * square_size, m * square_size : (m + 1) * square_size]
      # Append as [label, image]
      tile_images.append(["empty", square.flatten()])

# Process pieces
for i in range(1, n_piece_sets + 1):
  pieces = glob.glob(f"pieces/{i}/*.png")

  for piece in pieces:
    # Parse "<piece>_<color>.png" into type and color
    name = os.path.splitext(os.path.basename(piece))[0]
    piece_type, piece_color = name.split("_")

    # FEN convention: uppercase for white, lowercase for black
    if piece_color == "w":
      piece_type = piece_type.upper()

    # IMREAD_UNCHANGED keeps the alpha channel for overlaying
    img = cv2.imread(piece, cv2.IMREAD_UNCHANGED)
    img = cv2.resize(img, (square_size, square_size), interpolation=cv2.INTER_AREA)

    piece_images.append([piece_type, img.flatten()])


tile_images = np.array(tile_images, dtype=object)
piece_images = np.array(piece_images, dtype=object)

We resize the boards to 400×400 pixels, which means that each trainable square is going to be 50×50 pixels in size.

For each board, we iterate over each square, extract it, and add it to the tile_images list with the label empty, indicating an empty square.

As the pieces are transparent PNG images, we need to use the option cv2.IMREAD_UNCHANGED when reading them. Otherwise the alpha channel would be stripped, and we could not overlay the pieces on top of the board images. Similarly, we read all the piece images, resize them, and add them to the piece_images list with their respective labels.

We keep the board and piece images in two separate lists because we want to control the proportion of empty squares in our dataset. A chessboard can hold at most 32 pieces at a time (16 white, 16 black, no captures), so in any position at least half of the 64 squares are empty, i.e. the number of empty squares is greater than or equal to the number of occupied squares. To match this, the proportion of empty squares in the generated training data was set to 50%.

Now we can create a function that generates random chessboard squares on the fly.

import random

def get_sample(empty_prop, augmentation=True):
  # Start from a random empty-square image
  tile = random.choice(tile_images)[1].reshape(square_size, square_size, 3)

  if random.random() < empty_prop:
    return tile / 255.0, "empty"
  else:
    piece = random.choice(piece_images)
    piece_label = piece[0]
    piece_image = piece[1].reshape(square_size, square_size, 4)

    # Overlay the piece on the tile: where the piece is fully
    # transparent (alpha == 0), keep the tile pixel
    piece_alpha = piece_image[:, :, 3]
    alpha = cv2.merge([piece_alpha, piece_alpha, piece_alpha])
    piece_image = piece_image[:, :, 0:3]
    blend = np.where(alpha == 0, tile, piece_image)

    if augmentation:
      # Occasionally blur the square slightly
      if random.random() < 0.2:
        blur_amount = random.choice([1, 3, 5])
        blend = cv2.GaussianBlur(blend, (blur_amount, blur_amount), 0)

      # Shift the square a few pixels in a random direction
      if random.random() < 0.5:
        shift_amount = random.choice([1, 2, 3, 4, 5])
        shift_direction = random.choice([-1, 1])
        shift_axis = random.choice([0, 1])
        blend = np.roll(blend, shift_amount * shift_direction, axis=shift_axis)

      # Rarely rotate the square by a degree or two
      if random.random() < 0.10:
        angle = random.choice([-2, -1, 1, 2])
        image_center = tuple(np.array(blend.shape[1::-1]) / 2)
        rot_mat = cv2.getRotationMatrix2D(image_center, angle, 1.0)
        blend = cv2.warpAffine(blend, rot_mat, blend.shape[1::-1], flags=cv2.INTER_LINEAR)

    return blend / 255.0, piece_label

The get_sample function is used to generate a single training image. The argument empty_prop is the probability that the square will be empty. The function returns a 50×50 image and its corresponding label. It first selects a random image of an empty square. Then, depending on the value of empty_prop and randomness, it either returns the empty square image, or it overlays a random piece on top of it. If augmentation is set to True, some random transformations may be applied to the image: it can be slightly blurred, shifted, or rotated. All samples are normalized to the range $[0, 1]$.
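As a quick sanity check (not part of the training pipeline), we can sample a single square and inspect its shape and label:

image, label = get_sample(empty_prop=0.5)
print(image.shape, label)  # (50, 50, 3) and e.g. "empty" or "N"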

Below is a sampled collection of 1200 randomly generated chessboard squares. The probability of an empty square was set to 0.5.

1200 randomly generated chessboard squares.

Next, we can create a function that generates a batch of training data for the model.

piece_labels = ["empty", "k", "q", "r", "b", "n", "p", "K", "Q", "R", "B", "N", "P"]
num_classes = len(piece_labels)  # 13

def generate_batch(batch_size, empty_prop):
  batch = []
  labels = []
  for i in range(batch_size):
    sample = get_sample(empty_prop)
    batch.append(sample[0])
    label = sample[1]
    # One-hot encode the label
    onehot_label = np.zeros(num_classes)
    onehot_label[piece_labels.index(label)] = 1
    labels.append(onehot_label)

  batch = np.array(batch)
  labels = np.array(labels)
  return batch, labels

The generate_batch function takes two arguments, batch_size and empty_prop. It returns a batch of images and their corresponding labels. The labels are one-hot encoded before sending them to the model.
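For example, a single batch of 256 samples has the following shapes:

X, y = generate_batch(batch_size=256, empty_prop=0.5)
print(X.shape)  # (256, 50, 50, 3)
print(y.shape)  # (256, 13)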

Below is the data_generator function, which is used in training.

def data_generator(batch_size, num_samples, empty_prop):
  i = 0
  # Yield freshly generated batches until num_samples samples are produced
  while i < num_samples:
    X_batch, y_batch = generate_batch(batch_size, empty_prop)
    i += batch_size
    yield X_batch, y_batch

Step 2: The Model

After several rounds of testing, finetuning, and tweaking, I ended up with the following model.

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, Conv2D
from keras.callbacks import ModelCheckpoint

batch_size = 256
num_classes = 13
epochs = 5
n = 2_000_000  # training samples per epoch
steps_per_epoch = n // batch_size

# Save the best weights seen so far (by validation loss)
checkpoint_filepath = "checkpoint.h5"
checkpoint = ModelCheckpoint(checkpoint_filepath, save_weights_only=True, save_best_only=True)

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(square_size, square_size, 3)))
model.add(Activation("relu"))
model.add(Conv2D(32, (3, 3)))
model.add(Activation("relu"))
model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(Conv2D(128, (3, 3)))
model.add(Activation("relu"))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation("relu"))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation("softmax"))

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

history = model.fit(
    data_generator(batch_size=batch_size, num_samples=n * epochs, empty_prop=0.5),
    validation_data=data_generator(
        batch_size=batch_size, num_samples=1000 * batch_size * epochs, empty_prop=0.5
    ),
    steps_per_epoch=steps_per_epoch,
    validation_steps=1000,
    epochs=epochs,
    verbose=1,
    callbacks=[checkpoint],
)

The model is a sequential CNN with four convolutional layers, each followed by a ReLU activation, and a fully connected head with dropout and a 13-way softmax output. Since the convolutions use no padding, the 50×50 input shrinks by two pixels per layer, down to 42×42 before flattening. The ModelCheckpoint callback saves the best weights seen during training.

Step 3: Results

After the first epoch of training, the model achieved an accuracy of 99.00% (validation accuracy was 99.95%). In total, the model was trained for 9 epochs, after which the accuracy was 99.94% (validation accuracy was 100.0%). All in all, that was about 2 hours of GPU time.

Below is the confusion matrix for the test set.

Confusion matrix for the test set.

When testing with images of full 64-square chessboards, the model achieved 100.0% accuracy on 1,000 random chessboard positions generated using python-chess and the FICS games dataset.

Step 4: Creating a FEN string

Now that the model can predict individual squares, we can use it to recognize an entire chessboard. The idea is to divide the chessboard into 64 individual squares, and run them through the model one by one. The model will return a label for each square, which we can then use to create the FEN string for that position.

I used the python-chess library to construct the chessboard and output the FEN string.

from typing import List
import keras
import cv2
import numpy as np
import chess

# `model` is assumed to be the trained classifier from Step 2,
# with the checkpointed weights loaded
board_size = 400
square_size = board_size // 8

# Piece labels, lowercase for black, uppercase for white
onehot_labels = ["empty", "k", "q", "r", "b", "n", "p", "K", "Q", "R", "B", "N", "P"]


def restore_onehot(y_pred: np.ndarray) -> np.ndarray:
  y_pred = np.argmax(y_pred, axis=1)
  y_pred = np.array([onehot_labels[i] for i in y_pred])
  return y_pred


def construct_board(labels: List[str]) -> chess.Board:
  board = chess.Board()

  for square, pred_piece in zip(chess.SQUARES, labels):
    color = chess.WHITE if pred_piece.isupper() else chess.BLACK
    if pred_piece == "empty":
      board.remove_piece_at(square)
    elif pred_piece.lower() == "k":
      board.set_piece_at(square, chess.Piece(chess.KING, color))
    elif pred_piece.lower() == "q":
      board.set_piece_at(square, chess.Piece(chess.QUEEN, color))
    elif pred_piece.lower() == "r":
      board.set_piece_at(square, chess.Piece(chess.ROOK, color))
    elif pred_piece.lower() == "b":
      board.set_piece_at(square, chess.Piece(chess.BISHOP, color))
    elif pred_piece.lower() == "n":
      board.set_piece_at(square, chess.Piece(chess.KNIGHT, color))
    elif pred_piece.lower() == "p":
      board.set_piece_at(square, chess.Piece(chess.PAWN, color))

  return board


def extract_squares(image_data: np.ndarray) -> np.ndarray:
  squares = []

  # Split the image into 64 squares, traversing ranks from the bottom up
  # so the order matches chess.SQUARES: a1, b1, ..., h1, a2, ..., h8
  for i in range(7, -1, -1):
    for j in range(0, 8):
      square = image_data[
        i * square_size : (i + 1) * square_size,
        j * square_size : (j + 1) * square_size,
      ]
      squares.append(square.flatten())

  return np.array(squares).reshape(-1, square_size, square_size, 3) / 255.0


def set_below_confidence_to_empty(
  y_pred: np.ndarray, y_labels: list, confidence: float
) -> list:
  y_max = np.amax(y_pred, axis=1)
  y_dict = {}

  for i in range(0, 64):
    y_dict[chess.SQUARES[i]] = (y_labels[i], y_max[i])

  for square, (_, prob) in y_dict.items():
    if prob < confidence:
      y_dict[square] = ("empty", prob)

  return [x[0] for x in y_dict.values()]


def parse_board(image_data: bytes, to_play: str) -> chess.Board:
  # Read the image and resize it to 400×400
  photo = cv2.imdecode(np.frombuffer(image_data, np.uint8), cv2.IMREAD_COLOR)
  photo = cv2.resize(photo, (board_size, board_size), interpolation=cv2.INTER_AREA)

  # Extract the squares from the image, (reshape & normalize) for the model
  squares = extract_squares(photo)

  # Predict the piece in each square
  y_pred = model.predict(squares)

  # Restore original labels
  y_labels = restore_onehot(y_pred)

  # Heuristic to fix possible errors: if the model is less than
  # 95% confident about a square, treat it as empty
  y_labels = set_below_confidence_to_empty(y_pred, y_labels, 0.95)

  board = construct_board(y_labels)

  # If the board is from the perspective of black, flip it
  if to_play == "black":
    board = board.transform(chess.flip_vertical).transform(chess.flip_horizontal)

  return board
Detecting the position from an image is then just a matter of reading the file as bytes and calling parse_board.

file_to_detect = "test.png"

with open(file_to_detect, "rb") as f:
  b = parse_board(f.read(), "white")

print(b.fen())

Given the following screenshot of a chessboard as input, we get this FEN string.

rnb3nr/pp1b1p1p/8/2p1q3/3pP1k1/2PP1N2/PP3P1P/RNBQKB1R
Position used for testing

We can see in the Lichess analysis board that this is the same position as in the image.
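As a convenience, the detected position can also be opened in the browser programmatically. Lichess accepts a FEN appended to its analysis URL, with any spaces replaced by underscores. A small sketch, continuing from the parse_board example above:

# Build a Lichess analysis URL from the detected position
url = "https://lichess.org/analysis/" + b.fen().replace(" ", "_")
print(url)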

Since the model was trained on a wide variety of different piece styles, it is able to recognize even more unusual piece sets, such as in the following example.

2r2rk1/pp2q1pp/4pp2/1b1p4/1n1P1P1N/P3P2Q/1P3RPP/RB4K1
Another position with a more exotic piece set

The model is also able to generalize, and recognize positions scanned from chess books, such as in the example below.

rnbqkb1r/pppppp1p/5np1/8/3P1B2/5N2/PPP1PPPP/RN1qKB1R
Position from a scanned chess book

That said, based on manual testing, detection from scanned images is highly dependent on the quality of the scan: scanning artifacts, low resolution, or skewed pages all increase the error rate.

Step 5: Creating an API

After successfully creating the model, I wanted to create a simple API that would allow me to upload a screenshot of a chessboard and get a FEN string in return. The API can be deployed to a server, so that I have quick access to it whenever I need it.

Below is a simple API created with FastAPI, starting with the main application code.

main.py
from typing import Union

from fastapi import FastAPI, File, UploadFile
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import chess_eye

app = FastAPI()

app.add_middleware(
  CORSMiddleware,
  allow_origins=["*"],
  allow_credentials=True,
  allow_methods=["*"],
  allow_headers=["*"],
)


class Position(BaseModel):
  fen: str


class Error(BaseModel):
  error: str


# The response can be either a Position or an Error,
# so the response model is a Union of the two
@app.post("/api/detect/{color}", response_model=Union[Position, Error])
def detect(
  color: str,
  file: UploadFile = File(...),
):
  if file.content_type not in ["image/jpeg", "image/png"]:
    return Error(error="Invalid file type. Must be JPG, JPEG, or PNG.")

  if color not in ["white", "black"]:
    return Error(error="Invalid color parameter. Must be 'white' or 'black'.")

  image_data = file.file.read()
  fen = chess_eye.get_fen(image_data, color)
  if len(fen) > 0:
    return Position(fen=fen)
  else:
    return Error(error="Unable to detect position")

The color path parameter gives two endpoints, /api/detect/white and /api/detect/black, which decide the orientation of the constructed board, i.e. which side is at the bottom of the image.
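For example, assuming the server is running locally on port 8080 (as in the Dockerfile below) and board.png is a cropped screenshot of a chessboard, the endpoint can be called with the requests library:

import requests

# "file" must match the UploadFile parameter name in the endpoint;
# board.png is a hypothetical example screenshot
with open("board.png", "rb") as f:
  resp = requests.post(
    "http://localhost:8080/api/detect/white",
    files={"file": ("board.png", f, "image/png")},
  )

print(resp.json())  # e.g. {"fen": "rnb3nr/pp1b1p1p/..."}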

And here is the detection module.

chess_eye.py
# Code for parse_board() shown in Step 4

def get_fen(image_data: bytes, to_play: str) -> str:
  try:
    # parse_board already decodes the raw bytes with cv2.imdecode
    board = parse_board(image_data, to_play)

    # Only include the piece placement field of the FEN
    fen = board.fen().split(" ")[0]

    return fen
  except Exception as e:
    print(e)
    return ""

And, finally, the Dockerfile to create the Docker image.

Dockerfile
FROM python:3.8
RUN apt-get update && apt-get install -y ffmpeg libsm6 libxext6 && rm -r /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt ./
RUN pip3 install --upgrade pip
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]

See full code on GitHub.

Step 6: Keyboard shortcut

As an extra, here’s a handy way to use the API. I created a macOS Shortcut that lets me take a screenshot and send it to the API automatically. The returned FEN string is copied to the clipboard, so I can paste it into any chess app.

API shortcut

In Raycast, I bound this shortcut to a global hotkey.