Project Links
The API is currently deployed at https://chess-scanner.sumi.re and is free to use. Please do not abuse it.
Intro
I enjoy watching chess content on YouTube. Sometimes I see an interesting position in a video, and wish to try my own variations. Now, how can I open the same position in Lichess’ analysis tool or Stockfish? I could install a browser extension like Chessvision.ai, but where would the fun be in that? In this post I’ll outline how to train your own chess position detector.
The Idea
- As an input, the program receives a cropped image of a chessboard.
- The image is evenly divided into an 8×8 grid, i.e. each cell is one square on the chessboard.
- A multiclass classifier is run on each square, determining the piece (if any), on that square.
- From this, a chessboard is constructed, and the position is returned as a FEN string, which can then be pasted into any chess analysis tool.
As there are 64 squares on the board, the classifier needs to be extremely accurate. A classifier that is 99% accurate is not enough, as the probability that it will correctly classify all 64 squares is only 0.99^64 ≈ 0.53.
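A quick back-of-the-envelope check (plain Python, just the arithmetic from the paragraph above):
# Probability of classifying all 64 squares correctly,
# given a per-square accuracy p
for p in (0.99, 0.999, 0.9999):
    print(f"p={p}: whole-board accuracy = {p ** 64:.3f}")
# p=0.99:   whole-board accuracy = 0.526
# p=0.999:  whole-board accuracy = 0.938
# p=0.9999: whole-board accuracy = 0.994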
Step 1: The Data
It’s easy to think that all chess pieces look pretty much the same, but there’s actually a wide variety of different piece set designs, even in digital chessboards. In order for the model to be able to generalize, it must be trained on various piece sets and boards. Since Lichess and Chess.com are the most popular online chess websites, I decided to use their boards and piece sets for training.
Instead of creating a predetermined number of training and testing images, I decided to generate them on the fly. This gives us several advantages:
- We can generate an almost endless amount of training data with random piece sets and boards, without having to store it on disk.
- We can also do data augmentation on the fly.
- We can control the distributions of the training data, e.g. the proportion of empty squares.
- The model is less likely to overfit due to regularization from the data augmentation and randomness of the training data.
Let’s first read the images of the pieces and boards into lists.
training.zip contains two folders, boards and pieces. boards has 45 PNG images of chessboards, and pieces has 59 subfolders, one per piece set, each containing the PNG images of the pieces of that set. The naming follows the convention <piece>_<color>.png, where <piece> is one of k, q, r, b, n, or p, and <color> is either w for white or b for black.
import cv2
import numpy as np
import glob
import os

tile_images = []
piece_images = []
n_boards = 45
n_piece_sets = 59
# Make sure board_size is a multiple of 8
board_size = 400
square_size = board_size // 8

# Process boards
for i in range(1, n_boards + 1):
    img = cv2.imread(f"boards/{i}.png")
    img = cv2.resize(img, (board_size, board_size), interpolation=cv2.INTER_AREA)
    # Iterate over each 50×50 square on the board
    for n in range(8):
        for m in range(8):
            square = img[n * square_size : (n + 1) * square_size, m * square_size : (m + 1) * square_size]
            # Append as [label, image]
            tile_images.append(["empty", square.flatten()])

# Process pieces
for i in range(1, n_piece_sets + 1):
    pieces = glob.glob(f"pieces/{i}/*.png")
    for piece in pieces:
        # File names follow the convention <piece>_<color>.png
        name = os.path.basename(piece).split(".")[0]
        piece_type, piece_color = name.split("_")
        if piece_color == "w":
            piece_type = piece_type.upper()
        # IMREAD_UNCHANGED keeps the alpha channel of the transparent PNGs
        img = cv2.imread(piece, cv2.IMREAD_UNCHANGED)
        img = cv2.resize(img, (square_size, square_size), interpolation=cv2.INTER_AREA)
        piece_images.append([piece_type, img.flatten()])

tile_images = np.array(tile_images, dtype=object)
piece_images = np.array(piece_images, dtype=object)
We resize the boards to 400×400 pixels, which means that each trainable square is going to be 50×50 pixels in size. For each board, we iterate over each square, extract it, and add it to the tile_images list with the label empty, to indicate an empty square.
As the pieces are transparent PNG images, we need to pass the flag cv2.IMREAD_UNCHANGED when reading them. Otherwise the alpha channel would be stripped, and we could not overlay the pieces on top of the board images. Similarly, we read all the piece images, resize them, and add them to the piece_images list with their respective labels.
We keep the board and piece images in two separate lists because we want to control the proportion of empty squares in our dataset. There can be no more than 32 pieces on a chessboard at a time (16 white, 16 black, no captures), meaning that in any position, the number of empty squares is greater than or equal to the number of non-empty squares. When generating the training data, the proportion of empty squares was set to 50%.
Now we can create a function that generates random chessboard squares on the fly.
import random

def get_sample(empty_prop, augmentation=True):
    # Pick a random empty-square image
    tile = random.choice(tile_images)[1].reshape(square_size, square_size, 3)
    if random.random() < empty_prop:
        return tile / 255.0, "empty"
    else:
        # Overlay a random piece on the square, using its alpha channel as a mask:
        # fully transparent pixels show the board square underneath
        piece = random.choice(piece_images)
        piece_label = piece[0]
        piece_image = piece[1].reshape(square_size, square_size, 4)
        piece_alpha = piece_image[:, :, 3]
        alpha = cv2.merge([piece_alpha, piece_alpha, piece_alpha])
        piece_image = piece_image[:, :, 0:3]
        blend = np.where(alpha == (0, 0, 0), tile, piece_image)
        if augmentation:
            # Occasionally blur the sample slightly
            if random.random() < 0.2:
                blur_amount = random.choice([1, 3, 5])
                blend = cv2.GaussianBlur(blend, (blur_amount, blur_amount), 0)
            # Shift the sample a few pixels along a random axis
            if random.random() < 0.5:
                shift_amount = random.choice([1, 2, 3, 4, 5])
                shift_direction = random.choice([-1, 1])
                shift_axis = random.choice([0, 1])
                blend = np.roll(blend, shift_amount * shift_direction, axis=shift_axis)
            # Rotate the sample by a degree or two
            if random.random() < 0.10:
                angle = random.choice([-2, -1, 1, 2])
                image_center = tuple(np.array(blend.shape[1::-1]) / 2)
                rot_mat = cv2.getRotationMatrix2D(image_center, angle, 1.0)
                blend = cv2.warpAffine(blend, rot_mat, blend.shape[1::-1], flags=cv2.INTER_LINEAR)
        return blend / 255.0, piece_label
The get_sample function generates a single training image. The argument empty_prop is the probability that the square will be empty. The function returns a 50×50 image and its corresponding label. It first selects a random image of an empty square. Then, depending on the value of empty_prop and randomness, it either returns the empty square image, or it overlays a random piece on top of it. If augmentation is set to True, some random transformations may be applied to the image: it can be slightly blurred, shifted, or rotated. All samples are normalized to the range [0, 1].
Below is a sampled collection of 1200 randomly generated chessboard squares. The probability of an empty square was set to 0.5.
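For reference, here is a minimal sketch of how such a grid can be rendered with matplotlib, assuming the get_sample function defined above (the grid shape is arbitrary):
import matplotlib.pyplot as plt

# Render a 30×40 grid (1200 samples) of randomly generated squares
rows, cols = 30, 40
fig, axes = plt.subplots(rows, cols, figsize=(20, 15))
for ax in axes.flatten():
    sample, _ = get_sample(empty_prop=0.5)
    # OpenCV images are BGR; matplotlib expects RGB
    ax.imshow(sample[:, :, ::-1])
    ax.axis("off")
plt.tight_layout()
plt.show()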
Next, we can create a function that generates a batch of training data for the model.
piece_labels = ["empty", "k", "q", "r", "b", "n", "p", "K", "Q", "R", "B", "N", "P"]
num_classes = len(piece_labels)

def generate_batch(batch_size, empty_prop):
    batch = []
    labels = []
    for i in range(batch_size):
        sample = get_sample(empty_prop)
        batch.append(sample[0])
        label = sample[1]
        # One-hot encode the label
        onehot_label = np.zeros(num_classes)
        onehot_label[piece_labels.index(label)] = 1
        labels.append(onehot_label)
    batch = np.array(batch)
    labels = np.array(labels)
    return batch, labels
The generate_batch function takes two arguments, batch_size and empty_prop, and returns a batch of images and their corresponding labels. The labels are one-hot encoded before being sent to the model.
Below is the data_generator function, which is used in training.
def data_generator(batch_size, num_samples, empty_prop):
    i = 0
    # Yield batches until num_samples samples have been generated
    while i < num_samples:
        X_batch, y_batch = generate_batch(batch_size, empty_prop)
        i += batch_size
        yield X_batch, y_batch
Step 2: The Model
After several rounds of testing, fine-tuning, and tweaking, I ended up with the following model.
import keras
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Dropout, Activation, Flatten
from keras.callbacks import ModelCheckpoint

batch_size = 256
num_classes = 13
epochs = 5
n = 2_000_000
steps_per_epoch = n // batch_size

checkpoint_filepath = "checkpoint.h5"
checkpoint = ModelCheckpoint(checkpoint_filepath, save_weights_only=True, save_best_only=True)

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(square_size, square_size, 3)))
model.add(Activation("relu"))
model.add(Conv2D(32, (3, 3)))
model.add(Activation("relu"))
model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(Conv2D(128, (3, 3)))
model.add(Activation("relu"))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation("relu"))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation("softmax"))

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

history = model.fit(
    # The generators must supply enough samples to cover all epochs
    data_generator(batch_size=batch_size, num_samples=n * epochs, empty_prop=0.5),
    validation_data=data_generator(
        batch_size=batch_size, num_samples=1000 * batch_size * epochs, empty_prop=0.5
    ),
    steps_per_epoch=steps_per_epoch,
    validation_steps=1000,
    epochs=epochs,
    verbose=1,
    callbacks=[checkpoint],
)
The model is a sequential CNN with four 2D convolutional layers, each followed by a ReLU activation, then a flatten, a 256-unit dense layer with dropout, and a 13-way softmax output. Checkpoints are used to save the best weights.
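Note that since the checkpoint stores weights only, restoring the best model later means rebuilding the same architecture first and then loading the weights; a minimal sketch:
# Rebuild the Sequential model exactly as above, then restore
# the best weights saved by the ModelCheckpoint callback
model.load_weights("checkpoint.h5")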
Step 3: Results
After the first epoch of training, the model achieved an accuracy of 99.00% (validation accuracy was 99.95%). In total, the model was trained for 9 epochs, after which the accuracy was 99.94% (validation accuracy was 100.0%). All in all, that took about 2 hours of GPU time.
Below is the confusion matrix for the test set.
When testing with images of full 64-square chessboards, the model achieved 100.0% accuracy on 1,000 random chessboard positions generated using python-chess and the FICS games dataset.
Step 4: Creating a FEN string
Now that the model can predict individual squares, we can use it to recognize an entire chessboard. The idea is to divide the chessboard into 64 individual squares, and run them through the model one by one. The model will return a label for each square, which we can then use to create the FEN string for that position.
I used the python-chess library to construct the chessboard and output the FEN string.
from typing import List

import keras
import cv2
import numpy as np
import chess

board_size = 400
square_size = board_size // 8

# Piece labels, lowercase for black, uppercase for white
onehot_labels = ["empty", "k", "q", "r", "b", "n", "p", "K", "Q", "R", "B", "N", "P"]

# The trained Keras model from Step 2 (architecture rebuilt, weights
# restored from checkpoint.h5) is assumed to be available as `model`

def restore_onehot(y_pred: np.ndarray) -> np.ndarray:
    # Map each softmax output vector back to its string label
    y_pred = np.argmax(y_pred, axis=1)
    y_pred = np.array([onehot_labels[i] for i in y_pred])
    return y_pred
def construct_board(labels: List[str]) -> chess.Board:
    board = chess.Board()
    for square, pred_piece in zip(chess.SQUARES, labels):
        color = chess.WHITE if pred_piece.isupper() else chess.BLACK
        if pred_piece == "empty":
            board.remove_piece_at(square)
        elif pred_piece.lower() == "k":
            board.set_piece_at(square, chess.Piece(chess.KING, color))
        elif pred_piece.lower() == "q":
            board.set_piece_at(square, chess.Piece(chess.QUEEN, color))
        elif pred_piece.lower() == "r":
            board.set_piece_at(square, chess.Piece(chess.ROOK, color))
        elif pred_piece.lower() == "b":
            board.set_piece_at(square, chess.Piece(chess.BISHOP, color))
        elif pred_piece.lower() == "n":
            board.set_piece_at(square, chess.Piece(chess.KNIGHT, color))
        elif pred_piece.lower() == "p":
            board.set_piece_at(square, chess.Piece(chess.PAWN, color))
    return board
def extract_squares(image_data: np.ndarray) -> np.ndarray:
    squares = []
    # Split the image into 64 squares, walking ranks from the bottom of the
    # image up, so the order is A1, B1, C1, ..., G8, H8 (matching chess.SQUARES)
    for i in range(7, -1, -1):
        for j in range(0, 8):
            square = image_data[
                i * square_size : (i + 1) * square_size,
                j * square_size : (j + 1) * square_size,
            ]
            squares.append(square.flatten())
    return np.array(squares).reshape(-1, square_size, square_size, 3) / 255.0
def set_below_confidence_to_empty(
    y_pred: np.ndarray, y_labels: list, confidence: float
) -> list:
    # Highest class probability for each square
    y_max = np.amax(y_pred, axis=1)
    y_dict = {}
    for i in range(0, 64):
        y_dict[chess.SQUARES[i]] = (y_labels[i], y_max[i])
    # Any prediction below the confidence threshold is treated as an empty square
    for square, (_, prob) in y_dict.items():
        if prob < confidence:
            y_dict[square] = ("empty", prob)
    return [x[0] for x in y_dict.values()]
def parse_board(image_data: bytes, to_play: str) -> chess.Board:
    # Decode the image bytes and resize to 400×400
    photo = cv2.imdecode(np.frombuffer(image_data, np.uint8), cv2.IMREAD_COLOR)
    photo = cv2.resize(photo, (board_size, board_size), interpolation=cv2.INTER_AREA)
    # Extract the squares from the image, (reshape & normalize) for the model
    squares = extract_squares(photo)
    # Predict the piece in each square
    y_pred = model.predict(squares)
    # Restore original labels
    y_labels = restore_onehot(y_pred)
    # Heuristics to fix possible errors:
    # if the model is less than 95% confident about a square, set it to empty
    y_labels = set_below_confidence_to_empty(y_pred, y_labels, 0.95)
    board = construct_board(y_labels)
    # If the board is from the perspective of black, rotate it 180 degrees
    if to_play == "black":
        board = board.transform(chess.flip_vertical).transform(chess.flip_horizontal)
    return board
file_to_detect = "test.png"
# parse_board expects raw image bytes, so read the file in binary mode
with open(file_to_detect, "rb") as f:
    b = parse_board(f.read(), "white")
print(b.fen())
Given the following screenshot of a chessboard as an input, we get this FEN string.
rnb3nr/pp1b1p1p/8/2p1q3/3pP1k1/2PP1N2/PP3P1P/RNBQKB1R
We can see in the Lichess analysis board that this is the same position as in the image.
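As a convenience, the FEN can be dropped straight into a Lichess analysis URL. A small sketch, assuming Lichess' /analysis/<fen> URL scheme (spaces in a full FEN become underscores):
fen = "rnb3nr/pp1b1p1p/8/2p1q3/3pP1k1/2PP1N2/PP3P1P/RNBQKB1R"
url = f"https://lichess.org/analysis/{fen.replace(' ', '_')}"
print(url)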
Since the model was trained on a wide variety of different piece styles, it is able to recognize even more unusual piece sets, such as in the following example.
2r2rk1/pp2q1pp/4pp2/1b1p4/1n1P1P1N/P3P2Q/1P3RPP/RB4K1
The model is also able to generalize, and recognize positions scanned from chess books, such as in the example below.
rnbqkb1r/pppppp1p/5np1/8/3P1B2/5N2/PPP1PPPP/RN1qKB1R
That said, based on manual testing, detection from scanned images is highly dependent on the quality of the scan: scanning artifacts, low scanning resolution, or skewed scans will increase the error rate.
Step 5: Creating an API
After successfully creating the model, I wanted to create a simple API that would let me upload a screenshot of a chessboard and get a FEN string in return. The API can be deployed to a server, giving me quick access to it whenever I need it.
Below is the main API code, created with FastAPI.
from fastapi import FastAPI, File, HTTPException, UploadFile
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

import chess_eye

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

class Position(BaseModel):
    fen: str

@app.post("/api/detect/{color}", response_model=Position)
def detect(
    file: UploadFile = File(...),
    color: str = "white",
):
    # Errors are raised as HTTPExceptions, since the declared
    # response_model only covers the successful case
    if file.content_type not in ["image/jpeg", "image/png"]:
        raise HTTPException(status_code=400, detail="Invalid file type. Must be JPG, JPEG, or PNG.")
    if color not in ["white", "black"]:
        raise HTTPException(status_code=400, detail="Invalid color parameter. Must be 'white' or 'black'.")
    image_data = file.file.read()
    fen = chess_eye.get_fen(image_data, color)
    if len(fen) > 0:
        return Position(fen=fen)
    raise HTTPException(status_code=422, detail="Unable to detect position")
There are two endpoints, /api/detect/white and /api/detect/black, which decide the orientation of the constructed board.
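Here is a hypothetical client call, assuming the service is running locally on port 8080 (as in the Dockerfile below):
import requests

with open("test.png", "rb") as f:
    resp = requests.post(
        "http://localhost:8080/api/detect/white",
        files={"file": ("test.png", f, "image/png")},
    )
print(resp.json())  # e.g. {"fen": "rnb3nr/pp1b1p1p/8/..."}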
And here is the detection module.
# Code for parse_board() shown in Step 4

def get_fen(image_data: bytes, to_play: str) -> str:
    try:
        # parse_board decodes the raw bytes itself
        board = parse_board(image_data, to_play)
        # Only include piece placement in the FEN
        fen = board.fen().split(" ")[0]
        return fen
    except Exception as e:
        print(e)
        return ""
And, finally, the Dockerfile to create the Docker image.
FROM python:3.8
RUN apt-get update && apt-get install -y ffmpeg libsm6 libxext6 && rm -r /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt ./
RUN pip3 install --upgrade pip
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
See full code on GitHub.
Step 6: Keyboard shortcut
As an extra, here's a handy way to use the API. I created a macOS Shortcut that allows me to take a screenshot and send it to the API automatically. The returned FEN string is copied to the clipboard, so I can paste it into any chess app.
In Raycast, I bound this shortcut to ⌘+⌥+⇧+C.