The ELO Rating System Explained: History, Math & Examples


The ELO rating system is the most widely used method for ranking players in head-to-head games. It powers chess ratings, tennis ladders, competitive video games, and friendly office leagues. This guide explains where it came from and the surprisingly simple math behind it, and shows exactly how the numbers change after a few example matches.

Where ELO came from

ELO is named after Arpad Elo, a Hungarian-American physics professor and strong chess player. In the 1950s, the United States Chess Federation (USCF) ranked players with the Harkness system, which many players considered inaccurate. Elo proposed a statistical replacement based on a simple premise: a player’s performance in any single game is a random variable around their “true” skill, and you can estimate that skill from results over time. The USCF adopted his system in 1960, and FIDE, the international chess body, followed in 1970; it has since spread far beyond chess. (Note: “ELO” is a person’s name, not an acronym, but the all-caps styling has become so common that we use it here too.)

The core idea: expected score

Everything in ELO flows from one question asked before a match: given the gap between these two ratings, how likely is each player to win? That probability is called the expected score. A much higher-rated player might have an expected score of 0.9, which you can read as a 90% chance of winning (strictly, it is the average score expected per game, with a draw counting as half). Two evenly matched players each have an expected score of 0.5.

After the match, ELO compares what actually happened (a win counts as 1, a loss as 0) to what was expected. If you do better than expected, your rating goes up; worse than expected, it goes down. Beating a favorite is a big surprise, so it moves a lot of points. Beating someone you were supposed to beat is no surprise, so it moves only a few.

The formula

The expected score for player A against player B is:

E_A = 1 / (1 + 10^{(R_B − R_A) / 400})

And the rating update after the game is:

R_A′ = R_A + K × (S_A − E_A)

Here S_A is the actual result (1 for a win, 0 for a loss, 0.5 for a draw) and K is the K-factor — the dial that sets how much a single game can move a rating. TrackMyElo’s defaults, which every example below assumes, are a starting rating of 1000 and a K-factor of 32.
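The two formulas translate almost line for line into code. Here is a minimal Python sketch, assuming the defaults stated above (start at 1000, K = 32). The names `expected_score`, `update_rating`, `START_RATING`, and `K_FACTOR` are illustrative and are not taken from TrackMyElo’s actual code:

```python
# Hypothetical helpers illustrating the ELO formulas; not TrackMyElo's implementation.
START_RATING = 1000  # default starting rating used throughout this article
K_FACTOR = 32        # default K-factor used throughout this article


def expected_score(r_a: float, r_b: float) -> float:
    """E_A = 1 / (1 + 10^((R_B - R_A) / 400))."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def update_rating(r_a: float, e_a: float, s_a: float, k: float = K_FACTOR) -> float:
    """R_A' = R_A + K * (S_A - E_A), where S_A is 1 (win), 0.5 (draw) or 0 (loss)."""
    return r_a + k * (s_a - e_a)


# Two new players start level, so each expects 0.5; a win is worth K * 0.5 = 16 points.
e = expected_score(START_RATING, START_RATING)
print(e)                                  # 0.5
print(update_rating(START_RATING, e, 1))  # 1016.0
```

Passing the expected score into the update, rather than recomputing it inside, keeps the two steps (predict, then compare) visibly separate, mirroring the two formulas.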

Keep two things separate. The expected score depends only on the rating gap and the “400” constant, not on the K-factor or the starting rating. The 400 just sets the scale: a 400-point gap means the favorite is expected to win about 10 times as often as they lose. The K-factor, by contrast, only governs how far ratings move afterward. To put the two on the same footing: with the default K of 32, a 400-point gap is exactly 400 ÷ 32 = 12.5 × K. No single game moves a rating by a full K (that would require an expected score of exactly 0 or 1), so a player would need more than a dozen games to open a gap that large through their own rating changes alone.
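To see what the 400 does, it helps to look at odds rather than probabilities: dividing E_A by E_B = 1 − E_A cancels the fraction and leaves 10^((R_A − R_B)/400). This short Python check (the function names are illustrative) shows that a 400-point gap gives 10-to-1 odds, and that K never enters the calculation:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B on the standard 400-point scale."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def favorite_odds(gap: float) -> float:
    """Ratio E_A / E_B for a player rated `gap` points above the opponent."""
    e_a = expected_score(gap, 0)  # only the difference matters, so rate B at 0
    return e_a / (1 - e_a)


for gap in (0, 200, 400, 800):
    print(gap, round(favorite_odds(gap), 2))
# 0 1.0
# 200 3.16
# 400 10.0
# 800 100.0
```

Doubling the gap squares the odds (10 becomes 100), which is why the scale feels logarithmic: every extra 400 points multiplies the favorite’s odds by another factor of 10.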

A worked example

Using the defaults (starting rating 1000, K-factor 32), say Alice has climbed to 1200 while Bob sits at 1000 — a 200-point gap. First, the expected scores:

E_Alice = 1 / (1 + 10^{(1000−1200)/400}) ≈ 0.76
E_Bob = 1 − 0.76 = 0.24

So Alice is expected to win about 76% of the time. Now consider two outcomes:

Alice wins (the expected result). Her new rating is 1200 + 32 × (1 − 0.76) ≈ 1208. Bob drops to 1000 + 32 × (0 − 0.24) ≈ 992. A small, ~8-point move.
Bob wins (an upset). Bob jumps to 1000 + 32 × (1 − 0.24) ≈ 1024, while Alice falls to 1200 + 32 × (0 − 0.76) ≈ 1176. A much bigger ~24-point swing.
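Both outcomes can be reproduced in a few lines. This Python sketch reuses the article’s numbers (Alice at 1200, Bob at 1000, K = 32); the helper name `play` is hypothetical:

```python
def expected_score(r_a: float, r_b: float) -> float:
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def play(r_winner: float, r_loser: float, k: float = 32) -> tuple[float, float]:
    """Return the (winner, loser) ratings after one decisive game."""
    e_winner = expected_score(r_winner, r_loser)
    delta = k * (1 - e_winner)  # loser's change is k * (0 - (1 - e_winner)) = -delta
    return r_winner + delta, r_loser - delta


alice, bob = 1200, 1000
print([round(r) for r in play(alice, bob)])  # Alice wins: [1208, 992]
new_bob, new_alice = play(bob, alice)
print(round(new_bob), round(new_alice))      # Bob wins: 1024 1176
```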

Notice the asymmetry: the favorite gains only 8 points for a win they were supposed to get, but loses 24 for a stumble. Those swings are just the K-factor scaled by the surprise — the 8 is 0.24 × K and the 24 is 0.76 × K, with K = 32. Change the K-factor and they move proportionally: at K = 16 they’d halve to about 4 and 12, and at K = 48 they’d grow to roughly 12 and 36. The 76% expected score, though, wouldn’t budge — that is fixed by the 200-point gap alone. That’s ELO automatically rewarding upsets and punishing complacency — and you never have to do any of this arithmetic yourself.
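Because the K-factor only multiplies the surprise term, the proportional scaling is easy to check directly. A small Python sketch for the 1200-vs-1000 matchup (variable names are illustrative):

```python
def expected_score(r_a: float, r_b: float) -> float:
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


e_fav = expected_score(1200, 1000)  # about 0.76, whatever K is

for k in (16, 32, 48):
    gain_if_win = k * (1 - e_fav)    # favorite's gain for the expected win
    loss_if_upset = k * (0 - e_fav)  # favorite's (negative) change after an upset
    print(k, round(gain_if_win), round(loss_if_upset))
# 16 4 -12
# 32 8 -24
# 48 12 -36
```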

Ratings are (almost) zero-sum

In the example above, Alice lost exactly as many points as Bob gained. When both players share the same K-factor, every game is zero-sum: the two expected scores add up to 1, and so do the two results, so points only move between players, never appearing or vanishing. That’s why a group’s average rating stays anchored at the starting value over time, and why you can’t inflate your rating just by playing a lot: you can only earn points by taking them from someone else.
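A quick simulation makes the zero-sum property concrete. This Python sketch plays random games with a shared K of 32 and checks that the group’s total, and therefore its average, does not drift from the starting value. The four-player pool, the seed, the 1,000 games, and the purely random results are arbitrary choices for illustration:

```python
import random


def expected_score(r_a: float, r_b: float) -> float:
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def simulate(n_players: int = 4, n_games: int = 1000, k: float = 32, seed: int = 1) -> list[float]:
    """Play random games with random results; everyone shares the same K."""
    rng = random.Random(seed)
    ratings = [1000.0] * n_players
    for _ in range(n_games):
        a, b = rng.sample(range(n_players), 2)  # two distinct players
        s_a = rng.choice((1, 0.5, 0))           # win, draw or loss for player a
        e_a = expected_score(ratings[a], ratings[b])
        delta = k * (s_a - e_a)
        ratings[a] += delta  # player a gains what player b loses
        ratings[b] -= delta  # b's change is k * ((1 - s_a) - (1 - e_a)) = -delta
    return ratings


ratings = simulate()
print(round(sum(ratings) / len(ratings), 6))  # 1000.0
```

Individual ratings wander far from 1000, but the average stays put; only floating-point rounding could nudge it, and far too little to show at six decimal places.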

What a rating number actually means

A rating is only meaningful relative to the others in your group. What matters is the gap between two players, not the absolute number. A handy rule of thumb from the 400 constant:

0 points apart → a coin flip (50/50).
~100 points apart → the favorite wins about 64% of the time.
~200 points apart → about 76%.
~400 points apart → about 91% (10-to-1 odds).

How a rating gap translates into the favorite's win probability.

These probabilities come from the rating gap alone — they’re identical no matter your starting rating or K-factor. If you’d rather think in terms of the default K-factor of 32, the same gaps are roughly 100 ≈ 3 × K, 200 ≈ 6 × K, and 400 ≈ 12.5 × K. In other words, a 91% favorite sits about a dozen full K-sized swings ahead of their opponent.
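The rule-of-thumb percentages, and each gap expressed in units of the default K, can be regenerated straight from the expected-score formula. A short Python sketch (names are illustrative):

```python
def expected_score(r_a: float, r_b: float) -> float:
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


K = 32  # default K-factor; used only to express each gap in "K units"

for gap in (0, 100, 200, 400):
    p = expected_score(gap, 0)  # favorite rated `gap` points above the opponent
    print(f"{gap:>3} points apart: favorite expects {p:.0%}, gap = {gap / K:.1f} x K")
```

Shifting both ratings by the same amount (say, 1100 vs 1000 instead of 100 vs 0) gives the same result, which is the sense in which only the gap matters.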

Settling in

A brand-new player’s rating is just a guess until they’ve played a handful of games. Early on their results carry the most information, so many systems use a higher K-factor for the first several matches and then lower it once the rating has stabilized. Expect a new player’s number to bounce around for their first 5–10 games before it settles near their true level.
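The article doesn’t prescribe a particular schedule, so the one below is hypothetical: K = 64 for a player’s first 10 games, then the default 32. The constant names, the numbers, and the helper functions in this Python sketch are assumptions for illustration, not any specific system’s rules:

```python
# Hypothetical provisional schedule: these numbers are illustrative, not a standard.
PROVISIONAL_K = 64
PROVISIONAL_GAMES = 10
ESTABLISHED_K = 32


def k_factor(games_played: int) -> int:
    """Larger K while a player is new, so early results move the rating faster."""
    return PROVISIONAL_K if games_played < PROVISIONAL_GAMES else ESTABLISHED_K


def expected_score(r_a: float, r_b: float) -> float:
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def update(r_a: float, r_b: float, s_a: float, games_played_a: int) -> float:
    """Update A's rating using A's own K, based on how many games A has played."""
    return r_a + k_factor(games_played_a) * (s_a - expected_score(r_a, r_b))


print(update(1000, 1000, 1, games_played_a=0))   # new player's first win: 1032.0
print(update(1000, 1000, 1, games_played_a=25))  # same win once established: 1016.0
```

Because each player uses their own K, a game between a new and an established player is no longer exactly zero-sum, which is where the “almost” in “Ratings are (almost) zero-sum” comes from.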

Limitations worth knowing

Inactivity. A rating reflects your skill the last time you played, not today.
Small pools. In a group of four, ratings are noisy — there just isn’t much data.
It ignores margin by default. An 11–9 win counts the same as an 11–0 win unless you turn on score-margin scoring.
It assumes consistent conditions. Home advantage, fatigue, and mood aren’t modeled.
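One common approach to margin-of-victory weighting is to scale K by a multiplier that grows with the winning margin. The article doesn’t say how its score-margin option works, so the multiplier below (the natural log of the margin plus one) and all function names are assumptions chosen purely for illustration:

```python
import math


def expected_score(r_a: float, r_b: float) -> float:
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def margin_multiplier(winner_points: int, loser_points: int) -> float:
    """Hypothetical: grows slowly with the margin, so blowouts count more but not linearly."""
    return math.log(abs(winner_points - loser_points) + 1)


def winner_gain(r_w: float, r_l: float, winner_points: int, loser_points: int, k: float = 32) -> float:
    """Points the winner gains (and the loser gives up) with margin weighting applied."""
    return k * margin_multiplier(winner_points, loser_points) * (1 - expected_score(r_w, r_l))


print(round(winner_gain(1000, 1000, 11, 9), 1))  # 11-9 win between equals: 17.6
print(round(winner_gain(1000, 1000, 11, 0), 1))  # 11-0 win between equals: 39.8
```

This particular choice has quirks a real system would need to address: a 1-point win earns less than the plain update (ln 2 ≈ 0.69), and a tie yields a multiplier of 0.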

None of these are dealbreakers — they’re just the trade-offs of a system that stays simple enough to compute on the back of a napkin. For a deeper comparison with newer alternatives, see ELO vs Glicko vs TrueSkill.