The ELO Rating System: Chess's Mathematical Mirror — and How BigChess Uses It

Arpad Elo invented a formula to measure chess skill. It became one of the most influential rating systems in the world — and now it powers competitive play in BigChess.
Introduction: A Number That Tells Your Story
Every competitive chess player has a number. It sits beside their name on club walls, tournament tables, and online profiles. It climbs after victories and drops after losses. It is not a perfect measure of anything, and yet chess players guard it with a seriousness that borders on the personal. That number is their ELO rating, and it represents one of the most influential inventions in the history of competitive play.
The ELO system began as a tool for ranking chess players in the United States in the 1950s. It grew to become the official rating system of FIDE, the international chess federation. And then it escaped chess entirely, spreading into video games, professional sports, academic peer review, and even a dating app that once used it to rank users by desirability. A Hungarian-American physics professor's mathematical formula for measuring chess skill became the dominant paradigm for measuring competitive ability in dozens of unrelated fields.
This is the story of the ELO rating system — its origins, its mathematics, its cultural diffusion, its limitations, and its implementation in BigChess, where it powers real-time matchmaking on a 10×10 board against opponents who share your skill level from anywhere in the world.
Part I: Arpad Elo and the Birth of the System
The Problem Before ELO
Before Arpad Elo's system, chess ratings were ad hoc and inconsistent. Various national federations used different methods. Some rated players based on tournament performance against players of known strength. Others used committee-based assessments. The United States Chess Federation used the Harkness system, invented by Kenneth Harkness, which assigned numerical ratings based on performance in tournaments but had significant flaws: it was not statistically grounded, it did not properly account for the strength of opposition, and it produced anomalous results when players competed against opponents of widely different strengths.
The fundamental problem was conceptual: how do you assign a single number that captures competitive chess ability, updates correctly as that ability changes, and remains meaningful when players compete against opponents of different strengths?
Arpad Elo: The Man Behind the Formula
Arpad Elo was born in Hungary in 1903 and emigrated to the United States in 1913. He became a physics professor at Marquette University in Milwaukee and a lifelong chess enthusiast. He served on the USCF's rating committee and grew increasingly dissatisfied with the Harkness system's mathematical foundations.
Elo brought the rigor of statistical physics to the problem. He proposed that chess skill, like many measured quantities in nature, follows a normal distribution — that if you plotted the true playing strengths of all chess players on a graph, you would get a bell curve. Furthermore, he proposed that the probability of Player A defeating Player B could be calculated mathematically from the difference between their ratings, using the cumulative distribution function of the normal curve.
His key insight was probabilistic rather than deterministic: he did not claim that a higher-rated player will always beat a lower-rated one, only that the probability of each outcome could be calculated from the rating difference. This probabilistic foundation made the system robust to upsets and allowed ratings to update incrementally after every game rather than requiring periodic recalculation.
Elo published his system in 1978 in his book The Rating of Chessplayers, Past and Present, but the USCF had already adopted it in 1960, and FIDE began using it internationally in 1970. By the time Elo published his comprehensive treatment, his formula had been the standard of American chess rating for nearly two decades.
Part II: The Mathematics of ELO
The Expected Score Formula
The core of the ELO system is the expected score formula. Given Player A with rating R_A and Player B with rating R_B, the expected score for Player A in a single game is:
E_A = 1 / (1 + 10^((R_B - R_A) / 400))
This formula produces the following key results:
- When ratings are equal (R_A = R_B), E_A = 0.5 — a 50% chance of winning (or, more precisely, an expected score of 0.5 once draws are accounted for).
- When Player A has a 200-point rating advantage, E_A ≈ 0.76 — A is expected to score 76 points per 100 games.
- When Player A has a 400-point rating advantage, E_A ≈ 0.91 — A is expected to score 91 points per 100 games.
- When Player A has an 800-point rating advantage, E_A ≈ 0.99 — almost certain victory.
The choice of the divisor 400 is somewhat arbitrary but was calibrated by Elo to match observed game outcomes in real chess data. It sets the scale of the rating distribution and determines how large a rating difference corresponds to a given win probability. FIDE uses 400; some implementations use different divisors.
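In code, the expected score formula is a one-liner. A minimal Python sketch, with the divisor exposed as a parameter since, as noted above, implementations vary:

```python
def expected_score(rating_a: float, rating_b: float, divisor: float = 400.0) -> float:
    """Expected score for player A against player B under the ELO model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / divisor))

# Equal ratings give 0.5; a 200-point edge gives roughly 0.76.
print(round(expected_score(1500, 1500), 2))  # 0.5
print(round(expected_score(1600, 1400), 2))  # 0.76
```

Note that the two players' expected scores always sum to 1, which is what makes the standard update zero-sum.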
The Update Formula
After a game, both players' ratings update based on the difference between their actual score and their expected score:
R'_A = R_A + K × (S_A - E_A)
Where:
- R'_A is the new rating for Player A
- R_A is the old rating
- K is the K-factor (see below)
- S_A is the actual score (1 for a win, 0.5 for a draw, 0 for a loss)
- E_A is the expected score from the formula above
The logic is elegant: if you beat a player you were heavily favored to beat (high E_A, S_A = 1), the difference S_A - E_A is small, and your rating increases only slightly. If you beat a player who was heavily favored over you (low E_A, S_A = 1), the difference is large, and your rating increases substantially. Upsets are rewarded more than expected results — exactly as they should be.
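The update rule itself is a single line. A minimal Python sketch with illustrative (hypothetical) ratings of 1650 and 1500:

```python
def update_rating(rating: float, expected: float, actual: float, k: float = 20.0) -> float:
    """One ELO update: old rating plus K times (actual score - expected score)."""
    return rating + k * (actual - expected)

# A 1650 player faces a 1500 player; the favorite's expected score is ~0.70.
e = 1.0 / (1.0 + 10.0 ** ((1500 - 1650) / 400))
print(round(update_rating(1650, e, 1.0), 1))  # 1655.9 -- small gain for the expected win
print(round(update_rating(1650, e, 0.0), 1))  # 1635.9 -- larger drop for the upset loss
```

The asymmetry is visible in the output: the favorite risks about 14 points to gain about 6.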
The K-Factor: Speed of Adjustment
The K-factor determines how quickly ratings respond to new results. A high K-factor means ratings change rapidly with each game; a low K-factor means ratings change slowly and are more stable.
FIDE uses a tiered K-factor system:
- K = 40: Players new to the rating list, until they have completed events totaling at least 30 rated games; also all players under 18, as long as their rating remains under 2300.
- K = 20: Players who have completed their initial 30 games and whose rating has remained below 2400.
- K = 10: Players whose published rating has ever reached 2400. K stays at 10 even if the rating later drops below that mark.
The rationale: new players' true strength is highly uncertain, so large K-factors allow rapid convergence to a more accurate estimate. Established players' ratings are more stable estimates of their true strength, so smaller K-factors prevent excessive noise from individual game results.
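The tier logic above can be sketched as a small selection function. This is a simplification (the function name and signature are mine, and the real FIDE rules also carry the under-18 provision, which is omitted here):

```python
def k_factor(rated_games: int, peak_rating: float) -> float:
    """Simplified sketch of FIDE-style tiered K-factors."""
    if peak_rating >= 2400:
        return 10.0   # elite rating reached: keep the estimate stable
    if rated_games < 30:
        return 40.0   # new player: large K for fast convergence
    return 20.0       # established player below 2400

print(k_factor(10, 1600))    # 40.0
print(k_factor(200, 2350))   # 20.0
print(k_factor(500, 2450))   # 10.0
```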
Rating Floors and Boundaries
Some rating systems also include floors — minimum ratings that players cannot fall below once achieved. FIDE's version is simple: ratings below 1000 are dropped from the published list. The USCF goes further and assigns each player a personal floor derived from their peak rating — a player who has reached 2000, for example, has a floor of 1800 and cannot fall below it through poor subsequent results. Floors were introduced partly to prevent players from gaming the system by deliberately losing games to set up favorable matchups against lower-rated opponents.
Rating floors introduce a small systematic distortion into the rating pool — they prevent genuine deflation of the rating distribution at the lower end — but their anti-gaming function is considered worth the tradeoff.
Part III: The Limitations and Debates Around ELO
Rating Inflation and Deflation
One of the most persistent debates in chess rating theory concerns inflation: the tendency for rating averages to drift upward over time even when the actual distribution of player strengths has not changed. If a new cohort of players enters the rating pool, each starting at the default initial rating and most eventually declining, they inject rating points into the pool that do not disappear when these players stop playing. The players who beat them absorb those points, raising the overall average.
FIDE's own data show that the average rating of players at the top end of the scale has drifted significantly upward since the system's introduction. This means that a rating of 2700 today is not necessarily equivalent to a rating of 2700 in 1990 — the rating inflation effect makes historical cross-era comparisons unreliable. Precisely how much of the observed increase in top ratings represents genuine skill improvement versus inflation remains actively debated among rating theorists.
The Problem of Rating New Pools
When a new competitive community forms — whether a new online platform, a new variant, or a new game — the ELO system faces a bootstrapping problem. The expected score formula works correctly only when the rating pool has stabilized: when most players have played enough games that their ratings are reasonably accurate estimates of their true strength.
In a new rating pool, everyone starts at the same initial rating (or some default value), and all initial ratings are equally inaccurate. The first games between players are essentially guesses — neither player's rating is a reliable indicator of their strength. The pool gradually stabilizes as players play more games and their ratings converge toward accurate estimates of their true strength, but this process takes time and many games.
The bootstrapping problem is a particular challenge for a new game like BigChess, where the rating pool begins from scratch. We address how BigChess handles this below.
The Assumption of Stable Strength
ELO assumes that a player's true strength is stable over the measurement period. In reality, players improve over time (especially novices) and sometimes decline with age. A system calibrated for stable strength underestimates the rating of improving players and overestimates that of declining ones.
Various modifications to the base ELO system attempt to address this: using higher K-factors for younger or newer players (so their ratings move faster), using separate K-factors for different time periods, or modeling player strength as a dynamic variable rather than a fixed one. FIDE's tiered K-factor system is a partial accommodation of this problem.
The Role of Draws
Chess produces draws at rates that vary with player strength: roughly half of classical games between two grandmasters end in draws, while games between novices rarely do. The ELO expected score formula treats a draw as half a win, which is mathematically correct — a draw gives each player 0.5 points regardless of who seemed better during the game — but it means that draws between roughly equal players produce essentially no rating change even if one player played far better and was "lucky" to hold the draw.
This draw insensitivity means that rating estimates for players who draw frequently are noisier than for players who have decisive results. Some proposed ELO modifications incorporate draw probability into the expected score formula, but these have not been adopted by major rating bodies.
Part IV: ELO Beyond Chess — A Universal Measure of Competitive Ability
Video Games
The first major non-chess application of ELO was in competitive video gaming. By the late 2000s, online multiplayer games needed systems to match players of equal skill for competitive play. ELO was the obvious choice — it was well-understood, statistically grounded, and had been tested extensively in chess.
Microsoft's Xbox Live matchmaking system (whose TrueSkill algorithm is a Bayesian generalization of the same idea), Blizzard's StarCraft II ladder, and Riot Games' League of Legends ranked system all use ELO-derived algorithms. The specifics vary — different K-factors, different rating scales, hidden versus displayed ratings — but the fundamental expected score and update logic is recognizably Elo's original formulation.
League of Legends is perhaps the most prominent example, with its ranked system tracking millions of players' ELO-style ratings (called MMR, for Matchmaking Rating) and using these to construct fair matches across its enormous player base. The system works at massive scale precisely because the underlying mathematics are simple and robust.
Sports
FIFA, football's international governing body, adopted an ELO-based system for ranking national teams in 2018, replacing an older performance-based formula. The independent World Football Elo Ratings at eloratings.net retrospectively cover national team matches back to 1872 and are widely cited by football analysts as more accurate predictors of match outcomes than FIFA's previous official rankings.
The National Football League, the NBA, and Major League Baseball all have analyst communities that apply ELO-based ratings to team strength assessment, often as alternatives or supplements to official standings. Nate Silver's FiveThirtyEight website popularized ELO-based sports predictions for a mainstream audience, making Elo's academic formula a household concept among sports statistics enthusiasts.
Academic Peer Review and Research
ELO-style ranking systems have been applied to academic journals, research papers, and even individual scientists' publication records. Systems like the Eigenfactor (which rates journals) and various h-index variations share conceptual roots with ELO's approach: ranking by competitive performance against peer-quality work.
Dating Apps
Perhaps the most culturally significant diffusion of ELO was its adoption by Tinder, the dating application, as the basis for its early "Elo score" — an internal desirability ranking that determined which profiles were shown to which users. Tinder's ELO score calculated user attractiveness based on swipe patterns, using the logic that being swiped right by high-ELO users was worth more than being swiped right by low-ELO users. This application was widely discussed in the press, and Tinder eventually replaced it with a different algorithm, but the episode illustrated just how far Elo's formula had traveled from its origins in chess club tournaments in 1950s Milwaukee.
Part V: BigChess and ELO — Rating a New Game
The Challenges of Rating a New Variant
Implementing ELO for BigChess presents challenges that don't arise for a game with an established rating community. Classical chess's ELO pool contains millions of players with stable, well-calibrated ratings built over decades. A new player's initial rating can be estimated from tournament performance before they enter the pool, and games played between established players produce reliable rating updates from the first game.
BigChess begins with none of this infrastructure. Every player is new. Every initial rating is equally uncertain. The opening period of a new rating pool is inherently noisier than a mature one.
A further challenge is that a classical chess player's classical rating is only a partial guide to their BigChess strength. Classical chess knowledge transfers to BigChess — piece values, pawn structure understanding, endgame technique — but Clone tactics, 10×10 opening principles, and the specific patterns of BigChess have no classical equivalent. A strong classical chess player may initially underperform their expected strength in BigChess as they adapt to the new piece and larger board, then improve rapidly as they learn BigChess-specific patterns.
How BigChess Implements ELO Matchmaking
BigChess uses the ELO system for real-time matchmaking, pairing players with opponents whose ratings are close to their own. The system is implemented in the game server's Nakama backend, which handles rating calculations, stores game history, and uses ratings to identify compatible opponents from the current player pool.
The matchmaking system uses a wider acceptable rating difference range early in a player's career (when ratings are highly uncertain and the pool is smaller) and narrows this range as players accumulate more games and their ratings stabilize. This is analogous to FIDE's higher K-factor for new players — the system acknowledges greater uncertainty for players with few games and adjusts accordingly.
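One way to picture this widening-then-narrowing window is as an uncertainty bonus that decays with games played. Everything in this sketch is hypothetical — the function names and all numeric parameters are illustrative, and BigChess's actual Nakama matchmaker parameters are not documented here:

```python
def rating_window(games_played: int,
                  base: float = 100.0,
                  extra: float = 300.0,
                  settle_games: int = 30) -> float:
    """Acceptable opponent-rating gap: wide for new players,
    shrinking linearly toward `base` as ratings stabilize."""
    uncertainty = max(0.0, 1.0 - games_played / settle_games)
    return base + extra * uncertainty

def is_match(rating_a: float, rating_b: float, games_a: int, games_b: int) -> bool:
    """Pair two players only if their gap fits the tighter of the two windows."""
    window = min(rating_window(games_a), rating_window(games_b))
    return abs(rating_a - rating_b) <= window

print(rating_window(0))    # 400.0 for a brand-new player
print(rating_window(30))   # 100.0 once the rating has settled
```

Using the tighter of the two players' windows keeps an established player from being paired far outside their band just because the opponent is new.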
Game history is stored fully for every player, allowing complete review of rating trajectory over time. Players can see exactly which games caused significant rating changes and analyze those games to understand why. This transparency — knowing not just your current rating but the history of every point gained and lost — is a feature of BigChess's rating implementation that is practically useful for improvement.
What Your BigChess Rating Means
BigChess ratings follow the standard ELO interpretation of expected performance differentials. The specific numbers will shift as the rating pool grows and stabilizes, but the relative meaning remains constant:
| Rating Difference | Expected Win Rate for Higher-Rated Player |
|---|---|
| 0 (equal) | 50% |
| +100 | ~64% |
| +200 | ~76% |
| +400 | ~91% |
| +600 | ~97% |
In practical terms, these numbers mean that BigChess ratings are meaningful predictors of game outcomes even in a relatively new rating pool. A player rated 400 points higher than their opponent will win approximately nine out of ten games; their rating advantage is not an artifact of a small sample but a genuine signal about the strength difference between the two players.
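The table above can be regenerated directly from the expected score formula, taking the rating difference as input:

```python
def expected_win_rate(diff: float) -> float:
    """Expected score for the higher-rated player at a given rating gap."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

for diff in (0, 100, 200, 400, 600):
    print(f"+{diff}: {expected_win_rate(diff):.0%}")
# +0: 50%, +100: 64%, +200: 76%, +400: 91%, +600: 97%
```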
A Concrete Example: Rating Update After a BigChess Game
Consider two players: Player A (BigChess rating 1400) and Player B (BigChess rating 1200). The expected score for Player A is:
E_A = 1 / (1 + 10^((1200 - 1400) / 400)) = 1 / (1 + 10^(-0.5)) ≈ 0.76
Player A is expected to score 76% against Player B. Now suppose Player A wins. The rating update (using K = 20) is:
R'_A = 1400 + 20 × (1 - 0.76) = 1400 + 20 × 0.24 = 1400 + 4.8 ≈ 1405
Player A gains about 5 points — a modest reward for winning a game they were heavily favored in. Player B loses the same 5 points. Their new ratings: Player A at 1405, Player B at 1195.
Now suppose instead that Player B wins this game. The update for Player A:
R'_A = 1400 + 20 × (0 - 0.76) = 1400 - 15.2 ≈ 1385
Player A loses 15 points — a significant penalty for losing to a weaker opponent. Player B gains 15 points for the upset. This asymmetry in rating changes for expected versus unexpected results is the core of how ELO incentivizes competitive play against opponents of equal or greater strength.
Part VI: Improving Your BigChess ELO Rating
What Actually Moves Your Rating
Because the ELO system rewards beating opponents near or above your own rating, the fastest path to rating improvement is playing consistently good chess against evenly matched opponents. Several principles apply specifically to improving your BigChess ELO:
- Understand the Clone before grinding rated games. Players who develop a solid understanding of Clone tactics and basic BigChess strategy before playing many rated games will climb more quickly than players who learn through repeated losses. BigChess's puzzle system is specifically designed for this — working through Clone tactical puzzles before playing rated games accelerates the learning curve.
- Use game history analysis. Every rated BigChess game is stored. Review games where you lost significant rating points and identify the specific mistakes — Clone tactics missed, pawn structure errors, king safety oversights. Pattern recognition improves fastest through analyzing your own games.
- Accept losses to stronger opponents as learning opportunities. The ELO system penalizes losing to weaker opponents more than losing to stronger ones. Playing against stronger opponents, even while losing, produces smaller rating decreases than losing to weaker players would. More importantly, games against stronger opponents expose you to more sophisticated BigChess play.
- Resist opening experiments in high-stakes rated games. BigChess has no established opening theory, but certain basic principles — Clone development, center control on the 10×10 board, king safety — apply reliably. Deviating from these principles with untested experimental openings in rated games risks early positional damage before the game has properly begun.
- Study leaderboard games. BigChess's game history system makes top-rated players' games available for study. Observing how strong BigChess players handle Clone piece coordination, pawn structure decisions, and the transition between game phases provides a practical curriculum for improvement that no amount of abstract study can replace.
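The third principle above can be checked arithmetically: when you lose (S = 0), the update formula says you shed exactly K × E points, and your expected score E is smaller against stronger opposition. A quick sketch with K = 20:

```python
def loss_cost(own_rating: float, opp_rating: float, k: float = 20.0) -> float:
    """Rating points lost on a loss: K times the expected score (S = 0 case)."""
    expected = 1.0 / (1.0 + 10.0 ** ((opp_rating - own_rating) / 400.0))
    return k * expected

print(round(loss_cost(1400, 1600), 1))  # 4.8  -- losing upward is cheap
print(round(loss_cost(1400, 1200), 1))  # 15.2 -- losing downward is expensive
```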
Rating Milestones in a New Game
Because BigChess is a new game with a developing rating pool, the current rating milestones will evolve as more players join and the distribution stabilizes. What matters is not the absolute number but where you fall in the current distribution. As BigChess's rating pool grows, the relative structure of skill levels will become more visible — and the leaderboard will become an increasingly meaningful guide to the community's competitive landscape.
Conclusion: A Formula That Changed the World
Arpad Elo's contribution was not just a formula. It was a conceptual framework for thinking about competitive ability as a continuous, probabilistically distributed quantity that can be measured, compared, and updated with each new piece of evidence. This framework proved so broadly applicable that it outlived its chess-specific origins by decades, embedding itself in the rating infrastructure of competitive activities ranging from video games to international football to dating.
In BigChess, ELO provides the competitive framework that makes the game's matchmaking fair, transparent, and motivating. Every game produces information — a rating update that accurately reflects what happened and adjusts the probability estimates for future matchups. Every improvement in BigChess understanding is eventually reflected in the rating. The number beside your name tells your story, imperfectly but honestly, in the same mathematical language that Arpad Elo derived from statistical physics in the 1950s.
The game is new. The formula is old. Together, they create a competitive environment where every game matters and every improvement shows.
Build your BigChess rating against opponents at your level. Play now at bigchessgame.com — available on iOS, Android, and web browser. Your ELO journey on the 10×10 board starts with the first game.
About the Author

Rinat Fatalov
Co-inventor of Big Chess
University student and First Category chess player.