
Table of Contents
- Where Dictionaries Actually Come From
- Starting With the Evidence: The Corpus
- Citation Slips and Reading Campaigns
- Deciding Which Words Get In
- Writing the Definitions
- Capturing How Words Are Said
- Tracing Each Word's Backstory
- Flags for Register, Region, and Field
- Tying the Whole Thing Together
- What Changed With Computers
- The Arguments That Never Go Away
Where Dictionaries Actually Come From
Look up a word, scan the definition, close the tab. That is the entire interaction most of us ever have with a dictionary. What rarely crosses anyone's mind is the scale of human labor sitting behind that one-line gloss. A serious dictionary is the output of years of evidence-gathering, argument, and careful editorial craft—closer in spirit to a scientific reference work than to a simple word list.
The people who do this work are called lexicographers, and their field is lexicography. It pulls from linguistics, history, computing, and a certain patient temperament for detail. Samuel Johnson famously defined a lexicographer as "a harmless drudge," but the joke has always concealed something serious: these are the editors responsible for mapping how a language actually behaves at a given moment in time.
What follows is a tour through how a dictionary gets built—starting from raw language data and ending with a polished entry—so you can see the full pipeline that turns everyday usage into a reference work.
Starting With the Evidence: The Corpus
The first ingredient is a corpus: an enormous, tagged archive of real text that editors treat as their evidence base. A working corpus pulls from novels, news, court decisions, scientific papers, magazine features, blog posts, TV transcripts, forum threads—wherever language lives. Scale matters, because editors need enough examples of each word to see its quirks.
The numbers are striking. Oxford's English corpus runs past two billion words of current prose. Collins maintains one roughly ten times that. With tools sitting on top of those archives, a lexicographer can type in a phrase and get back thousands of real-world uses in seconds—a workflow that would have astonished anyone working with paper slips and index cards a century ago.
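The lookup described above is usually called a concordance, or keyword-in-context (KWIC) search. What follows is a minimal sketch of the idea, not any publisher's actual tooling: the `concordance` function, the `width` parameter, and the three-document `sample_corpus` are all assumptions made up for illustration.

```python
import re


def concordance(texts, phrase, width=30):
    """Return (source, left context, match, right context) for each hit."""
    # Whole-word, case-insensitive match on the literal phrase.
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    lines = []
    for source, text in texts.items():
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-justify the left context so the keyword lines up in a column.
            lines.append((source, left.rjust(width), match.group(0), right))
    return lines


# Hypothetical three-document corpus, for illustration only.
sample_corpus = {
    "news": "The board will run the company until a new chief is named.",
    "forum": "I tried to run the simulation twice and it crashed.",
    "novel": "She watched the river run past the mill.",
}

for source, left, hit, right in concordance(sample_corpus, "run"):
    print(f"{source:>6} | {left} [{hit}] {right}")
```

Lining the keyword up in a single column is the point of the format: an editor can scan down the page and see which neighbors a word keeps. A production system would sit on an index rather than scanning raw text.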
What Makes a Corpus Useful
A pile of random text is not a corpus. The archive has to be engineered so the samples reflect how the language is genuinely distributed:
- Register variety: Courtroom transcripts, group chats, academic prose, and barroom conversation all need to be present, because a word can behave very differently across them.
- Geographic spread: For English, that means pulling material from Nigeria, Ireland, Singapore, Jamaica, New Zealand, and elsewhere—not just London and New York.
- Genre mix: Fiction, technical writing, journalism, and informal chat are weighted to roughly match their share of real-world usage.
- Time depth: Recent text dominates for a contemporary dictionary, but older material is kept in the mix to catch meaning shifts.
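One way to picture the balancing act in the list above is a report that compares each genre's actual share of the corpus with a target share. This is only a sketch under stated assumptions: the `balance_report` function, the `tolerance` value, and every number in `token_counts` and `target_shares` are invented for illustration and do not describe any real corpus.

```python
def balance_report(token_counts, target_shares, tolerance=0.05):
    """Compare actual genre shares with target shares.

    Returns {genre: (actual_share, status)} where status is
    "over", "under", or "ok" relative to the tolerance band.
    """
    total = sum(token_counts.values())
    if total == 0:
        raise ValueError("corpus is empty")
    report = {}
    for genre, target in target_shares.items():
        actual = token_counts.get(genre, 0) / total
        gap = actual - target
        if gap > tolerance:
            status = "over"
        elif gap < -tolerance:
            status = "under"
        else:
            status = "ok"
        report[genre] = (round(actual, 3), status)
    return report


# Invented figures, for illustration only.
token_counts = {"fiction": 240, "journalism": 480, "technical": 40, "chat": 240}
target_shares = {"fiction": 0.25, "journalism": 0.35, "technical": 0.15, "chat": 0.25}

for genre, (share, status) in balance_report(token_counts, target_shares).items():
    print(f"{genre:<10} {share:.1%}  {status}")
```

The same comparison could be run per register, region, or decade. The design choice worth noting is that the targets are an editorial decision made up front, while the script only measures drift from them.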
Citation Slips and Reading Campaigns
Before computers made the corpus approach practical, editors ran reading programs. Volunteers around the world were assigned books, periodicals, and pamphlets, and told to flag any word that was new, strange, or used in an unfamiliar way. The volunteer then wrote the word, a short excerpt showing it in use, and the full source reference onto an index card. That card was a citation slip.
The Oxford English Dictionary is the most famous product of this method. Between 1857 and 1928, several million slips poured in from readers scattered across the English-speaking world. They were alphabetized and shelved in wooden pigeonholes, and the resulting archive became the evidence pile from which the first OED was written.
Paper slips are mostly gone now, but the logic behind them has not changed. A definition must follow real usage, not the editor's hunch about what a word ought to mean. That commitment to documented evidence is the dividing line between a professional dictionary and a well-meaning blog post.
Deciding Which Words Get In
Every dictionary is a filter. Even the unabridged OED, with more than 600,000 entries, does not claim to hold every English word ever used. Editors have to draw a line somewhere, and drawing that line is one of the most contested parts of the job.
What Qualifies a Word
Editors typically weigh a short list of factors when a candidate word comes up:
- How often it shows up: Corpus frequency is the first check. A word appearing hundreds of thousands of times cannot be ignored; one that appears twice in a single novel usually can.
- How widely it spreads: A term confined to one subreddit or one sports broadcaster is weaker than one used across magazines, podcasts, and novels.
- How long it has lasted: Most publishers want several years of continuous use before they commit. Buzzwords that vanish in six months are not worth a permanent slot.
- Whether users will expect it: A reader who hears "rizz" or "mid" on TikTok will probably reach for a dictionary. A transparent compound like "kitchen knife" rarely needs one.
- Whether the meaning is guessable: Opaque terms whose meaning is not clear from their parts—"gaslight," "serendipity"—are more urgent than self-explanatory ones.
When a dictionary adds something trendy, letters pour in complaining that standards have collapsed. When it omits a word that feels ordinary, a different set of readers accuse the editors of snobbery. Steering between those two complaints is the permanent tension of the field.
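The first three factors in the list above (frequency, spread, and longevity) are the ones that lend themselves to counting. The sketch below shows one possible way to summarize them from a pile of citations. The `Citation` record, both functions, the threshold values, and the sample data are all hypothetical. Real publishers set their own cut-offs and weigh the remaining factors by editorial judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    word: str
    source: str  # publication, site, or broadcaster the quote came from
    year: int


def evidence_summary(citations, word):
    """Count how often, how widely, and for how long a word is attested."""
    hits = [c for c in citations if c.word == word]
    years = [c.year for c in hits]
    return {
        "frequency": len(hits),
        "sources": len({c.source for c in hits}),
        "span_years": (max(years) - min(years) + 1) if years else 0,
    }


def meets_threshold(summary, min_frequency=5, min_sources=3, min_span_years=3):
    """Apply hypothetical cut-offs; these numbers are assumptions."""
    return (
        summary["frequency"] >= min_frequency
        and summary["sources"] >= min_sources
        and summary["span_years"] >= min_span_years
    )


# Invented citations, for illustration only.
citations = [
    Citation("rizz", "magazine", 2021),
    Citation("rizz", "podcast", 2022),
    Citation("rizz", "podcast", 2022),
    Citation("rizz", "newspaper", 2023),
    Citation("rizz", "novel", 2023),
    Citation("rizz", "magazine", 2024),
    Citation("blorp", "novel", 2023),
    Citation("blorp", "novel", 2023),
]

for candidate in ("rizz", "blorp"):
    summary = evidence_summary(citations, candidate)
    print(candidate, summary, meets_threshold(summary))
```

A check like this can only shortlist candidates. Whether readers will expect the word, and whether its meaning is guessable from its parts, still needs a human decision.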
Writing the Definitions
Defining is where the craft really lives. A strong definition is precise without being dense, complete without being bloated, and readable by the intended audience. It has to fence off the exact territory of one word without leaking into neighboring ones.
Rules Lexicographers Work By
A few working principles come up again and again inside the trade:
- It should slot into a sentence: A good definition can stand in for the headword. "Feeling or showing pleasure" should be droppable in place of happy without mangling the meaning.
- Name the category, then narrow it: Classical definitions place the word inside a bigger group and then mark off what is unique. A chisel is a hand tool (the group) with a sharpened edge used to shape wood, stone, or metal (what sets it apart).
- No chasing your own tail: Defining grief as "the state of being grieved" is useless. Any decent definition has to break out of the word's own family.
- Plain vocabulary: The words used inside the entry should be at least as common as the word being defined. Cracking open a second dictionary to read the first one is a design failure.
- Keep opinions out: Definitions describe; they do not praise or condemn. Older dictionaries slipped on this constantly, and some of the ugliest episodes in lexicographic history sit in those value-loaded entries.
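Two of the principles above, avoiding circularity and keeping the defining vocabulary plain, can be roughly approximated in code. The sketch below is deliberately crude and rests on assumptions. The shared-prefix test in `is_circular` stands in for real morphological analysis and will misfire on short words. The `rank` table (1 = most common) is an invented frequency ranking, not data from any corpus.

```python
import re


def tokens(text):
    """Lowercase alphabetic tokens from a definition."""
    return re.findall(r"[a-z]+", text.lower())


def is_circular(headword, definition, stem_length=4):
    """Crude check: does any defining word share the headword's opening letters?"""
    stem = headword.lower()[:stem_length]
    return any(tok.startswith(stem) for tok in tokens(definition))


def harder_words(headword, definition, rank):
    """Defining words that are rarer than the headword (higher rank = rarer).

    Words missing from the rank table are treated as maximally rare.
    """
    limit = rank.get(headword.lower(), float("inf"))
    return [tok for tok in tokens(definition) if rank.get(tok, float("inf")) > limit]


# Invented frequency ranks, for illustration only.
rank = {
    "of": 2, "or": 10, "full": 300, "feeling": 400,
    "showing": 600, "pleasure": 800, "happy": 900, "felicity": 20000,
}

print(is_circular("grief", "the state of being grieved"))
print(is_circular("happy", "feeling or showing pleasure"))
print(harder_words("happy", "feeling or showing pleasure", rank))
print(harder_words("happy", "full of felicity", rank))
```

Checks like these are best treated as flags for an editor to review. They cannot judge whether a definition is substitutable or free of opinion.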
Splitting a Word Into Its Meanings
Almost any common word means several things, and sorting those senses out is its own puzzle. Take set, get, or run: the OED lists dozens of separate senses for each. Deciding whether "to run a business" and "to run a simulation" count as one sense or two takes real corpus work and a lot of editorial argument. Get it wrong and the entry either balloons uselessly or collapses distinctions that readers actually care about.
Capturing How Words Are Said
Pronunciation notes come in two main flavors: the International Phonetic Alphabet, which is precise but opaque to most readers, and an in-house respelling system like Merriam-Webster's. Choosing what to print involves a few judgment calls:
- Which accent sets the baseline: American publishers default to General American; British ones traditionally use Received Pronunciation, though that norm is loosening.
- How to handle variants: Caramel, pecan, data, route—plenty of words have two or more defensible pronunciations, and the editor has to decide which comes first.
- What to do with new arrivals: A borrowed term like schadenfreude or a brand-name verb like Google may not have a single settled pronunciation, and the editor has to sample actual speech to call it.
The web has let publishers go further: most online entries now ship with an audio clip alongside the transcription, a clear win for digital dictionary formats over their paper ancestors.
Tracing Each Word's Backstory
The etymology section walks a word back through time, showing where it came from and how its form and meaning shifted along the way. Doing this well requires reading knowledge of several old languages, access to dated textual evidence, and a willingness to say "origin unknown" when the trail goes cold.
For English, the typical path runs through Middle and Old English into Proto-Germanic, with frequent stops in Latin, Greek, French, Norse, and Arabic. Borrowed words need their source language and route of entry pinned down. Folk etymologies—plausible-sounding stories with no evidence behind them—have to be spotted and dismissed. Work on roots, prefixes, and suffixes sits underneath all of it.
Flags for Register, Region, and Field
A word's meaning is only part of what a reader needs. The rest is context: when is this word okay to use, and when would it embarrass you? Usage labels carry that information:
- Register: formal, informal, slang, vulgar, literary, poetic, archaic
- Status: obsolete, rare, dated, nonstandard, offensive, disparaging
- Region: chiefly British, chiefly US, Australian, South African, Indian English
- Field: law, medicine, computing, music, botany, finance
Labeling is politically delicate work. Tag a word "slang" when it is really just informal and the dictionary looks like a scold. Leave a slur unlabeled and real harm follows. These are live decisions, revisited as usage shifts and as editorial standards catch up with communities the older dictionaries ignored.
Tying the Whole Thing Together
Once individual entries are drafted, the manuscript enters a long editorial pass. Inconsistency is the enemy: with dozens of contributors writing across hundreds of thousands of entries over many years, the same phrase can end up defined three different ways in three corners of the book. Senior editors comb for those mismatches, make sure cross-references actually point somewhere, and enforce house style on everything from comma use to label order.
Specialist editors handle their own passes. Pronunciation editors check every transcription. Etymology editors verify historical claims. Usage editors audit labels for fairness and accuracy. A final proofreading round chases down typos and layout errors. Start to finish, a full-scale dictionary project is measured in years, and sometimes in decades.
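The cross-reference check mentioned above is one of the few consistency passes that is easy to mechanize. The sketch below assumes a hypothetical entry format, a dictionary keyed by headword with a `see_also` list, which is an illustration and not any publisher's schema.

```python
def dangling_cross_references(entries):
    """Map each headword to the cross-references that point at no entry."""
    headwords = set(entries)
    problems = {}
    for headword, entry in entries.items():
        missing = [ref for ref in entry.get("see_also", []) if ref not in headwords]
        if missing:
            problems[headword] = missing
    return problems


# Hypothetical entries, for illustration only.
entries = {
    "chisel": {
        "definition": "a hand tool with a sharpened edge used to shape wood, stone, or metal",
        "see_also": ["gouge", "adze"],
    },
    "gouge": {
        "definition": "a chisel with a curved blade",
        "see_also": ["chisel"],
    },
    "grief": {
        "definition": "deep sorrow",
        "see_also": [],
    },
}

print(dangling_cross_references(entries))
```

Here the reference from "chisel" to "adze" is reported because no "adze" entry exists in the sample. Style and labeling consistency are harder to automate and still depend on the editorial passes described above.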
What Changed With Computers
Every stage of the workflow described above has been reshaped by digital tools. Corpora replaced reading programs. Databases replaced file cabinets. Online publication lifted the tight space limits that forced old print dictionaries into telegraphic abbreviations and six-point type.
The most important change is cadence. A print dictionary used to wait ten or twenty years between revisions. A digital dictionary can publish a new sense of ghost or add an entry for deepfake within weeks of its evidence crossing the frequency threshold. That responsiveness has pulled the reference profession closer to the speed of the language itself.
Computational linguistics also quietly hands editors new leverage. Software now surfaces words that are trending in the corpus, flags entries whose definitions no longer match current usage, and drafts initial definitions that a human then edits. The editor stays in charge, but a lot of the grunt work has been automated.
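As a rough illustration of how software might surface trending words, the sketch below compares a word's relative frequency in an older and a newer slice of a corpus. The `trending` function, its thresholds, and the token lists are assumptions made up for this example. Production systems use far larger samples and more careful statistics.

```python
from collections import Counter


def trending(old_tokens, new_tokens, min_ratio=3.0, min_count=3):
    """Return (word, ratio) pairs whose relative frequency rose sharply."""
    old_counts = Counter(old_tokens)
    new_counts = Counter(new_tokens)
    results = []
    for word, new_count in new_counts.items():
        if new_count < min_count:
            continue  # too little evidence to call it a trend
        # Add-one smoothing keeps brand-new words from dividing by zero.
        old_rate = (old_counts[word] + 1) / (len(old_tokens) + 1)
        new_rate = (new_count + 1) / (len(new_tokens) + 1)
        ratio = new_rate / old_rate
        if ratio >= min_ratio:
            results.append((word, round(ratio, 2)))
    # Biggest risers first.
    return sorted(results, key=lambda item: item[1], reverse=True)


# Invented token streams, for illustration only.
old_period = ["the"] * 50 + ["video"] * 10 + ["fake"] * 5
new_period = ["the"] * 50 + ["video"] * 10 + ["deepfake"] * 5

for word, ratio in trending(old_period, new_period):
    print(word, ratio)
```

In this toy sample only "deepfake" clears the threshold, since "the" and "video" occur at the same rate in both periods. A list like this is a prompt for an editor to go and read the evidence, not a decision about inclusion.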
The Arguments That Never Go Away
Lexicography has always had its fights. The oldest is the tug-of-war between prescriptivists, who want dictionaries to tell people how to speak, and descriptivists, who want them to record how people actually speak. Close behind sit the arguments over offensive vocabulary, the commercial pressure to produce faster and cheaper editions, and the long-running question of which varieties of English deserve first-class treatment.
Sites like Urban Dictionary have added a newer layer to all of this. They capture slang with a speed no traditional publisher can match, but they lack the citation discipline, systematic evidence base, and editorial accountability of a reference dictionary. The likely future is a hybrid one—digital speed and reach on one side, and the careful methods of professional lexicography on the other.
Once you know what goes into building one, a dictionary stops looking like a neutral object on a shelf. Every entry is a small argument, settled by evidence, shaped by editors, and published with the expectation that someone, somewhere, will rely on it.