
Table of Contents
- Introduction: The Art and Science of Dictionary Making
- Building a Corpus: The Evidence Base
- The Reading Program and Citation Collection
- Selecting Words for Inclusion
- The Art of Definition Writing
- Determining Pronunciation
- Researching Etymology
- Usage Labels and Style Guidance
- Editing and Cross-Referencing
- Dictionary Making in the Digital Era
- Challenges and Controversies
Introduction: The Art and Science of Dictionary Making
Most people consult dictionaries without ever wondering how they come into being. We look up a word, find its meaning, and move on—rarely pausing to consider the enormous intellectual effort behind each entry. Yet the creation of a dictionary is one of the most complex and labor-intensive projects in publishing, combining linguistic expertise, meticulous research, editorial judgment, and an almost obsessive attention to detail.
The discipline of creating dictionaries is called lexicography, and its practitioners, lexicographers, occupy a unique position at the intersection of linguistics, literature, and information science. Samuel Johnson, in his own Dictionary of the English Language (1755), famously defined a lexicographer as "a writer of dictionaries; a harmless drudge, that busies himself in tracing the original, and detailing the signification of words." Behind that self-deprecating humor lies a profession of profound importance: lexicographers are the people who document, organize, and explain the words that make communication possible.
This article takes you behind the scenes of dictionary making, from the initial collection of evidence to the final published product, revealing the fascinating process by which raw language becomes organized knowledge.
Building a Corpus: The Evidence Base
Modern dictionary making begins with a corpus—a massive, structured collection of real-world text that serves as the evidence base for all editorial decisions. A corpus might contain billions of words drawn from newspapers, books, magazines, academic journals, websites, transcripts of spoken language, social media, and other sources. The goal is to represent the full range of how language is actually used across different contexts, registers, and time periods.
Major dictionary publishers maintain proprietary corpora of staggering size. The Oxford English Corpus, for instance, contains over two billion words of contemporary English. The Collins Corpus contains over 20 billion words. These digital databases allow lexicographers to search for any word or phrase and instantly see thousands of examples of its use in context—a capability that would have seemed miraculous to earlier generations of dictionary makers who relied on manual reading and handwritten citation slips.
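To make the idea of seeing a word "in context" concrete, here is a minimal sketch of a keyword-in-context (KWIC) search, the classic concordance view. The function name `kwic`, the `width` parameter, and the three-sentence sample corpus are all illustrative assumptions; a publisher's real corpus tools work over indexed, annotated text rather than a Python list.

```python
import re

def kwic(texts, keyword, width=30):
    """Return keyword-in-context lines for every whole-word match of `keyword`."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for match in pattern.finditer(text):
            # Take up to `width` characters on each side of the match.
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Hypothetical three-sentence "corpus"; a real one holds billions of words.
sample_corpus = [
    "She went for a run before breakfast.",
    "The engineers run the turbine at night.",
    "There was a run in her stocking.",
]

for line in kwic(sample_corpus, "run", width=20):
    print(line)
```

Lining the keyword up in a column is what lets an editor scan many examples quickly and notice recurring patterns of use.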
Corpus Design Principles
Not just any collection of text makes a good corpus. Lexicographic corpora are carefully designed to be balanced and representative:
- Genre balance: Fiction, non-fiction, journalism, academic writing, informal communication, and technical documents are all represented in proportion to their real-world prevalence.
- Temporal coverage: Both contemporary and historical texts are included, with recent material weighted more heavily for a current-language dictionary.
- Geographic diversity: For a global language like English, the corpus must include texts from the US, UK, Australia, Canada, India, and other English-speaking regions.
- Register variety: Formal and informal language, spoken and written, professional and casual—all need representation.
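As a rough illustration of how balance along one of these dimensions might be checked, the sketch below compares each genre's share of a corpus with a target share. The document structure, the genre names, and the target proportions are invented for the example; actual corpus designs set their own categories and weights.

```python
from collections import Counter

def genre_shares(documents):
    """Return each genre's share of the total word count."""
    counts = Counter()
    for doc in documents:
        counts[doc["genre"]] += len(doc["text"].split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genre: n / total for genre, n in counts.items()}

def balance_gaps(documents, targets):
    """Return actual share minus target share (positive = over-represented)."""
    shares = genre_shares(documents)
    return {genre: shares.get(genre, 0.0) - target
            for genre, target in targets.items()}

# Hypothetical miniature corpus: 10, 6, and 4 words respectively.
documents = [
    {"genre": "fiction", "text": "the rain fell softly on the roof of the house"},
    {"genre": "journalism", "text": "council votes to raise parking fees"},
    {"genre": "academic", "text": "results suggest modest effects"},
]

# Assumed design targets, chosen only for illustration.
targets = {"fiction": 0.4, "journalism": 0.4, "academic": 0.2}

for genre, gap in balance_gaps(documents, targets).items():
    print(f"{genre}: {gap:+.2f}")
```

The same comparison could be repeated for period, region, or register by swapping the field that is counted.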
The Reading Program and Citation Collection
Before digital corpora, dictionaries relied on reading programs—organized campaigns in which volunteer readers combed through published texts looking for interesting, new, or unusual word usage. When a reader found a noteworthy use of a word, they wrote the word, its context (a short quotation), and the bibliographic source on a small slip of paper called a citation slip.
The Oxford English Dictionary pioneered this approach on an industrial scale. Over the decades of its initial compilation (1857–1928), millions of citation slips were collected from thousands of volunteer readers worldwide. These slips were sorted alphabetically and stored in pigeonholes, creating a vast physical database of word usage that editors drew upon when writing entries.
While the physical citation slip has been largely replaced by digital corpus searches, the underlying principle remains the same: dictionary definitions must be based on evidence of how words are actually used, not on abstract reasoning about what words "should" mean. This evidence-based approach is what distinguishes professional lexicography from amateur attempts at dictionary making.
Selecting Words for Inclusion
One of the most challenging decisions in dictionary making is determining which words to include. No dictionary can contain every word in a language—even the unabridged OED, with its 600,000+ entries, doesn't claim complete coverage. Lexicographers must apply criteria to decide which words merit inclusion and which don't.
Criteria for Inclusion
The typical criteria for adding a new word to a dictionary include:
- Frequency: How often does the word appear in the corpus? High-frequency words are essential; extremely rare words may be excluded from smaller dictionaries.
- Range: Does the word appear across multiple sources, genres, and contexts, or is it confined to a single publication or author?
- Duration: Has the word been in use for a sustained period, or is it a flash-in-the-pan coinage? Most dictionaries require evidence of use over several years before adding a new word.
- Meaningfulness: Is the word's meaning evident from its parts, or does it need to be defined? A transparent compound like "coffee table" may be less urgently needed than an opaque word like "bamboozle."
- User expectation: Would a dictionary user reasonably expect to find this word? Technical terms, regional dialects, and slang all present different cases.
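The first three criteria (frequency, range, and duration) can be measured directly from citation evidence. The sketch below shows one way this might look; the `Citation` record, the made-up words and sources, and the threshold values are all assumptions for illustration, not any publisher's actual policy. Meaningfulness and user expectation remain matters of editorial judgment and are not modeled here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Citation:
    word: str
    source: str
    year: int

def evidence_summary(citations, word):
    """Summarize frequency, range (distinct sources), and duration for a word."""
    hits = [c for c in citations if c.word == word]
    if not hits:
        return {"frequency": 0, "range": 0, "duration_years": 0}
    years = [c.year for c in hits]
    return {
        "frequency": len(hits),
        "range": len({c.source for c in hits}),
        "duration_years": max(years) - min(years),
    }

def meets_thresholds(summary, min_frequency=3, min_range=2, min_duration_years=2):
    """Apply illustrative (assumed) minimums to an evidence summary."""
    return (summary["frequency"] >= min_frequency
            and summary["range"] >= min_range
            and summary["duration_years"] >= min_duration_years)

# Hypothetical citations for two invented coinages.
citations = [
    Citation("snarfle", "Daily Gazette", 2018),
    Citation("snarfle", "Tech Monthly", 2020),
    Citation("snarfle", "Daily Gazette", 2023),
    Citation("blorp", "Fan Forum", 2022),
    Citation("blorp", "Fan Forum", 2022),
    Citation("blorp", "Fan Forum", 2022),
]

for word in ("snarfle", "blorp"):
    summary = evidence_summary(citations, word)
    print(word, summary, meets_thresholds(summary))
```

In this toy data both words are equally frequent, but only the first appears in more than one source and over more than one year, which is exactly the distinction the range and duration criteria are meant to capture.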
The question of when to add new words is perennial and often generates media attention. When a dictionary adds a trendy slang term, traditionalists may complain that standards are slipping. When it excludes a word in common use, descriptivists may accuse editors of gatekeeping. Navigating these tensions is a core challenge of the lexicographic profession.
The Art of Definition Writing
Definition writing, or defining, is the heart of lexicography and arguably the most difficult part of the process. A good definition must be accurate, complete, concise, and accessible to the dictionary's target audience. It must capture the essential meaning of a word without being so broad that it could apply to other words, or so narrow that it excludes legitimate uses.
Principles of Good Definitions
Professional lexicographers follow established principles when crafting definitions:
- Substitutability: Ideally, a definition should be able to replace the word in a sentence without changing the meaning. If you define "happy" as "feeling or showing pleasure," you should be able to substitute "feeling or showing pleasure" wherever "happy" appears.
- Genus-differentia: The traditional definition structure places the word in a broader category (genus) and then specifies what distinguishes it (differentia). For example, "a hammer is a tool (genus) used for driving nails (differentia)."
- Avoid circularity: A definition should not use the word being defined, or a closely related word, in the definition itself. Defining "happiness" as "the state of being happy" is circular and unhelpful.
- Appropriate vocabulary: The words used in a definition should be simpler than, or at least as common as, the word being defined. Using obscure language to define a common word defeats the purpose.
- Neutrality: Definitions should be objective and descriptive, avoiding personal opinions or cultural biases (though historical definitions often failed at this).
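Some of these principles lend themselves to simple automated checks. As one example, the sketch below flags possible circularity by looking for the headword, or a word sharing a long initial string with it, inside the definition. The shared-prefix test is a deliberately crude stand-in for real morphological analysis, and the function names and the `min_prefix` value are assumptions; it will produce false alarms, so its output is a prompt for an editor rather than a verdict.

```python
import re

def shared_prefix_length(a, b):
    """Count how many leading characters two words have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def circular_terms(headword, definition, min_prefix=4):
    """Return definition words that repeat or closely resemble the headword."""
    head = headword.lower()
    words = re.findall(r"[a-z]+", definition.lower())
    return [w for w in words
            if w == head or shared_prefix_length(w, head) >= min_prefix]

print(circular_terms("happiness", "the state of being happy"))
print(circular_terms("happy", "feeling or showing pleasure"))
print(circular_terms("hammer", "a tool used for driving nails"))
```

The first call flags "happy" in the circular definition from the list above, while the other two definitions pass without any flagged words.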
Polysemy and Sense Division
Most common words have multiple meanings (polysemy), and one of the lexicographer's key tasks is dividing these meanings into distinct senses and ordering them logically. The word "run," for instance, has dozens of senses in major dictionaries—from the physical act of running, to operating a machine, to a run in a stocking. Deciding where one sense ends and another begins requires careful analysis of corpus evidence and considerable editorial judgment.
Determining Pronunciation
Dictionaries provide pronunciation guidance using phonetic transcription systems. In English-language dictionaries, this is typically either the International Phonetic Alphabet (IPA) or a publisher-specific respelling system. Determining the "correct" pronunciation to record involves several considerations:
- Standard vs. regional pronunciations: Which dialect should serve as the reference? Most American dictionaries use General American; British dictionaries have traditionally used Received Pronunciation.
- Variant pronunciations: Many words have multiple acceptable pronunciations (e.g., "either" as EE-ther or EYE-ther). Dictionaries typically list the most common pronunciation first.
- New words: Emerging terms may not have established pronunciations, requiring the lexicographer to consult multiple sources.
Modern dictionaries increasingly supplement written transcriptions with audio recordings, allowing users to hear the pronunciation directly—a significant advantage of digital dictionary formats.
Researching Etymology
The etymology section of a dictionary entry traces the word's origin and historical development. Etymological research requires expertise in historical linguistics, familiarity with multiple languages, and access to dated textual evidence showing how a word's form and meaning have changed over time.
For English words, etymological research often involves tracing a word back through Middle English and Old English to its Germanic roots, or to Latin, Greek, and other source languages. Loanwords require identifying the source language and the route by which the word entered English. False etymologies and folk etymologies must be identified and corrected. The study of word roots, prefixes, and suffixes forms the foundation of this painstaking work.
Usage Labels and Style Guidance
Dictionaries don't just tell you what a word means—they tell you how, where, and when it's appropriate to use it. Usage labels provide this crucial contextual information:
- Register labels: formal, informal, slang, vulgar, literary, poetic, archaic
- Geographic labels: chiefly British, chiefly US, Australian, South African
- Subject labels: law, medicine, computing, music, botany
- Status labels: obsolete, rare, dated, nonstandard, offensive, disparaging
Assigning usage labels requires careful judgment. Labeling a word "slang" when it's actually "informal" can make a dictionary seem out of touch. Failing to flag an offensive term can cause harm. These decisions are among the most sensitive that lexicographers make, and they must be revisited as social attitudes and language use evolve over time.
Editing and Cross-Referencing
After individual entries are drafted, the dictionary undergoes extensive editing and cross-referencing. Editors check for consistency across entries—ensuring that related words use compatible definitions, that cross-references point to real entries, and that the same formatting conventions are applied throughout. In a large dictionary with hundreds of thousands of entries written by dozens of contributors over many years, maintaining consistency is a formidable challenge.
Specialized editors review specific aspects: pronunciation editors ensure phonetic transcriptions are accurate and consistent, etymology editors verify historical claims, and usage editors check that labels are applied fairly and accurately. A final round of proofreading catches typographical errors and formatting problems. The entire process from initial research to final publication can take years or even decades for a major dictionary project.
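One of the consistency checks described in this section, confirming that cross-references point to real entries, is easy to express in code. The following is a minimal sketch that assumes entries are stored as a mapping from headword to a list of cross-referenced headwords; that data shape and the sample entries are illustrative, not a description of any publisher's database.

```python
def dangling_cross_references(entries):
    """Return sorted (source, target) pairs whose target is not a headword."""
    headwords = set(entries)
    return sorted(
        (source, target)
        for source, targets in entries.items()
        for target in targets
        if target not in headwords
    )

# Hypothetical entries: "pigment" is referenced but has no entry of its own.
entries = {
    "colour": ["color"],
    "color": ["hue", "pigment"],
    "hue": ["color"],
}

for source, target in dangling_cross_references(entries):
    print(f"{source!r} points to missing entry {target!r}")
```

A check like this catches only broken links; whether two related definitions are actually compatible still has to be judged by an editor.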
Dictionary Making in the Digital Era
The transition from print to digital has revolutionized every aspect of dictionary making. Digital corpora have largely replaced manual reading programs. Database software has replaced filing cabinets of citation slips. Online publication has eliminated the space constraints that forced print dictionaries to use abbreviations and tiny type.
Perhaps most significantly, digital dictionaries can be updated continuously rather than waiting for a new print edition every decade or two. When a new word enters common usage—or an existing word takes on a new meaning—a digital dictionary can be updated within weeks or months rather than years. This has made dictionaries far more responsive to linguistic change than they were in the print era.
Computational linguistics tools now assist lexicographers in ways that were impossible a generation ago. Algorithms can identify new words and emerging meanings in corpus data, flag entries that may need updating, and even generate draft definitions that human editors then refine. While human judgment remains essential, these digital tools have dramatically increased the speed and scope of the lexicographic process.
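The simplest version of new-word detection can be sketched in a few lines: count the words in recent text and report those that are frequent enough but missing from the existing headword list. The function name, the `min_count` threshold, and the sample data below are assumptions made for illustration. The sketch also does no lemmatization, so inflected forms such as plurals would be reported as separate candidates; production systems handle this and much else.

```python
import re
from collections import Counter

def candidate_new_words(texts, headwords, min_count=2):
    """Return (word, count) pairs for frequent words absent from the headword list."""
    counts = Counter()
    for text in texts:
        # Lowercase alphabetic tokens, allowing an internal apostrophe.
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    known = {h.lower() for h in headwords}
    candidates = [(w, n) for w, n in counts.items()
                  if w not in known and n >= min_count]
    # Most frequent first; ties broken alphabetically.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))

# Hypothetical headword list and recent texts with two invented coinages.
headwords = {"the", "team", "will", "a", "new", "tool", "to", "is",
             "popular", "this", "week"}
recent_texts = [
    "The team will snarfle a new tool",
    "To snarfle is popular this week",
    "A blorp tool",
]

print(candidate_new_words(recent_texts, headwords))
```

Here the invented word that occurs twice is surfaced as a candidate and the one-off is not; a human editor would then weigh the candidate against the inclusion criteria.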
Challenges and Controversies
Dictionary making has always been accompanied by debate and controversy. Among the perennial challenges are the tension between prescriptivism (telling people how they should use language) and descriptivism (recording how people actually use language), the question of how to handle offensive or sensitive vocabulary, and the commercial pressures that can conflict with scholarly thoroughness.
The rise of user-generated content sites like Urban Dictionary has also raised questions about the future of professional lexicography. While crowdsourced dictionaries can capture slang and emerging terminology with impressive speed, they lack the editorial rigor, systematic evidence base, and professional expertise that distinguish reference-quality dictionaries. The best dictionaries of the future will likely combine the speed and breadth of digital technology with the depth and reliability of traditional professional lexicography.
Understanding how dictionaries are made deepens our appreciation for these remarkable reference works and reminds us that behind every definition lies a careful process of observation, analysis, and craftsmanship. Dictionaries are not simply lists of words—they are carefully constructed maps of human knowledge and communication.
