
External DSLs

An external DSL has its own grammar and its own parser. It stands independently of any host language — the user writes in the DSL’s syntax, and a dedicated tool processes it. SQL, regular expressions, CSS, Make, and GraphQL are all external DSLs. Each defines its own rules for what constitutes valid input and its own semantics for what that input means.

The term is one half of Fowler’s primary classification. Where an internal DSL borrows its syntax from a host language, an external DSL owns its syntax entirely. This gives full freedom over how the language looks and behaves, at the cost of building the entire processing pipeline from scratch.

The implementation pipeline

An external DSL requires a chain of processing stages to go from source text to execution. The stages vary in detail but follow a common pattern.

Lexing (tokenisation). The source text is broken into tokens — the smallest meaningful units. In SQL, tokens include keywords (SELECT, FROM, WHERE), identifiers (table and column names), operators (=, >), and literals (strings, numbers). The lexer strips whitespace and comments, producing a flat sequence of typed tokens.
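A lexer of this kind can be sketched in a few lines. The following is a minimal regex-based tokeniser for an SQL-like fragment; the token names and the tiny keyword set are illustrative, not drawn from any real SQL engine.

```python
import re

# Illustrative token spec for a tiny SQL-like language.
# Order matters: KEYWORD must precede IDENT so that SELECT is not
# swallowed by the identifier rule.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:SELECT|FROM|WHERE)\b"),
    ("NUMBER",  r"\d+"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"[=<>*,]"),
    ("WS",      r"\s+"),   # matched but discarded below
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(text):
    """Break source text into a flat list of (type, value) tokens,
    stripping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

lex("SELECT name FROM users WHERE age > 30")
```

The result is the flat sequence of typed tokens the parser consumes: `("KEYWORD", "SELECT")`, `("IDENT", "name")`, and so on.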

Parsing. The token sequence is analysed against the language’s grammar to produce a parse tree or abstract syntax tree (AST). The parser enforces syntactic rules — that a SELECT must be followed by a column list, that parentheses must balance, that operators have the right number of operands. Parsing is where most syntax errors are caught.

Semantic analysis. The AST is checked for semantic validity — things the grammar cannot express. In SQL, this is where the engine checks that table names exist, that column types are compatible with operators, that aggregate functions are used correctly. Not all DSLs have a separate semantic analysis phase; simpler languages fold it into parsing or execution.
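A semantic check in miniature, assuming a hypothetical schema: the table, column names, and error messages below are invented for illustration. The point is that these rules operate on names and types, which no grammar rule can express.

```python
# Hypothetical schema: table name -> {column name: type}
SCHEMA = {"users": {"name": "text", "age": "int"}}

def check_query(table, columns):
    """Reject references to unknown tables or columns --
    semantic errors the parser cannot catch."""
    errors = []
    if table not in SCHEMA:
        errors.append(f"unknown table: {table}")
    else:
        for col in columns:
            if col not in SCHEMA[table]:
                errors.append(f"unknown column: {col}")
    return errors

check_query("users", ["name", "email"])   # one error: unknown column
```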

Execution or code generation. The validated AST is either interpreted directly (an interpreter walks the tree and performs actions) or compiled into another form (code generation produces output in a target language, or a query planner produces an execution plan). SQL query planners are a sophisticated example — the planner transforms the declarative query into an imperative execution plan optimised for the available indexes and statistics.
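The interpreter path can be shown in a few lines. This is a minimal tree-walking evaluator for arithmetic ASTs; the tuple-based node shape (`("add", left, right)`) is an assumption chosen for brevity, not a standard representation.

```python
def evaluate(node):
    """Walk an AST node and perform its action -- the essence of
    a tree-walking interpreter."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "add":
        return evaluate(node[1]) + evaluate(node[2])
    if kind == "mul":
        return evaluate(node[1]) * evaluate(node[2])
    raise ValueError(f"unknown node kind: {kind}")

# (1 + 2) * 3, represented as nested tuples
ast = ("mul", ("add", ("num", 1), ("num", 2)), ("num", 3))
evaluate(ast)  # 9
```

A code generator would walk the same tree but emit text in a target language instead of computing values.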

Parsing approaches

Several established techniques exist for building the parser, each with different trade-offs between power, ease of use, and the quality of error messages.

Parser generators. A grammar specification is fed to a tool that generates parser code. ANTLR (Terence Parr) is the most widely used modern parser generator, supporting LL(*) parsing with automatic error recovery. Yacc/Bison use LALR parsing and remain common in Unix-heritage projects. JavaCC generates Java parsers from LL(k) grammars.

Parser combinators. Small parsers are composed into larger ones using combinators — functions that take parsers as arguments and return new parsers. Parsec (Haskell) and FParsec (F#) are well-known implementations. Parser combinators blur the line between internal and external DSLs — the grammar is expressed in host-language code, but the language being parsed is external.
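The combinator idea can be hand-rolled in a few lines. This is a Parsec-style sketch, not any real library's API: here a parser is a function from an input string to `(value, remainder)` on success or `None` on failure, and the combinators build new such functions from old ones.

```python
def char(c):
    """Primitive parser: match a single expected character."""
    def parse(s):
        return (c, s[1:]) if s.startswith(c) else None
    return parse

def seq(p, q):
    """Combinator: run p, then q on the remainder; succeed only if both do."""
    def parse(s):
        r1 = p(s)
        if r1 is None:
            return None
        v1, rest = r1
        r2 = q(rest)
        if r2 is None:
            return None
        v2, rest2 = r2
        return ((v1, v2), rest2)
    return parse

def alt(p, q):
    """Combinator: try p; if it fails, try q on the same input."""
    def parse(s):
        return p(s) or q(s)
    return parse

ab = seq(char("a"), char("b"))
ab("abc")   # (("a", "b"), "c")
```

Real libraries add repetition, mapping over results, and error reporting, but the structure is the same: the grammar lives in host-language code, while the parsed language remains external.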

PEG parsers. Parsing Expression Grammars use ordered choice (try the first alternative; if it fails, try the second) rather than the ambiguous alternation of context-free grammars. Implementations include PEG.js (JavaScript), Pest (Rust), and parboiled (Java/Scala). The deterministic semantics make PEGs particularly suitable for languages where grammar ambiguity would otherwise need explicit resolution.
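Ordered choice can be demonstrated in miniature. In the sketch below (the operator list is an invented example), `"<="` must be listed before `"<"`: the first alternative that matches wins, and later alternatives are never consulted.

```python
def ordered_choice(alternatives, text):
    """PEG-style choice: try alternatives in order; first match wins."""
    for alt in alternatives:
        if text.startswith(alt):
            return alt
    return None

ordered_choice(["<=", "<"], "<= 10")   # matches "<="
ordered_choice(["<", "<="], "<= 10")   # matches "<" -- "<=" is never tried
```

In a context-free grammar both alternatives would be considered and the ambiguity resolved by some external rule; in a PEG the ordering *is* the resolution.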

Hand-written recursive descent. A parser written directly in a general-purpose language, with one function per grammar rule. More labour-intensive than generated parsers, but produces the best error messages and gives full control over recovery behaviour. Many production parsers for widely used languages (GCC’s C parser, V8’s JavaScript parser) are hand-written recursive descent.
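The one-function-per-rule pattern looks like this for a tiny expression grammar (`expr -> term ("+" term)*`, `term -> factor ("*" factor)*`, `factor -> NUMBER | "(" expr ")"`). The grammar, the whitespace-splitting tokeniser, and the error messages are all illustrative; a production parser would use a proper lexer and build an AST rather than evaluating inline.

```python
class Parser:
    def __init__(self, text):
        # Crude tokeniser: pad punctuation with spaces, then split.
        self.tokens = (text.replace("(", " ( ").replace(")", " ) ")
                           .replace("+", " + ").replace("*", " * ").split())
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        # Full control over error messages is the payoff of hand-writing.
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def expr(self):                      # expr -> term ("+" term)*
        value = self.term()
        while self.peek() == "+":
            self.pos += 1
            value += self.term()
        return value

    def term(self):                      # term -> factor ("*" factor)*
        value = self.factor()
        while self.peek() == "*":
            self.pos += 1
            value *= self.factor()
        return value

    def factor(self):                    # factor -> NUMBER | "(" expr ")"
        if self.peek() == "(":
            self.pos += 1
            value = self.expr()
            self.expect(")")
            return value
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, found {tok!r}")
        self.pos += 1
        return int(tok)

Parser("2 + 3 * (4 + 1)").expr()   # 17
```

Each grammar rule maps to one method, and precedence falls out of the call structure: `expr` calls `term`, which calls `factor`, so `*` binds tighter than `+`.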

Trade-offs

The defining advantage of an external DSL is syntactic freedom. The language can look exactly the way the domain requires — no compromise with a host language’s syntax, no operator-overloading tricks, no blocks-that-aren’t-really-blocks. SQL reads as English-like queries because its designers chose that syntax; CSS reads as property-value declarations because that matches the domain of styling. Neither syntax could exist as an internal DSL without substantial distortion.

The defining cost is infrastructure. An external DSL needs a lexer, a parser, error reporting, and either an interpreter or a code generator. It needs documentation that explains the language on its own terms. It may need an editor plugin for syntax highlighting, a formatter, a linter. None of this comes for free — each piece must be designed, built, and maintained.

The tooling gap has narrowed. Modern parser generators (ANTLR), editor frameworks (tree-sitter, Language Server Protocol), and language workbenches have reduced the cost of building an external DSL substantially. What once required months of compiler engineering can now be prototyped in days.

Examples

The external DSLs named above span very different domains but share the pattern of owned syntax plus a dedicated processor:

- SQL: declarative queries over relational data
- Regular expressions: pattern matching over text
- CSS: styling as property-value declarations
- Make: build targets and their dependencies
- GraphQL: typed queries against an API schema

Tooling

The modern ecosystem for building external DSLs includes:

- Parser generators: ANTLR, Yacc/Bison, JavaCC
- Parser combinator libraries: Parsec (Haskell), FParsec (F#)
- PEG implementations: PEG.js (JavaScript), Pest (Rust), parboiled (Java/Scala)
- Editor infrastructure: tree-sitter and the Language Server Protocol, covering syntax highlighting, diagnostics, completion, and navigation

For a more integrated approach to DSL construction, see Language workbenches.
