layout | title |
---|---|
page | Programming languages resources |
This page is a collection of my favorite resources for people getting started writing programming languages. I hope to keep it updated as long as I continue to find great stuff.
I made a fun compilers t-shirt and also a fun JIT compilers t-shirt.
- Tufts compilers course COMP/CS 181 (2006, but it's been taught more recently. I should probably ping Sam.)
- Cornell compilers course CS 6120 and an interesting approach to project-based learning
- Nora Sandler's minimal C compiler
- Jack Crenshaw's let's build a compiler
- Recursive descent parsing in C. Note that this just verifies the input string, and more has to be done to build a tree out of the input.
- Vidar Hokstad's Writing a compiler in Ruby, bottom up
- Rui Ueyama's chibicc, a C compiler in the Ghuloum style
- The Natalie compiler for Ruby
- Compiler passes
- I've heard good things about Engineering a Compiler (3rd edition coming soon!)
- Destination-driven code generation (PDF)
- JavaScript AOT compilation by Manuel Serrano
- Of JavaScript AOT Compilation Performance (PDF) by Manuel Serrano
- GitHub repo
- kanaka's mal
- leo (lwh)'s Building LISP
- Peter Michaux's Scheme from Scratch
- Daniel Holden's Build Your Own Lisp
- Anthony C. Hay's fairly readable Lisp interpreter in 90 lines of C++
- My own Writing a Lisp blog post series
- carld's Lisp in less than 200 lines of C
- UTexas's A simple scheme compiler
- Rui Ueyama's minilisp
- The Bones Scheme compiler
- The lecture notes for a course developing a Ghuloum-style compiler
- Ghuloum implementations
- Abdulaziz Ghuloum's minimal Scheme to x86 compiler (PDF)
- My adaptation in C (with implementation)
- Let's build a compiler
- Thorsten Ball's adaptation
- Nada Amin's adaptation
- Tao of Mac's Lisp implementation list
- sectorlisp and sectorlisp2 and lambda calculus in 383 bytes
- Termite: a Lisp for Distributed Computing (PDF)
- munificent's Crafting Interpreters book
- Mario Wolczko's CS 294-113, a course on managed runtimes
- My own bytecode compiler/VM blog post
- Justin Meiners and Ryan Pendleton's Write your own virtual machine
- Maxime Chevalier-Boisvert's website
- Serge's toy JVM
- Dragon taming with Tailbiter
- Phil Eaton's list of JS implementations
- Chris Seaton's The Ruby Compiler Survey and RubyConf 2021 talk (video) about it
- Laurence Tratt's "Why aren't more users more happy with our VMs?" Part 1 and Part 2
- Interesting runtimes
- Russ Cox's Regular expression matching: the virtual machine approach
- Bun tweet about DOMJIT
- Andy Wingo's a simple semi-space collector
Here are some resources I have found useful for understanding the ideas and research around optimizing dynamic languages.
- Efficient implementation of the Smalltalk-80 system
- Stefan Brunthaler's work
- Optimizing dynamically-typed object-oriented languages with polymorphic inline caches (PDF)
- Garbage collection in a large LISP system
- Urs Hölzle's thesis, Adaptive Optimization for Self (PDF)
- An inline cache isn't just a cache
- Baseline JIT and inline caches
- Javascript hidden classes and inline caching in V8
- CacheIR: A new approach to Inline Caching in Firefox
- Note on trial inlining using CacheIR
- Basic block versioning
- Stack Caching for Interpreters (PDF)
- Hotspot performance techniques
- Assembly interpreters and follow-up
- Make sure to take a look at "Further Reading"
- A post including a snippet on direct-threaded dispatch in an assembly interpreter
- Stefan Marr's page about efficient and safe implementations of dynamic languages
- The Wikipedia page for Cheney's algorithm
- This web page about V8 internals
- Vyacheslav Egorov's inline cache explanation for JavaScript
- Caio Lima's inline cache explanation for JSC (with assembly!)
- V8's blog post about their baseline/template JIT
- V8's blog post about optimizing builtins with CodeStubAssembler
- Object shapes
- Chris Seaton's RubyKaigi talk
- Aaron Patterson and Jemma Issroff's livestream (video)
- Kate Temkin's QEMU fork with a gadget-based pseudo-JIT and associated Twitter thread
- When pigs fly: optimizing bytecode interpreters
- I particularly like the snippet on bytecode VM traces
- Optimized Python runtimes
- Starlark is a total language similar to Python. It is used in build systems. I wonder if it could be used to generate Ninja files as a sort of "mini Bazel/Buck".
- This SSA paper: Simple and Efficient Construction of Static Single Assignment Form (PDF)
- Resources on mechanical sympathy and optimization coaching
- Optimization Coaching (PDF)
- Optimization Coaching for JavaScript (PDF)
- Vincent's thesis (PDF)
- JITProf and JITProf-visualization
- MonkeyType for Python
- specialist for Python
- Inspecting rustc LLVM optimization remarks using cargo-remark
- This paper about encoding low-level semantics in a higher-level language for optimizing code: Demystifying Magic: High-level Low-level Programming
- Meta-tracing JITs in native code
- Bump allocators: always bump downwards!
- Call-site optimization for Common Lisp (PDF)
- Posts about trace optimization:
- A nice PyPy trace viewer
- WebKit/JavaScriptCore stuff:
- FTL JIT
- B3, the Bare Bones Backend
- Speculation in JavaScriptCore
- Building the fastest Lua interpreter.. automatically!
- Threaded code by Anton Ertl
- Compiling coroutines/generators to state machines
And here are runtime optimization resources that I wrote!
- Inline caching, a post containing a small demo of how to speed up attribute lookups in an interpreter
- Inline caching: quickening, a post about speeding up interpreters using self-modifying bytecode ("bytecode rewriting" or "quickening")
- Small objects and pointer tagging, a post about speeding up interpreters using pointer tagging and encoding small objects inside pointers
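To make the inline caching idea concrete, here is a minimal sketch in Python. It is not the code from the posts above: the names `Shape`, `Obj`, and `LoadAttrSite` are hypothetical, and it assumes objects that share a layout descriptor so that a lookup site can remember the last layout it saw along with the slot offset.

```python
class Shape:
    """Hypothetical layout descriptor: maps attribute names to slot indices."""

    def __init__(self, names):
        self.offsets = {name: i for i, name in enumerate(names)}


class Obj:
    """An object is a shape plus a flat list of slots."""

    def __init__(self, shape, values):
        self.shape = shape
        self.slots = list(values)


class LoadAttrSite:
    """One attribute-load site in an interpreter, with a monomorphic cache."""

    def __init__(self, name):
        self.name = name
        self.cached_shape = None
        self.cached_offset = None
        self.hits = 0
        self.misses = 0

    def load(self, obj):
        # Fast path: identity check on the shape, then a direct slot read.
        if obj.shape is self.cached_shape:
            self.hits += 1
            return obj.slots[self.cached_offset]
        # Slow path: full lookup, then remember the result for next time.
        self.misses += 1
        offset = obj.shape.offsets[self.name]
        self.cached_shape = obj.shape
        self.cached_offset = offset
        return obj.slots[offset]


point = Shape(["x", "y"])
site = LoadAttrSite("y")
results = [site.load(Obj(point, [i, i * 10])) for i in range(3)]
```

Only the first load takes the slow path; the other two reuse the cached offset because all three objects share one shape. A real interpreter would also need to handle sites that see several shapes.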
Resources on representing small values efficiently.
- nikic's Pointer magic...
- Sean's NaN-Boxing
- zuiderkwast's nanbox
- albertnetymk's NaN Boxing
- Ghuloum's Incremental approach (PDF), which introduces pointer tagging in a compiler setting
- Chicken Scheme's data representation
- Guile Scheme's Faster Integers
- Femtolisp object implementation
- Leonard Schütz's NaN Boxing article
- Piotr Duperas's NaN boxing or how to make the world dynamic
- Fedor Indutny's SMIs and Doubles
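As a rough illustration of what these articles describe, here is a sketch of NaN boxing in Python. The bit layout is an assumption picked for the example (a quiet NaN prefix of `0x7FFC...` and a two-bit tag at bits 48-49), not the layout of any implementation linked above, and all function names are hypothetical.

```python
import struct

# Assumed layout: all exponent bits set plus the top two mantissa bits.
QNAN = 0x7FFC_0000_0000_0000
TAG_MASK = QNAN | (0x3 << 48)  # the NaN prefix plus a two-bit type tag
INT_TAG = QNAN | (0x1 << 48)   # tag 1 means "32-bit integer payload"


def box_double(x):
    """Doubles are stored as their own IEEE 754 bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def unbox_double(bits):
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def is_double(bits):
    # Anything that lacks the full NaN prefix is an ordinary double.
    return (bits & QNAN) != QNAN


def box_int(n):
    """Store a signed 32-bit integer in the low bits of a tagged NaN."""
    if not -(2**31) <= n < 2**31:
        raise ValueError("integer does not fit in 32 bits")
    return INT_TAG | (n & 0xFFFF_FFFF)


def is_int(bits):
    return (bits & TAG_MASK) == INT_TAG


def unbox_int(bits):
    n = bits & 0xFFFF_FFFF
    # Sign-extend the 32-bit payload.
    return n - 2**32 if n >= 2**31 else n
```

Every value fits in one 64-bit word, and a single mask-and-compare tells doubles apart from boxed integers. A real runtime would reserve other tags for pointers and singletons.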
Small JITs to help understand the basics. Note that these implementations tend to focus on compiling ASTs or IRs to machine code, rather than the parts of the JIT that offer the most performance: inline caching and code inlining. Compiling is great but unless you're producing good machine code, it may not do a whole lot.
- Antonio Cuni's jit30min
- Christian Stigen Larsen's Writing a basic x86-64 JIT compiler from scratch in stock Python
- Ben Hoyt's Compiling Python syntax to x86-64 assembly for fun and (zero) profit
- My very undocumented (but hopefully readable) implementation of the Ghuloum compiler
- Matt Page's template_jit for CPython, which also contains a readable CFG implementation
Sometimes you want to generate assembly from a host language. Common use cases include compilers, both ahead-of-time and just-in-time. Here are some libraries that can help with that.
- Tachyon's x86-64 assembler (JS)
- Higgs' x86-64 assembler (D), which is based on Tachyon's
- yjit's x86-64 assembler (C) from Shopify's Ruby JIT, which is based on Higgs'
- Dart's multi-arch assembler (C++) and relevant constants, both of which need some extracting from the main project
- Strongtalk's x86 assembler (C)
- AsmJit's multi-arch assembler (C++)
- PeachPy's x86-64 assembler (Python)
- PPCI's x86-64 assembler (Python) and other great compiler infrastructure
- My small x86-64 assembler (C), which I forked from pervognsen's original (C)
- A guide to using GCC inline assembly
- Whatever this is from the wasm micro runtime (C++)
- zasm (C++)
- monoasm (Rust)
For more inspiration, check out some of the assemblers in runtimes I mention in my Compiling a Lisp post.
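To give a feel for what such an assembler library does at its core, here is a tiny sketch in Python. It is not modeled on any of the libraries above; the `Assembler` class and its method names are hypothetical, and it only handles the low eight registers (no REX.R/REX.B extension bits).

```python
import struct


class Assembler:
    """Hypothetical minimal x86-64 assembler that appends bytes to a buffer."""

    # Register numbers as used in x86-64 instruction encodings.
    RAX, RCX, RDX, RBX = 0, 1, 2, 3

    def __init__(self):
        self.code = bytearray()

    def mov_r32_imm32(self, reg, imm):
        # B8+rd id: MOV r32, imm32. Writing a 32-bit register
        # zero-extends into the full 64-bit register.
        self.code.append(0xB8 + reg)
        self.code += struct.pack("<I", imm & 0xFFFF_FFFF)

    def add_r64_r64(self, dst, src):
        # REX.W + 01 /r: ADD r/m64, r64 with ModRM mod=11 (register direct),
        # reg=src, rm=dst.
        self.code += bytes([0x48, 0x01, 0xC0 | (src << 3) | dst])

    def ret(self):
        # C3: near return.
        self.code.append(0xC3)


asm = Assembler()
asm.mov_r32_imm32(Assembler.RAX, 40)            # mov eax, 40
asm.mov_r32_imm32(Assembler.RCX, 2)             # mov ecx, 2
asm.add_r64_r64(Assembler.RAX, Assembler.RCX)   # add rax, rcx
asm.ret()                                       # ret
machine_code = bytes(asm.code)
```

The sketch stops at producing bytes. Running them would additionally require copying the buffer into executable memory, which is platform-specific and left out here.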
- Bunny (C++)
- dstogov's IR (C)
- PeachPy (Python)
I have not written much about runtime optimization yet, but I would like to write about:
- Assembly interpreters (known to the JDK folks as a "template interpreter")
- Inline caching for attribute lookup
- Including actually-inline assembly caches with cmp/jmp and stub, and a C++ wrapper (How does V8 do it? Hotspot? Dart (maybe)? JSC?)
- Andy Wingo's notes
- Feedback vectors in V8 (video) (code)
- Notes on Hotspot CompiledIC
- Object shapes / hidden classes / layouts
- Compact objects
- Attaching intrinsic functions or assembly stubs to well-known functions
- Garbage collectors
- Heap and GC characteristics from Garbage collection in a large LISP system
- Object handles in a copying collector (see Andy Chu's comment)
- Fast paths for common cases ("do less")
- JIT intermediate representations and how they help solve problems around megamorphic call sites, inlining, etc
- The GDB JIT interface & maintaining a parseable stack for unwinding
- Exception handling side-tables instead of block stacks
- Debugging mindsets
- Ways to think about debugging that make the process less stressful and thrashy
- Code transformations and analysis
- Definite assignment analysis
- Static Single Assignment (SSA)
- Writing JITs without writing assembly
- Tail-calls for efficient interpreters
- Including (top of) stack caching
- Copy-and-Patch compilation (PDF)
- Simplifying this would probably make for a fun blog post and could be combined with ICs and quickening from my runtime optimization series
- Lua interpreter post
- Lua JIT post
- Deegen talk (PDF)
- Precise native stack roots
- Accurate Garbage Collection in an Uncooperative Environment (2002, PDF)
- Accurate Garbage Collection in Uncooperative Environments Revisited (2006, PDF)
- Accurate Garbage Collection in Uncooperative Environments with Lazy Pointer Stacks (2007, PDF)
- Precise Garbage Collection for C (PDF)
- Skybison/V8/... handles and handle scopes
- Using LLVM's stack maps to do free precise runtime handles
- Type lattices
- Destination-driven code generation (PDF)
- Destination-Driven Code Generation (PDF)
- Destination-Passing Style (PDF)
- Yesterday, my program worked. Today, it does not. Why? (PDF)
- Interaction nets (PDF)
This is mostly a reminder for myself because I can never remember the order of registers. Sourced from the AMD64 ABI Draft 1.0 (PDF).
Caller-saved: rax, rcx, rdx, rsi, rdi, r8-r11, xmm0-xmm15.
Callee-saved: rbx, rsp, rbp, r12-r15.
Return values in rax and rdx.
Parameter | Register |
---|---|
1 | rdi |
2 | rsi |
3 | rdx |
4 | rcx |
5 | r8 |
6 | r9 |
Once registers are assigned, the arguments passed in memory are pushed on the stack in reversed (right-to-left) order.
Return values in xmm0 and xmm1.
Parameter | Register |
---|---|
1 | xmm0 |
2 | xmm1 |
3 | xmm2 |
4 | xmm3 |
5 | xmm4 |
6 | xmm5 |
7 | xmm6 |
8 | xmm7 |
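The two tables above can be combined into a small sketch that assigns argument locations. This is a simplification under stated assumptions: it only handles scalar integer/pointer and floating-point arguments (no structs, varargs, or x87 values), and the function name `assign_arguments` and the `"int"`/`"float"` labels are made up for the example.

```python
INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
SSE_REGS = [f"xmm{i}" for i in range(8)]


def assign_arguments(kinds):
    """Map each argument kind ("int" or "float") to a register or the stack.

    Returns (locations, push_order). locations[i] is a register name or
    "stack" for argument i. push_order lists the indices of the stack
    arguments in the order they get pushed, which is right to left.
    """
    next_int = 0
    next_sse = 0
    locations = []
    for kind in kinds:
        if kind not in ("int", "float"):
            raise ValueError(f"unsupported argument kind: {kind}")
        if kind == "int" and next_int < len(INT_REGS):
            locations.append(INT_REGS[next_int])
            next_int += 1
        elif kind == "float" and next_sse < len(SSE_REGS):
            locations.append(SSE_REGS[next_sse])
            next_sse += 1
        else:
            # No register of the right class is left.
            locations.append("stack")
    push_order = [i for i in reversed(range(len(kinds)))
                  if locations[i] == "stack"]
    return locations, push_order


locations, push_order = assign_arguments(["int"] * 8 + ["float"])
```

Integer and floating-point arguments draw from separate register pools, so the trailing float still lands in xmm0 even though the last two integers spilled to the stack.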
This is a sort of grab-bag for helpful or interesting tools for programming language implementation.
- Blinkenlights, a visual x86-64 emulator
- Cosmopolitan libc
- Cosmopolitan ftrace (wow, this is turning into a Justine Section)
- egg
- Representing loops within egg
- optir, which uses egg
- Search-based compiler code generation
- Compiler optimizations are hard because they forget
- Cranelift: Using E-Graphs for Verified, Cooperating Middle-End Optimizations
- Optimizing compilation with the Value State Dependence Graph (PDF)
Right now this is probably going to just be a section on Ninja clones.
- Ninja, the original version
- n2, another implementation by the original author (Rust)
- samurai (C99)
- Turtle, a version focused on high-level languages (Rust)
- The Pan Docs, which give technical data about the Game Boy hardware, I/O ports, flags, cartridges, memory map, etc
- This excellent explanation of the boot ROM
- This opcode table that details the full instruction set, including CB opcodes
- This full opcode reference for the GBZ80
- The Game Boy CPU manual (PDF)
- The GameBoy memory map
- This blog post that gives a pretty simple state machine for the different rendering steps
- The Ultimate Game Boy Talk (video) by Michael Steil at CCC
- This ROM generator for custom logos
- This sample DAA implementation
- This awesome-gbdev list
- This excellent emulator and debugger
- Another emulator and debugger
- The Game Boy complete technical reference (PDF)
- This Gameboy Overview
- blargg's test ROMs which have instruction tests, sound tests, etc
- gekkio's emulator and his test ROMs
- This fairly readable Go emulator, which has helped me make sense of some features
- This fairly readable C emulator
- This fairly readable C++ implementation
- This helpful GPU implementation in Rust
- This reference for decoding GameBoy instructions.
- NOTE: This has one bug that someone and I independently found. The original repo has fixed the bug but not the page linked above.
- This summary blog post explaining GPU modes
- And of course /r/emudev
- DIY emulator/VM resources
This is a potentially fun way to render the screen without SDL, but only for non-interactive purposes.
This YouTube playlist looks like it could be worth a watch, but it's a lot of hours.
I should probably pick and choose some great stuff from these lists to copy onto this page.