The Roadmap

Where Razen is today and where it's heading. Phases 1-3 are complete; Phase 4 is in progress.

Razen Compiler Roadmap (C++ Implementation)

Philosophy: Meaningful. Accurate. Simple. Maximum Performance. No Hidden Magic.

Style: Direct. No fluff. Every checkbox is a concrete deliverable.


Legend

| Mark | Meaning |
| --- | --- |
| ✓ | Done: tested and working in pipeline |
| ◐ | Partial: parsed/validated but codegen missing or broken |
| ☐ | Not started |

Stage 0: Project Infrastructure

Build System

  • ✓ Makefile build system (clang++-20, C++20, make && ./razenc)
  • ✓ Dependency on C++20 or later (-std=c++20)
  • ✓ razenc CLI binary (separate from host build)
  • ✓ Source file input (positional .rzn args + -f flag)
  • ◐ Output file flags (emitObject/emitAssembly via IRGen — no --emit= CLI flag yet)
  • ☐ Target triple specification for cross-compilation
  • ☐ Optimization level flags (-O0 through -O3)
  • ☐ DWARF debug info generation (no -g CLI flag yet)

Documentation

  • ✓ README.md with philosophy, build, and quick start
  • ✓ ROADMAP.md (this file)
  • ✓ docs/ — introduction, basics, types, functions, control flow, behaviours, std_lib, compilation, syntax, style, faq, memory, modules, expressions, generics, attributes, error handling, testing
  • ✓ design/ — keywords, std_new (detailed std spec), examples
  • ☐ Language specification (formal grammar)
  • ☐ Compiler internals guide

Testing

  • ✓ Sample programs in src/samples/ (6 sample headers with 30+ test programs)
  • ☐ Automated test runner
  • ☐ Unit tests for lexer, parser, semantic, codegen
  • ☐ Integration tests (compile + verify LLVM IR output)
  • ☐ Fuzz testing for parser and semantic analyzer
  • ☐ Regression test suite for all open issues

Stage 1: Lexer (Phase 1) ✓ Complete

Token Types — All Tokens Defined and Lexed

  • ✓ Keywords: func, ret, if, else, loop, break, skip, match, const, mut, pub, use, mod, struct, enum, union, error, behave, ext, async, defer, try, catch, type, true, false, void, noret, any, test, needs
  • ✓ Primitive types: i1-i128, u1-u128, isize, usize, int, uint, f16-f128, float, bool, char, str, string
  • ✓ Operators: +, -, *, /, %, =, ==, !=, <, >, <=, >=, +=, -=, *=, /=, %=, !, &&, ||, &, |, ^, ~, <<, >>, ., .., ..., ..=, ->, =>, ~>, :, :=, ,, ;, @, ?
  • ✓ Delimiters: (, ), {, }, [, ]
  • ✓ Integer literals (decimal)
  • ✓ Float literals
  • ✓ String literals with escape sequences (\n, \t, \", \\, \', \0, \r, \xNN)
  • ✓ Char literals with escape sequences
  • ✓ Bool literals (true, false)
  • ✓ Single-line comments (//) with correct line tracking
  • ✓ Block comments (/* */) spanning multiple lines with correct line tracking
  • ✓ Line/column tracking on every token
  • ✓ EOF token
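
The escape sequences listed for string literals can be decoded in a single pass over the literal body. The sketch below is illustrative only: `decodeEscapes` and `hexDigitValue` are hypothetical names, not taken from the Razen sources, and the error behaviour (throwing `std::runtime_error`) is an assumption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
int hexDigitValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes the body of a string literal (the text between the quotes).
// Handles the escapes named in the roadmap: n, t, r, 0, quotes, backslash, xNN.
std::string decodeEscapes(const std::string& body) {
    std::string out;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (body[i] != '\\') {
            out += body[i];
            continue;
        }
        if (++i >= body.size()) throw std::runtime_error("dangling backslash");
        switch (body[i]) {
            case 'n':  out += '\n'; break;
            case 't':  out += '\t'; break;
            case 'r':  out += '\r'; break;
            case '0':  out += '\0'; break;
            case '"':  out += '"';  break;
            case '\'': out += '\''; break;
            case '\\': out += '\\'; break;
            case 'x': {
                // Exactly two hex digits must follow the 'x'.
                if (i + 2 >= body.size()) throw std::runtime_error("xNN needs two hex digits");
                const int hi = hexDigitValue(body[i + 1]);
                const int lo = hexDigitValue(body[i + 2]);
                if (hi < 0 || lo < 0) throw std::runtime_error("invalid hex digit");
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                break;
            }
            default:
                throw std::runtime_error("unknown escape sequence");
        }
    }
    return out;
}
```

A real lexer would also record the line and column of a bad escape so the error can be reported with position info.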

Lexer Architecture

  • ✓ Stateful Lexer class with position, line, char tracking
  • ✓ Character-by-character processing loop
  • ✓ Operator multi-character peek-ahead
  • ✓ Dot operator differentiation (., .., ..., ..=)
  • ✓ Word/keyword/number disambiguation
  • ✓ Identifier tokenization
  • ✓ Unrecognized character handling (throws LexerError with context)
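
Dot operator differentiation is a longest-match problem: the lexer has to peek up to two characters ahead before deciding which token to emit. A minimal sketch, assuming a hypothetical free function `lexDotOperator` rather than the actual Lexer class method:

```cpp
#include <cstddef>
#include <string>

// Returns the longest dot operator starting at src[pos]:
// "..=", "...", "..", or ".". Returns "" if src[pos] is not a dot.
std::string lexDotOperator(const std::string& src, std::size_t pos) {
    // Bounds-checked peek; '\0' stands in for "past the end".
    auto peek = [&](std::size_t offset) -> char {
        return pos + offset < src.size() ? src[pos + offset] : '\0';
    };
    if (peek(0) != '.') return "";
    if (peek(1) != '.') return ".";
    if (peek(2) == '=') return "..=";
    if (peek(2) == '.') return "...";
    return "..";
}
```

The caller would advance its position by the length of the returned string, so `0..=10` yields an integer, `..=`, and another integer.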

Stage 2: Parser & AST (Phase 2) ✓ Complete

AST Node Types (48 node types)

  • ✓ Literal nodes: IntegerLiteral, FloatLiteral, StringLiteral, CharLiteral, BoolLiteral, ArrayLiteral, TupleLiteral, ArrayType
  • ✓ Identifier node
  • ✓ Type nodes: VarType, PointerType (*T), ArrayType ([T], [T;N]), OptionalType (?T), FailableType (!T), ErrorUnionType (Error!T)
  • ✓ Declaration nodes: FunctionDeclaration, VarDeclaration, ConstDeclaration, StructDeclaration, EnumDeclaration, UnionDeclaration, ErrorMapDeclaration, TypeAliasDeclaration, ModuleDeclaration, UseDeclaration, BehaviourDeclaration, ExtDeclaration, Annotation, GenericParams
  • ✓ Statement nodes: ReturnStatement, IfStatement, ElseIfStatement, LoopStatement, MatchStatement, TryStatement (TryExpression), CatchBlock (CatchExpression), DeferStatement, Assignment, Block, TryBlock
  • ✓ BreakStatement / SkipStatement — dedicated AST nodes with correct loop-scope validation
  • ✓ Expression nodes: BinaryExpression, UnaryExpression, FunctionCall, MemberAccess, BuiltinExpression, RangeExpression, CaptureBlock
  • ✓ Structural nodes: ReturnType, IfBody, ElseBody, LoopBody, MatchBody, MatchCase, Parameters, Parameter, Argument
  • ✓ Comment node

Declaration Parsing

  • ✓ func name(params) -> ret_type { body } (full function parsing)
  • ✓ pub func / async func / const func / ext func / ext struct / ext enum / ext union variants
  • ✓ Generic parameters: @Generic(T), @Generic(T, E) — parsed with GenericParams node, stored on declaration
  • ✓ Parameter parsing with mut/const prefix, variadic ...
  • ✓ struct Name { fields... } with methods, ~> trait impls, ~> rename syntax, field defaults
  • ✓ enum Name: backing_type { variants... } with explicit values, methods, ~> (traits in children, consistent with struct/union)
  • ✓ union Name { variants... } (tuple-style, struct-variant)
  • ✓ error Name { variants... } (error set declaration)
  • ✓ behave Name { needs..., func... } (behaviour/trait declaration, with ~> inheritance)
  • ✓ const Name: type = expr (compile-time constants)
  • ✓ type Name = Type (type aliases)
  • ✓ mod Name; (module declarations)
  • ✓ use dotted.path; (import statements)
  • ✓ pub visibility flag on all declarations

Statement Parsing

  • ✓ Variable declarations: name: type = expr, name := expr, mut variant
  • ✓ Assignment: name = expr, name +=/-=/*=//=/%= expr
  • ✓ ret expr / ret (void return)
  • ✓ if cond { ... } else { ... }, including else if chaining
  • ✓ loop { ... } (infinite loop)
  • ✓ loop cond { ... } (conditional loop)
  • ✓ loop expr |item| { ... } (iterator loop, parsed)
  • ✓ break, skip
  • ✓ defer { ... }, defer stmt
  • ✓ match expr { pat => body, ... } with literal/enum/destructure/wildcard patterns
  • ✓ try expr, try expr catch |err| { ... }, try { ... } catch (err) { ... } (block variant)
  • ✓ @as(Type, expr) and other @ builtins (parsed as BuiltinExpression)

Expression Parsing

  • ✓ Full precedence-climbing expression parser (12 precedence levels)
  • ✓ All binary operators with correct associativity
  • ✓ Unary: -, !, ~ (bitwise not), & (address-of), * (dereference via ptr.*)
  • ✓ Pointer dereference: ptr.* (postfix)
  • ✓ Member access: a.b.c
  • ✓ Function calls: f(args) with argument lists
  • ✓ Array literals: [1, 2, 3]
  • ✓ Tuple literals: .{a, b, c}
  • ✓ Range expressions: .., ..=, ... parsed as dedicated RangeExpression nodes with precedence 11
  • ✓ Capture blocks: |e| expr
  • ✓ Parenthesized grouping
  • ✓ Type annotations in expression context
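
Precedence climbing can be shown on a much smaller grammar than Razen's. The toy evaluator below uses only two binary precedence levels plus unary minus and parentheses; the class name `MiniParser` and its precedence table are illustrative assumptions, not the 12-level table used by the real parser.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Toy precedence-climbing evaluator over integers.
class MiniParser {
public:
    explicit MiniParser(std::string text) : src_(std::move(text)) {}

    long parse() {
        const long value = parseExpr(1);
        skipSpaces();
        if (pos_ != src_.size()) throw std::runtime_error("trailing input");
        return value;
    }

private:
    // Higher number binds tighter; 0 means "not a binary operator".
    static int precedence(char op) {
        switch (op) {
            case '+': case '-': return 1;
            case '*': case '/': case '%': return 2;
            default: return 0;
        }
    }

    static long apply(char op, long a, long b) {
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': if (b == 0) throw std::runtime_error("division by zero"); return a / b;
            case '%': if (b == 0) throw std::runtime_error("division by zero"); return a % b;
        }
        throw std::runtime_error("unknown operator");
    }

    void skipSpaces() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }

    long parsePrimary() {
        skipSpaces();
        if (pos_ < src_.size() && src_[pos_] == '(') {
            ++pos_;
            const long inner = parseExpr(1);
            skipSpaces();
            if (pos_ >= src_.size() || src_[pos_] != ')') throw std::runtime_error("expected ')'");
            ++pos_;
            return inner;
        }
        if (pos_ < src_.size() && src_[pos_] == '-') {  // unary minus
            ++pos_;
            return -parsePrimary();
        }
        if (pos_ >= src_.size() || !std::isdigit(static_cast<unsigned char>(src_[pos_])))
            throw std::runtime_error("expected number");
        long value = 0;
        while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
            value = value * 10 + (src_[pos_++] - '0');
        return value;
    }

    // Consumes binary operators whose precedence is at least minPrec.
    long parseExpr(int minPrec) {
        long lhs = parsePrimary();
        for (;;) {
            skipSpaces();
            if (pos_ >= src_.size()) return lhs;
            const char op = src_[pos_];
            const int prec = precedence(op);
            if (prec == 0 || prec < minPrec) return lhs;
            ++pos_;
            // prec + 1 makes every operator here left-associative.
            const long rhs = parseExpr(prec + 1);
            lhs = apply(op, lhs, rhs);
        }
    }

    std::string src_;
    std::size_t pos_ = 0;
};
```

In this sketch `MiniParser("10 - 4 - 3").parse()` groups as `(10 - 4) - 3`. A real parser would build BinaryExpression nodes instead of computing values, and a right-associative operator would recurse with `prec` rather than `prec + 1`.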

Type Parsing

  • ✓ All primitive types (i1-u128, f16-f128, bool, char, void, noret, any)
  • ✓ Pointer types: *T
  • ✓ Optional types: ?T
  • ✓ Failable types: !T
  • ✓ Error union types: Error!T (named) and error!T (anonymous)
  • ✓ Array types: [T], [T; N]
  • ✓ Collection types: vec[T], map{K,V}, set{T}
  • ◐ Builtin types: @Self, @Type, @Generic(T) — parsed as identifiers, no special validation
  • ✓ mut type modifier

Stage 3: Semantic Analysis (Phase 3) ✓ Complete

Symbol Table & Scope Management

  • ✓ Scope class with parent chain for lexical scoping
  • ✓ Symbol types: Variable, Function, Struct, Enum, Union, Trait, ErrorSet, Module, TypeAlias
  • ✓ PushScope / PopScope for block boundaries
  • ✓ Two-pass design: pass 1 (declare globals) + pass 2 (analyze bodies)
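
A scope with a parent chain can be sketched in a few dozen lines. The layout below is an assumption for illustration: the `Symbol` fields, the `SymbolTable` wrapper, and the method names `declare`, `lookup`, `pushScope`, and `popScope` are hypothetical and may differ from the real Scope class.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

enum class SymbolKind { Variable, Function, Struct, Enum, Union, Trait, ErrorSet, Module, TypeAlias };

struct Symbol {
    SymbolKind kind;
    std::string type;        // textual type name, kept as a string for brevity
    bool isMutable = false;
};

class Scope {
public:
    explicit Scope(Scope* parent = nullptr) : parent_(parent) {}

    // Returns false if the name already exists in this scope (duplicate declaration).
    bool declare(const std::string& name, Symbol sym) {
        return symbols_.emplace(name, std::move(sym)).second;
    }

    // Walks the parent chain, so inner declarations shadow outer ones.
    const Symbol* lookup(const std::string& name) const {
        for (const Scope* s = this; s != nullptr; s = s->parent_) {
            auto it = s->symbols_.find(name);
            if (it != s->symbols_.end()) return &it->second;
        }
        return nullptr;
    }

private:
    Scope* parent_;
    std::unordered_map<std::string, Symbol> symbols_;
};

// Owns the scope stack; the global scope is created up front and never popped.
class SymbolTable {
public:
    SymbolTable() { scopes_.push_back(std::make_unique<Scope>()); }
    void pushScope() { scopes_.push_back(std::make_unique<Scope>(scopes_.back().get())); }
    void popScope() { if (scopes_.size() > 1) scopes_.pop_back(); }
    Scope& current() { return *scopes_.back(); }

private:
    std::vector<std::unique_ptr<Scope>> scopes_;
};
```

With this shape, pass 1 would call `declare` on the global scope for every top-level declaration, and pass 2 would push a scope per block and resolve names with `lookup`, which is what lets a function call refer to a function defined later in the file.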

Name Resolution

  • ✓ Global declaration registration (functions, structs, enums, unions, traits, behaviours, aliases, modules)
  • ✓ Variable name resolution in expressions
  • ✓ Function name resolution for calls
  • ◐ Module-scoped name resolution (mod / use paths — basic path tracking, no multi-file linking)
  • ✓ std identifier whitelisted
  • ✓ Built-in identifier whitelist: self, true, false, null, print, println, printf, puts, eprint, eprintln, exit, assert, panic, clock_ms, clock_ns

Declaration Validation

  • ✓ Duplicate declaration detection in same scope
  • ✓ Function parameter count validation on calls
  • ✓ Function argument count validation
  • ✓ Mutability checks (immutable assignment detection)
  • ✓ Undeclared identifier detection
  • ✓ Return type validation (shows expected vs actual types with typeToString)
  • ✓ Function argument type matching
  • ☐ Constant expression evaluation (comptime)
  • ✓ Struct field declaration tracking
  • ✓ Enum variant tracking

Type Checking

  • ✓ Expression type inference for :=
  • ✓ Operator type compatibility (arithmetic, comparison, logical, bitwise) with rich error messages
  • ✓ Assignment type compatibility with typeToString diff messages
  • ✓ Pointer/reference type validation (address-of returns pointer type, dereference requires pointer)
  • ✓ If condition must be boolean
  • ✓ Loop condition must be boolean
  • ✓ Break/skip outside loop detection
  • ✓ Struct field access validation (field existence, returns field type)
  • ✓ Error union handling (error_type extracted from named error sets, try/catch recognized)
  • ☐ Array/slice index validation
  • ☐ Behaviour implementation signature checking
  • ☐ Comptime const expression validation

Error Reporting

  • ✓ Categorized errors: [TypeError], [NameError], [MutError], [ReturnError], [DeclError], [FlowError], [ArgError], [SyntaxError], [FieldError]
  • ✓ Color-coded output with RED category tags and CYAN position info
  • ✓ Line:column position on every error (line N:M)
  • ✓ Expected vs found type display via typeToString() for pointer, optional, error union, struct, enum types
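
The categorized, color-coded format can be illustrated with a small formatter. This is a sketch under assumptions: the function name `formatDiagnostic`, the exact spacing, and the choice of ANSI codes 31 (red) and 36 (cyan) are guesses at a plausible layout, not the compiler's actual output routine.

```cpp
#include <string>

enum class ErrorCategory {
    TypeError, NameError, MutError, ReturnError, DeclError,
    FlowError, ArgError, SyntaxError, FieldError
};

const char* categoryName(ErrorCategory category) {
    switch (category) {
        case ErrorCategory::TypeError:   return "TypeError";
        case ErrorCategory::NameError:   return "NameError";
        case ErrorCategory::MutError:    return "MutError";
        case ErrorCategory::ReturnError: return "ReturnError";
        case ErrorCategory::DeclError:   return "DeclError";
        case ErrorCategory::FlowError:   return "FlowError";
        case ErrorCategory::ArgError:    return "ArgError";
        case ErrorCategory::SyntaxError: return "SyntaxError";
        case ErrorCategory::FieldError:  return "FieldError";
    }
    return "UnknownError";
}

// Builds "[Category] line N:M message", optionally wrapped in ANSI colors.
std::string formatDiagnostic(ErrorCategory category, int line, int column,
                             const std::string& message, bool useColor) {
    const std::string red   = useColor ? "\x1b[31m" : "";
    const std::string cyan  = useColor ? "\x1b[36m" : "";
    const std::string reset = useColor ? "\x1b[0m"  : "";
    return red + "[" + categoryName(category) + "]" + reset + " " +
           cyan + "line " + std::to_string(line) + ":" + std::to_string(column) + reset +
           " " + message;
}
```

Keeping color behind a flag makes the same formatter usable when output is redirected to a file or compared in tests.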

Stage 4: LLVM IR Code Generation (Phase 4) ≈85%

Phase 4 Architecture

  • ✓ Module preamble: source_filename, target layout (e-m:e-p270:...), target triple (x86_64-pc-linux-gnu)
  • ✓ Libc function declarations (printf, puts, exit, abort) with LLVM attributes
  • ◐ Std library IR injection (std.fmt module injection from src/std/fmt.rzn)
  • ✓ Global node dispatch (genNode switch on all node types)
  • ✓ IRGen shared state (Codegen struct with locals, types, enums, unions, errors maps)
  • ✓ StringBuilder for efficient IR assembly (IRBuilder with LLVM API)
  • ✓ Comment/Annotation/GenericParams/Module/Use/Behave/TypeAlias nodes skipped in codegen
  • ✓ Optimization pipeline (mem2reg + instcombine via new PM PassBuilder)
  • ✓ Object/assembly emission (emitObject/emitAssembly via TargetMachine)
  • ✓ CLI: --verbose/--debug, --help, --version, -f, positional file args

Type Mapping to LLVM

  • ✓ Primitive types: i1-u128, f16-f128, bool→i1, char→i8, void, str→ptr, string→ptr, any→ptr
  • ✓ Pointer types: *T→ptr (opaque)
  • ✓ Compound types: structs→%T, enums→iN, unions→{i32,[N x i8]}, error unions→{i1,T}, optionals→{i1,T}, failables→{i1,T}
  • ✓ Array types: [N x T], slice→ptr
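
The scalar part of this mapping can be written as a lookup from Razen type names to LLVM type spellings. In the sketch below, `llvmTypeFor` is a hypothetical helper that returns IR type names as strings rather than `llvm::Type*`. The float spellings (`half`, `float`, `double`, `fp128`) are real LLVM types, but pairing them with `f16`/`f32`/`f64`/`f128` is an assumption, and aliases such as `int`, `uint`, `isize`, and `usize` are left out because the roadmap does not state their widths.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Maps a Razen scalar or pointer type name to an LLVM IR type spelling.
// Returns std::nullopt for anything this sketch does not cover.
std::optional<std::string> llvmTypeFor(const std::string& razenType) {
    if (razenType == "bool") return "i1";
    if (razenType == "char") return "i8";
    if (razenType == "void") return "void";
    if (razenType == "str" || razenType == "string" || razenType == "any") return "ptr";
    if (!razenType.empty() && razenType[0] == '*') return "ptr";  // opaque pointer
    if (razenType == "f16")  return "half";
    if (razenType == "f32")  return "float";
    if (razenType == "f64")  return "double";
    if (razenType == "f128") return "fp128";

    // iN and uN both lower to iN: LLVM integers carry no signedness,
    // which is instead encoded in the instructions (sdiv vs udiv, sext vs zext).
    if (razenType.size() >= 2 && razenType.size() <= 4 &&
        (razenType[0] == 'i' || razenType[0] == 'u')) {
        const std::string digits = razenType.substr(1);
        for (char c : digits) {
            if (!std::isdigit(static_cast<unsigned char>(c))) return std::nullopt;
        }
        const int bits = std::stoi(digits);
        if (bits >= 1 && bits <= 128) return "i" + std::to_string(bits);
    }
    return std::nullopt;
}
```

Because signedness is dropped at this level, the code generator has to remember whether a value came from a `uN` type when it later picks between signed and unsigned instructions.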

Variable Declarations

  • ✓ Local alloca with store initializer
  • ✓ Global const declarations with InternalLinkage and constant initializers
  • ✓ Global non-const declarations with deferred init via __raz_global_init() constructor
  • ✓ Type inference via := with type mapping

Function Code Generation

  • ✓ define @name with parameter list
  • ✓ Parameter alloca and store at entry block
  • ✓ Return value handling with default zeroinit for void
  • ✓ External function declarations (ext func) with variadic support
  • ✓ Self parameter handling for method calls

Expression Code Generation

  • ✓ Literals: int (i32), float (double), bool (i1), char (i8), string (global @.str.N with dedup)
  • ✓ Identifier: load from alloca with type tracking, enum/error lookup, null → null constant
  • ✓ Binary operators: arithmetic (Add/FAdd), comparison (ICmp/FCmp), logical (And/Or i1), bitwise (And/Or/Xor/Shl/AShr) — all with float/int dispatch and SExt/Trunc widening
  • ✓ Unary operators: negate (Neg/FNeg), not (Xor 1), bitnot (Xor all-ones), address-of (&), dereference (.* via LoadInst)
  • ✓ Function calls with argument type widening (SExt/Trunc)
  • ✓ Member access: struct field GEP, enum variant constant, error set constant, method call with mangled name
  • ◐ Method calls: c.method() → mangled name struct.method + self arg (basic support, no vtable dispatch)
  • ✓ Array literal: ConstantArray or alloca + per-element GEP+store
  • ✓ Tuple literal: ConstantStruct or alloca + stores
  • ✓ Range expression: {start, end} struct pair
  • ✓ Union construction: alloca, tag store, payload bitcast+store

Statement Code Generation

  • ✓ Variable declarations with initializer
  • ✓ Assignment: =, +=, -=, *=, /=, %= (all with float/int dispatch)
  • ✓ Assignment to struct field: x.field = expr via GEP chain
  • ✓ Return statement with value (including SExt/Trunc for integer width mismatch)
  • ✓ If/else with else-if chaining (CondBr based on i1 condition, end block joining)
  • ✓ Loops: infinite (loop {}), conditional (loop cond {}), with cond/body/end basic blocks
  • ✓ Break → br to loop.end, Skip → br to loop.continue
  • ✓ Defer → LIFO flush before return (reverse iteration of deferred vector)
  • ✓ Match → icmp eq chain for simple enums, tag dispatch for tagged unions with payload extraction and variable binding
  • ✓ Try expression → flag check, branch to catch or propagate (return)
  • ✓ Try block → scope with catch target
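
The LIFO rule for defer can be modelled at runtime, even though the compiler applies it at code generation time by emitting the deferred statements in reverse order before each return. `DeferStack` and `demoDeferOrder` below are hypothetical names used only for this illustration.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Collects deferred actions and runs them last-in, first-out.
class DeferStack {
public:
    void defer(std::function<void()> action) { deferred_.push_back(std::move(action)); }

    // Reverse iteration over the vector, mirroring the flush before a return.
    void flush() {
        for (auto it = deferred_.rbegin(); it != deferred_.rend(); ++it) (*it)();
        deferred_.clear();
    }

private:
    std::vector<std::function<void()>> deferred_;
};

// Records the order in which the body and the deferred actions run.
std::string demoDeferOrder() {
    std::string log;
    DeferStack defers;
    defers.defer([&log] { log += "close file;"; });   // registered first, runs last
    defers.defer([&log] { log += "free buffer;"; });  // registered last, runs first
    log += "body;";
    defers.flush();  // corresponds to the flush emitted before `ret`
    return log;
}
```

Reverse order matters when later resources depend on earlier ones: the buffer that was set up after the file is released before the file is closed.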

Struct Code Generation

  • ✓ Type definition: %StructName = type { field_types }
  • ✓ Field access via getelementptr with field type tracking
  • ✓ Struct field assignment via GEP chain
  • ✓ Constructors with explicit field initialization
  • ◐ Struct methods: collected and emitted with mangled names (struct.method)

Enum Code Generation

  • ✓ Type definition: %EnumName = type backing_integer (i32 default, custom backing)
  • ✓ Variant values computed and tracked (explicit and implicit, auto-increment)
  • ✓ Enum member access resolves to integer constant
  • ✓ Match dispatch using icmp eq on backing integer type
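
Variant value assignment with mixed explicit and implicit values can be sketched as a single pass with a running counter. The rule assumed here is the C-style one: implicit values start at 0 and continue from the previous variant's value plus one. The roadmap only says "auto-increment", so that restart rule and the names `VariantDecl` and `assignVariantValues` are assumptions.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct VariantDecl {
    std::string name;
    std::optional<std::int64_t> explicitValue;  // empty when the source gives no value
};

// Returns (name, value) pairs in declaration order.
std::vector<std::pair<std::string, std::int64_t>>
assignVariantValues(const std::vector<VariantDecl>& variants) {
    std::vector<std::pair<std::string, std::int64_t>> assigned;
    std::int64_t next = 0;
    for (const VariantDecl& variant : variants) {
        const std::int64_t value = variant.explicitValue.value_or(next);
        assigned.emplace_back(variant.name, value);
        next = value + 1;  // implicit variants continue after the last value used
    }
    return assigned;
}
```

For variants `Red`, `Green = 10`, `Blue`, this sketch assigns 0, 10, and 11. A production version would also check that each value fits the enum's backing integer type and reject duplicates.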

Union Code Generation

  • ✓ Tagged union type: %UnionName = type { i32, [N x i8] }
  • ✓ Max payload size computed from all variants
  • ✓ Union construction: tag store + payload bitcast+store
  • ✓ Match tag dispatch with payload extraction
  • ✓ Payload variable binding in match arms
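
The tagged union layout follows directly from the largest payload. The sketch below takes payload sizes as given byte counts and prints the type definition in the form listed above; `UnionVariant`, `maxPayloadBytes`, and `unionTypeIR` are hypothetical names, and alignment padding is ignored for simplicity.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct UnionVariant {
    std::string name;
    std::size_t payloadBytes;  // 0 for variants that carry no data
};

// The byte array must be large enough to hold any variant's payload.
std::size_t maxPayloadBytes(const std::vector<UnionVariant>& variants) {
    std::size_t largest = 0;
    for (const UnionVariant& variant : variants) {
        largest = std::max(largest, variant.payloadBytes);
    }
    return largest;
}

// Renders "%Name = type { i32, [N x i8] }" with N = largest payload size.
std::string unionTypeIR(const std::string& name, const std::vector<UnionVariant>& variants) {
    return "%" + name + " = type { i32, [" +
           std::to_string(maxPayloadBytes(variants)) + " x i8] }";
}
```

With real LLVM types, the per-variant sizes would come from the module's data layout instead of hand-written numbers, which also accounts for padding.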

Error Handling Code Generation

  • ✓ Error set declaration with variant→code mapping (incrementing from 0)
  • ✓ Error variant reference in expressions: FileError.NotFound → i32 code
  • ✓ Error union return construction: {i1 1, i32 code} or {i1 0, T value}
  • ✓ Try expression: extractvalue flag check, branch to catch or propagate
  • ✓ Try block: scope with catch handler target
  • ☐ Behaviour / Trait code generation (vtable dispatch)
  • ☐ Generator / Async code generation
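
The error-union representation listed above, a flag plus a payload where a flag of 1 marks an error, can be mirrored in plain C++ to show what `try` propagation does. Everything named here is hypothetical: `ErrorUnionI32` fixes T to a 32-bit integer for brevity, and `PermissionDenied` is an invented second variant next to the roadmap's `NotFound`.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Error set: each variant gets a code, incrementing from 0 in declaration order.
std::unordered_map<std::string, std::int32_t>
buildErrorCodes(const std::vector<std::string>& variants) {
    std::unordered_map<std::string, std::int32_t> codes;
    std::int32_t next = 0;
    for (const std::string& variant : variants) codes.emplace(variant, next++);
    return codes;
}

// Mirrors the {i1, T} aggregate with T = i32.
// isError == true: payload is an error code. isError == false: payload is the value.
struct ErrorUnionI32 {
    bool isError;
    std::int32_t payload;
};

ErrorUnionI32 makeError(std::int32_t code)  { return {true, code}; }
ErrorUnionI32 makeValue(std::int32_t value) { return {false, value}; }

// Stand-in for a fallible callee.
ErrorUnionI32 readNumber(bool fail) {
    return fail ? makeError(1) : makeValue(21);
}

// Models `try readNumber()` without a catch: check the flag, and on error
// return the same error union to the caller instead of continuing.
ErrorUnionI32 readAndDouble(bool fail) {
    const ErrorUnionI32 result = readNumber(fail);
    if (result.isError) return result;      // propagate
    return makeValue(result.payload * 2);   // success path uses the payload
}
```

In generated IR the flag check would be an `extractvalue` of field 0 followed by a conditional branch, with the propagate path returning from the enclosing function.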

Stage 5: Standard Library

Current Std Architecture

  • ✓ std.fmt module: src/std/fmt.rzn provides print, println, printf, puts wrappers (injected at compile time when use std.fmt is found)
  • ◐ C++ source-level module injection via file read during AST building
  • ☐ LLVM IR templates embedded in C++ string constants
  • ☐ C++ wrapper files for std modules
  • ☐ Std function name mapping

All Modules (std.core, std.mem, std.str, std.string, std.fmt, std.io, std.fs, std.os, std.vec, std.map, std.set, std.ring, std.math, std.bits, std.ascii, std.unicode, std.parse, std.buf, std.hash, std.sync, std.time, std.testing, std.debug)

  • ☐ All items in all modules (design complete in design/std_new.md, implementation not started)

Stage 6: Critical Missing Features

Codegen Gaps

  • ☐ Short-circuit evaluation for && / || (currently emits bitwise And/Or)
  • ☐ Block-level break/continue labels (break/skip only target innermost loop)
  • ☐ Behaviour/trait vtable dispatch
  • ☐ Comptime / metaprogramming
  • ☐ String literal interpolation ({} in format strings)
  • ☐ Enum match IR — basic block ordering issues in some cases
  • ☐ Generic monomorphization (@Generic(T) annotations parsed but not specialized)
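
The short-circuit gap is observable whenever the right-hand operand has a side effect or is only safe to evaluate when the left-hand side holds. The C++ sketch below contrasts the two behaviours; `Counter`, `eagerAnd`, and `lazyAnd` are hypothetical names, and the eager version only imitates the effect of lowering `&&` to a bitwise And.

```cpp
// Counts how many times the right-hand operand was evaluated.
struct Counter {
    int calls = 0;
    bool touch() { ++calls; return true; }
};

// Imitates the current lowering: both operands are computed, then combined bitwise.
bool eagerAnd(bool lhs, Counter& rhs) {
    const bool rhsValue = rhs.touch();  // evaluated unconditionally
    return (static_cast<int>(lhs) & static_cast<int>(rhsValue)) != 0;
}

// Short-circuit semantics: the right-hand side runs only if the left is true.
bool lazyAnd(bool lhs, Counter& rhs) {
    return lhs && rhs.touch();
}
```

With a false left operand, `eagerAnd` still bumps the counter while `lazyAnd` leaves it at zero. In IR, the lazy form needs a conditional branch around the right-hand side and a `phi` at the join block to merge the two possible results.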

Usability

  • ☐ Automated test runner
  • ☐ Package/module resolution across files
  • ☐ Error recovery in parser (currently panics on first error)
  • ☐ Language server protocol (LSP) support
  • ☐ WASM target

Stage 7: Compiler Self-Hosting

Bootstrap Path (all ☐)

  • ☐ Razen type system self-hosting
  • ☐ Razen lexer written in Razen
  • ☐ Razen parser written in Razen
  • ☐ Razen → C/C++ transpilation bootstrapping
  • ☐ Full self-hosting

Milestone Summary

| Milestone | Description | Key Deliverables | Status |
| --- | --- | --- | --- |
| M0 | C++ pipeline skeleton | Makefile, lexer, parser, semantic stubs | ✓ Complete |
| M1 | Working lexer | Full tokenization of all Razen constructs | ✓ Complete |
| M2 | Full parser + AST | All declarations, statements, expressions, generics, ranges, else-if, block try, ext struct | ✓ Complete |
| M3 | Semantic analysis | Type checking, scope, validation, categorized errors, typeToString, pointer/error union compatibility | ✓ Complete |
| M4 | LLVM codegen | All types, expressions, statements, control flow, optimization, emission | ◐ ≈85% (methods basic, no vtable dispatch) |
| M5 | Struct codegen | Struct types, field access, methods | ✓ Types+fields ✓, methods ◐ |
| M6 | Enum + Match + Union | Enumerations, tagged unions, match dispatch | ✓ Enum ✓, Union ✓, Match ✓ |
| M7 | Error handling | Error sets, error unions, try/catch | ◐ Error sets + try expr done, union propagation basic |
| M8 | Codegen optimization | mem2reg, instcombine, object/assembly emission | ✓ Complete (new PM PassBuilder, TargetMachine) |
| M9 | CLI & tooling | --help, --version, -v, -f, file input | ✓ Complete |
| M10 | Collections | Vec, Map, Set with generics | ☐ Not started |
| M11 | Std library | All 24 std modules implemented | ☐ Not started (fmt.rzn functional) |
| M12 | Self-hosting | Razen compiler compiles itself | ☐ Not started |

Design Constraints

  • ☐ Zero hidden allocations — all allocation takes explicit Allocator param
  • ☐ Predictable LLVM mapping — clear path from source to IR
  • ☐ No implicit casts — type conversions must be explicit
  • ☐ No hidden magic — no GC, no implicit allocs, no hidden control flow
  • ☐ Zero-cost abstractions — behaviours dispatch without overhead

Progress: Phases 1-3 (Lexer, Parser, Semantic Analysis) are complete. Phase 4 (Codegen) is ~85% complete. Done so far: core infrastructure, the optimization pipeline (mem2reg + instcombine), object/assembly emission (emitObject/emitAssembly), the CLI (--help, --version, -v, file args), all major control flow (if/loop/match/defer/try), compound types (struct/enum/union/error), and all expression types. Remaining codegen work: struct method vtable dispatch, short-circuit evaluation, comptime evaluation, and match IR block ordering edge cases.

Std Library: ~5% — fmt.rzn (print/println/printf/puts) is functional and injected on use std.fmt.

All 10 built-in sample programs produce valid .o and .s files in output/; compiled object files link and execute correctly.

Next Target: close the remaining codegen gaps (short-circuit &&/||, comptime evaluation, enum match CFG fix), finish struct method codegen, and begin the std library modules (std.os, std.mem).