5/27/2026 · 5,302 views

Why does tsgo use so much memory?

If you run tsgo on a decently sized Typescript project, it’s not uncommon to see it using gigabytes of memory.

Why is that?

The short answer is:

when multi-threading, tsgo makes a type checker per thread
each type checker has its own state (types, symbols, etc.)
this state is not shared as synchronizing it between threads is costly
so each type checker often allocates duplicate, redundant memory
in addition, allocated types are literally never freed¹

It’s not uncommon for Typescript projects to have:
several thousand Typescript files
libraries like Zod, tRPC, Drizzle which result in many, many type instantiations
recursive generic types which produce a lot of transient types which are never freed

When running tsgo on a large Typescript project, these type creation patterns compound and result in a lot of duplicated or unused memory.

Let’s dig deeper.

Heap analysis

Let’s first get a breakdown of the heap so we can see what’s taking up so much memory.

I’ll run tsgo on a large Next.js project with Zod, tRPC, Drizzle, all the good stuff that makes the typechecker do work. Including node_modules, it’s about 7k .ts files.

We can use Go’s runtime/pprof package to capture peak heap snapshots and the pprof tool to tell us which functions allocated the most memory with the -inuse_space flag.
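As a rough sketch of the capturing side, this is how a Go program can write a heap profile with runtime/pprof. The helper names (writeHeapProfile, heapProfileSize) are made up for illustration and are not tsgo code. It also only takes a snapshot at one moment; it doesn't show how to pick the moment of peak usage.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"runtime"
	"runtime/pprof"
)

// writeHeapProfile is a hypothetical helper (not part of tsgo). It forces a
// garbage collection so the profile reflects up-to-date live objects, then
// writes the heap profile to w.
func writeHeapProfile(w io.Writer) error {
	runtime.GC()
	return pprof.WriteHeapProfile(w)
}

// heapProfileSize captures a profile in memory and reports its size in bytes.
func heapProfileSize() (int, error) {
	var buf bytes.Buffer
	if err := writeHeapProfile(&buf); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	// Keep some memory live so the profile has something to show.
	live := make([][]byte, 0, 1024)
	for i := 0; i < 1024; i++ {
		live = append(live, make([]byte, 64<<10))
	}

	n, err := heapProfileSize()
	if err != nil {
		panic(err)
	}
	fmt.Printf("heap profile: %d bytes, %d buffers still live\n", n, len(live))
}
```

In practice you would pass an *os.File instead of a buffer and hand the resulting file to the pprof tool.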

If we categorize them by AST, typechecker etc. we see this:

Total live heap:                    1471.9 MB
  pprof writer self-overhead:         75.2 MB
  real live data:                   1321.5 MB

      MB     pct   Family
──────────────────────────────────────────────────────────────────────────────
  594.72   45.0%   AST arenas (parser-allocated)
  399.12   30.2%   Checker (type/signature computation)
  121.79    9.2%   LinkStore (per-node/per-symbol caches)
   63.38    4.8%   OS / syscall / file I/O
   62.58    4.7%   Binder (symbol/flow declarations)
   22.33    1.7%   Parser (intern maps, etc.)
   20.24    1.5%   pkg: collections
   15.54    1.2%   Checker arenas
   13.46    1.0%   AST utilities
    6.58    0.5%   Compiler / module resolution
    1.10    0.1%   pkg: core
    0.70    0.1%   pkg: packagejson

What sticks out at first glance is that 45% of memory (~595MB) is allocated for AST nodes. It sounds like a lot, but it’s actually expected for the bulk of the memory allocated by a compiler to be taken up by AST nodes.

AST nodes also typically need to live for the duration of the compiler’s execution, so there’s nothing we can really do here. A lot of files means a lot of AST nodes!

I’m more interested in the memory allocated by the typechecker (the Checker struct in the source).

What happens if we run tsgo with --singleThreaded?

Total live heap:                     797.4 MB
  pprof writer self-overhead:          3.6 MB
  real live data:                    790.2 MB

      MB     pct   Family
──────────────────────────────────────────────────────────────────────────────
  522.95   66.2%   AST arenas (parser-allocated)
   63.37    8.0%   OS / syscall / file I/O
   62.63    7.9%   Binder (symbol/flow declarations)
   51.93    6.6%   Checker (type/signature computation)
   23.01    2.9%   LinkStore (per-node/per-symbol caches)
   22.51    2.8%   Parser (intern maps, etc.)
   16.78    2.1%   AST utilities
   16.15    2.0%   pkg: collections
   10.21    1.3%   Compiler / module resolution
    0.58    0.1%   pkg: packagejson
    0.10    0.0%   pkg: core
    0.01    0.0%   ** unclassified **

The typechecker takes up only ~50MB instead of ~400MB! This strongly suggests to me that there is some overhead with multi-threading here.

Let’s look at the typechecker deeper.

The type checker

The way tsgo multi-threads typechecking is by creating a pool of Checkers, one per thread (4 by default, or 1 when single-threaded):

// internal/compiler/checkerpool.go

func newCheckerPoolWithTracing(program *Program, tr *tracing.Tracing) *checkerPool {
	checkerCount := 4
	if program.SingleThreaded() {
		checkerCount = 1
	} else if c := program.Options().Checkers; c != nil {
		checkerCount = *c
	}

	checkerCount = max(min(checkerCount, len(program.files), 256), 1)

	pool := &checkerPool{
		program:  program,
		checkers: make([]*checker.Checker, checkerCount),
		locks:    make([]*sync.Mutex, checkerCount),
		tracing:  tr,
	}

	return pool
}

When a Checker is created, it is given the entire Typescript program AST and all its files:

// internal/checker/checker.go

func NewChecker(program Program, tracer *Tracer) (*Checker, *sync.Mutex) {
	program.BindSourceFiles()

	c := &Checker{}
	c.id = nextCheckerID.Add(1)
	c.tracer = tracer
	c.program = program
	c.compilerOptions = program.Options()
	c.files = program.SourceFiles()
	c.fileIndexMap = createFileIndexMap(c.files)

	// ... more code
}

During typechecking and emitting diagnostics for a file, each file gets assigned to the next available Checker.

Each Checker has its own state for type-checking (which we’ll see in more detail later). Here’s an example of duplicated work:

File a.ts goes to Checker 1, it creates a bunch of types.
File b.ts imports some type from a.ts and goes to Checker 2.
Checker 2 has its own separate state, so it needs to recompute and re-allocate data for a.ts.
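To make that concrete, here is a toy model of the scenario. Everything in it (toyChecker, totalWork, the round-robin assignment) is an assumption for illustration. The real pool hands files to the next available Checker, and real type checking is far more involved than marking a file as "computed".

```go
package main

import "fmt"

// toyChecker is a hypothetical stand-in for a Checker: it owns a private
// cache of the "types" it has computed, keyed by the declaring file.
type toyChecker struct {
	computed map[string]bool
}

// check type-checks one file: it needs the types of the file itself and of
// everything the file imports, computing any that are missing from its cache.
// It returns how many computations it had to perform.
func (c *toyChecker) check(file string, imports []string) int {
	work := 0
	for _, f := range append([]string{file}, imports...) {
		if !c.computed[f] {
			c.computed[f] = true
			work++
		}
	}
	return work
}

// totalWork distributes files over n checkers round-robin (a simplification)
// and returns the total number of computations across all of them.
func totalWork(n int, order []string, imports map[string][]string) int {
	checkers := make([]*toyChecker, n)
	for i := range checkers {
		checkers[i] = &toyChecker{computed: map[string]bool{}}
	}
	total := 0
	for i, file := range order {
		total += checkers[i%n].check(file, imports[file])
	}
	return total
}

func main() {
	order := []string{"a.ts", "b.ts", "c.ts", "d.ts"}
	imports := map[string][]string{
		"b.ts": {"a.ts"},
		"c.ts": {"a.ts"},
		"d.ts": {"a.ts"},
	}
	fmt.Println("1 checker: ", totalWork(1, order, imports)) // 4
	fmt.Println("4 checkers:", totalWork(4, order, imports)) // 7
}
```

With one checker, a.ts is computed once and reused. With four, every checker that sees an importer of a.ts recomputes it.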

From the pprof run I noticed the top allocating Checker functions were:

Checker.newSymbol() (symbols)
Checker.newObjectType() (types)
Checker.instantiateType() (types)

Let’s look at precisely what data is being allocated in each Checker.

Duplicated types

Each Checker has a lot of stores for the many types that could be constructed:

type Checker struct {
  stringLiteralTypes            map[string]*Type
  numberLiteralTypes            map[jsnum.Number]*Type
  bigintLiteralTypes            map[jsnum.PseudoBigInt]*Type
  enumLiteralTypes              map[EnumLiteralKey]*Type
  indexedAccessTypes            map[CacheHashKey]*Type
  templateLiteralTypes          map[CacheHashKey]*Type
  stringMappingTypes            map[StringMappingKey]*Type
  cachedTypes                   map[CachedTypeKey]*Type        
  cachedSignatures              map[CachedSignatureKey]*Signature
  narrowedTypes                 map[NarrowedTypeKey]*Type
  assignmentReducedTypes        map[AssignmentReducedKey]*Type
  discriminatedContextualTypes  map[DiscriminatedContextualTypeKey]*Type
  instantiationExpressionTypes  map[InstantiationExpressionKey]*Type
  substitutionTypes             map[SubstitutionTypeKey]*Type
  reverseMappedCache            map[ReverseMappedTypeKey]*Type
  reverseHomomorphicMappedCache map[ReverseMappedTypeKey]*Type
  iterationTypesCache           map[IterationTypesKey]IterationTypes
  tupleTypes                    map[CacheHashKey]*Type
  unionTypes                    map[CacheHashKey]*Type
  unionOfUnionTypes             map[UnionOfUnionKey]*Type
  intersectionTypes             map[CacheHashKey]*Type
  propertiesTypes               map[PropertiesTypesKey]*Type
  flowLoopCache                 map[FlowLoopKey]*Type
  flowTypeCache                 map[*ast.Node]*Type
  errorTypes                    map[CacheHashKey]*Type
  // and many more!
}

Remember that:

this memory belongs to a single Checker and there’s no sharing of the data.
allocated types never get freed

This means there can be a lot of duplicated memory that sits around.

To verify this, let’s start by creating a file with some code that builds tuples:

type BuildTuple<L extends number, T extends any[] = []> =
  T['length'] extends L ? T : BuildTuple<L, [...T, any]>;

type TC = BuildTuple<100>;
declare const x: TC;
export const c0 = x[0];
export const cLen: 100 = x.length;

The BuildTuple<L, T> type will recursively build a tuple type from the empty tuple type [] all the way to a tuple with 100 any’s in it ([any, any, ... any]).

Each iteration of the recursion creates a new tuple and caches it forever².

If we create 4 files with the content above and run them through tsgo, we should see roughly 100 tuple types created and duplicated across 4 typecheckers (and also roughly 100 number literal types).

Let’s see:

                            single checker     4 checkers
                            ─────────────────  ─────────────────────────────
  tupleTypes                102                [102 102 102 102]  →  408
  numberLiteralTypes        101                [101 101 101 101]  →  404

This illustrates two things:

types will be redundantly created on different threads
a recursive generic type can create a lot of transient types which take up memory

This is just a trivial example. Imagine the level of duplication that could happen when typechecking many thousands of files.
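The same effect can be modelled in a few lines of Go. This is only a sketch under heavy assumptions: toyChecker, getTuple and buildTuple are invented names, and a tuple is reduced to its length. It yields 101 tuples per checker (lengths 0 through 100), which is close to the 102 measured above, though it makes no attempt to reproduce exactly which tuples the real checker creates.

```go
package main

import "fmt"

// tupleType is a hypothetical, heavily simplified tuple type: all elements
// are `any`, so the length alone identifies it.
type tupleType struct{ length int }

// toyChecker owns a private tuple cache, loosely mirroring the per-Checker
// tupleTypes map. Entries are never removed.
type toyChecker struct {
	tupleTypes map[int]*tupleType
}

func newToyChecker() *toyChecker {
	return &toyChecker{tupleTypes: map[int]*tupleType{}}
}

// getTuple returns the cached tuple of the given length, allocating it on
// first use.
func (c *toyChecker) getTuple(length int) *tupleType {
	if t, ok := c.tupleTypes[length]; ok {
		return t
	}
	t := &tupleType{length: length}
	c.tupleTypes[length] = t
	return t
}

// buildTuple mimics BuildTuple<L, T>: starting from t, it keeps appending
// one element until the tuple reaches the target length. Every intermediate
// tuple stays in the cache.
func (c *toyChecker) buildTuple(target int, t *tupleType) *tupleType {
	if t.length == target {
		return t
	}
	return c.buildTuple(target, c.getTuple(t.length+1))
}

func main() {
	total := 0
	for i := 0; i < 4; i++ { // one file per checker
		c := newToyChecker()
		c.buildTuple(100, c.getTuple(0))
		fmt.Printf("checker %d: %d cached tuples\n", i, len(c.tupleTypes))
		total += len(c.tupleTypes)
	}
	fmt.Println("total:", total) // 404
}
```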

Duplicated symbols

In compilers, named things (identifiers for functions, variables, etc.) often get recorded in a layer of indirection called a “symbol”.

Usually, this lets names be scoped (“foo” in the global scope and “foo” in function scope mean two different things) and also lets you give a stable handle to them in case you want to rename (e.g. minification).
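Here is a minimal sketch of that idea. The symbol and scope types are hypothetical and much simpler than what tsgo uses, and for brevity the scope's lookup key is not updated on rename.

```go
package main

import "fmt"

// symbol is a hypothetical stable handle for a declared name.
type symbol struct{ name string }

// scope maps names to symbols and falls back to its parent on lookup.
type scope struct {
	parent  *scope
	symbols map[string]*symbol
}

func newScope(parent *scope) *scope {
	return &scope{parent: parent, symbols: map[string]*symbol{}}
}

// declare creates a new symbol for name in this scope.
func (s *scope) declare(name string) *symbol {
	sym := &symbol{name: name}
	s.symbols[name] = sym
	return sym
}

// resolve walks outward from this scope until it finds name.
func (s *scope) resolve(name string) *symbol {
	for cur := s; cur != nil; cur = cur.parent {
		if sym, ok := cur.symbols[name]; ok {
			return sym
		}
	}
	return nil
}

func main() {
	global := newScope(nil)
	fn := newScope(global)

	outer := global.declare("foo")
	inner := fn.declare("foo")

	// The same name resolves to different symbols depending on the scope.
	fmt.Println(global.resolve("foo") == outer) // true
	fmt.Println(fn.resolve("foo") == inner)     // true

	// References hold the handle, so renaming the symbol (as a minifier
	// might) is visible through every reference without touching them.
	refs := []*symbol{inner, inner}
	inner.name = "a"
	fmt.Println(refs[0].name, refs[1].name, outer.name) // a a foo
}
```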

Each Checker stores a bunch of symbols:

type Checker struct {
	// ... more code
	symbolArena core.Arena[ast.Symbol]
	// ... more code
}

Are symbols being duplicated a lot?

I modified tsgo to dump the top symbol names (the string part of the symbol) when running 4 threads:

tsgo --checkers 4

| Symbol         | Kind     |  Count |
| -------------- | -------- | -----: |
| `at`           | Method   | 34,500 |
| `_`            | Property | 25,600 |
| `name`         | Property | 24,700 |
| `value`        | FuncVar  | 22,800 |
| `@@iterator`   | Method   | 22,300 |
| `data`         | Property | 22,100 |
| `enumValues`   | Property | 21,900 |
| `columnType`   | Property | 21,000 |
| `dataType`     | Property | 21,000 |
| `generated`    | Property | 19,500 |

Let’s look at the at symbol count. If it decreases with single-threaded tsgo then that probably means other threads are duplicating it:

tsgo --checkers 1

| Symbol         | Kind     |  Count |
| -------------- | -------- | -----: |
| `props`        | FuncVar  | 16,800 |
| `at`           | Method   | 14,600 |
| `children`     | Property | 10,400 |
| `value`        | FuncVar  | 10,200 |
| `@@iterator`   | Method   |  9,500 |
| `className`    | Property |  9,200 |
| `data`         | Property |  8,500 |
| `forEach`      | Method   |  8,100 |
| `map`          | Method   |  8,000 |
| `find`         | Method   |  7,900 |

So there’s about 20k more at symbols created when running tsgo with 4 threads!

Let’s verify it by creating a little test file.

The at symbol is from Array<T>.prototype.at. We can force Typescript to create this symbol by creating an Array<T> and doing any property lookup on it. This causes Typescript to resolve all members (and create their symbols)³ on the Array object:

declare const arr: Array<string>;
export const len = arr.length;

Now we can create 4 files with these exact same contents. If we run tsgo with --checkers 4 each file should go to a Checker and we’ll see if it duplicates the at symbol:

               --checkers 1         --checkers 4
               ─────────────        ──────────────────────────────
               total                total    c0    c1    c2    c3
at             1                    4        1     1     1     1

So each checker duplicated the symbol for Array<string>.prototype.at.

Also note that new symbols are created for every new instantiation of a type parameter. So Array<string>, Array<number>, etc. will all get their own symbols for at and any other members. This is pretty standard and normal.

But you can start to see how it could be easy for tsgo to duplicate a lot of symbols on other threads.

Imagine your code creates some generic type with a lot of fields and methods, maybe for a data structure:

type MyDataStructure<T> = {
  field1: T;
  field2: string;
  // ...
  field100: string;
}

Each instantiation will create 100 symbols. And then perhaps if you import this type in many files, it’s highly likely that it will be seen and duplicated across more than one Checker.
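A back-of-the-envelope model of that multiplication, with made-up names (toyChecker, instantiate, symbolsCreated). It assumes the worst case where every checker ends up seeing every instantiation, and that a checker caches an instantiation after the first time it sees it.

```go
package main

import "fmt"

type symbol struct{ name string }

// toyChecker is a hypothetical checker that caches one instantiated member
// table per type argument and counts the symbols it has allocated.
type toyChecker struct {
	instantiations map[string][]*symbol
	symbolCount    int
}

// instantiate returns the member symbols for the given type argument,
// creating a fresh symbol per member the first time this checker sees it.
func (c *toyChecker) instantiate(members []string, typeArg string) []*symbol {
	if syms, ok := c.instantiations[typeArg]; ok {
		return syms
	}
	syms := make([]*symbol, len(members))
	for i, m := range members {
		syms[i] = &symbol{name: m}
	}
	c.symbolCount += len(members)
	c.instantiations[typeArg] = syms
	return syms
}

// symbolsCreated counts symbols allocated across all checkers, assuming every
// checker encounters every type argument (the worst case).
func symbolsCreated(checkers, memberCount int, typeArgs []string) int {
	members := make([]string, memberCount)
	for i := range members {
		members[i] = fmt.Sprintf("field%d", i+1)
	}
	total := 0
	for i := 0; i < checkers; i++ {
		c := &toyChecker{instantiations: map[string][]*symbol{}}
		for _, arg := range typeArgs {
			c.instantiate(members, arg)
			c.instantiate(members, arg) // second use hits the cache
		}
		total += c.symbolCount
	}
	return total
}

func main() {
	args := []string{"string", "number", "boolean"}
	fmt.Println(symbolsCreated(1, 100, args)) // 300
	fmt.Println(symbolsCreated(4, 100, args)) // 1200
}
```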

A real-world example is Zod objects. Zod’s method chaining API returns a ZodObject instantiated with different type parameters:

const emailSchema = z.string().email().min(5).max(120).toLowerCase();

Each call (.string(), .email(), etc.) instantiates some new ZodObject<Shape, Config> type, and the property chaining causes Typescript to resolve and create symbols (as well as allocate the individual types!).

Libraries like Drizzle and tRPC have APIs that do a similar thing, and when multiplied by multiple threads this leads to a lot of memory usage.

Conclusion

This was a fun dive into the tsgo source.

How could memory usage be made better in the future?

Garbage collecting types sounds promising, especially since Typescript types behave like regular values in a programming language. Transient types wouldn’t be bound to an AST node or anything and would get GC’ed.

Persistent, shared data structures are used in FP languages which, like Typescript, have the problem of creating many transient values. This could help reduce memory usage for types like tuples etc.
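To illustrate the structural-sharing idea (this is purely my sketch, not how tsgo represents tuples), here is a hypothetical persistent tuple in Go. Each tuple stores only its last element plus a pointer to the tuple it extends, so building tuples of length 1 through 100 adds 100 nodes instead of 5050 element slots for flat copies.

```go
package main

import "fmt"

// tuple is a hypothetical persistent tuple: it stores only its last element
// and points at the tuple it extends, so all prefixes are shared.
type tuple struct {
	prefix *tuple // nil for the empty tuple
	last   string
	length int
}

var empty = &tuple{}

// extend returns a new tuple with elem appended; t itself is not modified.
func (t *tuple) extend(elem string) *tuple {
	return &tuple{prefix: t, last: elem, length: t.length + 1}
}

// elements materializes the tuple as a slice, first element first.
func (t *tuple) elements() []string {
	out := make([]string, t.length)
	for cur := t; cur.length > 0; cur = cur.prefix {
		out[cur.length-1] = cur.last
	}
	return out
}

// slotCounts builds tuples of length 1..n and compares the storage needed
// (counted in element slots) for flat copies versus shared nodes.
func slotCounts(n int) (flat, shared int) {
	t := empty
	for i := 1; i <= n; i++ {
		t = t.extend("any")
		flat += i // a flat copy of this tuple needs i slots
		shared++  // the persistent version adds a single node
	}
	return flat, shared
}

func main() {
	flat, shared := slotCounts(100)
	fmt.Println(flat, shared) // 5050 100
	fmt.Println(empty.extend("a").extend("b").elements())
}
```

The trade-off is that indexing into such a tuple is no longer constant time, so a real design would likely pick a more balanced persistent structure.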

Another interesting place to look at is how the Zig compiler’s InternPool solves a similar problem with comptime values and types⁴.

Here is my fork which contains scripts to process pprof’s data as well as some modifications to tsgo code to emit profiling data.

Footnotes

^1

This may sound crazy at first, and it kind of is. Because Typescript types are Turing complete, this feels like writing a program which never frees its memory. However, I can kind of see how it got to this point. It’s pretty standard practice for compilers to allocate AST nodes in arenas (which don’t get freed until end of program) since the AST will live pretty much for the duration of the compiler’s execution. Similarly, in compilers with types that are “normal”, you often associate the type to the AST node and once it’s associated with an AST node it lives forever because the AST does. However, with types like Typescript’s, which are Turing complete and can have iterative recursion which produces many, many transient types, this may not be the best strategy.

^2

The getTupleTargetType() function creates a tuple and stores it in the tupleTypes map of the Checker.

^3

In Typescript’s typechecker, getting a property on an object causes it to resolve members on it which then calls instantiateSymbolTable().

^4

I’m not an expert on Zig compiler internals but my understanding of InternPool is this:

Typescript types are essentially compile-time values
Zig has compile-time execution (comptime) that is also Turing complete like Typescript and produces a lot of compile-time values (types are also values in Zig’s comptime).
Zig’s semantic analysis pass (which does comptime execution) also has a problem of multi-threading and sharing or duplicating data. Sema is currently single-threaded, but there are other AstGen threads which read its values.
Zig’s solution is slightly inspired by RCU.
Basically: threads can share data, readers are lock-free, and writers can update shared values without blocking readers.
Each thread has its own thread local storage where it can place compile-time values
There is a global list of “shards” which you can use to look up which particular thread a value lives in (basically a map which maps hash of value -> (thread, index))
Each value is hashed to determine which shard it belongs to
When you want to look up a value, you hash it and look it up in the shard, which uses atomics and no locks
When a thread creates a value, it stores it in its thread local store, and uses a writer mutex to update the corresponding shard
This has a few performance characteristics:
Reading requires zero locking, very fast
Writing does require a lock, but only on the shard, and since the data is split into multiple shards it’s less contention than a single read/write mutex for example
Zig comptime values are also immutable, so writing essentially happens only at creation time
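Here is a loose Go sketch of the "lock-free readers, per-shard writer lock" idea. It is not Zig's InternPool: all names are invented, values live in the shard itself rather than in per-thread storage, and a writer copies the whole shard map on every insert (copy-on-write), which is simple but would be too slow for real use.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
	"sync/atomic"
)

const shardCount = 8

// shard publishes an immutable map snapshot through an atomic pointer.
// Readers load the snapshot without locking; writers serialize on mu and
// publish a new snapshot (copy-on-write).
type shard struct {
	mu   sync.Mutex
	snap atomic.Pointer[map[string]*string]
}

// internPool is a hypothetical sharded pool of immutable interned strings.
type internPool struct {
	shards [shardCount]shard
}

func newInternPool() *internPool {
	p := &internPool{}
	for i := range p.shards {
		empty := map[string]*string{}
		p.shards[i].snap.Store(&empty)
	}
	return p
}

// shardFor hashes the key to pick the shard it belongs to.
func (p *internPool) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return &p.shards[h.Sum32()%shardCount]
}

// intern returns the canonical pointer for key, creating it if needed.
func (p *internPool) intern(key string) *string {
	s := p.shardFor(key)
	if v, ok := (*s.snap.Load())[key]; ok { // lock-free fast path
		return v
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	old := *s.snap.Load()
	if v, ok := old[key]; ok { // another writer got here first
		return v
	}
	next := make(map[string]*string, len(old)+1)
	for k, v := range old {
		next[k] = v
	}
	v := &key
	next[key] = v
	s.snap.Store(&next) // published maps are never mutated again
	return v
}

func main() {
	pool := newInternPool()
	results := make([]*string, 4)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for n := 0; n <= 100; n++ {
				results[i] = pool.intern(fmt.Sprintf("tuple(any x %d)", n))
			}
		}(i)
	}
	wg.Wait()
	// All four goroutines end up with the same canonical value.
	fmt.Println(results[0] == results[1], results[2] == results[3]) // true true
}
```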