Does anyone have any experience using `mold` while...
# development
w
Does anyone have any experience using `mold` while building Pants? I'm testing on an old/slow computer, and it doesn't appear to improve link time at all - I was hoping there would be some per-core improvement.
g
Can confirm 🙂
```
MOLD:    Finished `release` profile [optimized + debuginfo] target(s) in 3m 37s
 LLD:    Finished `release` profile [optimized + debuginfo] target(s) in 3m 28s
```
w
.... huh... that's surprising, given everything I've read about mold and their own first-party numbers. I still have some messing around to do, but I confirmed mold was actually used to link.
g
Just running `RUSTFLAGS="-Ztime-passes" cargo +nightly build --release`: by far the biggest item is LLVM, and link time is negligible.
```
time:   0.000; rss:   39MB ->   40MB (   +1MB)  parse_crate
time:   0.000; rss:   44MB ->   45MB (   +0MB)  crate_injection
time:   0.396; rss:   45MB ->  219MB ( +174MB)  expand_crate
time:   0.396; rss:   45MB ->  219MB ( +174MB)  macro_expand_crate
time:   0.004; rss:  219MB ->  219MB (   +0MB)  AST_validation
time:   0.002; rss:  219MB ->  219MB (   +0MB)  finalize_imports
time:   0.001; rss:  219MB ->  220MB (   +0MB)  finalize_macro_resolutions
time:   0.084; rss:  220MB ->  254MB (  +35MB)  late_resolve_crate
time:   0.005; rss:  254MB ->  254MB (   +0MB)  resolve_check_unused
time:   0.007; rss:  254MB ->  254MB (   +0MB)  resolve_postprocess
time:   0.100; rss:  219MB ->  254MB (  +35MB)  resolve_crate
time:   0.009; rss:  268MB ->  268MB (   +0MB)  drop_ast
time:   0.021; rss:  260MB ->  260MB (   +1MB)  misc_checking_1
time:   0.535; rss:  260MB ->  404MB ( +144MB)  coherence_checking
time:   2.494; rss:  260MB ->  471MB ( +211MB)  type_check_crate
time:   2.416; rss:  471MB ->  569MB (  +97MB)  MIR_borrow_checking
time:   0.031; rss:  569MB ->  570MB (   +1MB)  module_lints
time:   0.031; rss:  569MB ->  570MB (   +1MB)  lint_checking
time:   0.029; rss:  570MB ->  570MB (   +0MB)  privacy_checking_modules
time:   0.078; rss:  569MB ->  570MB (   +2MB)  misc_checking_3
time:   0.994; rss:  570MB ->  593MB (  +23MB)  monomorphization_collector_root_collections
time:   2.789; rss:  593MB ->  880MB ( +287MB)  monomorphization_collector_graph_walk
time:   0.416; rss:  886MB ->  912MB (  +26MB)  partition_and_assert_distinct_symbols
time:   0.000; rss:  904MB ->  905MB (   +1MB)  write_allocator_module
time:   4.823; rss:  905MB -> 1660MB ( +755MB)  codegen_to_LLVM_IR
time:   9.058; rss:  570MB -> 1660MB (+1090MB)  codegen_crate
time:  77.479; rss: 1660MB -> 1559MB ( -101MB)  LLVM_passes
time:  77.378; rss: 1296MB -> 1559MB ( +264MB)  finish_ongoing_codegen
time:   0.863; rss: 1515MB ->  779MB ( -736MB)  run_linker
time:   0.019; rss:  779MB ->  779MB (   +0MB)  link_binary_remove_temps
time:   0.895; rss: 1559MB ->  779MB ( -781MB)  link_binary
time:   0.896; rss: 1559MB ->  779MB ( -781MB)  link_crate
time:   0.896; rss: 1559MB ->  779MB ( -781MB)  link
time:  93.179; rss:   28MB ->  147MB ( +119MB)  total
    Finished `release` profile [optimized + debuginfo] target(s) in 2m 30s
```
This is just for the final `engine` step, but out of ~93 seconds, we spent about 1 second in the linker. The rest was essentially codegen.
I think one issue for Pants' native code is that it is very funnel-shaped, with a lot of heavy compilation also happening in `engine`, bottlenecking the compilation overall.
(i.e., looking at `cargo build --timings`, half the build time is just `engine`, of which the majority is codegen/optimization.)
w
I need to dig into this more - but when I was testing some of this earlier in the year, a one-character change triggered a several-minute build, and I thought both LLVM and link times were responsible. If that's not true, this is interesting.
g
So we set `codegen-units = 1` for release builds, and I get similarly terrible performance on a work project if I do that. Just changing it to `codegen-units = 16` and enabling thin LTO instead shaves a minute off the compile time. I'd imagine splitting `engine` into 2-3 crates would have a similar effect.
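Roughly this shape of change, as a sketch - the values are illustrative, not necessarily what Pants should ship:
```toml
# Sketch of the trade-off described above: give up codegen-units = 1
# in exchange for more parallel codegen plus thin LTO.
[profile.release]
codegen-units = 16   # more units compile in parallel, at some optimization cost
lto = "thin"         # thin LTO recovers much of the cross-unit optimization cheaply
```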
w
So, `codegen-units = 1` is one of the classic "speeds up runtime performance" settings, if I'm not mistaken
g
Oh, and Pants always builds in release, and release doesn't do incremental builds by default, so that's likely a cause as well in the example you gave
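For what it's worth, incremental compilation can be switched on for release explicitly - a sketch of the knob, not a recommendation:
```toml
# Sketch: release builds use incremental = false by default; this opts back in.
[profile.release]
incremental = true
```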
w
Yeah, I'm specifically talking about release builds here; debug is much less of an issue
This is really interesting stuff, and I wonder how true this still is after all the call-by-name work. As in, was startup time the big problem, or was it "everything" being substantially slower?
```
We default to compiling with Rust's release mode, instead of its debug mode, because this makes Pants substantially faster. However, this results in the compile taking 5-10x longer.

If you are okay with Pants running much slower when iterating, set the environment variable MODE=debug and rerun pants to compile in debug mode.
```
Spot on, as usual, Tom. Swapped the codegen units to default, and my incremental release build is like 4-5x faster. Link time was negligible
There's also a world, soon, where we can remove that too - a trivial incremental debug build without it is 2 seconds
```toml
[profile.dev]
# Increase the optimization level of the dev profile slightly, as otherwise `rule_graph`
# solving takes prohibitively long.
opt-level = 1
```
g
if it isn't already, we could move the rule graph to a separate crate that always compiles at max optimization, either way
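That, or a per-package profile override might get a similar effect without restructuring anything - a sketch, assuming the crate really is named `rule_graph`:
```toml
# Sketch: keep the dev profile fast overall, but optimize just the rule-graph
# crate. The crate name `rule_graph` is an assumption here.
[profile.dev.package.rule_graph]
opt-level = 3
```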
w
True - but I think with call-by-name, the goal is to remove it entirely (or mostly, or slightly, or whatever)