# development
g
Hey folks — I desperately want to contribute to the core of Pants, but I’ll be honest: I feel totally lost every time I dive into the codebase. I've been writing and contributing to source code for over 10 years, but something about the layout and structure of pants just doesn't click for me yet. I've read through the internal architecture docs (super helpful!) but when I try to do something specific, I can't seem to map what I want to do to where I should do it or how the pieces fit together. Was anyone else in this position when they first started contributing to Pants core? Any advice, mental models, or resources that helped you go from “lost” to “contributing meaningfully”? I’d really appreciate it!
w
I guess it depends on what part of the codebase you'd like to contribute to. There's a bit of a line between the Rust code, and the Python code. Are you lost in some particular section? Or some particular task? I found the simplest way to start was to add a linter or something simple I needed - and just doing that unlocked a decent chunk of knowledge there.
👀 1
Like, if you're diving into the Rust code - then yeah, that might take a while - but the python code seems a bit more... umm.. "structured"?
g
I ended up using chatgpt to help explain it to me and had it write a simple version of the rule engine so I could better grok it. It turns out that helped a lot. That said, there are still things that I'm like wtf, I'm lost how this all works 😕
w
Well, hopefully a lot of the complicated parts of the rule engine (scheduler) will be going away in the mid-future
👍 1
That said, there are still things that I'm like wtf, I'm lost how this all works
Such as?
👀 1
f
A big part of what helped me was writing custom plugins first. You can incrementally build knowledge of the internals that way, and use the existing core code as a reference to guide your implementation
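For instance, a tiny rule in a plugin looks roughly like this (the names below are made up for illustration; the plugin docs cover registering it via a register.py and pants.toml):
Copy code
from dataclasses import dataclass

from pants.engine.rules import collect_rules, rule


# Hypothetical request/result types - rules take and return frozen, hashable values.
@dataclass(frozen=True)
class GreetingRequest:
    name: str


@dataclass(frozen=True)
class Greeting:
    text: str


@rule
async def make_greeting(request: GreetingRequest) -> Greeting:
    # Rules are plain async functions; the engine memoizes them by their inputs.
    return Greeting(text=f"hello, {request.name}")


def rules():
    # register.py would expose this so Pants can discover the rule.
    return collect_rules()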
👀 1
g
For example, I've observed performance issues with the following rules:
• "Find all targets in the project" ~11s
• "Map all targets to their dependents" ~41s
I want to focus on "Map all targets to their dependents", which is currently written in Python and located in
src/python/pants/backend/project_info/dependents.py:43
How would I go about converting this to rust? What is the interface between python code and rust code?
Also, to be clear when I look at the function body I'm like wtf... lol
w
Performance is a tricky one, because the classic "rewrite it in rust" doesn't "necessarily" work great. For some of the introspection tools, the bulk of the time isn't chewing through Python code - but it's the act of the rule itself running on the repo.
I'd be curious where the time comes from in your repo though - that would be a great dive through the code
👀 1
g
I'm also having a hard time understanding whether the nested function args are being called synchronously or asynchronously. This one seems complex, but I'm not sure.
I'd be curious where the time comes from in your repo though - that would be a great dive through the code
Any time you want to pair I can show you. Source is proprietary, but I'd love to provide stats from some command or something or a fork of the pants core.
w
So, when you see the await calls (either using the older Get syntax, or the newer call-by-name) - that code is essentially fired off to the scheduler - so re-writing that file might not actually change anything. I'm not saying that for certain, to be clear, but even just timing through the dependents function would give a quick clue if there is any hotspot - or if it's all from the await calls
👀 1
f
tdyas' new otel plugin would probably be worth a spin here. My own optimization efforts are guided by our (internal) otel plugin. https://pantsbuild.slack.com/archives/C046T6T9U/p1748552331102879
👀 1
h
I appreciate the desire to contribute! Yeah, the engine code is a bit of a beast. The interplay between Rust and Python is non-trivial to understand.
I agree that perf work should be driven by CPU profiles that prove the hypothesis, but that said it would not surprise me if Python is slowing down build graph computation…
g
Yeah, it's almost 45s in our monorepo. We only have about 15K targets total. It's a lot, but not crazy.
h
So converting “Map all targets to their dependents” to Rust might make sense
g
I would love you forever 😉
It's one of those things I want to do, but frankly am beyond lost at the integration between rust and python and I've also never developed in rust.
Ah, well, learning Rust would be a prerequisite 🙂
💯 1
w
For the map-all part, I assume the bulk of the time is spent on this part of the call? Can you grab a count of how many targets you have, so we can see how many tasks are spun up?
Copy code
@rule(desc="Map all targets to their dependents", level=LogLevel.DEBUG)
async def map_addresses_to_dependents(all_targets: AllUnexpandedTargets) -> AddressToDependents:
    dependencies_per_target = await concurrently(
        resolve_dependencies(
            DependenciesRequest(
                tgt.get(Dependencies), should_traverse_deps_predicate=AlwaysTraverseDeps()
            ),
            **implicitly(),
        )
        for tgt in all_targets
    )
g
• Target Counts
  ◦ docker_image: 103
  ◦ python_requirement: 1650
  ◦ python_source: 11992
  ◦ python_test: 2594
w
That's in the all_targets variable in that function?
g
Is pants list //:: how I would count all targets?
w
Oh, no, it would be just debugging/printing it in that function on the pants main branch - to maybe identify in more depth what's happening there. If it's launching 15k tasks, that would be something...
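Something like this, for example - just a rough sketch of the kind of count/timing I mean, not a real patch:
Copy code
# Sketch: the existing rule from dependents.py with a target count and a timer added.
import time  # assumed extra import at the top of the module

@rule(desc="Map all targets to their dependents", level=LogLevel.DEBUG)
async def map_addresses_to_dependents(all_targets: AllUnexpandedTargets) -> AddressToDependents:
    print(f"targets: {len(all_targets)}")  # how many tasks get spun up
    start = time.perf_counter()
    dependencies_per_target = await concurrently(
        resolve_dependencies(
            DependenciesRequest(
                tgt.get(Dependencies), should_traverse_deps_predicate=AlwaysTraverseDeps()
            ),
            **implicitly(),
        )
        for tgt in all_targets
    )
    print(f"dependency resolution took {time.perf_counter() - start:.1f}s")
    # ... rest of the rule unchanged ...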
g
ahh, I see what you mean.
@wide-midnight-78598 you mean something like this, right?
w
Yeah essentially, and timing between the start and end of that call. Without knowing better, I'd say the bulk of the duration of the call would be there - and that could give us some idea whether making that an intrinsic could help
👀 1
g
I had Claude modify the function to get some benchmark results:
Copy code
@rule(desc="Map all targets to their dependents", level=<http://LogLevel.INFO|LogLevel.INFO>)
async def map_addresses_to_dependents(all_targets: AllUnexpandedTargets) -> AddressToDependents:
    # Start comprehensive benchmarking
    start_time = time.perf_counter()
    tracemalloc.start()

    num_targets = len(all_targets)
    print(f"🔍 BENCHMARK: Starting dependents mapping for {num_targets:,} targets")
    print(f"📅 Start time: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}")

    # Phase 1: Dependency Resolution
    deps_start = time.perf_counter()
    print(f"⚡ Phase 1: Starting concurrent dependency resolution...")

    dependencies_per_target = await concurrently(
        resolve_dependencies(
            DependenciesRequest(
                tgt.get(Dependencies), should_traverse_deps_predicate=AlwaysTraverseDeps()
            ),
            **implicitly(),
        )
        for tgt in all_targets
    )

    deps_end = time.perf_counter()
    deps_duration = deps_end - deps_start
    print(f"✅ Phase 1 complete: Dependency resolution took {deps_duration:.3f}s ({deps_duration/num_targets*1000:.2f}ms per target)")

    # Phase 2: Build dependents mapping
    mapping_start = time.perf_counter()
    print(f"🗺️  Phase 2: Building dependents mapping...")

    address_to_dependents = defaultdict(set)
    total_dependencies = 0
    max_deps_per_target = 0
    target_with_most_deps = None

    for tgt, dependencies in zip(all_targets, dependencies_per_target):
        deps_count = len(dependencies)
        total_dependencies += deps_count

        if deps_count > max_deps_per_target:
            max_deps_per_target = deps_count
            target_with_most_deps = tgt.address

        for dependency in dependencies:
            address_to_dependents[dependency].add(tgt.address)

    mapping_end = time.perf_counter()
    mapping_duration = mapping_end - mapping_start

    # Phase 3: Create frozen data structures
    freeze_start = time.perf_counter()
    print(f"🧊 Phase 3: Creating immutable data structures...")

    result = AddressToDependents(
        FrozenDict(
            {
                addr: FrozenOrderedSet(dependents)
                for addr, dependents in address_to_dependents.items()
            }
        )
    )

    freeze_end = time.perf_counter()
    freeze_duration = freeze_end - freeze_start

    # Final benchmarking and statistics
    total_time = time.perf_counter() - start_time
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    # Calculate statistics
    avg_deps_per_target = total_dependencies / num_targets if num_targets > 0 else 0
    num_unique_dependencies = len(address_to_dependents)
    avg_dependents_per_dep = sum(len(deps) for deps in address_to_dependents.values()) / num_unique_dependencies if num_unique_dependencies > 0 else 0

    # Print comprehensive benchmark results
    print(f"""
📊 BENCHMARK RESULTS - Dependents Mapping Complete
{'='*60}
⏱️  TIMING:
   • Total time: {total_time:.3f}s
   • Dependency resolution: {deps_duration:.3f}s ({deps_duration/total_time*100:.1f}%)
   • Mapping construction: {mapping_duration:.3f}s ({mapping_duration/total_time*100:.1f}%)
   • Data structure freezing: {freeze_duration:.3f}s ({freeze_duration/total_time*100:.1f}%)
   • Throughput: {num_targets/total_time:.1f} targets/second

💾 MEMORY:
   • Current usage: {current / 1024 / 1024:.1f} MB
   • Peak usage: {peak / 1024 / 1024:.1f} MB

📈 STATISTICS:
   • Total targets: {num_targets:,}
   • Total dependencies: {total_dependencies:,}
   • Unique dependency targets: {num_unique_dependencies:,}
   • Avg deps per target: {avg_deps_per_target:.1f}
   • Max deps per target: {max_deps_per_target:,} ({target_with_most_deps})
   • Avg dependents per dependency: {avg_dependents_per_dep:.1f}
   • Dependency graph density: {total_dependencies/(num_targets*num_targets)*100:.3f}%

🏁 Completed at: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}
{'='*60}
    """)

    return result
I'm running now and waiting for rust to finish compiling.
> Building [=======================> ] 557/558: engine
BENCHMARK: Starting dependents mapping for 32,265 targets
The benchmarks made the code perform horribly lol.
w
Whelp, that's a lot of targets... .......
g
Yep 🙂
More than I thought but I think a lot of them are crap. I've even removed a bunch and it's still mostly the same horrible performance.
Copy code
📊 BENCHMARK RESULTS - Dependents Mapping Complete
============================================================
⏱️  TIMING:
   • Total time: 150.941s
   • Dependency resolution: 150.498s (99.7%)
   • Mapping construction: 0.218s (0.1%)
   • Data structure freezing: 0.225s (0.1%)
   • Throughput: 213.8 targets/second

💾 MEMORY:
   • Current usage: 337.5 MB
   • Peak usage: 455.9 MB

📈 STATISTICS:
   • Total targets: 32,265
   • Total dependencies: 133,510
   • Unique dependency targets: 29,493
   • Avg deps per target: 4.1
   • Max deps per target: 2,304 (apps/tileserver-gl/fonts:fonts)
   • Avg dependents per dependency: 4.5
   • Dependency graph density: 0.013%

🏁 Completed at: 2025-06-03 10:53:41.053
============================================================
Those stats aren't an accurate reflection of reality. Without the benchmarking added, it's closer to 45s.
w
@happy-kitchen-89482 How does MultiGet work on that many at once? I can't recall - as I last looked through this almost a year ago. Are they all just queued and chewed through by however many threads we have available?
g
oof, it's slow.... I added even more metrics to dig into the dependency resolution and it's bad. Roughly 250 targets per second.
Copy code
📊 DEPENDENCY DISTRIBUTION:
   • Top 10 heaviest targets: 2304, 1964, 1863, 676, 638, 630, 624, 585, 559, 538 deps
   • Targets with >100 deps: 52
   • Targets with >500 deps: 10
   • Targets with >1000 deps: 3
I hope these metrics help. Curious what @happy-kitchen-89482 thinks in terms of whether or not putting this in rust (or any other changes) would dramatically improve performance.
h
I am fairly certain it would, yes
g
@happy-kitchen-89482 is this something the o4-high-mini model could do most of via cursor, or is this way too crazy?
If you can give me context to share with AI regarding the handshake between rust and py, I might be able to hack on this using cursor or something.
h
I have never used it, so I have no idea
g
gotcha.
h
But my guess is that you’ll need to know some stuff to get this to work, I don’t think cursor is going to be an effective shortcut out of learning how the python-rust boundary works
👍 1
Fortunately I think you (or AI?) can get quite far by cargo-culting existing intrinsics
1
g
AI is getting quite far, I just can't evaluate its accuracy 🙂
@happy-kitchen-89482 you know what would be helpful: if you could pass me a few PRs of other work being migrated from python to rust.
w
It's great at math - that's for sure...
🤣 1
@gentle-flower-25372 You'll probably have good luck looking under PRs for "migrate to rust" or "intrinsics". I know there was some work done around tree-sitter which moved a bunch of stuff from python to rust, which could be a good place to start too.
🙌 1
g
I'll poke around and pass the context to the model and see if "we" can get something working 😉
w
Additionally, there was a recent migration of options parsing from python to rust - they might not be 1:1 compatible, but it'll be something. native_engine.py or .pyi might also be places to start with the interface layer, backtracking that into the rust code
👀 1
h
Running git log on those two files should help you find examples
Well, on the entire rust module, not just the mod.rs file
but even without that, the thing to notice is how the python and rust code are parallel, which should make it straightforward to extend
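Very roughly, the shape of that parallelism looks like this (every name here is invented purely for illustration - it's not the real native_engine API): the Rust extension exposes ordinary callables, the .pyi stub declares their Python types, and the Python side calls them like any normal function.
Copy code
# Purely illustrative - these names do not exist in Pants.
#
# The stub side (a .pyi file) would declare something like:
#     def map_dependents(edges: list[tuple[str, str]]) -> dict[str, list[str]]: ...
# with the implementation living in the compiled Rust extension.
#
# A pure-Python stand-in for that function, just to show the call pattern:
def map_dependents(edges: list[tuple[str, str]]) -> dict[str, list[str]]:
    dependents: dict[str, list[str]] = {}
    for dependent, dependency in edges:
        dependents.setdefault(dependency, []).append(dependent)
    return dependents


print(map_dependents([("app/a", "lib/x"), ("app/b", "lib/x")]))
# -> {'lib/x': ['app/a', 'app/b']}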
👀 1
h
Hi, I'm trying to improve CI time at my company and I've noticed the following in the logs:
[2025-06-08T08:00:53.978Z]   91.36s	Map all targets to their dependents
It looks like we are affected by the same thing and I would like to give a hand in the effort to improve it. (I'm using Pants v2.24.2.) Is there an action item for me? (In the meantime, I'll try to also instrument it and get some stats.)
h
Yep, I think this is another case where graph representation and algorithms should be reimplemented in Rust… as per https://github.com/pantsbuild/pants/issues/22393