# general
p
Hmm. I use the adhoc tooling to build a node package. In 2.16 things work fine; in 2.17 I'm hitting "too many files open" type errors. No setting of ulimit seems to help here. Switching back alleviates this. Any thoughts on how I can debug this?
a
Shot in the dark since you say it only happens on 2.17, but have you also tried increasing the inotify watchers? I run into this problem on occasion when swapping between branches in quick succession.
p
hmm like
fs.inotify.max_user_watches
?
Negative. Same issue unfortunately.
👍 1
g
What are your hard and soft file limits? How high did you go when testing?
p
40000
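For reference, the soft and hard limits asked about here can also be read from inside a process. A minimal Python sketch, assuming a Unix-like system (the standard-library `resource` module is not available on Windows); the helper name `nofile_limits` is made up for illustration:

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = nofile_limits()
# resource.RLIM_INFINITY means "no limit" for that value.
print(f"soft={soft} hard={hard}")
```

The soft limit is the one a process actually runs into; an unprivileged process can raise it only as far as the hard limit.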
w
2.17.0?
p
Yes
w
Against 2.16. ?
p
My node stuff works with 2.16. If I switch to 2.17 I see too many files. If I switch back to 2.16 I don't see the issue
w
Okay, so none of the RCs, just stable 2.16 and 2.17
p
node stuff ==
files(
    name="package-config",
    sources=["package.json", "package-lock.json", "tsconfig.json"],
)

files(
    name="frontend-sources",
    sources=["src/**/*", "*.ts", "public/*"],
    dependencies=[],
)

system_binary(name="sh", binary_name="sh")

system_binary(
    name="node",
    binary_name="node",
    fingerprint_args=["--version"],
)

system_binary(
    name="touch",
    binary_name="touch",
    fingerprint_args=["--version"],
)

system_binary(
    name="npm",
    binary_name="npm",
    fingerprint_args=["--version"],
    fingerprint_dependencies=[":node"],
)

# Fetch the dependencies and produce a `node_modules` directory
adhoc_tool(
    name="node-modules",
    runnable=":npm",
    runnable_dependencies=[":node", ":sh", ":npm"],
    args=["ci"],
    output_dependencies=[":package-config"],
    execution_dependencies=[":package-config"],
    output_directories=["node_modules"],
    timeout=300,
)

shell_command(
    name="build",
    command="npm run pantsbuild",
    tools=["node", "sh", "npm", "touch"],
    execution_dependencies=[":node-modules", ":frontend-sources"],
    extra_env_vars=["CI=true"],
    output_directories=["dist", "build"],
    output_files=["__init__.py"],
    timeout=300,
)

experimental_wrap_as_resources(
    name="resources",
    inputs=[":build"],
)

experimental_test_shell_command(
    name="test",
    command="npm test -- --coverage",
    tools=["node", "sh", "npm"],
    execution_dependencies=[":node-modules", ":frontend-sources"],
    log_output=True,
    extra_env_vars=["CI=true"],
    timeout=300,
)

run_shell_command(
    name="start",
    command="npm start",
    runnable_dependencies=[":node", ":npm"],
    execution_dependencies=[":node-modules", ":frontend-sources"],
)
Correct, stable versions, non-RC
👍 1
w
I was looking at the comparison between 2.16 and 2.17, lots of stuff. I need to dig more into adhoc to see how it handles files. I vaguely recall something about symlinks or hardlinking in 2.17, though unsure if that would change anything
I just realized my node stuff isn't running on 2.17, it’s still using 2.16. I’ll upgrade tonight and see if I can repro
p
Okay let me know. I'm not 100% sure how to make a repro here.. as it seems highly dependent on "your setup"
w
For sure. I have some small and mid-size projects, and then I think I have one “large” project I test on (like, I just grabbed some OSS repo)
Well poop… Same problem
g
Out of curiosity, how many files (approximately) do you expect to be covered in this case? Are we talking 10 vs a ulimit of 40000, or a few thousand files? Just thinking scale here. If you run this with
-ltrace
does it always happen on a specific step?
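To get a rough sense of that scale, one option is to count the entries under the directory being captured and compare the number with the open-file limit. A minimal Python sketch; `count_files` is a hypothetical helper, and the temporary tree below only stands in for a real `node_modules` directory:

```python
import os
import tempfile


def count_files(root):
    """Count non-directory entries below root (directory symlinks are not followed)."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


# Build a tiny stand-in tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "pkg", "docs"))
    for rel in ("pkg/index.js", "pkg/package.json", "pkg/docs/readme.md"):
        with open(os.path.join(tmp, rel), "w") as f:
            f.write("x")
    demo_count = count_files(tmp)

print(demo_count)
```

Pointing `count_files` at the real output directory gives the number to compare against `ulimit -n`.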
w
My midsize project is crapping out with the same error when I build node modules
@purple-plastic-57801 Heads up, the first thing I tend to do (other than using
-ldebug
) is to figure out which rc/dev build introduced the problem, and that’s usually enough to figure out the root cause
2.17.0.dev0 works, 2.17.0.dev1 fails
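Finding the first failing build like this is a bisection over an ordered list of releases. A minimal Python sketch of the idea; `first_failing` and the version list are illustrative only, and in practice the `fails` check would mean setting `pants_version` in `pants.toml` and re-running the build by hand:

```python
def first_failing(versions, fails):
    """Binary-search an ordered list for the first version where fails(v) is True.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    once a version fails, every later one fails too.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Illustrative list; the predicate simulates "everything from dev1 on fails".
versions = ["2.16.0", "2.17.0.dev0", "2.17.0.dev1", "2.17.0.dev2", "2.17.0"]
culprit = first_failing(versions, lambda v: versions.index(v) >= 2)
print(culprit)
```

With only a handful of dev builds a linear walk works just as well; bisection pays off when each check is slow.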
@witty-crayon-22786 Would any of that hardlinking stuff you added/removed from 2.17 have any impact on this problem? https://github.com/pantsbuild/pants/compare/release_2.17.0.dev0...release_2.17.0.dev1
Alternatively, the parallelizing materializations?
w
None of those should, no.
If you're able to report the output of
lsof -p $pid
for the
pantsd
process around the time you see the error, that might help us see which file handles it is holding
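When the `lsof` listing is long, tallying it by handle type makes it easier to see what dominates (regular files, directories, sockets, pipes). A rough Python sketch, assuming the default `lsof` column layout with a `TYPE` header and no blank fields before that column; the sample text is invented for illustration and is not real `pantsd` output:

```python
from collections import Counter


def count_by_type(lsof_output):
    """Tally open handles by the TYPE column of default `lsof` output."""
    lines = lsof_output.strip().splitlines()
    type_index = lines[0].split().index("TYPE")
    counts = Counter()
    for line in lines[1:]:
        fields = line.split()
        if len(fields) > type_index:
            counts[fields[type_index]] += 1
    return counts


# Invented sample, only to exercise the parser.
SAMPLE = """\
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
pantsd  12345 me    cwd    DIR    1,4      640    2 /repo
pantsd  12345 me      3r   REG    1,4     4096   10 /repo/BUILD
pantsd  12345 me      4u  unix    0x0      0t0      ->0x1
pantsd  12345 me      5r   REG    1,4     1024   11 /repo/pants.toml
"""

counts = count_by_type(SAMPLE)
print(counts.most_common())
```

A total far below the limit right after the failure, as in the listing shared here, suggests the handles were opened and released in a short burst rather than leaked.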
w
This is right after the fail
w
Hm. That looks like a pretty vanilla set.
w
16:24:09.34 [INFO] Completed: Running the `adhoc_tool` at frontend/web:node-modules
16:24:09.34 [ERROR] 1 Exception encountered:

Engine traceback:
  in `run` goal

IntrinsicError: Failed to execute: Process {
    argv: [
        "/bin/bash",
        "-c",
        "cd frontend/web && /opt/homebrew/bin/pnpm install --frozen-lockfile",
    ],
    env: {
        "PATH": "{chroot}/_runnable_dependency_shims_1ec700094d6da0d2a2f0d61fd955b4abd702f3190cb7829fb9beee4656c42c14",
        "_PANTS_SHIM_ROOT": "{chroot}",
    },
    working_directory: None,
    input_digests: InputDigests {
        complete: DirectoryDigest {
            digest: Digest {
                hash: Fingerprint<d4d387daf2f27e12bac6d951ef5a014f75320b05b68f9b9ecdf4a4805b065e62>,
                size_bytes: 249,
            },
            tree: "Some(..)",
        },
        nailgun: DirectoryDigest {
            digest: Digest {
                hash: Fingerprint<e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855>,
                size_bytes: 0,
            },
            tree: "Some(..)",
        },
        inputs: DirectoryDigest {
            digest: Digest {
                hash: Fingerprint<9873767679ce16f69603180b0efeeeb69e02ec58b8a5d6c97ac26c5a67b52d3b>,
                size_bytes: 82,
            },
            tree: "Some(..)",
        },
        immutable_inputs: {
            RelativePath(
                "_runnable_dependency_shims_1ec700094d6da0d2a2f0d61fd955b4abd702f3190cb7829fb9beee4656c42c14",
            ): DirectoryDigest {
                digest: Digest {
                    hash: Fingerprint<1ec700094d6da0d2a2f0d61fd955b4abd702f3190cb7829fb9beee4656c42c14>,
                    size_bytes: 158,
                },
                tree: "Some(..)",
            },
        },
        use_nailgun: {},
    },
    output_files: {},
    output_directories: {
        RelativePath(
            "frontend/web/node_modules",
        ),
    },
    timeout: None,
    execution_slot_variable: None,
    concurrency_available: 0,
    description: "Running the `adhoc_tool` at frontend/web:node-modules",
    level: Info,
    append_only_caches: {},
    jdk_home: None,
    cache_scope: Successful,
    execution_environment: ProcessExecutionEnvironment {
        name: None,
        platform: Macos_arm64,
        strategy: Local,
    },
    remote_cache_speculation_delay: 0ns,
}

Failed to digest inputs: "Failed to open \"/private/var/folders/dy/q08y_dts5vd71rm99t4gc9lr0000gp/T/pants-sandbox-p1U8nu/frontend/web/node_modules/.pnpm/@typescript-eslint+eslint-plugin@6.7.3_@typescript-eslint+parser@6.7.3_eslint@8.50.0_typescript@5.2.2/node_modules/@typescript-eslint/eslint-plugin/docs/rules/no-confusing-void-expression.md\": Too many open files (os error 24)"
w
Makes me wonder whether there is some other limit that we're hitting which exhibits as a "too many open files" error.
w
Just a coincidence that it bails on the node modules installation?
adhoc_tool(
    name="node-modules",
    runnable=":pnpm", 
    args=["install", "--frozen-lockfile"], 
    runnable_dependencies=[":node", ":sh"],
    execution_dependencies=[":build-meta"],
    output_directories=["node_modules"],
    timeout=300,
)
Possibly silly question, but when we get ridiculous paths like this which are supposed to be hardlinked,
/private/var/folders/dy/q08y_dts5vd71rm99t4gc9lr0000gp/T/pants-sandbox-p1U8nu/frontend/web/node_modules/.pnpm/@typescript-eslint+eslint-plugin@6.7.3_@typescript-eslint+parser@6.7.3_eslint@8.50.0_typescript@5.2.2/node_modules/@typescript-eslint/eslint-plugin/docs/rules/no-confusing-void-expression.md
Pants isn't trying to unwrap those, right? Like, we're not creating another file per hardlink or anything like that?
w
Hm, actually: I do have a theory now, based on where this is occurring.
g
Does this spin up a large amount of sockets maybe? Since they count against the file count as well
w
Failed to digest inputs
would occur when we are capturing files from disk: in 2.17, large files are no longer captured into the LMDB database, and instead are copied into the store as real files. that part is fine, but the related bit that changed is that we no longer use blocking operations (for which we have a limit coded) to capture the files: only async operations (for which there is effectively no limit).
but afaict, you would still need to be capturing on the order of ~40k “large” files (>512k)… or maybe 20k
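The failure mode described here, unbounded async file operations exhausting file handles, is commonly avoided by gating the operations behind a semaphore. The Python sketch below illustrates that general technique only; it is not the Pants implementation (which is written in Rust) and not necessarily what the eventual patch does. `digest_all` and `max_open` are hypothetical names:

```python
import asyncio
import hashlib
import os
import tempfile


async def digest_all(paths, max_open):
    """Hash every path while keeping at most max_open files open at once."""
    gate = asyncio.Semaphore(max_open)
    loop = asyncio.get_running_loop()

    def sha256_of(path):
        # Blocking read; runs in a worker thread of the default executor.
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    async def one(path):
        async with gate:  # waits here once max_open hashes are in flight
            return await loop.run_in_executor(None, sha256_of, path)

    # gather() returns results in the same order as the input paths.
    return await asyncio.gather(*(one(p) for p in paths))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(50):
            path = os.path.join(tmp, f"f{i}.txt")
            with open(path, "w") as f:
                f.write(str(i))
            paths.append(path)
        return asyncio.run(digest_all(paths, max_open=8))


digests = demo()
print(len(digests))
```

Without the `async with gate` line, every task could try to open its file at roughly the same time, which is the shape of the problem when tens of thousands of `node_modules` files are captured at once.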
w
Regardless of the solution, an adhoc tool node modules installation would be a great candidate for a regression test. Would have caught this sooner, and node_modules is definitely a stress test in terms of file count. Even worse in npm than in pnpm
g
Why would they need to be large files in this case @witty-crayon-22786? Wouldn't we be opening small files as well to read them into LMDB?
w
yes, but we always have. and the small files are still captured using blocking operations to be placed in the LMDB store
w
Are hard links captured into the store?
w
meaning that they’re subject to the limit in https://www.pantsbuild.org/docs/reference-global#rule_threads_max
@wide-midnight-78598: no… we create hardlinks for file inputs
g
Ah, so we have only asyncified the copying of large files. Wasn't clear from your first message.
👍 1
w
But all of the pnpm node_modules are hard links to the global pnpm cache
So, Pants hardlinks the hardlink in the adhoc tool?
w
@wide-midnight-78598: no, it will actually capture it into the store via a copy. later, when it needs to use it as an input elsewhere, it might hardlink it out.
👍 1
ah. actually, it looks like all codepaths now use an async operation for their initial computation of the digest… so i lied above about “only small files”.
w
Repro on my machine: Grab our example ad-hoc and replace the deps with what I have below. https://github.com/pantsbuild/example-adhoc/blob/main/javascript/package.json then
pants run :run-js-app
"dependencies": {
    "algoliasearch": "^4.17.1",
    "autoprefixer": "^10.4.16",
    "cssnano": "^6.0.1",
    "date-fns": "^2.30.0",  
    "eslint": "^8.50.0",
    "eslint-config-prettier": "^9.0.0",
    "eslint-plugin-jest-dom": "^5.1.0",
    "eslint-plugin-square-svelte-store": "^1.0.0",
    "eslint-plugin-svelte": "^2.33.2",
    "eslint-plugin-testing-library": "^6.0.2",
    "firebase": "^9.22.1",
    "hammerjs": "^2.0.8",
    "happy-dom": "^12.2.1",
    "immer": "^10.0.2",
    "msw": "^1.3.1",
    "postcss": "^8.4.31",
    "es-leftpad": "^1.0.0",
    "prettier": "^3.0.3",
    "prettier-plugin-organize-imports": "^3.2.3",
    "prettier-plugin-svelte": "^3.0.3",
    "prettier-plugin-tailwindcss": "^0.5.4",
    "svelte": "^4.2.1",
    "svelte-check": "^3.5.2",
    "svelte-i18n": "^3.7.4",
    "tailwindcss": "^3.3.3",
    "ts-node": "^10.9.1",
    "tslib": "^2.6.2",
    "typescript": "^5.2.2",
    "vite": "^4.4.9",
    "vitest": "^0.34.6"
}
w
looks like we’ve had a ticket for this one for a while: https://github.com/pantsbuild/pants/issues/19765 … i’ll get a patch out.
👍 2
♥️ 2
p
Thanks all 😃
w
Updated the ticket with this info above, in case we need repros
❤️ 1
p
I can confirm that the change in 2.17.1rc2 addresses this issue. Thanks folx!
❤️ 2
f
Oh no! I just got the
Too many open files
error. Trying to build a frontend app with Parcel using the experimental JavaScript backend…
It works if I run the package command with
--no-pantsd
though!
p
Against the more recent RCs?
f
I just updated to 2.17.1rc2, but I can try something even newer!
No, still fails with
Too many open files (os error 24)
on 2.18.0rc3.
It started failing after I added Axios as a dependency to my package.json. Before that, it had been working fine.
g
What is your
ulimit -n
?
f
256.
g
Oh yeah, the limit added to the code was 1024? 😄
f
Ah, just saw this:
16:10:46.24 [WARN] File handle limit is capped to: 1024. To avoid 'too many open file handle' errors, we recommend a limit of at least 10000: please see <https://www.pantsbuild.org/docs/troubleshooting#too-many-open-files-error> for more information.
. Guess I’m raising the limit then!
Aaaand now it works! Thank you 🙌🙌🙌
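For completeness, a process can also raise its own soft limit up to the hard limit. A minimal Python sketch, assuming a Unix-like system; `raise_soft_nofile` is a hypothetical helper, and the change applies only to the calling process and its children, so for Pants itself the limit still has to be raised in the shell that launches it (as done here with `ulimit -n`):

```python
import resource


def raise_soft_nofile(target):
    """Try to raise the soft open-file limit to target; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if wanted <= soft:
        return soft  # already high enough, nothing to do
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    except (ValueError, OSError):
        # Some systems reject values above a kernel-level maximum.
        return soft
    return wanted


# 10000 is the minimum recommended by the warning quoted above.
print(raise_soft_nofile(10_000))
```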
Using pantsbuild for JavaScript… Delightful