We have tests that depend on big files We do not want those Pants #general


clean-alligator-41449

06/23/2023, 12:24 PM

We have tests that depend on big files. We do not want those files in our git repo, so they are downloaded using a script. What would be the best way to manage files like these for CI/CD with Pants? What I'm imagining is starting each test by checking whether the files are present, downloading them if not, and then hoping they'll be in the cache next time.

broad-processor-92400

06/23/2023, 12:32 PM

There are a few options:

1. If the files are public, you can download them via `file(..., source=http_source(...))`. For instance, https://github.com/pantsbuild/pants/blob/6cbdd071dae6bb2d372322bff79ce7c4a8c872e2/build-support/bin/act/BUILD#L8-L16. (The `per_platform` wrapper isn't required if the files aren't platform-dependent.)

2. I think, as of 2.16, there are ways to hook into non-public file sources (like S3, with credentials): https://github.com/pantsbuild/pants/blob/main/src/python/pants/notes/2.16.x.md#new-aws-s3-support-for-urls

3. Alternatively, you can create your own "do whatever" downloader using `shell_command` to invoke curl or whatever. (https://www.pantsbuild.org/docs/run-shell-commands)

All of these will likely work better than having the test itself download the files: if the test downloads them as part of a fixture or during initialisation, that isn't surfaced to Pants, and so isn't cached at the Pants level (unless you have it download to a shared well-known location, escaping the sandbox, and handle the concurrency issues that entails). Does that help?
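For option 1, `http_source` needs the file's size (`len`) and SHA-256 digest (`sha256`) alongside the URL. A minimal sketch of a helper that computes both from an already-downloaded copy and renders a matching `file` target is below. The helper names `describe_download` and `render_file_target` are hypothetical and not part of Pants; the target name and URL passed in are up to you.

```python
import hashlib
from pathlib import Path


def describe_download(path: Path, chunk_size: int = 1 << 20) -> tuple[int, str]:
    """Return (size_in_bytes, sha256_hex) for a local copy of the file."""
    digest = hashlib.sha256()
    size = 0
    # Read in chunks so that large files are never loaded into memory at once.
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def render_file_target(name: str, url: str, size: int, sha256: str) -> str:
    """Render BUILD-file text for a `file` target backed by `http_source`."""
    return (
        "file(\n"
        f'    name="{name}",\n'
        "    source=http_source(\n"
        f'        url="{url}",\n'
        f"        len={size},\n"
        f'        sha256="{sha256}",\n'
        "    ),\n"
        ")\n"
    )
```

Calling `describe_download(Path("big_file.bin"))` on a local copy and passing the result to `render_file_target(...)` gives text that can be pasted into a BUILD file. Because the digest is pinned, the target has to be updated whenever the remote file changes.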


bitter-ability-32190

06/23/2023, 12:59 PM

A bit more detail on 2: it's backed by a plugin API that can change any URL Pants tries to download, by modifying the URL or attaching headers. On top of that API, Pants comes with an S3 plugin ready to go. After enabling it, you'd likely use S3 URLs with `http_source` from bullet 1.

clean-alligator-41449

07/21/2023, 7:56 AM

This seems to work well. Thanks!

clean-alligator-41449

07/21/2023, 3:57 PM

I have a collection of files that are needed for a test. Ideally I would do something like `files(..., sources=[http_source(...), http_source(...)])`, but that does not seem to be possible, because `files` does not support `http_source`.

Alternatively, I could just enumerate all my X files using many `file(..., source=http_source(...))` targets, but then I would have to add all X files individually as dependencies. Is there a way for me to make a single addressable target out of all these files? Another alternative would be zipping the files and combining `file(..., source=http_source(...))` with `shell_command` to unzip.

broad-processor-92400

07/21/2023, 10:00 PM

I don’t know if there’s a way to alias many targets as a single one, but you can define a variable `files_for_tests = […]` with all the deps listed out and use it like `dependencies=[*files_for_tests, …]`. I think this can be used across BUILD files by defining it as a macro (with absolute target addresses). (I’m on my phone so I can’t find the docs page about macros, but hopefully that’s enough breadcrumbs.)
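Within a single BUILD file, the variable approach described above might look roughly like the sketch below. Everything specific in it is a hypothetical placeholder: the target names, the `example.com` URLs, the `len` values, and the all-zero digest. It also assumes the tests are Python tests (`python_tests`), which the thread does not state.

```python
# BUILD (a sketch; names, URLs, sizes, and digests are hypothetical placeholders)

PLACEHOLDER_SHA256 = "0" * 64  # replace with each file's real SHA-256 digest

file(
    name="model_weights",
    source=http_source(
        url="https://example.com/data/model_weights.bin",
        len=123,  # placeholder size in bytes
        sha256=PLACEHOLDER_SHA256,
    ),
)

file(
    name="sample_inputs",
    source=http_source(
        url="https://example.com/data/sample_inputs.bin",
        len=456,  # placeholder size in bytes
        sha256=PLACEHOLDER_SHA256,
    ),
)

# One list holding every downloaded file's address, reused wherever needed.
files_for_tests = [":model_weights", ":sample_inputs"]

python_tests(
    name="tests",
    dependencies=[*files_for_tests],
)
```

The relative addresses (`:model_weights`) only resolve inside this BUILD file; sharing the list across directories would need absolute addresses, as noted above.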

clean-alligator-41449

07/24/2023, 12:08 PM

Ok, I tried macros and a bunch of other things. What I landed on in the end was adding the files as dependencies to a `files(...)` target. That's more or less a poor person's alias. Thanks for the help :)
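A sketch of the same alias idea follows. Note the swap: the thread reports success with a `files(...)` target carrying the dependencies, whereas this sketch uses Pants' generic `target` type, which exists to group dependencies under one address. All target names are hypothetical, and `python_tests` is an assumption about the test language.

```python
# BUILD (a sketch; all target names are hypothetical)

# Assumes `file(name="model_weights", ...)` and `file(name="sample_inputs", ...)`
# targets, each with an `http_source`, are defined in this same BUILD file.
target(
    name="test_data",
    dependencies=[":model_weights", ":sample_inputs"],
)

python_tests(
    name="tests",
    # Depending on the alias pulls in every downloaded file transitively.
    dependencies=[":test_data"],
)
```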
