# general
c
We have tests that depend on big files. We do not want those files in our git repo, so they are downloaded using a script. What would be the best way to manage files like these for CI/CD with Pants? What I'm imagining is starting each test by checking whether the files are present, downloading them if not, and then hoping they'll be in the cache next time.
b
There are a few options:
1. If the files are public, one option would be to download them via `file(..., source=http_source(...))`. For instance, https://github.com/pantsbuild/pants/blob/6cbdd071dae6bb2d372322bff79ce7c4a8c872e2/build-support/bin/act/BUILD#L8-L16. (The `per_platform` wrapper isn't required if they're not platform-dependent.)
2. I think, as of 2.16, there are ways to hook into non-public file sources (like S3, with credentials): https://github.com/pantsbuild/pants/blob/main/src/python/pants/notes/2.16.x.md#new-aws-s3-support-for-urls
3. Alternatively, you can create your own "do whatever" downloader using `shell_command` to invoke curl or whatever (https://www.pantsbuild.org/docs/run-shell-commands).
All of these will likely work better than having the test itself download: if the test downloads as part of a fixture or during initialisation, that's not surfaced to Pants, and so not cached at a Pants level (unless you have it download to a shared well-known location, escaping the sandbox, and handle the concurrency issues that entails). Does that help?
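A minimal sketch of option 1 as a BUILD file, assuming the tests are Python tests (the thread does not say). The target names, URL, length, and hash are placeholders made up for illustration; `http_source` takes `url`, `len`, and `sha256`, as in the linked `act` BUILD file.

```python
# BUILD (sketch). Every value below is a placeholder, not a real file.
file(
    name="big_fixture",  # hypothetical target name
    source=http_source(
        url="https://example.com/testdata/big_fixture.bin",  # placeholder URL
        len=123456789,  # placeholder: exact size of the file in bytes
        sha256="<sha256 hex digest of the file>",  # placeholder
    ),
)

python_tests(
    name="tests",
    # Depending on the file target makes the download a step Pants knows about,
    # so it can be cached instead of being fetched inside the test itself.
    dependencies=[":big_fixture"],
)
```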
b
A bit more detail on 2: it's backed by a plugin API that can change any URL Pants tries to download, by modifying the URL or attaching headers. On top of that API, Pants comes with an S3 plugin ready to go. After enabling it, you'd likely use S3 URLs with `http_source` from bullet 1.
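A rough sketch of what that might look like, assuming the S3 URL handler has already been enabled in `pants.toml`. As far as I recall the backend is named `pants.backend.url_handlers.s3` and it accepts `s3://` URLs, but treat both as assumptions and check the linked 2.16 notes. The bucket, key, length, and hash are placeholders.

```python
# BUILD (sketch). Assumes the S3 URL handler backend is enabled and that
# AWS credentials are available to Pants. All values are placeholders.
file(
    name="private_fixture",  # hypothetical target name
    source=http_source(
        url="s3://my-bucket/testdata/private_fixture.bin",  # placeholder bucket/key
        len=123456789,  # placeholder: exact size in bytes
        sha256="<sha256 hex digest of the file>",  # placeholder
    ),
)
```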
c
This seems to work well. Thanks!
I have a collection of files that are needed for a test. Ideally I would do something like `files(..., sources=[http_source(...), http_source(...)])`, but that does not seem to be possible, because `files` does not support `http_source`. Alternatively, I could enumerate all my X files using many `file(..., source=http_source(...))` targets, but then I would have to add all X files individually as dependencies. Is there a way for me to make a single addressable target out of all these files? Another alternative would be zipping the files and combining `file(..., source=http_source(...))` with `shell_command` to unzip.
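If enumerating many `file(..., source=http_source(...))` targets by hand is the painful part, a small script can generate them. This is only a sketch: it assumes the files have already been downloaded locally (for example by the existing download script) and that they all live under one base URL. The function names and the example URL are hypothetical.

```python
import hashlib
import sys
from pathlib import Path


def digest_and_length(path: Path) -> tuple[str, int]:
    """Return (sha256 hex digest, size in bytes), reading the file in chunks."""
    sha = hashlib.sha256()
    size = 0
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
            size += len(chunk)
    return sha.hexdigest(), size


def file_target_stanza(path: Path, base_url: str) -> str:
    """Render one BUILD `file(...)` target for a locally available copy of the file."""
    digest, size = digest_and_length(path)
    return (
        "file(\n"
        f'    name="{path.name}",\n'
        "    source=http_source(\n"
        f'        url="{base_url.rstrip("/")}/{path.name}",\n'
        f"        len={size},\n"
        f'        sha256="{digest}",\n'
        "    ),\n"
        ")\n"
    )


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical): python gen_targets.py https://example.com/data a.bin b.bin
    base, *names = sys.argv[1:]
    print("\n".join(file_target_stanza(Path(n), base) for n in names))
```

The output can be pasted into a BUILD file; the generated target names are then the addresses to list as dependencies.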
b
I don’t know if there’s a way to alias many targets as a single one, but you can define a variable `files_for_tests = […]` with all the deps listed out and use it like `dependencies=[*files_for_tests, …]`. I think this can be used across BUILD files by defining it as a macro (with absolute target addresses). (I’m on my phone so I can’t find the docs page about macros, but hopefully that’s enough breadcrumbs.)
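A sketch of both ideas, with hypothetical target addresses and names. The first part is the plain variable inside one BUILD file. The second assumes a prelude file registered through `build_file_prelude_globs` in `pants.toml`, which is how Pants macros are defined; whether this fits the setup in the thread is an assumption.

```python
# BUILD (sketch): a list variable reused within a single BUILD file.
files_for_tests = [
    "//testdata:file_a",  # hypothetical absolute target addresses
    "//testdata:file_b",
]

python_tests(
    name="tests",
    dependencies=[*files_for_tests],
)

# --- In a separate prelude file, e.g. a hypothetical macros.py ---
# A function defined here becomes available in every BUILD file.
def big_test_files():
    return [
        "//testdata:file_a",
        "//testdata:file_b",
    ]

# Any BUILD file could then write:
#     python_tests(name="tests", dependencies=[*big_test_files()])
```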
c
OK, I tried macros and a bunch of other things. What I landed on in the end was adding the files as dependencies to a `files(...)` target. That's more or less a poor person's alias. Thanks for the help :)
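The exact BUILD file was not posted, so the following is a guess at what that alias might look like. The target names are hypothetical, the `http_source` arguments are placeholders, and `sources=[]` is an assumption about how the `files` target was declared.

```python
# BUILD (sketch). Placeholder URLs, sizes, and hashes throughout.
file(
    name="file_a",
    source=http_source(
        url="https://example.com/testdata/file_a.bin",
        len=111,
        sha256="<sha256 of file_a>",
    ),
)
file(
    name="file_b",
    source=http_source(
        url="https://example.com/testdata/file_b.bin",
        len=222,
        sha256="<sha256 of file_b>",
    ),
)

# The "alias": one target whose only job is to carry the dependencies.
files(
    name="test_data",
    sources=[],  # assumption: no local sources of its own
    dependencies=[":file_a", ":file_b"],
)

# Tests then depend on the single alias target.
python_tests(name="tests", dependencies=[":test_data"])

# Pants also has a generic `target(name=..., dependencies=[...])` type meant
# as a bag of dependencies, which may serve the same purpose.
```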