# development
w
hey @hundreds-father-404 on this ticket I just want to make sure I understand it right. The current
python_app
target when using
bundle
it can get files from a different folder by using
rel_path
right? I mean, that’s basically what I’m doing… my BUILD file is inside
projects/projectA
and I need to have files under
libs/3rdparty/mydata
to be packaged together.
I guess what I’m trying to check is.. I wouldn’t necessarily need that
rel_path
as long as the new target
archive
can reproduce the
files
structure.. I’m not really changing the dest folder name. That’s probably a good idea to implement using maybe what @witty-crayon-22786 mentioned.. about those intermediate steps…
one other thing that I already asked and it still gives me a hard time is the difference between
files
and
resources
and when to use one over the other. Maybe if we could simplify this…
😕 1
I’ll give you an example, I have
projects/projectA/BUILD
that defines
resources
and two of those resources live under folders:
projects/projectA/data/
projects/projectA/deploy/
by defining the
resources
target at the source root of projectA and pointing to
data/***/**
I expect (and imagine) that the result will be that the folder structure
data
and whatever is inside of it are kept. I’m not sure if that’s the “right” way of doing this.. Idk if I should have a BUILD file inside
data
that defines that resource target or not.. but that’s kinda how I managed to solve my issue. Again, it was trial and error.
h
as long as the new target archive can reproduce the files structure.
Yes, this is the default behavior. It would copy the path exactly as it is.
libs/3rdparty/mydata
would keep that same path. That PR is about a feature to allow you to relocate the path at runtime, so that
libs/3rdparty/mydata
can become something you want like
my_custom_prefix/mydata
. This is helpful if you, for example, want to convert
src/python/myorg/myteam/project/pyproject.toml
to simply be
pyproject.toml
in your final
archive
or in your test
w
Oh so that’s the tricky part (at least to me) it would recreate the full path? I sort of expected it to only recreate the path FROM where my BUILD file lives.
I guess it wouldn’t if it was a
resource
but it does if it’s a
file
h
one other thing that I already asked and it still gives me a hard time is the difference between files and resources and when to use one over the other
Sorry about the confusion. I started drafting a page explaining how to use both features, but was waiting until this new PR is finished. For
files
, you will always have the full path as it is in your actual project layout. In contrast,
resources
are loaded like you would load normal code in your language. If you have the file
src/python/myorg/f.py
, then you would say
import myorg.f
, not
import src.python.myorg.f
. It’s the same with
resources
, you would say
pkg_resources.get_data("my_org.my_data")
, rather than
pkg_resources.get_data("src.python.my_org.my_data")
Oh so that’s the tricky part (at least to me) it would recreate the full path?
Ah, oops. By full path, I mean the relative path from the build root. It won’t be an absolute path
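To make the difference concrete, here is a rough sketch using only the standard library. It swaps in `pkgutil.get_data` for the `pkg_resources` call mentioned above, and the layout, the package name `myorg_demo`, and the file `settings.txt` are all made up for illustration. It is not how Pants itself loads anything; it only shows the two lookup styles side by side.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: <build root>/src/python is the source root.
build_root = Path(tempfile.mkdtemp())
pkg_dir = build_root / "src" / "python" / "myorg_demo"
pkg_dir.mkdir(parents=True)
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "settings.txt").write_text("hello")

# "files" style: address the file by its path relative to the build root,
# exactly as it appears in the project layout.
files_style = (build_root / "src/python/myorg_demo/settings.txt").read_text()

# "resources" style: put the source root on the import path, then address the
# file through the package name, the same way code would be imported.
sys.path.insert(0, str(build_root / "src" / "python"))
importlib.invalidate_caches()
resources_style = pkgutil.get_data("myorg_demo", "settings.txt").decode()
```

Both lookups read the same bytes; what changes is whether the `src/python` prefix is part of the name you use.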
w
oh haha yeah I got that part lol
So what you’re trying to solve there is the ability to change the prefix dest folder inside that archive (and possibly tests)
h
Yes, precisely. That you want to rewrite
src/python/myorg/myteam/project/pyproject.toml
to
pyproject.toml
, but have Pants do it for you, rather than needing to do it in your test code or modifying the
archive
that Pants had created for you
If you’re okay with using the normal path, as it is in your actual project structure, then no need for this feature
w
haha knowing me if it’s there I’ll probably use it but yeah that sounds like a feature that should be there
❤️ 1
So I guess it would be a “cleaner” look if I could add to my
files=[]
either
files
or
relocated_files
no? I’m trying to simplify it from a user standpoint.
👍 1
Still can’t Rust so forgive me for just throwing ideas and not coding anything 😢
h
Still can’t Rust
Oh, about 80% of our contributions are in typed Python 3 btw! We only use Rust when changing the internals of the engine, like optimizing how the engine reads from the filesystem. That all gets abstracted away though; all the logic of Pants is written in Python
so forgive me
And not at all, this type of feedback is extremely useful. There are lots of ways of contributing to an open source project
👍 1
w
typed Python 3
that I can do
and I totally agree with @witty-crayon-22786 it shouldn’t be a
per-consumer
solution lol, did that at my startup before and it didn’t go well in the long term
👍 1
😂 1
having those intermediate steps would allow me to use them normally in any other target that can have
files
and
resources
as dependencies? If so it can be very interesting
h
To recap, there are three proposals out there right now:
1.
files()
target has a
sources_prefix_mapping
field. This is the current PR. But an issue is that it means if someone unintentionally updates that one
files()
target, they may end up breaking a bunch of consumers without realizing it.
2. The information lives entirely at consumers, such as your
archive()
target or your
python_tests
target. This likely has the benefit of being easier to understand because all the magical remapping is centralized to one place. But it could result in lots of duplication; it’s not composable. You can’t define a remapping and then reuse it somewhere else.
3. Keep
files()
the same as before, and add a new target type called
relocated_files
. You could have multiple
relocated_files()
targets describing the same
files()
target, but mapping them in different ways. Your
archive
and
python_tests
targets could either depend on the original
files()
, or on the
relocated_files()
target
I think we’re leaning towards #3. Stu’s and your intuition is probably right that forcing this to be per-consumer (#2) is not very scalable. I suspect John and I are correct to fear that #1 could cause bad issues where a user changes the
sources_prefix_mapping
for the original
files
target, and they break a ton of consumers without realizing it. #3 still has the risk of updating one target resulting in breaking multiple consumers, but it makes things much more explicit than #1.
💯 1
w
oh! #1 I can totally see that happening on my team 😨 folks are trying to add/remove stuff from BUILD files and aren’t even testing to see if it’s working before committing it.
w
sidenote: the “break a bunch of consumers” is another reason to want alignment between
test
and deploy: to minimize the amount of custom code that can’t be tested, because they can each expect files to live in the same place
👍 1
h
Are we cool with a
relocated_files
target mapping 1-1 with a
files
target? Vs. allowing
relocated_files
to aggregate multiple targets and having complex branching mapping, like “if this prefix, use this; otherwise, rewrite like this”. That is,
relocated_files(
  name="tgt1",
  files_target="//:json",
  from="original_prefix",
  to="new_prefix",
)

relocated_files(
  name="tgt2",
  files_target="3rdparty/json",
  from="3rdparty/json",
  to="json",
)
vs.
relocated_files(
  files_targets=["//:json", "3rdparty/json"],
  mapping={"": "json", "3rdparty": ""},
)
Even if 1-1 results in much more verbosity, I think it’s going to be worth it.
files
and
resources
already confuse people. This could be a very confusing new feature as well. Keeping it simple will be valuable, even if it results in some duplication
w
if the simple thing can grow into a more complicated thing, that should be fine.
h
if the simple thing can grow into a more complicated thing
What do you mean?
w
what i had in mind was
relocated_files(files_target=..)
(note, singular) growing into
relocated_files(files_targets=..)
(note, plural)… that’s an easy deprecation
h
I think I’m fine with starting with it being plural, so long as we only have one mapping expressed in the target, via the
from
and
to
field. It’s the complex mapping in a single target that I want to avoid, e.g. the dictionary
w
the issue with your
tgt1
and
tgt2
sketch is that that also requires two
files
targets behind them, so four total. not a huge deal, but the mapping dict is more growable, because you can eventually remove the restriction(s)
h
but the mapping dict is more scalable
Yeah, but harder to reason about
w
is it? maybe, yea.
h
I think so. In my example with
{"": "json", "3rdparty": ""}
, what do you think it would do? Note that
""
matches both files, but
3rdparty
only matches one file. Should both transformations be applied? Only one of them?
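Here is a small sketch of why that dictionary is ambiguous. The two file names `a.json` and `3rdparty/json/b.json` are hypothetical stand-ins for what the two targets might contain, and both helper functions are illustrations of possible semantics, not anything Pants implements.

```python
from __future__ import annotations


def split_path(path: str) -> list[str]:
    """Split a relative path into components, dropping empty ones."""
    return [part for part in path.split("/") if part]


def replace_prefix(path: str, src: str, dest: str) -> str | None:
    """Swap the leading directories `src` for `dest`, or return None if `src` does not match."""
    parts, src_parts = split_path(path), split_path(src)
    if parts[: len(src_parts)] != src_parts:
        return None
    return "/".join(split_path(dest) + parts[len(src_parts):])


def apply_all_in_order(path: str, mapping: dict[str, str]) -> str:
    """Interpretation A: every matching rule is applied, in dict order."""
    for src, dest in mapping.items():
        rewritten = replace_prefix(path, src, dest)
        if rewritten is not None:
            path = rewritten
    return path


def apply_longest_match(path: str, mapping: dict[str, str]) -> str:
    """Interpretation B: exactly one rule is applied, the most specific match."""
    matches = [src for src in mapping if replace_prefix(path, src, mapping[src]) is not None]
    if not matches:
        return path
    best = max(matches, key=lambda src: len(split_path(src)))
    return replace_prefix(path, best, mapping[best])


mapping = {"": "json", "3rdparty": ""}
paths = ["a.json", "3rdparty/json/b.json"]

in_order = [apply_all_in_order(p, mapping) for p in paths]
# -> ["json/a.json", "json/3rdparty/json/b.json"]
longest = [apply_longest_match(p, mapping) for p in paths]
# -> ["json/a.json", "json/b.json"]
```

The same dictionary gives two different layouts for the second file depending on which rule you pick, which is the ambiguity in question.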
w
the question of which patterns apply is valid regardless i think.
if “does 3rdparty apply” is a question, that’s a question with the tgt1/tgt2 design as well.
h
Not if we constrain you to only be able to have one
from
and one
to
by them being proper fields. For a given
relocated_files
target, there can only be a single mapping defined. And that gets applied to every single file
w
the only question (afaict) that is potentially introduced by having multiple mappings in one dict is: “are they applied in some order, or is exactly one pattern applied to a file”
And that gets applied to every single file
from
still needs to match though.
but perhaps you are suggesting that it would fail with an error if all files didn’t match?
h
I was thinking it will eagerly fail if it doesn’t match
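A minimal sketch of that single-mapping, fail-eagerly behavior might look like the following. The function name `relocate` and the parameter names `src` and `dest` are assumptions for illustration, as is the sample file `x.csv`; this is not the API from the PR.

```python
from __future__ import annotations


def relocate(paths: list[str], src: str, dest: str) -> list[str]:
    """Replace the leading directories `src` with `dest` in every path.

    Raises ValueError as soon as one path does not start with `src`,
    rather than silently leaving that file where it was.
    """
    src_parts = [part for part in src.split("/") if part]
    dest_parts = [part for part in dest.split("/") if part]
    relocated = []
    for path in paths:
        parts = path.split("/")
        if parts[: len(src_parts)] != src_parts:
            raise ValueError(f"{path!r} does not start with the prefix {src!r}")
        relocated.append("/".join(dest_parts + parts[len(src_parts):]))
    return relocated


# Strip the whole project prefix, as in the pyproject.toml example.
stripped = relocate(
    ["src/python/myorg/myteam/project/pyproject.toml"],
    src="src/python/myorg/myteam/project",
    dest="",
)
# -> ["pyproject.toml"]

# Swap one prefix for another.
swapped = relocate(["libs/3rdparty/mydata/x.csv"], src="libs/3rdparty", dest="my_custom_prefix")
# -> ["my_custom_prefix/mydata/x.csv"]
```

Because there is only one `src` and one `dest`, there is no question of ordering or of which rule wins; a file either matches or the whole target errors out.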
w
my guess is that the dict will be easier to use in practice.
but no strong preference, easy to change later.
h
Gr. We can’t use
relocated_files(..., from=..)
. Python claims it as a keyword. Any ideas for alternatives to
from
and
to
?
before
and
after
perhaps?
original
and
new
?
w
src/dst ?
👍 1
j
Aren't
files
and
relocated_files
targets that are OUTSIDE the python backend. And
resources
is really
python_resources
since it focuses on packaging the files similar to how
setup.py
does it?
And maybe for target
files
that are used by multiple projects, the target can manage a map that the software can use to find the absolute path to the file.
files(
  target="ec2_settings",
  sources=["*.yaml"],
  dst="/opt/example/ec2_settings/",
  map="/opt/example/ec2_settings/map.yaml",
)
h
Yeah, this is a good way of thinking about it Raúl! In fact,
resources
used to be a field on the
python_library
target. And I think we used to have a field for
jvm_library
etc. I’m going to try to write some docs on this today or tomorrow. It’s definitely a common point of confusion, understandably so.
w
@jolly-midnight-72759: good points, mostly yes. but the
resources
concept applies to a few different languages which support packaging files to be loaded in a runtime-specific way
👍🏽 1
including the JVM, node.js, python, etc
it’s stuff that goes “on the ${LANG}PATH”, i.e. PYTHONPATH, the classpath, etc
👍 1
@jolly-midnight-72759: regarding absolute paths: the way files are expected to be loaded is via a relative path from your CWD, so the whole output is relocatable. in tests things are run in a sandbox, and when deploying to prod it could be anywhere. but “relative to CWD” is fairly easy to grok and use
j
ahh. good point
I'm thinking at the CfgMgt level. Not at the software package level.
(although debs know how to put things into abs paths but they do it best when they do it via a shell script that copies from the relative path of where the deb was decompressed)
I guess it is a bad pattern to have different projects access a "map" that pants manages. That's something that should probably be coordinated in the architecture of the different projects.
Copied to docker containers, or injected into zookeeper or distributed by ansible.
so nevermind.