# general
h
I've got a bit of a harder question today. For background, we're working with NASA's GMAT R2020a software. They have exposed a Python API (although in a rather non-standard way) that we have been using fine with our previous venv/setup.py traditional way of managing Python. When transitioning over to Pants, something inside their library has broken. The C++ calls that work fine outside of Pants are now showing segmentation faults. I know I've included everything with the `files` target so that the software distribution is available in the sandbox. I've checked this by running the `dependencies` goal and by running a `diff -r` on what was passed to the sandbox and what I have on disk, and only saw differences in `__pycache__` folders. So, I think my question is more just looking for general guidance on what might be different about the two execution environments. It's a prebuilt distribution, so it should have everything it needs, but it does rely on some system libraries (e.g. `libpng12.so`). Is it possible that these aren't discoverable when running in the sandbox environment?
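A quick way to probe that last hypothesis is sketched below; it assumes `ctypes` is a reasonable stand-in for however GMAT actually loads its native dependencies, and the library name is the only GMAT-specific detail. Run it once in the plain venv and once under Pants and compare the output.

```python
# Hypothetical check: can the dynamic loader resolve libpng12 from this process?
import ctypes
import ctypes.util

resolved = ctypes.util.find_library("png12")
print("find_library('png12') ->", resolved)

try:
    ctypes.CDLL("libpng12.so")
    print("dlopen('libpng12.so') succeeded")
except OSError as err:
    print("dlopen('libpng12.so') failed:", err)
```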
h
Hey Nathanael, I recommend taking Pants out of the equation by running directly in the sandbox with the `--no-process-cleanup` flag: https://www.pantsbuild.org/docs/troubleshooting#debug-tip-inspect-the-sandbox-with---no-process-cleanup. As mentioned there, there is a `__run.sh` script that emulates what Pants is doing under the hood, including stripping env vars.

> for general guidance on what might be different about the two execution environments

The most obvious way Pants is different is that it tries to be hermetic when running things, such as stripping env vars. It might be helpful to compare something like the output of `env` in bash to the `__run.sh` script. Is the segfault deterministic?
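A minimal sketch of that comparison from the Python side, assuming you can add a few lines to the failing module before the GMAT calls: dump what the process actually sees, once under the old venv setup and once under Pants, and diff the two outputs.

```python
# Dump what this process actually sees; run under both setups and diff the output.
import os
import sys

print("interpreter:", sys.executable)
print("sys.path entries:")
for entry in sys.path:
    print("   ", entry)
print("environment variables:")
for key in sorted(os.environ):
    print(f"   {key}={os.environ[key]}")
```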
e
Depending on what version of Pants you're running, you can invoke as `./pants --no-process-cleanup ...` and you'll see lines in Pants output like:

```
09:37:50.55 [INFO] Preserving local process execution dir /tmp/process-executionD1Nekz for "[some description of the action ...]"
```

You can then cd into the sandbox, here `/tmp/process-executionD1Nekz`, and use the `__run.sh` script to emulate how Pants runs the process. In general you should find the issue is missing files - as you were getting towards - or missing env vars.
👍 1
coke 1
h
Cool, yeah I had done `--no-process-cleanup` but wasn't familiar with `__run.sh`.
e
There is a bug with `__run.sh` in its emulation of how the Rust core engine actually invokes subprocesses: the script only shows environment variables set / passed through in the positive sense - it does not actively unset all others. To simulate failure you'd need to replace the `export ...` line at the top with `env -i ...` I think.
That wouldn't work of course 🤦 You'd need to run `__run.sh` that way: `env -i ... ./__run.sh`
h
Hmm, I just get the following text output to the terminal that doesn't make sense to me:

```
/usr/bin/python3.8
7ec9e5c95ffcb4f4bbc26579e9c026e4f342da8ef17cfe49d5237bc1361d7335
```
e
Looks like you're in the wrong process execution sandbox. That looks like interpreter identification output: a path and a hash.
h
Got it. Nothing looks out of the ordinary there. I see processes for:
• Searching for `bash` on `PATH` and testing it
• Searching for `python` and `python3` and testing them
• Finding an interpreter for `CPython`
• Determining imports for my script
and that's it. Is there more I should be seeing when running a `pex_binary`, or is that it?
h
Do you know at what stage the segfault is happening? For example, when building the PEX to run, or when actually running it? Related, what goal are you running?
h
It's when actually running it. I'm using `run` to execute my module. There's a little bit of path stuff before this happens, but here's the snippet that causes things to break:

```python
sys.path.insert(1, str(_GMAT_BIN))
import gmatpy as gmat

gmat.Setup(str(_GMAT_STARTUP))
script = path_util.get_package_root() / 'astranis/utils/hifi_propagator.script'
print('CHECKING SCRIPT')
# SEGFAULT happens at this call
print(gmat.LoadScript(str(script)))
print('LOADED')
```
The ugliness is mostly thanks to NASA.
So the first call we make out to `gmat.Setup` works happily and then trying to load a script fails. I don't want to get too far into the weeds on GMAT-specific debugging and burden y'all with that.
h
Okay. So then `--no-process-cleanup` was a red herring, because the `run` goal runs interactively in your repository, rather than in a temporary directory. So you won't ever see the process to inspect. Instead, you can use `./pants run --no-cleanup path/to/file.py`: https://www.pantsbuild.org/docs/reference-run#section-cleanup. The PEX will be saved to the `.pants.d` folder iirc, like `.pants.d/tmplpd86t9k/`.
One thing you could try is `execution_mode='venv'` on the `pex_binary` target. See https://www.pantsbuild.org/docs/reference-pex_binary#codeexecution_modecode
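A sketch of what that could look like in the BUILD file defining the binary; the target name and entry point below are placeholders, and only `execution_mode` is the relevant bit.

```python
# Hypothetical pex_binary target; name and entry_point are illustrative.
pex_binary(
    name="hifi_propagator",
    entry_point="hifi_propagator.py",
    # Materialize the PEX contents into a real venv at first run instead of
    # executing from the zipped layout; this can matter for native extensions.
    execution_mode="venv",
)
```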
e
On the execution_mode bit, I'm pretty sure that only affects `./pants package` and not `./pants run` - we do torturous things in `./pants run` IIRC. So that leads to another test: @high-yak-85899 does `./pants package ...` and then running the PEX produced in `dist/...` work? If so, that isolates it to our `run` chicanery.
h
Well, the pex packages, but because I have to include all the gmat source code as `files`, they aren't bundled with the `pex`, and that causes other file discovery issues when executing the built `pex`.
But we do have a little bit of progress. Previously, I was including my third-party directory with the `files` target and then finding it within the sandbox. When I point to it with an absolute path where it actually lives on disk, things run happily without errors.
So, I might be able to get around this for now by throwing it in something like `/home/<user>/tools` and pointing to it that way with an absolute path.
So it seems like maybe I'm not getting something included when I pull everything over with the `files` target, which would be surprising.
e
Files targets not being included in PEXes still seems like a bug to me; we should support that. Have you tried using `resources` instead?
h
Yeah, I'm kind of confused how I would package anything that didn't rely on Python source files (or similarly generated files). Hadn't tried `resources`. The above strategy works whether it's packaged first and run, or just run directly with the `run` goal.
e
The files / resources distinction is only that resources have the enclosing source root stripped. So for a Python file at `src/python/package/module.py`, as a resource, it's materialized in sandboxes and PEXes at `package/module.py` (`src/python` being the source root here). As a file, it's materialized as-is, i.e. at `src/python/package/module.py`.
👍 1
> The above strategy works whether it's packaged first and run or just run directly with the run goal.

@high-yak-85899 does that mean switching to `resources` solved your issue?
h
No, sorry, that was ambiguous. I meant that, when I referenced the files where they live on disk, I could `run` it or `package` it and execute the pex. Swapping between files and resources doesn't seem to change things.
e
So, paying more attention now: you integrate this code by vendoring one of those SourceForge tarballs (exploded?) into your source tree?
h
Yes, but it's not actually checked into our repo. It just lives alongside it as part of a bootstrap process. So it is an acceptable solution for us to move it somewhere equally discoverable on all machines (we use `~/tools` similarly for some other purposes) and not attempt to package it up.
I'm mostly just curious at this point whether somehow `files` isn't grabbing everything, even though it seems like it is.
e
Generally Pants tries hard not to support any files outside the repo root, so it seems likely to me that's at the root of this, and instead of failing loudly we fail silently. But that's a broad brush.
This thread may be relevant. Different library, but similar in distribution style (well, not quite: you must build it, which installs `.so`s and generates a Python distribution): https://pantsbuild.slack.com/archives/C046T6T9U/p1641913522157500
So GMAT seems mainly java? I only find a small number of python files in the main tgz from SourceForge:
```
$ find . -name "*.py"
./userfunctions/python/AttitudeTypes.py
./userfunctions/python/SimpleSockets.py
./userfunctions/python/AttitudeInterface.py
./userfunctions/python/StringFunctions.py
./userfunctions/python/socket-test-drivers/AttitudeTypes.py
./userfunctions/python/socket-test-drivers/SimpleSockets.py
./userfunctions/python/socket-test-drivers/gmat-sync-mquat.py
./userfunctions/python/socket-test-drivers/AttitudeInterface.py
./userfunctions/python/socket-test-drivers/Test-mjd.py
./userfunctions/python/socket-test-drivers/gmat-sync-mjd.py
./userfunctions/python/socket-test-drivers/Cosmos180-mjd.py
./userfunctions/python/MathFunctions.py
./userfunctions/python/ArrayFunctions.py
./bin/gmatpy/gmat_py.py
./bin/gmatpy/navigation_py.py
./bin/gmatpy/__init__.py
./bin/gmatpy/station_py.py
./api/Ex_R2020a_CompleteForceModel.py
./api/Ex_R2020a_RangeMeasurement.py
./api/BuildApiStartupFile.py
./api/Ex_R2020a_FindTheMoon.py
./api/Ex_R2020a_BasicFM.py
./api/Ex_R2020a_BasicForceModel.py
./api/load_gmat.py
./api/Ex_R2020a_PropagationLoop.py
./api/Ex_R2020a_PropagationStep.py
./utilities/python/GMATDataFileManager.py
./utilities/python/ochReader.py
./utilities/python/missionInterface.py
./utilities/python/testDriver.py
./utilities/python/segment.py
./utilities/python/ochWriter.py
```
h
Yes, the Python is mostly just calling out to C++ or whatever other languages. The primary API entrypoint we are using is seen there in `bin/gmatpy/gmat_py.py`.
Well, there has to be support to some extent for things outside of a repo. For instance, the Python interpreter used isn't, by default, packaged hermetically with what is distributed. So there's some precedent for expecting certain things to be available outside of the repo or what is packaged up.
e
That much is true.
Ok, I looked at this in more detail. Assuming your code does something like `from gmat_py import gmat_py`, then this should work:

```
Relevant repo subtree:
---
3rdparty/GMAT/R2020a/bin
    BUILD
    gmat_py/__init__.py
    gmat_py/gmat_py.py
    gmat_py/_gmat_py.so

3rdparty/GMAT/R2020a/BUILD:
---
resources(
    name="so",
    sources="**/*.so",
)

python_sources(
    sources="**/*.py",
    dependencies=[
        ":so",
    ]
)

pants.toml:
---
[source]
root_patterns.add = ["/3rdparty/GMAT/R2020a/bin"]
```
That should allow the vendored Python code + .so to be included in your PEX, unit tests that use this code to work, etc.
Minus the way I've done the targets and pants.toml config - is this roughly what you were trying?
I'm guessing maybe not, given your description of the files living outside the repo as part of bootstrap. That's good, since a checked-in .so like I've shown only works if your fleet of machines is uniform, which is unlikely when you mix developers in. So, in that case, it seems like gmat_py needs to be treated like a JVM-style "provided" dependency; i.e. expect it to be pre-installed on the system and don't try to find it or package it. So I think this was all just me catching up to you. PEXes support `PEX_EXTRA_SYS_PATH=dir1:dir2` for this sort of thing. Is that useful?
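A sketch of how that might be used here, assuming the built PEX is launched with the GMAT bin directory supplied externally (the paths and PEX name below are illustrative):

```python
# Hypothetical launch of the packaged PEX:
#   PEX_EXTRA_SYS_PATH=/home/<user>/tools/GMAT/R2020a/bin ./dist/hifi_propagator.pex
# The PEX runtime adds those directories to sys.path, so the module can import
# gmatpy without a hard-coded sys.path.insert(...).
import sys

print("GMAT dir on sys.path:", any("GMAT" in entry for entry in sys.path))
import gmatpy as gmat  # resolved via PEX_EXTRA_SYS_PATH rather than a hard-coded path
```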
There's also `PEX_INHERIT_PATH=prefer|fallback` if the provided deps are expected to be found in the site-packages of the system interpreter running the PEX. See `pex --help-variables` for docs on this, or see: https://pex.readthedocs.io/en/v2.1.63/api/vars.html
h
Yeah that's somewhat similar to what I had. I'll try to work that in.
Couldn't quite get it to work. There are `.so` files (and `.so.R2020a` files) elsewhere in that directory that need to be loaded in. What I ended up with was:

```
Relevant repo subtree:
---
3rdparty/GMAT
    BUILD
3rdparty/GMAT/R2020a/bin
    BUILD
    gmat_py/__init__.py
    gmat_py/gmat_py.py
    gmat_py/_gmat_py.so

3rdparty/GMAT/BUILD:
---
resources(
    name = "so",
    sources = [
        "**/*.so.*",
        "**/*.so",
    ],
)

3rdparty/GMAT/R2020a/BUILD:
---
python_sources(
    sources=["**/*.py"],
    dependencies=[
        "//3rdpart/GMAT:so",
    ]
)

pants.toml:
---
[source]
root_patterns.add = ["/3rdparty/GMAT/R2020a/bin"]
```
Then, I'm able to `from gmatpy import gmat_py as gmat` just fine, but when I get to the `LoadScript` call, I'm back to a segfault. So, for now, I think I'll stick with storing it on the system for the few cases where this is needed and move on to other migration issues we've had.
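A sketch of that "keep it on the system" approach, assuming an environment-variable override with a conventional fallback path; both the variable name and the default location below are made up for illustration, not GMAT or Pants conventions.

```python
# Hypothetical resolution of a GMAT install that lives outside the repo.
import os
import sys
from pathlib import Path

_GMAT_HOME = Path(os.environ.get("GMAT_HOME", str(Path.home() / "tools" / "GMAT" / "R2020a")))
_GMAT_BIN = _GMAT_HOME / "bin"

if not _GMAT_BIN.is_dir():
    raise RuntimeError(f"GMAT not found at {_GMAT_BIN}; run the bootstrap or set GMAT_HOME")

# An absolute path on disk behaves the same inside and outside the Pants sandbox.
sys.path.insert(1, str(_GMAT_BIN))
import gmatpy as gmat  # noqa: E402
```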
Definitely appreciate all the support effort, though! Much more help than I was expecting with this shot in the dark.
❤️ 1