# development
h
OK, I need someone to help me out here: We currently build the native engine on linux in the bootstrap job when building the pex directly in the travis container, and in the “build linux native engine job”, where we build wheels in a docker container inside the travis container, presumably for maximum hermeticity and compatibility. So it seems to me that to consolidate these down to one build, that build needs to be in a docker container. I have attempted to run the pex build (by running
build-support/bin/ci.sh
with no arguments) in the docker container, but am hitting this error:
info: installing component 'rust-src'
info: downloading component 'clippy'
info: installing component 'clippy'
/travis/home/.cache/pants/rust/cargo/bin/cargo-ensure-installed: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /travis/home/.cache/pants/rust/cargo/bin/cargo-ensure-installed)
/travis/home/.cache/pants/rust/cargo/bin/cargo-ensure-installed: /lib64/libc.so.6: version `GLIBC_2.15' not found (required by /travis/home/.cache/pants/rust/cargo/bin/cargo-ensure-installed)
/travis/home/.cache/pants/rust/cargo/bin/cargo-ensure-installed: /lib64/libc.so.6: version `GLIBC_2.18' not found (required by /travis/home/.cache/pants/rust/cargo/bin/cargo-ensure-installed)
Failed to build native engine.
+ die 'Failed to bootstrap pants.'
+ (( 1 > 0 ))
+ log '\n\x1b[31mFailed to bootstrap pants.\x1b[0m'
+ echo -e '\n\x1b[31mFailed to bootstrap pants.\x1b[0m'
Failed to bootstrap pants.
a
we can probably provide glibc as a binary tool -- checking the full ci log now
ok forget about that, i think your message below is right, providing glibc solves the wrong problem
h
I don’t understand why this fails when running
ci.sh
but not when running
release.sh -n
So providing glibc would be a bandaid, and you know how I feel about those… 😉
a
That suggests to me that the .cache/pants/rust dir is being cached across different OS versions, which we have no reason to believe should work
I'm not sure where it's being read/written between, but I think that's the crux of the problem here...
The other thing we could do is start statically linking the binaries we install with musl...
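(One way to check the mismatch theory is to compare the highest glibc version the error lines ask for against what the container provides. A minimal sketch, assuming GNU `sort` with `-V` is available; `max_required_glibc` and `version_at_least` are hypothetical helper names, not anything in the pants repo.)

```shell
# Hypothetical helper: read loader error lines on stdin and print the
# highest GLIBC version they mention (e.g. "2.18").
max_required_glibc() {
  grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -V | tail -n 1
}

# Hypothetical helper: succeed if version $1 is at least version $2.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Possible usage inside the container (assumes a glibc-based system, where
# `getconf GNU_LIBC_VERSION` prints something like "glibc 2.17"):
#   have=$(getconf GNU_LIBC_VERSION | awk '{print $2}')
#   need=$(./cargo-ensure-installed 2>&1 | max_required_glibc)
#   version_at_least "$have" "$need" || echo "binary was built against a newer glibc"
```

If the required version is higher than the container's, the binary was most likely built somewhere else and pulled in through the cache.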
h
that caching snafu is possible, yes
why do we build in a docker image?
for maximum compatibility?
a
so things like the versions of glibc that are secretly in the binary are reproducible and don’t change
also for compatibility is the idea — if this were not to be the case we could edit the image to increase compatibility
h
It does look like it was picking up cache from the non-docker job, will nuke that and see what happens
Hmm, not enough to nuke it, it’ll pick up master’s cache instead. will have to create a new cache name
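(A rough sketch of what a per-environment cache name could look like, so the docker build never reuses binaries cached by the non-docker job. `rust_cache_dir` is a hypothetical helper, and the label and libc version in the usage line are made-up example values; this is not how Travis itself names caches.)

```shell
# Hypothetical helper: build a cache directory name that differs per
# build environment.
# Arguments: base dir, environment label, libc version.
rust_cache_dir() {
  printf '%s/rust-%s-glibc%s\n' "$1" "$2" "$3"
}

# Possible usage (example values only):
#   rust_cache_dir "$HOME/.cache/pants" docker 2.12
```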
a
one last thing on the above -- i'm pretty sure i remember @witty-crayon-22786 describing the docker image as a "lowest common denominator" of sorts, so compatibility sounds right
looking through
release.sh
, it looks like the native engine bootstrapping is implicitly occurring as a result of running
./pants -q setup-py ...
in
build_and_print_packages()
in
packages.py
when we build the pantsbuild.pants package via
build_pants_packages()
in
release.sh
? is it being bootstrapped somewhere else first?
h
I think it’s happening in the
./pants --version
call here: https://github.com/pantsbuild/pants/blob/master/.travis.yml#L196
a
thanks!
h
So now with the new caches the engine builds: https://travis-ci.org/pantsbuild/pants/jobs/476054023
But pants errors out on
19:09:30 00:10       [pythonstyle]
                     /travis/workdir/build-support/pants_dev_deps.venv/bin/python2.7: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
a
i may have seen that before, trying to remember when
oh, if this is pythonstyle, it may be related to a recent change
https://github.com/pantsbuild/pants/pull/7013 and the PR title wasn't edited before it was merged
we may have to add an entry to
LD_LIBRARY_PATH
i'm checking that out now
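(For reference, a minimal sketch of adding an entry to `LD_LIBRARY_PATH`. `prepend_lib_dir` is a hypothetical helper, and the directory in the usage line is an assumption; it would need to be wherever `libpython2.7.so.1.0` actually lives in the image.)

```shell
# Hypothetical helper: print a search path with directory $1 placed in
# front of the existing value $2. When $2 is empty, no trailing colon is
# emitted, since an empty entry makes the loader search the current directory.
prepend_lib_dir() {
  dir=$1
  current=$2
  if [ -z "$current" ]; then
    printf '%s\n' "$dir"
  else
    printf '%s:%s\n' "$dir" "$current"
  fi
}

# Possible usage (the directory is a placeholder):
#   export LD_LIBRARY_PATH="$(prepend_lib_dir /opt/python/lib "${LD_LIBRARY_PATH:-}")"
```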
h
hmm could be
although why are we running pythonstyle here at all is also a good question…
let me check if I have your change in my branch
a
that is a better question
h
I do have your change in my branch
maybe that’s what’s causing this?
Or maybe it’s just coincidence
a
that's exactly what i was thinking
could try reverting and running again
it seems correct that we shouldn't be running pythonstyle there but this also seems like something to correct, maybe
reverting that commit and trying travis again will answer that question, i can try that on my fork
h
I’ll do it
I’m already neck deep in this change
a
ok, thanks. i will take a look at why pythonstyle is running there and whether we can kill it off to unblock here in case that doesn't bear fruit
oh it's totally clear why we're running lint there
bootstrap_compile_args=(
  lint.python-eval
  --transitive
)
i don't know if we need or want that for the bootstrap phase given that we have a separate lint shard now, looking in git history
(i get a separate error on my local machine running pythonstyle without that commit, which was the reason for it in the first place, but that's irrelevant to the current thread)
i think it actually may be correct to remove that, even if the
libpython.so.1.0
error also needs to be fixed -- looking at
5ad90920f5b282f28a3f66ee15b44599836b8cea
from 2015 which introduced linting for the bootstrapped binary in
ci.sh
john notes that it adds ~2 minutes to each shard, which sounds like the thing we're trying to overcome here. not sure if i'm jumping to conclusions
The python-eval checks add ~2 minutes to each ci shard, so ci is
restructured to be more fine grained in order to keep ci times ~6
minutes per shard.
a simpler time
but i can make an issue regarding the
libpython.so.1.0
error, especially if it shows up in non-bootstrap shards, or perhaps just fix it today
afk for an hour but i'm thinking we can kill
bootstrap_compile_args
entirely, either in a separate or the same PR (since we have a special lint shard now), and i will make an issue or PR about pythonstyle failing in the docker image
h
Yeah, I don’t think we need to lint here any more.
But I reverted your change and it looks like it gets past that now
Removing the linting will save 20 seconds or so, but I can deal with that later
So yes, your change was the issue, unfortunately 😞
a
yeah it was too hasty and i didn’t wait long enough for review because i couldn’t run the pre-commit hook without it. it’s fixable
i’m fine reverting it