# general
b
hey guys, we're seeing some real slowness in our build with pants:
1044.85s 	Building requirements.pex with 21 requirements: Jinja2, SQLAlchemy==1.4.25, aio-pika, aiofile, aiosmtplib, alembic>=1.7.6, asgi-lifespan, asyncpg, fastapi==0.63.0, httpx, numpy, passlib[bcrypt], pydantic[email], pytest-lazy-fixture, python-dateutil, python-jose==3.2.0, python-multipart==0.0.5, pytz, sqlalchemy-stubs, tenacity, uvicorn[standard]==0.17.0.post1
b
it does eventually terminate so that leaves us the first option
So it finished?
b
not yet
but previous runs usually finish
how would I go about investigating a deadlock
would pants' logs help?
b
What version of Pants are you on?
Also what are your
[python]
settings in your TOML?
b
we're on version 2.9.0
we've got a whole bunch of things for Python
[python-infer]
inits = true

[python]
interpreter_constraints = ["CPython==3.8.*"]
requirement_constraints = "constraints.txt"
resolve_all_constraints = false

[python-repos]
repos = [
    "https://download.pytorch.org/whl/torch_stable.html",
    "https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.8/index.html",
    "https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html",
]

[python-bootstrap]
search_path = [
    "<ASDF>",
    "<PYENV>",
    "<PATH>",
]

[pytest]
args = ["-vv"]
also note that somehow I'm seeing
Bootstrapping Pants using /usr/bin/python3.9
would setting
verbosity = 1
in
[pex]
help?
w
to investigate a deadlock, i’d look at which processes Pants is parenting, and look for a process with PEX in the name
then use
py-spy
(https://github.com/benfred/py-spy) to see what it (and any child processes, in particular: PIP) are doing
if you have runs that are actually completing, then yes: increasing
[pex].verbosity
would help, as it will render debug output when the process completes.
3
is decently verbose.
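A minimal sketch of the "find the PEX process, then point py-spy at it" step described above. It assumes the resolver shows up in `ps` with "pex" or "pip" somewhere in its command line, which is a heuristic rather than anything Pants guarantees. The function names are hypothetical; `py-spy dump --pid` is the only py-spy invocation used.

```python
import subprocess


def find_resolver_processes(ps_output):
    """Return (pid, ppid, command) for rows whose command mentions pex or pip.

    Expects the text produced by `ps -eo pid,ppid,args`. The substring match is
    a rough heuristic (assumption) and may also catch unrelated processes.
    """
    matches = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, ppid, command = parts
        if not (pid.isdigit() and ppid.isdigit()):
            continue  # skips the header row
        lowered = command.lower()
        if "pex" in lowered or "pip" in lowered:
            matches.append((int(pid), int(ppid), command))
    return matches


def list_running_resolver_processes():
    """Run ps and filter its output; call this while the slow build is running."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,args"],
        capture_output=True, text=True, check=True,
    )
    return find_resolver_processes(result.stdout)


def py_spy_commands(processes):
    """Build one `py-spy dump` command line per matched process."""
    return [f"py-spy dump --pid {pid}" for pid, _ppid, _command in processes]
```

Printing `py_spy_commands(list_running_resolver_processes())` while the build is stuck gives the commands to run by hand. py-spy may need elevated privileges to attach, depending on the platform.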
b
the process just finished after a little over 1h
what strikes me as weird is
resolve_all_constraints = false
I'm not sure why it was added, could that have anything to do with the current situation?
w
resolve_all_constraints is generally a good idea, as that will avoid resolving redundantly
it shouldn’t affect the runtime of any single resolve though: will just mean that there is a single resolve, and then a bunch of subset-building (which doesn’t touch the network or compile anything)
b
I just switched it to
true
and on the first run so far:
244.01s Resolving constraints.txt
could that indicate that our requirements.txt and/or constraints.txt are actually garbage? 🙂
w
no: it’s just that (unfortunately), the different options of that flag will result in different process cache keys
b
which I'm interested in, we don't have cache in our pipelines
w
sure. if you can, i’d just recommend kicking off builds with each setting, and seeing what you get. in general though (if you have more than a handful of tests),
resolve_all_constraints=True
will be faster.
b
right
w
to get an idea of why the single resolve took that long, you can try the pex verbosity setting. under the hood, PEX is mostly just orchestrating PIP though, so you should expect to see lots of time spent in PIP
b
there's no way it's "just" downloading stuff
600+ seconds and going
pip runs a lot faster than that
b
torch
is quite the beefy boy. I'm no longer surprised at this (albeit 1hr is excessive)
b
yeah that would explain why resolve_all_constraints is set to false
torch is only used by a subset of our repo's content
on my initial example though, it took 3629.50s on just Jinja2, SQLAlchemy==1.4.25, aio-pika, aiofile, aiosmtplib, alembic>=1.7.6, asgi-lifespan, asyncpg, fastapi==0.63.0, httpx, numpy, passlib[bcrypt], pydantic[email], pytest-lazy-fixture, python-dateutil, python-jose==3.2.0, python-multipart==0.0.5, pytz, sqlalchemy-stubs, tenacity, uvicorn[standard]==0.17.0.post1
so that isn't related to torch
to be precise, it was 2 jobs:
17:09:05.41 [INFO] Long running tasks:
  3629.50s 	Building requirements.pex with 21 requirements: Jinja2, SQLAlchemy==1.4.25, aio-pika, aiofile, aiosmtplib, alembic>=1.7.6, asgi-lifespan, asyncpg, fastapi==0.63.0, httpx, numpy, passlib[bcrypt], pydantic[email], pytest-lazy-fixture, python-dateutil, python-jose==3.2.0, python-multipart==0.0.5, pytz, sqlalchemy-stubs, tenacity, uvicorn[standard]==0.17.0.post1
  3719.33s 	Building requirements.pex with 23 requirements: SQLAlchemy==1.4.25, alembic>=1.7.6, celery[redis]==5.2.3, cloudpickle==1.6.0, extract-msg==0.28.7, fastapi==0.63.0, func-timeout~=4.3.5, gensim, libpff-python-ratom==20200808, motor~=2.5.1, nltk==3.6.2, numpy, pg8000==1.16.6, pydantic[email], pymongo<4,~=3.12.1, python-jose==3.2.0, python-multipart==0.0.5, requests==2.21.0, scikit-learn, sqlalchemy-stubs, sse-starlette==0.7.2, tenacity, watchdog~=2.1.6
alright, it looks like one of the extra python repos we rely on is messing with us
on a fresh environment (no cache, whatsoever) running
./pants fmt --changed-since=origin/develop
takes 45 seconds and counting on black PEX something something
without the extra repos, it terminates in a couple of seconds
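One way to narrow down which extra repo is the slow one is to time a plain GET against each URL. This is a rough sketch under assumptions: the helper names are hypothetical, the `opener` parameter exists only so the timing logic can be exercised without network access, and a single request will not reproduce slowness that only appears after many requests.

```python
import time
import urllib.request


def time_index(url, opener=urllib.request.urlopen, timeout=30):
    """Return (elapsed_seconds, status) for one GET of `url`.

    `status` is the HTTP status code on success, or an "error: ..." string if
    the request failed (HTTP error responses and timeouts end up here).
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            response.read()
            status = response.status
    except Exception as exc:
        status = f"error: {exc}"
    return time.monotonic() - start, status


def time_indexes(urls, **kwargs):
    """Time every URL once and return {url: (elapsed_seconds, status)}."""
    return {url: time_index(url, **kwargs) for url in urls}
```

Calling `time_indexes([...])` with the URLs from `[python-repos].repos`, several times in a row, should make a slow or failing repo stand out.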
b
Hey @bitter-orange-16997! It seems that I'm having the same slowness problem as you during the build of the requirements. What do you suggest I check?
b
in our case this was due to the extra indexes we had on certain Python packages (mostly Facebook indexes for datascience packages)
unbeknownst to us, we would get rate-limited after a while or even straight up blacklisted for a while by them
we were already using a pypi proxy so we just made sure said packages were included in it (which involved pushing them manually)
we were using Sonatype's Nexus Repository Manager for that
later on, as we were trying to move away from Nexus, we wrote a custom lightweight PyPI proxy server with explicit rules and indexes for our packages, which has worked well for us
so I suggest using a PyPI proxy which will provide a couple of good things for you:
• isolation from third-party repositories (only downtime is your own)
• ability to seamlessly provide custom package overrides / patches with no upstream issues
• network locality (hosting near your CI / office / whatever goes a long way)
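The "explicit rules and indexes" idea can be sketched as a lookup from package name to upstream. This is not the proxy described in this thread (its source is not public). The rule table is hypothetical: the URLs come from the `[python-repos]` config earlier in the thread, but which package maps to which URL is a guess for illustration. Name normalization follows PEP 503.

```python
import re

# Default upstream for anything without an explicit rule.
DEFAULT_INDEX = "https://pypi.org/simple/"

# Hypothetical rule table: normalized package name -> upstream to fetch from.
RULES = {
    "torch": "https://download.pytorch.org/whl/torch_stable.html",
    "detectron2": "https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.8/index.html",
    "mmcv-full": "https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html",
}


def normalize(name):
    """PEP 503 normalization: lowercase, runs of '-', '_' and '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def upstream_for(package, rules=RULES, default=DEFAULT_INDEX):
    """Pick the upstream for a package; unlisted packages use the default."""
    return rules.get(normalize(package), default)
```

With a table like this, clients only ever talk to the proxy, and only the proxy talks to the third-party hosts.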
b
but I am already using a company internal repo (artifactory) which should proxy 3rdparty repos 😕
b
our custom lightweight PyPI proxy is not open source but it's low complexity:
$ cloc pypi_proxy/
       7 text files.
       7 unique files.                              
       1 file ignored.

github.com/AlDanial/cloc v 1.86  T=0.01 s (767.0 files/s, 41529.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                           7            115              0            264
-------------------------------------------------------------------------------
SUM:                             7            115              0            264
-------------------------------------------------------------------------------
ok, then you need to make sure you are hitting it 100% of the time for all packages
check your requirements for extra indexes, your CIs and various scripts which would call pip directly and what not
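The "check your requirements for extra indexes" step can be partly automated. Below is a minimal sketch that flags index-related options inside requirements-style text. The function name is hypothetical. It only covers the `--index-url` / `-i`, `--extra-index-url` and `--find-links` / `-f` options of pip's requirements file format, so environment variables such as `PIP_EXTRA_INDEX_URL` and per-user pip config files still need a manual look.

```python
import re

# Index-related options pip accepts inside a requirements file.
INDEX_OPTION = re.compile(
    r"^\s*(--extra-index-url|--index-url|--find-links|-i|-f)(?:\s+|=)(\S+)"
)


def find_index_overrides(text):
    """Return (line_number, option, value) for each index option in `text`.

    Commented-out lines are ignored because they do not start with an option.
    """
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = INDEX_OPTION.match(line)
        if match:
            hits.append((number, match.group(1), match.group(2)))
    return hits
```

Running it over the contents of each `requirements*.txt` and `constraints.txt` in the repo (for example via `pathlib.Path.read_text()`) shows where a resolve might bypass the proxy.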
b
but if I install the dependencies with pip I have no problem and it is very fast
b
does it hit your Artifactory repo though?
b
Yes, both pip and pants are configured to use the same repo.
b
then there might be delays with PEX during initialization of pants or aggregating your dependencies
in that case, I'm not knowledgeable and the pants team should be able to provide more information
b
⠈ 917.84s Building requirements.pex with 6 requirements: apache-airflow-providers-slack~=3.0.0, apache-airflow[amazon,docker]~=2.0.2, boto3<2.0,>=1.17, pydantic<2.0,>=1.9, pyyaml<6.0.0,>=5.4.1, ruamel.yaml~=0.17
b
yeah that's suspiciously slow
b
ok @bitter-orange-16997, anyway thanks for your suggestions !
b
and you're 100% positive both pants and pip hit Artifactory?
b
Yes!
b
in pants.toml you should have something like
[python-repos]
indexes = [
    "https://your-artifactory.com/"
]
and in pip.conf
[global]
index-url = https://your-artifactory.com/
b
Yes, to be honest I put it in my user home (inside
.pants.rc
for example), but I'm sure that both are using artifactory because I can see the URL on the logs
b
if you want to forcefully check that you can always
echo "127.0.0.1 pypi.org" | sudo tee -a /etc/hosts
but if that's not your issue, I hope the pants team will figure it out