Who would know how the pants docs search engine is...
# development
w
Who would know how the pants docs search engine is loaded? I feel like https://www.pantsbuild.org/docs/reference-global isn't being indexed correctly. I've never really had success searching for items on there
p
I've had issues finding the global options as well.
w
Yeah, definitely not running on Algolia 🙂 I've never really known how readme search works actually - Not sure if it's a scraper (e.g. maybe we're not correctly linking to the global options page?) or whether it should ingest all the HTML
p
We need something that indexes symbols. Searching for __defaults__ doesn't work because it ignores the underscores.
I wonder if sphinx + rtd would fare any better
w
Searching for __defaults__ doesn't work because it ignores the underscores.
Oh, wild, I didn't know that. Though in hindsight, a lot of discovery problems for me are making sense 🤦‍♂️
b
Its search is poor, unfortunately shifting more tech support burden onto the team than should be necessary. Anyone who'd like to take on the work of moving us over to sphinx would be doing a great service.
w
Not sure if that would help searchability - unless the hosting platform would?
Maybe if we could get an Algolia open source plan - I vaguely recall they had/have/will have a site scraper - so you kinda just point it at your docs site... Though, maybe I'm conflating which search engines I was looking at
b
One of the many issues with readme is that we are not allowed to use custom CSS or custom JS, which severely restricts our ability to use external tooling. We could link to an external indexing tool, but we could not integrate it.
w
Oh wow, didn't know that.
b
And if all we can do is link to a custom search engine, might as well use something like Google Custom Search for free.
There is a tier that allows custom JS, but it's very expensive; well beyond the community's means.
w
I statically host most of my sites on Cloudflare pages - was just going through the process of adding Algolia to my blog. Also I use Algolia for an enterprise product. Not cheap, but so good
p
Sphinx, restructured text, and markdown are on my list of things to work on in pants, but there are several other things I need to do before I’ll get to that. So, if anyone gets there before me, please! 🙂
🎉 2
b
Just getting us moved over to sphinx would be a big service, since it would give us a lot more flexibility in taking next steps. As far as I know, most of the docs are already in markdown since it's the format readme supports.
One thing that's a bummer to me about readme's indexing is that it only indexes readme itself. blog.pantsbuild.org and chat.pantsbuild.org for instance are not indexed, nor is GH. Imagine if all of that content were indexed in one place. People would be able to do so much more self-service support, and keep moving quickly.
readme also only indexes certain content. The home page, for instance, is not indexed. We also are restricted in what we can do with the theme, so for instance the reason the header menu is overpopulated with stuff like ToS, privacy policy, and cookie policy is that we can't create a footer.
w
I feel like a lot of questions I've been asking myself are slowly being answered.....
b
Pages like https://www.pantsbuild.org/docs/who-uses-pants and https://www.pantsbuild.org/docs/language-support exist purely because of readme's awkward architecture constraints.
Every time I see those "click here" links to go to the actual page, I'm like grrr, that is so 1996.
A lot of FAQs we get are because the search is fairly primitive. For instance, "rename build files" and "name build files" do not turn up any obvious link to the info the person is seeking.
p
Is the blog made with markdown stored in the pants repo too? Any indexing sphinx does is of the content it generates.
b
It also doesn't handle decimals correctly. So searching for a decimal number like 3.8 results in matches for anywhere a 3 appears, regardless of whether it's followed by dot-eight.
p
another case of using a generic text search engine that strips out symbols
b
Yeah. For a tool that's meant to be specialized for handling code, readme is disappointingly not well-tuned to the task.
It seems to discard many symbols, so for instance, searching for ./pants gets 21 pages of matches that mostly are not actually matches.
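To make the failure mode concrete, here's a rough sketch in Python. Big caveat: none of us knows how readme's search actually tokenizes, so `generic_tokens`, `code_tokens`, and `matches` are all made-up names, and the "match if any token overlaps" rule is an assumption. It only shows how splitting on symbols could produce the behavior described above for __defaults__, 3.8, and ./pants.

```python
import re


def generic_tokens(text):
    """Hypothetical generic tokenizer: split on anything that isn't a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def code_tokens(text):
    """Hypothetical code-aware tokenizer: keep '_', '.' and '/' inside tokens."""
    raw = re.findall(r"[a-z0-9_./]+", text.lower())
    # Drop sentence-ending periods so "3.8." still indexes as "3.8".
    return [t.rstrip(".") for t in raw if t.rstrip(".")]


def matches(query, document, tokenize):
    """Assumed matching rule: a hit if any query token appears in the document."""
    doc_tokens = set(tokenize(document))
    return any(t in doc_tokens for t in tokenize(query))


if __name__ == "__main__":
    for query in ["__defaults__", "3.8", "./pants"]:
        print(query, generic_tokens(query), code_tokens(query))
    # With the generic tokenizer, "3.8" hits any page that mentions a bare 3.
    print(matches("3.8", "This needs 3 steps", generic_tokens))
    print(matches("3.8", "This needs 3 steps", code_tokens))
```

If something like this is what's going on, the fix isn't a better ranking algorithm so much as an indexer that treats symbols as part of the token, which is the "indexes symbols" point from earlier in the thread.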
p
yup. not surprised.
b
In answer to your earlier questions, stuff that isn't autogenerated is in https://github.com/pantsbuild/pants/tree/main/docs/markdown btw.
p
including the blog?
b
No, the blog is hosted with Ghost
And chat.pantsbuild.org is hosted by Linen. So neither is part of the repo.
p
ah. Then yeah - sphinx and rtd won’t help with a combined search for all of those. Or at least not out of the box. We would need something else that indexes all three, which we could wire up through sphinx.
w
I think not being able to search slack/linen makes sense to some degree. Ditto for those communities using Discord. Until you're in the app, searching is hard, unless they have some API that allows digging in?
p
linen is designed to be search-engine friendly
b
Well if we had access to custom JS and custom CSS, then we could integrate any open source or low-cost search tooling we want.
👍 1
w
linen is designed to be search-engine friendly
Really? I didn't see it come up anywhere in search results when doing exact quotes... Hmm
p
Well, that’s one of the things they advertised I thought. 😕
b
Linen isn't yet indexed by Google et al. Until we publish more links to it for search engines to find, it's still pretty low-profile. But its own search is pretty good.
👍 1
w
site:chat.pantsbuild.org pants
I was able to get some stuff showing up, which is pretty neat! I'd say it's not well indexed yet, since the only stuff that popped up was the top-level messages, not threads, but maybe it'll get better as it goes on
🚀 1
b
I'm impressed it's found it at all. The first public link was in the blog post a couple days ago. It may have only just begun spidering, which would explain the depth=1 results.
w
Let Google answers questions and not your team
Linen is designed to be search-engine friendly. Most if not all chat tools are not search-engine friendly or accessible. Linen is search-engine friendly because we render a static version of our app to search engines like Google
Love it so much. It's annoying to try to search through a team's discord or slack (but first needing to sign up) - walled garden approach to open-source support. Github issues + discussions? No problem, great results. But I just think about the thousands of questions asked every day on Svelte's discord that would be so helpful to see when ~googling~DuckDuckGo-ing. Another side effect of all the chat systems is that StackOverflow really isn't the go-to resource when you're looking for something framework/library specific, which is also unfortunate.
b
We put 8 years of Slack history into Linen, so I'm guessing there are a lot of static pages to crawl.
🎉 1
👖 1
b
FWIW (a lot, I think) I was searching for something a few days ago and turned up some old and vaguely useful results buried in a slack thread on linen 👍
🚀 2
c
How about getting our Pants docs onto devdocs.io? https://github.com/freeCodeCamp/devdocs/blob/main/.github/CONTRIBUTING.md#contributing-new-documentations I’ve used devdocs in the past and it has mostly great searchability for the stuff on there (with a few exceptions)
It’s a docs scraper, so it doesn’t mean we host the docs anywhere else, just that they're scraped from wherever they live and show up there as well.
👀 1