Opened 13 years ago
Closed 10 years ago
#16011 closed Cleanup/optimization (duplicate)
Improve Django documentation search engine results relevancy
Reported by: | | Owned by: | nobody |
---|---|---|---|
Component: | *.djangoproject.com | Version: | 1.3 |
Severity: | Normal | Keywords: | search-engine |
Cc: | | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
It's very hard for beginners to find something in the documentation if they don't know exactly where to look.
I realize that you probably use Sphinx-generated documentation with its built-in search engine, and that integrating something like haystack would be a lot of work, but look:
Looking for the reference for the 'join' filter:
http://docs.djangoproject.com/search/?q=filter+join&release=5
It's the fifth result, there's no way to tell that it's the right one, and once you click it you have to find the right section in a huge page.
Now look for join in the PHP docs:
http://fr.php.net/results.php?q=join&l=en&p=all
First result, with an immediate visual clue. One click and you've got it.
This is one example, but this happens to people I teach Django to every day.
The Django documentation is very useful, and it's a shame that I have to advise everyone I teach Django to use Google to search the docs.
Change History (9)
comment:1 by , 13 years ago
Component: | Documentation → Djangoproject.com Web site |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 13 years ago
Type: | New feature → Cleanup/optimization |
---|---|
It looks like we are already using haystack, and I'd suggest that this is the problem. Compared with the search results produced by Sphinx, ours are terrible for simple searches.
Compare a Sphinx search for 'join':
http://readthedocs.org/docs/django/en/latest/search.html?q=join&check_keywords=yes&area=default
with ours:
http://docs.djangoproject.com/search/?q=join&release=5
Sphinx could still be a lot better at sorting its matches from the index: the 'join' template filter could and should be at the top of the list.
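A minimal sketch of the ordering described above, assuming a hypothetical list of index entries (`ObjectEntry` with a dotted `name` and an anchored `url`; this is not Sphinx's actual index format): an exact match on the object's own name ranks first, then names that start with the query, then names that merely contain it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectEntry:
    """Hypothetical index entry: a dotted object name and the URL of its anchor."""

    name: str
    url: str


def match_rank(query: str, entry: ObjectEntry) -> Optional[int]:
    """Return a sort key (lower is better), or None if the entry doesn't match."""
    q = query.lower()
    name = entry.name.lower()
    # The short name is the last dotted component, e.g. "join" in "example.join".
    short = name.rsplit(".", 1)[-1]
    if short == q:
        return 0  # exact match on the object's own name
    if short.startswith(q):
        return 1  # the object's name starts with the query
    if q in name:
        return 2  # the query appears somewhere in the full dotted name
    return None


def rank_objects(query: str, entries: list[ObjectEntry]) -> list[ObjectEntry]:
    """Keep only matching entries, ordered by match quality, then alphabetically."""
    ranked = []
    for entry in entries:
        rank = match_rank(query, entry)
        if rank is not None:
            ranked.append((rank, entry.name, entry))
    ranked.sort(key=lambda item: (item[0], item[1]))
    return [entry for _, _, entry in ranked]
```

With the made-up entries `example.rejoin`, `example.join_all` and `example.join`, a query for 'join' returns `example.join` first, then `example.join_all`, then `example.rejoin`.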
With more than one word, both are terrible. Sphinx does not appear to use its index for these searches, so the direct links disappear. Without them search is very poor: you only get taken to the page, and Django typically has long documentation pages with plenty of anchors.
Why was haystack adopted? Does it provide anything useful? It is faster than Sphinx's search, but getting bad results quickly doesn't seem like a great goal.
comment:3 by , 13 years ago
I've created a patch for Sphinx that massively improves its results, and sent a pull request.
My changes can be found here: https://bitbucket.org/spookylukey/sphinx/overview
comment:4 by , 13 years ago
My patches have now been pulled into Sphinx trunk. You'll have to update to the latest Django to be able to use the latest Sphinx, but if you do, searching the offline docs is much better. (The offline docs don't have an obvious search box, though, which is a pain.)
With these changes, Sphinx search is actually quite a lot better than haystack search for some queries. The basic reasons are:
1) haystack never takes you directly to a section; you always have to search within the page once you get there. It doesn't have the metadata Sphinx has that would let it do this.
2) Django docs tend to have long pages, with many individual small items combined on a single page (e.g. all settings, all template tags/filters etc, all methods of a certain class etc).
For these reasons, I don't think haystack by itself is ever going to cut it for us.
However, for some searches haystack definitely wins (e.g. a search for 'multiple databases'). These tend to be searches for a topic rather than for a specific object. So I think we need a combination: Sphinx data for object lookups, which should appear first, and haystack for topic searches.
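A rough sketch of the combination proposed above, assuming both backends can be reduced to a function that takes the query and returns result URLs. `combined_search`, `object_search` and `topic_search` are hypothetical names, not part of Sphinx or haystack: object lookups are listed first, topic results follow, and a URL returned by both is shown only once.

```python
from __future__ import annotations

from itertools import chain
from typing import Callable, Iterable


def combined_search(
    query: str,
    object_search: Callable[[str], Iterable[str]],
    topic_search: Callable[[str], Iterable[str]],
    limit: int = 20,
) -> list[str]:
    """Return result URLs: object lookups first, then full-text topic results.

    Both backends are plain callables returning URLs, so this sketch does not
    depend on any real Sphinx or haystack API.
    """
    results: list[str] = []
    seen: set[str] = set()
    for url in chain(object_search(query), topic_search(query)):
        if len(results) >= limit:
            break
        if url in seen:
            continue  # the same URL from both backends keeps its first (higher) slot
        seen.add(url)
        results.append(url)
    return results
```

Keeping the two backends behind plain callables means either one could be swapped out (e.g. for Sphinx's own full-text search) without touching the merge logic.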
comment:6 by , 13 years ago
comment:8 by , 12 years ago
I recently upgraded all the dependencies on djangoproject.com. It's now using the latest Sphinx.
Adding the search box to the offline docs should be a simple matter of updating the template. It's independent of the website.
Mixing two search engines sounds complicated. Currently, everyone probably uses Google for non-trivial or topic searches. Wouldn't it make sense to simply switch to Sphinx's search everywhere?
comment:9 by , 10 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
It should be noted that PHP can achieve this because it has a completely flat namespace for all its functions. Nothing stops us from having more than one thing called 'join', so we cannot rival PHP here.
I agree we could do better though. Having lots of old release notes clutter the results is one problem that stands out.
Also, on the front page, there is a line saying "Looking for something specific? Use the index". If you use it, you get to the exact 'join' docs in a few clicks. However, I've had to point this out more than once now. It would be good if the search used the index first, at least for one-word queries. Sphinx may already support this; if not, it would be a good addition to Sphinx. I think this would work a lot better than integrating haystack, because Sphinx has much more information to help it judge relevance.
So, we should get Sphinx to do this. I'll accept the ticket and leave it open until we have achieved that.
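A minimal sketch of "use the index first, at least for one-word queries", assuming the general index can be loaded as a mapping from lower-cased terms to the anchored URLs of every entry with that name (several entries may share a name such as 'join'). `index_first_search`, `general_index` and `full_text_search` are hypothetical names, not an existing Sphinx API.

```python
from __future__ import annotations

from typing import Callable, Mapping, Sequence


def index_first_search(
    query: str,
    general_index: Mapping[str, Sequence[str]],
    full_text_search: Callable[[str], list[str]],
) -> list[str]:
    """Try the general index for one-word queries; otherwise use full-text search."""
    terms = query.split()
    if len(terms) == 1:
        # Case-insensitive exact lookup; the index keys are assumed lower-cased.
        hits = general_index.get(terms[0].lower())
        if hits:
            return list(hits)
    # Multi-word queries, and single words missing from the index, go to full text.
    return full_text_search(query)
```

Returning every index hit for the term, rather than just one, fits the point above that Django can have more than one thing called 'join'.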