Opened 4 years ago

Closed 6 months ago

#16011 closed Cleanup/optimization (duplicate)

Improve Django documentation search engine results relevancy

Reported by: info.ksamuel@…
Owned by: nobody
Component: *.djangoproject.com
Version: 1.3
Severity: Normal
Keywords: search-engine
Cc:
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

It's very hard for beginners to find something if they don't know exactly where to look.

I realize that you probably use Sphinx documentation with the built-in search engine and that integrating something like haystack would be a lot of work, but look:

Looking for a reference of the filter 'join':

http://docs.djangoproject.com/search/?q=filter+join&release=5

It's the fifth result, there is no way to know that it's the right one, and once you click you have to find the right section in a huge page.

Now looking for join on the PHP doc:

http://fr.php.net/results.php?q=join&l=en&p=all

First answer, immediate visual clue. Click: you got it.

This is one example, but this happens to people I teach Django to every day.

The Django documentation is very useful, and it's a shame that I have to advise everybody I teach Django to to use Google to search the docs.

Change History (9)

comment:1 Changed 4 years ago by lukeplant

  • Component changed from Documentation to Djangoproject.com Web site
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

It should be noted that PHP can achieve this by having a completely flat namespace for all its functions. There is no reason why we might not have more than one thing called 'join', so we cannot rival PHP here.

I agree we could do better though. Having lots of old release notes clutter the results is one problem that stands out.
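One way to stop old release notes cluttering the results would be to demote them at ranking time. A minimal sketch, assuming results arrive as (url, score) pairs and that release notes live under a `releases/` path; the data shape and the `demote_release_notes` helper are hypothetical and not taken from the site's code:

```python
from typing import List, Tuple

Result = Tuple[str, float]  # (url, relevance score); hypothetical shape


def demote_release_notes(results: List[Result], prefix: str = "releases/") -> List[Result]:
    """Return results sorted by score, with release-note pages moved to the end."""
    def sort_key(result: Result):
        url, score = result
        is_release_note = prefix in url
        # False sorts before True, so regular pages come first;
        # within each group, higher scores come first.
        return (is_release_note, -score)

    return sorted(results, key=sort_key)


# Made-up sample data for illustration only.
results = [
    ("releases/1.2/", 0.9),
    ("ref/templates/builtins/", 0.7),
    ("releases/1.1/", 0.8),
    ("topics/templates/", 0.4),
]
ranked = demote_release_notes(results)
```

Release notes stay findable this way; they simply no longer outrank reference and topic pages.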

Also, on the front page, there is a line saying "Looking for something specific? Use the index". If you do so, you get exactly to the 'join' docs in a few clicks. However, I've had to point this out more than once now. It would be good if the search used the index first, at least for one-word queries. It is possible that Sphinx already supports this; if not, it would be a good addition for Sphinx. I think this would work a lot better than integrating haystack, because Sphinx has much more information to help it decide relevancy.
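The "index first" idea could look roughly like this. It is only a sketch: the `INDEX` mapping, the `search` function and the `fake_fulltext` stand-in are all invented for illustration and do not correspond to any Sphinx or haystack API:

```python
from typing import Callable, Dict, List

# Hypothetical index: entry name -> direct links (page plus anchor).
INDEX: Dict[str, List[str]] = {
    "join": ["ref/templates/builtins/#join"],
    "slice": ["ref/templates/builtins/#slice"],
}


def search(query: str, index: Dict[str, List[str]],
           fulltext: Callable[[str], List[str]]) -> List[str]:
    """Put exact index matches first for one-word queries, then full-text hits."""
    words = query.lower().split()
    index_hits = index.get(words[0], []) if len(words) == 1 else []
    seen = set(index_hits)
    rest = [url for url in fulltext(query) if url not in seen]
    return index_hits + rest


def fake_fulltext(query: str) -> List[str]:
    # Stand-in for the existing full-text engine; returns page-level URLs only.
    return ["releases/1.2/", "ref/templates/builtins/", "topics/db/queries/"]


one_word = search("join", INDEX, fake_fulltext)
two_words = search("filter join", INDEX, fake_fulltext)
```

For a one-word query the direct link to the entry comes first; for longer queries this sketch simply falls back to the full-text results unchanged.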

So, we should get Sphinx to do this. I'll accept the ticket and leave as open until we have achieved that.

comment:2 Changed 4 years ago by lukeplant

  • Type changed from New feature to Cleanup/optimization

It looks like we are using haystack already, and I'd suggest this is the problem. If you compare to the search results produced by Sphinx, our results are terrible for simple searches.

Compare a Sphinx search for 'join'

http://readthedocs.org/docs/django/en/latest/search.html?q=join&check_keywords=yes&area=default

with ours:

http://docs.djangoproject.com/search/?q=join&release=5

Sphinx could still be a lot better by sorting its matches from the index better - the 'join' template filter should and could be at the top of the list.

With more than one word, both are terrible. Sphinx does not appear to use its index for these, so the direct links disappear, and without them search is very poor, because you only get to the page, and Django typically has long documentation pages with plenty of anchors.

Why was haystack adopted? Does it provide anything useful? It is faster than using Sphinx's search, but getting bad results quickly doesn't seem a great ideal.

comment:3 Changed 4 years ago by lukeplant

I've created a patch for Sphinx that massively improves its results, and sent a pull request.

My changes can be found here: https://bitbucket.org/spookylukey/sphinx/overview

comment:4 Changed 4 years ago by lukeplant

My patches have been pulled into Sphinx trunk now. You'll have to update to latest Django to be able to use latest Sphinx, but if you do, searching offline docs is much better. (Offline docs don't have an obvious search box though, which is a pain).

With these changes, the Sphinx search is actually quite a lot better than the haystack search, for some searches. The basic reasons are:

1) haystack never takes you directly to a section - you always have to search within the page when you get there. It doesn't have all the metadata that Sphinx has to be able to do this.

2) Django docs tend to have long pages, with many individual small items combined on a single page (e.g. all settings, all template tags/filters etc, all methods of a certain class etc).

For these reasons, I don't think haystack by itself is ever going to cut it for us.
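To illustrate why section metadata matters: a search that indexes each section separately can link straight to the anchor instead of the top of a long page. A rough sketch, assuming each page is available as a list of (anchor, title, text) sections; the sample data and helper names are invented and do not reflect how haystack or Sphinx actually store documents:

```python
from typing import Dict, List, Set, Tuple

Section = Tuple[str, str, str]  # (anchor, title, text); hypothetical shape

# Made-up sample pages for illustration only.
PAGES: Dict[str, List[Section]] = {
    "ref/settings/": [
        ("debug", "DEBUG", "A boolean that turns debug mode on or off."),
        ("databases", "DATABASES", "Settings for all databases used by the project."),
    ],
    "ref/templates/builtins/": [
        ("join", "join", "Joins a list with a string."),
        ("slice", "slice", "Returns a slice of the list."),
    ],
}


def build_section_index(pages: Dict[str, List[Section]]) -> Dict[str, Set[str]]:
    """Map each lowercase word to the page#anchor URLs of sections containing it."""
    index: Dict[str, Set[str]] = {}
    for page_url, sections in pages.items():
        for anchor, title, text in sections:
            url = f"{page_url}#{anchor}"
            for word in f"{title} {text}".lower().split():
                index.setdefault(word.strip(".,"), set()).add(url)
    return index


def search_sections(query: str, index: Dict[str, Set[str]]) -> List[str]:
    """Return URLs of sections that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


section_index = build_section_index(PAGES)
hits = search_sections("join", section_index)
```

Every hit is a page plus anchor, so a long page of settings or template filters no longer has to be searched by hand after clicking through.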

However, for some searches, haystack definitely wins (e.g. search for 'multiple databases'). These tend to be when the search relates to a topic, rather than some object. So I think we need a combination of using Sphinx data for object lookups, which should appear first, with haystack for topic searches.

comment:5 Changed 3 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:6 Changed 3 years ago by aaugustin

Re. "haystack never takes you directly to a section": see #10613

I've closed #15242 as a duplicate of this. It suggested weighting titles more heavily.
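For reference, weighting titles more heavily could be as simple as the following. This is a toy scorer under stated assumptions: documents are plain dicts with `url`, `title` and `body` keys, and the `score` and `rank` helpers and the weight of 5.0 are illustrative choices, not what #15242 proposed in detail nor anything haystack provides under these names:

```python
from typing import Dict, List

Doc = Dict[str, str]  # {"url": ..., "title": ..., "body": ...}; hypothetical shape


def score(doc: Doc, query: str, title_weight: float = 5.0) -> float:
    """Count query-word occurrences, weighting title matches more than body matches."""
    words = query.lower().split()
    title_words = doc["title"].lower().split()
    body_words = doc["body"].lower().split()
    title_hits = sum(title_words.count(w) for w in words)
    body_hits = sum(body_words.count(w) for w in words)
    return title_weight * title_hits + body_hits


def rank(docs: List[Doc], query: str) -> List[str]:
    """Return URLs ordered by descending score, dropping non-matching documents."""
    scored = [(score(d, query), d["url"]) for d in docs]
    return [url for s, url in sorted(scored, key=lambda pair: -pair[0]) if s > 0]


# Made-up sample documents for illustration only.
docs = [
    {"url": "releases/1.2/", "title": "Django 1.2 release notes",
     "body": "join join join changes to the join filter"},
    {"url": "ref/templates/builtins/#join", "title": "join",
     "body": "joins a list with a string"},
]
ranked = rank(docs, "join")
```

With equal weights the release notes would win on raw word count; boosting the title puts the entry actually named 'join' first.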

comment:7 Changed 3 years ago by aaugustin

See also #18633

comment:8 Changed 2 years ago by aaugustin

I recently upgraded all the dependencies on djangoproject.com. It's now using the latest Sphinx.

Adding the search box to offline docs should be a simple matter of updating the template. It's independent from the website.

Mixing two search engines sounds complicated. Currently, everyone probably uses Google for non-trivial / topic searches. Wouldn't it make sense to simply switch to Sphinx's search everywhere?

comment:9 Changed 6 months ago by timgraham

  • Resolution set to duplicate
  • Status changed from new to closed