Opened 5 years ago

Closed 5 years ago

#15954 closed New feature (fixed)

Django's ignorable 404 list should include iphone favicons

Reported by: Paul McMillan Owned by: Aymeric Augustin
Component: Core (Other) Version: 1.3
Severity: Normal Keywords: middleware
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX:

Description

The ignorable 404 list in global_settings should include some patterns to suppress the awful racket that iphones can make when hitting a site.

Specifically, the IGNORABLE_404_STARTS setting should include 'apple-touch-icon.png' for sure.

It's debatable if it should also include '/m/', '/mobi/, '/mobile/', '/iphone/', and /pda/. I'm not sure if those are a bot, or actually some versions of the phone itself, but 404s for those urls are pretty common.

Attachments (2)

15954.patch (10.9 KB) - added by Aymeric Augustin 5 years ago.
15954.2.patch (12.5 KB) - added by Aymeric Augustin 5 years ago.

Download all attachments as: .zip

Change History (10)

comment:1 Changed 5 years ago by Aymeric Augustin

Easy pickings: unset
Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset
Triage Stage: UnreviewedDesign decision needed

The current default value is IGNORABLE_404_ENDS = ('mail.pl', 'mailform.pl', 'mail.cgi', 'mailform.cgi', 'favicon.ico', '.php').

It would make sense to ignore apple-touch-icon-precomposed.png and apple-touch-icon.png, but that will not be enough, since iDevices will try apple-touch-icon-<width>x<height>-precomposed.png and apple-touch-icon-<width>x<height>.png first starting with iOS 4.2.

See the documentation here: https://developer.apple.com/library/safari/#documentation/AppleApplications/Reference/SafariWebContent/ConfiguringWebApplications/ConfiguringWebApplications.html

While we are on this topic, maybe crossdomain.xml (Flash) and robots.txt should be ignored too.

Since all this is pretty debatable, an alternative solution would be to remove the default value for this settings. (Do I want to be notified that I did not create a robots.txt? Or a favicon.ico? Well, that completely depends on the goals of my website.) In practice, that would mean providing an empty tuple as the default value and suggesting a list of common offenders in the documentation: if you haven't created a favicon.ico, robots.txt, etc. add these files to IGNORABLE_404_ENDS.

For the record, there is also IGNORABLE_404_STARTS = ('/cgi-bin/', '/_vti_bin', '/_vti_inf'). Both settings should be handled consistently.

comment:2 Changed 5 years ago by Luke Plant

One possible solution:

  • deprecate IGNORABLE_404_STARTS and IGNORABLE_404_ENDS
  • replace with IGNORABLE_404_URLS, which:
    • uses regular expressions (compiled or otherwise), allowing us to match the apple-touch-icon-* requests and others
    • defaults to an empty list, so that we are not making decisions about whether a site should have robots.txt etc. or not
    • possibly has a commented out line in the project template that includes a sensible set of defaults, so it is easy for people to enable. Alternatively we could put the list in the docs, as Aymeric (aaugustin) suggests.

A quick discussion on django-devs is needed, though, I think.

comment:3 Changed 5 years ago by Aymeric Augustin

Owner: changed from nobody to Aymeric Augustin
Triage Stage: Design decision neededAccepted

Given that the idea was suggested by a core developer, that there were 2 +1 from the community on django-devs, and that no one opposed the idea, I'll mark it as accepted.

Last edited 5 years ago by Aymeric Augustin (previous) (diff)

Changed 5 years ago by Aymeric Augustin

Attachment: 15954.patch added

comment:4 Changed 5 years ago by Aymeric Augustin

Has patch: set

Attached patch implements lukeplant's suggestion.

Bonus: I added tests for the 404 reporting by email—there wasn't any.

comment:5 Changed 5 years ago by Luke Plant

Patch needs improvement: set
Type: Cleanup/optimizationNew feature

Good work, here's my review:

Even though it is trivial to replace IGNORABLE_404_STARTS/ENDS with IGNORABLE_404_URLS, our deprecation/backwards compatibility policy means we should keep supporting the former until we completely remove them two versions later. This is consistent with how we've handled DATABASE_* -> DATABASES (which is actually a simpler change as it doesn't change functionality).

So we need the existing settings to carry on working. We should also emit a PendingDeprecation warning if they are non-empty.

We also need this adding to the deprecation timeline under Django 1.6.

One nit in documentation: "of if you want to keep the old default value" should start or not of.

comment:6 Changed 5 years ago by Aymeric Augustin

Patch needs improvement: unset

Here is a new patch that implements the deprecation path.

In fact, there are two places where we could raise the PendingDeprecationWarning:

  • when the settings are used: this is what I have done. A warning will be emitted for every 404 when SEND_BROKEN_LINK_EMAILS is True and DEBUG is False. It will be written in the server log.
  • when the settings are loaded: this would be a bigger change. In django/conf/__init__.py, we would emit the warning, compile IGNORABLE_404_STARTS/ENDS to regexps and append them to IGNORABLE_404_URLS. It has the advantage that the warning will also be visible under ./manage runserver, but I am not sure it is worth it.

There is still one backwards-incompatible change that will immediately appear in 1.4: the default value changes. Arguably, it's a bug fix.

Changed 5 years ago by Aymeric Augustin

Attachment: 15954.2.patch added

comment:7 Changed 5 years ago by Luke Plant

Triage Stage: AcceptedReady for checkin

I think it's fine to put the warning in the place you've done it, putting it in django/conf/__init__.py

And given that we've decided to warn people if these old settings are non-empty, it wouldn't make sense for them to be non-empty by default, so that backwards incompatibility is also fine.

So marking as RFC accordingly.

comment:8 Changed 5 years ago by Luke Plant

Resolution: fixed
Status: newclosed

In [16160]:

Fixed #15954 - New IGNORABLE_404_URLS setting that allows more powerful filtering of 404s to ignore

Thanks to aaugustin for implementing this.

(Technically this doesn't fix the original report, as we've decided against
having *any* default values, but the new feature makes it possible, and the
docs have an example addressing #15954).

Note: See TracTickets for help on using tickets.
Back to Top