#25302 closed New feature (fixed)
BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL
Reported by: | Aymeric Augustin | Owned by: | Maxime Lorant |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Many dubious bots send a Referer header that is equal to the current URL, presumably to work around checks for empty Referer headers. Some of these bots are also poorly implemented and trigger a stupid amount of 404s.
BrokenLinkEmailsMiddleware is smart enough not to report 404s without a Referer. I suggest making it not report 404s when the Referer is equal to the current URL either.
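The proposed rule can be illustrated with a minimal sketch. This is not the middleware's actual code: `should_report_404` and its parameters are hypothetical names chosen for illustration, and the function only models the two conditions discussed in this description (empty Referer, Referer equal to the requested URL).

```python
def should_report_404(referer: str, current_url: str) -> bool:
    """Hypothetical helper: decide whether a 404 is worth a broken-link report."""
    # No Referer: no page is known to link here, so there is nothing to report.
    if not referer:
        return False
    # Referer equal to the requested URL: a page that does not exist
    # cannot contain a link to itself, so treat this as bot noise.
    if referer == current_url:
        return False
    # Any other Referer may point at a page holding a real broken link.
    return True
```

For example, `should_report_404("https://example.com/missing", "https://example.com/missing")` returns `False`, while a Referer pointing at a different page returns `True`.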
Change History (13)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:3 by , 9 years ago
Should I update the documentation? It already says that empty referers are ignored; I think we could add something about this new behaviour too...
See https://docs.djangoproject.com/en/1.8/howto/error-reporting/#errors
comment:5 by , 9 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Version: | 1.8 → master |
comment:7 by , 9 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
I ran 1.9 RC 1 in production for a few days and sadly, the fix doesn't yield the results I hoped for.
Broken bots are hardcoded to use http://<domain><url> as referer. However, Django's check is sensitive to the scheme. Since I run on https://..., the condition added to fix this ticket never triggers.
Would it make sense to also ignore the scheme in the check?
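A scheme-insensitive comparison along these lines could look like the sketch below. It is an illustration only, not the code from the fix: `referer_is_current_url` and its parameters are hypothetical, and it assumes the request's domain and path are already available as strings. It relies on `urllib.parse.urlsplit` from the standard library.

```python
from urllib.parse import urlsplit


def referer_is_current_url(referer: str, domain: str, path: str) -> bool:
    """Hypothetical helper: compare a Referer to the request, ignoring the scheme."""
    parsed = urlsplit(referer)
    # Only the host and path are compared, so http:// and https://
    # referers for the same page are treated as identical.
    return parsed.netloc == domain and parsed.path == path
```

With this check, a Referer of `http://example.com/a.ini` matches a request for `/a.ini` on `example.com` even when the site is served over HTTPS. Note that this sketch also ignores any query string in the Referer, which is a design choice of the example rather than something stated in the ticket.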
comment:8 by , 9 years ago
I submitted a pull request: https://github.com/django/django/pull/5730.
I would like to backport it to 1.9 before it's released.
comment:9 by , 9 years ago
I don't mind ignoring the scheme indeed, +100 for the PR :) I did not think of it, but it is clearly something that should be ignored.
comment:10 by , 9 years ago
One hour after posting this PR, I received 160 emails like the following in 8 minutes:
Referrer: http://REDACTED.com/libraries/joomla/html/language/en-GB/en-GB.jhtmldate.ini
Requested URL: /libraries/joomla/html/language/en-GB/en-GB.jhtmldate.ini
User agent: Go 1.1 package http
IP address: 162.158.56.47
I'm wondering if anyone besides me actually uses this in production...
comment:11 by , 9 years ago
Has patch: | set |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Just to be clear: a 404 with a Referer equal to the current URL must be a false positive, because there's no way you can be coming from a page that doesn't exist.
If the page disappeared between the previous and the current request, then the problem is solved, since the page containing the broken link no longer exists.