#25302 closed New feature (fixed)
BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL
| Reported by: | Aymeric Augustin | Owned by: | Maxime Lorant | 
|---|---|---|---|
| Component: | HTTP handling | Version: | dev | 
| Severity: | Normal | Keywords: | |
| Cc: | | Triage Stage: | Ready for checkin |
| Has patch: | yes | Needs documentation: | no | 
| Needs tests: | no | Patch needs improvement: | no | 
| Easy pickings: | no | UI/UX: | no | 
Description
Many dubious bots send a Referer header that is equal to the current URL, presumably to work around checks for empty Referer headers. Some of these bots are also poorly implemented and trigger a stupid amount of 404s.
BrokenLinkEmailsMiddleware is smart enough not to report 404s without a Referer. I suggest making it not report 404s when the Referer is equal to the current URL either.
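The proposed rule can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the middleware's actual code: the function name `should_report_404` and its two string arguments are hypothetical, and the real middleware applies other checks that are left out here.

```python
def should_report_404(referer, current_url):
    """Decide whether a 404 looks like a genuine broken link.

    Hypothetical helper for illustration; both arguments are absolute URLs
    as plain strings (the referer may be empty or None).
    """
    # Existing behaviour described in the ticket: no Referer, no report.
    if not referer:
        return False
    # Proposed behaviour: a Referer identical to the requested URL is
    # assumed to come from a bot, so it is not reported either.
    if referer == current_url:
        return False
    return True
```

With this sketch, `should_report_404("https://example.com/missing/", "https://example.com/missing/")` is `False`, while a Referer pointing at a different page still produces a report.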
Change History (13)
comment:1 by , 10 years ago
comment:2 by , 10 years ago
| Owner: | changed from to | 
|---|---|
| Status: | new → assigned | 
comment:3 by , 10 years ago
Should I update the documentation? It already states that requests with an empty Referer are ignored; I think we could add something about this new behaviour too.
See https://docs.djangoproject.com/en/1.8/howto/error-reporting/#errors
comment:5 by , 10 years ago
| Triage Stage: | Unreviewed → Accepted | 
|---|---|
| Version: | 1.8 → master | 
comment:7 by , 10 years ago
| Resolution: | fixed | 
|---|---|
| Status: | closed → new | 
I ran 1.9 RC 1 in production for a few days and sadly, the fix doesn't yield the results I hoped for.
Broken bots are hardcoded to use http://<domain><url> as the referer. However, Django's check is sensitive to the scheme. Since I run on https://..., the condition added to fix this ticket never triggers.
Would it make sense to also ignore the scheme in the check?
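A scheme-insensitive comparison could look roughly like the sketch below. It is not the code from the pull request: the helper name `same_url_ignoring_scheme` is hypothetical, and comparing host, path and query string is an assumption about what "equal" should mean here.

```python
from urllib.parse import urlsplit


def same_url_ignoring_scheme(referer, current_url):
    """Return True if both URLs match once the scheme is disregarded.

    Hypothetical helper for illustration; both arguments are absolute URLs.
    """
    ref = urlsplit(referer)
    cur = urlsplit(current_url)
    # Compare everything except the scheme (and the fragment, which
    # browsers do not send to the server anyway).
    return (ref.netloc, ref.path, ref.query) == (cur.netloc, cur.path, cur.query)
```

Under these assumptions, a Referer of `http://example.com/missing/` matches a request for `https://example.com/missing/`, which is the case the scheme-sensitive check misses.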
comment:8 by , 10 years ago
I submitted a pull request: https://github.com/django/django/pull/5730.
I would like to backport it to 1.9 before it's released.
comment:9 by , 10 years ago
I don't mind ignoring the scheme indeed, +100 for the PR :) Did not think of it but it is clearly something that should be ignored. 
comment:10 by , 10 years ago
One hour after posting this PR, I received 160 emails like the following in 8 minutes:
Referrer: http://REDACTED.com/libraries/joomla/html/language/en-GB/en-GB.jhtmldate.ini
Requested URL: /libraries/joomla/html/language/en-GB/en-GB.jhtmldate.ini
User agent: Go 1.1 package http
IP address: 162.158.56.47
I'm wondering if anyone besides me actually uses this in production...
comment:11 by , 10 years ago
| Has patch: | set | 
|---|---|
| Triage Stage: | Accepted → Ready for checkin | 
Just to be clear: a 404 with a Referer equal to the current URL must be a false positive, because there's no way you can be coming from a page that doesn't exist.
If the page disappeared between the previous and the current request, then the problem is solved, since the page containing the broken link no longer exists.