Opened 9 years ago

Closed 8 years ago

Last modified 8 years ago

#25302 closed New feature (fixed)

BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL

Reported by: Aymeric Augustin
Owned by: Maxime Lorant
Component: HTTP handling
Version: dev
Severity: Normal
Keywords:
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Many dubious bots send a Referer header that is equal to the current URL, presumably to work around checks for empty Referer headers. Some of these bots are also poorly implemented and trigger a ridiculous number of 404s.

BrokenLinkEmailsMiddleware is smart enough not to report 404s without a Referer. I suggest also making it skip 404s when the Referer is equal to the current URL.
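The proposed rule is small enough to sketch as a standalone predicate. This is a minimal illustration, not Django's code: `should_report_404` is a hypothetical helper, and `current_url` is assumed to be the absolute URL of the request (in Django it could come from `request.build_absolute_uri()`).

```python
def should_report_404(referer: str, current_url: str) -> bool:
    """Decide whether a 404 deserves a broken-link email (hypothetical helper).

    current_url is assumed to be the absolute URL of the request,
    e.g. "https://example.com/missing/".
    """
    # No Referer: the middleware already ignores these requests.
    if not referer:
        return False
    # Referer identical to the requested URL: nobody can follow a link from
    # a page that does not exist, so treat the request as a misbehaving bot.
    if referer == current_url:
        return False
    # Anything else may be a genuine broken link on another page.
    return True


print(should_report_404("", "https://example.com/missing/"))                             # False
print(should_report_404("https://example.com/missing/", "https://example.com/missing/"))  # False
print(should_report_404("https://example.com/", "https://example.com/missing/"))          # True
print(should_report_404("http://example.com/missing/", "https://example.com/missing/"))   # True
```

The last call shows the weakness of an exact string comparison: a Referer that differs from the request URL only in its scheme is still reported.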

Change History (13)

comment:1 by Aymeric Augustin, 9 years ago

Just to be clear: a 404 with a Referer equal to the current URL must be a false positive, because there's no way you can be coming from a page that doesn't exist.

If the page disappeared between the previous and the current request, then the problem is solved, since the page containing the broken link no longer exists.

comment:2 by Maxime Lorant, 9 years ago

Owner: changed from nobody to Maxime Lorant
Status: new → assigned

comment:3 by Maxime Lorant, 9 years ago

Should I update the documentation? It already says that requests with an empty Referer are ignored; I think we could mention this new behaviour too...

See https://docs.djangoproject.com/en/1.8/howto/error-reporting/#errors

comment:4 by Aymeric Augustin, 9 years ago

Yes, you should update the documentation as well.

comment:5 by Claude Paroz, 9 years ago

Triage Stage: Unreviewed → Accepted
Version: 1.8 → master

comment:6 by Tim Graham <timograham@…>, 9 years ago

Resolution: fixed
Status: assigned → closed

In 4ce433e:

Fixed #25302 -- Prevented BrokenLinkEmailsMiddleware from reporting 404s when Referer = URL.

comment:7 by Aymeric Augustin, 8 years ago

Resolution: fixed
Status: closed → new

I ran 1.9 RC 1 in production for a few days and sadly, the fix doesn't yield the results I hoped for.

Broken bots are hardcoded to use http://<domain><url> as the Referer. However, Django's check is sensitive to the scheme. Since my site is served over https://, the condition added to fix this ticket never triggers.

Would it make sense to also ignore the scheme in the check?
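Ignoring the scheme amounts to comparing only the host and path of the Referer against the request. The sketch below uses `urllib.parse.urlsplit` from the standard library; `referer_matches_request` is a hypothetical helper, and `host` and `path` are assumed to be what Django's `request.get_host()` and `request.path` return. It illustrates the idea and is not the code that was merged for this ticket.

```python
from urllib.parse import urlsplit


def referer_matches_request(referer: str, host: str, path: str) -> bool:
    """Return True if the Referer points at host + path, whatever its scheme.

    Hypothetical helper: host is assumed to look like request.get_host()
    (e.g. "example.com" or "example.com:8000") and path like request.path.
    """
    parts = urlsplit(referer)
    # The scheme (http vs. https) is deliberately ignored; the host and the
    # path must still match exactly. The query string is not compared.
    return parts.netloc == host and parts.path == path


# A bot hardcoded to send "http://..." while the site is served over HTTPS.
print(referer_matches_request("http://example.com/missing/", "example.com", "/missing/"))   # True
print(referer_matches_request("https://example.com/missing/", "example.com", "/missing/"))  # True
print(referer_matches_request("https://example.com/", "example.com", "/missing/"))          # False
```

Because only the path is compared, a Referer such as `http://example.com/missing/?x=1` would also match; whether that is desirable is a judgment call for the real middleware.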

comment:8 by Aymeric Augustin, 8 years ago

I submitted a pull request: https://github.com/django/django/pull/5730.

I would like to backport it to 1.9 before it's released.

comment:9 by Maxime Lorant, 8 years ago

I don't mind ignoring the scheme, indeed; +100 for the PR :) I didn't think of it, but it is clearly something that should be ignored.

comment:10 by Aymeric Augustin, 8 years ago

One hour after posting this PR, I received 160 emails like the following in 8 minutes:

Referrer: http://REDACTED.com/libraries/joomla/html/language/en-GB/en-GB.jhtmldate.ini
Requested URL: /libraries/joomla/html/language/en-GB/en-GB.jhtmldate.ini
User agent: Go 1.1 package http
IP address:  162.158.56.47

I'm wondering if anyone besides me actually uses this in production...

comment:11 by Tim Graham, 8 years ago

Has patch: set
Triage Stage: Accepted → Ready for checkin

comment:12 by Aymeric Augustin <aymeric.augustin@…>, 8 years ago

Resolution: fixed
Status: new → closed

In 11f10b7:

Fixed #25302 (again) -- Ignored scheme when checking for bad referers.

The check introduced in 4ce433e was too strict in real life. The poorly
implemented bots this patch attempted to ignore are sloppy when it comes
to http vs. https.

comment:13 by Aymeric Augustin <aymeric.augustin@…>, 8 years ago

In 8dc11dc5:

[1.9.x] Fixed #25302 (again) -- Ignored scheme when checking for bad referers.

The check introduced in 4ce433e was too strict in real life. The poorly
implemented bots this patch attempted to ignore are sloppy when it comes
to http vs. https.

Backport of 11f10b7 from master
