Opened 12 years ago
Closed 12 years ago
#20099 closed New feature (fixed)
Allow filtering 404 reports by user agent
Reported by: | Ram Rachum | Owned by: | nobody |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | yes |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I get a lot of 404 reports to my inbox that are caused by various spiders. I'd like to block these. The same way we have a mechanism for blocking 404 reports for certain URLs, we should be able to block 404 reports with user agents that are easily discernible as spiders.
Attachments (5)
Change History (28)
comment:1 by , 12 years ago
Component: | Uncategorized → HTTP handling |
---|---|
Triage Stage: | Unreviewed → Accepted |
follow-up: 3 comment:2 by , 12 years ago
I agree regarding email, it's better to redirect to services like Sentry.
Regarding the first item, do we need anything beyond just adding , user_agent=None
to the definition of is_ignorable_404
and sending in the user agent in process_response
?
comment:3 by , 12 years ago
Replying to coolRR:
Regarding the first item, do we need anything beyond just adding
, user_agent=None
to the definition ofis_ignorable_404
and sending in the user agent inprocess_response
?
Yes, something like that.
comment:4 by , 12 years ago
Has patch: | set |
---|
Patch attached. Let me know whether it's good. (I don't work often with patches, this one was generated by Git, I don't know whether you can use it in hg.)
comment:5 by , 12 years ago
Needs documentation: | set |
---|---|
Needs tests: | set |
Django is in Git (Python is in hg), so yes, the format is right.
Now we still miss tests and docs.
by , 12 years ago
Attachment: | 0001-Tests-for-user-agent-input-to-is_ignorable_404.patch added |
---|
Tests
comment:6 by , 12 years ago
Here's a test. It's pretty lame; I would have preferred a test that creates a subclass but I'm very clueless about Django's test architecture, so I don't know how to do it in a way that won't bother other Django developers.
comment:7 by , 12 years ago
Needs tests: | unset |
---|
Here's how I would have tested that functionality. Of course, testing is often a matter of personal taste (unit-oriented or functionality-oriented). But it's a little more in line with other tests in the same class.
comment:8 by , 12 years ago
At this point, it would make more sense to pass the whole request
to is_ignorable_404
.
Please provide a single patch including both code, tests and docs changes.
comment:9 by , 12 years ago
aaugustin: I considered giving the request as an argument so the user could do anything they want, but then I thought that if the user needs so much freedom, maybe they're better off replacing BrokenLinkEmailsMiddleware completely.
comment:10 by , 12 years ago
I'd just like to avoid further tickets asking to add more keyword arguments.
by , 12 years ago
Attachment: | 0001-Implement-BrokenLinkEmailsMiddleware.is_request_we_s.patch added |
---|
comment:11 by , 12 years ago
aaugustin: I've submitted a patch which changes the method to take the request. It also moves the existing request-checking logic into that method. I didn't do docs yet, I want to first know whether this approach is acceptable.
comment:12 by , 12 years ago
This is entering bikeshedding territory, the only thing I don't like in this patch is the name is_request_we_should_notify_for
; is_ignorable_request
would be much more straightforward.
comment:13 by , 12 years ago
I'm not attached to the name (I felt it was too long but couldn't come up with a shorter one.)
Note though that is_ignorable_request
is a negative name while the name I introduced is positive. (I flipped the logic of the method, I think that all those double negatives are really annoying.)
If you can come up with a short and positive name, I'll be happy to hear it.
comment:14 by , 12 years ago
I thought about the double negative, but I felt it was still worth emphasizing the relation with IGNORABLE_404_URLS.
comment:15 by , 12 years ago
I'm not sure what you're suggesting. If you're suggesting to keep the old name and negativity so it'll be easier for people to understand that this method is related to IGNORABLE_404_URLS
, I disagree.
comment:16 by , 12 years ago
#10214 was closed as a duplicate. The patch for this ticket should add a note in the docs, explaining that subclassing the middleware is the recommended way to alter its behavior.
by , 12 years ago
Attachment: | 0001-Implement-BrokenLinkEmailsMiddleware.is_request_we_s.2.patch added |
---|
comment:20 by , 12 years ago
Added pull request, if that helps:
https://github.com/django/django/pull/1014
If you want, I can rename is_request_we_should_notify_for
to is_relevant_request
.
comment:21 by , 12 years ago
Revisiting this issue after some time, here's an alternative approach: https://github.com/django/django/pull/1211
We need a referee now :-)
comment:22 by , 12 years ago
Cc: | added |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Left some minor comment on @claudep's PR. Apart from those the patch looks RFC.
comment:23 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Globally, I'm not thrilled by the proposal. Firstly, redirecting 404 to inboxes is not a best practice, as generally it is better to redirect to a service like Sentry. Secondly, this filtering is depending on projects, there might be people wanting to make stats about 404 generated by spiders. And thirdly, the spider's UA are changing and hard-coding a list in Django would be subject to frequent updates.
However, we should ease middleware customization so as filtering is easier. I would like to counsel people to subclass
BrokenLinkEmailsMiddleware
and write their ownis_ignorable_404
method. Unfortunately, the user agent is not accessible for that method. So I'm accepting this ticket with two goals:is_ignorable_404
is_ignorable_404
method of https://docs.djangoproject.com/en/dev/ref/middleware/#django.middleware.common.BrokenLinkEmailsMiddleware