Opened 13 years ago
Closed 13 years ago
#16541 closed Bug (wontfix)
A broken URL should not be handled as a 400 Bad Request; it might be a 404 Not Found
Reported by: | kinpoo | Owned by: | nobody |
---|---|---|---|
Component: | HTTP handling | Version: | 1.3 |
Severity: | Normal | Keywords: | |
Cc: | kinpoo | Triage Stage: | Design decision needed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | yes |
Description
For a broken URL such as "/%E5", Django currently catches the UnicodeDecodeError and returns HttpResponseBadRequest.
- I think it should be handled as a bad URL and return 404 Not Found, not 400 Bad Request.
- Sometimes extra, nonessential information is added to a URL for SEO reasons. For example, "/123-外贸" is encoded as "/123-%E5%A4%96%E8%B4%B8". If the URL gets broken so that it looks like "/123-%E5", I could still find the correct resource and redirect to the correct URL.
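To make the failure concrete, here is a minimal sketch of why the truncated path cannot be decoded. It is written for modern Python 3 rather than the Python 2 of Django 1.3, and the helper name `decode_path` is hypothetical; it is not a Django API.

```python
from urllib.parse import unquote_to_bytes


def decode_path(raw_path):
    """Percent-decode a URL path and interpret the bytes as UTF-8.

    Hypothetical helper for illustration only; not part of Django.
    Raises UnicodeDecodeError when the bytes are not valid UTF-8.
    """
    return unquote_to_bytes(raw_path).decode("utf-8")


# The complete URL from the description decodes cleanly.
full = decode_path("/123-%E5%A4%96%E8%B4%B8")

# The truncated URL ends in a lone UTF-8 lead byte (0xE5), so decoding fails.
try:
    decode_path("/123-%E5")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False
```

The byte 0xE5 announces a three-byte sequence, so once the two continuation bytes are cut off the path is no longer valid UTF-8. That is the condition under which the 400 response described above is produced.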
Attachments (1)
Change History (5)
by , 13 years ago
Attachment: | ticket-16541.diff added |
---|
comment:1 by , 13 years ago
Cc: | added |
---|---|
Component: | Uncategorized → HTTP handling |
UI/UX: | set |
comment:2 by , 13 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
comment:3 by , 13 years ago
Resolution: | wontfix |
---|---|
Status: | closed → reopened |
Triage Stage: | Unreviewed → Design decision needed |
Sorry, my English is not good, so it may be hard for you to understand me.
For this ticket, the main point is item 1, not item 2.
For item 1, you can see what other servers do: most HTTP servers return a 404 page. For example:
- http://www.google.com/%E5
- http://www.apache.org/%E5
- http://www.nginx.org/%E5
- http://www.lighttpd.net/%E5
- http://www.iis.net/%E5
- http://www.cherokee-project.com/%E5
And in HTTP 1.1: Status Code Definitions, it says:
10.4.1 400 Bad Request
The request could not be understood by the server due to malformed syntax. The client SHOULD NOT repeat the request without modifications.
The "malformed syntax" should refer to malformed headers, not a bad URL. For example: "GET / HTTP/1.1 NO NEWLINE HERE HOST: example.com\r\n"
And if we ignore the UnicodeDecodeError, we can do more: we can try to resolve the URL, and if it cannot be resolved, return 404.
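As a rough illustration of the behaviour being requested (not of what Django does, and not an approach endorsed in this ticket), a generic WSGI wrapper could answer 404 when the path is not valid UTF-8. This sketch assumes Python 3 and a PEP 3333 server, where PATH_INFO is a str holding the raw bytes decoded as latin-1; the name `not_found_on_bad_path` is hypothetical.

```python
def not_found_on_bad_path(app):
    """Wrap a WSGI app so undecodable paths get a 404 instead of reaching it.

    Hypothetical wrapper for illustration; assumes PEP 3333 semantics,
    i.e. PATH_INFO is a str containing bytes decoded as latin-1.
    """

    def wrapper(environ, start_response):
        # Recover the raw bytes the client sent (after percent-decoding).
        raw = environ.get("PATH_INFO", "/").encode("latin-1")
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            body = b"Not Found"
            start_response(
                "404 Not Found",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Valid UTF-8: hand the request to the wrapped application unchanged.
        return app(environ, start_response)

    return wrapper
```

Such a wrapper sits in front of the application, so the framework's own 400 handling is never reached for these paths.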
For item 2, we can control our own code and keep it DRY, but we cannot control how URLs are passed around. Some sites truncate long URLs; some users copy URLs carelessly. So we should make every effort to fix the URL for a better user experience. We can try to fix it like this:

    # in urls.py
    # ...
    url(r'^wiki/(?P<id>\w+)-(?P<title>.*)$', 'article'),
    # ...

    # in views.py
    # ...
    def article(request, id, title=''):
        article = get_object_or_404(Article, id=id)
        if title != article.title:
            return http.HttpResponsePermanentRedirect(article.get_absolute_url())
        # ...

For now, I can process PATH_INFO before django_handler in my fapws3 server script, with no need to set up a special WSGIRequest class.

    def application(env, start_response):
        env['PATH_INFO'] = force_unicode(env.get('PATH_INFO', u'/'), errors='ignore')
        return list(django_handler.handler(env, start_response))

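For readers on Python 3, the same idea might look roughly like the sketch below. It is written as a generic WSGI wrapper rather than against fapws3 or Django, the name `sanitize_path_info` is hypothetical, and it assumes PEP 3333 semantics where PATH_INFO is a str holding the raw bytes decoded as latin-1.

```python
def sanitize_path_info(app):
    """Wrap a WSGI app so PATH_INFO never contains invalid UTF-8 bytes.

    Hypothetical wrapper for illustration; assumes PEP 3333 semantics,
    i.e. PATH_INFO is a str containing bytes decoded as latin-1.
    """

    def wrapper(environ, start_response):
        # Recover the raw path bytes from the latin-1 str.
        raw = environ.get("PATH_INFO", "/").encode("latin-1")
        # Drop any bytes that do not form valid UTF-8 sequences.
        cleaned = raw.decode("utf-8", errors="ignore")
        # Store the result back in the latin-1 form WSGI expects.
        environ["PATH_INFO"] = cleaned.encode("utf-8").decode("latin-1")
        return app(environ, start_response)

    return wrapper
```

With this in place, a truncated path such as "/123-%E5" reaches the application as "/123-", so a view that matches on the numeric id alone can still look up the resource and redirect.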
But I still think it should return 404, like other servers do. Thank you.
comment:4 by , 13 years ago
Resolution: | → wontfix |
---|---|
Status: | reopened → closed |
I agree with kmtracy; marking wontfix again. kinpoo, if you'd like to discuss this further, please take it to the django-dev mailing list; discussion here on Trac doesn't work all that well.
Thanks!
Item number 1 in the description is a design decision that has already been made. See #5738 for full discussion, particularly comment 10 (https://code.djangoproject.com/ticket/5738#comment:10), which describes the current code. At this point I don't see a compelling reason presented here to reverse that decision, so for item 1 I'd close this as a wontfix.
I can't quite follow what the problem is in item 2. If you are saying that you want to be able to recover from someone breaking a full valid url string ("/123-%E5%A4%96%E8%B4%B8") and sending the server only ("/123-%E5") that too I'd classify as wontfix. I don't see how it would be possible in the general case for the server to correctly figure out what that real target for the incomplete url had been. If your code could indeed recover from this if only the WSGIRequest class would not throw an error on the attempt to decode, then the comment noted above in #5738 describes how in your special case you could set things up to allow that: set up your system with a WSGIHandler that has a special WSGIRequest class that behaves the way you want.
Therefore I'm closing wontfix, though if I've misunderstood the problem you are trying to describe in 2 and you think there is really a more general problem with the current code please try to spell it out a little more clearly.