Opened 3 years ago

Closed 3 years ago

#16541 closed Bug (wontfix)

A broken URL should not be handled as a 400 Bad Request; it should probably be a 404 Not Found

Reported by: kinpoo
Owned by: nobody
Component: HTTP handling
Version: 1.3
Severity: Normal
Keywords:
Cc: kinpoo
Triage Stage: Design decision needed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: yes

Description

For a broken URL such as "/%E5", Django currently catches the UnicodeDecodeError and returns HttpResponseBadRequest.

  1. I think this should be handled as a bad URL and answered with 404 Not Found, not 400 Bad Request.
  2. Sometimes extra information is added to a URL for SEO reasons, for example "/123-外贸", encoded as "/123-%E5%A4%96%E8%B4%B8". If that URL is broken and looks like "/123-%E5", I can still find the correct resource and redirect to the correct URL.
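
To make the failure concrete, here is a minimal sketch in Python 3 using only the standard library (not Django's own code path). The helper name `decode_path` is an assumption for illustration. It percent-decodes a path and then decodes the bytes as UTF-8, which fails for the truncated URL because a lone 0xE5 byte is an incomplete UTF-8 sequence.

```python
from urllib.parse import unquote_to_bytes


def decode_path(path, errors="strict"):
    """Percent-decode a URL path, then decode the raw bytes as UTF-8."""
    return unquote_to_bytes(path).decode("utf-8", errors)


full = "/123-%E5%A4%96%E8%B4%B8"
truncated = "/123-%E5"

# The complete escape sequence decodes cleanly.
print(decode_path(full))

try:
    decode_path(truncated)
except UnicodeDecodeError as exc:
    # 0xE5 starts a three-byte UTF-8 sequence, so a lone 0xE5 is invalid.
    print("strict decoding failed:", exc.reason)

# Dropping the undecodable bytes keeps the numeric prefix usable.
print(decode_path(truncated, errors="ignore"))
```

With `errors="ignore"` the truncated path still yields the "/123-" prefix, which is the part item 2 relies on to find the resource.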

Attachments (1)

ticket-16541.diff (3.0 KB) - added by kinpoo 3 years ago.

Change History (5)

Changed 3 years ago by kinpoo

comment:1 Changed 3 years ago by kinpoo

  • Cc kinpoo added
  • Component changed from Uncategorized to HTTP handling
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • UI/UX set

comment:2 Changed 3 years ago by kmtracey

  • Resolution set to wontfix
  • Status changed from new to closed

Item number 1 in the description is a design decision that has already been made. See #5738 for full discussion, particularly comment 10 (https://code.djangoproject.com/ticket/5738#comment:10), which describes the current code. At this point I don't see a compelling reason presented here to reverse that decision, so for item 1 I'd close this as a wontfix.

I can't quite follow what the problem is in item 2. If you are saying that you want to be able to recover from someone breaking a full valid URL string ("/123-%E5%A4%96%E8%B4%B8") and sending the server only part of it ("/123-%E5"), that too I'd classify as wontfix. I don't see how it would be possible in the general case for the server to correctly figure out what the real target for the incomplete URL had been. If your code could indeed recover from this if only the WSGIRequest class would not throw an error on the attempt to decode, then the comment noted above in #5738 describes how in your special case you could set things up to allow that: set up your system with a WSGIHandler that has a special WSGIRequest class that behaves the way you want.

Therefore I'm closing wontfix, though if I've misunderstood the problem you are trying to describe in 2 and you think there is really a more general problem with the current code please try to spell it out a little more clearly.

comment:3 Changed 3 years ago by kinpoo

  • Resolution wontfix deleted
  • Status changed from closed to reopened
  • Triage Stage changed from Unreviewed to Design decision needed

Sorry, my English is not good. Maybe that made it hard for you to understand me.

For this ticket the point is item 1, not item 2.

For item 1, look at what other servers do: most HTTP servers return a 404 page. For example:

And the HTTP/1.1 Status Code Definitions say:

10.4.1 400 Bad Request

The request could not be understood by the server due to malformed syntax. The client SHOULD NOT repeat the request without modifications.

The "malformed syntax" should refer to malformed headers, not to a bad URL. For example: "GET / HTTP/1.1 NO NEWLINE HERE HOST: example.com\r\n"

And if we ignore the UnicodeDecodeError, we can do more: we can try to resolve the URL, and return a 404 if it cannot be resolved.


For item 2: we can control our own code and keep it DRY, but we cannot control how URLs are passed around. Some sites truncate long URLs; some users copy URLs carelessly. So we should make every effort to repair the URL for a better user experience. We can try to fix it like this:

# in urls.py
    # ...
    url(r'^wiki/(?P<id>\w+)-(?P<title>.*)$', 'article'),
    #...

# in views.py
# ...
def article(request, id, title=''):
    article = get_object_or_404(Article, id=id)
    if title != article.title:
        return http.HttpResponsePermanentRedirect(article.get_absolute_url())
    # ...
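
The same resolve-by-id, redirect-to-canonical idea can be sketched without Django. This is only an illustration: the `ARTICLES` dictionary and the `resolve` function are hypothetical stand-ins for the Article model and the view, and the path is assumed to be already decoded, with no leading slash.

```python
import re

# Hypothetical in-memory stand-in for the Article table: id -> title.
ARTICLES = {"123": "外贸"}

# Same pattern as the urls.py entry above.
WIKI_PATH = re.compile(r"^wiki/(?P<id>\w+)-(?P<title>.*)$")


def resolve(path):
    """Return (status, location) for a decoded path without the leading slash."""
    match = WIKI_PATH.match(path)
    if match is None or match.group("id") not in ARTICLES:
        return 404, None
    article_id = match.group("id")
    canonical = "wiki/%s-%s" % (article_id, ARTICLES[article_id])
    if path != canonical:
        # The title part is wrong or truncated: redirect to the canonical URL.
        return 301, canonical
    return 200, None
```

Under these assumptions, a path whose title was cut off, such as "wiki/123-", still matches on the id and is redirected to the canonical path, while an unknown id gives a 404.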

For now I can process PATH_INFO before it reaches django_handler in my fapws3 server script, so there is no need to set up a special WSGIRequest class.

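from django.utils.encoding import force_unicode

def application(env, start_response):
    # Drop undecodable bytes from the path before Django sees it.
    env['PATH_INFO'] = force_unicode(env.get('PATH_INFO', u'/'), errors='ignore')
    return list(django_handler.handler(env, start_response))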
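
For comparison, a rough Python 3 equivalent of this wrapper can be written as generic WSGI middleware. This is a sketch under assumptions, not part of Django or fapws3: the names `lenient_path_middleware` and `echo_app` are hypothetical, and it relies on the PEP 3333 convention that PATH_INFO is a native string holding latin-1-decoded bytes.

```python
def lenient_path_middleware(app):
    """Wrap a WSGI app so undecodable bytes in PATH_INFO are dropped."""

    def wrapped(environ, start_response):
        # PEP 3333 passes PATH_INFO as a str holding latin-1-decoded bytes.
        raw = environ.get("PATH_INFO", "/").encode("latin-1")
        # Decode as UTF-8, silently discarding incomplete sequences.
        cleaned = raw.decode("utf-8", "ignore")
        # Re-encode into the latin-1 "bytes as str" form WSGI expects.
        environ["PATH_INFO"] = cleaned.encode("utf-8").decode("latin-1")
        return app(environ, start_response)

    return wrapped


def echo_app(environ, start_response):
    """Hypothetical downstream app that echoes the path bytes it received."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [environ["PATH_INFO"].encode("latin-1")]
```

Wrapping the real application this way keeps the lenient decoding outside the framework, so the framework's own strict handling stays untouched for anyone who does not opt in.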

Still, I think it should return a 404, as other servers do. Thank you.

Version 0, edited 3 years ago by kinpoo (next)

comment:4 Changed 3 years ago by jacob

  • Resolution set to wontfix
  • Status changed from reopened to closed

I agree with kmtracey; marking wontfix again. kinpoo, if you'd like to discuss this further, please take it to the django-dev mailing list; discussion here on Trac doesn't work all that well.

Thanks!
