Opened 12 months ago

Last modified 11 months ago

#34989 new Cleanup/optimization

Set Content-Length where possible for HttpResponses.

Reported by: Jeppe Fihl-Pearson Owned by: Anders Kaseorg
Component: HTTP handling Version: 4.2
Severity: Normal Keywords: CommonMiddleware uWSGI
Cc: Florian Apolloner Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

We upgraded from Django 4.1.13 to 4.2.7 this week, and as part of this, we noticed redirects for URLs that added a missing trailing slash took 4-5 seconds. Before the upgrade, the redirects only took a fraction of a second to return.

Debugging this, I've found out it's related to the change made for #33700. If I revert that change the delay goes away.

I've also found out this requires a slightly specific setup. So far I've only seen it when using uWSGI as the application server using the http11-socket option. I am not able to reproduce the issue when using http-socket for configuring the listening socket.

I have adapted the reproduction case from #33700 to show this issue. It can be found at https://github.com/Tenzer/django-slow-redirect. When making a request against the http-socket port, it returns immediately:

% time curl -si http://127.0.0.1:8010/url
HTTP/1.1 301 Moved Permanently
Content-Type: text/html; charset=utf-8
Location: /url/

curl -si http://127.0.0.1:8010/url  0.01s user 0.01s system 65% cpu 0.033 total

but when I make a request against the http11-socket port, it instead takes at least 4 seconds extra:

% time curl -si http://127.0.0.1:8011/url
HTTP/1.1 301 Moved Permanently
Content-Type: text/html; charset=utf-8
Location: /url/

curl -si http://127.0.0.1:8011/url  0.01s user 0.01s system 0% cpu 4.026 total

I am assuming this might be hitting a timeout somewhere but I can't work out where that might be or how the changes in https://github.com/django/django/pull/15689 might have caused it.

We have for now had to switch to using the http-socket option in uWSGI, but it would be ideal if the http11-socket option also would work, as that is required for uWSGI to support keep-alive.

Change History (13)

comment:1 by Florian Apolloner, 12 months ago

Cc: Florian Apolloner added

Can you compare tcpdumps between the two options, that might provide some indication of what is at fault.

comment:2 by Anders Kaseorg, 12 months ago

Has patch: set
Owner: changed from nobody to Anders Kaseorg
Status: newassigned

The difference is that we stopped setting Content-Length: 0 on these redirects. It is probably a uWSGI bug that removing Content-Length causes uWSGI to delay for socket-timeout seconds in this scenario; however, restoring it is easy and makes this symptom disappear.

https://github.com/django/django/pull/17509

comment:3 by Florian Apolloner, 12 months ago

I am against fixing this if a Content-Length is not required on redirects (which I doubt it is). A bug like this is perfect to expose issues in WSGI implementations.

comment:4 by Florian Apolloner, 12 months ago

Case in point, it allowed me to identify a possible issue in my WSGI changes to hypercorn: https://github.com/pgjones/hypercorn/pull/155/files#r1402530266

comment:5 by Anders Kaseorg, 12 months ago

A redirect response is allowed to have a body, so in order for keep-alive to work, the HTTP server must send either Content-Length or Transfer-Encoding: chunked. It’s less clear that the WSGI application should be responsible for sending it explicitly. I would have expected uWSGI to automatically add Transfer-Encoding if Content-Length is not provided, but evidently it doesn’t.

This is easy to reproduce without Django, so I don’t think we should feel obligated to avoid fixing this just to have a reproduction recipe.

def application(environ, start_response):
    body = b"Hello, world!"
    start_response(
        "200 OK",
        [
            # ("Content-Length", str(len(body))),  # uncomment to fix
        ],
    )
    yield body
Version 0, edited 12 months ago by Anders Kaseorg (next)

comment:6 by Florian Apolloner, 12 months ago

the HTTP server must send either Content-Length or Transfer-Encoding: chunked

I do not think this is true, a third option is most likely to simply close the connection.

It’s less clear that the WSGI application should be responsible for sending it explicitly.

It is explicitly clear that the WSGI application is not required to send this in PEP 3333 (https://peps.python.org/pep-3333/#handling-the-content-length-header):

If the application does not supply a Content-Length header, a server or gateway may choose one of several approaches to handling it. The simplest of these is to close the client connection when the response is completed.

So I would argue that this is a bug in uWSGI. This is also not the first WSGI incompatibility in uWGSI, for instance I fixed a rather annoying issue with close: https://github.com/unbit/uwsgi/commit/1746254fec076bd52e7682187041e82fafad5427

This is easy to reproduce without Django, so I don’t think we should feel obligated to avoid fixing this just to have a reproduction recipe.

The thing is, if we fix this in Django just to workaround about a spec incompatibility in uWSGI (imo), then the next framework has to do the same and we just might hide a harder to fix bug. Fixing this in uWSGI would properly fix this for all frameworks.

comment:7 by Mariusz Felisiak, 12 months ago

Resolution: invalid
Status: assignedclosed

I agree with Florian that it's not an issue in Django itself. This should be reported and fixes in uWSGI.

comment:8 by Anders Kaseorg, 12 months ago

So let’s fix it in both Django and uWSGI? Even after uWSGI is fixed, we want Django to send Content-Length anyway when possible, because Content-Length is more efficient than the alternatives, and uWSGI isn’t in a position to generate Content-Length because HttpResponse doesn’t have a len().

comment:9 by Florian Apolloner, 12 months ago

I just looked at the PR and there might be a point to move https://github.com/andersk/django/blob/de1ee5ee0ba51229ec8ceb7146173533f45ace51/django/middleware/common.py#L114 into our WSGI handler because the middleware does not have to be active. Whether this would mean that HTTPResponse should have a __len__ or we simply add logic in the wsgi handler, I am not sure yet. The PR as it stands now is not the proper fix though I think.

comment:10 by Anders Kaseorg, 12 months ago

To be clear, uWSGI doesn’t currently implement the __len__ optimization and the spec doesn’t require it. So that’s another path that would require changing both Django and uWSGI, with the disadvantage that only some WSGI gateways would benefit.

comment:11 by Florian Apolloner, 12 months ago

Resolution: invalid
Status: closednew

Yeah, in general I'd expect the server to optimize where possible, so it is rather surprising that uWSGI doesn't do that.

It is unfortunate that HttpResponse does not have a __len__ and we cannot monkey patch it onto it while also supporting streaming via generators without a known length because Python does something along the lines of type(x).__len__(obj), so even if hasattr() sees __len__, calling len() on it works only if the class has __len__.

I also guess we might not exactly be able to get rid of setting the Content-Length in the CommonMiddleware because we document it: https://docs.djangoproject.com/en/4.2/ref/middleware/#module-django.middleware.common

So the way I see it we have two options:

  • Improve the CommonMiddleware as in your PR (I am saying improve and not fix, because technically Django does nothing spec incompatible, so it is rather a performance improvement. No matter how we spin it, there is a deeper bug in uWSGI I think -- this also means there will be no backports).
  • Also add a similar code to our WSGI/ASGI handlers and set the Content-Length where possible. I am in favor of doing this, since Anders is right that a server cannot optimize given that HttpResponse has no __len__. I am slighty worried about backwards compatibility here though, but I honestly cannot think of a case where setting the Content-Length would cause problems.

As such I am preliminarily reopening the ticket and if Mariusz is fine with it let's set it to accepted. The title etc will be a little bit off then, but I guess that is fine.

Would you mind opening a ticket against uWSGI and cross-link it here?

comment:12 by Mariusz Felisiak, 12 months ago

Has patch: unset
Summary: APPEND_SLASH redirects much slower in Django 4.2 when used with uWSGI's http11-socketSet Content-Length where possible for HttpResponses.
Triage Stage: UnreviewedAccepted
Type: BugCleanup/optimization

Tentatively accepted.

comment:13 by Florian Apolloner, 11 months ago

I have been digging a bit more in the RFC and I think setting the Content-Length to 0 is actually preferred since it indicates that there is really no content (which a redirect is allowed to have).

See https://www.rfc-editor.org/rfc/rfc9110#field.content-length:

Aside from the cases defined above, in the absence of Transfer-Encoding, an origin server SHOULD send a Content-Length header field when the content size is known prior to sending the complete header section. This will allow downstream recipients to measure transfer progress, know when a received message is complete, and potentially reuse the connection for additional requests.

Note: See TracTickets for help on using tickets.
Back to Top