Opened 2 years ago

Last modified 6 weeks ago

#20147 new New feature

Provide an alternative to request.META for accessing HTTP headers

Reported by: lukeplant Owned by: nobody
Component: HTTP handling Version: 1.5
Severity: Normal Keywords:
Cc: marc.tamlyn@…, tom@…, ben@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

From the docs:

HttpRequest.META
A standard Python dictionary containing all available HTTP headers...

With the exception of CONTENT_LENGTH and CONTENT_TYPE, as given above, any HTTP headers in the request are converted to META keys by converting all characters to uppercase, replacing any hyphens with underscores and adding an HTTP_ prefix to the name. So, for example, a header called X-Bender would be mapped to the META key HTTP_X_BENDER.
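
The documented rule can be sketched in a few lines of Python. This is only an illustration of the transformation quoted above; the function name header_to_meta_key is hypothetical and not part of Django.

```python
def header_to_meta_key(name):
    """Return the META key for an HTTP header name, per the documented rule.

    Hypothetical helper for illustration; not a Django API.
    """
    # Uppercase everything and replace hyphens with underscores.
    key = name.upper().replace("-", "_")
    # CONTENT_LENGTH and CONTENT_TYPE are the documented exceptions:
    # they do not receive the HTTP_ prefix.
    if key in ("CONTENT_LENGTH", "CONTENT_TYPE"):
        return key
    return "HTTP_" + key


print(header_to_meta_key("X-Bender"))  # HTTP_X_BENDER
```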

The question is, why? Why do we have this ridiculous transform? It is pure silliness, whose only explanation is a quirk of CGI, which is now totally irrelevant.

You should be able to look up a header in the HTTP spec and do something very simple to get it from the HTTP request. How about this API:

request.HEADERS['Host']

(for consistency with GET/POST/FILES etc.), or even

request['Host']

Dictionary access should obey HTTP rules about case-sensitivity of the header names.

This would also have the advantage that repr(request) wouldn't contain lots of junk you don't need, i.e. the entire content of os.environ, which, on a developer machine especially, can have a lot of noise (mine does).

It also future-proofs us for when WSGI is replaced with something more sensible, and the whole silly round trip to os.environ can be removed completely, or if we want to support something else parallel to WSGI and client code wants to access HTTP headers in the same way for both.

This leaves a few things in META that are not derived from an HTTP header and that cannot otherwise be accessed from the request object. I think these are just:

  • SCRIPT_NAME - this is a CGI leftover, that is only useful in constructing other things, AFAICS
  • QUERY_STRING - this can be easily constructed from request.get_full_path() for the rare times that you need the raw query string rather than request.GET
  • SERVER_NAME - should use get_host() instead
  • SERVER_PORT - use get_host()
  • SERVER_PROTOCOL - could use is_secure(), but perhaps it would be nice to have a convenience get_protocol() method.

(see http://wsgi.readthedocs.org/en/latest/definitions.html)

Change History (15)

comment:1 Changed 2 years ago by lukeplant

A strong argument against the request['Referer'] API is the use of request in templates (e.g. if request.GET.some_flag), which conflates dictionary access and attribute access, probably making request.HEADERS['Referer'] a much safer API.

comment:2 Changed 2 years ago by anonymous

HTTP headers are case insensitive. You want to get rid of the transform, but what happens when someone sends "accept: " and you check for HEADERS["Accept"]?

comment:3 Changed 2 years ago by lukeplant

As stated above, "Dictionary access should obey HTTP rules about case-sensitivity of the header names."

I didn't say get rid of the transform - it should be done within the API, not by the user of the API. In terms of implementation, request.HEADERS['Accept'] will map straight to request._META['HTTP_ACCEPT'], at least for wsgi, or do something equivalent that will ensure case-insensitivity.
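
A minimal sketch of what such a mapping might look like, assuming a plain WSGI-style META dict. The class name HeadersSketch is hypothetical and this is not the implementation proposed in the ticket; it only shows that the transform can live inside the API so that lookups are case-insensitive.

```python
from collections.abc import Mapping


class HeadersSketch(Mapping):
    """Read-only, case-insensitive view of the HTTP headers in a META dict.

    Hypothetical illustration only; not a Django class.
    """

    # Headers that WSGI stores without the HTTP_ prefix.
    UNPREFIXED = {"CONTENT_LENGTH", "CONTENT_TYPE"}

    def __init__(self, meta):
        self._meta = meta

    @classmethod
    def _meta_key(cls, name):
        key = name.upper().replace("-", "_")
        return key if key in cls.UNPREFIXED else "HTTP_" + key

    def __getitem__(self, name):
        # Any spelling of the header name maps to the same META key.
        return self._meta[self._meta_key(name)]

    def __iter__(self):
        # The original spelling is lost in META, so names are rebuilt
        # in a conventional Title-Case form.
        for key in self._meta:
            if key.startswith("HTTP_"):
                yield key[5:].replace("_", "-").title()
            elif key in self.UNPREFIXED:
                yield key.replace("_", "-").title()

    def __len__(self):
        return sum(1 for _ in self)


meta = {"HTTP_ACCEPT": "text/html", "CONTENT_TYPE": "text/plain", "SERVER_NAME": "localhost"}
headers = HeadersSketch(meta)
print(headers["accept"], headers["Accept"], headers["ACCEPT"])
```

Non-header entries such as SERVER_NAME are not exposed, which also keeps os.environ noise out of the mapping.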

comment:4 Changed 2 years ago by lukeplant

There are a few more things that need considering if this is to be done:

  • RequestFactory and the test Client, and their APIs which pass directly to request.META.
  • REMOTE_ADDR, REMOTE_USER
  • SECURE_PROXY_SSL_HEADER

comment:5 Changed 2 years ago by carljm

Minor bikeshed-type question: is there really value in making request.HEADERS all-caps? I realize the parallel to request.POST, request.GET, and request.META, but the former two are all-caps simply because HTTP methods are usually written that way. I guess I'd just like to see a bit of rationale spelled out for how we decide whether a given request attribute ought to be all-caps; I'd probably lean towards just request.headers for the new API.

More discussion of this proposal (in particular, whether to deprecate/change request.META) is here: https://groups.google.com/d/topic/django-developers/Jvs3F79cY4Y/discussion

comment:6 Changed 2 years ago by mjtamlyn

  • Cc marc.tamlyn@… added

It would be consistent for request.headers to be lowercase to match up with request.body for example.

comment:7 Changed 2 years ago by anonymous

Should we consider having request.headers return unicode values rather than byte values?

Correctly decoding HTTP headers is slightly fiddly - the default supported encoding is iso-8859-1, but utf-8 can also be supported as per RFC 2231 and RFC 5987.

Getting the decoding right probably isn't something we want developers to have to think about.

Note: For real-world usage see this example of browser support for utf-8 in uploaded filenames: https://code.google.com/p/chromium/issues/detail?id=57830

comment:8 Changed 2 years ago by tomchristie

  • Cc tom@… added

(Ooops, that anonymous comment is mine.)

comment:9 Changed 2 years ago by tomchristie

Okay, noticed that the link to chrome's use of iso-8859-1 is actually for response headers, so disregard that.

The question regarding unicode vs byte values still stands, though.

comment:10 Changed 2 years ago by lukeplant

I'm happy with request.headers instead of request.HEADERS - the parallel to request.body does make more sense than request.GET.

Regarding unicode/bytes, it's a very thorny issue, and the more I look into it the worse it gets. PEP 3333 might apply, if we are assuming a simple mapping to request.META, but that essentially leaves decoding issues to the user if I'm reading it correctly.
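
To illustrate why this leaves decoding to the user: under PEP 3333 on Python 3, environ values are native strings whose code points stand for the raw bytes (ISO-8859-1). A rough sketch of recovering another encoding from such a value follows; the helper name wsgi_str_to_text and the fallback behaviour are assumptions made for illustration.

```python
def wsgi_str_to_text(value, encoding="utf-8"):
    """Reinterpret a PEP 3333 native string using another encoding.

    Hypothetical helper: falls back to the ISO-8859-1 reading when the
    underlying bytes are not valid in the requested encoding.
    """
    # Recover the original bytes; this cannot fail for a PEP 3333 string,
    # whose code points are all below 256.
    raw = value.encode("iso-8859-1")
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return value


# A UTF-8 header value as a WSGI server on Python 3 would present it.
as_seen_in_environ = "café".encode("utf-8").decode("iso-8859-1")
print(wsgi_str_to_text(as_seen_in_environ))
```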

comment:11 Changed 2 years ago by tomchristie

Okay, maybe it's not obvious if unicode values would be preferable or not.

I thought I'd take a look at what the requests library does, and found this similar ticket: https://github.com/kennethreitz/requests/pull/1181

If it is something that we decide to do, then the following looks like it ought to do the trick:

from email.header import decode_header
u''.join(header_bytes.decode(enc or 'iso-8859-1') for header_bytes, enc in decode_header(h))
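
The one-liner above is written for Python 2. On Python 3, decode_header returns a plain str (not bytes) for values that contain no encoded words, so a port needs to handle both cases. A hedged sketch, with decode_header_value as a hypothetical helper name:

```python
from email.header import decode_header


def decode_header_value(value, fallback="iso-8859-1"):
    """Decode RFC 2047 encoded words in a header value (Python 3 sketch).

    Hypothetical helper; whether this treatment is appropriate for HTTP
    request headers is exactly the open question in this ticket.
    """
    parts = []
    for chunk, charset in decode_header(value):
        # Chunks are bytes when encoded words were found, str otherwise.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or fallback)
        parts.append(chunk)
    return "".join(parts)


print(decode_header_value("=?utf-8?q?caf=C3=A9?="))
```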

For further reference, note that the httpbis spec is proposed to obsolete RFC 2616, cleaning up and clarifying underspecified parts of the spec.
The relevant section on header value encoding is here: http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-19#section-3.2.2

comment:12 Changed 2 years ago by aaugustin

  • Summary changed from Replace and deprecate request.META for HTTP headers to Provide an alternative to request.META for accessing HTTP headers
  • Triage Stage changed from Unreviewed to Accepted

The mailing list discussion converged towards keeping META, but recommending a dict-like request.headers.

I'm updating the summary to reflect this.

comment:13 Changed 2 years ago by astupidog

Regarding the transformation of request headers, for example from X-Bender to the META key HTTP_X_BENDER -

From what I see, this transformation is done not in Django but in the WSGI implementation.

I tested with Apache mod_wsgi and with Python's wsgiref, and it seems that they, not Django, perform this transformation.

I couldn't find it documented anywhere, but see this from Python's Lib/wsgiref/simple_server.py:

    for h in self.headers.headers:
        k,v = h.split(':',1)
        k=k.replace('-','_').upper(); v=v.strip()
        if k in env:
            continue                    # skip content length, type,etc.
        if 'HTTP_'+k in env:
            env['HTTP_'+k] += ','+v     # comma-separate multiple headers
        else:
            env['HTTP_'+k] = v
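
The same logic can be tried outside the server as a standalone Python 3 function. This is a sketch modelled on the excerpt above, not the wsgiref code itself; the function name headers_to_environ and the use of (name, value) pairs as input are assumptions.

```python
def headers_to_environ(header_pairs, env=None):
    """Fold (name, value) header pairs into a CGI-style environ dict.

    Hypothetical standalone version of the loop quoted above.
    """
    env = dict(env or {})
    for name, value in header_pairs:
        key = name.replace("-", "_").upper()
        value = value.strip()
        if key in env:
            # Already present without prefix (e.g. CONTENT_TYPE): skip.
            continue
        if "HTTP_" + key in env:
            # Repeated headers are joined with commas.
            env["HTTP_" + key] += "," + value
        else:
            env["HTTP_" + key] = value
    return env


env = headers_to_environ(
    [("X-Bender", "a"), ("x-bender", " b "), ("Content-Type", "text/html")],
    {"CONTENT_TYPE": "text/plain"},
)
print(env)
```

Because the original spelling of each header name is discarded at this point, anything built on top of META can only offer case-insensitive lookups, not the exact names the client sent.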

comment:14 Changed 6 weeks ago by benspaulding

  • Cc ben@… added

comment:15 Changed 6 weeks ago by timgraham

See #16068 for a duplicate.
