Opened 3 years ago

Last modified 12 months ago

#18855 assigned New feature

persist a socket across reloads of the dev server

Reported by: dlamotte Owned by: dlamotte
Component: HTTP handling Version: master
Severity: Normal Keywords:
Cc: unai@…, apollo13 Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Using livereload (livereload.com) with Django becomes painful when updating a file immediately results in reloading the webpage AND the Django dev server. There is a short period of time when the dev server is not listening as it is busy reloading (frequently hit when using livereload).

This is exacerbated with larger projects as reload time is longer.

This has been an itch I've wanted to scratch for a long time. Hopefully, this will be accepted so others can benefit from it also. Helps plenty even with users hitting Reload manually on their browser.

Change History (19)

comment:1 Changed 3 years ago by dlamotte

  • Needs documentation unset
  • Needs tests unset
  • Owner changed from nobody to dlamotte
  • Patch needs improvement unset
  • Status changed from new to assigned

comment:2 Changed 3 years ago by dlamotte

  • Has patch set

comment:3 follow-up: Changed 3 years ago by akaariai

  • Triage Stage changed from Unreviewed to Accepted
  • Version changed from 1.4 to master

It seems this could only be done on Unix platforms (at least socket.fromfd is documented to be available only on Unix). The best option for platform support is to write the code in a way where the code works when needed socket abilities are available, but if they are not, then we still get a working server, just without socket persistence capabilities.

If I understand the patch correctly, the socket is first created in the main process. Then, it is passed to the server subprocess as os.environ parameter. Finally the server subprocess will try to use the already existing socket by using socket.fromfd().

So, in addition to the above platform problem, at least these possible limitations come to mind:

  • Could we leak the socket's fileno in to os.environ where it is accidentally reused later on? Could we pass it to the server subprocess some other way?
  • It seems we will need to have socket creation logic in Django's code. We don't have that now. This might cause us some headaches later on.
  • Could the socket be left in a state where it contains cruft from the old process (this might be a stupid question, I don't know socket semantics too well)? What I mean here is that maybe the server process reads half of an request, gets reloaded, and then the new process has a broken half-request waiting for it? The same could happen for write-to-client, too, if I am not mistaken.

I would like to get rid of the unavailability of the dev server during reload, no question about that, but we should not commit this unless we have pretty good guarantees that this will cause zero regressions on any supported platforms. This is not worth risking breakages. I am marking this accepted, but this is only on the grounds that we can make this bulletproof.

comment:4 Changed 3 years ago by dlamotte

Thanks for the review. I've updated the pull request to address the platform issue.

I dont know the platforms supported by Django, but I added a check to the code to test if "fromfd" exists in the socket module. If it does, it will use the new persist socket code. If not, the code will not be used. I believe its the case that if "fromfd" is not supported, it will not exist, however, I don't have a host to test this on.

Yes, the socket needs to be first created in the main process that survives the reload process. This way, the socket stays open and accepting connections.

  • Not sure what you mean by leak? The child process that gets reloaded would be the only process that could get the FD. The spawnve called in the autoreloader ensures that the child's FDs are identical to that of its parent, but besides this, its an entirely new process (it calls exec or something similar for the platform). This means that no other process state is saved from the parent. We can either pass the FD number via the environment OR via the command line. It appears the autoreload module uses an environment variable "RUN_MAIN" to tell the child process that it should run the HTTP server. I think using another environment variable is suitable to pass the FD also.
  • I had not realized this. The socket is only created now, its still configured the way it was before. This should make it less risky IMO.
  • I don't believe it introduces any new risks. Specifically, a "half write" could happen with the current implementation if its not handled (I'm not familiar enough to know if that's the case). This would not change that AFAIK. The child process would (during cleanup due to exit) will close the client's connected socket just as if there wasn't a persisted socket. As for reading a partial response, I believe we're safe from this simply because the .accept() on this main server socket returns a client socket which handles the connection. If it only reads part of the response and the connection is interrupted due to reload, the child exiting will close the socket and the pending data will be discarded by the kernel. It will not remain pending on the server socket (this is guaranteed by TCP I believe, I don't believe this would be the case with something like UDP for instance).

I'll do some looking into writing some tests along with this commit. It'll be significantly interesting/complicated to test I think because of the inherent race conditions involved. Any advice or ideas are welcomed.

Thanks again for your review and acceptance of this. I'll update this request when I've looked into writing the tests and/or have written some.

comment:5 in reply to: ↑ 3 Changed 3 years ago by bhuztez

I think this is duplicate of #16049

comment:6 Changed 3 years ago by dlamotte

Yes. I didn't see that one before opening this one. However, at the moment, this ticket has a corresponding patch/pull request. Its missing tests, but I plan on fixing that. Also, I have been using the patch in my own development with success and zero issues.

comment:7 Changed 2 years ago by aaugustin

  • Needs documentation set
  • Needs tests set

This is very interesting. To move forwards it just needs:

  • docs — including an explanation that it doesn't work on Windows with Python 2 (see discussion on #16049)
  • tests — or an explanation of why it's impossible to test

In the meantime, I've closed the pull request, because our triage options on GitHub are limited and we don't keep pull requests with partial patches.

Please re-open it or make a new PR when you get a chance to make these improvements!

comment:8 Changed 2 years ago by dlamotte

Sorry this has been inactive so long. Life has been pretty busy. The testing part is a bit challenging simply because it's very "race-condition-y". I have used the patch in development since I initially wrote it (and I write code everyday with Django) so I'm very certain it works without regressions, however, I can understand adding tests to the code base.

I'll get going on the docs.

comment:9 Changed 2 years ago by dlamotte

Created new pull request: https://github.com/django/django/pull/893

Added documentation and brought the branch up to date with the latest master branch.

I explain in the pull request why I think it'll be ok to not add tests (given the difficulty). But if there are suggestions, I can surely look into them and update the pull request.

comment:10 Changed 2 years ago by dlamotte

I think this is slipping through the cracks. This is my first contribution to Django, so I'm probably making this harder than it needs to be. Sorry about that.

Just wanted to check in and see what else I can do to get this into 1.6 (hopefully).

Thanks much for your help and patience.

comment:11 Changed 20 months ago by unaizalakain

  • Cc unai@… added

comment:12 Changed 20 months ago by unaizalakain

  • Needs documentation unset
  • Needs tests unset
  • Triage Stage changed from Accepted to Ready for checkin

LGTM

comment:13 Changed 19 months ago by apollo13

  • Cc apollo13 added

What's the reason for adding an option to disable that feature? I can't imagine why someone would like to disable it if it works.

comment:14 Changed 19 months ago by dlamotte

Could have sworn I added it because I was asked to, but cannot seem to find that at the moment. I guess it's only there so if for some reason its causing more problems than it solves for that user, they can disable it.

comment:15 Changed 17 months ago by timo

  • Patch needs improvement set
  • Triage Stage changed from Ready for checkin to Accepted

I wonder if our time would be better spent in efforts like #21978 -- "Add optional gunicorn support to runserver", rather than trying to enhance our own HTTP server?

In any case, the patch no longer merges cleanly according to GitHub so bumping off RFC.

comment:16 Changed 17 months ago by dlamotte

  • Triage Stage changed from Accepted to Ready for checkin

I've just updated the pull request to rebase off of the latest master. Please re-consider this patch.

I think the ticket you reference is great, however, this patch is ready now and it addresses a common problem that many would appreciate being fixed.

comment:17 Changed 17 months ago by apollo13

  • Triage Stage changed from Ready for checkin to Accepted

Did you actually check that this works fine on Python3.4?

comment:18 Changed 14 months ago by unaizalakain

Once addressed the notes added to the PR patch, it works like a charm with 3.4

comment:19 Changed 12 months ago by timo

As noted on the pull request, this causes Address already in use errors locally when running ./runtests.py admin_widgets --selenium --liveserver=localhost:8090-8100 in parallel (as well as on Jenkins). It seems that the alternate ports are never tried.

Last edited 12 months ago by timo (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top