Opened 12 years ago
Last modified 4 years ago
#18855 new New feature
persist a socket across reloads of the dev server
Reported by: | Dan LaMotte | Owned by: | |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | unai@…, Florian Apolloner | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Using livereload (livereload.com) with Django becomes painful when updating a file immediately results in reloading the webpage AND the Django dev server. There is a short period of time when the dev server is not listening as it is busy reloading (frequently hit when using livereload).
This is exacerbated with larger projects as reload time is longer.
This has been an itch I've wanted to scratch for a long time. Hopefully, this will be accepted so others can benefit from it also. Helps plenty even with users hitting Reload manually on their browser.
Change History (20)
comment:1 by , 12 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:2 by , 12 years ago
Has patch: | set |
---|
follow-up: 5 comment:3 by , 12 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Version: | 1.4 → master |
It seems this could only be done on Unix platforms (at least socket.fromfd is documented to be available only on Unix). The best option for platform support is to write the code in a way where the code works when needed socket abilities are available, but if they are not, then we still get a working server, just without socket persistence capabilities.
If I understand the patch correctly, the socket is first created in the main process. Then, it is passed to the server subprocess as os.environ parameter. Finally the server subprocess will try to use the already existing socket by using socket.fromfd().
So, in addition to the above platform problem, at least these possible limitations come to mind:
- Could we leak the socket's fileno in to os.environ where it is accidentally reused later on? Could we pass it to the server subprocess some other way?
- It seems we will need to have socket creation logic in Django's code. We don't have that now. This might cause us some headaches later on.
- Could the socket be left in a state where it contains cruft from the old process (this might be a stupid question, I don't know socket semantics too well)? What I mean here is that maybe the server process reads half of an request, gets reloaded, and then the new process has a broken half-request waiting for it? The same could happen for write-to-client, too, if I am not mistaken.
I would like to get rid of the unavailability of the dev server during reload, no question about that, but we should not commit this unless we have pretty good guarantees that this will cause zero regressions on any supported platforms. This is not worth risking breakages. I am marking this accepted, but this is only on the grounds that we can make this bulletproof.
comment:4 by , 12 years ago
Thanks for the review. I've updated the pull request to address the platform issue.
I dont know the platforms supported by Django, but I added a check to the code to test if "fromfd" exists in the socket module. If it does, it will use the new persist socket code. If not, the code will not be used. I believe its the case that if "fromfd" is not supported, it will not exist, however, I don't have a host to test this on.
Yes, the socket needs to be first created in the main process that survives the reload process. This way, the socket stays open and accepting connections.
- Not sure what you mean by leak? The child process that gets reloaded would be the only process that could get the FD. The
spawnve
called in the autoreloader ensures that the child's FDs are identical to that of its parent, but besides this, its an entirely new process (it calls exec or something similar for the platform). This means that no other process state is saved from the parent. We can either pass the FD number via the environment OR via the command line. It appears the autoreload module uses an environment variable "RUN_MAIN" to tell the child process that it should run the HTTP server. I think using another environment variable is suitable to pass the FD also. - I had not realized this. The socket is only created now, its still configured the way it was before. This should make it less risky IMO.
- I don't believe it introduces any new risks. Specifically, a "half write" could happen with the current implementation if its not handled (I'm not familiar enough to know if that's the case). This would not change that AFAIK. The child process would (during cleanup due to exit) will close the client's connected socket just as if there wasn't a persisted socket. As for reading a partial response, I believe we're safe from this simply because the
.accept()
on this main server socket returns a client socket which handles the connection. If it only reads part of the response and the connection is interrupted due to reload, the child exiting will close the socket and the pending data will be discarded by the kernel. It will not remain pending on the server socket (this is guaranteed by TCP I believe, I don't believe this would be the case with something like UDP for instance).
I'll do some looking into writing some tests along with this commit. It'll be significantly interesting/complicated to test I think because of the inherent race conditions involved. Any advice or ideas are welcomed.
Thanks again for your review and acceptance of this. I'll update this request when I've looked into writing the tests and/or have written some.
comment:6 by , 12 years ago
Yes. I didn't see that one before opening this one. However, at the moment, this ticket has a corresponding patch/pull request. Its missing tests, but I plan on fixing that. Also, I have been using the patch in my own development with success and zero issues.
comment:7 by , 12 years ago
Needs documentation: | set |
---|---|
Needs tests: | set |
This is very interesting. To move forwards it just needs:
- docs — including an explanation that it doesn't work on Windows with Python 2 (see discussion on #16049)
- tests — or an explanation of why it's impossible to test
In the meantime, I've closed the pull request, because our triage options on GitHub are limited and we don't keep pull requests with partial patches.
Please re-open it or make a new PR when you get a chance to make these improvements!
comment:8 by , 12 years ago
Sorry this has been inactive so long. Life has been pretty busy. The testing part is a bit challenging simply because it's very "race-condition-y". I have used the patch in development since I initially wrote it (and I write code everyday with Django) so I'm very certain it works without regressions, however, I can understand adding tests to the code base.
I'll get going on the docs.
comment:9 by , 12 years ago
Created new pull request: https://github.com/django/django/pull/893
Added documentation and brought the branch up to date with the latest master branch.
I explain in the pull request why I think it'll be ok to not add tests (given the difficulty). But if there are suggestions, I can surely look into them and update the pull request.
comment:10 by , 11 years ago
I think this is slipping through the cracks. This is my first contribution to Django, so I'm probably making this harder than it needs to be. Sorry about that.
Just wanted to check in and see what else I can do to get this into 1.6 (hopefully).
Thanks much for your help and patience.
comment:11 by , 11 years ago
Cc: | added |
---|
comment:12 by , 11 years ago
Needs documentation: | unset |
---|---|
Needs tests: | unset |
Triage Stage: | Accepted → Ready for checkin |
LGTM
comment:13 by , 11 years ago
Cc: | added |
---|
What's the reason for adding an option to disable that feature? I can't imagine why someone would like to disable it if it works.
comment:14 by , 11 years ago
Could have sworn I added it because I was asked to, but cannot seem to find that at the moment. I guess it's only there so if for some reason its causing more problems than it solves for that user, they can disable it.
comment:15 by , 11 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
I wonder if our time would be better spent in efforts like #21978 -- "Add optional gunicorn support to runserver", rather than trying to enhance our own HTTP server?
In any case, the patch no longer merges cleanly according to GitHub so bumping off RFC.
comment:16 by , 11 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
I've just updated the pull request to rebase off of the latest master. Please re-consider this patch.
I think the ticket you reference is great, however, this patch is ready now and it addresses a common problem that many would appreciate being fixed.
comment:17 by , 11 years ago
Triage Stage: | Ready for checkin → Accepted |
---|
Did you actually check that this works fine on Python3.4?
comment:18 by , 10 years ago
Once addressed the notes added to the PR patch, it works like a charm with 3.4
comment:19 by , 10 years ago
As noted on the pull request, this causes Address already in use
errors locally when running ./runtests.py admin_widgets --selenium --liveserver=localhost:8090-8100
in parallel (as well as on Jenkins). It seems that the alternate ports are never tried.
comment:20 by , 4 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
Sent pull request: https://github.com/django/django/pull/304