Opened 5 years ago

Closed 5 years ago

#25432 closed Bug (invalid)

Django ORM race condition

Reported by: Yuval Adam
Owned by: nobody
Component: Database layer (models, ORM)
Version: 1.8
Severity: Normal
Keywords:
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

I've hit an interesting problem that isn't covered by the current Django documentation, and might even be a bug that Django can handle better. It started off as a SO question at http://stackoverflow.com/q/32661885/24545 but here's the gist of it.

After creating a new object MyModel.objects.create(foo=goo) and inserting it into the database, it is possible that immediate subsequent calls to fetch that object might fail (i.e. MyModel.objects.get(foo=goo) will throw DoesNotExist). I have seen this happen in a test case where I make two subsequent API calls that do exactly this and got a ~5% failure rate.

In most cases this might not be a problem, but I am using this query to make sure I'm not creating two duplicate objects. This is essentially an UPSERT problem. In my case, my solution was to set unique=True on my foo field and attempt to create the object in any case, which will naturally fail on a duplicate; then I just catch the IntegrityError and fail gracefully. This way we rely on DB semantics, which guarantee no duplicates.
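
A minimal sketch of the create-and-catch approach described above. It uses Python's built-in sqlite3 module rather than Django and PostgreSQL so that it runs standalone; the table name my_model and the helper create_if_absent are hypothetical and only serve the illustration. In Django, the analogous code would presumably wrap MyModel.objects.create(...) in transaction.atomic() and catch django.db.IntegrityError.

```python
import sqlite3


def create_if_absent(conn, foo):
    """Insert a row for `foo`; return True if created, False if it already existed."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if the block raises.
        with conn:
            conn.execute("INSERT INTO my_model (foo) VALUES (?)", (foo,))
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a duplicate: fail gracefully.
        return False


# Hypothetical schema: `foo` carries the uniqueness guarantee (unique=True in Django).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_model (id INTEGER PRIMARY KEY, foo TEXT UNIQUE NOT NULL)"
)

first = create_if_absent(conn, "goo")   # row is inserted
second = create_if_absent(conn, "goo")  # duplicate is rejected by the database
row_count = conn.execute("SELECT COUNT(*) FROM my_model").fetchone()[0]
```

The point of this design is that the duplicate check is performed by the database constraint itself, so it does not depend on whether an earlier read happened to see the row.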

The relevant settings for this test case are: PostgreSQL, default Django transaction settings and no specific caching.

So 2 questions here:

  1. What happens if my application requires that any two transactions a and b behave such that b always sees fresh data that was written in a? How can I enforce this in Django?
  2. How do we document this behavior in a better way? If this is possible or impossible, Django must be clearer on how such transactions are handled.

Change History (10)

comment:1 Changed 5 years ago by Yuval Adam

comment:2 Changed 5 years ago by Aymeric Augustin

Is it possible that you're writing to the primary and reading from a replica? In that case all you're seeing is the replication delay. That would explain why you can't reproduce locally.

comment:3 in reply to:  2 Changed 5 years ago by Yuval Adam

Replying to aaugustin:

Is it possible that you're writing to the primary and reading from a replica? In that case all you're seeing is the replication delay. That would explain why you can't reproduce locally.

Nope, there's only a single Postgres database at play.

comment:4 Changed 5 years ago by Aymeric Augustin

Just to be clear, the scenario is the following:

  • you make the first API request
  • the server processes this request, performs some database operations, commits the database transaction, sends back the response
  • you receive and process the response to the first request
  • you make the second API request from the same thread that made the first request
  • the server cannot see the data written by the first request

Your question almost looks like the answer should involve isolation levels, specifically the SERIALIZABLE level, but if my description above is correct, that cannot explain the behavior you're describing.

comment:5 in reply to:  4 Changed 5 years ago by Yuval Adam

Replying to aaugustin:

Just to be clear, the scenario is the following:

Yes, exactly.

comment:6 Changed 5 years ago by Aymeric Augustin

The SERIALIZABLE isolation level is the answer to question 1. at the bottom of your bug report, though.

Regarding question 2, we don't want to document it because too many people will think it's a good idea without understanding the consequences, and because it breaks some APIs, e.g. get_or_create doesn't work anymore (I think).
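
For reference, a sketch of how the SERIALIZABLE level can be selected for a PostgreSQL connection in Django settings, by passing the psycopg2 isolation-level constant through OPTIONS. The database name is a placeholder, and the remaining connection settings are omitted. Code running at this level should be prepared to retry transactions that fail with serialization errors.

```python
# settings.py (fragment)
import psycopg2.extensions

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mydb',  # placeholder name, not taken from the ticket
        'OPTIONS': {
            # Ask psycopg2 to open every connection at SERIALIZABLE.
            'isolation_level': psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE,
        },
    },
}
```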

comment:7 Changed 5 years ago by Aymeric Augustin

Actually making both requests from the same thread doesn't matter because they may be handled by different processes on the server.

Even at the SERIALIZABLE isolation level, the database is allowed to serialize transactions made on different connections in any consistent order.

I think this is part of the consistency model of multi-process database servers, which are distributed systems even if they run on one server.

I don't think there's anything specific to Django here. You should be able to reproduce this behavior with plain WSGI & psycopg2.

comment:8 Changed 5 years ago by Shai Berger

I have seen this happen in a test case where I make two subsequent API calls that do exactly this and got a ~5% failure rate.

Can you elaborate on that a little? In particular, is the test case using Django's test-client for these calls? If it is, there's one thread on the server doing everything, and it all happens in the same transaction; I'd ask you to check again about caching and replication and that sort of thing.

If it isn't the test-client (or even the Django test framework), I'd ask you to verify again that the second request is only sent after full processing of the first request has completed.

Either way, please also verify that your PostgreSQL configuration is sane. I've seen recommendations to make testing faster by turning off its fsync etc, and that could be the cause of such behavior.

On a side note: I'm pretty sure get_or_create only works reliably under serializable transactions; under the default (read-committed) isolation level there are failure scenarios, whether you try to get first or create first.

comment:9 in reply to:  8 Changed 5 years ago by Yuval Adam

Replying to shaib:

Can you elaborate on that a little? In particular, is the test case using Django's test-client for these calls? If it is, there's one thread on the server doing everything, and it all happens in the same transaction; I'd ask you to check again about caching and replication and that sort of thing.

This isn't the Django test client; I'm calling from an external process (specifically, I'm testing with curl from the command line).

If it isn't the test-client (or even the Django test framework), I'd ask you to verify again that the second request is only sent after full processing of the first request has completed.

Yep, I'm using curl http://example.com/foo && curl http://example.com/foo

Either way, please also verify that your PostgreSQL configuration is sane. I've seen recommendations to make testing faster by turning off its fsync etc, and that could be the cause of such behavior.

This is happening on Heroku Postgres. Any recommendations on how to tinker with the config there? Is it known to be problematic?

On a side note: I'm pretty sure get_or_create only works reliably under serializable transactions; under the default (read-committed) isolation level there are failure scenarios, whether you try to get first or create first.

comment:10 Changed 5 years ago by Aymeric Augustin

Resolution: invalid
Status: new → closed

This is happening on Heroku Postgres.

You should ask the Heroku Postgres support.

I'm going to close this ticket because there is no evidence of a bug in Django.

You can't reproduce with a local Postgres; you can reproduce with Heroku Postgres; so start there :-)
