Opened 13 years ago

Closed 12 years ago

Last modified 12 years ago

#16865 closed Bug (wontfix)

get_or_create defaults to _for_write even when it's just reading

Reported by: Rick van Hattem <Rick.van.Hattem@…> Owned by: nobody
Component: Database layer (models, ORM) Version: 1.3
Severity: Normal Keywords:
Cc: v.oostveen@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description

First a little background information. I am currently using an asynchronous master slave(s) setup which requires me to pin a request to the master as soon as Django does any writes.

This results in all code reading from the slaves when possible, and switching to the master as soon as data is written.

The problem comes with the get_or_create method. Most of the code that uses get_or_create expects to get a result and uses the create as a fallback if it doesn't exist. The result of this is that (atleast in all the cases where we use get_or_create), the get_or_create method is 99% reads and only a tiny bit of writes. With the current code however, Django defaults to using the write database as soon as get_or_create is called before ever executing an insert.

The link to the line that is causing this:
https://code.djangoproject.com/browser/django/trunk/django/db/models/query.py#L407

Was this done by design? If so, please explain the reasoning behind it.
Otherwise I would classify this as a bug and in that case it would be a good idea to move the self._for_write = True line to the except self.model.DoesNotExist: block where the actual writes happen.

Attachments (4)

Change History (22)

comment:1 by Carl Meyer, 13 years ago

Triage Stage: UnreviewedAccepted

This seems reasonable; the downside is if you first read from the read slave it might increase the likelihood of collisions (the IntegrityError case further down) due to replication lag. Still seems like that might be an OK exchange for getting the majority of reads onto the read slave; it's a reasonable assumption that the common case for most uses of get_or_create is that the object already exists.

Tentatively accepting, but I'd like to hear from Alex or Russell, who are more familiar with the multi-db stuff.

comment:2 by Calvin Spealman, 13 years ago

I think this would be a bad idea. get_or_create is inherently a write-operation and as such is currently doing the correct thing. Changing this behavior would break a lot of existing code and damage the stability and usefulness of get_o_create.

comment:3 by Dan Poirier, 13 years ago

There's too great a chance that the slave DB is a little out of date, get_or_create sees no record, tries to create it, and then fails because it already exists in the master DB.

get_or_create() is more designed for data imports than to be used in every request. If 99% of the time the record is expected to be there, and performance is critical, why not use a get() and catch the exception and call get_or_create() in those rare cases.

comment:4 by Karen Tracey, 13 years ago

I think the suggestion here has been misunderstood (at any rate I misunderstood it when overhearing part of a conversation about it...sorry!) and that the suggested change would not break anything. The suggestion is to simply move the _for_write=True down to where a write is actually going to be attempted. Yes, that means the initial attempt to get may falsely say "not here!" when in fact the instance has already been created in one DB but not propagated to the DB chosen for the initial get attempt. But the subsequent code that tries the create already handles the case where the create fails due to the object already existing: it re-attempts the get, which at that point will query the same database as the write was attempted on and will succeed (assuming this code works at all, which it doesn't on MySQL/InnoDB with repeatable read transaction isolation level, but that's an entirely different ticket).

comment:5 by Wolph, 13 years ago

@calvinspealman: it is called get_or_create isn't it? All of the Django code that uses it makes the assumption that it is always available and just falls back to creating if it is indeed new. And as you can see in the code, it is implemented as such. It does a get with a fallback to create. In your case it would be a create with a fallback to get if you get an integrity error.

@kmtracey: You are right, it all depends on your replication method which it why this code has been working flawlessly for me so far. Regardless, the current system definately breaks any kind of master slave setup which is just silly.

Perhaps the current database routers need an extra method for that. Right now we have the db_for_write and db_for_read but in this case it might be useful to have a db_for_realtime_read or something. Assuming that the database which you write to is also the one you want to read from is not in all cases right.

comment:6 by Russell Keith-Magee, 13 years ago

@carljm has correctly hit the reason this was done -- get_or_create needs to get a reliable answer as to whether the object exists, and due to the potential for replication lag, the most reliable answer comes from the write source, not the read source.

That said, I can certainly see the value in moving the *initial* read load off the maste, and from the looks of it, @kmtracey's analysis is correct -- as long as the fallback read is performed on the master, we should get the benefits of a distributed initial read, while retaining a reliable write.

comment:7 by noria, 13 years ago

Has patch: set

This patch do what we can expect:
if the object does not exist on the first db (read db) and if the read db is different than the write db,
then we retry the get() on the write db,
else we re-raise a new DoesNotExist exception.

I prefer if someone who is 'experimented' could check this.

Version 0, edited 13 years ago by noria (next)

comment:8 by trbs, 13 years ago

Cc: v.oostveen@… added

Isn't this patch a bit more complicated then it's need to be ?

When we just move down self._for_write = True from the outer most try-except one level then we would have the same effect right ?

It will try:

  • a read from the slave (or whatever choice is made by the router)
  • try to insert the new record on the master
  • retry the get operation but this time from the master explicitly since self._for_write is set.

And we save doing second get() on the master since it would make sense that most of the time the slaves are up-to-date and so that there is no master record.

The drawback of that approach is that a transaction is made and executed in the case that the slave was lagging and the object already exists. (this could easily happen in a situation where a context processes or something is executing some code that does the get_or_create and when the page loads you request multiple pages doing the same get_or_create. First request actually creates the object but the slaves have not caught up a ms later that the other request for the same page come in. If you have a high traffic site you might want to safe on transactions and opt for more selects instead.)

comment:9 by trbs, 12 years ago

Added patch, patch_16865_get_or_create_move_for_write.diff​, which moves the _for_write to where changes are (potentially) are made to the database.

Since this patch is extremely simple I hope it can be accepted so we can some progress on this ticket.

comment:10 by trbs, 12 years ago

Updated patch with tests.

comment:11 by Wolph, 12 years ago

This patch is the exact same as we have been using at Fashiolista.com for a year now. This should indicate at least some stability since Fashiolista.com is probably one of the largest Django sites around.

comment:12 by trbs, 12 years ago

Updated the patch with some documentation to docs/refs/models/queryset.txt about this.

FYI: I have been using this patch in production for about 7 months now, which include some e-commerce and high traffic gaming sites.

comment:13 by Carl Meyer <carl@…>, 12 years ago

Resolution: fixed
Status: newclosed

In 901af865505310a70dd02ea5b3becbf45819b652:

Fixed #16865 -- Made get_or_create use read database for initial get query.

Thanks Rick van Hattem for the report and trbs for the patch.

comment:14 by Carl Meyer <carl@…>, 12 years ago

In 4e9a74b81df1c7aaea2f90a3a4911920e134b275:

Revert "Fixed #16865 -- Made get_or_create use read database for initial get query."

Thanks to Jeremy Dunck for pointing out the problem with this change. If in a
single transaction, the master deletes a record and then get_or_creates a
similar record, under the new behavior the get_or_create would find the record
in the slave db and fail to re-create it, leaving the record nonexistent, which
violates the contract of get_or_create that the record should always exist
afterwards. We need to do everything against the master here in order to ensure
correctness.

This reverts commit 901af865505310a70dd02ea5b3becbf45819b652.

comment:15 by Carl Meyer, 12 years ago

Resolution: fixed
Status: closedreopened

comment:16 by Carl Meyer, 12 years ago

Resolution: wontfix
Status: reopenedclosed

Closing as wontfix, as after discussion with jdunck I no longer see how we can do this and still guarantee correctness. We can reopen if somebody can make a proposal that addresses the type of problem outlined above.

in reply to:  16 comment:17 by trbs, 12 years ago

Replying to carljm:

Closing as wontfix, as after discussion with jdunck I no longer see how we can do this and still guarantee correctness. We can reopen if somebody can make a proposal that addresses the type of problem outlined above.

I agree I don't see an easy way out of this problem. Pretty sure me and the reporter of this ticket only use get_or_create() in situation where we never delete the object that is potentially created by get_or_create().

Could we at least put a little hint in the documentation of get_or_create() that notes that this function will always use the for-write database via db routers for consistency ? That way future people are warned not to expect this to scale over multiple databases.

comment:18 by Wolph, 12 years ago

If this case is such a problem, wouldn't a setting fix it? Or a parameter for the get_or_create method?

There has to be some way to enable this within the ORM. Having to write your own try/except every time you use it (especially with the required savepoints) is quite the hassle.

Note: See TracTickets for help on using tickets.
Back to Top