Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#29794 closed Bug (invalid)

Duplicate object returned using filter

Reported by: Lars Solberg Owned by: nobody
Component: Database layer (models, ORM) Version: 2.1
Severity: Normal Keywords: duplicate, vacuum
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Note: I am unable to reproduce this issue. It showed up in my prod environment, but only with one specific db entry. I think it's a bug, but I need help reproducing it.
Note 2: This problem fixed itself after a database vacuum, which I ran manually after writing up this whole issue. I still think the issue is weird; maybe someone has something useful to say. At the least, I'm keeping the issue for searchability for the next person who hits this. Maybe there is a way for Django to detect cases like this?

Using a simple model for explanation:

from django.db import models

class LowercaseCharField(models.CharField):
    # Lowercase the value before it is sent to the database.
    def get_prep_value(self, value):
        return str(value).lower()

class Server(models.Model):
    objects = ServerBaseManager.from_queryset(ServerQuerySet)()
    name = LowercaseCharField(max_length=255)
    domain = models.ForeignKey(Domain, db_index=True, default=default_domain, on_delete=models.SET_DEFAULT)

    class Meta:
        unique_together = (
            ('name', 'domain')
        )

ServerBaseManager and ServerQuerySet do not override any functions; they just add functionality.
The domain ForeignKey is nothing magical either.

For one specific name entry, let's call it server1, something strange is happening.
There are 2 entries with the name server1 in the db, one with domain.pk=1 and one with domain.pk=2.

Here is the strange part:

In [1]: [i.pk for i in Server.objects.filter(name='server1')]
Out[1]: [1, 2]

In [2]: Server.objects.filter(name='server1')[0].pk
Out[2]: 2

In [3]: Server.objects.filter(name='server1')[1].pk
Out[3]: 2

In [4]: Server.objects.filter(name='server1').order_by('pk')[0].pk
Out[4]: 1

In [5]: Server.objects.filter(name='server1').order_by('pk')[1].pk
Out[5]: 2

In [6]: Server.objects.filter(name='server1').order_by('-name')[0].pk
Out[6]: 2

In [7]: Server.objects.filter(name='server1').order_by('-name')[1].pk
Out[7]: 2

In [8]: Server.objects.filter(name='server1').order_by('name')[0].pk
Out[8]: 2

In [9]: Server.objects.filter(name='server1').order_by('name')[1].pk
Out[9]: 2

In [10]: Server.objects.filter(name='server1').values('pk')
Out[10]: <ServerQuerySet [{'pk': 1}, {'pk': 2}]>

In [11]: Server.objects.filter(pk__in=Server.objects.filter(name='server1'))[0].pk
Out[11]: 1

In [12]: Server.objects.filter(pk__in=Server.objects.filter(name='server1'))[1].pk
Out[12]: 2

As you can see, the queryset returns the same object whether I access it using queryset[0] or queryset[1], but in a lot of other cases it works as it should.

  • queryset.query returns nothing magical, just a simple SELECT query
  • I have plenty of duplicate names, but this only happens to server1. This is hard to test in bulk, though, since the duplicate problem won't show up if I try to automate the testing.

versions:

  • python: 3.6.5
  • postgres 9.4
  • django: 2.1.1

Change History (4)

comment:1 by Claude Paroz, 6 years ago

Resolution: invalid
Status: new → closed

Sorry, but I don't see anything shocking in what you showed us, even if it's a bit surprising.

For example:

In [6]: Server.objects.filter(name='server1').order_by('-name')[0].pk
Out[6]: 2

In [7]: Server.objects.filter(name='server1').order_by('-name')[1].pk
Out[7]: 2

You are doing two queries with undefined ordering (as name is identical). So it's totally possible that the first query returns pk [1, 2], while the second query returns [2, 1].

It would be shocking if you'd obtain a similar result with one unique queryset:

qs = Server.objects.filter(name='server1').order_by('-name')
qs[0].pk => 2
qs[1].pk => 2
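
The undefined-ordering point above can be sketched outside Django. This is a minimal illustration using the standard-library sqlite3 module rather than PostgreSQL; the table name `server`, its columns, and the helper `nth_pk` are assumptions made for the example, not part of the reporter's setup. It runs one query per index, as the shell session does, but adds the primary key as a secondary sort key so that rows with an identical name have a defined order.

```python
import sqlite3

# Hypothetical stand-in for the reporter's table: two rows share the same name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE server (id INTEGER PRIMARY KEY, name TEXT, domain_id INTEGER)"
)
conn.executemany(
    "INSERT INTO server (id, name, domain_id) VALUES (?, ?, ?)",
    [(1, "server1", 1), (2, "server1", 2)],
)
conn.commit()


def nth_pk(offset):
    # One query per index. Sorting by name alone would leave the order of the
    # two rows undefined, so id is added as a tie-breaker.
    row = conn.execute(
        "SELECT id FROM server WHERE name = ? ORDER BY name, id LIMIT 1 OFFSET ?",
        ("server1", offset),
    ).fetchone()
    return row[0]


first_pk = nth_pk(0)
second_pk = nth_pk(1)
print(first_pk, second_pk)
```

In Django terms, the equivalent tie-breaker would be something like `.order_by('name', 'pk')`, so that two separate queries agree on which row comes first.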

comment:2 by Lars Solberg, 6 years ago

Sorry, some of the examples were stupid. But I did do the queryset check as well before the VACUUM fixed the problem.
The results are exactly as you said.

Scrolling through the terminal history, I have this:

In [1]: qs = Server.objects.filter(name='server1')

In [2]: qs[0]
Out[2]: <Server: server1>

In [3]: qs[0].pk
Out[3]: 2

In [4]: qs[1].pk
Out[4]: 2

comment:3 by Claude Paroz, 6 years ago

Oh, weird. Unfortunately, I'm afraid this will be impossible to solve unless we have some way to reproduce the problem.
Feel free to add anything new you could find in the future about this. I don't see currently how Django could be at fault.

comment:4 by Lars Solberg, 6 years ago

Absolutely. This solved itself by doing a VACUUM. However, auto-vacuum was enabled, so I guess postgres can be blamed somewhat?
I vacuumed too quickly, so I can't reproduce and debug the problem anymore.
If it happens again, I'll dig deeper.

I thought something like [i.pk for i in Server.objects.filter(name='server1')] would generate the same SQL queries as qs[0], qs[1].
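
On that last point: as far as the QuerySet documentation describes, iterating a queryset runs a single SELECT, while indexing a queryset that has not been evaluated yet runs a new LIMIT/OFFSET query for each index. The sketch below imitates the two query shapes with the standard-library sqlite3 module; the table and the helpers `run`, `iterate_pks` and `pk_at` are hypothetical names for illustration, not Django internals.

```python
import sqlite3

# Hypothetical stand-in table with two rows that share the same name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE server (id INTEGER PRIMARY KEY, name TEXT, domain_id INTEGER)"
)
conn.executemany(
    "INSERT INTO server (id, name, domain_id) VALUES (?, ?, ?)",
    [(1, "server1", 1), (2, "server1", 2)],
)
conn.commit()

executed = []  # every SQL statement issued through run()


def run(sql, params):
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


def iterate_pks(name):
    # Query shape of iterating a queryset: one SELECT, each row read once.
    return [row[0] for row in run("SELECT id FROM server WHERE name = ?", (name,))]


def pk_at(name, index):
    # Query shape of qs[index] on an unevaluated queryset: one query per index.
    rows = run(
        "SELECT id FROM server WHERE name = ? LIMIT 1 OFFSET ?", (name, index)
    )
    return rows[0][0]


iterated = iterate_pks("server1")
queries_for_iteration = len(executed)

indexed = [pk_at("server1", 0), pk_at("server1", 1)]
queries_for_indexing = len(executed) - queries_for_iteration

print(queries_for_iteration, queries_for_indexing)
```

A single result set cannot contain the same row twice, whereas two independent unordered queries only agree if the database happens to return rows in the same order both times. In Django, evaluating once with `rows = list(qs)` and then indexing `rows` avoids the per-index queries.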
