Opened 14 years ago

Closed 10 years ago

Last modified 10 years ago

#11442 closed Bug (fixed)

Postgresql backend casts inet types to text, breaks IP operations and IPv6 lookups.

Reported by: eide Owned by: Sasha Romijn
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: ipv6 postgres inet dceu13
Cc: Jernej Kos, mmitar@…, asandroq Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Ticket #708 describes a problem with LIKE operations on inet types in postgresql. The solution was to cast inet to text using the HOST() function.

But by casting inet to text none of the network operations in postgresql will work, and IPv6 lookups are pretty much broken. In the database I'm currently using, doing a HOST() on a IPv6 address will always produce a compressed URL. So if I'm checking against a fullsize address in django the lookup will fail, even though they are the same address.

Here's an example of what I'm talking about:

my_db=# CREATE TABLE my_ips (ip inet);
CREATE TABLE
                            ^
my_db=# INSERT INTO my_ips VALUES ('2001:0db8:0000:0000:0000:0000:0000:0001');
INSERT 0 1

my_db=# SELECT * FROM my_ips WHERE ip = '2001:0db8:0000:0000:0000:0000:0000:0001';
     ip      
-------------
 2001:db8::1
(1 row)

my_db=# SELECT * FROM my_ips WHERE ip = '2001:db8::1';
     ip      
-------------
 2001:db8::1
(1 row)

So far so good, but when you throw HOST() into the picture, this happens:

my_db=# SELECT * FROM my_ips WHERE HOST(ip) = '2001:db8::1';
     ip      
-------------
 2001:db8::1
(1 row)

my_db=# SELECT * FROM my_ips WHERE HOST(ip) = '2001:0db8:0000:0000:0000:0000:0000:0001';
 ip 
----
(0 rows)

2001:db8::1 and 2001:0db8:0000:0000:0000:0000:0000:0001 are the same address, just displayed on different forms.

Currently I always make sure that I pass a compressed IP to the models with IPAddressFields. That does however assume that all postgresql databases will always return IPv6 addresses on the compressed form, and I do not know if that's correct.

The correct solution would be to not cast inet to text.

Also, the postgresql documentation on Network Address Functions and Operators states that:

The host, text, and abbrev functions are primarily intended to offer alternative display formats.

So using HOST() for lookups is acctually kind of wrong in the first place.

Change History (22)

comment:1 Changed 14 years ago by morten.brekkevold@…

Not only will using HOST yield wrong results, there are also severe performance implications to using the HOST function call in lookups, as it fails to utilize indexes on INET type fields. See the following example:

nav=# select count(*) from arp;   
  count  
---------
 6391765
(1 row)

nav=# explain analyze select * from arp where ip = '2001:700::1';
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using arp_ip_btree on arp  (cost=0.00..905.80 rows=232 width=67) (actual time=0.021..0.021 rows=0 loops=1)
   Index Cond: (ip = '2001:700::1'::inet)
 Total runtime: 0.051 ms
(3 rows)

nav=# explain analyze select * from arp where HOST(ip) = '2001:700::1';
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Seq Scan on arp  (cost=0.00..200239.38 rows=32911 width=67) (actual time=9410.175..9410.175 rows=0 loops=1)
   Filter: (host(ip) = '2001:700::1'::text)
 Total runtime: 9410.196 ms
(3 rows)

nav=# 

comment:2 Changed 14 years ago by kristian.klette@…

Morten: The perfomance issue can be "solved" by adding a host(ip)-index on the table;

CREATE INDEX arp_host_ip ON arp (host(ip));
klette=# SELECT ip from ips where host(ip) = '2001:700:300:1800::b';
         host         
----------------------
 2001:700:300:1800::b
(1 row)

Time: 1781.635 ms
klette=# CREATE INDEX ips_host_ip_index ON ips ( host(ip));
CREATE INDEX
Time: 31937.661 ms
klette=# SELECT ip from ips where host(ip)::text = '2001:700:300:1800::b';
         host         
----------------------
 2001:700:300:1800::b
(1 row)

Time: 0.805 ms

Doesn't really solve the bug though, but boost performance at least.

comment:3 Changed 14 years ago by Russell Keith-Magee

Triage Stage: UnreviewedAccepted

comment:4 Changed 14 years ago by Russell Keith-Magee

Resolution: duplicate
Status: newclosed

On second thought - closing as a dupe of #811. IPv6 support is spotty across the board - if we're going to fix it, it isn't just a Postgres issue.

comment:5 Changed 14 years ago by bobrobertson

Resolution: duplicate
Status: closedreopened
Version: 1.01.1

The resolution assumes this is just an IPv6 problem, and completely ignores the enormous performance problem introduced by casting every inet record in the database to a string. This is understandable for LIKE queries, but it even uses HOST() on exact match queries.

These two queries return the same results. The first is how Django currently runs this query, and is roughly 2000x slower than the second. (Yes, I restarted Postgres between tests and flushed the OS buffers, so it is a fair comparison.)

The difference is performing n inet->string casts vs. performing 1 string->inet cast.
This also fixes the original IPv6 problem in this ticket.

Takes ~30.0 sec:

SELECT ip from ips where host(ip) = '10.0.0.1'

Takes ~0.15 sec:

SELECT ip from ips where ip = inet '10.0.0.1'

comment:6 in reply to:  5 Changed 14 years ago by bobrobertson

Replying to bobrobertson:

Excuse my typo.

Takes ~0.15 sec:

should have been:

Takes ~0.015 sec:

comment:7 Changed 13 years ago by David Mandelberg

This also breaks ordering of IPv4 addresses, making some querysets return completely incorrect results.

foo=# select inet '127.0.0.3' < inet '127.0.0.10';
 ?column?
----------
 t
(1 row)

foo=# select '127.0.0.3' < '127.0.0.10';
 ?column?
----------
 f
(1 row)

comment:8 Changed 13 years ago by Julien Phalip

Severity: Normal
Type: Bug

comment:9 Changed 12 years ago by Aymeric Augustin

UI/UX: unset

Change UI/UX from NULL to False.

comment:10 Changed 12 years ago by Aymeric Augustin

Easy pickings: unset

Change Easy pickings from NULL to False.

comment:11 Changed 11 years ago by Jernej Kos

Cc: Jernej Kos added

This also unexpectedly breaks any custom fields that have the inet database type. For example, when performing a lookup for a subnet, this HOST() conversion will produce no results, since the prefix length is stripped. And it is very annoying since the only way to work around it seems to be writing a custom manager that specifies a custom WhereNode implementation for the query set (or an ugly way is to add an extra space into the returned db_type so it doesn't match and is not mangled in the backend). This decision of casting every inet-thing via HOST() was really unwise and should be fixed.

Last edited 11 years ago by Jernej Kos (previous) (diff)

comment:12 Changed 11 years ago by Mitar

Cc: mmitar@… added

comment:13 Changed 11 years ago by asandroq

Cc: asandroq added

comment:14 Changed 11 years ago by Aymeric Augustin

Status: reopenednew

comment:15 Changed 10 years ago by HM

This is biting me right now. I need to lookup an ip-address in a range of ip-addresses, that is:

select * from ip_range where ipaddr_start <= '62.79.79.3' and ipaddr_end >= '62.79.79.3';

Which with the current test-database (postgresql 9.1.4) returns

 id | ipaddr_start |  ipaddr_end 
----+--------------+-------------
 47 | 62.79.79.0   | 62.79.79.255

As expected.

On Django 1.5.1:

>>> a = '62.79.79.3'
>>> print IPRange.objects.filter(ipaddr_start__lte=a, ipaddr_end__gte=a).query
'SELECT "ip_range"."id", "ip_range"."ipaddr_start", "ip_range"."ipaddr_end" 
FROM "ip_range" WHERE (HOST("ip_range"."ipaddr_start") <= 62.79.79.3  
AND HOST("ip_range"."ipaddr_end") >= 62.79.79.3 ) 
ORDER BY "ip_range"."ipaddr_start" ASC, "ip_range"."ipaddr_end" ASC;'

This yields a syntax error when pasted in to psql, on the first occurence of the unquoted "62.79.79.3".

If the ip-addresses are quoted and HOST() removed, it works.

The model is

class IPRange(models.Model):
    ipaddr_start = models.GenericIPAddressField(protocol='both',
            verbose_name='Start of IP-range', unpack_ipv4=True)
    ipaddr_end = models.GenericIPAddressField(protocol='both',
            verbose_name='End of IP-range', unpack_ipv4=True, blank=True,
            null=True, default=None)

    class Meta:
         db_table = 'ip_range'
         unique_together = ('ipaddr_start', 'ipaddr_end')

Psql reports the table as:

                                    Table "public.ip_range"
    Column    |          Type          |                       Modifiers
--------------+------------------------+--------------------------------------------------------
 id           | integer                | not null default nextval('ip_range_id_seq'::regclass)
 ipaddr_start | inet                   | not null
 ipaddr_end   | inet                   | 

I'm on a deadline and will therefore have to write my own field, will add to this ticket when done.

comment:16 Changed 10 years ago by Aymeric Augustin

The query and the parameters are sent separately to PostgreSQL. In .query the parameters are substituted for readability, but that obviously results in invalid SQL, and it isn't the query that's actually sent to PostgreSQL.

comment:17 Changed 10 years ago by HM

Still have to get rid of HOST(). LIKE isn't defined for 'inet' anyway, it has its own subnet-operators: '<<', '<<=', '>>', '>>='. The operators "contains" and "icontains" should map to postgres ">>=".It seems the backend-system isn't designed for adapting its operators to the data types so I think I'm better off using raw SQL right now :/.

comment:18 Changed 10 years ago by Sasha Romijn

Owner: changed from nobody to Sasha Romijn
Status: newassigned

comment:19 Changed 10 years ago by Sasha Romijn

Keywords: dceu13 added
Version: 1.1master

comment:20 Changed 10 years ago by Sasha Romijn

Has patch: set

I can see the need for more native support of INET, but the problem is that other databases do not support this type. If we would not treat INET as a string, using HOST(), the behaviour across databases would become inconsistent. Therefore, I don't think we can remove the usage of HOST().

That said, I do understand the desire to use this in custom fields. Currently, the PostgreSQL backend applies HOST() on all queries for INET columns, making custom fields difficult. I have made a patch that instead decides on HOST() based on the get_internal_name() of the field. Only when that is (Generic)IPAddressField, HOST() is used. This means behaviour of built-in fields remains consistent across database (and the same as in previous Django versions), but it is now possible to define custom fields that use INET but are not wrapped with HOST().

Tests successfully run on PostgreSQL 9.2.4 and SQLite, but overall the patch isn't very risky as no queries have changed. I have also added an additional test for #708, the reason we originally started using HOST():

PR: https://github.com/django/django/pull/1160

comment:21 Changed 10 years ago by Erik Romijn <eromijn@…>

Resolution: fixed
Status: assignedclosed

In 60d94c2a80a68861021526c0fef7fc40e648e81f:

Fixed #11442 -- Postgresql backend casts all inet types to text

comment:22 Changed 10 years ago by Aymeric Augustin <aymeric.augustin@…>

In f7467181aa56e30c946ff55190c49792d6e9a72f:

Merge pull request #1160 from erikr/host-inet-postgres2

Fixed #11442 -- Postgresql backend casts all inet types to text

Note: See TracTickets for help on using tickets.
Back to Top