Code

Opened 5 years ago

Closed 11 months ago

Last modified 11 months ago

#11442 closed Bug (fixed)

Postgresql backend casts inet types to text, breaks IP operations and IPv6 lookups.

Reported by: eide Owned by: erikr
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: ipv6 postgres inet dceu13
Cc: kostko, mmitar@…, asandroq Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Ticket #708 describes a problem with LIKE operations on inet types in postgresql. The solution was to cast inet to text using the HOST() function.

But by casting inet to text none of the network operations in postgresql will work, and IPv6 lookups are pretty much broken. In the database I'm currently using, doing a HOST() on a IPv6 address will always produce a compressed URL. So if I'm checking against a fullsize address in django the lookup will fail, even though they are the same address.

Here's an example of what I'm talking about:

my_db=# CREATE TABLE my_ips (ip inet);
CREATE TABLE
                            ^
my_db=# INSERT INTO my_ips VALUES ('2001:0db8:0000:0000:0000:0000:0000:0001');
INSERT 0 1

my_db=# SELECT * FROM my_ips WHERE ip = '2001:0db8:0000:0000:0000:0000:0000:0001';
     ip      
-------------
 2001:db8::1
(1 row)

my_db=# SELECT * FROM my_ips WHERE ip = '2001:db8::1';
     ip      
-------------
 2001:db8::1
(1 row)

So far so good, but when you throw HOST() into the picture, this happens:

my_db=# SELECT * FROM my_ips WHERE HOST(ip) = '2001:db8::1';
     ip      
-------------
 2001:db8::1
(1 row)

my_db=# SELECT * FROM my_ips WHERE HOST(ip) = '2001:0db8:0000:0000:0000:0000:0000:0001';
 ip 
----
(0 rows)

2001:db8::1 and 2001:0db8:0000:0000:0000:0000:0000:0001 are the same address, just displayed on different forms.

Currently I always make sure that I pass a compressed IP to the models with IPAddressFields. That does however assume that all postgresql databases will always return IPv6 addresses on the compressed form, and I do not know if that's correct.

The correct solution would be to not cast inet to text.

Also, the postgresql documentation on Network Address Functions and Operators states that:

The host, text, and abbrev functions are primarily intended to offer alternative display formats.

So using HOST() for lookups is acctually kind of wrong in the first place.

Attachments (0)

Change History (22)

comment:1 Changed 5 years ago by morten.brekkevold@…

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Not only will using HOST yield wrong results, there are also severe performance implications to using the HOST function call in lookups, as it fails to utilize indexes on INET type fields. See the following example:

nav=# select count(*) from arp;   
  count  
---------
 6391765
(1 row)

nav=# explain analyze select * from arp where ip = '2001:700::1';
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using arp_ip_btree on arp  (cost=0.00..905.80 rows=232 width=67) (actual time=0.021..0.021 rows=0 loops=1)
   Index Cond: (ip = '2001:700::1'::inet)
 Total runtime: 0.051 ms
(3 rows)

nav=# explain analyze select * from arp where HOST(ip) = '2001:700::1';
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Seq Scan on arp  (cost=0.00..200239.38 rows=32911 width=67) (actual time=9410.175..9410.175 rows=0 loops=1)
   Filter: (host(ip) = '2001:700::1'::text)
 Total runtime: 9410.196 ms
(3 rows)

nav=# 

comment:2 Changed 4 years ago by kristian.klette@…

Morten: The perfomance issue can be "solved" by adding a host(ip)-index on the table;

CREATE INDEX arp_host_ip ON arp (host(ip));
klette=# SELECT ip from ips where host(ip) = '2001:700:300:1800::b';
         host         
----------------------
 2001:700:300:1800::b
(1 row)

Time: 1781.635 ms
klette=# CREATE INDEX ips_host_ip_index ON ips ( host(ip));
CREATE INDEX
Time: 31937.661 ms
klette=# SELECT ip from ips where host(ip)::text = '2001:700:300:1800::b';
         host         
----------------------
 2001:700:300:1800::b
(1 row)

Time: 0.805 ms

Doesn't really solve the bug though, but boost performance at least.

comment:3 Changed 4 years ago by russellm

  • Triage Stage changed from Unreviewed to Accepted

comment:4 Changed 4 years ago by russellm

  • Resolution set to duplicate
  • Status changed from new to closed

On second thought - closing as a dupe of #811. IPv6 support is spotty across the board - if we're going to fix it, it isn't just a Postgres issue.

comment:5 follow-up: Changed 4 years ago by bobrobertson

  • Resolution duplicate deleted
  • Status changed from closed to reopened
  • Version changed from 1.0 to 1.1

The resolution assumes this is just an IPv6 problem, and completely ignores the enormous performance problem introduced by casting every inet record in the database to a string. This is understandable for LIKE queries, but it even uses HOST() on exact match queries.

These two queries return the same results. The first is how Django currently runs this query, and is roughly 2000x slower than the second. (Yes, I restarted Postgres between tests and flushed the OS buffers, so it is a fair comparison.)

The difference is performing n inet->string casts vs. performing 1 string->inet cast.
This also fixes the original IPv6 problem in this ticket.

Takes ~30.0 sec:

SELECT ip from ips where host(ip) = '10.0.0.1'

Takes ~0.15 sec:

SELECT ip from ips where ip = inet '10.0.0.1'

comment:6 in reply to: ↑ 5 Changed 4 years ago by bobrobertson

Replying to bobrobertson:

Excuse my typo.

Takes ~0.15 sec:

should have been:

Takes ~0.015 sec:

comment:7 Changed 4 years ago by dseomn

This also breaks ordering of IPv4 addresses, making some querysets return completely incorrect results.

foo=# select inet '127.0.0.3' < inet '127.0.0.10';
 ?column?
----------
 t
(1 row)

foo=# select '127.0.0.3' < '127.0.0.10';
 ?column?
----------
 f
(1 row)

comment:8 Changed 3 years ago by julien

  • Severity set to Normal
  • Type set to Bug

comment:9 Changed 2 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:10 Changed 2 years ago by aaugustin

  • Easy pickings unset

Change Easy pickings from NULL to False.

comment:11 Changed 2 years ago by kostko

  • Cc kostko added

This also unexpectedly breaks any custom fields that have the inet database type. For example, when performing a lookup for a subnet, this HOST() conversion will produce no results, since the prefix length is stripped. And it is very annoying since the only way to work around it seems to be writing a custom manager that specifies a custom WhereNode implementation for the query set (or an ugly way is to add an extra space into the returned db_type so it doesn't match and is not mangled in the backend). This decision of casting every inet-thing via HOST() was really unwise and should be fixed.

Last edited 2 years ago by kostko (previous) (diff)

comment:12 Changed 2 years ago by mitar

  • Cc mmitar@… added

comment:13 Changed 15 months ago by asandroq

  • Cc asandroq added

comment:14 Changed 13 months ago by aaugustin

  • Status changed from reopened to new

comment:15 Changed 12 months ago by HM

This is biting me right now. I need to lookup an ip-address in a range of ip-addresses, that is:

select * from ip_range where ipaddr_start <= '62.79.79.3' and ipaddr_end >= '62.79.79.3';

Which with the current test-database (postgresql 9.1.4) returns

 id | ipaddr_start |  ipaddr_end 
----+--------------+-------------
 47 | 62.79.79.0   | 62.79.79.255

As expected.

On Django 1.5.1:

>>> a = '62.79.79.3'
>>> print IPRange.objects.filter(ipaddr_start__lte=a, ipaddr_end__gte=a).query
'SELECT "ip_range"."id", "ip_range"."ipaddr_start", "ip_range"."ipaddr_end" 
FROM "ip_range" WHERE (HOST("ip_range"."ipaddr_start") <= 62.79.79.3  
AND HOST("ip_range"."ipaddr_end") >= 62.79.79.3 ) 
ORDER BY "ip_range"."ipaddr_start" ASC, "ip_range"."ipaddr_end" ASC;'

This yields a syntax error when pasted in to psql, on the first occurence of the unquoted "62.79.79.3".

If the ip-addresses are quoted and HOST() removed, it works.

The model is

class IPRange(models.Model):
    ipaddr_start = models.GenericIPAddressField(protocol='both',
            verbose_name='Start of IP-range', unpack_ipv4=True)
    ipaddr_end = models.GenericIPAddressField(protocol='both',
            verbose_name='End of IP-range', unpack_ipv4=True, blank=True,
            null=True, default=None)

    class Meta:
         db_table = 'ip_range'
         unique_together = ('ipaddr_start', 'ipaddr_end')

Psql reports the table as:

                                    Table "public.ip_range"
    Column    |          Type          |                       Modifiers
--------------+------------------------+--------------------------------------------------------
 id           | integer                | not null default nextval('ip_range_id_seq'::regclass)
 ipaddr_start | inet                   | not null
 ipaddr_end   | inet                   | 

I'm on a deadline and will therefore have to write my own field, will add to this ticket when done.

comment:16 Changed 12 months ago by aaugustin

The query and the parameters are sent separately to PostgreSQL. In .query the parameters are substituted for readability, but that obviously results in invalid SQL, and it isn't the query that's actually sent to PostgreSQL.

comment:17 Changed 12 months ago by HM

Still have to get rid of HOST(). LIKE isn't defined for 'inet' anyway, it has its own subnet-operators: '<<', '<<=', '>>', '>>='. The operators "contains" and "icontains" should map to postgres ">>=".It seems the backend-system isn't designed for adapting its operators to the data types so I think I'm better off using raw SQL right now :/.

comment:18 Changed 11 months ago by erikr

  • Owner changed from nobody to erikr
  • Status changed from new to assigned

comment:19 Changed 11 months ago by erikr

  • Keywords dceu13 added
  • Version changed from 1.1 to master

comment:20 Changed 11 months ago by erikr

  • Has patch set

I can see the need for more native support of INET, but the problem is that other databases do not support this type. If we would not treat INET as a string, using HOST(), the behaviour across databases would become inconsistent. Therefore, I don't think we can remove the usage of HOST().

That said, I do understand the desire to use this in custom fields. Currently, the PostgreSQL backend applies HOST() on all queries for INET columns, making custom fields difficult. I have made a patch that instead decides on HOST() based on the get_internal_name() of the field. Only when that is (Generic)IPAddressField, HOST() is used. This means behaviour of built-in fields remains consistent across database (and the same as in previous Django versions), but it is now possible to define custom fields that use INET but are not wrapped with HOST().

Tests successfully run on PostgreSQL 9.2.4 and SQLite, but overall the patch isn't very risky as no queries have changed. I have also added an additional test for #708, the reason we originally started using HOST():

PR: https://github.com/django/django/pull/1160

comment:21 Changed 11 months ago by Erik Romijn <eromijn@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In 60d94c2a80a68861021526c0fef7fc40e648e81f:

Fixed #11442 -- Postgresql backend casts all inet types to text

comment:22 Changed 11 months ago by Aymeric Augustin <aymeric.augustin@…>

In f7467181aa56e30c946ff55190c49792d6e9a72f:

Merge pull request #1160 from erikr/host-inet-postgres2

Fixed #11442 -- Postgresql backend casts all inet types to text

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.