#11442 closed Bug (fixed)
Postgresql backend casts inet types to text, breaks IP operations and IPv6 lookups.
Reported by: | eide | Owned by: | Sasha Romijn |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | ipv6 postgres inet dceu13 |
Cc: | Jernej Kos, mmitar@…, asandroq | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Ticket #708 describes a problem with LIKE operations on inet types in postgresql. The solution was to cast inet to text using the HOST() function.
But by casting inet to text none of the network operations in postgresql will work, and IPv6 lookups are pretty much broken. In the database I'm currently using, doing a HOST() on a IPv6 address will always produce a compressed URL. So if I'm checking against a fullsize address in django the lookup will fail, even though they are the same address.
Here's an example of what I'm talking about:
my_db=# CREATE TABLE my_ips (ip inet); CREATE TABLE ^ my_db=# INSERT INTO my_ips VALUES ('2001:0db8:0000:0000:0000:0000:0000:0001'); INSERT 0 1 my_db=# SELECT * FROM my_ips WHERE ip = '2001:0db8:0000:0000:0000:0000:0000:0001'; ip ------------- 2001:db8::1 (1 row) my_db=# SELECT * FROM my_ips WHERE ip = '2001:db8::1'; ip ------------- 2001:db8::1 (1 row)
So far so good, but when you throw HOST() into the picture, this happens:
my_db=# SELECT * FROM my_ips WHERE HOST(ip) = '2001:db8::1'; ip ------------- 2001:db8::1 (1 row) my_db=# SELECT * FROM my_ips WHERE HOST(ip) = '2001:0db8:0000:0000:0000:0000:0000:0001'; ip ---- (0 rows)
2001:db8::1 and 2001:0db8:0000:0000:0000:0000:0000:0001 are the same address, just displayed on different forms.
Currently I always make sure that I pass a compressed IP to the models with IPAddressFields. That does however assume that all postgresql databases will always return IPv6 addresses on the compressed form, and I do not know if that's correct.
The correct solution would be to not cast inet to text.
Also, the postgresql documentation on Network Address Functions and Operators states that:
The host, text, and abbrev functions are primarily intended to offer alternative display formats.
So using HOST() for lookups is acctually kind of wrong in the first place.
Change History (22)
comment:1 Changed 14 years ago by
comment:2 Changed 14 years ago by
Morten: The perfomance issue can be "solved" by adding a host(ip)-index on the table;
CREATE INDEX arp_host_ip ON arp (host(ip));
klette=# SELECT ip from ips where host(ip) = '2001:700:300:1800::b'; host ---------------------- 2001:700:300:1800::b (1 row) Time: 1781.635 ms klette=# CREATE INDEX ips_host_ip_index ON ips ( host(ip)); CREATE INDEX Time: 31937.661 ms klette=# SELECT ip from ips where host(ip)::text = '2001:700:300:1800::b'; host ---------------------- 2001:700:300:1800::b (1 row) Time: 0.805 ms
Doesn't really solve the bug though, but boost performance at least.
comment:3 Changed 14 years ago by
Triage Stage: | Unreviewed → Accepted |
---|
comment:4 Changed 14 years ago by
Resolution: | → duplicate |
---|---|
Status: | new → closed |
On second thought - closing as a dupe of #811. IPv6 support is spotty across the board - if we're going to fix it, it isn't just a Postgres issue.
comment:5 follow-up: 6 Changed 14 years ago by
Resolution: | duplicate |
---|---|
Status: | closed → reopened |
Version: | 1.0 → 1.1 |
The resolution assumes this is just an IPv6 problem, and completely ignores the enormous performance problem introduced by casting every inet record in the database to a string. This is understandable for LIKE queries, but it even uses HOST() on exact match queries.
These two queries return the same results. The first is how Django currently runs this query, and is roughly 2000x slower than the second. (Yes, I restarted Postgres between tests and flushed the OS buffers, so it is a fair comparison.)
The difference is performing n inet->string casts vs. performing 1 string->inet cast.
This also fixes the original IPv6 problem in this ticket.
Takes ~30.0 sec:
SELECT ip from ips where host(ip) = '10.0.0.1'
Takes ~0.15 sec:
SELECT ip from ips where ip = inet '10.0.0.1'
comment:6 Changed 14 years ago by
comment:7 Changed 13 years ago by
This also breaks ordering of IPv4 addresses, making some querysets return completely incorrect results.
foo=# select inet '127.0.0.3' < inet '127.0.0.10'; ?column? ---------- t (1 row) foo=# select '127.0.0.3' < '127.0.0.10'; ?column? ---------- f (1 row)
comment:8 Changed 13 years ago by
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:11 Changed 11 years ago by
Cc: | Jernej Kos added |
---|
This also unexpectedly breaks any custom fields that have the inet
database type. For example, when performing a lookup for a subnet, this HOST() conversion will produce no results, since the prefix length is stripped. And it is very annoying since the only way to work around it seems to be writing a custom manager that specifies a custom WhereNode
implementation for the query set (or an ugly way is to add an extra space into the returned db_type
so it doesn't match and is not mangled in the backend). This decision of casting every inet
-thing via HOST() was really unwise and should be fixed.
comment:12 Changed 11 years ago by
Cc: | mmitar@… added |
---|
comment:13 Changed 11 years ago by
Cc: | asandroq added |
---|
comment:14 Changed 11 years ago by
Status: | reopened → new |
---|
comment:15 Changed 10 years ago by
This is biting me right now. I need to lookup an ip-address in a range of ip-addresses, that is:
select * from ip_range where ipaddr_start <= '62.79.79.3' and ipaddr_end >= '62.79.79.3';
Which with the current test-database (postgresql 9.1.4) returns
id | ipaddr_start | ipaddr_end ----+--------------+------------- 47 | 62.79.79.0 | 62.79.79.255
As expected.
On Django 1.5.1:
>>> a = '62.79.79.3' >>> print IPRange.objects.filter(ipaddr_start__lte=a, ipaddr_end__gte=a).query 'SELECT "ip_range"."id", "ip_range"."ipaddr_start", "ip_range"."ipaddr_end" FROM "ip_range" WHERE (HOST("ip_range"."ipaddr_start") <= 62.79.79.3 AND HOST("ip_range"."ipaddr_end") >= 62.79.79.3 ) ORDER BY "ip_range"."ipaddr_start" ASC, "ip_range"."ipaddr_end" ASC;'
This yields a syntax error when pasted in to psql, on the first occurence of the unquoted "62.79.79.3".
If the ip-addresses are quoted and HOST() removed, it works.
The model is
class IPRange(models.Model): ipaddr_start = models.GenericIPAddressField(protocol='both', verbose_name='Start of IP-range', unpack_ipv4=True) ipaddr_end = models.GenericIPAddressField(protocol='both', verbose_name='End of IP-range', unpack_ipv4=True, blank=True, null=True, default=None) class Meta: db_table = 'ip_range' unique_together = ('ipaddr_start', 'ipaddr_end')
Psql reports the table as:
Table "public.ip_range" Column | Type | Modifiers --------------+------------------------+-------------------------------------------------------- id | integer | not null default nextval('ip_range_id_seq'::regclass) ipaddr_start | inet | not null ipaddr_end | inet |
I'm on a deadline and will therefore have to write my own field, will add to this ticket when done.
comment:16 Changed 10 years ago by
The query and the parameters are sent separately to PostgreSQL. In .query
the parameters are substituted for readability, but that obviously results in invalid SQL, and it isn't the query that's actually sent to PostgreSQL.
comment:17 Changed 10 years ago by
Still have to get rid of HOST(). LIKE isn't defined for 'inet' anyway, it has its own subnet-operators: '<<', '<<=', '>>', '>>='. The operators "contains" and "icontains" should map to postgres ">>=".It seems the backend-system isn't designed for adapting its operators to the data types so I think I'm better off using raw SQL right now :/.
comment:18 Changed 10 years ago by
Owner: | changed from nobody to Sasha Romijn |
---|---|
Status: | new → assigned |
comment:19 Changed 10 years ago by
Keywords: | dceu13 added |
---|---|
Version: | 1.1 → master |
comment:20 Changed 10 years ago by
Has patch: | set |
---|
I can see the need for more native support of INET, but the problem is that other databases do not support this type. If we would not treat INET as a string, using HOST(), the behaviour across databases would become inconsistent. Therefore, I don't think we can remove the usage of HOST().
That said, I do understand the desire to use this in custom fields. Currently, the PostgreSQL backend applies HOST() on all queries for INET columns, making custom fields difficult. I have made a patch that instead decides on HOST() based on the get_internal_name()
of the field. Only when that is (Generic)IPAddressField, HOST() is used. This means behaviour of built-in fields remains consistent across database (and the same as in previous Django versions), but it is now possible to define custom fields that use INET but are not wrapped with HOST().
Tests successfully run on PostgreSQL 9.2.4 and SQLite, but overall the patch isn't very risky as no queries have changed. I have also added an additional test for #708, the reason we originally started using HOST():
comment:21 Changed 10 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Not only will using HOST yield wrong results, there are also severe performance implications to using the HOST function call in lookups, as it fails to utilize indexes on INET type fields. See the following example: