#11442 closed Bug (fixed)
Postgresql backend casts inet types to text, breaks IP operations and IPv6 lookups.
Reported by: | eide | Owned by: | Sasha Romijn |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | ipv6 postgres inet dceu13 |
Cc: | Jernej Kos, mmitar@…, asandroq | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Ticket #708 describes a problem with LIKE operations on inet types in postgresql. The solution was to cast inet to text using the HOST() function.
But by casting inet to text none of the network operations in postgresql will work, and IPv6 lookups are pretty much broken. In the database I'm currently using, doing a HOST() on a IPv6 address will always produce a compressed URL. So if I'm checking against a fullsize address in django the lookup will fail, even though they are the same address.
Here's an example of what I'm talking about:
my_db=# CREATE TABLE my_ips (ip inet); CREATE TABLE ^ my_db=# INSERT INTO my_ips VALUES ('2001:0db8:0000:0000:0000:0000:0000:0001'); INSERT 0 1 my_db=# SELECT * FROM my_ips WHERE ip = '2001:0db8:0000:0000:0000:0000:0000:0001'; ip ------------- 2001:db8::1 (1 row) my_db=# SELECT * FROM my_ips WHERE ip = '2001:db8::1'; ip ------------- 2001:db8::1 (1 row)
So far so good, but when you throw HOST() into the picture, this happens:
my_db=# SELECT * FROM my_ips WHERE HOST(ip) = '2001:db8::1'; ip ------------- 2001:db8::1 (1 row) my_db=# SELECT * FROM my_ips WHERE HOST(ip) = '2001:0db8:0000:0000:0000:0000:0000:0001'; ip ---- (0 rows)
2001:db8::1 and 2001:0db8:0000:0000:0000:0000:0000:0001 are the same address, just displayed on different forms.
Currently I always make sure that I pass a compressed IP to the models with IPAddressFields. That does however assume that all postgresql databases will always return IPv6 addresses on the compressed form, and I do not know if that's correct.
The correct solution would be to not cast inet to text.
Also, the postgresql documentation on Network Address Functions and Operators states that:
The host, text, and abbrev functions are primarily intended to offer alternative display formats.
So using HOST() for lookups is acctually kind of wrong in the first place.
Change History (22)
comment:1 by , 15 years ago
comment:2 by , 15 years ago
Morten: The perfomance issue can be "solved" by adding a host(ip)-index on the table;
CREATE INDEX arp_host_ip ON arp (host(ip));
klette=# SELECT ip from ips where host(ip) = '2001:700:300:1800::b'; host ---------------------- 2001:700:300:1800::b (1 row) Time: 1781.635 ms klette=# CREATE INDEX ips_host_ip_index ON ips ( host(ip)); CREATE INDEX Time: 31937.661 ms klette=# SELECT ip from ips where host(ip)::text = '2001:700:300:1800::b'; host ---------------------- 2001:700:300:1800::b (1 row) Time: 0.805 ms
Doesn't really solve the bug though, but boost performance at least.
comment:3 by , 15 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:4 by , 15 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
On second thought - closing as a dupe of #811. IPv6 support is spotty across the board - if we're going to fix it, it isn't just a Postgres issue.
follow-up: 6 comment:5 by , 15 years ago
Resolution: | duplicate |
---|---|
Status: | closed → reopened |
Version: | 1.0 → 1.1 |
The resolution assumes this is just an IPv6 problem, and completely ignores the enormous performance problem introduced by casting every inet record in the database to a string. This is understandable for LIKE queries, but it even uses HOST() on exact match queries.
These two queries return the same results. The first is how Django currently runs this query, and is roughly 2000x slower than the second. (Yes, I restarted Postgres between tests and flushed the OS buffers, so it is a fair comparison.)
The difference is performing n inet->string casts vs. performing 1 string->inet cast.
This also fixes the original IPv6 problem in this ticket.
Takes ~30.0 sec:
SELECT ip from ips where host(ip) = '10.0.0.1'
Takes ~0.15 sec:
SELECT ip from ips where ip = inet '10.0.0.1'
comment:6 by , 15 years ago
comment:7 by , 15 years ago
This also breaks ordering of IPv4 addresses, making some querysets return completely incorrect results.
foo=# select inet '127.0.0.3' < inet '127.0.0.10'; ?column? ---------- t (1 row) foo=# select '127.0.0.3' < '127.0.0.10'; ?column? ---------- f (1 row)
comment:8 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:11 by , 13 years ago
Cc: | added |
---|
This also unexpectedly breaks any custom fields that have the inet
database type. For example, when performing a lookup for a subnet, this HOST() conversion will produce no results, since the prefix length is stripped. And it is very annoying since the only way to work around it seems to be writing a custom manager that specifies a custom WhereNode
implementation for the query set (or an ugly way is to add an extra space into the returned db_type
so it doesn't match and is not mangled in the backend). This decision of casting every inet
-thing via HOST() was really unwise and should be fixed.
comment:12 by , 13 years ago
Cc: | added |
---|
comment:13 by , 12 years ago
Cc: | added |
---|
comment:14 by , 12 years ago
Status: | reopened → new |
---|
comment:15 by , 12 years ago
This is biting me right now. I need to lookup an ip-address in a range of ip-addresses, that is:
select * from ip_range where ipaddr_start <= '62.79.79.3' and ipaddr_end >= '62.79.79.3';
Which with the current test-database (postgresql 9.1.4) returns
id | ipaddr_start | ipaddr_end ----+--------------+------------- 47 | 62.79.79.0 | 62.79.79.255
As expected.
On Django 1.5.1:
>>> a = '62.79.79.3' >>> print IPRange.objects.filter(ipaddr_start__lte=a, ipaddr_end__gte=a).query 'SELECT "ip_range"."id", "ip_range"."ipaddr_start", "ip_range"."ipaddr_end" FROM "ip_range" WHERE (HOST("ip_range"."ipaddr_start") <= 62.79.79.3 AND HOST("ip_range"."ipaddr_end") >= 62.79.79.3 ) ORDER BY "ip_range"."ipaddr_start" ASC, "ip_range"."ipaddr_end" ASC;'
This yields a syntax error when pasted in to psql, on the first occurence of the unquoted "62.79.79.3".
If the ip-addresses are quoted and HOST() removed, it works.
The model is
class IPRange(models.Model): ipaddr_start = models.GenericIPAddressField(protocol='both', verbose_name='Start of IP-range', unpack_ipv4=True) ipaddr_end = models.GenericIPAddressField(protocol='both', verbose_name='End of IP-range', unpack_ipv4=True, blank=True, null=True, default=None) class Meta: db_table = 'ip_range' unique_together = ('ipaddr_start', 'ipaddr_end')
Psql reports the table as:
Table "public.ip_range" Column | Type | Modifiers --------------+------------------------+-------------------------------------------------------- id | integer | not null default nextval('ip_range_id_seq'::regclass) ipaddr_start | inet | not null ipaddr_end | inet |
I'm on a deadline and will therefore have to write my own field, will add to this ticket when done.
comment:16 by , 12 years ago
The query and the parameters are sent separately to PostgreSQL. In .query
the parameters are substituted for readability, but that obviously results in invalid SQL, and it isn't the query that's actually sent to PostgreSQL.
comment:17 by , 12 years ago
Still have to get rid of HOST(). LIKE isn't defined for 'inet' anyway, it has its own subnet-operators: '<<', '<<=', '>>', '>>='. The operators "contains" and "icontains" should map to postgres ">>=".It seems the backend-system isn't designed for adapting its operators to the data types so I think I'm better off using raw SQL right now :/.
comment:18 by , 12 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:19 by , 12 years ago
Keywords: | dceu13 added |
---|---|
Version: | 1.1 → master |
comment:20 by , 12 years ago
Has patch: | set |
---|
I can see the need for more native support of INET, but the problem is that other databases do not support this type. If we would not treat INET as a string, using HOST(), the behaviour across databases would become inconsistent. Therefore, I don't think we can remove the usage of HOST().
That said, I do understand the desire to use this in custom fields. Currently, the PostgreSQL backend applies HOST() on all queries for INET columns, making custom fields difficult. I have made a patch that instead decides on HOST() based on the get_internal_name()
of the field. Only when that is (Generic)IPAddressField, HOST() is used. This means behaviour of built-in fields remains consistent across database (and the same as in previous Django versions), but it is now possible to define custom fields that use INET but are not wrapped with HOST().
Tests successfully run on PostgreSQL 9.2.4 and SQLite, but overall the patch isn't very risky as no queries have changed. I have also added an additional test for #708, the reason we originally started using HOST():
comment:21 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Not only will using HOST yield wrong results, there are also severe performance implications to using the HOST function call in lookups, as it fails to utilize indexes on INET type fields. See the following example: