Opened 3 years ago

Closed 16 months ago

#18903 closed New feature (wontfix)

Add forms validation for GB telephone numbers.

Reported by: g1smd Owned by: nobody
Component: Forms Version: 1.4
Severity: Normal Keywords: Telephone Number Forms GB
Cc: bradpitcher@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: yes

Description

Full details can be found in https://github.com/django/django/pull/316

Change History (27)

comment:1 Changed 3 years ago by g1smd

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I see that Django has form routines for validating telephone numbers for several countries.

The code for that can usually be found in the forms.py file located in the various country folders here:
https://github.com/django/django/tree/master/django/contrib/localflavor

So far, there is nothing for GB numbers, here:
https://github.com/django/django/tree/master/django/contrib/localflavor/gb

I've written a bunch of RegEx patterns to get this functionality started. The patterns are 100% fully tested. All that's needed is a few lines of python logic to string them together. The details can be found at:
https://github.com/django/django/pull/316/files

My Python-fu is almost zero. Anyone care to have a go at getting this to work?

RegEx 1 checks the user entered something that looks like a GB telephone number:
020 3000 5555
02075 567 234
0114 223 4567
01145 345 567
+44 1213 456 789
00 44 (0) 1697 73555
011 44 11 4890 2345
and several other formats, without worrying whether the format is correct for this particular number (but read on). It allows national or international format, including the two common international dial prefixes (00 and 011). What matters most is that the user enters the right number of digits. Don't constrain the user to a particular entry format.
"Be lenient in what you accept, be strict in what you send." (Postel's Law)

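A simplified Python sketch of the RegEx 1 idea follows. It is not the pattern from the pull request. The accepted prefixes (0, +44, 00 44, 011 44, optional (0)) and the punctuation rules are assumptions drawn from the formats listed above, and the name `looks_like_gb_number` is hypothetical. The check is lenient about where spaces, hyphens and brackets go but insists on 9 or 10 digits after the prefix. It uses `re.VERBOSE` so a long pattern can be split over several commented lines.

```python
import re

# Hypothetical, simplified stand-in for "RegEx 1"; it is NOT the pattern from
# the pull request. It checks only the overall shape of the input: a GB prefix,
# then 9 or 10 further digits with light punctuation. Ranges are not checked.
GB_SHAPE = re.compile(
    r"""
    (?:
        \(? 0                                  # national format: trunk "0"
      |
        (?: \(? \+44 \)?                       # "+44" or "(+44)"
          | \(? 00 \)?  [ -]? \(? 44 \)?       # "00 44" or "(00) (44)"
          | \(? 011 \)? [ -]? \(? 44 \)?       # "011 44" or "(011) (44)"
        )
        [ -]? (?: \(0\) [ -]? )?               # optional "(0)" after the country code
        \(?                                    # optional "(" before the area code
    )
    (?= (?: \D* \d ){9,10} \D* $ )             # 9 or 10 digits must remain
    \d+ \)?                                    # first digit group, maybe closing ")"
    (?: [ -] \d+ )*                            # more groups, one space or hyphen apart
    """,
    re.VERBOSE,
)


def looks_like_gb_number(text: str) -> bool:
    """Return True if text has the general shape of a GB telephone number."""
    return GB_SHAPE.fullmatch(text.strip()) is not None
```

With this sketch, `0623 111 3456` passes the shape check. Rejecting it is the job of the range check in RegEx 3.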
RegEx 2 extracts the NSN (National Significant Number, i.e. the number without the leading 0 or +44) in $3, with "44" or NULL in $2 (so you know whether international or national format was used on input), and any extension in $4. Store $2 and $4 for later use. Send $3 on to RegEx 3.
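The extraction step could be sketched as below. Instead of a single regex with $2/$3/$4 captures, it uses plain string handling after stripping punctuation, which is a swapped-in technique. `split_gb_number` and the extension syntax (`ext`, `x` or `#` followed by digits) are hypothetical. Candidate prefixes are tried in order and accepted only if 9 or 10 NSN digits remain. A 0 immediately after the country code is dropped because a GB NSN never starts with 0.

```python
import re

# Assumed extension syntax (hypothetical): "ext 123", "ext. 123", "x123" or "#123"
# at the very end of the input.
EXTENSION = re.compile(r"\s*(?:ext\.?|x|#)\s*(\d{1,5})\s*$", re.IGNORECASE)

# International dial prefixes for GB, as they appear once punctuation is removed.
INTERNATIONAL_PREFIXES = ("01144", "0044")


def split_gb_number(text):
    """Split input into (country, nsn, extension), or return None.

    country is "44" for international input and None for national input,
    like the $2 / $3 / $4 captures described above.
    """
    extension = None
    match = EXTENSION.search(text)
    if match:
        extension = match.group(1)
        text = text[: match.start()]
    digits = re.sub(r"\D", "", text)

    # Try international forms first, then the national trunk "0". Checking the
    # resulting length also separates "011 44 ..." from a Sheffield "0114 4...".
    candidates = [(prefix, "44") for prefix in INTERNATIONAL_PREFIXES]
    if text.lstrip(" (").startswith("+"):
        candidates.insert(0, ("44", "44"))
    candidates.append(("0", None))

    for prefix, country in candidates:
        if not digits.startswith(prefix):
            continue
        nsn = digits[len(prefix):]
        if country and nsn.startswith("0"):
            nsn = nsn[1:]  # drop a "(0)" written after the country code
        if len(nsn) in (9, 10) and not nsn.startswith("0"):
            return country, nsn, extension
    return None
```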

RegEx 3 tests that the NSN is in a valid range and has the right number of digits for that range (GB numbers can be 9 or 10 digits long). This RegEx pattern is very detailed: it tells you whether a number is possible or invalid.
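Below is a heavily trimmed sketch of a RegEx 3-style range check. The branches are an illustrative subset built around numbers that appear in this ticket (London 020 3/7/8, Sheffield 0114, Birmingham 0121, 016977). They are assumptions for demonstration, not real allocation data, and `is_valid_nsn` is a hypothetical name. Each branch fixes both the leading digits and the total length, so a number in a known range with the wrong digit count is rejected.

```python
import re

# Illustrative subset only, built from ranges mentioned in this ticket; the
# real RegEx 3 covers every GB range. Each branch fixes range AND length.
VALID_NSN = re.compile(
    r"""
      20 [378] \d{7}                   # London 020 3/7/8 xxx xxxx: 10 digits
    | 1 (?: 1[3-8] | [2-69]1 ) \d{7}   # e.g. Sheffield 0114, Birmingham 0121: 10 digits
    | 16977 \d{4}                      # 016977 xxxx: 9 digits
    """,
    re.VERBOSE,
)


def is_valid_nsn(nsn: str) -> bool:
    """Return True if the NSN falls in one of the (few) ranges listed above."""
    return VALID_NSN.fullmatch(nsn) is not None
```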

RegEx Group 4 is a set of RegEx patterns that specify how a GB number should be formatted, by number length and initial digits. These rules cover all GB numbers.

The last bit is to add the 0 or +44 back on, and append the original extension number (if present) and present it back to the user.

020 3000 5555 => Valid: 020 3000 5555
02075 567 234 => Valid: 020 7556 7234
0114 223 4567 => Valid: 0114 223 4567
01145 345 567 => Valid: 0114 534 5567
+44 1213 456 789 => Valid: +44 121 345 6789
00 44 (0) 1697 73555 => Valid: +44 16977 3555
011 44 11 4890 2345 => Valid: +44 114 890 2345
0623 111 3456 => NOT VALID
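The formatting step that turns an NSN into output like the examples above might be sketched as follows. `FORMAT_RULES`, `format_gb_number` and the ` ext 123` extension style are hypothetical. The rule list is a small subset chosen to reproduce these examples; a real table needs an entry for every range and length. Rules are tried in order, so more specific ranges (such as 016977) must come before the generic 4+6 and 4+5 layouts.

```python
import re

# Illustrative subset of "RegEx Group 4": (pattern over the NSN, digit-group sizes).
# Rules are checked in order, so more specific ranges must come first.
FORMAT_RULES = [
    (re.compile(r"20\d{8}"), (2, 4, 4)),            # 020 xxxx xxxx
    (re.compile(r"1(?:1\d|\d1)\d{7}"), (3, 3, 4)),  # 011x xxx xxxx / 01x1 xxx xxxx
    (re.compile(r"16977\d{4}"), (5, 4)),            # 016977 xxxx
    (re.compile(r"1\d{9}"), (4, 6)),                # 01xxx xxxxxx
    (re.compile(r"1\d{8}"), (4, 5)),                # 01xxx xxxxx
]


def format_gb_number(nsn, country=None, extension=None):
    """Group the NSN for display, then re-add "0" or "+44 " and any extension."""
    for pattern, sizes in FORMAT_RULES:
        if pattern.fullmatch(nsn):
            break
    else:
        raise ValueError(f"no formatting rule for {nsn!r}")

    groups, start = [], 0
    for size in sizes:
        groups.append(nsn[start:start + size])
        start += size

    prefix = "+44 " if country == "44" else "0"
    text = prefix + " ".join(groups)
    if extension:
        text += f" ext {extension}"  # hypothetical extension style
    return text
```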

comment:2 Changed 3 years ago by g1smd

  • Has patch set
  • Patch needs improvement set

comment:3 Changed 3 years ago by g1smd

  • UI/UX set

comment:4 Changed 3 years ago by g1smd

  • Needs tests set

comment:5 Changed 3 years ago by g1smd

  • Keywords Telephone Number Forms GB added

comment:7 Changed 3 years ago by bradpitcher

  • Has patch unset
  • Needs documentation set
  • Owner changed from nobody to bradpitcher
  • Patch needs improvement unset
  • Status changed from new to assigned

comment:8 Changed 3 years ago by bradpitcher

  • Has patch set
  • Needs documentation unset
  • Needs tests unset
  • Owner changed from bradpitcher to nobody
  • Patch needs improvement set
  • Status changed from assigned to new

A working patch can be found here:
https://github.com/django/django/pull/356

The patch still needs improvement; however, the tests do pass.

What format should clean() return the number in? Right now it returns +44xxxxxxx. Should it be more like +44 xxx xxx xxxx?

Also, what's the best practice with long regular expressions? Really long single lines or multiple lines with indentation for structure?

comment:9 Changed 3 years ago by bradpitcher

  • Cc bradpitcher@… added

comment:10 Changed 3 years ago by g1smd

I prefer the multi-line format RegEx for clarity. I have changed to that format as well as altering the patterns slightly in a later commit in my repo at g1smd/django.

Thanks for your fantastic effort in finishing the conversion to python. I have this code working in Java and PHP, but the python syntax was just too baffling. :)

As you are no doubt aware from studying the code, it allows a GB phone number such as London +44 20 3000 5555 to be entered as 020 3000 5555, 020-3000-5555, 0203 000 5555, 02030 005 555, (020) 3000 5555, (0203) 000 5555, (02030) 005 555, (+44) (20) 3000 5555, (00) (44) (20) 3000 5555, (011) (44) (20) 3000 5555, (+44)-(20)-3000-5555 and a variety of other formats - with or without spaces, hyphens or brackets.

It doesn't constrain the user to enter the number in the right format. It allows the user to make basic formatting errors and add a range of punctuation (which is then stripped out). What is important is that the phone number is entered with the right number of digits. This is thoroughly checked by a later very detailed RegEx pattern.

The detailed RegEx pattern also checks the number is in an area code or number range that actually exists.

For valid numbers, another set of RegEx patterns re-format the number in the right way for the number range and number length in question. I have added a commit that puts the correct spaces in the number formatting, and other improvements.

By splitting format checking, NSN extraction, number range and length checking, and number formatting into separate processes, much more detailed checks can be made at each stage.

Finally I have added a comment block listing accepted number formats for entry.

Please copy the updated code at g1smd/django to your repo. Thanks again!

comment:11 Changed 3 years ago by g1smd

  • Patch needs improvement unset

Progress in the last few days has seen the code syntax fixed up, a couple of bugs squashed, and some tests added and passed. The RegEx patterns were also further enhanced.

comment:12 Changed 3 years ago by bradpitcher

  • Triage Stage changed from Unreviewed to Accepted

comment:13 Changed 3 years ago by g1smd

We think this is now 'Ready for Checkin' but will leave it to a third party to make that call officially.

comment:14 Changed 3 years ago by anonymous

  • Easy pickings set

comment:15 Changed 3 years ago by andrewgodwin

Goodness me, that's one hell of a regular expression. Phone numbers probably need it, though.

My main worry is the enforcement of spacing rules - is that really necessary? If there's one thing people get wrong it's where to put the spaces in phone numbers (e.g. 0207 123 4567), and I feel that it's better to strip out the spaces and just do range checks.

comment:16 Changed 3 years ago by bradpitcher

I know, it's a wicked regex, but I don't think spacing is enforced. Maybe @g1smd can confirm, but if you look at the tests (https://github.com/brad/django/blob/ticket_18903/tests/regressiontests/localflavor/gb/tests.py#L36) there are a few with spaces in different places and the regex doesn't care.

comment:17 Changed 3 years ago by g1smd

Yep. Some people in London (for example) seem to think their numbers are 0203 000 5000 rather than 020 3000 5000. However, the 'faulty' number has the right number of digits. It would work if dialled. Let users enter it in the wrong format, then tidy it up before presenting it back to them. However, always reject 020 4000 5000 because no London numbers begin with a 4, and reject 022 3000 5000 because the 022 area code is not allocated.

comment:18 Changed 3 years ago by g1smd

The first RegEx rejects entries that look nothing like a GB phone number (especially those with the wrong number of digits or an invalid prefix), while allowing a wide range of entries that do. Allowed formats include a range of prefixes such as 0, +44, 00 44, 011 44, each with or without spaces or hyphens and/or brackets.

The remaining part of the number can be in any common number format with or without spaces or hyphens, and with or without brackets around the area code. The RegEx doesn't care if the format is the wrong format for this number. All it cares about is that it has the right number of digits.

The later RegEx patterns extract the NSN part, then check that the area code, number range, and length of number for this range are all correct.

01750 61555 is rejected (too short). 01750 62555 is accepted ("4+5").
01750 610555 is accepted ("4+6").
01750 6100555 is rejected (too long). 01750 6255 is rejected (too short).

01750 610555 can be entered as 017 5061 0555 or 0175 061 0555 or 017506 10555 and will be accepted.
01750 62555 can be entered as 0175 062555 or 017506 2555 and will be accepted.
These latter examples will be reformatted for display.
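The accept/reject decisions above amount to a lookup from the leading digits of the NSN to the length that range requires. Here is a minimal sketch, assuming a hypothetical table that holds only the two 01750 sub-ranges from these examples (61 with a 10-digit NSN, 62 with a 9-digit NSN). `check_length` is illustrative, and the input is assumed to be in national format.

```python
import re

# Hypothetical length table: NSN prefix -> required NSN length. It holds only
# the two 01750 sub-ranges used in the examples above; real data is far larger.
NSN_LENGTHS = {
    "175061": 10,  # 01750 61xxxx, written 4+6
    "175062": 9,   # 01750 62xxx, written 4+5
}


def check_length(national_number):
    """Classify a national-format number as accepted, too short, too long or unknown."""
    nsn = re.sub(r"\D", "", national_number)[1:]  # drop punctuation and the trunk "0"
    # The longest matching prefix wins, so specific entries override general ones.
    for size in range(len(nsn), 0, -1):
        required = NSN_LENGTHS.get(nsn[:size])
        if required is not None:
            break
    else:
        return "unknown range"
    if len(nsn) < required:
        return "too short"
    if len(nsn) > required:
        return "too long"
    return "accepted"
```

Because punctuation is discarded first, `017 5061 0555` and `01750 610555` are treated the same, which matches the behaviour described above.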

comment:19 Changed 3 years ago by andrewgodwin

Alright, spacing does matter - while my london example works due to the other area code rules, this does not: '014 153 455 67'.

Over-validation is as bad as no validation - I'm happy to accept a patch that checks the number ranges but having the format hard-enforced like this is just going to annoy people.

comment:20 Changed 3 years ago by g1smd

There's a good chance that someone entering that number in that format is entering a number from Belgium, not the UK. Spacing only matters when it's something that doesn't even look like a UK number.

comment:21 Changed 3 years ago by g1smd

The only constraint I placed was "make it look something like a UK phone number". Most people are aware that UK numbers are entered as 2+8, 3+7, 4+6 or 5+5 for 10 digit numbers or 3+6, 4+5 or 5+4 for 9 digit numbers, but not many people are aware of which format goes with which area code. The RegEx here allows any of those formats with any number and various punctuation options. Numbers can, of course, be entered with no spaces or punctuation at all.
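One way to express "make it look something like a UK phone number" for national-format input is to check the first digit group against the 2+8 ... 5+4 layouts listed above. The sketch below uses a hypothetical `has_plausible_layout`. The rule for the digits after the area code (one block, or two blocks of 3 or 4 digits) is an assumption and does not come from the patch.

```python
import re

# NSN length -> allowed lengths of the area-code part (without the trunk "0"),
# i.e. 2+8, 3+7, 4+6, 5+5 for 10 digits and 3+6, 4+5, 5+4 for 9 digits.
ALLOWED_SPLITS = {10: {2, 3, 4, 5}, 9: {3, 4, 5}}


def has_plausible_layout(national_number):
    """Hypothetical layout check for national-format input such as "0114 223 4567"."""
    cleaned = national_number.replace("(", "").replace(")", "").strip()
    groups = re.split(r"[ -]", cleaned)  # exactly one space or hyphen between groups
    if not all(re.fullmatch(r"[0-9]+", g) for g in groups):
        return False
    if not groups[0].startswith("0"):
        return False
    if len(groups) == 1:  # no punctuation at all: only the digit count matters
        return len(groups[0]) - 1 in ALLOWED_SPLITS
    area, rest = groups[0][1:], groups[1:]
    nsn_length = len(area) + sum(len(g) for g in rest)
    if len(area) not in ALLOWED_SPLITS.get(nsn_length, set()):
        return False
    # Assumed rule: the local part is one block, or two blocks of 3 or 4 digits.
    return len(rest) == 1 or (len(rest) == 2 and all(len(g) in (3, 4) for g in rest))
```

Under these assumptions `014 153 455 67` is rejected, while `0203 000 5555` and `02030 005 555` are accepted.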

Merely checking the number of digits in the NSN part would allow people to enter

(011)-(44)-(0)-(((2 -(- 0)-((---)))()()()--5- --3 (---4() --0-- )))----((2)(()- )3)))----3-- 7

and that number still be accepted. Do you really want that? There's a point at which the user input should be rejected as garbage.
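To make the point concrete, consider a check that only strips non-digits and counts what is left, sketched here as the hypothetical `naive_nsn`. It accepts the string above because ten NSN digits remain once the 011 44 (0) prefix is removed.

```python
import re

# The example input from the comment above, copied verbatim.
GARBAGE = "(011)-(44)-(0)-(((2 -(- 0)-((---)))()()()--5- --3 (---4() --0-- )))----((2)(()- )3)))----3-- 7"


def naive_nsn(text):
    """Digits-only check: strip everything that is not a digit, then any prefix."""
    digits = re.sub(r"\D", "", text)
    for prefix in ("01144", "0044", "44"):
        if digits.startswith(prefix):
            digits = digits[len(prefix):].removeprefix("0")
            break
    else:
        digits = digits.removeprefix("0")
    return digits if len(digits) in (9, 10) else None


print(naive_nsn(GARBAGE))  # prints 2053402337: accepted despite the punctuation
```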

comment:22 Changed 3 years ago by aaugustin

While I appreciate the effort that went into this patch, I'm surprised by how deep it goes into the territory of specialized validation and formatting tools such as phonenumbers. I don't know any local phone number field with such minute validation in Django.

I'm not against this patch — especially given the BDFL's plan to split localflavor and rely on national communities for maintenance. I'm just pointing out a possible mismatch with user expectations regarding localflavor.

comment:23 Changed 3 years ago by g1smd

A lot of software seems to use this RegEx http://regexlib.com/REDetails.aspx?regexp_id=589 to validate GB phone numbers. It validates 2+8, 3+7 and 4+6 format numbers and numbers without spaces but doesn't match valid 4+5, 5+5, 5+4 and 3+6 numbers. Other software requires that no spaces or punctuation be entered at all. Some require a 0 plus exactly ten digits, ignoring the fact that 41 area codes (plus 0500 and some of 0800) also have 9 digit numbers.

The idea here is to be a lot more lenient with inputs but to still chuck out obviously incorrect stuff while using the exact GB phone number rules. You're correct that a lot of software doesn't do range checking, but that's often down to a lack of detailed data available to construct the rules. I already have these detailed rules working in other software and that has led to much better data quality; many typos are spotted early and never make it into the database.

comment:24 Changed 3 years ago by bradpitcher@…

The new pull request is here

comment:25 Changed 2 years ago by claudep

  • Resolution set to invalid
  • Status changed from new to closed

comment:26 follow-up: Changed 16 months ago by anonymous

  • Resolution invalid deleted
  • Status changed from closed to new

So since the django-localflavor-gb repo has been shelved, this still isn't anywhere to be seen...

comment:27 in reply to: ↑ 26 Changed 16 months ago by bmispelon

  • Resolution set to wontfix
  • Status changed from new to closed

Replying to anonymous:

So since the django-localflavor-gb repo has been shelved, this still isn't anywhere to be seen...

Hi,

All the various django-localflavor-* have been merged into one project: django-localflavor.

Please direct your questions to them.

Thanks.
