Opened 12 years ago

Closed 11 years ago

#18903 closed New feature (wontfix)

Add forms validation for GB telephone numbers.

Reported by: g1smd Owned by: nobody
Component: Forms Version: 1.4
Severity: Normal Keywords: Telephone Number Forms GB
Cc: bradpitcher@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: yes

Description

Full details can be found in https://github.com/django/django/pull/316

Change History (27)

comment:1 by g1smd, 12 years ago

I see that there are routines for validating telephone numbers in forms in Django for several countries.

The code for that can usually be found in the forms.py file located in the various country folders here:
https://github.com/django/django/tree/master/django/contrib/localflavor

So far, there is nothing for GB numbers, here:
https://github.com/django/django/tree/master/django/contrib/localflavor/gb

I've written a bunch of RegEx patterns to get this functionality started. The patterns are 100% fully tested. All that's needed is a few lines of python logic to string them together. The details can be found at:
https://github.com/django/django/pull/316/files

My Python-fu is almost zero. Anyone care to have a go at getting this to work?

RegEx 1 checks the user entered something that looks like a GB telephone number:
020 3000 5555
02075 567 234
0114 223 4567
01145 345 567
+44 1213 456 789
00 44 (0) 1697 73555
011 44 11 4890 2345
and several other formats, without worrying if the format is correct for this particular number (but read on). It allows for national or international format, even for two common international dial prefixes. What is most important is that the user enters the right number of digits. Don't constrain the user to use a particular format for entry.
"Be lenient in what you accept, be strict in what you send." (Postel's Law)

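A minimal Python sketch of the first, lenient stage. The pattern below is a simplified, hypothetical stand-in rather than the PR's RegEx 1. It accepts a national 0 or an international +44, 00 44 or 011 44 prefix (with an optional "(0)"), tolerates spaces, hyphens and brackets anywhere, and only insists on 9 or 10 further digits. Extensions are left out. The names `SEP`, `LOOKS_LIKE_GB` and `looks_like_gb_number` are illustrative.

```python
import re

# Separators tolerated anywhere between digits: whitespace, brackets, hyphens.
SEP = r"[\s()\-]*"

# Hypothetical, simplified stand-in for "RegEx 1" (not the PR's pattern).
LOOKS_LIKE_GB = re.compile(
    SEP
    + r"(?:"
    # International: +44, 00 44 or 011 44, then an optional (0) trunk digit.
    + r"(?:\+|00|011)" + SEP + r"44" + SEP + r"(?:0" + SEP + r")?"
    # National: a single leading 0.
    + r"|0" + SEP
    + r")"
    # Then 9 or 10 digits (the NSN), in any grouping.
    + r"(?:\d" + SEP + r"){9,10}"
)


def looks_like_gb_number(text: str) -> bool:
    """True if text has a GB-style prefix followed by 9 or 10 digits."""
    return LOOKS_LIKE_GB.fullmatch(text) is not None


print(looks_like_gb_number("00 44 (0) 1697 73555"))  # True
print(looks_like_gb_number("020 300"))               # False
```

At this stage nothing is said about whether the range exists or whether the grouping suits it; that is left to the later patterns.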
RegEx 2 extracts the NSN part of the number in $3, with "44" or NULL in $2 (so you know if international or national format was used on input), and any extension in $4. Store $2 and $4 for later use. Send $3 on to RegEx 3.
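The extraction step can be sketched the same way. This hypothetical version first splits off an extension (assuming it is written as "ext 123", "ext. 123" or "x123"). It then removes separators and captures the country code (the ticket's $2) and the NSN ($3). The names `EXTENSION`, `COMPACT`, `Parsed` and `extract_nsn` are illustrative and not taken from the PR.

```python
import re
from typing import NamedTuple, Optional

# Assumed extension syntax: "ext 123", "ext. 123" or "x123" at the end.
EXTENSION = re.compile(r"\s*(?:ext\.?|x)\s*(\d{1,5})\s*$", re.IGNORECASE)

# Run on the number after separators are stripped. Group 1 plays the role of
# the ticket's $2 ("44" or None) and group 2 the role of $3 (the NSN).
COMPACT = re.compile(r"(?:(?:\+|00|011)(44)0?|0)([1-9]\d{8,9})")


class Parsed(NamedTuple):
    country_code: Optional[str]  # "44" for international input, None for national
    nsn: str
    extension: Optional[str]


def extract_nsn(text: str) -> Optional[Parsed]:
    """Split a GB-style number into country code, NSN and extension."""
    extension = None
    ext_match = EXTENSION.search(text)
    if ext_match:
        extension = ext_match.group(1)
        text = text[: ext_match.start()]
    compact = re.sub(r"[\s()\-]", "", text)
    match = COMPACT.fullmatch(compact)
    if match is None:
        return None
    return Parsed(match.group(1), match.group(2), extension)


print(extract_nsn("00 44 (0) 1697 73555"))
# Parsed(country_code='44', nsn='169773555', extension=None)
```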

RegEx 3 tests the NSN part of the number is in a valid range and has the right number of digits for that range (GB numbers can be 9 or 10 digits long). This RegEx pattern is very detailed. You can say that a number is possible or is invalid with this RegEx.
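For the range-and-length stage, here is a sketch with a deliberately tiny rule set. The alternatives are built only from numbers quoted in this ticket: London 020 3/7/8, Sheffield 0114, Birmingham 0121, 016977, and the 01750 cases discussed later. They are illustrative only and nowhere near the full GB numbering plan the real pattern encodes. `NSN_RULES` and `nsn_in_valid_range` are hypothetical names.

```python
import re

# Illustrative rules only, assembled from numbers quoted in this ticket.
# Each alternative fixes both the leading digits and the total NSN length.
NSN_RULES = re.compile(
    r"20[378]\d{7}"             # London 020 3/7/8 ranges: 10 digits
    r"|114\d{7}"                # Sheffield 0114: 10 digits
    r"|121\d{7}"                # Birmingham 0121: 10 digits
    r"|16977\d{4}"              # 016977 as in "016977 3555": 9 digits
    r"|17506(?:1\d{4}|2\d{3})"  # 01750 61... is 10 digits, 01750 62... is 9
)


def nsn_in_valid_range(nsn: str) -> bool:
    """True if the NSN falls in a listed range and has that range's length."""
    return NSN_RULES.fullmatch(nsn) is not None


print(nsn_in_valid_range("2030005555"))  # True
print(nsn_in_valid_range("2040005000"))  # False: no 020 4 range in this table
```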

RegEx Group 4. Here, there's a bunch of RegEx patterns that specify how a GB number should be formatted, detailed by number length and initial digits. These rules cover all GB numbers.

The last bit is to add the 0 or +44 back on, and append the original extension number (if present) and present it back to the user.
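The formatting stage can be sketched as a list of (pattern, area-part length) rules tried in order. The 0 or +44 prefix and any extension are then added back on. The rules, the spacing of the local part and the " ext " suffix are assumptions chosen to reproduce the example outputs listed in this comment. They are not the PR's formatting patterns, and `FORMAT_RULES`, `group_local` and `format_gb` are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical (pattern, area-part length) rules covering only the ranges in
# this comment's examples. Order matters: the broad 01 rule comes last.
FORMAT_RULES = [
    (re.compile(r"2\d{9}"), 2),              # 020 xxxx xxxx (2+8)
    (re.compile(r"(?:11\d|1\d1)\d{7}"), 3),  # 011x / 01x1 xxx xxxx (3+7)
    (re.compile(r"16977\d{4}"), 5),          # 016977 xxxx (5+4)
    (re.compile(r"1\d{8,9}"), 4),            # other 01xxx numbers (4+5, 4+6)
]


def group_local(local: str) -> str:
    """Space the local part: 8 digits as 4+4, 7 as 3+4, shorter left whole."""
    if len(local) == 8:
        return f"{local[:4]} {local[4:]}"
    if len(local) == 7:
        return f"{local[:3]} {local[3:]}"
    return local


def format_gb(nsn: str, international: bool, extension: Optional[str] = None) -> Optional[str]:
    """Re-add the 0 or +44 prefix, space the number, append any extension."""
    for pattern, area_len in FORMAT_RULES:
        if pattern.fullmatch(nsn):
            prefix = "+44 " if international else "0"
            text = f"{prefix}{nsn[:area_len]} {group_local(nsn[area_len:])}"
            if extension:
                text += f" ext {extension}"
            return text
    return None


print(format_gb("2075567234", international=False))  # 020 7556 7234
print(format_gb("169773555", international=True))    # +44 16977 3555
```

Rule order matters here. The broad 01 rule at the end would otherwise also claim 0114 and 0121 numbers and space them as 4+6.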

020 3000 5555 => Valid: 020 3000 5555
02075 567 234 => Valid: 020 7556 7234
0114 223 4567 => Valid: 0114 223 4567
01145 345 567 => Valid: 0114 534 5567
+44 1213 456 789 => Valid: +44 121 345 6789
00 44 (0) 1697 73555 => Valid: +44 16977 3555
011 44 11 4890 2345 => Valid: +44 114 890 2345
0623 111 3456 => NOT VALID

comment:2 by g1smd, 12 years ago

Has patch: set
Patch needs improvement: set

comment:3 by g1smd, 12 years ago

UI/UX: set

comment:4 by g1smd, 12 years ago

Needs tests: set

comment:5 by g1smd, 12 years ago

Keywords: Telephone Number Forms GB added

comment:7 by Brad Pitcher, 12 years ago

Has patch: unset
Needs documentation: set
Owner: changed from nobody to Brad Pitcher
Patch needs improvement: unset
Status: new → assigned

comment:8 by Brad Pitcher, 12 years ago

Has patch: set
Needs documentation: unset
Needs tests: unset
Owner: changed from Brad Pitcher to nobody
Patch needs improvement: set
Status: assigned → new

A working patch can be found here:
https://github.com/django/django/pull/356

The patch still needs improvement, however the tests do pass.

What format should clean return the number in? Now it is returning +44xxxxxxx. Should it be more like +44 xxx xxx xxxx?

Also, what's the best practice with long regular expressions? Really long single lines or multiple lines with indentation for structure?
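For long patterns, Python's `re.VERBOSE` flag lets a regular expression span several lines with comments. Unescaped whitespace in the pattern is ignored and `#` starts a comment. The short pattern below is a simplified NSN extractor written for illustration, not the patch's pattern. It is shown both ways, and the two versions match the same inputs.

```python
import re

# One line: compact, but hard to review once the pattern grows.
SINGLE_LINE = re.compile(r"^(?:(?:\+|00|011)44|0)([1-9]\d{8,9})$")

# The same pattern with re.VERBOSE: whitespace is ignored, "#" starts a comment.
MULTI_LINE = re.compile(
    r"""
    ^
    (?:
        (?:\+|00|011) 44   # international prefix: +44, 0044 or 01144
      | 0                  # or the national trunk prefix
    )
    ([1-9]\d{8,9})         # NSN: 9 or 10 digits, never starting with 0
    $
    """,
    re.VERBOSE,
)

print(MULTI_LINE.match("+441213456789").group(1))  # 1213456789
```

Literal spaces inside a verbose pattern need escaping (`\ `) or a character class such as `[ ]`. Patterns that match spaces in the input should account for that.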

comment:9 by Brad Pitcher, 12 years ago

Cc: bradpitcher@… added

comment:10 by g1smd, 12 years ago

I prefer the multi-line format RegEx for clarity. I have changed to that format as well as altering the patterns slightly in a later commit in my repo at g1smd/django.

Thanks for your fantastic effort in finishing the conversion to python. I have this code working in Java and PHP, but the python syntax was just too baffling. :)

As you are no doubt aware from studying the code, it allows a GB phone number such as London +44 20 3000 5555 to be entered as 020 3000 5555, 020-3000-5555, 0203 000 5555, 02030 005 555, (020) 3000 5555, (0203) 000 5555, (02030) 005 555, (+44) (20) 3000 5555, (00) (44) (20) 3000 5555, (011) (44) (20) 3000 5555, (+44)-(20)-3000-5555 and a variety of other formats - with or without spaces, hyphens or brackets.

It doesn't constrain the user to enter the number in the right format. It allows the user to make basic formatting errors and add a range of punctuation (which is then stripped out). What is important is that the phone number is entered with the right number of digits. This is thoroughly checked by a later very detailed RegEx pattern.

The detailed RegEx pattern also checks the number is in an area code or number range that actually exists.

For valid numbers, another set of RegEx patterns re-format the number in the right way for the number range and number length in question. I have added a commit that puts the correct spaces in the number formatting, and other improvements.

By splitting format checking, NSN extraction, number range and length checking, and the number formatting into separate processes much more detailed checks can be made at each stage.

Finally I have added a comment block listing accepted number formats for entry.

Please copy the updated code at g1smd/django to your repo. Thanks again!

comment:11 by g1smd, 12 years ago

Patch needs improvement: unset

Progress in the last few days has seen the code syntax fixed up, a couple of bugs squashed, and some tests added and passed. The RegEx patterns were also further enhanced.

comment:12 by Brad Pitcher, 12 years ago

Triage Stage: Unreviewed → Accepted

comment:13 by g1smd, 12 years ago

We think this is now 'Ready for Checkin' but will leave it to a third party to make that call officially.

comment:14 by anonymous, 12 years ago

Easy pickings: set

comment:15 by Andrew Godwin, 12 years ago

Goodness me, that's one hell of a regular expression. Phone numbers probably need it, though.

My main worry is the enforcement of spacing rules - is that really necessary? If there's one thing people get wrong it's where to put the spaces in phone numbers (e.g. 0207 123 4567), and I feel that it's better to strip out the spaces and just do range checks.
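Here is a minimal sketch of the strip-then-range-check alternative, assuming national-format input only. Every non-digit is dropped, so spacing cannot affect the result. The range rule is a hypothetical one-liner for the London 020 3/7/8 numbers cited in this thread. `digits_only`, `LONDON` and `is_valid_london` are illustrative names.

```python
import re


def digits_only(text: str) -> str:
    """Drop everything except digits, so spacing cannot affect validation."""
    return re.sub(r"\D", "", text)


# Hypothetical minimal range rule: 020 followed by a 3, 7 or 8 and 7 more digits.
LONDON = re.compile(r"020[378]\d{7}")


def is_valid_london(text: str) -> bool:
    """Range check on the digits alone (national format assumed)."""
    return LONDON.fullmatch(digits_only(text)) is not None


print(digits_only("0207 123 4567"))    # 02071234567
print(is_valid_london("0207 123 4567"))  # True
```

Because every non-digit is discarded, this approach also accepts input with arbitrary punctuation. That is the trade-off argued over later in the thread.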

comment:16 by Brad Pitcher, 12 years ago

I know, it's a wicked regex, but I don't think spacing is enforced. Maybe @g1smd can confirm, but if you look at the tests (https://github.com/brad/django/blob/ticket_18903/tests/regressiontests/localflavor/gb/tests.py#L36) there are a few with spaces in different places and the regex doesn't care.

comment:17 by g1smd, 12 years ago

Yep. Some people in London (for example) seem to think their numbers are 0203 000 5000 rather than 020 3000 5000. However, the 'faulty' number has the right number of digits. It would work if dialled. Let users enter it in the wrong format, then tidy it up before presenting it back to them. However, always reject 020 4000 5000 because no London numbers begin with a 4, and reject 022 3000 5000 because the 022 area code is not allocated.

comment:18 by g1smd, 12 years ago

The first RegEx rejects entries that look nothing like a GB phone number (especially those with the wrong number of digits or an invalid prefix), while allowing a wide range of entries that do. Allowed formats include a range of prefixes such as 0, +44, 00 44, 011 44, each with or without spaces or hyphens and/or brackets.

The remaining part of the number can be in any common number format with or without spaces or hyphens, and with or without brackets around the area code. The RegEx doesn't care if the format is the wrong format for this number. All it cares about is that it has the right number of digits.

The later RegEx patterns extract the NSN part, then check that the area code, number range, and length of number for this range are all correct.

01750 61555 is rejected (too short). 01750 62555 is accepted ("4+5").
01750 610555 is accepted ("4+6").
01750 6100555 is rejected (too long). 01750 6255 is rejected (too short).

01750 610555 can be entered as 017 5061 0555 or 0175 061 0555 or 017506 10555 and will be accepted.
01750 62555 can be entered as 0175 062555 or 017506 2555 and will be accepted.
These latter examples will be reformatted for display.

comment:19 by Andrew Godwin, 12 years ago

Alright, spacing does matter - while my London example works due to the other area code rules, this does not: '014 153 455 67'.

Over-validation is as bad as no validation - I'm happy to accept a patch that checks the number ranges but having the format hard-enforced like this is just going to annoy people.

comment:20 by g1smd, 12 years ago

There's a good chance that someone entering that number in that format is entering a number from Belgium, not the UK. Spacing only matters when it's something that doesn't even look like a UK number.

comment:21 by g1smd, 12 years ago

The only constraint I placed was "make it look something like a UK phone number". Most people are aware that UK numbers are entered as 2+8, 3+7, 4+6 or 5+5 for 10 digit numbers or 3+6, 4+5 or 5+4 for 9 digit numbers, but not many people are aware of which format goes with which area code. The RegEx here allows any of those formats with any number and various punctuation options. Numbers can, of course, be entered with no spaces or punctuation at all.
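The split rule described here can be sketched as a check on the first group of digits only, assuming national-format input. The digits after the leading 0 in the first group are taken as the area part, and all remaining digits as the local part. This illustrates the 2+8 ... 5+4 list but does not reproduce the PR's pattern, which also constrains how the local part is grouped. A number typed with no separators is judged on its digit count alone. `ALLOWED_SPLITS` and `split_is_plausible` are hypothetical names.

```python
import re

# Area-part/local-part splits listed in the comment, keyed by NSN length.
ALLOWED_SPLITS = {
    10: {(2, 8), (3, 7), (4, 6), (5, 5)},
    9: {(3, 6), (4, 5), (5, 4)},
}


def split_is_plausible(text: str) -> bool:
    """Check national-format input against the allowed area/local splits.

    The first group of digits (minus the trunk 0) is taken as the area part
    and all remaining digits as the local part; spaces, hyphens and brackets
    all count as separators.
    """
    groups = [g for g in re.split(r"[\s()\-]+", text) if g]
    if not groups or not all(g.isdigit() for g in groups) or not groups[0].startswith("0"):
        return False
    if len(groups) == 1:
        # No separators at all: only the digit count can be checked.
        return (len(groups[0]) - 1) in ALLOWED_SPLITS
    area = len(groups[0]) - 1
    local = sum(len(g) for g in groups[1:])
    return (area, local) in ALLOWED_SPLITS.get(area + local, set())


print(split_is_plausible("0203 000 5555"))  # True (3+7)
print(split_is_plausible("0203000 5555"))   # False (6+4 is not a listed split)
```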

Merely checking the number of digits in the NSN part would allow people to enter

(011)-(44)-(0)-(((2 -(- 0)-((---)))()()()--5- --3 (---4() --0-- )))----((2)(()- )3)))----3-- 7

and that number still be accepted. Do you really want that? There's a point at which the user input should be rejected as garbage.
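One way to draw the line between lenient and garbage is a separate punctuation check, run before any digit counting. The rules below are hypothetical and not from the PR. Only digits, whitespace, hyphens, brackets and a single "+" are allowed, and the "+" must come at the start, optionally inside an opening bracket. Brackets may not nest and must each enclose an optional "+" and digits. Hyphens may not be doubled. `punctuation_is_sane` is an illustrative name.

```python
import re

ALLOWED_CHARS = re.compile(r"[\d\s()+\-]+")
WELL_FORMED_PAIR = re.compile(r"\(\+?\d+\)")  # e.g. "(020)", "(0)", "(+44)"


def punctuation_is_sane(text: str) -> bool:
    """Hypothetical punctuation rules; digit counts are checked elsewhere."""
    if not ALLOWED_CHARS.fullmatch(text):
        return False
    # Remove every well-formed bracket pair; any bracket left over is
    # unbalanced, nested or empty.
    if re.search(r"[()]", WELL_FORMED_PAIR.sub("", text)):
        return False
    # Two hyphens with nothing but whitespace between them.
    if re.search(r"-\s*-", text):
        return False
    # At most one "+", and only at the start (optionally after "(").
    if text.count("+") > 1 or ("+" in text and not re.match(r"\s*\(?\+", text)):
        return False
    return True


print(punctuation_is_sane("(+44)-(20)-3000-5555"))  # True
print(punctuation_is_sane("((020)) 3000 5555"))     # False
```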

comment:22 by Aymeric Augustin, 12 years ago

While I appreciate the effort that went into this patch, I'm surprised by how deep it goes into the territory of specialized validation and formatting tools such as phonenumbers. I don't know any local phone number field with such minute validation in Django.

I'm not against this patch — especially given the BDFL's plan to split localflavor and rely on national communities for maintenance. I'm just pointing out a possible mismatch with user expectations regarding localflavor.

comment:23 by g1smd, 12 years ago

A lot of software seems to use this RegEx http://regexlib.com/REDetails.aspx?regexp_id=589 to validate GB phone numbers. It validates 2+8, 3+7 and 4+6 format numbers and numbers without spaces but doesn't match valid 4+5, 5+5, 5+4 and 3+6 numbers. Other software requires that no spaces or punctuation be entered at all. Some require a 0 plus exactly ten digits, ignoring the fact that 41 area codes (plus 0500 and some of 0800) also have 9 digit numbers.

The idea here is to be a lot more lenient with inputs but to still chuck out obviously incorrect stuff while using the exact GB phone number rules. You're correct that a lot of software doesn't do range checking, but that's often down to a lack of detailed data available to construct the rules. I already have these detailed rules working in other software and that has led to much better data quality; many typos are spotted early and never make it into the database.

comment:24 by bradpitcher@…, 12 years ago

The new pull request is here

comment:25 by Claude Paroz, 12 years ago

Resolution: invalid
Status: new → closed

comment:26 by anonymous, 11 years ago

Resolution: invalid
Status: closed → new

So since the django-localflavor-gb repo has been shelved, this still isn't anywhere to be seen...

in reply to: 26 comment:27 by Baptiste Mispelon, 11 years ago

Resolution: wontfix
Status: new → closed

Replying to anonymous:

So since the django-localflavor-gb repo has been shelved, this still isn't anywhere to be seen...

Hi,

All the various django-localflavor-* have been merged into one project: django-localflavor.

Please direct your questions to them.

Thanks.
