Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#28688 closed Bug (fixed)

Unicode slugs are not properly slugified due to javascript limitations

Reported by: Sævar Öfjörð Magnússon
Owned by: Sævar Öfjörð Magnússon
Component: contrib.admin
Version: 1.11
Severity: Normal
Keywords:
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

When using Unicode slugs, if the text contains a non-ASCII character immediately followed by a word from the removelist in urlify.js (e.g. the letter "a"), JavaScript treats the non-ASCII character as a word boundary, so the text that matches the removelist is stripped from the slug.
The removelist contains the following words:

var removelist = [
"a", "an", "as", "at", "before", "but", "by", "for", "from", "is",
"in", "into", "like", "of", "off", "on", "onto", "per", "since",
"than", "the", "this", "that", "to", "up", "via", "with"
];

And when I try to slugify this text: "Kaupa miða"
The result becomes: "kaupa-mið"

This can be tested using the following line in console:
URLify("Kaupa miða", 255, true)

(only when urlify.js is loaded, of course)
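
The stripping step can also be reproduced in isolation, without urlify.js loaded. The sketch below assumes the removal uses a case-insensitive pattern of the shape \b(word|word|...)\b; the abbreviated list and the variable names are illustrative and not copied from urlify.js. It covers only the removal step, not the lowercasing or hyphenation that URLify applies afterwards.

```javascript
// Abbreviated, illustrative removelist; the full list is quoted above.
const shortRemovelist = ["a", "an", "the"];

// Assumed pattern shape: whole words from the list, matched case-insensitively.
const removePattern = new RegExp("\\b(" + shortRemovelist.join("|") + ")\\b", "gi");

// In JavaScript, \b only treats [A-Za-z0-9_] as word characters, so "ð"
// counts as a non-word character and the trailing "a" looks like a whole word.
const stripped = "Kaupa miða".replace(removePattern, "");
console.log(stripped); // "Kaupa mið"
```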

Change History (7)

comment:1 by Tim Graham, 7 years ago

Triage Stage: Unreviewed → Accepted

comment:2 by Claude Paroz, 7 years ago

The solution could be to check if the string contains any non-ASCII char and simply skip the removelist removal, as the language is probably not English in that case.
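
A minimal sketch of this idea follows. The function name `stripEnglishStopWords`, the abbreviated word list, and the pattern shape are hypothetical and chosen for illustration; this is not the code from the eventual patch.

```javascript
// Hypothetical helper illustrating the proposal; not the actual urlify.js code.
function stripEnglishStopWords(text, removelist) {
    // Any character outside the 7-bit ASCII range suggests the text is
    // probably not English, so the removelist step is skipped entirely.
    const hasNonAscii = /[^\x00-\x7F]/.test(text);
    if (hasNonAscii) {
        return text;
    }
    // Assumed pattern shape: whole words from the list, case-insensitive.
    const pattern = new RegExp("\\b(" + removelist.join("|") + ")\\b", "gi");
    return text.replace(pattern, "");
}

const sampleWords = ["a", "an", "the"];
console.log(stripEnglishStopWords("Kaupa miða", sampleWords));   // "Kaupa miða"
console.log(stripEnglishStopWords("Buy a ticket", sampleWords)); // "Buy  ticket"
```

With this check, pure-ASCII input keeps the existing behaviour, while any input containing a non-ASCII character passes through the removal step unchanged.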

comment:3 by Sævar Öfjörð Magnússon, 7 years ago

Owner: changed from nobody to Sævar Öfjörð Magnússon
Status: new → assigned

comment:4 by Sævar Öfjörð Magnússon, 7 years ago

Has patch: set

comment:5 by Claude Paroz, 7 years ago

Triage Stage: Accepted → Ready for checkin

comment:6 by Tim Graham <timograham@…>, 7 years ago

Resolution: fixed
Status: assigned → closed

In f90be0a8:

Fixed #28688 -- Made admin's URLify.js skip removal of English words if non-ASCII chars are present.

comment:7 by Tim Graham <timograham@…>, 7 years ago

In 8b9a163:

Refs #28688 -- Updated a selenium test for admin's URLify.js change.

English words aren't removed if non-ASCII chars are present.
