Opened 5 years ago

Closed 5 years ago

#30635 closed New feature (needsinfo)

Add feature to sanitize text containing control characters

Reported by: Tatsuya Matoba Owned by: nobody
Component: Utilities Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Currently, if text passed to a feed contains control characters, Django raises UnserializableContentError.

The check that raises this error was added in #20197.
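For context, the failure can be reproduced without Django. The sketch below mimics the kind of check that produces the error: a regex search for C0 control characters other than tab, LF and CR, which XML 1.0 does not allow. `check_xml_text` is a hypothetical name and `UnserializableContentError` here is a local stand-in class, not Django's implementation.

```python
import re

# C0 control characters except \t (0x09), \n (0x0A) and \r (0x0D);
# XML 1.0 does not allow these anywhere in a document.
_CONTROL_CHARS = re.compile(r'[\x00-\x08\x0B-\x0C\x0E-\x1F]')


class UnserializableContentError(ValueError):
    """Local stand-in for Django's exception of the same name."""


def check_xml_text(content):
    # Hypothetical helper: reject text that cannot be written into an XML 1.0 feed.
    if content and _CONTROL_CHARS.search(content):
        raise UnserializableContentError("Control characters are not supported in XML 1.0")
    return content


check_xml_text("Hello\tworld\n")  # tab and newline are allowed
try:
    check_xml_text("Hello\x0bworld")  # vertical tab (0x0B) is not
except UnserializableContentError as exc:
    print(exc)
```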

I think a common solution for app developers who encounter this problem would be to sanitize the control characters.

I think it would be desirable for Django to provide a way to sanitize control characters.

For example, the following function could be added to django/utils/xmlutils.py:

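import re
def sanitize_control_characters(text):
    # Replace XML 1.0-illegal control characters (C0 controls except \t, \n, \r) with a space.
    return re.sub(r'[\x00-\x08\x0B-\x0C\x0E-\x1F]', " ", text)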
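A quick way to see what the proposed helper does, sketched with the standard library only. The `replacement` parameter is a hypothetical extension (not part of the proposal) for callers who would rather drop the characters than turn them into spaces.

```python
import re

# Same character class as the proposed helper, compiled once at module level.
_CONTROL_CHARS = re.compile(r'[\x00-\x08\x0B-\x0C\x0E-\x1F]')


def sanitize_control_characters(text, replacement=" "):
    """Replace XML 1.0-illegal control characters in ``text`` with ``replacement``."""
    return _CONTROL_CHARS.sub(replacement, text)


print(repr(sanitize_control_characters("line one\x0bline two")))  # 'line one line two'
print(repr(sanitize_control_characters("keep\ttabs\nand newlines")))  # unchanged
print(repr(sanitize_control_characters("a\x00b", replacement="")))  # 'ab'
```

Tab, LF and CR are left alone because XML 1.0 permits them; DEL (0x7F) is also untouched, since it falls outside the character class.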

If there are no objections to adding such a function, I would like to submit a PR.

Change History (1)

comment:1 by Carlton Gibson, 5 years ago

Resolution: needsinfo
Status: new → closed
Version: 2.2 → master

I'm not sure a util in Django is really called for here. (The thought is: Process your data prior to handing it off for serialization.)
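Following that suggestion, the cleaning can live in application code, before values reach the feed generator. A minimal sketch, assuming each item is a plain dict of keyword arguments destined for something like a feed's `add_item()`; `clean_item` is a hypothetical helper, and it removes the characters rather than replacing them.

```python
import re

# C0 control characters that XML 1.0 forbids (everything below 0x20 except \t, \n, \r).
_CONTROL_CHARS = re.compile(r'[\x00-\x08\x0B-\x0C\x0E-\x1F]')


def clean_item(item):
    """Return a copy of ``item`` with control characters removed from string values.

    Non-string values (dates, numbers, None, ...) are passed through unchanged.
    """
    return {
        key: _CONTROL_CHARS.sub("", value) if isinstance(value, str) else value
        for key, value in item.items()
    }


raw = {"title": "Report\x0c2019", "link": "https://example.com/1", "comments": None}
print(clean_item(raw))
# {'title': 'Report2019', 'link': 'https://example.com/1', 'comments': None}
```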

Maybe we could include something, but...

  1. Are there not already utilities available that do everything you want here (and more)? (Searching "Python xml escape control characters" seems to provide some suggestions.)
  2. The W3C doc suggests replacing control characters with numeric codes. Should we not be doing this?
  3. Then, if we are just going to strip the characters out, do we really need a utility for that? (Back to the initial thought.)

I'll mark this Needs Info. Maybe a fuller suggestion presented to the DevelopersMailingList might be a way forward.
