
How to include escape sequence characters in regular expressions?

by pat_mc (Pilgrim)
on Feb 12, 2014 at 19:35 UTC (#1074672, perlquestion)
pat_mc has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed Monks,

I am currently extracting plain text from an InDesign file that contains heaps of escape sequence characters. Yes, I am reading the file line by line (as if it were a text file) rather than in binary mode. This is because, apart from the escape characters, the input file is quite human-readable.
I would like to write regular expressions for some of those escape sequences to capture specific patterns. My problem is that I don't know how to reference those escape characters in the regexes I am using.
I have already figured out that the character displayed as 'NUL' in my text editor is \000. I also found that \x should reference escape characters. However, I have not been able to find a systematic representation for 'EOT', 'DLE' and so on. Is there something like \xABC that I can use to specify the escape character that shows up as ABC in my text editor?

Your help will be much appreciated!

Kind regards -


Replies are listed 'Best First'.
Re: How to include escape sequence characters in regular expressions?
by CountZero (Bishop) on Feb 12, 2014 at 19:56 UTC
    Did you try using the Unicode "Control Character" property? You can select control characters using m/\p{Cc}/ or select everything but them using m/\P{Cc}/ (the braces are required for a two-letter property name).

    I have no idea if this will work, as reading a binary file and assuming it is in Unicode format is fraught with uncertainties. Hic leones!
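As a rough sketch of the control-character property suggested above: the sample string below is made up for illustration (it is not real InDesign data), and the variable names are arbitrary.

```perl
use strict;
use warnings;

# Hypothetical sample line: readable text interleaved with NUL, EOT and DLE.
my $line = "Head\x00line\x04 text\x10 here";

# \p{Cc} matches any control character (U+0000-U+001F and U+007F-U+009F).
my @controls = $line =~ /(\p{Cc})/g;
printf "found U+%04X\n", ord($_) for @controls;

# Delete the control characters to leave only the readable text.
(my $plain = $line) =~ s/\p{Cc}+//g;
print "$plain\n";
```

Note that \p{Cc} also covers tab, carriage return and line feed, so it may be worth chomping the line first or using a narrower class such as [\x00-\x08\x0E-\x1F] if those should survive.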

    PS: If you have access to the InDesign program, you can export the text as an unformatted text file (use File - Export). See: InDesign supported file formats


    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: How to include escape sequence characters in regular expressions?
by Your Mother (Chancellor) on Feb 12, 2014 at 20:02 UTC

    I'm not a pro at this, but here is where I might start playing around with it (the character classes already suggested are also a good idea).

    perl -Mcharnames=all -le 'printf "%5d -> \\x{%04x} -> %s\n", $_, $_, charnames::viacode($_) for 4,127'
        4 -> \x{0004} -> END OF TRANSMISSION
      127 -> \x{007f} -> DELETE
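Building on the code points printed above, here is a minimal sketch of how the editor-style names might be turned into regexes. The lookup table %ctrl and the sample line are assumptions made for illustration; the abbreviations are the standard ASCII ones, which may or may not match what a particular editor displays.

```perl
use strict;
use warnings;

# ASCII abbreviations as many editors display them, in code order 0..31.
my @abbr = qw(NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
              DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US);
my %ctrl = map { $abbr[$_] => chr($_) } 0 .. $#abbr;
$ctrl{DEL} = chr(127);

# Hypothetical sample: a field delimited by EOT (code 4) and DLE (code 16).
my $line = "\x04Chapter One\x10";

# Equivalent spellings of EOT inside a regex: octal, hex, braced hex,
# control-letter, and Unicode code point.
my $hits = grep { $line =~ $_ }
    qr/\004/, qr/\x04/, qr/\x{4}/, qr/\cD/, qr/\N{U+0004}/;
print "$hits of 5 spellings matched\n";

# The same delimiters taken from the lookup table; quotemeta ensures the
# interpolated characters are always treated literally.
my ($eot, $dle) = map { quotemeta($_) } @ctrl{qw(EOT DLE)};
my ($field) = $line =~ /$eot(.*?)$dle/;
print "$field\n";
```

So there is no \xEOT as such, but \x followed by the hex code (or \c followed by the matching letter) gives a systematic spelling once the name has been mapped to its number.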
Re: How to include escape sequence characters in regular expressions? (ddumper)
by Anonymous Monk on Feb 12, 2014 at 19:48 UTC

Node Type: perlquestion [id://1074672]
Approved by davido