PerlMonks  

How to include escape sequence characters in regular expressions?

by pat_mc (Pilgrim)
on Feb 12, 2014 at 19:35 UTC (#1074672=perlquestion)
pat_mc has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed Monks,

I am currently extracting plain text from an InDesign file that contains heaps of escape sequence characters. Yes, I am reading the file line by line (as if it were a text file) rather than in binary mode, because, apart from the escape characters, the input file is quite human-readable.
I would like to write regular expressions for some of those escape sequences to capture specific patterns. My problem is that I don't know how to reference those escape characters in the regexes I am using.
I already figured out that the character my text editor displays as 'NUL' in the InDesign file is \000. I also found that \x should let me reference escape characters. However, I have not been able to find a systematic representation for 'EOT', 'DLE' and so on. Is there something like \xABC that I can use to specify the escape character that shows up as ABC in my text editor?
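A minimal Perl sketch of one systematic approach: \x expects a hexadecimal code rather than a name, but every abbreviation an editor shows (NUL, EOT, DLE, ...) stands for a fixed ASCII code, so a lookup table gets you the escape. The sketch also shows that \x04, \cD and \N{U+0004} all match EOT. The %ctrl table, the ctrl_escape helper and the sample $data string are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

# ASCII control-character abbreviations, as many text editors display them,
# mapped to their code points (0x00-0x1F plus DEL).
my %ctrl = (
    NUL => 0x00, SOH => 0x01, STX => 0x02, ETX => 0x03,
    EOT => 0x04, ENQ => 0x05, ACK => 0x06, BEL => 0x07,
    BS  => 0x08, HT  => 0x09, LF  => 0x0A, VT  => 0x0B,
    FF  => 0x0C, CR  => 0x0D, SO  => 0x0E, SI  => 0x0F,
    DLE => 0x10, DC1 => 0x11, DC2 => 0x12, DC3 => 0x13,
    DC4 => 0x14, NAK => 0x15, SYN => 0x16, ETB => 0x17,
    CAN => 0x18, EM  => 0x19, SUB => 0x1A, ESC => 0x1B,
    FS  => 0x1C, GS  => 0x1D, RS  => 0x1E, US  => 0x1F,
    DEL => 0x7F,
);

# Hypothetical helper: turn an abbreviation into a \x{..} regex escape.
sub ctrl_escape {
    my ($abbr) = @_;
    die "Unknown control abbreviation '$abbr'\n" unless exists $ctrl{$abbr};
    return sprintf '\x{%02X}', $ctrl{$abbr};    # single quotes keep \x literal
}

my $data = "abc\x04def\x10ghi\x00";    # made-up sample data

# Three equivalent ways to match EOT (code point 4) in a pattern.
print "hex escape matched\n"     if $data =~ /\x04/;
print "control escape matched\n" if $data =~ /\cD/;
print "code point matched\n"     if $data =~ /\N{U+0004}/;

# Build a pattern from abbreviations at run time: the interpolated \x{..}
# text is compiled by the regex engine. Capture what sits between DLE and NUL.
my ($dle, $nul) = map { ctrl_escape($_) } qw(DLE NUL);
if ($data =~ /$dle(.*?)$nul/s) {
    print "between DLE and NUL: $1\n";
}
```

The same \cX form covers the rest of the table: \c@ is NUL and \cP is DLE, since \c flips bit 6 of the following character.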

Your help will be much appreciated!

Kind regards -

Pat

Re: How to include escape sequence characters in regular expressions? (ddumper)
by Anonymous Monk on Feb 12, 2014 at 19:48 UTC
Re: How to include escape sequence characters in regular expressions?
by CountZero (Bishop) on Feb 12, 2014 at 19:56 UTC
    Did you try using the Unicode "Control" property (Cc)? You can select control characters using m/\p{Cc}/ or select everything but them using m/\P{Cc}/
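    A short sketch of the property approach, assuming the data is already in a Perl string. The braces in \p{Cc} matter: without them, \pCc is parsed as the one-letter property \p{C} followed by a literal "c". strip_controls and text_runs are hypothetical helper names, and the sample line is made up.

```perl
use strict;
use warnings;

# \p{Cc} is the Unicode general category "Control".
# \pCc (no braces) would mean \p{C} followed by a literal "c".

# Replace each run of control characters with a single space.
sub strip_controls {
    my ($s) = @_;
    $s =~ s/\p{Cc}+/ /g;
    return $s;
}

# Return the runs of text that contain no control characters at all.
sub text_runs {
    my ($s) = @_;
    return $s =~ /(\P{Cc}+)/g;
}

my $line = "Chapter\x04One\x10\x00Intro";    # made-up sample
print strip_controls($line), "\n";
print join('|', text_runs($line)), "\n";
```

    For this sample the two print statements give "Chapter One Intro" and "Chapter|One|Intro".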

    I have no idea whether this will work, as reading a binary file and assuming it is in a Unicode encoding is fraught with uncertainties. Hic sunt leones (here be lions)!
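    One way to sidestep the encoding question is not to decode at all: open the file with the :raw layer, slurp it, and keep only runs of printable ASCII, much like the Unix strings tool. This is a sketch under the assumption that the interesting text is plain ASCII; text stored as UTF-16 or another multi-byte encoding would need a different pattern. slurp_raw, printable_runs and the minimum run length of 4 are illustrative choices, not anything InDesign-specific.

```perl
use strict;
use warnings;

# Read a whole file as raw bytes. With the :raw layer no decoding happens,
# so every byte becomes one character with ord() between 0 and 255.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open '$path': $!\n";
    local $/;    # undef $/ = slurp mode
    my $bytes = <$fh>;
    close $fh;
    return defined $bytes ? $bytes : '';
}

# Return every run of at least $min printable ASCII characters (0x20-0x7E),
# in the spirit of strings(1). The default of 4 is an arbitrary assumption.
sub printable_runs {
    my ($bytes, $min) = @_;
    $min = 4 unless defined $min;
    return $bytes =~ /([\x20-\x7E]{$min,})/g;
}

# Usage: perl extract.pl some_file.indd
if (@ARGV) {
    print "$_\n" for printable_runs(slurp_raw($ARGV[0]));
}
```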

    PS: If you have access to InDesign itself, you can export the text as an unformatted text file (File > Export). See: InDesign supported file formats

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: How to include escape sequence characters in regular expressions?
by Your Mother (Canon) on Feb 12, 2014 at 20:02 UTC

    I'm not a pro at this, but here is where I might start playing around with it (the character-class approach already suggested is also a good idea).

    perl -Mcharnames=all -le 'printf "%5d -> \\x{%04x} -> %s\n", $_, $_, charnames::viacode($_) for 4,127'

        4 -> \x{0004} -> END OF TRANSMISSION
      127 -> \x{007f} -> DELETE
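    Building on that one-liner, this sketch reports every control character in a string together with its offset, the \x{..} escape you could use in a regex, and the name charnames::viacode returns for it. describe_controls is a hypothetical helper, and the exact names printed may differ between Perl versions.

```perl
use strict;
use warnings;
use charnames ();    # load the module; charnames::viacode() is then callable

# List every control character in a string with its offset, its \x{..}
# escape and the name charnames::viacode() reports for it.
sub describe_controls {
    my ($s) = @_;
    my @out;
    while ($s =~ /(\p{Cc})/g) {
        my $cp   = ord $1;
        my $name = charnames::viacode($cp) // '(no name)';
        push @out, sprintf '%4d  \x{%02X}  %s', $-[1], $cp, $name;
    }
    return @out;
}

print "$_\n" for describe_controls("abc\x04def\x10\x00");
```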

Node Type: perlquestion [id://1074672]
Approved by davido