regex'ing source code
by zuma53 (Beadle)
on Jun 28, 2012 at 19:39 UTC
zuma53 has asked for the wisdom of the Perl Monks concerning the following question:
I have a regex puzzle I thought I'd post.
I have some saved source code that I would like to extract and reformat as HTML (using <PRE> tags) so that it "looks" exactly as it did in the editor.
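One way to handle the HTML side is sketched below. It is a minimal sketch, assuming Perl 5.10 or later (for `//=`); `html_escape` and `source_to_pre` are hypothetical names. It escapes `&`, `<` and `>` so the code displays literally inside `<pre>`, and it leaves the tabs alone, relying on the CSS `tab-size` property (default 8) to place stops every 4 columns. That part is an assumption about the readers' browsers: one that ignores `tab-size` still renders the tabs, but at its default 8-column stops.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text content, so that
# code such as "if (a < b && c > d)" displays literally inside <pre>.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, or it would re-escape the others
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Wrap source text in a <pre> block. Tabs are kept as real tab
# characters; the (assumed) CSS tab-size support in the browser decides
# where the tab stops fall.
sub source_to_pre {
    my ($source, $tab_width) = @_;
    $tab_width //= 4;
    return qq{<pre style="tab-size: $tab_width">}
         . html_escape($source)
         . "</pre>\n";
}

print source_to_pre("if (a < b && c > d)\n\treturn;\n");
```

If the page must not depend on CSS support, the tabs can be expanded to spaces before escaping and the `style` attribute dropped.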
Embedded in the code are strings, spaces, tabs, and carriage returns. I thought of blindly replacing every tab with 4 (or 8) spaces, but then considered the case where one or two spaces sit right before a tab. Because those spaces don't reach a tab stop, the tab still advances to the same stop, as though the spaces weren't there. More generally, the number of spaces a tab stands for depends on the column the tab starts in.
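The rule described above (a tab pads out to the next tab stop, so spaces just before it are absorbed) can be written as a plain character-by-character loop. This is a minimal sketch, assuming one character per display column and a single line passed without its newline; `expand_line` is a hypothetical name.

```perl
use strict;
use warnings;

# Expand tabs in one line, with tab stops every $tab_width columns.
# length($out) is the 0-based column of the next character, so a tab
# needs $tab_width - (column % $tab_width) spaces to reach the next stop.
# Spaces just before a tab advance the column, so the tab pads less:
# that is the "absorbed" effect.
sub expand_line {
    my ($line, $tab_width) = @_;
    my $out = '';
    for my $ch (split //, $line) {
        if ($ch eq "\t") {
            $out .= ' ' x ($tab_width - length($out) % $tab_width);
        }
        else {
            $out .= $ch;
        }
    }
    return $out;
}
```

For example, `expand_line(" \tx", 4)` gives four spaces followed by `x`: the leading space plus three padding spaces.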
Examples (* = tab, with tab stops every 4 columns; ^ and $ mark the start and end of the line; columns are numbered from 1):
^**ab*$ : next char at col 13
^ **ab*$ : next char at col 13, as the space is absorbed
^ * *ab *$ : next char at col 17
^ *here is some text***int;$ : next char at col 37
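The column arithmetic behind these examples can be checked by hand. Assuming 1-based columns and tab width w, a tab sitting in column c sends the next character to column c'. Worked through the last example:

```latex
\begin{aligned}
c' &= c + w - \bigl((c - 1) \bmod w\bigr), \qquad w = 4 \\
\text{tab at } c = 1 &\Rightarrow c' = 1 + 4 - 0 = 5 \\
\text{``here is some text'' (17 chars)} &\Rightarrow \text{columns } 5 \text{ to } 21 \\
\text{tab at } c = 22 &\Rightarrow c' = 22 + 4 - 1 = 25 \\
\text{tab at } c = 25 &\Rightarrow c' = 25 + 4 - 0 = 29 \\
\text{tab at } c = 29 &\Rightarrow c' = 29 + 4 - 0 = 33 \\
\text{``int;'' (4 chars)} &\Rightarrow \text{columns } 33 \text{ to } 36,\ \text{next char at } 37
\end{aligned}
```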
I can do this by brute force, but walking each line character by character seems silly and moderately complex. (Then again, deciphering whatever regex would do the job lies at the other end of the spectrum.)
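For the regex end of the spectrum, a sketch adapted from the idiom in the perlfaq4 entry "How do I expand tabs in a string?" follows, with the tab width passed in rather than hard-coded to 8. It also shows the same job done by the core module Text::Tabs through its `expand()` function and `$tabstop` setting. The wrapper names `expand_regex` and `expand_text_tabs` are hypothetical.

```perl
use strict;
use warnings;
use Text::Tabs;    # core module; exports expand(), unexpand() and $tabstop

# Each pass replaces the first run of tabs. $` (the text before the run)
# has a length equal to the 0-based column where the run starts, and $&
# is the run itself: the first tab pads to the next stop, and each
# further tab adds a full $w columns. Call it on one line at a time,
# because $` spans the whole string and earlier lines would skew it.
sub expand_regex {
    my ($line, $w) = @_;
    1 while $line =~ s/\t+/' ' x (length($&) * $w - length($`) % $w)/e;
    return $line;
}

# Same result via Text::Tabs. The package variable is localized by its
# full name, so the width change is undone when the sub returns.
sub expand_text_tabs {
    my ($line, $w) = @_;
    local $Text::Tabs::tabstop = $w;
    my $out = expand($line);    # scalar context: returns one string
    return $out;
}
```

Both leave existing spaces in place and only pad the tabs, so columns line up with what a 4-column editor shows.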
Most importantly, I don't want to reformat the lines "my way": the HTML version should match the original line for line, column for column, when viewed side by side.
What's the best way to approach this?