PerlMonks  

regex'ing source code

by zuma53 (Beadle)
on Jun 28, 2012 at 19:39 UTC (#978982=perlquestion)
zuma53 has asked for the wisdom of the Perl Monks concerning the following question:

Hi--

I have a regex puzzle I thought of posting.

I have some saved source code that I would like to extract and reformat as HTML (using <PRE> tags) so that it looks the way it did in an editor.
Embedded in the code are strings, spaces, tabs, and carriage returns. I thought of blindly replacing every tab with 4 or 8 spaces, but then thought of the case where one or two spaces sit right before a tab. Because those spaces don't reach a tab boundary, the tab still advances to the same tab stop, as though the spaces didn't exist. The number of spaces to insert also depends on the column the tab is at.

Examples (* = tab, with tab stops every 4 columns):

^**ab*$                       next char at col 13
^ **ab*$                      next char at col 13, as the space is absorbed
^  *  *ab    *$               next char at col 17
^ *here is some text***int;$  next char at col 37
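
The column arithmetic behind these examples can be checked with a short Perl sketch. The helper name `next_col` is hypothetical, `*` stands for a tab as in the examples, and a tab width of 4 is assumed:

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based column of the character that
# would follow $text, with tab stops every $tabwidth columns.
sub next_col {
    my ($text, $tabwidth) = @_;
    my $pos = 0;                                   # 0-based columns used so far
    for my $ch (split //, $text) {
        if ($ch eq "\t") {
            $pos += $tabwidth - $pos % $tabwidth;  # jump to the next tab stop
        } else {
            $pos++;                                # any other character is one column
        }
    }
    return $pos + 1;
}

for my $example ('**ab*', ' **ab*', '  *  *ab    *', ' *here is some text***int;') {
    (my $text = $example) =~ tr/*/\t/;             # '*' marks a tab, as above
    printf "%-28s next char at col %d\n", $example, next_col($text, 4);
}
```

Run against the four example lines, this reports columns 13, 13, 17 and 37, which agrees with the figures listed above.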


I can do this by brute force, but going through the text character by character seems silly and moderately complex (though deciphering whatever regex would do the job lies at the other end of the spectrum).

Most importantly, I don't want to reformat the lines "my way"; I want the result to look identical to the original when the two are viewed side by side.

What's the best way to approach this?

Thanks.

Re: regex'ing source code
by RichardK (Priest) on Jun 28, 2012 at 21:38 UTC

    You didn't say what language the source code is in, so I'm guessing Perl :)

    In which case, does perltidy -html do the right thing?

Re: regex'ing source code
by muba (Priest) on Jun 29, 2012 at 00:38 UTC

    It's not that complex. Just split your lines on tab characters, using split( /(\t)/, $line ). The parens around \t make sure the tabs themselves are included in the list that split returns. From there on, it's simple: print every piece that's not a tab character, and keep track of how many characters you've printed so far. When you run into a tab character, print just enough spaces to end up at the next tab boundary. Those spaces must also be added to the count of characters printed so far, so that later tabs on the same line land in the right place.

    Still sounds complicated? This is all it takes.

    use strict;
    use warnings;

    my @lines = map {s/@/\t/g; "$_\n"} split(/\n/, << 'EOF');
    @foo bar baz quux
     @foo bar baz quux
    foo@bar baz quux
    foo @bar baz quux
    foo bar@baz quux
    foo bar @baz quux
    EOF

    my $tabwidth = 8;

    for my $line (@lines) {
        my $pos = 0;
        for my $part (split/(\t)/,$line) {
            if ($part eq "\t") {
                my $spaces = " " x ($tabwidth - ($pos % $tabwidth));
                $pos += length($spaces);
                print $spaces;
            } else {
                $pos += length $part;
                print $part;
            }
        }
    }
    __END__
            foo bar baz quux
            foo bar baz quux
    foo     bar baz quux
    foo     bar baz quux
    foo bar baz quux
    foo bar         baz quux

      Ah, that missing tidbit of info!

      After posting, it dawned on me that I could use split. But then I realized that split throws away the delimiter it splits on. I never knew about the capturing-paren behavior before. So much still left to learn about Perl.

      Thanks for the example!

        Well, even if split didn't have that handy feature of also returning the things it captured, you could've gone my @parts = $line =~ m/(\t|[^\t]+)/g;, which does essentially the same. With that regex you tell perl, "gimme an array of all tabs and all sequences of non-tabs."
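
A small check of that equivalence, as a sketch. It assumes the non-tab alternative carries a `+` quantifier so that it grabs whole runs, and the sample line is made up for illustration. One difference worth knowing: split yields an empty string between two adjacent tabs (and before a leading tab), so those are filtered out here before comparing.

```perl
use strict;
use warnings;

my $line = "foo \tbar\t\tbaz";   # made-up sample with two adjacent tabs

# split with a capturing group keeps the tab separators in the result;
# grep drops the empty field that appears between the adjacent tabs.
my @by_split = grep { length } split /(\t)/, $line;

# A global match in list context returns each tab or run of non-tabs.
my @by_match = $line =~ m/(\t|[^\t]+)/g;

# Make the tabs visible before printing both lists.
for my $parts (\@by_split, \@by_match) {
    print join('|', map { $_ eq "\t" ? '<TAB>' : $_ } @$parts), "\n";
}
```

Both lists should print identically. In the tab-expanding loop the empty strings from split are harmless anyway, since they add nothing to the position and print nothing.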

Re: regex'ing source code
by monsoon (Pilgrim) on Jun 29, 2012 at 03:24 UTC

    How about this?

    while(/\t/g){ s/\t/$-[0]%4?" "x(4-$-[0]%4):" "x4/e; }
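
For comparison, the same idea can be written as a single repeated substitution, and the core Text::Tabs module provides an `expand` function for this job. The following is only a sketch under a few assumptions: a tab width of 4, made-up sample lines, and a hypothetical helper name `expand_tabs`.

```perl
use strict;
use warnings;
use Text::Tabs;   # core module; exports expand() and $tabstop

# Hypothetical helper: replace each tab, leftmost first, with enough spaces
# to reach the next multiple of $width. $-[0] is the offset where the
# current match starts, which equals the column because every earlier tab
# on the line has already been expanded.
sub expand_tabs {
    my ($line, $width) = @_;
    1 while $line =~ s/\t/" " x ($width - $-[0] % $width)/e;
    return $line;
}

$tabstop = 4;     # Text::Tabs setting; its default is 8
for my $line ("\t\tab\t|", " \t\tab\t|", "  \t  \tab    \t|") {
    my $mine = expand_tabs($line, 4);
    my $core = expand($line);
    print $mine eq $core ? "same: " : "DIFF: ", $mine, "\n";
}
```

The separate branch for a zero remainder is not needed, because `$width - 0` already yields a full tab's worth of spaces. Each sample line should be reported as `same`.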
