Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid

Here-document extraction

by converter (Priest)
on Jan 14, 2002 at 03:09 UTC ( #138488=CUFP: print w/replies, xml ) Need Help??

Have you ever found yourself working on code with a lot of here-documents and wished you could read the here-documents outside the context of the source file?

I'm working on a CGI application with a lot of here-docs, and this little chunk of code is proving itself useful. The output includes tags with the subroutine name (if applicable, and line number) in which the here-document was found, and BEGIN:line, END:line tags (planning ahead for other uses of the output).

Running the code without a filename argument dups the DATA filehandle and runs the program against the test input in the __END__ section. The output from the test input:

[here-docs in source file: <DATA>] @@sub:bar:1@@ @@BEGIN:2@@ print <<'FOO'; blah FOO @@END:4@@ @@sub:ralph:7@@ @@BEGIN:7@@ sub ralph { print <<EOF } This is a test! Here's some more text. EOF @@END:10@@ @@BEGIN:13@@ $foo = <<STRING; blah blah blah blah. how's that? STRING @@END:16@@

Follow the Read More link for the code:

#!/usr/bin/perl # usage: grep.html-here-docs filename # usage: grep.html-here-docs # (without filename argument, runs # against test data in the __END__ section) # grep out all here-documents: # prints the source filename at the top of the output. # prints the last subroutine definition name seen # for each here-document. use warnings; use strict; use re 'eval'; my $R_SP = qq{[\x20\t]}; my $R_QUOTE = qq{[\'\"]}; my $TERM = ''; my $p = qr! ^ # at beginning of line [^\x23]*? # match one or more non-comment chars (?: print ${R_SP}* # match print, one or more space | # OR \w+ ${R_SP}* = ${R_SP}* # assignment ) << ${R_SP}* # begin here-doc, zero or more spaces (${R_QUOTE}?) # an optional quote character (\w+) # match/capture one or more word character +s (?(1) \1 ) # if a quote was matched, look for another +. (?(2) (?{$TERM = $2}) ) !x; my @sub; my $new_here_doc = 1; if (@ARGV) { open FH, $ARGV[0] or die "error opening input: $!"; } else { open FH, "<&DATA" or die "error duping DATA!: $!"; } while (<FH>) { if (1 == $.) { print "[here-docs in source file: @{[@ARGV ? $ARGV[ +0] : '<DATA>']}]\n\n" } if (/^[ \t]*sub[ \t]*(\w+)/) { @sub = ($1,$.); } if (my $s = /$p/../^${TERM}/) { if ((1 == $s) && @sub) { print "\n\@\@sub:$sub[0]:$sub[1]\@\@\n\n"; @sub = (); } if ($new_here_doc) { print "\@\@BEGIN:$.\@\@\n"; $new_here_doc = 0; } print; if ($s =~ /E0$/) { print "\@\@END:$.\@\@\n\n"; $TERM = ''; $new_here_doc = 1; } } } close FH; __END__ sub bar { print <<'FOO'; blah FOO } sub ralph { print <<EOF } This is a test! Here's some more text. EOF # try matching an assignment: $foo = <<STRING; blah blah blah blah. how's that? STRING

Update: Fixed thinko in error msg when dup DATA filehandle fails

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: CUFP [id://138488]
Approved by root
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (5)
As of 2018-03-23 09:21 GMT
Find Nodes?
    Voting Booth?
    When I think of a mole I think of:

    Results (289 votes). Check out past polls.