PerlMonks  

Reading "slices" out of a file at known start markers

by hacker (Priest)
on Oct 03, 2008 at 04:32 UTC ( #715130=perlquestion )
hacker has asked for the wisdom of the Perl Monks concerning the following question:

I have a need to read "slices" out of a text file (a Makefile), based on known values that appear within the file. For example:

YACCCOMPILE = $(YACC) $(YFLAGS) $(AM_YFLAGS)
LTYACCCOMPILE = $(LIBTOOL) $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
        --mode=compile $(YACC) $(YFLAGS) $(AM_YFLAGS)
YLWRAP = $(top_srcdir)/ylwrap
SOURCES = $(libpiuserland_la_SOURCES) $(pilot_addresses_SOURCES) \
        $(pilot_clip_SOURCES) $(pilot_csd_SOURCES) \
        $(pilot_debugsh_SOURCES) $(pilot_dedupe_SOURCES) \
        $(pilot_dlpsh_SOURCES) $(pilot_file_SOURCES) \
        $(pilot_foto_SOURCES) $(pilot_foto_treo600_SOURCES) \
        $(pilot_foto_treo650_SOURCES) $(pilot_getram_SOURCES) \
        $(pilot_getrom_SOURCES) $(pilot_getromtoken_SOURCES) \
        $(pilot_hinotes_SOURCES) $(pilot_install_datebook_SOURCES) \
        $(pilot_install_expenses_SOURCES) \
        $(pilot_install_hinote_SOURCES) $(pilot_install_memo_SOURCES) \
        $(pilot_install_netsync_SOURCES) $(pilot_install_todo_SOURCES) \
        $(pilot_install_todos_SOURCES) $(pilot_install_user_SOURCES) \
        $(pilot_memos_SOURCES) $(pilot_nredir_SOURCES) \
        $(pilot_read_expenses_SOURCES) $(pilot_read_ical_SOURCES) \
        $(pilot_read_notepad_SOURCES) $(pilot_read_palmpix_SOURCES) \
        $(pilot_read_screenshot_SOURCES) $(pilot_read_todos_SOURCES) \
        $(pilot_read_veo_SOURCES) $(pilot_reminders_SOURCES) \
        $(pilot_schlep_SOURCES) $(pilot_wav_SOURCES) \
        $(pilot_xfer_SOURCES)
RECURSIVE_TARGETS = all-recursive check-recursive dvi-recursive \
        html-recursive info-recursive install-data-recursive \
        install-dvi-recursive install-exec-recursive \
        install-html-recursive install-info-recursive \
        install-pdf-recursive install-ps-recursive install-recursive \
        installcheck-recursive installdirs-recursive pdf-recursive \
        ps-recursive uninstall-recursive

In this example, I want everything from ^SOURCES until the beginning of the next "block" that starts with ^RECURSIVE_TARGETS.

I know what my ^MARKERS will be in advance, but I don't know how many lines will follow them until the next section, which begins with another marker I may or may not need.

One of the problems is that there can be any number of spaces/tabs in the middle of, or at the end of each "continuation" line (or no continuation delimiters at all), and the "block" can be continued for anywhere from 1 line to dozens of lines.

Any idea how I would approach this problem in a clean, simple way? Is there enough detail provided here to understand my problem?

Re: Reading "slices" out of a file at known start markers
by CountZero (Bishop) on Oct 03, 2008 at 04:55 UTC
    The RANGE-operator will do the trick here:
    use strict;

    while (<DATA>) {
        print if m/^SOURCES/ .. m/^RECURSIVE_TARGETS/ and not m/^RECURSIVE_TARGETS/;
    }
    __DATA__
    YACCCOMPILE = $(YACC) $(YFLAGS) $(AM_YFLAGS)
    LTYACCCOMPILE = $(LIBTOOL) $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
            --mode=compile $(YACC) $(YFLAGS) $(AM_YFLAGS)
    YLWRAP = $(top_srcdir)/ylwrap
    SOURCES = $(libpiuserland_la_SOURCES) $(pilot_addresses_SOURCES) \
            $(pilot_clip_SOURCES) $(pilot_csd_SOURCES) \
            $(pilot_debugsh_SOURCES) $(pilot_dedupe_SOURCES) \
            $(pilot_dlpsh_SOURCES) $(pilot_file_SOURCES) \
            $(pilot_foto_SOURCES) $(pilot_foto_treo600_SOURCES) \
            $(pilot_foto_treo650_SOURCES) $(pilot_getram_SOURCES) \
            $(pilot_getrom_SOURCES) $(pilot_getromtoken_SOURCES) \
            $(pilot_hinotes_SOURCES) $(pilot_install_datebook_SOURCES) \
            $(pilot_install_expenses_SOURCES) \
            $(pilot_install_hinote_SOURCES) $(pilot_install_memo_SOURCES) \
            $(pilot_install_netsync_SOURCES) $(pilot_install_todo_SOURCES) \
            $(pilot_install_todos_SOURCES) $(pilot_install_user_SOURCES) \
            $(pilot_memos_SOURCES) $(pilot_nredir_SOURCES) \
            $(pilot_read_expenses_SOURCES) $(pilot_read_ical_SOURCES) \
            $(pilot_read_notepad_SOURCES) $(pilot_read_palmpix_SOURCES) \
            $(pilot_read_screenshot_SOURCES) $(pilot_read_todos_SOURCES) \
            $(pilot_read_veo_SOURCES) $(pilot_reminders_SOURCES) \
            $(pilot_schlep_SOURCES) $(pilot_wav_SOURCES) \
            $(pilot_xfer_SOURCES)
    RECURSIVE_TARGETS = all-recursive check-recursive dvi-recursive \
            html-recursive info-recursive install-data-recursive \
            install-dvi-recursive install-exec-recursive \
            install-html-recursive install-info-recursive \
            install-pdf-recursive install-ps-recursive install-recursive \
            installcheck-recursive installdirs-recursive pdf-recursive \
            ps-recursive uninstall-recursive

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      This is an interesting solution, however... I might have some use cases where this will fail, for example if SOURCES = ... happens to be the very last block in the file (i.e. there IS no next marker, other than EOF).

      Or, if the next marker is also one I might want to grab in a different pass, but I don't know the order of them to know what order to put the /BEGIN/ and /END/ range markers in.

      I'll hit this with a few broad examples and see if it works when I intentionally try to break it with different constructs.
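When the order of the markers is not known, one option is to skip the begin/end pair entirely and file every block under its variable name in a single pass. The following is only a rough sketch under some assumptions: `read_blocks` is a made-up helper name, a block is taken to start at a `NAME =` style assignment in column 0, and it is taken to continue for as long as the previous line ended in a backslash (optionally followed by spaces or tabs).

```perl
use strict;
use warnings;

# Hypothetical helper (not from this thread): split Makefile text into
# blocks keyed by variable name, keeping the original lines of each block.
sub read_blocks {
    my ($fh) = @_;
    my (%block, $current, $continued);
    while (my $line = <$fh>) {
        if (!$continued) {
            # Outside a continued line, only "NAME =" (or :=, +=, ?=)
            # opens a block; anything else resets $current to undef.
            ($current) = $line =~ /^([A-Za-z_]\w*)\s*[:+?]?=/;
        }
        $block{$current} .= $line if defined $current;
        # A trailing backslash, optionally followed by blanks, continues the block.
        $continued = defined $current && $line =~ /\\\s*$/;
    }
    return \%block;
}

my $makefile = <<'EOT';
YLWRAP = $(top_srcdir)/ylwrap
SOURCES = $(pilot_clip_SOURCES) \
        $(pilot_csd_SOURCES)
RECURSIVE_TARGETS = all-recursive \
        check-recursive
EOT

# An in-memory filehandle stands in for the real Makefile here.
open my $fh, '<', \$makefile or die "Cannot open in-memory file: $!";
my $blocks = read_blocks($fh);
print $blocks->{SOURCES};
```

Because each block keeps its lines untouched, the original formatting survives, and a block that happens to be last in the file simply ends at EOF.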

        Responding to my own question: I think I may have figured out an easier way:
        use strict;
        use warnings;
        use Data::Dump qw(dump ddx);
        use Config::General;

        my $conf   = Config::General->new("Makefile");
        my %config = $conf->getall;
        print dump(\%config);

        This puts each item as key/value pairs in the %config hash, which I can then print out and manipulate as $config{'SOURCES'}, for example.

        I guess the only thing I end up losing here is the original formatting, but I can probably iterate through the hash and reconstruct that somehow.

        Anyone familiar with the innards of Config::General that can lend a hand so I can use it correctly?

        I might have some use cases where this will fail, for example if SOURCES = ... happens to be the very last block in the file (i.e. there IS no next marker, other than EOF).
        The range operator will still work as expected: it returns true from the line where the first test succeeds until the line where the second test succeeds. With no next marker, the second test never succeeds, so it will print from the first marker until EOF.

        For multiple blocks, it will be easiest to rewind the file and start again with new begin and end markers. Unless your Makefile is huge, that will not cost you much time, and it will keep the logic of your program easy to understand and maintain.
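The rewinding suggestion might look roughly like the sketch below. `slice_between` is a hypothetical helper name, and an in-memory filehandle stands in for the real Makefile. One caveat: perlop notes that each `..` operator keeps its own state even across calls to a subroutine that contains it, so a slice that ran to EOF could leave the flip-flop switched on for the next call. This sketch therefore uses a plain flag instead of the range operator.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines from the first line matching $start
# up to, but not including, the next line matching $end (or up to EOF).
sub slice_between {
    my ($fh, $start, $end) = @_;
    seek $fh, 0, 0 or die "Cannot rewind: $!";    # every pass starts at the top
    my ($inside, @lines) = (0);
    while (my $line = <$fh>) {
        if (!$inside) {
            $inside = 1 if $line =~ $start;
        }
        elsif ($line =~ $end) {
            last;
        }
        push @lines, $line if $inside;
    }
    return @lines;
}

my $makefile = <<'EOT';
YLWRAP = $(top_srcdir)/ylwrap
SOURCES = $(pilot_clip_SOURCES) \
        $(pilot_csd_SOURCES)
RECURSIVE_TARGETS = all-recursive \
        check-recursive
EOT

open my $fh, '<', \$makefile or die "Cannot open in-memory file: $!";

# The markers can be requested in any order, since each call rewinds first.
print slice_between($fh, qr/^RECURSIVE_TARGETS/, qr/^[A-Z_]+\s*=/);
print slice_between($fh, qr/^SOURCES/,           qr/^RECURSIVE_TARGETS/);
```

The first call has no following marker to stop at and simply runs to EOF, which is the last-block case raised earlier in the thread.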

        CountZero


Re: Reading "slices" out of a file at known start markers
by Tanktalus (Canon) on Oct 03, 2008 at 05:11 UTC

    I'd likely try something entirely different:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @file;
    while (<DATA>) {
        # if we end in a backslash, add the next line...
        # (more error checking probably is required if we're at eof,
        # but I'll leave that as an exercise for the reader ;-)
        while (s/\\\s*\Z//) {
            $_ .= <DATA>
        }
        push @file, $_;
    }
    print @file;
    __DATA__
    YACCCOMPILE = $(YACC) $(YFLAGS) $(AM_YFLAGS)
    LTYACCCOMPILE = $(LIBTOOL) $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
            --mode=compile $(YACC) $(YFLAGS) $(AM_YFLAGS)
    YLWRAP = $(top_srcdir)/ylwrap
    SOURCES = $(libpiuserland_la_SOURCES) $(pilot_addresses_SOURCES) \
            $(pilot_clip_SOURCES) $(pilot_csd_SOURCES) \
            $(pilot_debugsh_SOURCES) $(pilot_dedupe_SOURCES) \
            $(pilot_dlpsh_SOURCES) $(pilot_file_SOURCES) \
            $(pilot_foto_SOURCES) $(pilot_foto_treo600_SOURCES) \
            $(pilot_foto_treo650_SOURCES) $(pilot_getram_SOURCES) \
            $(pilot_getrom_SOURCES) $(pilot_getromtoken_SOURCES) \
            $(pilot_hinotes_SOURCES) $(pilot_install_datebook_SOURCES) \
            $(pilot_install_expenses_SOURCES) \
            $(pilot_install_hinote_SOURCES) $(pilot_install_memo_SOURCES) \
            $(pilot_install_netsync_SOURCES) $(pilot_install_todo_SOURCES) \
            $(pilot_install_todos_SOURCES) $(pilot_install_user_SOURCES) \
            $(pilot_memos_SOURCES) $(pilot_nredir_SOURCES) \
            $(pilot_read_expenses_SOURCES) $(pilot_read_ical_SOURCES) \
            $(pilot_read_notepad_SOURCES) $(pilot_read_palmpix_SOURCES) \
            $(pilot_read_screenshot_SOURCES) $(pilot_read_todos_SOURCES) \
            $(pilot_read_veo_SOURCES) $(pilot_reminders_SOURCES) \
            $(pilot_schlep_SOURCES) $(pilot_wav_SOURCES) \
            $(pilot_xfer_SOURCES)
    RECURSIVE_TARGETS = all-recursive check-recursive dvi-recursive \
            html-recursive info-recursive install-data-recursive \
            install-dvi-recursive install-exec-recursive \
            install-html-recursive install-info-recursive \
            install-pdf-recursive install-ps-recursive install-recursive \
            installcheck-recursive installdirs-recursive pdf-recursive \
            ps-recursive uninstall-recursive
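The end-of-file check that the comment above leaves as an exercise could be sketched as follows. `read_logical_lines` is a hypothetical helper name, and the sample text is invented so that its last line ends in a dangling backslash. Joining simply stops when no further line is available.

```perl
use strict;
use warnings;

# Hypothetical helper: return logical lines, joining every physical line
# that ends in a backslash (plus optional whitespace) with the next one.
sub read_logical_lines {
    my ($fh) = @_;
    my @logical;
    while (defined(my $line = <$fh>)) {
        # Strip the trailing backslash, then append the next physical line.
        while ($line =~ s/\\\s*\z//) {
            my $next = <$fh>;
            last unless defined $next;    # file ended in mid-continuation
            $line .= $next;
        }
        push @logical, $line;
    }
    return @logical;
}

# Invented sample: the final line ends with a backslash and nothing follows.
my $makefile = "SOURCES = a.c \\\n\tb.c \\  \n\tc.c\nYLWRAP = ylwrap \\\n";
open my $fh, '<', \$makefile or die "Cannot open in-memory file: $!";
print read_logical_lines($fh);
print "\n";
```

With the joined lines in hand, picking out a block is a matter of `grep { /^SOURCES/ }` over the result.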
      We can avoid pushing into an array @file if the goal is just to print it out:
      while (<DATA>) {
          chomp;
          # find out if there's a line continuation
          # and remove the trailing backslash
          my $cont = s/\\\s*$//;
          # print without newline, no matter what
          print;
          # decide whether to end print with newline
          print "\n" unless $cont;
      }

      Update: Actually, a simple substitution would suffice:
      while (<DATA>) {
          s/\\\s*\n//;
          print;
      }
Re: Reading "slices" out of a file at known start markers
by smiffy (Pilgrim) on Oct 03, 2008 at 06:43 UTC

    OK, here's yet another way of doing it. This uses a state machine (or at least what I understand a state machine to be).

    I've tried to take it a bit further to show that this approach can be quite flexible. It also pulls out the YACCCOMPILE lines, to show that you can look for more than one thing, and I've added dummy YACCCOMPILE lines to the data to show that you can pick up further lines as they occur.

    You can, of course, get this to look for as many start points as you want. The code could probably be reduced, but attempting to do so would probably detract from the clarity of the example.

    Hope this helps!

    use strict;
    use warnings;

    my $state = 0;
    # Start with empty strings so the prints below never see undef.
    my ($yacccompile, $sources) = ('', '');

    while (<DATA>) {
        if ($state == 0) {
            # Wait until we see SOURCES.
            if ($_ =~ /^SOURCES/) {
                # Move to next state, re-read line
                # in that next state.
                $state = 1;
                redo;
            }
            # ...or we see YACCCOMPILE
            elsif ($_ =~ /^YACCCOMPILE/) {
                $state = 2;
                redo;
            }
        }
        elsif ($state == 1) {
            # See if we have reached another section:
            if ($_ =~ /^[A-Z]/ && $_ !~ /^SOURCES/) {
                $state = 0;
                redo;
            }
            else {
                # Suppress blank lines.
                next unless $_ =~ /\w/;
                # Add the line to what we have so far.
                $sources .= $_;
            }
        }
        elsif ($state == 2) {
            # See if we have reached another section:
            if ($_ =~ /^[A-Z]/ && $_ !~ /^YACCCOMPILE/) {
                $state = 0;
                redo;
            }
            else {
                # Suppress blank lines.
                next unless $_ =~ /\w/;
                # Add the line to what we have so far.
                $yacccompile .= $_;
            }
        }
    }

    # Now print it all out with some titles.
    print "SOURCES\n-------\n\n";
    print $sources;
    print "\n";
    print "YACCCOMPILE\n----------\n\n";
    print $yacccompile;
    print "\n";
    __DATA__
    YACCCOMPILE = $(YACC) $(YFLAGS) $(AM_YFLAGS)
    LTYACCCOMPILE = $(LIBTOOL) $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
            --mode=compile $(YACC) $(YFLAGS) $(AM_YFLAGS)
    YLWRAP = $(top_srcdir)/ylwrap
    SOURCES = $(libpiuserland_la_SOURCES) $(pilot_addresses_SOURCES) \
            $(pilot_clip_SOURCES) $(pilot_csd_SOURCES) \
            $(pilot_debugsh_SOURCES) $(pilot_dedupe_SOURCES) \
            $(pilot_dlpsh_SOURCES) $(pilot_file_SOURCES) \
            $(pilot_foto_SOURCES) $(pilot_foto_treo600_SOURCES) \
            $(pilot_foto_treo650_SOURCES) $(pilot_getram_SOURCES) \
            $(pilot_getrom_SOURCES) $(pilot_getromtoken_SOURCES) \
            $(pilot_hinotes_SOURCES) $(pilot_install_datebook_SOURCES) \
            $(pilot_install_expenses_SOURCES) \
            $(pilot_install_hinote_SOURCES) $(pilot_install_memo_SOURCES) \
            $(pilot_install_netsync_SOURCES) $(pilot_install_todo_SOURCES) \
            $(pilot_install_todos_SOURCES) $(pilot_install_user_SOURCES) \
            $(pilot_memos_SOURCES) $(pilot_nredir_SOURCES) \
            $(pilot_read_expenses_SOURCES) $(pilot_read_ical_SOURCES) \
            $(pilot_read_notepad_SOURCES) $(pilot_read_palmpix_SOURCES) \
            $(pilot_read_screenshot_SOURCES) $(pilot_read_todos_SOURCES) \
            $(pilot_read_veo_SOURCES) $(pilot_reminders_SOURCES) \
            $(pilot_schlep_SOURCES) $(pilot_wav_SOURCES) \
            $(pilot_xfer_SOURCES)
    YACCCOMPILE = niddle naddle noo
    RECURSIVE_TARGETS = all-recursive check-recursive dvi-recursive \
            html-recursive info-recursive install-data-recursive \
            install-dvi-recursive install-exec-recursive \
            install-html-recursive install-info-recursive \
            install-pdf-recursive install-ps-recursive install-recursive \
            installcheck-recursive installdirs-recursive pdf-recursive \
            ps-recursive uninstall-recursive
    YACCCOMPILE = foo bar baz
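As a possible way to reduce the state machine above, the numbered states can be replaced by a single variable holding the name of the block currently being collected, driven by a list of wanted markers. This is only a sketch: `collect_blocks` is a hypothetical name, and it compares the whole leading identifier rather than a prefix, so a variable such as `SOURCES_EXTRA` would not be mistaken for `SOURCES` (a small departure from the `/^SOURCES/` test in the original).

```perl
use strict;
use warnings;

# Hypothetical helper: collect the text of every block whose name is in
# @wanted; repeated occurrences of a name are appended to each other.
sub collect_blocks {
    my ($fh, @wanted) = @_;
    my %is_wanted = map { $_ => 1 }  @wanted;
    my %found     = map { $_ => '' } @wanted;
    my $state;    # name of the block being collected, undef while waiting
    while (my $line = <$fh>) {
        # Same rule as the code above: a capital letter in column 0
        # opens a new section.
        if ($line =~ /^([A-Z][A-Za-z0-9_]*)/) {
            $state = $is_wanted{$1} ? $1 : undef;
        }
        next unless defined $state;
        next unless $line =~ /\w/;    # suppress blank lines
        $found{$state} .= $line;
    }
    return \%found;
}

my $makefile = <<'EOT';
YACCCOMPILE = $(YACC) $(YFLAGS)
YLWRAP = $(top_srcdir)/ylwrap
SOURCES = $(pilot_clip_SOURCES) \
        $(pilot_csd_SOURCES)
YACCCOMPILE = foo bar baz
EOT

open my $fh, '<', \$makefile or die "Cannot open in-memory file: $!";
my $found = collect_blocks($fh, qw(SOURCES YACCCOMPILE));

# Print each block under a title, as in the original.
for my $name (sort keys %$found) {
    print "$name\n", '-' x length($name), "\n\n", $found->{$name}, "\n";
}
```

Adding another start point then only means adding a name to the `qw(...)` list.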
Re: Reading "slices" out of a file at known start markers
by dragonchild (Archbishop) on Oct 03, 2008 at 13:30 UTC
    I'm appalled at everyone who answered. You're not the first to have wanted to do this. Makefile::Parser is on CPAN. If it doesn't do everything you need, it at least has the beginnings and (presumably) a test suite so that you can extend it.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      TIMTOWTDI. I personally enjoyed reading all of these techniques.
Re: Reading "slices" out of a file at known start markers
by jwkrahn (Monsignor) on Oct 03, 2008 at 14:29 UTC

    Another way to do it:

    # FH is assumed to be a filehandle already opened on the Makefile.
    while ( <FH> ) {
        if ( /^SOURCES/ && /\\$/ ) {
            $_ .= <FH>;
            redo;
        }
        elsif ( /^SOURCES/ ) {
            print;
        }
    }

    And yet another way:

    my $data;
    while ( <FH> ) {
        $data .= $_ if /^SOURCES/ || $data && /^\s/;
        last if $data && !/\S/;
    }
    print $data;
