PerlMonks  

Change the behavior of Perl's IRS

by LighthouseJ (Sexton)
on Jul 14, 2007 at 18:39 UTC ( [id://626656]=perlquestion )

LighthouseJ has asked for the wisdom of the Perl Monks concerning the following question:

I have a question that's been bugging me that I'd like to offer to the more experienced Monks out there. I'm interested in changing Perl's behavior with the input record separator: specifically, I want Perl to handle it as if it's at the beginning of a segment. Say I have input text that looks like this:
    myrecordsep
    field1=item1
    field2=item2
    ...
    myrecordsep
    field1=item9
    field2=item10
    ...
Perl expects the input record separator to be at the end of a chunk of text, but my data prints it at the beginning. When I set $/ = 'myrecordsep' and read chunks, the first chunk is, as expected, defined but zero-length, the last chunk isn't read at all, and all the chunks in the middle are read fine, of course. I have done some ugly corrective measures while using $/, and I've of course also read the text in line by line and gone "the long way". I haven't lost hope that there's a nice and clean way to use the chunked method (with $/) without any ugly corrective code. Can anybody think of anything I haven't?
"The three principal virtues of a programmer are Laziness, Impatience, and Hubris. See the Camel Book for why." -- `man perl`

Replies are listed 'Best First'.
Re: Change the behavior of Perl's IRS
by ikegami (Patriarch) on Jul 14, 2007 at 18:56 UTC

    the last chunk isn't read at all

    eh? That shouldn't be. Perl returns whatever's after the last $/.
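
A minimal sketch of that behavior, assuming the separator sits on a line of its own and the data is read through an in-memory filehandle. The helper name read_chunks and the sample data are made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: read every chunk of $text using $sep as the
# input record separator, and return the chunks with $sep chomped off.
sub read_chunks {
    my ($text, $sep) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = $sep;              # restored automatically when the sub returns
    my @chunks;
    while (my $chunk = <$fh>) {   # implicit defined() test, so "0" is safe
        chomp $chunk;             # removes a trailing $/ only if it is there
        push @chunks, $chunk;
    }
    close $fh;
    return @chunks;
}

# Assumed sample data: each record is introduced by a separator line.
my $data = "myrecordsep\nfield1=item1\nfield2=item2\n"
         . "myrecordsep\nfield1=item9\nfield2=item10\n";

my @chunks = read_chunks($data, "myrecordsep\n");
printf "%d chunks\n", scalar @chunks;
print "[$_]\n" for @chunks;
```

With this input the first chunk comes back empty (it is just the leading separator, chomped away), and the text after the last separator is returned as the final chunk, so three chunks are read in total.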

    hope that there's a nice and clean way [...] without any ugly corrective code.

    I don't know if you'll consider the following "nice and clean", but at least there's no "ugly corrective code" (or any corrective code at all) in the following. It employs a single-line lookahead.

    (Update: I've replaced the code I had here originally with a version that hides the guts in an iterator. It's longer, but the usage is much simpler.)

    Usage:

        my $rec_reader = make_rec_reader('myrecordsep');
        while (my $rec = $rec_reader->($fh)) {
            print("Record\n");
            print("======\n");
            print "$_\n" for @$rec;
            print("\n");
        }

    Guts:

        sub make_rec_reader {
            my ($sep) = @_;
            my $first = 1;
            my $line;
            my @rec;
            return sub {
                my ($fh) = @_;

                # Skip what's before first record.
                if ($first) {
                    $first = 0;
                    for (;;) {
                        $line = <$fh>;
                        last if not defined $line;
                        chomp($line);
                        last if $line eq $sep;
                    }
                }

                while (defined($line)) {
                    my @rec;
                    for (;;) {
                        push @rec, $line;
                        $line = <$fh>;
                        last if not defined $line;
                        chomp($line);
                        last if $line eq $sep;
                    }
                    return \@rec;
                }
            };
        }

      But see, that's precisely what I'm trying to avoid, anything except the absolute minimum of code which I'd like to think Perl strives for.  All I want to do is write something like the following and have it work properly.
          {
              $/ = 'myrecordsep';
              while (<DATA>) {
                  # do the actual work on text here
              }
          }
          __DATA__
          myrecordsep
          field1=item1
          field2=item2
          myrecordsep
          ...
      I want Perl to read a chunk at a time following that model, that's absolutely all I'm looking for. Like I mentioned before, I've written different scripts that utilized different methods but I'm exploring this particular avenue. I appreciate the attention to the problem though.

        It's not complicated, just discard the first separator:

            #! perl
            use strict;

            {
                $/ = "myrecordsep\n";
                scalar <DATA>;   ##discard the first;
                while (<DATA>) {
                    chomp;
                    print "'$_'\n";
                }
            }
            __DATA__
            myrecordsep
            field1=item1
            field2=item2
            myrecordsep
            field1=item1
            field2=item2
            myrecordsep
            field1=item1
            field2=item2
            myrecordsep
            field1=item1
            field2=item2

        Produces:

            C:\test>junk2
            'field1=item1
            field2=item2
            '
            'field1=item1
            field2=item2
            '
            'field1=item1
            field2=item2
            '
            'field1=item1
            field2=item2
            '
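
One thing worth noting about the approach above: a bare block does not scope a plain assignment to $/, so the change stays in effect for the rest of the program. Here is a sketch of the same discard-the-first idea wrapped in a sub with `local`, assuming each record is introduced by a line containing only the separator. The name records_after_leading_sep and the in-memory sample data are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the records read from $fh, assuming every
# record is introduced by a line that consists solely of $sep.
sub records_after_leading_sep {
    my ($fh, $sep) = @_;
    local $/ = "$sep\n";       # scoped change; the caller's $/ is untouched
    my $lead = <$fh>;          # consume everything up to the first separator
    my @records;
    while (my $rec = <$fh>) {
        chomp $rec;            # strip the trailing separator, if any
        push @records, $rec;
    }
    return @records;
}

# Assumed sample data, read through an in-memory filehandle.
my $text = join '',
    map { "myrecordsep\nfield1=item$_\nfield2=item$_\n" } 1 .. 3;
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my @records = records_after_leading_sep($fh, 'myrecordsep');
print "'$_'\n" for @records;
```

Because $/ is localized, code that runs after the sub returns still reads line by line as usual.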

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        But see, that's precisely what I'm trying to avoid, anything except the absolute minimum of code which I'd like to think Perl strives for.

        True. This is often done by placing reusable code in modules. I wrote the solution to be reusable so you could place it in a module. All that's left is two lines:

        my $rec_reader = make_rec_reader('myrecordsep'); while (my $rec = $rec_reader->($fh)) { ... }
Re: Change the behavior of Perl's IRS
by FunkyMonk (Chancellor) on Jul 14, 2007 at 20:10 UTC
    Can't you just discard the first line with your record separator, and then treat it as a normal separator?

        <DATA>;    # discard first line
        $/ = "myrecordsep\n";
        while ( <DATA> ) {
            #s{$/}{};    # get rid of your separator if you want
            print;
            print "---\n";
        }
        __DATA__
        myrecordsep
        field1=item1
        field2=item2
        ...
        myrecordsep
        field1=item9
        field2=item10
        ...

    Seems (to me) to do what you want.

Re: Change the behavior of Perl's IRS
by daxim (Curate) on Jul 14, 2007 at 20:20 UTC
    If you can afford to slurp the file into memory/into a scalar, split is very handy. Just one additional statement as corrective measure does not hurt clarity.
        use Data::Dumper;

        $_ = q{stuff before
        myrecordsep
        field1=item1
        field2=item2
        ...
        myrecordsep
        field1=item9
        field2=item10
        ...};
        @chunks = split /myrecordsep\n/;
        # /(?=myrecordsep\n)/ in case you want to keep the sep in the chunk
        print Dumper @chunks;
        shift @chunks;    # discard the first if not needed
        __END__
        $VAR1 = 'stuff before
        ';
        $VAR2 = 'field1=item1
        field2=item2
        ...
        ';
        $VAR3 = 'field1=item9
        field2=item10
        ...';
    (update: lookahead)
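
A sketch of the lookahead variant, for the case where the separator should stay at the front of each chunk. It assumes the whole input fits in memory and that the separator occupies a line by itself; the helper name split_keeping_sep is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: split slurped $text into chunks that each begin
# with the separator line "$sep\n".
sub split_keeping_sep {
    my ($text, $sep) = @_;
    # (?=...) matches at a position without consuming anything, so the
    # separator remains at the start of the following chunk. With /m,
    # ^ only matches at the start of a line. \Q...\E quotes any regex
    # metacharacters in $sep.
    my @chunks = split /^(?=\Q$sep\E\n)/m, $text;
    # Drop whatever came before the first separator, if anything did.
    shift @chunks if @chunks && $chunks[0] !~ /^\Q$sep\E\n/;
    return @chunks;
}

# Assumed sample data.
my $text = "stuff before\n"
         . "myrecordsep\nfield1=item1\nfield2=item2\n"
         . "myrecordsep\nfield1=item9\nfield2=item10\n";

print "---\n$_" for split_keeping_sep($text, 'myrecordsep');
```

The check before the shift means the first chunk is only thrown away when it really is leading junk, so input that starts directly with a separator loses nothing.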

Node Type: perlquestion [id://626656]
Approved by almut