PerlMonks  

Read forward and backward - need help

by Limbic~Region (Chancellor)
on Jan 17, 2003 at 20:44 UTC ( [id://227789]=perlquestion )

Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All:
How can I read a file until I come across a piece of information, go backwards to the start of the record, and then process the record from start to finish?

In the past, this is how I have accomplished it:

  • 1. Read file ignoring lines until I see start of record
  • 2. Store all lines I read in until I encounter information
  • 3. If information is not a match, throw out stored information and go back to step 1
  • 4. If information is a match, continue reading/storing until end of record

    The problem is that this is slow and memory intensive, depending on how large the record is and how far down the information I am searching for is. I would like to use seek or something similar to read, ignoring everything until I encounter a match, then go backwards to the start of the record and process the whole record.

    Here is an example of what a file would look like:

    blank line
    START OF RECORD
      LINE 1
      INFORMATION I AM LOOKING FOR  
      LINE 3
      LINE 4
      LINE 5
      LINE 6
    END OF RECORD
    blank line
    START OF RECORD
      LINE 1
      LINE 2
      LINE 3
      LINE 4
      INFORMATION I AM LOOKING FOR  
      LINE 6
    END OF RECORD
    

    Any ideas?

    L~R

  • Replies are listed 'Best First'.
    Re: Read forward and backward - need help
    by gjb (Vicar) on Jan 17, 2003 at 21:00 UTC

      The following code should do what you want. Essentially, it's a finite state machine.

      Hope this helps, -gjb-

          use strict;
          use warnings;

          my $currentPos;
          my $processing = 0;
          while (<DATA>) {
              chomp($_);
              if ($_ eq q(START OF RECORD)) {
                  $currentPos = tell(DATA);
              } elsif (/INFORMATION I AM LOOKING FOR/ && !$processing) {
                  seek(DATA, $currentPos, 0);
                  $processing = 1;
              } elsif ($_ eq q(END OF RECORD)) {
                  $processing = 0;
              } elsif ($processing) {
                  print "$_\n";
              }
          }

          __DATA__
          START OF RECORD
          LINE 1
          INFORMATION I AM NOT LOOKING FOR
          LINE 3
          LINE 4
          LINE 5
          LINE 6
          END OF RECORD
          START OF RECORD
          LINE 1 ok
          INFORMATION I AM LOOKING FOR
          LINE 3 ok
          LINE 4 ok
          LINE 5 ok
          LINE 6 ok
          END OF RECORD
          START OF RECORD
          LINE 1
          LINE 2
          LINE 3
          LINE 4
          INFORMATION I AM NOT LOOKING FOR
          LINE 6
          END OF RECORD
          START OF RECORD
          LINE 1 ok
          LINE 2 ok
          LINE 3 ok
          LINE 4 ok
          INFORMATION I AM LOOKING FOR
          LINE 6 ok
          END OF RECORD

        Thanks - I am sure a variation on this is exactly what I am looking for. Not that the other solutions were not also good, but the records will vary in size so modifying $/ isn't a viable option because of the memory issue.
    Re: Read forward and backward - need help
    by Fletch (Bishop) on Jan 17, 2003 at 20:56 UTC

      A couple of ideas off the cuff:

      • Set $/ and let perl handle reading records, then parse out what you need a whole record at a time
      • When you see the start of a record, use tell() to save off the location. Read enough to determine if the record's interesting, and seek() back to the start if it is
      • If the data you're looking at is relatively static, you could preprocess things and use a tied hash to store record number => tell() offset; when you process it you just seek directly to the start of the record

      Of course, then again, unless you're talking about really, really, really large amounts of information in each record, you're probably going to introduce more complexity than it's worth by trying to minimize memory usage, compared with just keeping state as you go along.
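
      As a rough illustration of the third idea above (an index of record number => tell() offset), here is a minimal sketch. It is not Fletch's code: build_index and read_record are hypothetical helper names, a plain in-memory array stands in for the tied hash, and the demo reads from an in-memory string rather than a real file. It also assumes the START OF RECORD and END OF RECORD markers begin at the start of a line.

```perl
use strict;
use warnings;

# Hypothetical helper: scan the file once and remember the byte offset
# of every "START OF RECORD" line.
sub build_index {
    my ($fh) = @_;
    my @offsets;
    seek($fh, 0, 0) or die "seek failed: $!";
    while (1) {
        my $pos  = tell($fh);            # offset before reading the next line
        my $line = <$fh>;
        last unless defined $line;
        push @offsets, $pos if $line =~ /^START OF RECORD/;
    }
    return \@offsets;
}

# Hypothetical helper: jump straight to record $n (0-based) and return
# its lines, including the START and END marker lines.
sub read_record {
    my ($fh, $offsets, $n) = @_;
    seek($fh, $offsets->[$n], 0) or die "seek failed: $!";
    my @lines;
    while (my $line = <$fh>) {
        push @lines, $line;
        last if $line =~ /^END OF RECORD/;
    }
    return @lines;
}

# Demo on an in-memory file; a real script would open the data file instead.
my $data = join "\n",
    'START OF RECORD', '  LINE 1', 'END OF RECORD', '',
    'START OF RECORD', '  LINE 1', '  INFORMATION I AM LOOKING FOR',
    'END OF RECORD', '';
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my $index = build_index($fh);
print read_record($fh, $index, 1);       # prints the second record
```

      The index costs one full pass over the file, so it probably only pays off when the same file is searched repeatedly. To keep it between runs, the array could be stored in a tied hash as suggested above.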

    Re: Read forward and backward - need help
    by BrowserUk (Patriarch) on Jan 17, 2003 at 21:13 UTC

      You could save some effort by reading in paragraph mode rather than line-by-line.

          #! perl -slw
          use strict;

          local $/ = '';
          my $n = 0;
          while (<DATA>) {
              $n++;
              next unless /INFORMATION I AM LOOKING FOR 2/;
              print 'Found the info at record ', $n;
              print;
          }

          __DATA__
          START OF RECORD
          LINE 1
          INFORMATION I AM LOOKING FOR 1
          LINE 3
          LINE 4
          LINE 5
          LINE 6
          END OF RECORD

          START OF RECORD
          LINE 1
          LINE 2
          LINE 3
          LINE 4
          INFORMATION I AM LOOKING FOR 2
          LINE 6
          END OF RECORD

      Gives

          C:\test>227789
          Found the info at record 2
          START OF RECORD
          LINE 1
          LINE 2
          LINE 3
          LINE 4
          INFORMATION I AM LOOKING FOR 2
          LINE 6
          END OF RECORD

          C:\test>

      Alternatively, setting $/ = 'INFORMATION I AM LOOKING FOR' and then <read>ing once will leave the read pointer just past the end of that info. You then use seek to back up the read pointer by the size of a record (assuming they are a consistent size), set $/ = '', and <read> twice. The second <read> should be your record.

      If your records are of wildly varying sizes, you would have to back up by the size of the largest record and then <read> forward line-by-line ($/ = "\n") until you get a blank one; then set $/ = '' and the next <read> should get you the whole record.
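
      The varying-size variant just described might look roughly like the sketch below. It is an illustration under assumptions, not BrowserUk's code: find_record is a hypothetical helper, $max_size is an assumed upper bound on record size in bytes, records are assumed to be separated by blank lines, and the demo uses an in-memory file. The sketch backs up one extra byte so it never lands exactly on the start of the wanted record, and it keeps reading paragraphs until one contains the target in case the first paragraph is only the tail of an earlier record.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first record containing $target,
# reading from the current position of $fh.
sub find_record {
    my ($fh, $target, $max_size) = @_;

    my $chunk;
    {
        local $/ = $target;              # read up to and including the target
        $chunk = <$fh>;
    }
    return unless defined($chunk) && $chunk =~ /\Q$target\E\z/;

    # Back up by the assumed maximum record size plus one byte,
    # but never before the start of the file.
    my $back = tell($fh) - $max_size - 1;
    $back = 0 if $back < 0;
    seek($fh, $back, 0) or die "seek failed: $!";

    if ($back > 0) {
        # Probably mid-record: read line by line until a blank line.
        while (defined(my $line = <$fh>)) {
            last if $line =~ /^\s*$/;
        }
    }

    local $/ = '';                       # paragraph mode
    while (defined(my $record = <$fh>)) {
        next unless index($record, $target) >= 0;
        $record =~ s/\n+\z//;            # drop the trailing blank line(s)
        return $record;
    }
    return;
}

# Demo on an in-memory file; a real script would open the data file instead.
my $data = <<'EOT';

START OF RECORD
  LINE 1
  LINE 2
END OF RECORD

START OF RECORD
  LINE 1
  INFORMATION I AM LOOKING FOR
  LINE 3
END OF RECORD

START OF RECORD
  LINE 1
END OF RECORD
EOT
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my $record = find_record($fh, 'INFORMATION I AM LOOKING FOR', 80);
print "$record\n" if defined $record;
```

      Note that the first read still pulls everything before the match into memory as one string, so this mainly saves line-by-line processing rather than memory.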


      Examine what is said, not who speaks.

      The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

    Re: Read forward and backward - need help
    by MZSanford (Curate) on Jan 17, 2003 at 20:59 UTC

      Depending on how rigid the format is, maybe something like this :

          use strict;

          $/ = "\n\n";
          while (my $record = <>) {
              chomp($record);
              # $record contains all needed lines
              # from START OF RECORD to END OF RECORD
          }

      If you are planning to have 1000's of lines between beginning and end, and are really worried about memory usage, you may want to save the position of each start when you encounter it; then, if you find the data you want, seek() back. From what I can gather this is similar to what you do now, only without the full scan of the file.

      Above code is untested and was typed into this page with no warranty or promise with the hope it will help. Misuse may result in premature baldness. Your mileage may vary.

      Update : Fixed syntax error
      from the frivolous to the serious
    Re: Read forward and backward - need help
    by jimc (Sexton) on Jan 17, 2003 at 22:58 UTC
      Your best bet, if you're gambling, is YAML.

      YAML can use indentation to carry structural info, which is as powerful as <XML> but instantly understandable to humans too.

      The learning curve is possibly higher than custom hacks, which are helpful in understanding the problem, but the solution has legs for the long run.
