PerlMonks  


Hi, bobdabuilda

You've given this a lot of thought, and I think your pseudocode is on target.

The orders are only separated by a blank line, but they all start with the "Order ID:" text, so I'm looking at using that as the separator.

The "Order ID:" as record separator makes sense.
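With `$/` set to a string, each read returns everything up to and including that string. So the first chunk is whatever precedes the first `Order ID:` (the page header), and every chunk ends with the separator. A minimal sketch on an invented string; note that `chomp` uses the current `$/`, so inside the `local` block it strips the trailing `Order ID:` rather than a newline:

```perl
use strict;
use warnings;

# Invented sample text: a page header followed by two orders
my $text = "List of Distributions\n\nOrder ID:PO-1 copies:1\n\nOrder ID:PO-2 copies:2\n";
my $sep  = 'Order ID:';

my @records;
{
    local $/ = $sep;    # read up to (and including) each "Order ID:"
    open my $fh, '<', \$text or die "Cannot open string: $!";
    @records = <$fh>;
    close $fh;
    chomp @records;     # removes a trailing "Order ID:" from each chunk
}

shift @records;                 # chunk 0 is the page header
$_ = "$sep$_" for @records;     # put the label back on each order

print "[$_]\n" for @records;
```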

The page header should be automatically filtered out by the regex the way it stands anyway... I think.

You're correct.
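A quick way to convince yourself: header lines fail every field pattern, so with `//=` they never set a value. A minimal sketch using a few of the patterns from the script on illustrative lines modeled on the report:

```perl
use strict;
use warnings;

# Two header lines in the style of the report, plus one order line (sample text)
my @lines = (
    "List of Distributions\n",
    "          Produced Tuesday, 9 October, 2012 at 1:38 PM\n",
    "Order ID:PO-9999      fiscal cycle:21112       Vendor ID:VEND99\n",
);

# A subset of the field patterns used in the script
my %pattern = (
    orderID     => qr/Order ID:(\S+)/,
    fiscalCycle => qr/cycle:(\d+)/,
    vendorID    => qr/Vendor ID:(\S+)/,
    copies      => qr/copies:(\d+)/,
    title       => qr/Title:(.+)/,
);

my %hash;
for my $line (@lines) {
    for my $field ( keys %pattern ) {
        # First match wins; header lines match nothing, so they set nothing
        $hash{$field} //= ( $line =~ $pattern{$field} )[0];
    }
}

print "$_ => ", $hash{$_} // 'undef', "\n" for sort keys %hash;
```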

I've taken the liberty of implementing an interpretation of this. It does use two loops, but the outer loop is a for loop that iterates over an array of Order records:

```perl
use strict;
use warnings;
use Data::Dumper;

# Place a filename into $recordsFile to read Orders from that file,
# else the Orders below __DATA__ will be used for demo purposes
my $recordsFile = '';

my @records;
my $recSeparator = 'Order ID:';

# Orders will initially be array elements 1 .. n in @records;
# element 0 is the first page header
{
    # Set the record separator
    local $/ = $recSeparator;

    # If there's a file name, try to read from that file
    if ($recordsFile) {
        open my $fh, '<', $recordsFile or die "Unable to open $recordsFile: $!";
        @records = <$fh>;
        close $fh;
    }
    else {
        @records = <DATA>;
    }

    # Each record ends with the separator; chomp (which uses $/) removes it
    chomp @records;
}

# Remove the first page header
shift @records;

# Add Order ID: back into each record for later matching
$_ = "$recSeparator$_" for @records;

# Iterate through each record (Order)
for my $record (@records) {
    my %hash;

    # Treat the record string like a file, opening it for reading
    open my $sh, '<', \$record or die "Unable to open record string: $!";

    # Read the string like a file, one line at a time now.
    # ( /.../ )[0] is undef when the match fails, so //= keeps the
    # first value actually found for each key.
    while (<$sh>) {
        $hash{orderID}        //= ( /Order ID:(\S+)/ )[0];
        $hash{fiscalCycle}    //= ( /cycle:(\d+)/ )[0];
        $hash{vendorID}       //= ( /Vendor ID:(\S+)/ )[0];
        $hash{requisitionNum} //= ( /\s+(\d+).+requisition/ )[0];
        $hash{copies}         //= ( /copies:(\d+)/ )[0];
        $hash{title}          //= ( /Title:(.+)/ )[0];
        $hash{'ISBN/ISSN'}    //= ( m{ISBN/ISSN:(\S+)} )[0];

        # Distributions started?
        if (/Distribution--/) {
            # Set a new record separator; local restores the
            # line-based $/ when this block exits
            local $/ = 'Distribution--';

            # Read the string like a file, a distribution 'chunk' at a time
            while (<$sh>) {
                my %tempHash;
                ( $tempHash{holdingCode} )  = /code:(\S+)/;
                ( $tempHash{copies} )       = /copies:(\d+)/;
                ( $tempHash{dateReceived} ) = /received:(\S+)/;
                ( $tempHash{dateLoaded} )   = /loaded:(\S+)/;
                push @{ $hash{distribution} }, \%tempHash;
            }
        }
    }

    # Work with the filled-in %hash by sending a reference to it to a subroutine.
    # This is a complete record
    writeToSpreadSheet( \%hash );
    print Dumper \%hash;

    # Done 'reading' the string
    close $sh;
}

# Printing in a subroutine's not a good idea, but it's done here only to show
# how to access the hash
sub writeToSpreadSheet {
    my ($hashReference) = @_;

    # The $$ notation dereferences the hash reference
    print $$hashReference{vendorID}, "\n";

    # The @{} notation dereferences the array reference; the arrow operator
    # dereferences each hash reference to get a hash value
    for my $distribution ( @{ $$hashReference{distribution} } ) {
        print $distribution->{holdingCode}, "\n";
    }

    print "\n";
}

__DATA__
List of Distributions
                                        Produced Tuesday, 9 October, 2012 at 1:38 PM

Order ID:PO-9999     fiscal cycle:21112     Vendor ID:VEND99     order type:SUBSCRIPT
   15) requisition number:     copies:9     call number:XX(9999999.999)
       ISBN/ISSN:9999-999X
       Title:Item title here.
       ISSN:9999-999X
       Publication info:More text here about stuff
     Distribution--
       packing list:STUFF-I-DONT-NEED-999     holding code:CODEINFO1     copies:1
       date received:27/6/2012     date loaded:27/6/2012
     Distribution--
       packing list:STUFF-I-DONT-NEED-999     holding code:CODEINFO3     copies:2
       date received:27/9/2012     date loaded:27/6/2012
     Distribution--
       packing list:STUFF-I-DONT-NEED-999     holding code:CODEINFO2     copies:1
       date received:25/8/2012     date loaded:27/6/2012

List of Distributions
                                        Produced Tuesday, 9 October, 2012 at 1:38 PM

Order ID:PO-1111     fiscal cycle:21112     Vendor ID:VEND11     order type:SUBSCRIPT
   15) requisition number:     copies:417     call number:XX(11111111.111)
       ISBN/ISSN:1111-111X
       Title:Item title here.
       ISSN:9999-999X
       Publication info:More text here about stuff
     Distribution--
       packing list:STUFF-I-DONT-NEED-111     holding code:CODEINFO9     copies:5
       date received:11/6/2012     date loaded:12/6/2012
     Distribution--
       packing list:STUFF-I-DONT-NEED-111     holding code:CODEINFO8     copies:4
       date received:11/9/2012     date loaded:12/6/2012
     Distribution--
       packing list:STUFF-I-DONT-NEED-111     holding code:CODEINFO7     copies:3
       date received:11/8/2012     date loaded:12/6/2012
     Distribution--
       packing list:STUFF-I-DONT-NEED-111     holding code:CODEINFO6     copies:2
       date received:11/8/2012     date loaded:12/6/2012
```
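A note on the field lookups: they rely on `//=` keeping the first value found. With the form `do { /.../; $1 }`, a failed match does not reset `$1`, so the block can return a capture left over from an earlier successful match in an enclosing scope. Taking element 0 of a list-context match avoids that, because a failed match yields an empty list. A small demonstration with invented strings:

```perl
use strict;
use warnings;

# A successful match earlier in the enclosing scope sets $1 to 'VEND99'
'Vendor ID:VEND99' =~ /Vendor ID:(\S+)/;

my $line = 'Title:Item title here.';

# Risky: this match fails, but $1 still holds the earlier capture
my $stale = do { $line =~ /copies:(\d+)/; $1 };

# Safer: a failed list-context match returns an empty list, so [0] is undef
my $safe = ( $line =~ /copies:(\d+)/ )[0];

print 'stale: ', $stale // 'undef', "\n";    # VEND99
print 'safe:  ', $safe  // 'undef', "\n";    # undef
```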

Output

```
VEND99
CODEINFO1
CODEINFO3
CODEINFO2

$VAR1 = {
          'vendorID' => 'VEND99',
          'copies' => '9',
          'fiscalCycle' => '21112',
          'distribution' => [
                              {
                                'dateLoaded' => '27/6/2012',
                                'dateReceived' => '27/6/2012',
                                'copies' => '1',
                                'holdingCode' => 'CODEINFO1'
                              },
                              {
                                'dateLoaded' => '27/6/2012',
                                'dateReceived' => '27/9/2012',
                                'copies' => '2',
                                'holdingCode' => 'CODEINFO3'
                              },
                              {
                                'dateLoaded' => '27/6/2012',
                                'dateReceived' => '25/8/2012',
                                'copies' => '1',
                                'holdingCode' => 'CODEINFO2'
                              }
                            ],
          'ISBN/ISSN' => '9999-999X',
          'title' => 'Item title here.',
          'orderID' => 'PO-9999',
          'requisitionNum' => '15'
        };
VEND11
CODEINFO9
CODEINFO8
CODEINFO7
CODEINFO6

$VAR1 = {
          'vendorID' => 'VEND11',
          'copies' => '417',
          'fiscalCycle' => '21112',
          'distribution' => [
                              {
                                'dateLoaded' => '12/6/2012',
                                'dateReceived' => '11/6/2012',
                                'copies' => '5',
                                'holdingCode' => 'CODEINFO9'
                              },
                              {
                                'dateLoaded' => '12/6/2012',
                                'dateReceived' => '11/9/2012',
                                'copies' => '4',
                                'holdingCode' => 'CODEINFO8'
                              },
                              {
                                'dateLoaded' => '12/6/2012',
                                'dateReceived' => '11/8/2012',
                                'copies' => '3',
                                'holdingCode' => 'CODEINFO7'
                              },
                              {
                                'dateLoaded' => '12/6/2012',
                                'dateReceived' => '11/8/2012',
                                'copies' => '2',
                                'holdingCode' => 'CODEINFO6'
                              }
                            ],
          'ISBN/ISSN' => '1111-111X',
          'title' => 'Item title here.',
          'requisitionNum' => '15',
          'orderID' => 'PO-1111'
        };
```
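The switch to a `Distribution--` separator partway through the in-memory filehandle can be tried on its own. A minimal sketch on an invented, trimmed-down record that keeps only the holding code:

```perl
use strict;
use warnings;

# Invented sample record: one order line, then two distribution chunks
my $record = <<'END';
Order ID:PO-1 copies:3
Distribution--
holding code:CODE_A copies:1
Distribution--
holding code:CODE_B copies:2
END

open my $sh, '<', \$record or die "Cannot open string: $!";

my @codes;
while ( my $line = <$sh> ) {        # line by line; $/ is "\n" here
    next unless $line =~ /Distribution--/;
    local $/ = 'Distribution--';    # from here on, read one chunk at a time
    while ( my $chunk = <$sh> ) {
        push @codes, $chunk =~ /code:(\S+)/;
    }
}                                   # local restores $/ when the loop body exits
close $sh;

print "@codes\n";    # CODE_A CODE_B
```

Once the inner loop has consumed the rest of the string, the outer `<$sh>` returns undef and both loops end.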

I've included a subroutine, and a call to it, that shows how to access the hash one record at a time.
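For reference, `$$ref{key}` and `$ref->{key}` are the same lookup, and arrows between adjacent subscripts can be dropped. A small sketch with a record shaped like the one the script builds (values taken from the sample output):

```perl
use strict;
use warnings;

# A record shaped like %hash in the script
my %hash = (
    vendorID     => 'VEND99',
    distribution => [
        { holdingCode => 'CODEINFO1', copies => 1 },
        { holdingCode => 'CODEINFO3', copies => 2 },
    ],
);
my $ref = \%hash;

# Three equivalent ways to reach the same scalar
my @same = ( $$ref{vendorID}, ${$ref}{vendorID}, $ref->{vendorID} );

# Walk the array of hashes with arrows
my $total = 0;
for my $dist ( @{ $ref->{distribution} } ) {
    $total += $dist->{copies};
}

# Arrows between subscripts are optional: the second holding code
my $second = $ref->{distribution}[1]{holdingCode};

print "@same\n$total\n$second\n";
```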

The code is commented, to assist with understanding it.

Let me know if you have any questions about this...

Enjoy!


In reply to Re^7: How best to strip text from a file? by Kenosis
in thread How best to strip text from a file? by bobdabuilda
