Hi, bobdabuilda

You've given this much thought, and I think your pseudocode is on target.

The orders are only separated by a blank line, but they all start with the "Order ID:" text, so looking at using that as the separator.

The "Order ID:" as record separator makes sense.
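For illustration, here is a minimal sketch of the record-separator idea on its own. The sample text and the `split_records` name are made up for the demo; only `$/` and in-memory filehandles are standard Perl.

```perl
use strict;
use warnings;

# Hypothetical sample text: a page header followed by two orders
my $text = "Page header\nOrder ID:PO-1 cycle:1\nOrder ID:PO-2 cycle:2\n";

# split_records is an illustrative helper, not part of any module
sub split_records {
    my ( $input, $separator ) = @_;

    # Open the string as an in-memory file so readline works on it
    open my $fh, '<', \$input or die "Cannot open string: $!";

    # readline now returns text up to and including the separator
    local $/ = $separator;
    my @chunks = <$fh>;
    close $fh;

    # chomp strips the current value of $/ from the end of each chunk
    chomp @chunks;

    # Discard whatever precedes the first separator (the page header)
    shift @chunks;

    # Put the separator text back at the front of each record
    return map { $separator . $_ } @chunks;
}

my @records = split_records( $text, 'Order ID:' );
print scalar(@records), " records\n";
print for @records;
```

Because `$/` is set with `local` inside the subroutine, the change is undone automatically when the subroutine returns.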

The page header should be automatically filtered out by the regex the way it stands anyway... I think.

You're correct.
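A small sketch of why the header drops out, assuming field patterns modelled on the labels in your sample report (the `%pattern` table and `parse_lines` are names I made up for the demo): a line that matches none of the patterns simply contributes nothing.

```perl
use strict;
use warnings;

# Hypothetical field patterns, modelled on the labels in the sample report
my %pattern = (
    orderID  => qr/Order ID:(\S+)/,
    vendorID => qr/Vendor ID:(\S+)/,
    title    => qr/Title:(.+)/,
);

# parse_lines is an illustrative helper, not part of any module
sub parse_lines {
    my @lines = @_;
    my %field;
    for my $line (@lines) {
        for my $name ( keys %pattern ) {

            # A match in list context returns the captures,
            # or an empty list when the line does not match
            my ($value) = $line =~ $pattern{$name};

            # //= keeps the first value found and ignores later ones
            $field{$name} //= $value if defined $value;
        }
    }
    return \%field;
}

my $order = parse_lines(
    "List of Distributions",
    "Produced Tuesday, 9 October, 2012 at 1:38 PM",
    "Order ID:PO-9999 fiscal cycle:21112",
    "Vendor ID:VEND99 order type:SUBSCRIPT",
    "Title:Item title here.",
);
print "$_ => $order->{$_}\n" for sort keys %$order;
```

The two header lines are passed in along with the order lines, yet only the three labelled fields end up in the hash.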

I've taken the liberty of implementing one interpretation of this. It does use two loops, but the outer one is a for loop that iterates over an array of Order records:

use strict;
use warnings;
use Data::Dumper;

# Place a filename into $recordsFile to read Orders from that file,
# else the Orders below __DATA__ will be used for demo purposes
my $recordsFile = '';
my @records;
my $recSeparator = 'Order ID:';

# Orders will initially be array elements 1 .. n in @records;
# element 0 is initially the first page header
{
    # Set the record separator
    local $/ = $recSeparator;

    # If there's a file name, try to read from that file
    if ($recordsFile) {
        open my $fh, '<', $recordsFile or die $!;
        @records = <$fh>;
        close $fh;
    }
    else {
        @records = <DATA>;
    }
}

# Remove the first page header
shift @records;

# Add Order ID: back into each record for later matching
$_ = "$recSeparator$_" for @records;

# Iterate through each record (Order)
for my $record (@records) {
    my %hash;

    # Treat the record string like a file, opening it for reading
    open my $sh, '<', \$record or die "Unable to open record string: $!";

    # Read the string like a file, one line at a time now
    while (<$sh>) {
        $hash{orderID}        //= do { /Order ID:(\S+)/;          $1 };
        $hash{fiscalCycle}    //= do { /cycle:(\d+)/;             $1 };
        $hash{vendorID}       //= do { /Vendor ID:(\S+)/;         $1 };
        $hash{requisitionNum} //= do { /\s+(\d+).+requisition/;   $1 };
        $hash{copies}         //= do { /copies:(\d+)/;            $1 };
        $hash{title}          //= do { /Title:(.+)/;              $1 };
        $hash{'ISBN/ISSN'}    //= do { m{ISBN/ISSN:(\S+)};        $1 };

        # Distributions started?
        if (/Distribution--/) {

            # Save the current record separator
            my $oldRecSeparator = $/;

            # Set a new record separator
            local $/ = 'Distribution--';

            # Read the string like a file, a distribution 'chunk' at a time
            while (<$sh>) {
                my %tempHash;
                ( $tempHash{holdingCode} )  = /code:(\S+)/;
                ( $tempHash{copies} )       = /copies:(\d+)/;
                ( $tempHash{dateReceived} ) = /received:(\S+)/;
                ( $tempHash{dateLoaded} )   = /loaded:(\S+)/;
                push @{ $hash{distribution} }, \%tempHash;
            }

            # Restore the old record separator
            $/ = $oldRecSeparator;
        }
    }

    # Work with the filled-in %hash by sending a reference to it to a subroutine
    # This is a complete record
    writeToSpreadSheet( \%hash );
    print Dumper \%hash;

    # Done 'reading' the string
    close $sh;
}

# Printing in a subroutine's not a good idea, but done here only to show
# how to access the hash
sub writeToSpreadSheet {
    my ($hashReference) = @_;

    # The $$ notation dereferences the hash reference
    print $$hashReference{vendorID}, "\n";

    # The @{} notation dereferences the array reference; the arrow operator
    # dereferences to get the hash value
    for my $distribution ( @{ $$hashReference{distribution} } ) {
        print $distribution->{holdingCode}, "\n";
    }
    print "\n";
}

__DATA__
List of Distributions
Produced Tuesday, 9 October, 2012 at 1:38 PM

Order ID:PO-9999 fiscal cycle:21112
Vendor ID:VEND99 order type:SUBSCRIPT
  15) requisition number: copies:9 call number:XX(9999999.999)
      ISBN/ISSN:9999-999X
      Title:Item title here.
      ISSN:9999-999X
      Publication info:More text here about stuff
    Distribution--
      packing list:STUFF-I-DONT-NEED-999
      holding code:CODEINFO1 copies:1 date received:27/6/2012 date loaded:27/6/2012
    Distribution--
      packing list:STUFF-I-DONT-NEED-999
      holding code:CODEINFO3 copies:2 date received:27/9/2012 date loaded:27/6/2012
    Distribution--
      packing list:STUFF-I-DONT-NEED-999
      holding code:CODEINFO2 copies:1 date received:25/8/2012 date loaded:27/6/2012

List of Distributions
Produced Tuesday, 9 October, 2012 at 1:38 PM

Order ID:PO-1111 fiscal cycle:21112
Vendor ID:VEND11 order type:SUBSCRIPT
  15) requisition number: copies:417 call number:XX(11111111.111)
      ISBN/ISSN:1111-111X
      Title:Item title here.
      ISSN:9999-999X
      Publication info:More text here about stuff
    Distribution--
      packing list:STUFF-I-DONT-NEED-111
      holding code:CODEINFO9 copies:5 date received:11/6/2012 date loaded:12/6/2012
    Distribution--
      packing list:STUFF-I-DONT-NEED-111
      holding code:CODEINFO8 copies:4 date received:11/9/2012 date loaded:12/6/2012
    Distribution--
      packing list:STUFF-I-DONT-NEED-111
      holding code:CODEINFO7 copies:3 date received:11/8/2012 date loaded:12/6/2012
    Distribution--
      packing list:STUFF-I-DONT-NEED-111
      holding code:CODEINFO6 copies:2 date received:11/8/2012 date loaded:12/6/2012

Output

VEND99
CODEINFO1
CODEINFO3
CODEINFO2

$VAR1 = {
          'vendorID' => 'VEND99',
          'copies' => '9',
          'fiscalCycle' => '21112',
          'distribution' => [
                              {
                                'dateLoaded' => '27/6/2012',
                                'dateReceived' => '27/6/2012',
                                'copies' => '1',
                                'holdingCode' => 'CODEINFO1'
                              },
                              {
                                'dateLoaded' => '27/6/2012',
                                'dateReceived' => '27/9/2012',
                                'copies' => '2',
                                'holdingCode' => 'CODEINFO3'
                              },
                              {
                                'dateLoaded' => '27/6/2012',
                                'dateReceived' => '25/8/2012',
                                'copies' => '1',
                                'holdingCode' => 'CODEINFO2'
                              }
                            ],
          'ISBN/ISSN' => '9999-999X',
          'title' => 'Item title here.',
          'orderID' => 'PO-9999',
          'requisitionNum' => '15'
        };
VEND11
CODEINFO9
CODEINFO8
CODEINFO7
CODEINFO6

$VAR1 = {
          'vendorID' => 'VEND11',
          'copies' => '417',
          'fiscalCycle' => '21112',
          'distribution' => [
                              {
                                'dateLoaded' => '12/6/2012',
                                'dateReceived' => '11/6/2012',
                                'copies' => '5',
                                'holdingCode' => 'CODEINFO9'
                              },
                              {
                                'dateLoaded' => '12/6/2012',
                                'dateReceived' => '11/9/2012',
                                'copies' => '4',
                                'holdingCode' => 'CODEINFO8'
                              },
                              {
                                'dateLoaded' => '12/6/2012',
                                'dateReceived' => '11/8/2012',
                                'copies' => '3',
                                'holdingCode' => 'CODEINFO7'
                              },
                              {
                                'dateLoaded' => '12/6/2012',
                                'dateReceived' => '11/8/2012',
                                'copies' => '2',
                                'holdingCode' => 'CODEINFO6'
                              }
                            ],
          'ISBN/ISSN' => '1111-111X',
          'title' => 'Item title here.',
          'requisitionNum' => '15',
          'orderID' => 'PO-1111'
        };

I've included a subroutine, and a call to it, that shows how to access the hash one record at a time.
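If the reference syntax is new to you, here is a stand-alone sketch of the same kind of access using the arrow notation throughout. The `%order` data and the `holding_codes` and `total_copies` names are invented for the demo; the shape mirrors one parsed order.

```perl
use strict;
use warnings;

# Hypothetical record, shaped like the hash built for one order
my %order = (
    vendorID     => 'VEND99',
    distribution => [
        { holdingCode => 'CODEINFO1', copies => 1 },
        { holdingCode => 'CODEINFO3', copies => 2 },
    ],
);

# Return the holding code of every distribution in an order
sub holding_codes {
    my ($order_ref) = @_;

    # $order_ref->{distribution} is an array reference;
    # // [] guards against an order with no distributions
    my $distributions = $order_ref->{distribution} // [];
    return map { $_->{holdingCode} } @$distributions;
}

# Add up the copies across all distributions in an order
sub total_copies {
    my ($order_ref) = @_;
    my $total = 0;
    $total += $_->{copies} for @{ $order_ref->{distribution} // [] };
    return $total;
}

print join( ', ', holding_codes( \%order ) ), "\n";
print total_copies( \%order ), "\n";
```

Returning values from the subroutines, rather than printing inside them, leaves the caller free to send the data to a spreadsheet, a file, or the screen.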

The code is commented, to assist with understanding it.

Let me know if you have any questions about this...

Enjoy!


In reply to Re^7: How best to strip text from a file? by Kenosis
in thread How best to strip text from a file? by bobdabuilda
