Hi, bobdabuilda
You've given this much thought, and I think your pseudocode is on target.
The orders are only separated by a blank line, but they all start with the "Order ID:" text, so looking at using that as the separator.
The "Order ID:" as record separator makes sense.
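In case it helps to see the record separator idea in isolation, here's a minimal sketch. The sample text and variable names are made up for the demo and aren't from your report. It assumes every order begins with "Order ID:", so whatever comes before the first one can be thrown away:

```perl
use strict;
use warnings;

# Hypothetical sample text, made up for this demo
my $text = "Page header\nOrder ID:A1 foo\nOrder ID:B2 bar\n";

my @chunks;
{
    # Read "Order ID:"-terminated chunks instead of newline-terminated lines
    local $/ = 'Order ID:';
    open my $fh, '<', \$text or die "Unable to open string: $!";
    @chunks = <$fh>;
    close $fh;

    # chomp strips the current value of $/ from the end of each chunk
    chomp @chunks;
}

# Discard everything that came before the first "Order ID:"
shift @chunks;

# Trim trailing whitespace only to keep the demo output tidy
s/\s+\z// for @chunks;
print "[$_]\n" for @chunks;
```

Each chunk comes back with the separator on its end rather than its beginning, which is why something has to be done about it afterwards: chomp it off as here, or put it back on the front if you want to match against it later.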
The page header should be automatically filtered out by the regex the way it stands anyway... I think.
You're correct.
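As a quick illustration of why the header falls away on its own, here's a small sketch using three lines borrowed from the sample data. The %found hash and $id variable are just names for the demo. A failed match in list context returns an empty list, so lines without a labelled field never store anything:

```perl
use strict;
use warnings;

my @lines = (
    'List of Distributions',
    'Produced Tuesday, 9 October, 2012 at 1:38 PM',
    'Order ID:PO-9999 fiscal cycle:21112',
);

my %found;
for my $line (@lines) {
    # The list assignment is false when the match fails, so the two
    # page header lines are skipped without any special handling
    if ( my ($id) = $line =~ /Order ID:(\S+)/ ) {
        $found{orderID} = $id;
    }
}

print "$found{orderID}\n";
```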
I've taken the liberty of implementing an interpretation of this. It does use two loops, but the outer loop is a for loop that iterates over an array of Order records:
use strict;
use warnings;
use Data::Dumper;
# Place a filename into $recordsFile to read Orders from that file
# else the Orders below __DATA__ will be used for demo purposes
my $recordsFile = '';
my @records;
my $recSeparator = 'Order ID:';
# Orders will initially be array elements 1 .. n in @records; element 0
# is initially the first page header
{
# Set the record separator
local $/ = $recSeparator;
# If there's a file name, try to read from that file
if ($recordsFile) {
open my $fh, '<', $recordsFile or die $!;
@records = <$fh>;
close $fh;
}
else {
@records = <DATA>;
}
}
# Remove the first page header
shift @records;
# Add Order ID: back into each record for later matching
$_ = "$recSeparator$_" for @records;
# Iterate through each record (Order)
for my $record (@records) {
my %hash;
# Treat the record string like a file, opening it for reading
open my $sh, '<', \$record or die "Unable to open record string: $!";
# Read the string like a file, one line at a time now
while (<$sh>) {
$hash{orderID} //= do { /Order ID:(\S+)/; $1 };
$hash{fiscalCycle} //= do { /cycle:(\d+)/; $1 };
$hash{vendorID} //= do { /Vendor ID:(\S+)/; $1 };
$hash{requisitionNum} //= do { /\s+(\d+).+requisition/; $1 };
$hash{copies} //= do { /copies:(\d+)/; $1 };
$hash{title} //= do { /Title:(.+)/; $1 };
$hash{'ISBN/ISSN'} //= do { m{ISBN/ISSN:(\S+)}; $1 };
# Distributions started?
if (/Distribution--/) {
# Set a new record separator; 'local' restores the old one
# automatically when this block ends
local $/ = 'Distribution--';
# Read the string like a file, a distribution 'chunk' at a time
while (<$sh>) {
my %tempHash;
( $tempHash{holdingCode} ) = /code:(\S+)/;
( $tempHash{copies} ) = /copies:(\d+)/;
( $tempHash{dateReceived} ) = /received:(\S+)/;
( $tempHash{dateLoaded} ) = /loaded:(\S+)/;
push @{ $hash{distribution} }, \%tempHash;
}
}
}
# Work with the filled-in %hash by sending a reference to it to a
# subroutine
# This is a complete record
writeToSpreadSheet( \%hash );
print Dumper \%hash;
# Done 'reading' the string
close $sh;
}
# Printing in a subroutine's not a good idea, but done here only to
# show how to access the hash
sub writeToSpreadSheet {
my ($hashReference) = @_;
# The $$ notation dereferences the hash reference
print $$hashReference{vendorID}, "\n";
# The @{} notation dereferences the array reference; the arrow
# operator dereferences to get the hash value
for my $distribution ( @{ $$hashReference{distribution} } ) {
print $distribution->{holdingCode}, "\n";
}
print "\n";
}
__DATA__
List of Distributions
Produced Tuesday, 9 October, 2012 at 1:38 PM
Order ID:PO-9999 fiscal cycle:21112
Vendor ID:VEND99 order type:SUBSCRIPT
15) requisition number: copies:9
call number:XX(9999999.999)
ISBN/ISSN:9999-999X
Title:Item title here.
ISSN:9999-999X
Publication info:More text here about stuff
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO1 copies:1
date received:27/6/2012 date loaded:27/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO3 copies:2
date received:27/9/2012 date loaded:27/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO2 copies:1
date received:25/8/2012 date loaded:27/6/2012
List of Distributions
Produced Tuesday, 9 October, 2012 at 1:38 PM
Order ID:PO-1111 fiscal cycle:21112
Vendor ID:VEND11 order type:SUBSCRIPT
15) requisition number: copies:417
call number:XX(11111111.111)
ISBN/ISSN:1111-111X
Title:Item title here.
ISSN:9999-999X
Publication info:More text here about stuff
Distribution--
packing list:STUFF-I-DONT-NEED-111
holding code:CODEINFO9 copies:5
date received:11/6/2012 date loaded:12/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-111
holding code:CODEINFO8 copies:4
date received:11/9/2012 date loaded:12/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-111
holding code:CODEINFO7 copies:3
date received:11/8/2012 date loaded:12/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-111
holding code:CODEINFO6 copies:2
date received:11/8/2012 date loaded:12/6/2012
Output
VEND99
CODEINFO1
CODEINFO3
CODEINFO2
$VAR1 = {
'vendorID' => 'VEND99',
'copies' => '9',
'fiscalCycle' => '21112',
'distribution' => [
{
'dateLoaded' => '27/6/2012',
'dateReceived' => '27/6/2012',
'copies' => '1',
'holdingCode' => 'CODEINFO1'
},
{
'dateLoaded' => '27/6/2012',
'dateReceived' => '27/9/2012',
'copies' => '2',
'holdingCode' => 'CODEINFO3'
},
{
'dateLoaded' => '27/6/2012',
'dateReceived' => '25/8/2012',
'copies' => '1',
'holdingCode' => 'CODEINFO2'
}
],
'ISBN/ISSN' => '9999-999X',
'title' => 'Item title here.',
'orderID' => 'PO-9999',
'requisitionNum' => '15'
};
VEND11
CODEINFO9
CODEINFO8
CODEINFO7
CODEINFO6
$VAR1 = {
'vendorID' => 'VEND11',
'copies' => '417',
'fiscalCycle' => '21112',
'distribution' => [
{
'dateLoaded' => '12/6/2012',
'dateReceived' => '11/6/2012',
'copies' => '5',
'holdingCode' => 'CODEINFO9'
},
{
'dateLoaded' => '12/6/2012',
'dateReceived' => '11/9/2012',
'copies' => '4',
'holdingCode' => 'CODEINFO8'
},
{
'dateLoaded' => '12/6/2012',
'dateReceived' => '11/8/2012',
'copies' => '3',
'holdingCode' => 'CODEINFO7'
},
{
'dateLoaded' => '12/6/2012',
'dateReceived' => '11/8/2012',
'copies' => '2',
'holdingCode' => 'CODEINFO6'
}
],
'ISBN/ISSN' => '1111-111X',
'title' => 'Item title here.',
'requisitionNum' => '15',
'orderID' => 'PO-1111'
};
I've included a subroutine, and a call to it, that shows how to access the hash one record at a time.
The code is commented, to assist with understanding it.
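If the spreadsheet ends up needing one row per distribution, something along these lines might be a starting point. It's only a sketch: orderToRows is a hypothetical helper that isn't part of the script above, the column order is a guess at what you'd want, and the sample hash is a trimmed-down copy of the first Dumper result:

```perl
use strict;
use warnings;

# Hypothetical helper: returns one row (an array reference) per
# distribution, repeating the order-level fields on every row
sub orderToRows {
    my ($order) = @_;
    my @rows;

    # Fall back to an empty list if an order has no distributions
    for my $dist ( @{ $order->{distribution} || [] } ) {
        push @rows, [
            $order->{orderID},    $order->{vendorID},
            $dist->{holdingCode}, $dist->{copies},
            $dist->{dateReceived}, $dist->{dateLoaded},
        ];
    }
    return @rows;
}

# Trimmed-down sample in the same shape as the Dumper output
my %order = (
    orderID      => 'PO-9999',
    vendorID     => 'VEND99',
    distribution => [
        {
            holdingCode  => 'CODEINFO1',
            copies       => 1,
            dateReceived => '27/6/2012',
            dateLoaded   => '27/6/2012',
        },
        {
            holdingCode  => 'CODEINFO3',
            copies       => 2,
            dateReceived => '27/9/2012',
            dateLoaded   => '27/6/2012',
        },
    ],
);

my @rows = orderToRows( \%order );

# Tab-separated here just for display; swap in whatever writer you use
print join( "\t", @$_ ), "\n" for @rows;
```

Returning the rows instead of printing them inside the subroutine keeps the output format a separate decision from the hash traversal.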
Let me know if you have any questions about this...
Enjoy!