In reply to Re^6: How best to strip text from a file?
Hi, bobdabuilda
You've given this much thought, and I think your pseudocode is on target.
The orders are only separated by a blank line, but they all start with the "Order ID:" text, so I'm looking at using that as the separator.
The "Order ID:" as record separator makes sense.
The page header should be automatically filtered out by the regex the way it stands anyway... I think.
You're correct.
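Here is a stripped-down sketch of just the record-separator idea. It assumes the whole report fits in a string and that everything before the first "Order ID:" is page header, as in the sample data. split_orders is a hypothetical helper, not part of the script that follows. Reading from a string uses an in-memory filehandle (open on a scalar reference, supported since Perl 5.8). A record read with $/ set to a string ends with that string, so chomp, which removes a trailing $/, strips the separator before it is put back on the front.

```perl
use strict;
use warnings;

# Hypothetical helper: split a report held in a string into order records,
# using 'Order ID:' as the input record separator.
sub split_orders {
    my ($text) = @_;
    my $sep = 'Order ID:';
    my @records;
    open my $fh, '<', \$text or die "Cannot open string: $!";
    {
        local $/ = $sep;    # each read stops just after the next 'Order ID:'
        @records = <$fh>;
        chomp @records;     # chomp strips a trailing $/, i.e. 'Order ID:' itself
    }
    close $fh;
    shift @records;         # chunk 0 is whatever precedes the first order (page header)
    return map { $sep . $_ } @records;    # put 'Order ID:' back on the front
}

my $report = "List of Distributions\n"
           . "Order ID:PO-9999 fiscal cycle:21112\nTitle:Item title here.\n"
           . "Order ID:PO-1111 fiscal cycle:21112\nTitle:Item title here.\n";

for my $order ( split_orders($report) ) {
    my ($firstLine) = split /\n/, $order;
    print "$firstLine\n";
}
```

The full script skips the chomp; with the sample data, the leftover "Order ID:" just ends up in each order's last distribution chunk, where none of the patterns look for it.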
I've taken the liberty of implementing an interpretation of your pseudocode. It does use two loops, but the outer loop is a for loop that iterates over an array of Order records:
use strict;
use warnings;
use Data::Dumper;
# Place a filename into $recordsFile to read Orders from that file
# else the Orders below __DATA__ will be used for demo purposes
my $recordsFile = '';
my ( @records, @orders );
my $recSeparator = 'Order ID:';
# Orders will initially be array elements 1 .. n in @records; element 0 is initially the first page header
{
# Set the record separator
local $/ = $recSeparator;
# If there's a file name, try to read from that file
if ($recordsFile) {
open my $fh, '<', $recordsFile or die $!;
@records = <$fh>;
close $fh;
}
else {
@records = <DATA>;
}
}
# Remove the first page header
shift @records;
# Add Order ID: back into each record for later matching
$_ = "$recSeparator$_" for @records;
# Iterate through each record (Order)
for my $record (@records) {
my %hash;
# Treat the record string like a file, opening it for reading
open my $sh, '<', \$record or die "Unable to open record string: $!";
# Read the string like a file, one line at a time now
while (<$sh>) {
$hash{orderID} //= do { /Order ID:(\S+)/; $1 };
$hash{fiscalCycle} //= do { /cycle:(\d+)/; $1 };
$hash{vendorID} //= do { /Vendor ID:(\S+)/; $1 };
$hash{requisitionNum} //= do { /\s+(\d+).+requisition/; $1 };
$hash{copies} //= do { /copies:(\d+)/; $1 };
$hash{title} //= do { /Title:(.+)/; $1 };
$hash{'ISBN/ISSN'} //= do { m{ISBN/ISSN:(\S+)}; $1 };
# Distributions started?
if (/Distribution--/) {
# Save the current record separator
my $oldRecSeparator = $/;
# Set a new record separator
local $/ = 'Distribution--';
# Read the string like a file, a distribution 'chunk' at a time
while (<$sh>) {
my %tempHash;
( $tempHash{holdingCode} ) = /code:(\S+)/;
( $tempHash{copies} ) = /copies:(\d+)/;
( $tempHash{dateReceived} ) = /received:(\S+)/;
( $tempHash{dateLoaded} ) = /loaded:(\S+)/;
push @{ $hash{distribution} }, \%tempHash;
}
# Restore the old record separator
$/ = $oldRecSeparator;
}
}
# Work with the filled-in %hash by sending a reference to it to a subroutine
# This is a complete record
writeToSpreadSheet( \%hash );
print Dumper \%hash;
# Done 'reading' the string
close $sh;
}
# Printing in a subroutine's not a good idea, but done here only to show how to access the hash
sub writeToSpreadSheet {
my ($hashReference) = @_;
# The $$ notation dereferences the hash reference
print $$hashReference{vendorID}, "\n";
# The @{} notation dereferences the array reference; the arrow operator dereferences to get the hash value
for my $distribution ( @{ $$hashReference{distribution} } ) {
print $distribution->{holdingCode}, "\n";
}
print "\n";
}
__DATA__
List of Distributions
Produced Tuesday, 9 October, 2012 at 1:38 PM
Order ID:PO-9999 fiscal cycle:21112
Vendor ID:VEND99 order type:SUBSCRIPT
15) requisition number: copies:9
call number:XX(9999999.999)
ISBN/ISSN:9999-999X
Title:Item title here.
ISSN:9999-999X
Publication info:More text here about stuff
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO1 copies:1
date received:27/6/2012 date loaded:27/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO3 copies:2
date received:27/9/2012 date loaded:27/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO2 copies:1
date received:25/8/2012 date loaded:27/6/2012
List of Distributions
Produced Tuesday, 9 October, 2012 at 1:38 PM
Order ID:PO-1111 fiscal cycle:21112
Vendor ID:VEND11 order type:SUBSCRIPT
15) requisition number: copies:417
call number:XX(11111111.111)
ISBN/ISSN:1111-111X
Title:Item title here.
ISSN:9999-999X
Publication info:More text here about stuff
Distribution--
packing list:STUFF-I-DONT-NEED-111
holding code:CODEINFO9 copies:5
date received:11/6/2012 date loaded:12/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-111
holding code:CODEINFO8 copies:4
date received:11/9/2012 date loaded:12/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-111
holding code:CODEINFO7 copies:3
date received:11/8/2012 date loaded:12/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-111
holding code:CODEINFO6 copies:2
date received:11/8/2012 date loaded:12/6/2012
Output
VEND99
CODEINFO1
CODEINFO3
CODEINFO2
$VAR1 = {
'vendorID' => 'VEND99',
'copies' => '9',
'fiscalCycle' => '21112',
'distribution' => [
{
'dateLoaded' => '27/6/2012',
'dateReceived' => '27/6/2012',
'copies' => '1',
'holdingCode' => 'CODEINFO1'
},
{
'dateLoaded' => '27/6/2012',
'dateReceived' => '27/9/2012',
'copies' => '2',
'holdingCode' => 'CODEINFO3'
},
{
'dateLoaded' => '27/6/2012',
'dateReceived' => '25/8/2012',
'copies' => '1',
'holdingCode' => 'CODEINFO2'
}
],
'ISBN/ISSN' => '9999-999X',
'title' => 'Item title here.',
'orderID' => 'PO-9999',
'requisitionNum' => '15'
};
VEND11
CODEINFO9
CODEINFO8
CODEINFO7
CODEINFO6
$VAR1 = {
'vendorID' => 'VEND11',
'copies' => '417',
'fiscalCycle' => '21112',
'distribution' => [
{
'dateLoaded' => '12/6/2012',
'dateReceived' => '11/6/2012',
'copies' => '5',
'holdingCode' => 'CODEINFO9'
},
{
'dateLoaded' => '12/6/2012',
'dateReceived' => '11/9/2012',
'copies' => '4',
'holdingCode' => 'CODEINFO8'
},
{
'dateLoaded' => '12/6/2012',
'dateReceived' => '11/8/2012',
'copies' => '3',
'holdingCode' => 'CODEINFO7'
},
{
'dateLoaded' => '12/6/2012',
'dateReceived' => '11/8/2012',
'copies' => '2',
'holdingCode' => 'CODEINFO6'
}
],
'ISBN/ISSN' => '1111-111X',
'title' => 'Item title here.',
'requisitionNum' => '15',
'orderID' => 'PO-1111'
};
I've included a subroutine, and a call to it, that shows how to access the hash one record at a time.
The code is commented to help with understanding it.
Let me know if you have any questions about this...
Enjoy!
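If the reference syntax in writeToSpreadSheet is new to you, this small sketch runs the same kinds of lookups on a hand-built record. The values are copied from the first sample order, and vendor_and_codes is a made-up helper name for illustration. $$rec{key} and $rec->{key} are two spellings of the same lookup, and @{ ... } turns the distribution array reference back into a list.

```perl
use strict;
use warnings;

# A record shaped like %hash in the script above (values from the first sample order)
my %record = (
    vendorID     => 'VEND99',
    distribution => [
        { holdingCode => 'CODEINFO1', copies => 1 },
        { holdingCode => 'CODEINFO3', copies => 2 },
        { holdingCode => 'CODEINFO2', copies => 1 },
    ],
);

# Hypothetical helper: return the vendor ID followed by every holding code
sub vendor_and_codes {
    my ($rec) = @_;                  # $rec is a hash reference
    my $vendor = $$rec{vendorID};    # same lookup as $rec->{vendorID}
    # @{ ... } dereferences the array ref; -> then looks inside each inner hash ref.
    # "|| []" guards against a record that had no Distribution-- section.
    my @codes = map { $_->{holdingCode} } @{ $rec->{distribution} || [] };
    return ( $vendor, @codes );
}

my ( $vendor, @codes ) = vendor_and_codes( \%record );
print "$vendor: @codes\n";    # VEND99: CODEINFO1 CODEINFO3 CODEINFO2
```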
Re^8: How best to strip text from a file?
by bobdabuilda (Beadle) on Nov 09, 2012 at 04:55 UTC
Wow... that's awesome! Thank you VERY much!
Worked great on my Windows test box using ActivePerl... but I get compile errors on the *nix server that I need to run it on (I'm only a tenant, not an admin, so upgrading isn't an option), so I strongly suspect I'm running into Perl versioning issues. The version on the server is "This is perl, v5.8.8 built for sun4-solaris", which I suspect isn't compatible with something you've used in this script:
Bareword found where operator expected at real_test.pl line 44, near "/= do { /Order"
(Missing operator before Order?)
Use of /c modifier is meaningless without /g at real_test.pl line 45.
Bareword found where operator expected at real_test.pl line 45, near "/= do { /cycle"
(Missing operator before ycle?)
Bareword found where operator expected at real_test.pl line 46, near "/= do { /Vendor"
(Missing operator before Vendor?)
syntax error at real_test.pl line 44, near "/= do { /Order ID"
syntax error at real_test.pl line 45, near "/= do { /cycle"
syntax error at real_test.pl line 46, near "/= do { /Vendor ID"
Unmatched right curly bracket at real_test.pl line 46, at end of line
syntax error at real_test.pl line 46, near "$1 }"
real_test.pl had compilation errors.
Note: this compiles and runs without issue with ActivePerl on my PC, as I'm sure it did on yours.
Any suggestions on reading I should do to work out how best to fit this into the version of Perl on the server?
Sorry to be a pain... you're being extremely helpful, and all I'm giving you is more problems lol
You're very welcome, and you're neither a pain nor a problem! It was actually quite fun to work on, and I'm glad it's working for you.
I think you're correct about why the defined-or assignment operator (//=) throws an error on your *nix box: the operator was introduced in Perl 5.10, so v5.8.8 doesn't recognize it. You can recode each of the //= lines as follows:
This:
$hash{orderID} //= do { /Order ID:(\S+)/; $1 };
Can be written as:
$hash{orderID} = $1 if !defined $hash{orderID} and /Order ID:(\S+)/;
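Since there are seven of these lines, one option is to drive them from a table instead of repeating the idiom. A sketch, reusing the patterns from the script; @fields and parse_header_line are hypothetical names. It uses only defined() and qr//, so it should compile on 5.8.8. The requisition pattern needs whitespace before the number, as in the original report, so that sample line is indented.

```perl
use strict;
use warnings;

# Field name => pattern with one capture group (same patterns as the script above)
my @fields = (
    [ orderID        => qr/Order ID:(\S+)/ ],
    [ fiscalCycle    => qr/cycle:(\d+)/ ],
    [ vendorID       => qr/Vendor ID:(\S+)/ ],
    [ requisitionNum => qr/\s+(\d+).+requisition/ ],
    [ copies         => qr/copies:(\d+)/ ],
    [ title          => qr/Title:(.+)/ ],
    [ 'ISBN/ISSN'    => qr{ISBN/ISSN:(\S+)} ],
);

# Hypothetical helper: fill each still-undefined field from $line.
# Uses "next if defined" instead of //=, so no Perl 5.10 features are needed.
sub parse_header_line {
    my ( $hash, $line ) = @_;
    for my $field (@fields) {
        my ( $key, $pattern ) = @$field;
        next if defined $hash->{$key};           # keep the first value found
        $hash->{$key} = $1 if $line =~ $pattern;
    }
    return;
}

my %hash;
parse_header_line( \%hash, $_ ) for (
    "Order ID:PO-9999 fiscal cycle:21112\n",
    "Vendor ID:VEND99 order type:SUBSCRIPT\n",
    "    15) requisition number: copies:9\n",
    "Title:Item title here.\n",
);
print "$_ => $hash{$_}\n" for sort keys %hash;
```

Adding a field then means adding one row to @fields rather than another copy of the assignment line.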
Again, let me know if you have any more questions about this script...
Alright! Getting somewhere. I worked out that part of the issue was the location of the WriteExcel block that writes the Order details. I placed it inside the "if (/Distribution--/) {" block, and that sorted the issue out nicely. It makes sense now that I look back on it: by the time the distribution line matches, all of the Order header fields have been processed, whereas the first few times it hit the WriteExcel code before that point, the Order was only partially processed.
So now it's trimming the Order list down nicely. TOO nicely. On looking back over the data, I noticed that each order can potentially have more than one title, so I was skipping data that should be kept.
So I added another check on the title, and of course that's not behaving as expected...
With this latest version and the example data provided, I expect the output to contain 2 order entries for PO-9999 and 1 for PO-1111. PO-9999 should have 2 entries because there are 3 records for it, but 2 of those 3 are duplicates (I added a "2" to the title of the 3rd one to make it different). However, it's only printing out one order for each... and I can't work out why.
Any suggestions on how to track the issue down please?
use strict;
use warnings;
use Data::Dumper;
use Spreadsheet::WriteExcel;
# Place a filename into $recordsFile to read Orders from that file
# else the Orders below __DATA__ will be used for demo purposes
#my $recordsFile = 'finished_report_sample.txt';
my $recordsFile = '';
my ( @records, @orders );
my $recSeparator = 'Order ID:';
# Orders will initially be array elements 1 .. n in @records; element 0 is initially the first page header
{
# Set the record separator
local $/ = $recSeparator;
# If there's a file name, try to read from that file
if ($recordsFile) {
open my $fh, '<', $recordsFile or die $!;
@records = <$fh>;
close $fh;
} # End If
else {
@records = <DATA>;
} # End Else
} # End preparatory loop
# Remove the first page header
shift @records;
# Add Order ID: back into each record for later matching
$_ = "$recSeparator$_" for @records;
########## Added for writing to Excel
# Open a new xls file then create a sheet
my $workbook = Spreadsheet::WriteExcel->new('distlist.xls');
my $worksheet= $workbook->add_worksheet();
# Write headings
$worksheet->write(0,0,'Fiscal Year');
$worksheet->write(0,1,'Vendor');
$worksheet->write(0,2,'PO Number');
$worksheet->write(0,3,'Orderline');
$worksheet->write(0,4,'Title');
$worksheet->write(0,5,'ISBN/ISSN');
$worksheet->write(0,6,'# copies for Title');
$worksheet->write(0,7,'Distribution');
$worksheet->write(0,8,'Date Received');
$worksheet->write(0,9,'Date Loaded');
$worksheet->write(0,10,'Number of Copies');
# Initialise spreadsheet counters
my $row=1;
my $column=0;
# Set this up ready for checking for duplicate orders
my $previousOrder="";
my $previousTitle="";
# Iterate through each record (Order)
for my $record (@records) {
my %hash;
# For testing WriteExcel
# $row+=1;
# Treat the record string like a file, opening it for reading
open my $sh, '<', \$record or die "Unable to open record string: $!";
# Read the string like a file, one line at a time now
while (<$sh>) {
$hash{orderID} = $1 if !defined $hash{orderID} and /Order ID:(\S+)/;
$hash{fiscalCycle} = $1 if !defined $hash{fiscalCycle} and /cycle:(\d+)/;
$hash{vendorID} = $1 if !defined $hash{vendorID} and /Vendor ID:(\S+)/;
$hash{requisitionNum} = $1 if !defined $hash{requisitionNum} and /\s+(\d+).+requisition/;
$hash{copies} = $1 if !defined $hash{copies} and /copies:(\d+)/;
$hash{'ISBN/ISSN'} = $1 if !defined $hash{'ISBN/ISSN'} and m{ISBN/ISSN:(\S+)};
$hash{title} = $1 if !defined $hash{title} and /Title:(.+)/;
my ($hashReference) = \%hash;
# Had to put this in to suppress warnings about $$hashReference{title} not being populated yet during the loop
no warnings 'uninitialized';
# Check to see if it's a repeat order and title, skip if it is.
if (($previousOrder eq $$hashReference{orderID}) && ($previousOrder ne "")){
if (($previousTitle eq $$hashReference{title}) && ($previousTitle ne "")) {
print "Order: $previousOrder HashOrder: $$hashReference{orderID} Title: $previousTitle HashTitle: $$hashReference{title} \n";
print "Order already processed. Skipping...\n";
last;
} # End if
} # End If
else {
# Distributions started?
if (/Distribution--/) {
$worksheet->write($row,0,$$hashReference{fiscalCycle});
$worksheet->write($row,1,$$hashReference{vendorID});
$worksheet->write($row,2,$$hashReference{orderID});
$worksheet->write($row,3,$$hashReference{requisitionNum});
$worksheet->write($row,4,$$hashReference{title});
$worksheet->write($row,5,$$hashReference{'ISBN/ISSN'});
$worksheet->write($row,6,$$hashReference{copies});
# Set the current title and Order number for duplicate checking
$previousOrder = $$hashReference{orderID};
$previousTitle = $$hashReference{title};
# Save the current record separator
my $oldRecSeparator = $/;
# Set a new record separator
local $/ = 'Distribution--';
# Read the string like a file, a distribution 'chunk' at a time
while (<$sh>) {
# I realise this hashing is now superfluous, with data being written
# direct to Excel, but am keeping changes to a minimum until I get
my %tempHash;
( $tempHash{holdingCode} ) = /code:(\S+)/;
( $tempHash{copies} ) = /copies:(\d+)/;
( $tempHash{dateReceived} ) = /received:(\S+)/;
( $tempHash{dateLoaded} ) = /loaded:(\S+)/;
$worksheet->write($row,7,$tempHash{holdingCode});
$worksheet->write($row,8,$tempHash{dateReceived});
$worksheet->write($row,9,$tempHash{dateLoaded});
$worksheet->write($row,10,$tempHash{copies});
$row+=1;
push @{ $hash{distribution} }, \%tempHash;
} # End While
# Restore the old record separator
$/ = $oldRecSeparator;
} # End If
} # End Else
} # End While
# Work with the filled-in %hash by sending a reference to it to a subroutine
# This is a complete record
# writeToSpreadSheet( \%hash );
# print Dumper \%hash;
# Done 'reading' the string
close $sh;
} # End For - last of the loops
$workbook->close();
# Printing in a subroutine's not a good idea, but done here only to show how to access the hash
#sub writeToSpreadSheet {
# my ($hashReference) = @_;
#
# The $$ notation dereferences the hash reference
# print $$hashReference{vendorID}, "\n";
#
# The @{} notation dereferences the array reference; the arrow operator dereferences to get the hash value
# for my $distribution ( @{ $$hashReference{distribution} } ) {
# print $distribution->{holdingCode}, "\n";
# }
#
# print "\n";
#}
__DATA__
List of Distributions
Produced Tuesday, 9 October, 2012 at 1:38 PM
Order ID:PO-9999 fiscal cycle:21112
Vendor ID:VEND99 order type:SUBSCRIPT
15) requisition number: copies:9
call number:XX(9999999.999)
ISBN/ISSN:9999-999X
Title:Item title here.
ISSN:9999-999X
Publication info:More text here about stuff
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO1 copies:1
date received:27/6/2012 date loaded:27/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO3 copies:2
date received:27/9/2012 date loaded:27/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO2 copies:1
date received:25/8/2012 date loaded:27/6/2012
Order ID:PO-9999 fiscal cycle:21112
Vendor ID:VEND99 order type:SUBSCRIPT
15) requisition number: copies:9
call number:XX(9999999.999)
ISBN/ISSN:9999-999X
Title:Item title here.
ISSN:9999-999X
Publication info:More text here about stuff
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO1 copies:1
date received:27/6/2012 date loaded:27/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO3 copies:2
date received:27/9/2012 date loaded:27/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO2 copies:1
date received:25/8/2012 date loaded:27/6/2012
Order ID:PO-9999 fiscal cycle:21112
Vendor ID:VEND99 order type:SUBSCRIPT
15) requisition number: copies:9
call number:XX(9999999.999)
ISBN/ISSN:9999-999X
Title:Item title here 2.
ISSN:9999-999X
Publication info:More text here about stuff
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO1 copies:1
date received:27/6/2012 date loaded:27/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO3 copies:2
date received:27/9/2012 date loaded:27/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO2 copies:1
date received:25/8/2012 date loaded:27/6/2012
List of Distributions
Produced Tuesday, 9 October, 2012 at 1:38 PM
Order ID:PO-1111 fiscal cycle:21112
Vendor ID:VEND11 order type:SUBSCRIPT
15) requisition number: copies:417
call number:XX(11111111.111)
ISBN/ISSN:1111-111X
Title:Item title here.
ISSN:9999-999X
Publication info:More text here about stuff
Distribution--
packing list:STUFF-I-DONT-NEED-111
holding code:CODEINFO9 copies:5
date received:11/6/2012 date loaded:12/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-111
holding code:CODEINFO8 copies:4
date received:11/9/2012 date loaded:12/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-111
holding code:CODEINFO7 copies:3
date received:11/8/2012 date loaded:12/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-111
holding code:CODEINFO6 copies:2
date received:11/8/2012 date loaded:12/6/2012
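Reading the script, the else appears to pair with the outer order-ID test, so once a record's order ID matches $previousOrder, the branch that writes to the spreadsheet is never reached, even when the title turns out to be different. That would explain getting one entry per order. Below is a sketch of one possible structure: decide whether a record is a duplicate only after its header is complete, keyed on order ID plus title. unique_orders is a hypothetical helper that works on whole record strings like the ones in @records.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first record for each (order ID, title) pair.
# The check runs on the whole record string, after the header is complete,
# so it never compares against a partly filled-in set of fields.
sub unique_orders {
    my @records = @_;    # each element is one 'Order ID:...' record string
    my ( %seen, @keep );
    for my $record (@records) {
        my ($orderID) = $record =~ /Order ID:(\S+)/;
        my ($title)   = $record =~ /Title:(.+)/;    # . stops at end of line
        next unless defined $orderID;
        $title = '' unless defined $title;
        # "\0" separates the parts so different ID/title splits can't collide
        next if $seen{"$orderID\0$title"}++;
        push @keep, $record;
    }
    return @keep;
}

my @kept = unique_orders(
    "Order ID:PO-9999 fiscal cycle:21112\nTitle:Item title here.\n",
    "Order ID:PO-9999 fiscal cycle:21112\nTitle:Item title here.\n",
    "Order ID:PO-9999 fiscal cycle:21112\nTitle:Item title here 2.\n",
    "Order ID:PO-1111 fiscal cycle:21112\nTitle:Item title here.\n",
);
print scalar(@kept), " records kept\n";    # 3 records kept
```

Because %seen remembers every pair, this also copes with duplicates that are not adjacent. If duplicates really are always consecutive, comparing against the previous record's key would be enough.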
That certainly did the trick!! Nice one. Now I've been playing around with the logic for skipping duplicated orders within the file... and, of course, I'm having issues with it.
I'm using the same script and data. Keep in mind that duplicated orders will always (I'm pretty certain) appear consecutively, i.e. an order will be repeated 3 or 4 or more times, and then the report moves on to a different order.
Sorry for the long bit of code here, but I figured it was easiest to show you what I'm trying:
use strict;
use warnings;
use Data::Dumper;
use Spreadsheet::WriteExcel;
# Place a filename into $recordsFile to read Orders from that file
# else the Orders below __DATA__ will be used for demo purposes
#my $recordsFile = 'finished_report_sample.txt';
my $recordsFile = '';
my ( @records, @orders );
my $recSeparator = 'Order ID:';
# Orders will initially be array elements 1 .. n in @records; element 0 is initially the first page header
{
# Set the record separator
local $/ = $recSeparator;
# If there's a file name, try to read from that file
if ($recordsFile) {
open my $fh, '<', $recordsFile or die $!;
@records = <$fh>;
close $fh;
} # End If
else {
@records = <DATA>;
} # End Else
} # End preparatory loop
# Remove the first page header
shift @records;
# Add Order ID: back into each record for later matching
$_ = "$recSeparator$_" for @records;
########## Added for writing to Excel
# Open a new xls file then create a sheet
my $workbook = Spreadsheet::WriteExcel->new('distlist.xls');
my $worksheet= $workbook->add_worksheet();
# Write headings
$worksheet->write(0,0,'Fiscal Year');
$worksheet->write(0,1,'Vendor');
$worksheet->write(0,2,'PO Number');
$worksheet->write(0,3,'Orderline');
$worksheet->write(0,4,'Title');
$worksheet->write(0,5,'ISBN/ISSN');
$worksheet->write(0,6,'# copies for Title');
$worksheet->write(0,7,'Distribution');
$worksheet->write(0,8,'Date Received');
$worksheet->write(0,9,'Date Loaded');
$worksheet->write(0,10,'Number of Copies');
# Initialise spreadsheet counters
my $row=1;
my $column=0;
# Set this up ready for checking for duplicate orders
my $previousOrder="";
# Iterate through each record (Order)
for my $record (@records) {
my %hash;
# Treat the record string like a file, opening it for reading
open my $sh, '<', \$record or die "Unable to open record string: $!";
# Read the string like a file, one line at a time now
while (<$sh>) {
$hash{orderID} = $1 if !defined $hash{orderID} and /Order ID:(\S+)/;
$hash{fiscalCycle} = $1 if !defined $hash{fiscalCycle} and /cycle:(\d+)/;
$hash{vendorID} = $1 if !defined $hash{vendorID} and /Vendor ID:(\S+)/;
$hash{requisitionNum} = $1 if !defined $hash{requisitionNum} and /\s+(\d+).+requisition/;
$hash{copies} = $1 if !defined $hash{copies} and /copies:(\d+)/;
$hash{'ISBN/ISSN'} = $1 if !defined $hash{'ISBN/ISSN'} and m{ISBN/ISSN:(\S+)};
$hash{title} = $1 if !defined $hash{title} and /Title:(.+)/;
# Check to see if it's a repeat order, skip if it is.
my ($hashReference) = \%hash;
if (($previousOrder eq $$hashReference{orderID}) && ($previousOrder ne "")) {
print "Order: $previousOrder HashOrder: $$hashReference{orderID} \n";
print "Order already processed. Skipping...\n";
last;
} # End If
else {
$worksheet->write($row,0,$$hashReference{fiscalCycle});
$worksheet->write($row,1,$$hashReference{vendorID});
$worksheet->write($row,2,$$hashReference{orderID});
$worksheet->write($row,3,$$hashReference{requisitionNum});
$worksheet->write($row,4,$$hashReference{title});
$worksheet->write($row,5,$$hashReference{'ISBN/ISSN'});
$worksheet->write($row,6,$$hashReference{copies});
$previousOrder = $$hashReference{orderID};
# $row+=1;
# Distributions started?
if (/Distribution--/) {
# Save the current record separator
my $oldRecSeparator = $/;
# Set a new record separator
local $/ = 'Distribution--';
# Read the string like a file, a distribution 'chunk' at a time
while (<$sh>) {
# I realise this hashing is now superfluous, with data being written
# direct to Excel, but am keeping changes to a minimum until I get
# the overall functionality correct.
my %tempHash;
( $tempHash{holdingCode} ) = /code:(\S+)/;
( $tempHash{copies} ) = /copies:(\d+)/;
( $tempHash{dateReceived} ) = /received:(\S+)/;
( $tempHash{dateLoaded} ) = /loaded:(\S+)/;
$worksheet->write($row,7,$tempHash{holdingCode});
$worksheet->write($row,8,$tempHash{dateReceived});
$worksheet->write($row,9,$tempHash{dateLoaded});
$worksheet->write($row,10,$tempHash{copies});
$row+=1;
push @{ $hash{distribution} }, \%tempHash;
} # End While
# Restore the old record separator
$/ = $oldRecSeparator;
} # End If
} # End Else
} # End While
# Work with the filled-in %hash by sending a reference to it to a subroutine
# This is a complete record
# writeToSpreadSheet( \%hash );
# print Dumper \%hash;
# Done 'reading' the string
close $sh;
} # End For - last of the loops
$workbook->close();
# Printing in a subroutine's not a good idea, but done here only to show how to access the hash
#sub writeToSpreadSheet {
# my ($hashReference) = @_;
#
# The $$ notation dereferences the hash reference
# print $$hashReference{vendorID}, "\n";
#
# The @{} notation dereferences the array reference; the arrow operator dereferences to get the hash value
# for my $distribution ( @{ $$hashReference{distribution} } ) {
# print $distribution->{holdingCode}, "\n";
# }
#
# print "\n";
#}
__DATA__
List of Distributions
Produced Tuesday, 9 October, 2012 at 1:38 PM
Order ID:PO-9999 fiscal cycle:21112
Vendor ID:VEND99 order type:SUBSCRIPT
15) requisition number: copies:9
call number:XX(9999999.999)
ISBN/ISSN:9999-999X
Title:Item title here.
ISSN:9999-999X
Publication info:More text here about stuff
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO1 copies:1
date received:27/6/2012 date loaded:27/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO3 copies:2
date received:27/9/2012 date loaded:27/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO2 copies:1
date received:25/8/2012 date loaded:27/6/2012
Order ID:PO-9999 fiscal cycle:21112
Vendor ID:VEND99 order type:SUBSCRIPT
15) requisition number: copies:9
call number:XX(9999999.999)
ISBN/ISSN:9999-999X
Title:Item title here.
ISSN:9999-999X
Publication info:More text here about stuff
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO1 copies:1
date received:27/6/2012 date loaded:27/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO3 copies:2
date received:27/9/2012 date loaded:27/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-999
holding code:CODEINFO2 copies:1
date received:25/8/2012 date loaded:27/6/2012
List of Distributions
Produced Tuesday, 9 October, 2012 at 1:38 PM
Order ID:PO-1111 fiscal cycle:21112
Vendor ID:VEND11 order type:SUBSCRIPT
15) requisition number: copies:417
call number:XX(11111111.111)
ISBN/ISSN:1111-111X
Title:Item title here.
ISSN:9999-999X
Publication info:More text here about stuff
Distribution--
packing list:STUFF-I-DONT-NEED-111
holding code:CODEINFO9 copies:5
date received:11/6/2012 date loaded:12/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-111
holding code:CODEINFO8 copies:4
date received:11/9/2012 date loaded:12/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-111
holding code:CODEINFO7 copies:3
date received:11/8/2012 date loaded:12/6/2012
Distribution--
packing list:STUFF-I-DONT-NEED-111
holding code:CODEINFO6 copies:2
date received:11/8/2012 date loaded:12/6/2012
Now, when I run this, $previousOrder somehow already has a value at the "if" check, even though that check is the first place the variable is used after it's initialised to ""... as you can see from the output of the first print command.
The other thing I don't understand from this outcome is that the resulting XLS file has only 2 fields (as well as the headers) written to it: the fiscal year and PO number. I would have thought that if it made it to the section that writes the PO details at all, it would write all of them. How can it write only part of it?
There's also an issue with the XLS file itself for some reason: it gives an error when opening it... but I'm off to research that one myself now while I wait for suggestions on what I've stuffed up in my logic for the duplication check :)
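Both symptoms look consistent with the duplicate check running on every line rather than once per record. On the first line of the first record, $previousOrder is still "", so the else branch writes whatever fields exist so far (only the fiscal cycle and order ID are on that line) and then sets $previousOrder. On the second line of the same record, the order ID now matches $previousOrder, so the record is reported as already processed. The cut-down simulation below reproduces that sequence without the spreadsheet or distributions; trace_per_line_check is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical, cut-down copy of the per-line loop: it records what the
# script would do on each line of one record instead of writing to Excel.
sub trace_per_line_check {
    my ($record) = @_;
    my ( %hash, @log );
    my $previousOrder = '';
    for my $line ( split /\n/, $record ) {
        $hash{orderID}     = $1 if !defined $hash{orderID}     and $line =~ /Order ID:(\S+)/;
        $hash{fiscalCycle} = $1 if !defined $hash{fiscalCycle} and $line =~ /cycle:(\d+)/;
        $hash{vendorID}    = $1 if !defined $hash{vendorID}    and $line =~ /Vendor ID:(\S+)/;

        # The duplicate test runs on every line, not once per record
        if ( $previousOrder ne '' && $previousOrder eq $hash{orderID} ) {
            push @log, 'skip';
            last;
        }

        # Stand-in for the worksheet writes: list the fields defined so far
        push @log, 'write '
            . join( ',', grep { defined $hash{$_} } qw(fiscalCycle vendorID orderID) );
        $previousOrder = $hash{orderID};    # set while still on the first line
    }
    return @log;
}

print "$_\n" for trace_per_line_check(
    "Order ID:PO-9999 fiscal cycle:21112\nVendor ID:VEND99 order type:SUBSCRIPT\n"
);
```

This prints "write fiscalCycle,orderID" followed by "skip". Running the duplicate test (and the writes) only inside the if (/Distribution--/) block, where the header fields are complete, follows the same reasoning as the earlier move of the WriteExcel block.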