Re^12: How best to strip text from a file?

by bobdabuilda (Beadle)
on Nov 13, 2012 at 03:54 UTC (#1003550)

in reply to Re^11: How best to strip text from a file?
in thread How best to strip text from a file?

Hmmm yeah, ok... I see how that works. Will have a play with it and work out how best to get it to match both an Order ID and a Title at the same time, to ensure matching record/title combos are skipped.

You mention in your comments that printing from a subroutine isn't a good idea. Would you mind explaining the reasoning for that (combined with the fact you're telling me to do it here lol) please?

Re^13: How best to strip text from a file?
by Kenosis (Priest) on Nov 13, 2012 at 04:30 UTC

    My apologies, as I think I may have misunderstood you about Order ID/Title. If you're looking to skip all orders that have the same Order ID and Title, you can use a construct similar to the earlier one, but at the title-matching line:

    if (/Title:(.+)/) {
        next RECORD if $seen{$hash{orderID}}{$1};
        $seen{$hash{orderID}}{$1}++;
        $hash{title} = $1;
    }

    The above builds a hash of hashes, keyed first by Order ID and then by title.

    This effectively keeps track of whether an identical Order ID and title pair has been seen before.
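
    As a rough, self-contained sketch of that idea (the sample records, the unique_records name, and the simplified Order ID pattern are assumptions made for illustration, not the actual data or script from this thread):

```perl
use strict;
use warnings;

# Hypothetical sample records; the real input format is only partly shown in the thread.
my @records = (
    "Order ID:PO-1\nTitle:First title\n",
    "Order ID:PO-1\nTitle:First title\n",     # duplicate Order ID and title
    "Order ID:PO-1\nTitle:Second title\n",    # same order, different title
    "Order ID:PO-2\nTitle:First title\n",     # same title, different order
);

# Returns the kept records plus the %seen hash of hashes (both as references).
sub unique_records {
    my @input = @_;
    my ( %seen, @kept );

    RECORD: for my $record (@input) {
        my %hash;
        for ( split /\n/, $record ) {
            $hash{orderID} = $1 if !defined $hash{orderID} and /Order ID:(\S+)/;
            if (/Title:(.+)/) {
                # Skip the whole record if this Order ID/title pair was seen already.
                next RECORD if $seen{ $hash{orderID} }{$1};
                $seen{ $hash{orderID} }{$1}++;
                $hash{title} = $1;
            }
        }
        push @kept, {%hash};
    }
    return ( \@kept, \%seen );
}

my ( $kept, $seen ) = unique_records(@records);
print scalar(@$kept), " unique records\n";

# Walk the hash of hashes: outer key is the Order ID, inner key is the title.
for my $id ( sort keys %$seen ) {
    print "$id => $_\n" for sort keys %{ $seen->{$id} };
}
```

    Only the second record is skipped here, since it repeats both the Order ID and the title of the first.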

    You can then restore the OrderID line to its former self:

    $hash{orderID} = $1 if !defined $hash{orderID} and /Order ID:(\S+)/;

    You mention in your comments that printing from a subroutine isn't a good idea.

    I meant printing to the console. Yet I didn't find anything about this being bad practice when I searched for it, so I need to reevaluate my position on this. In any case, writing to a spreadsheet from within a subroutine is just fine (which sounds inconsistent with printing not being fine, so I appreciate you asking me about it).

      Hmmm... I'll have to go back and have a play with what you've suggested above. Before I noticed you'd come back again, I went off and had a play as promised, and came up with the following change to deal with the Order/Title issue :

      if (/Distribution--/) {
          # Check to see if the Order/Title combo has been seen before
          if ($seen{$$hashReference{orderID},$$hashReference{title}}) {
              # Visual prompt for debugging
              print "Bollox\n";
              next RECORD;
          } # End if
          else {
              # Add the "unseen" Order/Title combo to the hash
              $seen{$$hashReference{orderID},$$hashReference{title}}++;

              #Write the Order "header" info to the spreadsheet
              $worksheet->write($row,0,$$hashReference{fiscalCycle});
              $worksheet->write($row,1,$$hashReference{vendorID});
              $worksheet->write($row,2,$$hashReference{orderID});
              $worksheet->write($row,3,$$hashReference{requisitionNum});
              $worksheet->write($row,4,$$hashReference{title});
              $worksheet->write($row,5,$$hashReference{'ISBN/ISSN'});
              $worksheet->write($row,6,$$hashReference{copies});

      And it SEEMS to be working ok so far - but that's only with a small subset of the data. About to branch out and run it on a larger subset and see how it goes with regard to both results and performance... or at least I will do once I fix the server I need to get the data from, which appears to have halted a couple of core services needed for me to get to it... never a dull moment ;)

      As you can see I've not gotten a chance as yet to re-visit how/when I am doing the writing to Excel, but I'll do that once I've confirmed the functionality of the script as a whole, and once I've had a bit of a play with your suggestions also.

      Thanks once again for the help... it's been VERY valuable and very much appreciated. I'm sure I could have done it without you... eventually... but to be frank - I don't have enough hair left to be able to spare what it would have cost me ;)

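        Yes, nice one! The keys you're creating to track Order ID/title will look something like "PO-9999" and "Item title here." joined into a single string (Perl joins the parts with the subscript separator $;, which defaults to the control character "\034", rather than with a literal comma), so that should work.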
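
        To see what such a key actually contains, here is a small sketch; the two values are made up, and the English module is used only to give $; a readable name:

```perl
use strict;
use warnings;
use English qw(-no_match_vars);    # provides $SUBSCRIPT_SEPARATOR as an alias for $;

my %seen;
my ( $orderID, $title ) = ( 'PO-9999', 'Item title here.' );    # made-up values

# A comma inside a hash subscript joins the parts with $; into one key.
$seen{ $orderID, $title }++;

my ($key) = keys %seen;
my @parts = split /\Q$SUBSCRIPT_SEPARATOR\E/, $key;

print "Parts: ", join( ' | ', @parts ), "\n";
print "Same as an explicit join\n"
    if $key eq join( $SUBSCRIPT_SEPARATOR, $orderID, $title );
print "Seen before\n" if $seen{ $orderID, $title };
```

        One consequence of this design is that the combined key is only unambiguous as long as neither value contains the separator character itself, which is unlikely for ordinary text.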

        As a point of clarification, since you've declared %hash right below the start of the for loop that iterates through the records, it's not necessary to use a hash reference within that hash's scope. A hash reference to %hash was sent to the subroutine so that the entire hash wouldn't have to be copied in order to access its values for writing to the Excel spreadsheet.

        Here are some equivalents:

        $$hashReference{orderID} eq $hash{orderID} is true
        $$hashReference{title} eq $hash{title} is true

        This means that the following:

        if ($seen{$$hashReference{orderID},$$hashReference{title}}) { ...

        Can be written as:

        if ($seen{$hash{orderID},$hash{title}}) { ...

        It's certainly OK if you prefer working with the hash reference. I'd tend to work with the hash within its scope, and a reference to it outside its scope, e.g., within a subroutine.
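
        A short sketch of that split between the hash in its own scope and a reference elsewhere; format_row is a hypothetical stand-in for the subroutine that writes to the spreadsheet, and the values are made up:

```perl
use strict;
use warnings;

my %hash = ( orderID => 'PO-9999', title => 'Item title here.' );    # made-up values
my $hashReference = \%hash;

# Hypothetical stand-in for the spreadsheet-writing subroutine.
# It receives a reference, so the hash itself is not copied.
sub format_row {
    my ($ref) = @_;
    return join "\t", $ref->{orderID}, $$ref{title};    # both forms dereference
}

# Inside the hash's scope, the hash and the reference reach the same data.
print "same orderID\n" if $$hashReference{orderID} eq $hash{orderID};
print "same title\n"   if $hashReference->{title} eq $hash{title};

# A change made through the reference is visible in the original hash.
$hashReference->{copies} = 3;
print "copies: $hash{copies}\n";

print format_row( \%hash ), "\n";
```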

        It's been nice working with you on this, and it's clear to me that you would have done it on your own, but I appreciate this opportunity, as I learn from each task.

        As before, please let me know if you have any more questions about this...
