
Re^3: How best to strip text from a file?

by Kenosis (Priest)
on Nov 07, 2012 at 03:18 UTC (node #1002621)

in reply to Re^2: How best to strip text from a file?
in thread How best to strip text from a file?

You're very welcome, bobdabuilda! I hope it'll fit your needs.

Please let me know if you have any questions about it or if you encounter any problems using it...


Re^4: How best to strip text from a file?
by bobdabuilda (Beadle) on Nov 07, 2012 at 22:14 UTC

    Well, I did get a chance to look at it yesterday before I headed home, and realised I hadn't given as much example data as I should have: there are usually numerous Orders, each containing multiple distributions. So I'm going to have a play with the logic today, hopefully, to work out how to perform that loop...

    The quick look I had at it got me part of the way there, but it "lost" the first line of each subsequent Order because of the way I had the loops set up. I should hopefully be able to get that right today... but your code has certainly put me well and truly on the way to what I was after, and I'm very thankful for that :)

      You're most welcome, bobdabuilda!

      ...there are usually numerous Orders containing the multiple distributions...

      I suspected so. What separates these Orders? One option is to set the input record separator ($/) to the text that separates Orders, and then do the matching on each Order.
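A minimal sketch of the record-separator idea, assuming Orders are separated by blank lines and using a hypothetical "Order ID:" / "Dist:" layout (the real report's fields will differ). Setting $/ to the empty string puts Perl in paragraph mode, so each read returns one blank-line-separated chunk that can then be matched as a whole. The report is read from an in-memory string rather than a file only to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical report layout; the real fields will differ.
my $report = <<'END';
Order ID: 1001
  Dist: LIB-A 2
  Dist: LIB-B 1

Order ID: 1002
  Dist: LIB-C 5
END

# Read the string as if it were a file (in-memory filehandle).
open my $fh, '<', \$report or die "Cannot open report: $!";

my @orders;
{
    # Paragraph mode: each read returns one blank-line-separated chunk.
    local $/ = '';
    while ( my $order = <$fh> ) {
        chomp $order;    # in paragraph mode, chomp removes all trailing newlines
        my ($id) = $order =~ /^Order ID:\s*(\S+)/m or next;
        my %dists = $order =~ /^[ \t]*Dist:[ \t]*(\S+)[ \t]+(\d+)/mg;
        push @orders, { id => $id, dists => \%dists };
    }
}
close $fh;

for my $o (@orders) {
    printf "%s: %d distribution(s)\n", $o->{id}, scalar keys %{ $o->{dists} };
}
```

The local keeps the changed $/ confined to the block, so later line-by-line reads elsewhere in the script are unaffected.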

        Yes, that's what I've been trying to do. The Orders are separated only by a blank line, but they all start with the "Order ID:" text, so I'm looking at using that as the separator.
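If "Order ID:" itself is used as $/, each chunk ends with the marker rather than starting with it, so the marker has to be put back onto the following chunk. Below is a sketch of that (option A), plus an alternative (option B) that slurps the report and splits just before each "Order ID:" at the start of a line. The sample data is hypothetical. One caveat with option A: $/ is not anchored to line starts, so "Order ID:" appearing mid-line would also end a record.

```perl
use strict;
use warnings;

# Hypothetical report layout; the real fields will differ.
my $report = <<'END';
Order ID: 1001
  Dist: LIB-A 2

Order ID: 1002
  Dist: LIB-C 5
END

# Option A: use the marker itself as the input record separator.
# Each chunk ENDS with "Order ID:", so the marker is re-attached to the next one.
my @via_separator;
{
    open my $fh, '<', \$report or die "Cannot open report: $!";
    local $/ = 'Order ID:';
    my $first = 1;
    while ( my $chunk = <$fh> ) {
        if ($first) { $first = 0; next; }    # text before the first marker
        chomp $chunk;                        # strips a trailing "Order ID:" if present
        push @via_separator, "Order ID:$chunk";
    }
    close $fh;
}

# Option B: split the whole report just before each "Order ID:" at a line start.
# The grep drops anything before the first Order (e.g. a report title).
my @via_split = grep { /^Order ID:/ } split /^(?=Order ID:)/m, $report;

printf "%d Orders via \$/, %d via split\n", scalar @via_separator, scalar @via_split;
```

Option B avoids the stitching entirely, at the cost of holding the whole report in memory, which is usually fine for a printed report.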

        The report also spans multiple pages, with a header on each page, which complicates things a little more... but I'll worry about that later, once I have the logic for a full Order sorted. The way the regex stands, it should filter out the page headers automatically anyway... I think.
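Whether the regex really skips the page headers depends on the header's text, which hasn't been shown. As a sketch, assuming a hypothetical header (a form feed, a title line and a blank line) that lands inside Order 1002: after splitting on "Order ID:", only lines matching the distribution pattern are captured, so the header falls through and LIB-D still ends up under 1002. With paragraph mode ($/ = '') the blank line after the header would instead end that chunk early, leaving LIB-D in a chunk with no "Order ID:" line.

```perl
use strict;
use warnings;

# Hypothetical report: a page header (form feed, title line, blank line)
# falls in the middle of Order 1002. The real header text is unknown.
my $report = "Order ID: 1001\n  Dist: LIB-A 2\n\n"
           . "Order ID: 1002\n  Dist: LIB-C 5\n"
           . "\fORDER REPORT    Page 2\n\n"
           . "  Dist: LIB-D 3\n";

my %dists_for;    # Order ID => { distribution => quantity }
for my $order ( grep { /^Order ID:/ } split /^(?=Order ID:)/m, $report ) {
    my ($id) = $order =~ /^Order ID:\s*(\S+)/;
    # Only lines that look like distributions are captured, so the
    # header and blank lines inside the chunk are simply skipped.
    while ( $order =~ /^[ \t]*Dist:[ \t]*(\S+)[ \t]+(\d+)/mg ) {
        $dists_for{$id}{$1} = $2;
    }
}

for my $id ( sort keys %dists_for ) {
    print "$id: ", join( ', ', sort keys %{ $dists_for{$id} } ), "\n";
}
```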

        One thing I *could* use a suggestion on is how to break out of the loop at the end of each Order. About the only way I can think of to know when to stop processing distributions is to look for the start of the next Order record. To do that, though, the line that starts the next Order (which contains data I want) has to be read at the "end" of the loop for the previous Order... and then, back at the start of the loop, the next line of the file is read in, dropping the previous one, which contains (some of) the data I'm after.
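One way to handle that "read one line too far" problem, sketched with the same hypothetical layout: do the reads explicitly instead of letting a while (<DATA>) outer loop read for you. The inner loop stops as soon as it reads the next header and leaves that line in $line, and the outer loop works on what's already in $line rather than reading over it.

```perl
use strict;
use warnings;

# Hypothetical report layout; the real fields will differ.
my $report = <<'END';
Order ID: 1001
  Dist: LIB-A 2
  Dist: LIB-B 1

Order ID: 1002
  Dist: LIB-C 5
END

open my $fh, '<', \$report or die "Cannot open report: $!";

my @orders;
my $line = <$fh>;                  # prime the loop with the first line
while ( defined $line ) {
    if ( $line =~ /^Order ID:\s*(\S+)/ ) {
        my $id = $1;
        my %dists;
        # Read distributions until the next header (or EOF). The header
        # that stops this loop stays in $line for the outer loop.
        while ( defined( $line = <$fh> ) && $line !~ /^Order ID:/ ) {
            $dists{$1} = $2 if $line =~ /^[ \t]*Dist:[ \t]*(\S+)[ \t]+(\d+)/;
        }
        push @orders, { id => $id, dists => \%dists };
        next;                      # reuse $line instead of reading over it
    }
    $line = <$fh>;                 # skip anything before the first Order
}
close $fh;

print "$_->{id}\n" for @orders;
```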

        It's probably easier to show what I mean in pseudocode:

        while (<DATA>) {
            if (start of record) {
                get order details
                while (not a new order) {
                    get distribution details into a hash
                }
                print order details and distributions to Excel
            }
        }

        So, from the above, the issue I'm having is the two while loops: the inner one "eats" the header line of every Order after the first. I'm sure I could add some processing after the inner while to trap that line before the outer loop reads the next one... but that just seems a bit... uncouth, for want of a better word. I can't help thinking there should be a more elegant (not to mention less fragile) way to do it.
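A common alternative is a single loop that keeps the Order being built in a variable: each "Order ID:" line closes the previous Order, and end of file closes the last one, so no line is ever consumed by the wrong loop. A sketch, with finish_order as a hypothetical stand-in for the Excel output step and the same made-up layout as before:

```perl
use strict;
use warnings;

# Hypothetical report layout; the real fields will differ.
my $report = <<'END';
Order ID: 1001
  Dist: LIB-A 2
  Dist: LIB-B 1

Order ID: 1002
  Dist: LIB-C 5
END

my @orders;     # finished Orders
my $current;    # the Order being built; undef until the first header

# Hypothetical stand-in for "print order details and distributions to Excel".
sub finish_order {
    my ($order) = @_;
    push @orders, $order if defined $order;
}

open my $fh, '<', \$report or die "Cannot open report: $!";
while ( my $line = <$fh> ) {
    if ( $line =~ /^Order ID:\s*(\S+)/ ) {
        my $id = $1;
        finish_order($current);    # a new header closes the previous Order
        $current = { id => $id, dists => {} };
    }
    elsif ( defined $current && $line =~ /^[ \t]*Dist:[ \t]*(\S+)[ \t]+(\d+)/ ) {
        $current->{dists}{$1} = $2;
    }
    # Anything else (blank lines, page headers) matches neither branch.
}
finish_order($current);            # end of file closes the last Order
close $fh;

printf "%s: %d distribution(s)\n", $_->{id}, scalar keys %{ $_->{dists} } for @orders;
```

The call after the loop is the only "post-processing" this needs, and it exists because end of file is the one Order boundary that doesn't arrive with a header line.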
