
Re^2: Sort text by Chapter names

by Anonymous Monk
on May 28, 2018 at 12:06 UTC

in reply to Re: Sort text by Chapter names
in thread Sort text by Chapter names

Another alternative that I often see used here is to calculate the position and length of each unit within the file as you scan through it looking for markers. Put the marker name, position, and length into an array of hashes, then sort the array by name using a custom sort function. Retrieve each chapter directly from the original file by seeking to the proper position and reading the calculated number of bytes. (If a chapter might itself be huge, read and write it in chunks no larger than a manageable buffer size.)
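A minimal sketch of that idea, for anyone who wants to see it against a real file rather than described in words. The helper names index_chapters and write_sorted, the /^Chapter\b/ marker pattern, the plain string comparison, and the sample text are all assumptions made for illustration; swap in whatever marker and sort order the real data needs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: scan $path once and return one record per chapter,
# holding its heading line, byte offset, and byte length.
sub index_chapters {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    my @index;
    while ( my $line = <$in> ) {
        if ( $line =~ /^Chapter\b/ ) {
            # tell() is now just past the heading, so back up by its length.
            push @index,
              { name => $line, pos => tell($in) - length($line), len => 0 };
        }
        # Text before the first heading is not indexed.
        $index[-1]{len} += length($line) if @index;
    }
    close $in;
    return @index;
}

# Hypothetical helper: copy the indexed chapters from $path to $out in
# sorted order, reading at most $max bytes at a time.
sub write_sorted {
    my ( $path, $out, $max, @index ) = @_;
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    for my $chapter ( sort { $a->{name} cmp $b->{name} } @index ) {
        seek( $in, $chapter->{pos}, 0 ) or die "Cannot seek in $path: $!";
        my $left = $chapter->{len};
        while ( $left > 0 ) {
            my $want = $left < $max ? $left : $max;
            my $got  = read( $in, my $buffer, $want );
            die "Short read in $path" unless $got;
            print {$out} $buffer;
            $left -= $got;
        }
    }
    close $in;
}

# Demo on a small temporary file; the sample text is invented.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
binmode $tmp;
print {$tmp} "Chapter Two\nsecond\nChapter Five\nfifth\nChapter One\nfirst\n";
close $tmp;

write_sorted( $path, \*STDOUT, 8, index_chapters($path) );
```

With the plain cmp comparison the demo prints the chapters as Five, One, Two, which is alphabetical rather than numerical. Only the small index is ever held in memory, and the chapter bodies are never copied to an intermediate file.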

Replies are listed 'Best First'.
Re^3: Sort text by Chapter names
by tybalt89 (Vicar) on May 28, 2018 at 18:23 UTC

    For the fun of it ( and also to show you can seek on DATA )

    #!/usr/bin/perl

    use strict;
    use warnings;

    my %chapters;           # heading line => [ start offset, length ]
    my $previous = undef;   # entry for the chapter currently being scanned
    my $buffer;
    my $max = 4096;         # largest chunk to read at once

    while(<DATA>)
    {
        if( /^Chapter/ )
        {
            # tell() is just past the heading, so subtract its length
            $chapters{$_} = $previous = [ tell(DATA) - length, length ];
        }
        elsif( defined $previous )
        {
            $previous->[1] += length;
        }
    }

    use Data::Dump 'pp';
    print pp \%chapters;
    print "\n\n";

    for ( sort keys %chapters )
    {
        my ($start, $length) = $chapters{$_}->@*;
        seek DATA, $start, 0;
        # copy the chapter in chunks of at most $max
        while( $length > $max )
        {
            read DATA, $buffer, $max;
            print $buffer;
            $length -= $max;
        }
        read DATA, $buffer, $length;
        print $buffer;
    }

    __DATA__
    Chapter One
    There were lots of monkeys here and they ate all the bananas...
    lots more text up to hundreds of words.
    Chapter Nine
    This chapter has probably 1000 words.
    Chapter Two
    Here is the text in the second chapter...
    Chapter Five
    Here is the text in the fifth chapter...
    every chapter is of differing length, some long some short.

Re^3: Sort text by Chapter names
by jimpudar (Pilgrim) on May 28, 2018 at 12:25 UTC

    This is an elegant solution. I do like it better than mine as it uses only half the disk space!

    This thread is a fantastic example of TMTOWTDI.



      I'm beginning to figure out this Anonymous Monk thing. If I post such a suggestion as myself, I get slammed-as-usual by "the Magnificent Seven." Whereas, if I post the same suggestion anonymously, it is a favorite. Got it!

        Months ago I preemptively explained this is exactly what you would post given enough time. I also explained why you'd be deluded into making the conclusion you were guaranteed to, and indeed did, make.

        Your advice is often half-right. We've heard the twice a day broken clock analogy before here. Though a broken clock in the digital age is never right and a sundial only has a single chance a day. Anyway! Users here are given some benefit of the doubt. When you post anonymously there is less scrutiny and there is not a BIG BADGE of THE 30 YEAR VETERAN OF 10,000 LANGUAGES! lurking behind the half-right trough of sloth.

        Here is a perfect and current example of your making–

        Design your application from the very start so that content editors can devise new e-mails and change their content just by editing template files, which you read from a designated location separate from your program. Your software "renders the template" to produce the content that is then e-mailed. It generates a list of values which the editors can use in the templates.
        Re^2: Formatting a MAil in PERL

        Ostensibly a cromulent answer. There is nothing intrinsically wrong with it. It is, however, painfully lazy. Embarrassingly trite. A beginner or dilettante might be excused and encouraged for it but not a long term user who has been cautioned 100 times against it. Consider an analog offered on another topic–

        Be safe in your car from the moment you climb in so that all safety features work as designed within the car which you do by following the owner's manual. Your key "starts the car" to produce motive power that then impels the vehicle. It takes you where you want to go.

        Valueless and distracting at best. Dangerous and disruptive at worst. It's not seven, as I've demonstrated. I guarantee that if you write one of your 7 paragraph soft-serve diatribes or technical fubars instead of handwaving anonymously, you'll get your usual 30+ downvotes. Playing the victim is always going to backfire because there is a mountain of evidence that it's just not the case. Sticking to sunk costs and the Monte Carlo fallacy is going to continue in the only direction it can.

        "the Magnificent Seven."

        You aren't counting properly. One person preferring your answer to their own doesn't necessarily make it a "favourite", by definition.
