Re: Sorting Text File for HTML Output (updated x2)

by haukex (Monsignor)
on Sep 14, 2017 at 07:21 UTC (#1199378)


in reply to Sorting Text File for HTML Output

I would recommend a module like Text::CSV to read a delimited file like this, as it automatically handles quotes, escaped delimiters, etc. But since this question is about sorting, let's leave that for the update below. First, to sort, it's easiest to read the entire file into memory (unless it's too big for that, but I'm going to assume for now it's not).

    my $file = 'books/bookLIST.txt';
    open my $info, '<', $file or die "Could not open $file: $!";
    my @lines = <$info>;
    close $info;

I have used the three-argument form of open here, which is generally recommended nowadays. Next, build the data structure you want to sort; I'll build an array of hashes. It's also possible to combine this with the step above and build the data structure directly while reading the file, which avoids the intermediate @lines array, like you're doing in the original code (I probably would have written it that way to begin with; I guess my morning caffeine hadn't fully kicked in yet when I wrote this node ;-) ).

    my @vlinks;
    for my $line (@lines) {
        chomp($line);
        my %row;
        @row{qw/title title2 pages author/} = split /:/, $line, 4;
        push @vlinks, \%row;
    }
    use Data::Dump;
    dd \@vlinks;
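The one-step variant mentioned above might look like the following minimal sketch. It reads from an in-memory filehandle so that it runs on its own; the sample data and the helper name read_books are made up for illustration, and in the real script you would open 'books/bookLIST.txt' as shown earlier. The second sample line also shows what the limit of 4 does: any extra colons stay in the last field instead of being dropped.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for books/bookLIST.txt.
my $data = "Dune:dune:412:Frank Herbert\n"
         . "Emma:emma:474:Austen, Jane: annotated\n";

# read_books is an illustrative helper, not part of the original script.
sub read_books {
    my ($fh) = @_;
    my @vlinks;
    while ( my $line = <$fh> ) {
        chomp $line;
        my %row;
        # Limit of 4: any further colons remain part of the author field.
        @row{qw/title title2 pages author/} = split /:/, $line, 4;
        push @vlinks, \%row;
    }
    return @vlinks;
}

# Opening a reference to a scalar gives an in-memory filehandle.
open my $info, '<', \$data or die "Could not open in-memory file: $!";
my @vlinks = read_books($info);
close $info;

print "$_->{title} / $_->{author}\n" for @vlinks;
```

With the sample data this prints "Dune / Frank Herbert" and "Emma / Austen, Jane: annotated".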

You should use chomp instead of chop: chomp removes only a trailing newline, whereas chop removes the last character no matter what it is. The @row{...} syntax is a hash slice, and I limited the number of fields split will produce so that no data gets lost. I wasn't sure what the difference between the first two fields in your file is, so you might want to pick a more descriptive name than I did (title2). You can then use Data::Dumper or Data::Dump to look at your data structure, as I showed above.

Next the interesting part: the sorting. The technique is described in the FAQ "How do I sort an array by (anything)?": you compare the primary sort field, but if that comparison returns 0 (meaning the two values are equal), you move on to the next criterion, and so on. Note that I am using the numerical comparison operator <=> instead of the string comparison operator cmp (see perlop) on the pages field, as I assume it's numeric.

    @vlinks = sort {
        $a->{title} cmp $b->{title}
            or $a->{pages} <=> $b->{pages}
            or $a->{author} cmp $b->{author}
    } @vlinks;
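To see why the choice between <=> and cmp matters for the pages field, here is a small self-contained sketch with made-up records (the names @books, @numeric and @stringy are just for illustration). Two of the records share a title, so the second criterion decides their order.

```perl
use strict;
use warnings;

# Made-up records: two share a title, so the pages field breaks the tie.
my @books = (
    { title => 'Perl', pages => 100, author => 'B' },
    { title => 'Perl', pages => 9,   author => 'A' },
    { title => 'Awk',  pages => 50,  author => 'C' },
);

# Title as a string first, then pages as a number.
my @numeric = sort {
    $a->{title} cmp $b->{title} or $a->{pages} <=> $b->{pages}
} @books;

# Same, but comparing pages as strings: "100" sorts before "9".
my @stringy = sort {
    $a->{title} cmp $b->{title} or $a->{pages} cmp $b->{pages}
} @books;

print join( ' ', map { "$_->{title}:$_->{pages}" } @numeric ), "\n";
print join( ' ', map { "$_->{title}:$_->{pages}" } @stringy ), "\n";
```

The first line printed is "Awk:50 Perl:9 Perl:100", the second "Awk:50 Perl:100 Perl:9", because as strings "100" comes before "9".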

Then you can output your data (but see "Update 2" below!).

    for my $vlink (@vlinks) {
        print "<tr><td><a href=\"bookDETAIL.cgi?book=$vlink->{title2}\">"
            ."$vlink->{title}</a></td><td>$vlink->{pages}</td>"
            ."<td>$vlink->{author}</td></tr>\n"
    }

Update: Here's how to do it with Text::CSV (also install Text::CSV_XS for speed). Note that this also takes care of building the array of hashes for you, i.e. it replaces the first two pieces of code above. There is one minor difference, however: it does not limit the number of fields to four as the code above does.

    use Text::CSV;
    my $file = 'books/bookLIST.txt';
    open my $info, '<', $file or die "Could not open $file: $!";
    my $csv = Text::CSV->new({ binary=>1, auto_diag=>2, sep_char=>":", allow_whitespace=>1 });
    $csv->column_names(qw/title title2 pages author/);
    my @vlinks;
    while ( my $row = $csv->getline_hr($info) ) {
        push @vlinks, $row;
    }
    $csv->eof or $csv->error_diag;
    close $info;

Update 2: In your HTML generation, you should guard against any special characters in the input file messing up your HTML, such as quotes or angle brackets. Depending on where your input data is coming from, you may also be exposing yourself to a Cross-site scripting (XSS) attack if you don't escape the special characters. The very minimum is HTML::Entities as I show here, but you can also look into other ways to "safely" generate HTML, such as HTML::Tiny, Template::Toolkit, or even one of the many web frameworks.

    use HTML::Entities qw/encode_entities/;
    for my $vlink (@vlinks) {
        print "<tr><td><a href=\"bookDETAIL.cgi?book="
            .encode_entities($vlink->{title2})."\">"
            .encode_entities($vlink->{title})."</a></td><td>"
            .encode_entities($vlink->{pages})."</td><td>"
            .encode_entities($vlink->{author})."</td></tr>\n"
    }

Update 3: Added the two comments about building the structure in one step and the potential security issue.

Replies are listed 'Best First'.
Re^2: Sorting Text File for HTML Output (updated x2)
by michael.kitchen (Initiate) on Sep 15, 2017 at 00:18 UTC

    Thanks haukex!!!! Really, thanks!!! You gave a lot of explanations and I grasped some of them. Most of my information comes from an old book, and I realize that much of my code could be out of date, with better ways of coding things now available. Thanks for the following code:
     
    @row{qw/title title2 pages author/} = split /:/, $line, 4;
    More user friendly. :)
     
    The only thing I couldn't get to work was:

    use Data::Dump; dd \@vlinks;
    I would get an error 500 message. I omitted it and everything worked great.

    Just to fill in some missing information....
    ->title could include any character except a colon
    ->title2 acted as a filename so I could reference artwork
    ->pages number of pages as a number (assumption correct)
    ->author could include any character except a colon
     
    The data is added to txt file by way of another cgi program which has all the substitutions I need.

        Maybe the dump gets printed before the header?
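If that is the cause, one way to check is to keep debugging output away from STDOUT until the header has been sent. The following is only a minimal sketch, assuming a plain CGI script: it uses the core module Data::Dumper rather than Data::Dump (which is not part of core Perl and has to be installed separately), and the record in @vlinks is made up.

```perl
use strict;
use warnings;
use Data::Dumper;

# Made-up record standing in for the real @vlinks.
my @vlinks = (
    { title => 'Dune', title2 => 'dune', pages => 412, author => 'Frank Herbert' },
);

# Sort hash keys so the dump looks the same on every run.
$Data::Dumper::Sortkeys = 1;
my $dump = Dumper( \@vlinks );

# Debug output on STDERR usually lands in the web server's error log
# and cannot corrupt the HTTP response.
print STDERR $dump;

# The header, followed by a blank line, must be the first thing on STDOUT.
print "Content-Type: text/html\n\n";
print "<p>", scalar(@vlinks), " book(s) loaded</p>\n";
```

If the script works this way but fails with the dump on STDOUT, the ordering of the output is the likely culprit; if it still fails, the web server's error log is the next place to look.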

