http://www.perlmonks.org?node_id=1199378


in reply to Sorting Text File for HTML Output

I would recommend a module like Text::CSV to read a delimited file like this, as it automatically takes over the handling of quotes, escaped delimiters, etc. But since this question is about sorting, let's leave that for the update below. First, to sort, it's easiest to read the entire file into memory (unless it's too big for that, but I'm going to assume for now it's not).

    my $file = 'books/bookLIST.txt';
    open my $info, '<', $file or die "Could not open $file: $!";
    my @lines = <$info>;    # list context: read all lines at once
    close $info;

I have used the three-argument form of open here, which is generally recommended nowadays. Next, you should build the data structure you want to sort; I'll build an array of hashes. It's also possible to combine this with the above step and build the data structure directly while you're reading the file, which saves the intermediate @lines array, like you're doing in the original code (I would've probably written it that way to begin with, I guess my morning caffeine hadn't fully kicked in yet when I wrote this node ;-) ).

    my @vlinks;
    for my $line (@lines) {
        chomp($line);
        my %row;
        # hash slice: assign the four fields to named keys
        @row{qw/title title2 pages author/} = split /:/, $line, 4;
        push @vlinks, \%row;
    }
    use Data::Dump;
    dd \@vlinks;    # inspect the resulting array of hashes
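As a rough sketch of the one-step variant mentioned above, the loop can read from the filehandle directly instead of going through @lines. The sample data and the in-memory filehandle are assumptions made only to keep the example self-contained; with the real file you would open 'books/bookLIST.txt' as before. The second sample line has an extra colon in the last field to show what the limit of 4 on split does.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for books/bookLIST.txt
my $data = <<'END';
Perl Basics:perl-basics:250:Jane Doe
Advanced Perl:advanced-perl:400:John Roe: Editor
END

# An in-memory filehandle keeps this sketch self-contained
open my $info, '<', \$data or die "Could not open in-memory file: $!";

my @vlinks;
while ( my $line = <$info> ) {
    chomp $line;
    my %row;
    # The limit of 4 keeps any further colons inside the author field
    @row{qw/title title2 pages author/} = split /:/, $line, 4;
    push @vlinks, \%row;
}
close $info;

print "$_->{title} by $_->{author}\n" for @vlinks;
```

Because of the limit, the author of the second row stays "John Roe: Editor" rather than being cut off at the colon.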

You should use chomp instead of chop. The @row{...} syntax is a hash slice, and I limited the number of fields split will split into so no data gets lost. I wasn't sure what the difference between the first two fields in your file is, so you might want to pick a more descriptive name than I did (title2). You can then use Data::Dumper or Data::Dump to look at your data structure, as I showed above.

Next the interesting part: the sorting. The technique is described in the FAQ How do I sort an array by (anything)? - You compare the primary sort field, but if that comparison returns 0 (meaning the two values are equal), you move on to the next criterion, and so on. Note I am using the numerical comparison operator <=> instead of the string comparison operator cmp (see perlop) on the pages field, as I assume it's numeric.

    @vlinks = sort {
           $a->{title}  cmp $b->{title}     # primary key: title (string)
        or $a->{pages}  <=> $b->{pages}     # then pages (numeric)
        or $a->{author} cmp $b->{author}    # then author (string)
    } @vlinks;
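To see the fall-through between sort keys in action, here is a small self-contained sketch with made-up records (the titles, page counts, and authors are hypothetical). Three rows share a title, so the pages comparison decides, and two of those share a page count, so the author comparison decides.

```perl
use strict;
use warnings;

# Hypothetical records, only to illustrate the multi-key sort
my @vlinks = (
    { title => 'Perl', pages => 300, author => 'Zed' },
    { title => 'Awk',  pages => 120, author => 'Bob' },
    { title => 'Perl', pages => 300, author => 'Amy' },
    { title => 'Perl', pages => 90,  author => 'Kim' },
);

my @sorted = sort {
       $a->{title}  cmp $b->{title}     # equal titles yield 0 ...
    or $a->{pages}  <=> $b->{pages}     # ... so pages are compared numerically
    or $a->{author} cmp $b->{author}    # ... and finally authors
} @vlinks;

print join( ', ', map { "$_->{title}/$_->{pages}/$_->{author}" } @sorted ), "\n";
# prints: Awk/120/Bob, Perl/90/Kim, Perl/300/Amy, Perl/300/Zed
```

With cmp on the pages field, "300" would sort before "90" because the comparison would be character by character, which is why <=> is the better fit for a numeric field.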

Then you can output your data (but see "Update 2" below!).

    for my $vlink (@vlinks) {
        # one table row per book; the title links to the detail page
        print "<tr><td><a href=\"bookDETAIL.cgi?book=$vlink->{title2}\">"
            . "$vlink->{title}</a></td><td>$vlink->{pages}</td>"
            . "<td>$vlink->{author}</td></tr>\n";
    }

Update: Here's how to do it with Text::CSV (also install Text::CSV_XS for speed). Note that this also takes care of building the array of hashes for you, i.e. it replaces the first two pieces of code above. However, there is one minor difference: it does not limit the number of fields to four as the above code does.

    use Text::CSV;
    my $file = 'books/bookLIST.txt';
    open my $info, '<', $file or die "Could not open $file: $!";
    my $csv = Text::CSV->new({
        binary           => 1,
        auto_diag        => 2,
        sep_char         => ":",
        allow_whitespace => 1,
    });
    $csv->column_names(qw/title title2 pages author/);
    my @vlinks;
    # getline_hr returns each row as a hashref keyed by the column names
    while ( my $row = $csv->getline_hr($info) ) {
        push @vlinks, $row;
    }
    $csv->eof or $csv->error_diag;
    close $info;

Update 2: In your HTML generation, you should guard against any special characters in the input file messing up your HTML, such as quotes or angle brackets. Depending on where your input data is coming from, you may also be exposing yourself to a Cross-site scripting (XSS) attack (longer explanation) if you don't escape the special characters. The very minimum is HTML::Entities as I show here, but you can also look into other ways to "safely" generate HTML like maybe using HTML::Tiny, Template::Toolkit, or even one of the many web frameworks.

    use HTML::Entities qw/encode_entities/;
    for my $vlink (@vlinks) {
        # escape every value before it is interpolated into the HTML
        print "<tr><td><a href=\"bookDETAIL.cgi?book="
            . encode_entities($vlink->{title2}) . "\">"
            . encode_entities($vlink->{title}) . "</a></td><td>"
            . encode_entities($vlink->{pages}) . "</td><td>"
            . encode_entities($vlink->{author}) . "</td></tr>\n";
    }
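To make the effect of the escaping visible, here is a minimal sketch that runs encode_entities on a made-up hostile value (the string is hypothetical, not taken from the original data). It assumes HTML::Entities is installed.

```perl
use strict;
use warnings;
use HTML::Entities qw/encode_entities/;

# Hypothetical hostile value, as it might appear in the input file
my $title = '<script>alert("x")</script> & Co';

# By default encode_entities escapes <, >, &, and quotes, among others
my $safe = encode_entities($title);

print "$safe\n";
# prints: &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt; &amp; Co
```

A browser displays the escaped string as literal text instead of treating it as markup.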

Update 3: Added the two comments about building the structure in one step and the potential security issue.