PerlMonks  

Sorting Text File for HTML Output

by michael.kitchen (Novice)
on Sep 14, 2017 at 06:46 UTC
michael.kitchen has asked for the wisdom of the Perl Monks concerning the following question:

I searched the database and found a lot of information about sorting, but unfortunately I could not quite wrap my brain around what I was seeing, so I decided to post.

I have a text file that has names of books, number of pages, etc. The information is separated by colons (:). I could manually create sorted lists, but what's the fun in that? I have not made any attempts that I can show, but I will list the code as it is, which deals with the formatted data being used in HTML output:

my $file = 'books/bookLIST.txt';
open my $info, $file or die "Could not open $file: $!";
while(my $line = <$info>) {
    foreach ($line) {
        chop;
        @vlinks = split(/:/,$_) ;
        print "<tr><td><a href=\"bookDETAIL.cgi?book=$vlinks[1]\">$vlinks[0]</a></td><td>$vlinks[2]</td><td>$vlinks[3]</a></td></tr>"
    }
}
# each line in the text file is as follows:
# Title of Book:title_of_book:<number of pages>:Author's Name

I would like to be able to sort by vlinks[0], vlinks[2] or vlinks[3] but cannot figure where to sort and how. Any help would be appreciated.

Replies are listed 'Best First'.
Re: Sorting Text File for HTML Output (updated x2)
by haukex (Monsignor) on Sep 14, 2017 at 07:21 UTC

    I would recommend a module like Text::CSV to read a delimited file like this, as it automatically takes over the handling of quotes, escaped delimiters, etc. But since this question is about sorting, let's leave that for the update below. First, to sort, it's easiest to read the entire file into memory (unless it's too big for that, but I'm going to assume for now it's not).

    my $file = 'books/bookLIST.txt';
    open my $info, '<', $file or die "Could not open $file: $!";
    my @lines = <$info>;
    close $info;

    I have used the three-argument form of open here, which is generally recommended nowadays. Next, you should build the data structure you want to sort; I'll build an array of hashes. It's also possible to combine this with the above step and build the data structure directly while you're reading the file, saving the intermediate @lines array, like you're doing in the original code (I would've probably written it that way to begin with, I guess my morning caffeine hadn't fully kicked in yet when I wrote this node ;-) ).

    my @vlinks;
    for my $line (@lines) {
        chomp($line);
        my %row;
        @row{qw/title title2 pages author/} = split /:/, $line, 4;
        push @vlinks, \%row;
    }
    use Data::Dump;
    dd \@vlinks;
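The one-step variant mentioned above, which skips the intermediate @lines array, might look like the following sketch. The read_books helper and the in-memory sample data (including the page counts) are made up for illustration; the field names are the same placeholder names used in this reply.

```perl
use strict;
use warnings;

# Hypothetical helper: read colon-separated records from an open
# filehandle and return a reference to an array of hashes.
sub read_books {
    my ($fh) = @_;
    my @vlinks;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;    # skip empty lines
        my %row;
        # Limit of 4 keeps any extra colons inside the author field.
        @row{qw/title title2 pages author/} = split /:/, $line, 4;
        push @vlinks, \%row;
    }
    return \@vlinks;
}

# Made-up sample data in an in-memory filehandle, standing in for
# books/bookLIST.txt.
my $sample = "Learning Perl:learning_perl:394:Randal Schwartz\n"
           . "Programming Perl:programming_perl:1184:Larry Wall\n";
open my $info, '<', \$sample or die "Could not open in-memory file: $!";
my $books = read_books($info);
close $info;

print "$_->{title} ($_->{pages} pages)\n" for @$books;
```

With the real file you would pass the handle opened on books/bookLIST.txt instead of the in-memory one.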

    You should use chomp instead of chop. The @row{...} syntax is a hash slice, and I limited the number of fields split will split into so no data gets lost. I wasn't sure what the difference between the first two fields in your file is, so you might want to pick a more descriptive name than I did (title2). You can then use Data::Dumper or Data::Dump to look at your data structure, as I showed above. Next the interesting part: the sorting. The technique is described in the FAQ How do I sort an array by (anything)? - You compare the primary sort field, but if that comparison returns 0 (meaning they are equal), you move on to the next criterion, and so on. Note I am using the numerical comparison operator <=> instead of the string comparison operator cmp (see perlop) on the pages field, as I assume it's numeric.

    @vlinks = sort {
           $a->{title}  cmp $b->{title}
        or $a->{pages}  <=> $b->{pages}
        or $a->{author} cmp $b->{author}
    } @vlinks;
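Since the question asks for sorting by title, pages or author, one way to make the primary sort key selectable is a small table of comparison subroutines. This is only a sketch: the %sort_by table, the sort_books helper and the sample rows (page counts included) are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical lookup table: one comparison subroutine per sort key.
# Each sub receives the two rows to compare as its arguments.
my %sort_by = (
    title  => sub { $_[0]{title}  cmp $_[1]{title} },
    pages  => sub { $_[0]{pages}  <=> $_[1]{pages} },    # numeric
    author => sub { $_[0]{author} cmp $_[1]{author} },
);

# Return the rows sorted by the given key, using the title as a
# tie-breaker when the primary comparison returns 0.
sub sort_books {
    my ( $key, @rows ) = @_;
    my $cmp = $sort_by{$key} or die "Unknown sort key '$key'";
    return sort { $cmp->( $a, $b ) || $sort_by{title}->( $a, $b ) } @rows;
}

# Made-up sample rows in the same shape as the array of hashes above.
my @vlinks = (
    { title => 'Programming Perl', title2 => 'programming_perl',
      pages => 1184, author => 'Larry Wall' },
    { title => 'Learning Perl', title2 => 'learning_perl',
      pages => 394, author => 'Randal Schwartz' },
);

print "$_->{pages}\t$_->{title}\n" for sort_books( 'pages', @vlinks );
```

The key could come from a CGI parameter; because unknown keys die instead of being used directly, user input never reaches the sort block unchecked.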

    Then you can output your data (but see "Update 2" below!).

    for my $vlink (@vlinks) {
        print "<tr><td><a href=\"bookDETAIL.cgi?book=$vlink->{title2}\">"
            ."$vlink->{title}</a></td><td>$vlink->{pages}</td>"
            ."<td>$vlink->{author}</td></tr>\n"
    }

    Update: Here's how to do it with Text::CSV (also install Text::CSV_XS for speed), note this also takes care of building the array of hashes for you, i.e. it replaces the first two pieces of code above. However, there is one minor difference, it does not limit the number of fields to four as the above code does.

    use Text::CSV;
    my $file = 'books/bookLIST.txt';
    open my $info, '<', $file or die "Could not open $file: $!";
    my $csv = Text::CSV->new({binary=>1, auto_diag=>2, sep_char=>":",
        allow_whitespace=>1 });
    $csv->column_names(qw/title title2 pages author/);
    my @vlinks;
    while ( my $row = $csv->getline_hr($info) ) {
        push @vlinks, $row;
    }
    $csv->eof or $csv->error_diag;
    close $info;

    Update 2: In your HTML generation, you should guard against any special characters in the input file messing up your HTML, such as quotes or angle brackets. Depending on where your input data is coming from, you may also be exposing yourself to a Cross-site scripting (XSS) attack (longer explanation) if you don't escape the special characters. The very minimum is HTML::Entities as I show here, but you can also look into other ways to "safely" generate HTML like maybe using HTML::Tiny, Template::Toolkit, or even one of the many web frameworks.

    use HTML::Entities qw/encode_entities/;
    for my $vlink (@vlinks) {
        print "<tr><td><a href=\"bookDETAIL.cgi?book="
            .encode_entities($vlink->{title2})."\">"
            .encode_entities($vlink->{title})."</a></td><td>"
            .encode_entities($vlink->{pages})."</td><td>"
            .encode_entities($vlink->{author})."</td></tr>\n"
    }

    Update 3: Added the two comments about building the structure in one step and the potential security issue.

      Thanks haukex!!!! Really, thanks!!! You gave a lot of explanations and I grasped some of them. Most of my information comes from an old book, and I realize that much of my code could be out of date, with better ways of coding things now available. Thanks for the following code:
       
      @row{qw/title title2 pages author/} = split /:/, $line, 4;
      More user friendly. :)
       
      The only thing I couldn't get to work was:

      use Data::Dump; dd \@vlinks;
      I would get an error 500 message. I omitted it and everything worked great.

      Just to fill in some missing information....
      ->title could include any character except a colon
      ->title2 acted as a filename so I could reference artwork
      ->pages number of pages as a number (assumption correct)
      ->author could include any character except a colon
       
      The data is added to txt file by way of another cgi program which has all the substitutions I need.

Re: Sorting Text File for HTML Output
by afoken (Abbot) on Sep 14, 2017 at 16:45 UTC
    while(my $line = <$info>) {
        foreach ($line) {
            chop;
            @vlinks = split(/:/,$_) ;
            print "<tr><td><a href=\"bookDETAIL.cgi?book=$vlinks[1]\">$vlinks[0]</a></td><td>$vlinks[2]</td><td>$vlinks[3]</a></td></tr>"
        }
    }

    Some notes on that piece of code:

    • chop is not what you want. chop chops off any character found at the end of the string. Use chomp, that only removes trailing newlines. If your input file does not end with a newline, chop damages data. chomp does not.
    • foreach ($line) is quite useless, you are not iterating over an array here. But it makes $_ an alias for $line. You use $_ only as explicit argument to split. Better remove foreach and write the split simply as @vlinks = split /:/,$line;.
    • You could entirely avoid $line, because while (<$handle>) already implicitly assigns to $_ and checks for end of file. In that case, split can also be reduced to a single argument, as it operates on $_ by default. So:
      while (<$info>) {
          @vlinks=split /:/;
          # ...
      }
    • You use @vlinks only inside the while loop, so you should limit its scope to that loop: while (<$info>) { my @vlinks=split /:/;
    • And, as many other monks told you: Don't try to handle CSV files manually, use Text::CSV and its fast companion, Text::CSV_XS. CSV may look like a cute and harmless file format, but it's like a gremlin waiting to be fed in a shower after midnight. CSV lacks a proper, agreed specification, and so everybody has invented variants for quoting, escaping, empty fields, NULL, binary data, line endings and tons of other nasty details over the last decades. Tux++ has put a lot of effort into the Text::CSV family of modules (including DBD::CSV) to handle all of that ugly stuff for you.
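The chop versus chomp point from the first note is easy to see on a last line that has no trailing newline. A small sketch, using a made-up record and a hypothetical author_after helper:

```perl
use strict;
use warnings;

# Hypothetical helper: strip the line ending with either chop or chomp,
# then return the fourth colon-separated field (the author).
sub author_after {
    my ( $remover, $line ) = @_;
    if ( $remover eq 'chop' ) {
        chop $line;     # always removes the last character
    }
    else {
        chomp $line;    # removes a trailing newline only, if present
    }
    my $author = ( split /:/, $line, 4 )[3];
    return $author;
}

# Made-up record, once with and once without a trailing newline.
my $record = 'Learning Perl:learning_perl:394:Randal Schwartz';
for my $line ( "$record\n", $record ) {
    printf "chop: '%s'  chomp: '%s'\n",
        author_after( 'chop',  $line ),
        author_after( 'chomp', $line );
}
```

On the line without a newline, chop eats the final "z" of the author name, while chomp leaves the record intact.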

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Sorting Text File for HTML Output
by clueless newbie (Chaplain) on Sep 14, 2017 at 14:04 UTC
    Using DBD::CSV
    #!/usr/bin/env perl
    use Carp;
    use Data::Dumper;
    use DBI;
    use Try::Tiny;
    use strict;
    use warnings;

    try {
        my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
            f_dir            => "./",
            f_ext            => ".csv/r",
            f_lock           => 2,
            csv_eol          => "\r\n",
            csv_sep_char     => "|",
            RaiseError       => 1,
            PrintError       => 1,
            FetchHashKeyName => "NAME_lc",
            }) or die $DBI::errstr;

        # Sorting by author
        my $sth=$dbh->prepare(<<"__SQL__");
    select * from books order by author
    __SQL__
        $sth->execute();

        my $field_aref=$sth->{NAME_lc};
        warn Data::Dumper->Dump([\$field_aref],[qw(*field_aref)]),' ';
        while (my $value_aref=$sth->fetchrow_arrayref()) {
            warn Data::Dumper->Dump([\$value_aref],[qw(*value_aref)]),' ';
        };
    }
    catch {
        # Try::Tiny passes the caught error in $_, not in $!
        Carp::confess $_;
    };
    on ./books.csv which contains
    title1|title2|pages|author
    Perl Best Practices|hound|a few|Damian Conway
    Learning Perl|llama|thin|Randal Schwartz
    Programming Perl|camel|many|Larry Wall
    results in
    $field_aref = \[ # fields
        'title1',
        'title2',
        'pages',
        'author'
    ];
     at data.plx line 28.
    $value_aref = \[ # first when sorted by author
        'Perl Best Practices',
        'hound dog',
        'a few',
        'Damian Conway'
    ];
     at data.plx line 30.
    $value_aref = \[
        'Programming Perl',
        'camel',
        'many',
        'Larry Wall'
    ];
     at data.plx line 30.
    $value_aref = \[
        'Learning Perl',
        'llama',
        'thin',
        'Randal Schwartz'
    ];
     at data.plx line 30.
Re: Sorting Text File for HTML Output
by Laurent_R (Canon) on Sep 14, 2017 at 16:19 UTC
    Hi michael.kitchen,

    You've been shown how to sort in Perl.

    An alternate solution might be to use the sort program probably provided by your operating system and to pipe the output to your Perl program, which would then have to read from the standard input, with something like while (my $line = <>) { ....

    The shell command would be something like this:

    $ sort [-options] books/bookLIST.txt | perl your_program.pl
    I'm doing that quite often because I'm dealing with files which are too large to fit into memory, so using the OS sort utility (which knows well how to deal with very large files) is quite convenient.
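If you would rather keep everything in one script, the same idea can be sketched by letting Perl open a pipe from the external sort. This assumes a POSIX-style sort program is on the PATH; -t (field separator) and -k (sort key) are its standard options, while the sorted_lines helper name and the choice of field 3 (pages) are only examples.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $file sorted numerically on
# the given colon-separated field, using the external sort program.
sub sorted_lines {
    my ( $file, $field ) = @_;
    # List form of open: no shell is involved, so the file name needs
    # no quoting. "-k3,3n" means: key is field 3 only, compared as a number.
    open my $sorted, '-|', 'sort', '-t', ':', "-k$field,${field}n", $file
        or die "Could not run sort: $!";
    chomp( my @lines = <$sorted> );
    close $sorted or die "sort failed: $! (exit status $?)";
    return @lines;
}

my $file = 'books/bookLIST.txt';
if ( -e $file ) {
    print "$_\n" for sorted_lines( $file, 3 );
}
```

The lines arrive already sorted, so the Perl side only has to split and print them, and memory use stays low if you read the pipe line by line instead of slurping it as done here.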
Re: Sorting Text File for HTML Output
by holli (Monsignor) on Sep 16, 2017 at 10:56 UTC
    As you are generating HTML, you may want to consider a Javascript based solution. That way the server doesn't have to care and can defer the sorting to the client (and hence let the user decide what to sort for).

    There even is an example of how to do that on the W3C web page, with clickable headers and shit.


    holli

    You can lead your users to water, but alas, you cannot drown them.
Re: Sorting Text File for HTML Output
by Anonymous Monk on Sep 15, 2017 at 14:48 UTC
    For files of moderate size, or if you really don't care how long it takes, the sort-compare subroutine can also be programmed to split the key string into its component parts every time it is called. The components are then compared by the subroutine as has already been shown. The difference is the classic trade-off of "speed versus space": splitting all the strings ahead of time saves time but costs memory, and vice versa. (Because of the number of times a sort-compare subroutine might be executed, the difference in speed can be substantial.) Also note that many external sort commands are quite capable of handling delimited-string keys efficiently, such that you might not actually have to "write a program" to get the job done.
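The two ends of that trade-off can be sketched as follows, using the colon-separated format from the question and made-up sample lines. The first variant splits both lines inside every comparison; the second splits each line once up front (the idiom commonly called a Schwartzian transform) and sorts on the cached key. Both helper names are hypothetical.

```perl
use strict;
use warnings;

# Made-up sample lines in the format title:title2:pages:author.
my @lines = (
    'Programming Perl:programming_perl:1184:Larry Wall',
    'Learning Perl:learning_perl:394:Randal Schwartz',
    'Perl Best Practices:perl_best_practices:517:Damian Conway',
);

# Variant 1: split inside the comparison. No extra memory, but split
# runs twice for every comparison the sort performs.
sub sort_splitting_each_time {
    return sort { ( split /:/, $a )[2] <=> ( split /:/, $b )[2] } @_;
}

# Variant 2: split once per line, sort on the cached page count, then
# map back to the original lines. Faster, but holds an extra copy.
sub sort_presplit {
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] }
           map  { [ ( split /:/, $_ )[2], $_ ] } @_;
}

print "$_\n" for sort_presplit(@lines);
```

Both variants return the lines in the same order; they differ only in how often split is called and how much memory is held during the sort.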
