Re^14: Text::CSV encoding parse()

OK, here's a full program that produces the same result. It reads a $resultsFile called "results.txt".

#! /strawberry/Perl/bin/perl

use CGI;
use CGI::Carp qw( fatalsToBrowser );
use Text::CSV;
use Excel::Writer::XLSX;
use utf8;
use strict;

### read the output
my(@urls);
my($header);
my($resultsFile) = "results.txt";

open my $fh, "<:encoding(utf8)", $resultsFile
  or die("cannot open results file $resultsFile for reading: $!");
my($c)=0;   # just here for counting
my($d)=0;   # just here for counting
while(<$fh>){
 $c++;
 if ($_ =~ /\/search\//){
  push(@urls, $_);
 }
 else{
  $d++;
 }
}
close($fh);

# sort @urls based on the search string 

my @sorted_urls =
  map  { $_->[0] }
  sort { $a->[1] cmp $b->[1] }
  map  { m|/search/\s*([^\?]+)\?|; [$_, $1] }
  @urls;

my($count) = -1;

my $csv = Text::CSV->new ({ binary => 1, sep_char => "|" });
my $q = new CGI;

# parse and print
print $q->header(-charset    => 'utf-8');
print $q->start_html( -title      => 'SearchME');

print $q->start_table();
foreach my $row (@sorted_urls){
# print TEMP $row;
 $csv->parse($row);
 print "<tr>";
 $count++;


 my @els = $csv->fields;

 my(@splits) = split('\|',$row);

 $els[0] =~ /\/search\/(.+)\?scope=/i;
 my($term) = $1;

 my($link) = $els[0];

 print "<td>";
 # print $link;
 print $q->a({-href=>$link,-target=>'_blank'},$term);
 print "</td>";

  for(my $i=1; $i <= 4; $i++){
   print "<td>";
   print $els[$i];
   print "</td>";
  }   
 print "</tr>\n";
}
print $q->end_table,
$q->end_html;

And here's a short results file (not sure how to keep it from wrapping, so it's not code):

PAGE_COMPL_URL|PAGE_REFRL_COMPL_URL|IBMER|VIEWS|VISITORS|ENGAGED_VISITS
https://www.ibm.com/support/knowledgecenter/es/search/¿Cuales son las partes de una cadena de conexión??scope=SSGU8G_12.1.0|https://www.ibm.com/support/knowledgecenter/es/SSGU8G_12.1.0/com.ibm.jdbc_pg.doc/ids_jdbc_011.htm|0|1|1|0
https://www.ibm.com/support/knowledgecenter/search/onsmsync?scope=SSGU8G_12.1.0|https://www.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.sec.doc/ids_lb_002.htm|1|1|1|1

Thank you.


Re^15: Text::CSV encoding parse() by hippo (Bishop) on Aug 22, 2019 at 08:49 UTC
Thank you for providing an SSCCE. There's a lot which could be removed from it, but the first line is the one setting off the klaxons: `#! /strawberry/Perl/bin/perl`. Are you running this on Microsoft Windows? If so, what have you done to confirm that your input data (your `results.txt` file) is genuinely UTF-8 encoded?
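One way to check whether a file is well-formed UTF-8 is to read it as raw bytes and attempt a strict decode. The following is only a minimal sketch: the helper name `is_valid_utf8` is made up for illustration and is not from this thread, and the default file name is assumed to be the `results.txt` mentioned above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from the thread): returns 1 if the given
# byte string is well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC stops
    # decode() from modifying $bytes in place.
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    return eval { decode('UTF-8', $bytes, $check); 1 } ? 1 : 0;
}

# File name is an assumption; pass another path as the first argument.
my $file = @ARGV ? $ARGV[0] : 'results.txt';

if (open my $fh, '<:raw', $file) {
    local $/;                      # slurp the whole file as bytes
    my $bytes = <$fh>;
    close $fh;
    print "$file: ", (is_valid_utf8($bytes) ? "valid UTF-8" : "NOT valid UTF-8"), "\n";
}
else {
    warn "cannot open $file: $!\n";
}
```

A pure-ASCII file also passes this check, since ASCII is a subset of UTF-8, so a "valid" result only rules out other encodings such as Windows-1252 when the file actually contains non-ASCII characters.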
Re^16: Text::CSV encoding parse() by slugger415 (Monk) on Aug 22, 2019 at 18:25 UTC
I think I did, with haukex's script (https://www.perlmonks.org/?node_id=11104578), and yes, Win 7.
Re^17: Text::CSV encoding parse() by hippo (Bishop) on Aug 23, 2019 at 13:26 UTC
Here then is an SSCCE which works for me. It shows both the broken output which you should be seeing at the moment and the fixed output after it is properly encoded. I've taken the liberty of cleaning up some of your code and removing all the HTML-generating functions from CGI.pm as those are deprecated. In fact I've removed CGI.pm entirely since you weren't using it for anything else.

#!/usr/bin/env perl

use utf8;
use strict;
use Text::CSV;
use Encode 'encode';

### read the output
my $resultsFile = "results.txt";

open my $fh, "<:encoding(utf8)", $resultsFile
  or die "cannot open results file $resultsFile for reading: $!";
my @urls = grep {/\/search\//} <$fh>;
close ($fh);

print "Content-type: text/html; charset=utf-8\n\n";

print "<h2>Pre-sort</h2><ul>";
print "<li>$_</li>" for @urls;
print "</ul>\n";
print "<h2>Same, but encoded</h2><ul>";
print encode ('UTF-8', "<li>$_</li>") for @urls;
print "</ul>\n";

# sort @urls based on the search string
my @sorted_urls =
  map  { $_->[0] }
  sort { $a->[1] cmp $b->[1] }
  map  { m|/search/\s*([^\?]+)\?|; [$_, $1] }
  @urls;

my $csv = Text::CSV->new ({binary => 0, sep_char => "|"});

print "<h2>Broken sorted</h2>\n";
# parse and print
print '<table>';
foreach my $row (@sorted_urls) {
    # print TEMP $row;
    $csv->parse ($row);
    my @els = $csv->fields;
    $els[0] =~ /\/search\/(.+)\?scope=/i;
    my ($term) = $1;
    my ($link) = $els[0];

    print "<tr>";
    print qq#<td><a href="$link" target="_blank">$term</a></td>#;
    print "<td>$_</td>\n" for @els[1 .. 4];
    print "</tr>\n";
}
print '</table>';

print "<h2>Same, but encoded</h2>";
# parse and print
print '<table>';
foreach my $row (@sorted_urls) {
    # print TEMP $row;
    $csv->parse ($row);
    my @els = $csv->fields;
    $els[0] =~ /\/search\/(.+)\?scope=/i;
    my ($term) = $1;
    my ($link) = $els[0];

    print "<tr>";
    print encode ('UTF-8', qq#<td><a href="$link" target="_blank">$term</a></td>#);
    print "<td>$_</td>\n" for @els[1 .. 4];
    print "</tr>\n";
}
print '</table>';

Hopefully you can see here that you just need to ensure that you properly encode the output. There are many ways to do this; I've picked one here which is very explicit, for your benefit. See Encode for lots more on how to use this module.
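As one of the "many ways" mentioned, an encoding layer can be set on STDOUT once instead of calling encode() on each string. This is only a sketch of that alternative, not the approach used in the reply: the helper `cell_html` and the example.com URL are invented for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                 # this source file contains UTF-8 string literals
use Encode qw(encode);

# Hypothetical helper (not from the thread): build one table cell from
# decoded (character) strings.
sub cell_html {
    my ($link, $term) = @_;
    return qq{<td><a href="$link" target="_blank">$term</a></td>};
}

# A decoded character string, like a line read through :encoding(utf8).
my $term = "¿Cuales son las partes de una cadena de conexión?";
my $html = cell_html("https://example.com/search/x", $term);

# Option 1: encode each string explicitly, as in the reply.
my $bytes = encode('UTF-8', $html);

# Option 2: set the layer once; every later print of a character
# string to STDOUT is then encoded automatically.
binmode(STDOUT, ':encoding(UTF-8)');
print $html, "\n";
```

The two options should not be combined on the same string: printing the already encoded `$bytes` through the `:encoding(UTF-8)` layer would encode it a second time and produce mojibake.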
Re^18: Text::CSV encoding parse() by slugger415 (Monk) on Aug 23, 2019 at 16:49 UTC

