PerlMonks  

Re^5: Perl custom sort for Portuguese Lanaguage (updated x2)

by haukex (Bishop)
on Jul 08, 2020 at 18:03 UTC (#11119039)


in reply to Re^4: Perl custom sort for Portuguese Lanaguage
in thread Perl custom sort for Portuguese Lanaguage

In that case it's fairly easy. I used Text::CSV to read the data file, but AFAIK it doesn't support ignoring comment lines. If you are certain your files are always going to be as simple as you showed, only two columns separated by | and no |s anywhere else, no quoted fields, etc., then it's also possible to parse the file manually with a regex, for example:

open my $fh, '<:encoding(UTF-8)', $filename or die "$filename: $!";
my @rows = map  { /^([^|]+)\|([^|]+?)$/ or die $_; [$1,$2] }
           grep { /\S/ && !/^\s*#/ } <$fh>;
close $fh;

And then you can use @rows instead of @$rows in my example above.
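To see what the filter and the regex do, here is a minimal, self-contained sketch that runs the same expression against an in-memory file. The sample data and the variable name $data are made up for illustration and are not the OP's file; it also skips the :encoding(UTF-8) layer because the sample is plain ASCII.

```perl
use strict;
use warnings;

# Hypothetical sample input (an assumption, not the OP's data):
# two |-separated columns, plus blank lines and comment lines.
my $data = <<'END';
# a comment
foo|bar

  # indented comment
baz|qux
END

# Read from the string as if it were a file.
open my $fh, '<', \$data or die "in-memory file: $!";

# grep drops blank lines and lines whose first non-blank character is '#';
# map splits each remaining line into exactly two fields or dies.
my @rows = map  { /^([^|]+)\|([^|]+?)$/ or die $_; [$1,$2] }
           grep { /\S/ && !/^\s*#/ } <$fh>;
close $fh;

# Prints "foo,bar" and "baz,qux", one per line.
print join(',', @$_), "\n" for @rows;
```

The non-greedy [^|]+? together with $ is what keeps the trailing newline out of the second field, so no chomp is needed.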

Update: Minor simplification to code.

Update 2: And soonix makes a good point that continuing to use Text::CSV is also most likely fine, since it's probably safe to assume that you don't have any actual data that starts with #.

Replies are listed 'Best First'.
Re^6: Perl custom sort for Portuguese Lanaguage
by hippo (Bishop) on Jul 08, 2020 at 20:04 UTC
    I used Text::CSV to read the data file, but AFAIK it doesn't support ignoring comment lines.

    This works for me:

    csv (in => 'quux.csv', filter => {1 => sub { !/^#/ }});
      This works for me: csv (in => 'quux.csv', filter => {1 => sub { !/^#/ }});

      Unfortunately that also filters lines whose first field is "#foo" (with the quotes). I remember Tux recently saying filtering before parsing wasn't supported, though I'm having trouble finding the reference at the moment (it could have been in the chatterbox too*). It may be a bit tricky because this is valid CSV too:

      abc,"d
      #e
      f",ghi

      (That's one row, ["abc", "d\n#e\nf", "ghi"].)

      * Update: I looked again and I think it must have been in the chatterbox; I do distinctly remember someone having a similar question recently...

        The meta info knows whether the field was quoted or not.
        #!/usr/bin/perl
        use warnings;
        use strict;

        use Text::CSV_XS;

        my $csv = 'Text::CSV_XS'->new ({ binary => 1, auto_diag => 1, keep_meta_info => 1 });
        open my $in, '<:encoding(utf8)', shift or die $!;
        while (my $row = $csv->getline($in)) {
            next if $row->[0] =~ m/^#/ && ! $csv->is_quoted(0);
            $csv->say(*STDOUT, $row);
        }

        Tested with

        #x,y,z skip
        abc,"d
        #e
        f",ghi keep
        #comment skip
        a,b,c,#xyz keep
        "#foo",x,y,z keep
        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
        In this special case it looks like there won't be any Portuguese words starting with a "#", so it would work for the OP, as long as he is aware of the limitation.

      If you only want the first lines starting with # to be filtered, that is indeed what filter is for:

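      use Data::Peek;
      use Text::CSV_XS qw( csv );

      # $r stays 0 (false) until the first non-comment row has been seen,
      # so only the leading comment lines are skipped.
      my $r = 0;
      my $aoa = csv (in => *DATA, filter => sub {
          $_[1][0] =~ m/^\s*#/ ? $r : ++$r;
          });
      DDumper $aoa;
      __END__
      # This is comment
      # and so is this
      # and this
      a,b,c
      #but,not,this
      1,2,3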

      -->

      [ [ 'a', 'b', 'c' ], [ '#but', 'not', 'this' ], [ '1', '2', '3' ] ]

      Enjoy, Have FUN! H.Merijn

        Thanks! I see you're filtering lines beginning with # when they occur at the beginning of the file; the way I understood the OP's sample data, the comments can occur anywhere. My worry was that, even though this is probably not the case in the OP's data, filter-based solutions will remove lines that may not actually be comments, and I wasn't sure if there was an easy solution for this?

        use warnings;
        use strict;
        use Data::Peek;
        use Text::CSV_XS qw/csv/;

        DDumper csv( in=>*DATA, escape_char=>"\\",
            filter => sub { $_[1][0] !~ m/^\s*#/ });
        __DATA__
        # This is a comment
        a,b,c

        # Also a comment
        x,y,z

        "#not",a,comment
        \#also,not,"a comment"

        Output:

        [ [ 'a', 'b', 'c' ], [ '' ], [ 'x', 'y', 'z' ], [ '' ] ]
