
Re^8: Perl custom sort for Portuguese Lanaguage

by haukex (Bishop)
on Jul 09, 2020 at 14:08 UTC (#11119085)


in reply to Re^7: Perl custom sort for Portuguese Lanaguage
in thread Perl custom sort for Portuguese Lanaguage

Thanks! I see you're filtering lines beginning with # when they occur at the beginning of the file; the way I understood the OP's sample data, the comments can occur anywhere. My worry is that filter-based solutions will also remove lines that are not actually comments, even though this is probably not an issue in the OP's data. Is there an easy solution for this?

use warnings;
use strict;
use Data::Peek;
use Text::CSV_XS qw/csv/;

DDumper csv( in=>*DATA, escape_char=>"\\",
    filter => sub { $_[1][0] !~ m/^\s*#/ });

__DATA__
# This is a comment
a,b,c

# Also a comment
x,y,z

"#not",a,comment
\#also,not,"a comment"

Output:

[ [ 'a', 'b', 'c' ], [ '' ], [ 'x', 'y', 'z' ], [ '' ] ]

Re^9: Perl custom sort for Portuguese Lanaguage
by Tux (Canon) on Jul 09, 2020 at 14:14 UTC

    So more or less like this?

    DDumper csv (
        in     => *DATA,
        sep    => "|",
        filter => sub { $_[1][0] =~ m/^\s*#/ && @{$_[1]} == 1 ? 0 : 1; },
        );

    Which would not even need a ternary if slightly rewritten.
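The post does not show the ternary-free form, so the following is only one possible rewrite, sketched under that assumption. It negates the combined condition instead of mapping it to 0 or 1. The name `$keep` and the sample rows are made up for illustration; the callback arguments (CSV object in `$_[0]`, row array ref in `$_[1]`) follow the filter shown above.

```perl
use warnings;
use strict;

# Same logic as the filter above, without the ternary: keep the row unless
# it is a single field that starts with optional whitespace and "#".
# $keep is a hypothetical name; it could be passed as filter => $keep.
my $keep = sub { !( @{ $_[1] } == 1 && $_[1][0] =~ m/^\s*#/ ) };

# Quick check without the parser, passing hand-made rows as array refs.
for my $row ( ['# only a comment'], ['a', 'b', 'c'], ['# x', 'y'] ) {
    printf "%-20s %s\n", join('|', @$row),
        $keep->(undef, $row) ? 'keep' : 'drop';
}
```

Only the first sample row is dropped; a row that starts with # but has more than one field is kept, which is exactly the limitation raised in the reply below.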


    Enjoy, Have FUN! H.Merijn
      So more or less like this?

      Closer, but then this no longer filters comments that contain the sep character (e.g. add "# This is a comment, too" to my example above)...

      Update: I realize this is less likely when sep=>'|', but my question is basically whether there's a "generic" way to filter lines. For example, I could load the file into memory and do s/^\s*#.*(?:\n|\z)//mg, but that would break any CSV data that contains embedded newlines that happen to match this pattern. In other words, with Text::CSV_XS, filter is only applied after parsing fields like "#foo" or \#foo to #foo, and I'm wondering if there's a hook into the parser before that takes place?
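To make the embedded-newline problem concrete, here is a minimal sketch using only core Perl. The sample data is hypothetical; the substitution is the one quoted in the update above.

```perl
use warnings;
use strict;

# Hypothetical sample: one real comment line plus a record whose quoted
# field contains a line that merely looks like a comment.
my $data = <<'END';
# real comment
foo,"bar
# Not a comment
quz",baz
END

# Line-based removal as described above (the /r flag needs Perl 5.14+).
my $filtered = $data =~ s/^\s*#.*(?:\n|\z)//mgr;

# The embedded "# Not a comment" line is removed along with the real
# comment, so the quoted field loses its middle line.
print $filtered;
```

The regex has no notion of quoting, so it cannot tell the two # lines apart.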

      Update 2: In the CB, you suggested in => \do { local $/; <DATA> =~ s/^\s*#.*(?:\n|\z)//mgr }, which gets closer as well, though it breaks this test case. Just for completeness, here are all the test cases so far combined into one data set:

      # This is a comment
      not,a,comment
      # This is a comment, too
      not,a,comment
      "#not",a,comment
      \#also,not,"a comment"
      foo,"bar
      # Not a comment, either!
      quz",baz
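As a rough illustration of what a filter applied before field parsing would have to do, the sketch below strips comment lines in plain Perl while tracking whether the current line starts inside a quoted field. It is not a Text::CSV_XS feature, and `strip_comment_lines` is a hypothetical helper. It assumes `"` as the quote character and `\` as the escape character, as in the first script, and it is not a full CSV parser.

```perl
use warnings;
use strict;

# Hypothetical helper: drop lines starting with optional whitespace and "#",
# but only when the line does not begin inside a quoted field.
sub strip_comment_lines {
    my ($text) = @_;
    my $in_quotes = 0;
    my $out       = '';
    for my $line ( split /^/m, $text ) {    # lines keep their newlines
        next if !$in_quotes && $line =~ /^\s*#/;
        $out .= $line;
        # A backslash escapes the next character; an unescaped double
        # quote toggles the quoted state.
        while ( $line =~ /\\.|(")/gs ) {
            $in_quotes = !$in_quotes if defined $1;
        }
    }
    return $out;
}

my $data = <<'END';
# This is a comment
not,a,comment
# This is a comment, too
not,a,comment
"#not",a,comment
\#also,not,"a comment"
foo,"bar
# Not a comment, either!
quz",baz
END

print strip_comment_lines($data);
```

On this data set the two real comment lines are removed and the line inside the quoted field survives. The cleaned text could then be handed to the parser through a scalar reference, in the same way as the `in => \do { ... }` suggestion above.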
