http://www.perlmonks.org?node_id=718759

Yesterday I hinted, in two different replies to what is basically the same question (how to "read pairs from a file, in a situation that's natural to be handled by split, and put them into a hash?"), that whichever technique one adopts, one should sanitize the pair in case it is not actually a pair. I also hinted that the technique I "chose" is more sensitive to this problem. There are tons of ways to sanitize it in Perl 5 already. One that I'm certain some people would disagree with, but that I would probably choose, is:

my %pairs = map { chomp; map @$_ == 2 ? @$_ : (), [split /\t/, $_, 2] } <FH>;

if I just wanted to ignore "broken" records, or maybe, if I wanted to supply defaults for missing values:

my %pairs = map { chomp; map @$_ == 2 ? @$_ : (@$_ == 1 ? $_->[0] => undef : ()), [split /\t/, $_, 2] } <FH>;

or possibly written like

my %pairs = map { chomp; map @$_ == 2 ? @$_ : (@$_ == 1 ? @$_, undef : ()), [split /\t/, $_, 2] } <FH>;

depending on the psychological feeling I want to convey...

In all three examples what bothers me most, even though I'm far from being a premature-optimization kind of guy, is "having" to take a reference only to dereference it: it strikes me as doing something unnecessary. Still, I like the conceptual terseness, especially of the first, while the latter two begin to be visually obtrusive...
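A minimal sketch of one way around the reference round trip, assuming a temporary array inside the block is acceptable. The sample data and the in-memory filehandle are made up for illustration; only the split call comes from the examples above.

```perl
use strict;
use warnings;

# Hypothetical sample input: a good record, a record with a missing value,
# an empty line, and a record with an extra tab in the value.
my $data = "two\t2\none\n\nthree\t3\tX\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# Split each line into at most two fields and keep it only if both are there.
my %pairs = map {
    chomp;
    my @f = split /\t/, $_, 2;
    @f == 2 ? @f : ();
} <$fh>;
close $fh;
```

This trades the terseness of the nested map for a named array, so whether it reads better is a matter of taste.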

Whatever: I wonder whether Perl 6 provides a means to handle such situations in an even clearer, more concise and syntactically sweeter manner. Now that I'm about to finish writing this post, it occurs to me that in this particular case "the" solution may be a pair of adverbs for split which would respectively:

But what about more general situations?

One last-minute thought: one characteristic of English I've always envied, compared to my own mother tongue, Italian, is the ease with which one part of speech can be transformed into another. For example from noun to adjective, and so on. Now I wonder whether, in this vein, a list could be made equivalent to the action of taking a list, and thus whether it would be possible to apply an adverb to it, as in the paragraph above. Or am I just brainstorming too much, and perhaps inconsistently?

--
If you can't understand the incipit, then please check the IPB Campaign.

Re: [Perl 6] List of length 2 or...
by TimToady (Parson) on Oct 22, 2008 at 16:24 UTC
    I've come around to the opinion that the desire to add extra adverbs or arguments should usually be taken as a design smell that you're trying to use the wrong tool for the job. In this case, I think Perl 5 programmers have a bit of unlearning to do, since split has become one of those all-you-have-is-a-hammers that gets used inappropriately on various everything-is-a-nails because of the absence of its figure/ground counterpart, comb. In Perl 6 you should use whichever one is more appropriate and readable. So if I didn't mind slurping the whole file, I'd probably just say:
    my %hash = $fh.slurp.comb(/ ^^ (\T*) \t (\N*) /);
    though perhaps a Perl 5 programmer would be more comfortable with the implicit iteration of a global regex:
    my %hash = $fh.slurp ~~ m:g/ ^^ (\T*) \t (\N*) /;
    But I think the explicit use of .comb is more readable.
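For comparison, a rough Perl 5 counterpart of the global-match version can be sketched as below. It assumes the whole input is already in a string, and it uses `[^\t\n]` and `[^\n]` as stand-ins for the Perl 6 `\T` and `\N` classes; the sample text is invented.

```perl
use strict;
use warnings;

# Hypothetical slurped file contents.
my $text = "two\t2\none\nthree\t3\tX\n";

# /m lets ^ match at the start of every line; /g in list context returns
# the captures of every match, which is exactly a flat key/value list.
my %hash = $text =~ /^([^\t\n]*)\t([^\n]*)/mg;
```

Lines without a tab simply fail to match, so they are skipped without any extra check.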

    If I didn't want to slurp the file, I'd use the convenient .lines method instead of the generic but cryptic prefix:<=> operator. I might write some kind of list comprehension:

    my %hash = ($0,$1 if / ^^ (\T*) \t (\N*) / for $fh.lines);
    or maybe just the same thing written as a normal for loop:
    my %hash = do for $fh.lines { / ^^ (\T*) \t (\N*) / and $0,$1; }
    Or if I wanted to emphasize the data flow, I might use an explicit gather/take construct:
    my %hash = gather for $fh.lines { take / ^^ (\T*) \t (\N*) / || (); }
    I guess there's still more than one way to do it in Perl 6. :)
      Where might one read about .comb?
Re: [Perl 6] List of length 2 or...
by kyle (Abbot) on Oct 22, 2008 at 14:56 UTC

    I don't know about Perl 6, but I don't see any reason for the reference judo here.

    use Data::Dumper;

    my %h = map { chomp; (split /\s+/, $_, 2)[0,1] } <DATA>;
    print Dumper \%h;
    __DATA__
    two 2
    one
    three 3 X

    Output:

    $VAR1 = {
              'three' => '3 X',
              'one' => undef,
              'two' => '2'
            };

    But then, I probably also wouldn't use map to loop over a reading filehandle this way either. I don't think the brevity quite justifies reading the whole file before processing.
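A minimal sketch of the line-by-line alternative, assuming the same whitespace-separated input as in the example above. The in-memory filehandle stands in for a real file and is only there to keep the snippet self-contained.

```perl
use strict;
use warnings;

# Hypothetical input mirroring the __DATA__ section above.
my $data = "two 2\none\nthree 3 X\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my %h;
while (my $line = <$fh>) {
    chomp $line;
    my ($key, $value) = split /\s+/, $line, 2;
    next unless defined $key;    # an empty line yields no fields at all
    $h{$key} = $value;           # $value is undef when the line has one field
}
close $fh;
```

Only one line is held in memory at a time, at the cost of a few more lines of code than the map version.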

    Update: And if you want to add in default values, another map would make that explicit more easily than adding more into the one loop.

    my %h = map { defined ? $_ : $default } map { chomp; (split /\s+/, $_, 2)[0,1] } <DATA>;
      I don't know about Perl 6, but I don't see any reason for the reference judo here.

      No good reason but the obvious one that it just didn't occur to me! I see only one problem with it: it works nicely, or at least in accordance with my own chosen example, when one value is missing from a line, but when both are missing I get both

      • a warning under warnings;
      • an empty key associated with an undef value.

      See?

      aleph:~ [19:27:57]$ perl -wMData::Dumper -e 'print Dumper {undef() => +undef}'
      Use of uninitialized value in anonymous hash ({}) at -e line 1.
      $VAR1 = {
                '' => undef
              };

      I don't see any equally elegant and simple way to get rid of those. So we're somewhat back to my point...

      Again, I was also about to ask you, just to destroy the missing-one-value-only case too: what if undef were not the default value I wanted? (Actually, you won't believe it but just before posting the root node, the default value in my code was 0 and only before pressing the button did I change it to undef which I thought was better suited for a generic example.) But I see that you thought of it yourself:

      my %h = map { defined ? $_ : $default } map { chomp; (split /\s+/, $_, 2)[0,1] } <DATA>;

      In this respect I must say that whenever I see or think of chained maps and greps I feel uncomfortable with them, even if they're aesthetically appealing, and I convert them either to a single map, if at least one is present in the chain, or to a single grep (theoretically: I have yet to see an actual chain of greps...). In fact, I don't know whether the multiple loops they implicitly perform are actually optimized into a single one: this would certainly be easy when the mapping is 1-1, but not quite so in the general case... But often it is easy if the conversion is performed by a human, as in this case, and perhaps this may be the way to go:

      my %pairs = map { chomp; my($k,$v)=(split ' ', $_, 2)[0,1]; defined $k ? ($k => $v // $default) : () } <DATA>;

      At least, this is clear enough for a human to read. Otherwise another solution that I'm pasting here just for fun and still playing with references is:

      my %pairs = map { chomp; map @{ ([], [@$_, $default], $_)[scalar @$_] }, [split ' ', $_, 2] } <DATA>;

      Here, what bothers me most is the long scalar thingie, but Perl had to resolve the possibly ambiguous (LIST)[EXPR] expression and it did so by always assuming it's a list slice. Of course, I may use [0+@$_] but that would make me feel like I were playing golf, which is not the case now... But of course, it would be different if we were under Perl 6 and a single plus forcing numeric context would suffice: then again, if we were then we would be doing something entirely different to start with, so end of the story!

      To put it briefly, as far as I'm concerned, it would be best if there were some syntax available for working with lists without enclosing them in references, making my last solution more aesthetically appealing, intuitive and comprehensible, and avoiding the referencing / dereferencing madness...
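In Perl 5, one way to approximate this without new syntax is a small named sub, since its arguments arrive as a plain list in `@_`. The sketch below is only an illustration: `pair_or_default` is a hypothetical helper, and the sample lines and the default of 0 are made up.

```perl
use strict;
use warnings;

my $default = 0;

# Hypothetical helper: takes the fields as a plain list, no references.
# Two fields pass through, one field gets the default, none yields nothing.
sub pair_or_default {
    return @_ == 2 ? @_
         : @_ == 1 ? ($_[0], $default)
         :           ();
}

my @lines = ("two 2\n", "one\n", "\n", "three 3 X\n");
my %pairs = map { chomp; pair_or_default(split ' ', $_, 2) } @lines;
```

The list is still copied into `@_` on each call, so this addresses readability rather than the cost of the extra step.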

      But then, I probably also wouldn't use map to loop over a reading filehandle this way either. I don't think the brevity quite justifies reading the whole file before processing.

      Actually, in the node in which I (seriously) suggested the technique for the first time, I specified that it made sense because the file would have been processed as a whole anyway. Of course, it's still not exactly the same thing, and it is only a sensible thing to do if the file is reasonably sized. Since we're talking about Perl 6 here, it is worth remembering that things would be different with lazy list evaluation by default, which Perl 6 actually has.

      --
      If you can't understand the incipit, then please check the IPB Campaign.
Re: [Perl 6] List of length 2 or...
by moritz (Cardinal) on Oct 22, 2008 at 14:59 UTC
    In Perl 6 a hash is a list of Pair objects, so I guess your question translates to "How do I turn a List into a Pair?".

    In general, if you want to coerce an object of type Foo to type Bar, you call the .Bar method on that object. If class Foo provides such a conversion, that is.

    So if List defines a sensible conversion to Pair (which I don't know), it's as simple as

    my %hash = =$fh.map: { .split(/\t/, 2).Pair };

    If it's not, you can go the way of a temporary variable, which you seem to avoid in your examples:

    my %hash = =$fh.map: { my @a = .split(/\t/, 2); @a[0] => @a[1] };

    (This will store an undef as the value of the pair if there's no \t in $_.)

    If you don't want to use that temporary variable, you can re-use $_ in an inner lexical scope with this evil trick:

    my %hash = =$fh.map: { given .split(/\t/, 2) { .[0] => .[1] } };

    This uses .[$index] to index $_ (all method calls without an explicit object work on $_).

    It's not fundamentally better than the Perl 5 approach IMHO, so I'll keep thinking about a nicer solution.

Re: [Perl 6] List of length 2 or...
by Jenda (Abbot) on Oct 22, 2008 at 23:37 UTC
Re: [Perl 6] List of length 2 or...
by ikegami (Patriarch) on Oct 22, 2008 at 16:53 UTC

    map @$_ == 2 ? @$_ : (),
    is
    grep @$_ == 2,

    But what kind of sanitation silently discards?

      Not quite. grep @$_ == 2 returns $_ for each item, while the map solution returns @$_.

      The equivalent is map @$_, grep @$_ == 2, I think.
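The difference can be checked with a small sketch. The records below are invented sample data; the three expressions are the ones discussed in this subthread, written in block form.

```perl
use strict;
use warnings;

# Hypothetical already-split records: two complete pairs, two broken ones.
my @records = (['a', 1], ['b'], [], ['c', 3]);

# The original form: flatten complete pairs, drop everything else.
my @via_map = map { @$_ == 2 ? @$_ : () } @records;

# The suggested equivalent: filter first, then flatten.
my @via_grep = map { @$_ } grep { @$_ == 2 } @records;

# grep alone keeps the array references instead of their contents.
my @grep_only = grep { @$_ == 2 } @records;
```

Here `@via_map` and `@via_grep` both hold the flat list a, 1, c, 3, while `@grep_only` holds two array references.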