[Perl 6] List of length 2 or...

Yesterday, in two different replies to what is basically the same question (how to "read pairs from a file, in a situation that's natural to be handled by split, and put them into a hash?"), I hinted at the fact that whichever technique one adopts, she should sanitize the pair she gets in case it is not actually a pair. I also hinted at the fact that the technique I "chose" is more sensitive to such an inconvenience. There are tons of ways to sanitize it in Perl 5 already. One that I'm certain some people would disagree with, but that I would probably choose, is:

my %pairs = map { chomp; map @$_ == 2 ? @$_ : (), [split /\t/, $_, 2] } <FH>;

if I just wanted to ignore "broken" records, or maybe, if I wanted to supply defaults for missing values:

my %pairs = map { chomp; map @$_ == 2 ? @$_ :
                (@$_ == 1 ? ($_->[0] => undef) : ()),
                [split /\t/, $_, 2] } <FH>;

or possibly written like

my %pairs = map { chomp; map @$_ == 2 ? @$_ :
                (@$_ == 1 ? (@$_, undef) : ()),
                [split /\t/, $_, 2] } <FH>;

depending on the psychological feeling I want to convey...

In all three examples, what bothers me most (even though I'm far from being a premature-optimization kind of guy) is "having" to take a reference only to dereference it: it strikes me as doing something unnecessary... But I like the conceptual terseness, especially of the first, while the latter two begin to be visually obtrusive...
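To make the difference between the variants concrete, here is a minimal, self-contained sketch. The sample data and the in-memory filehandle are assumptions made for illustration only; the two expressions are the first and third variants above, written with explicit blocks and parentheses.

```perl
use strict;
use warnings;

# Hypothetical sample input: two good records, one record with no value,
# and one empty line.
my $data = "alpha\t1\nbeta\n\ngamma\t3\n";

# Variant 1: silently drop anything that is not a two-element record.
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
my %strict = map { chomp; map { @$_ == 2 ? @$_ : () } [split /\t/, $_, 2] } <$fh>;
close $fh;

# Variant 3: keep one-element records, with undef as the missing value.
open $fh, '<', \$data or die "cannot open in-memory handle: $!";
my %lenient = map {
    chomp;
    map { @$_ == 2 ? @$_ : @$_ == 1 ? (@$_, undef) : () } [split /\t/, $_, 2]
} <$fh>;
close $fh;

print join(',', sort keys %strict),  "\n";   # alpha,gamma
print join(',', sort keys %lenient), "\n";   # alpha,beta,gamma
```

In both cases the empty line yields a zero-element list from split and therefore contributes nothing to the hash.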

Whatever: I wonder whether Perl 6 provides a means to handle such situations in an even clearer, more concise and syntactically sweeter manner. Now that I'm about to finish writing the post, it occurs to me that in this particular case "the" solution may be a pair of adverbs for split, which would respectively:

specify the length of the returned list, much like the current third parameter;
specify how to fill it in case the split items are not enough, either in a dwimmy way or by some list construction method (and we've seen quite cool ones lately).
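Just to pin down what such a pair of adverbs would do, here is a rough Perl 5 emulation. The name split_fill and its argument order are purely hypothetical; nothing like it is built in, and this is only a sketch of the idea, not a proposal for the actual Perl 6 syntax.

```perl
use strict;
use warnings;

# split_fill is a hypothetical helper, not a built-in: it splits $str on
# $sep into at most $n fields and pads the result with $fill so that the
# caller always gets exactly $n elements back.
sub split_fill {
    my ($sep, $str, $n, $fill) = @_;
    my @fields = split $sep, $str, $n;
    push @fields, ($fill) x ($n - @fields);
    return @fields;
}

my @full  = split_fill(qr/\t/, "key\tvalue", 2, 'n/a');   # ('key', 'value')
my @short = split_fill(qr/\t/, "key",        2, 'n/a');   # ('key', 'n/a')
my @empty = split_fill(qr/\t/, "",           2, 'n/a');   # ('n/a', 'n/a')

print scalar(@full), " ", scalar(@short), " ", scalar(@empty), "\n";   # 2 2 2
```

Note that an empty record gets padded too, so a caller who wants to drop such records would still have to filter them separately.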

But what about more general situations?

One last-minute thought: one characteristic of English I've always envied, compared to my own mother tongue (Italian), is the ease with which one part of speech can be transformed into another, for example a noun into an adjective, and so on. Now I wonder whether, in this vein, a list could be made equivalent to the action of taking a list, and thus whether it would be possible to apply an adverb to it, as in the paragraph above. Or am I just brainstorming too much, and perhaps inconsistently?

--
~~If you can't understand the incipit, then please check the IPB Campaign.~~


Re: [Perl 6] List of length 2 or...
by TimToady (Parson) on Oct 22, 2008 at 16:24 UTC

split

comb

my %hash = $fh.slurp.comb(/ ^^ (\T*) \t (\N*) /);

my %hash = $fh.slurp ~~ m:g/ ^^ (\T*) \t (\N*) /;

.comb

If I didn't want to slurp the file, I'd use the convenient .lines method instead of the generic but cryptic prefix:<=> operator. I might write some kind of list comprehension:

my %hash = ($0,$1 if / ^^ (\T*) \t (\N*) / for $fh.lines);

for

my %hash = do for $fh.lines {
    / ^^ (\T*) \t (\N*) / and $0,$1;
}

gather

take

my %hash = gather for $fh.lines {
    take / ^^ (\T*) \t (\N*) / || ();
}

:)


Re^2: [Perl 6] List of length 2 or...

by Anonymous Monk on Oct 23, 2008 at 10:57 UTC

Where might one read about .comb?


Re^3: [Perl 6] List of length 2 or...

by moritz (Cardinal) on Oct 23, 2008 at 11:02 UTC

S29

in the test suite


Re: [Perl 6] List of length 2 or...
by kyle (Abbot) on Oct 22, 2008 at 14:56 UTC

I don't know about Perl 6, but I don't see any reason for the reference judo here.

use Data::Dumper;
my %h = map { chomp; (split /\s+/, $_, 2)[0,1] } <DATA>;
print Dumper \%h;
__DATA__
two 2
one
three 3 X

Output:

$VAR1 = { 
          'three' => '3 X',
          'one' => undef,
          'two' => '2'
        };

But then, I probably also wouldn't use map to loop over a reading filehandle this way either. I don't think the brevity quite justifies reading the whole file before processing.
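For comparison, a line-by-line version might look like the sketch below. The in-memory filehandle and the variable names are assumptions made so that the example runs on its own; with a real file only the open call would change.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for a real file.
my $data = "two 2\none\nthree 3 X\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

my %h;
while (my $line = <$fh>) {
    chomp $line;
    my ($key, $value) = split /\s+/, $line, 2;
    next unless defined $key;      # skip empty lines
    $h{$key} = $value;             # undef when the line has no value
}
close $fh;

print join(',', sort keys %h), "\n";   # one,three,two
```

Only one line is held in memory at a time, and there is an obvious place to add a warning or a default for malformed records.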

Update: And if you want to add in default values, another map would make that explicit more easily than cramming more into the one loop.

my %h = map { defined($_) ? $_ : $default }
        map { chomp; (split /\s+/, $_, 2)[0,1] } <DATA>;


Re^2: [Perl 6] List of length 2 or...

by blazar (Canon) on Oct 23, 2008 at 18:29 UTC

I don't know about Perl 6, but I don't see any reason for the reference judo here.


No good reason but the obvious one: it just didn't occur to me! I see only one problem with it: it works nicely, or at least in accordance with my own chosen example, when one value is missing from a line, but when both are missing I get both

a warning under warnings;
an empty key associated with an undef value.

See?

aleph:~ [19:27:57]$ perl -wMData::Dumper -e 'print Dumper {undef() => undef}'
Use of uninitialized value in anonymous hash ({}) at -e line 1.
$VAR1 = {
          '' => undef
        };

I don't see any equally elegant and simple way to get rid of those. So we're somewhat back to my point...

Again, I was also about to ask you, just to destroy the missing-one-value-only case too: what if undef were not the default value I wanted? (Actually, you won't believe it but just before posting the root node, the default value in my code was 0 and only before pressing the button did I change it to undef which I thought was better suited for a generic example.) But I see that you thought of it yourself:

my %h = map { defined($_) ? $_ : $default } map { chomp; (split /\s+/, $_, 2)[0,1] } <DATA>;

In this respect I must say that whenever I see or think of chained maps and greps I feel uncomfortable with them, even if they're aesthetically appealing, and I convert them either to a single map, if at least one is present in the chain, or to a single grep (theoretically: I have yet to see an actual chain of greps...). In fact, I don't know whether the multiple loops they implicitly perform are actually optimized into a single one: this would certainly be easy when the mapping is 1-1, but not quite so in the general case... But often it is possible if the operation is performed by a human, as in this case, and perhaps this may be the way to go:

my %pairs = map { chomp; 
    my($k,$v)=(split ' ', $_, 2)[0,1];
    defined $k ? ($k => $v // $default) : () } <DATA>;

At least, this is clear enough for a human to read. Otherwise another solution that I'm pasting here just for fun and still playing with references is:

my %pairs = map { chomp; 
    map @{ ([], [@$_, $default], $_)[scalar @$_] }, 
    [split ' ', $_, 2] } <DATA>;

Here, what bothers me most is the long scalar thingie, but Perl had to resolve the possibly ambiguous (LIST)[EXPR] expression, and it did so by always assuming it's a list slice. Of course, I could use [0+@$_], but that would make me feel as if I were playing golf, which is not the case now... It would be different if we were under Perl 6, where a single plus forcing numeric context would suffice; then again, under Perl 6 we would be doing something entirely different to start with, so end of the story!

To put it briefly, as far as I'm concerned, it would be best if there were some syntax that enabled working with lists without enclosing them in references, making my last solution more aesthetically appealing, intuitive and comprehensible, and avoiding the referencing/dereferencing madness...

But then, I probably also wouldn't use map to loop over a reading filehandle this way either. I don't think the brevity quite justifies reading the whole file before processing.

Actually, in the node in which I (seriously) suggested the technique for the first time, I specified that it made sense because the file would have been processed as a whole anyway. Of course, it's still not exactly the same thing, and it is only a sensible thing to do if the file is reasonably sized. Since we're talking about Perl 6 here, it is worth recalling that things would be different with lazy list evaluation by default, which Perl 6 actually has.



Re: [Perl 6] List of length 2 or...
by moritz (Cardinal) on Oct 22, 2008 at 14:59 UTC

Pair

How do I turn a List into a Pair?

In general, if you want to coerce an object of type Foo to type Bar, you call the .Bar method of that object. If class Foo provides such a conversion, that is.

So if List defines a sensible conversion to Pair (which I don't know), it's as simple as

my %hash = =$fh.map: { .split(/\t/, 2).Pair };

If it's not, you can go the way of a temporary variable, which you seem to avoid in your examples:

my %hash = =$fh.map: { my @a = .split(/\t/, 2); @a[0] => @a[1] };

(This will store an undef as the value of the pair if there's no \t in $_.)

If you don't want to use that temporary variable, you can re-use $_ in an inner lexical scope with this evil trick:

my %hash = =$fh.map: { given .split(/\t/, 2) { .[0] => .[1] } };

This uses .[$index] to index $_ (all method calls without an explicit object work on $_).

It's not fundamentally better than the Perl 5 approach IMHO, so I'll keep thinking about a nicer solution.


Re: [Perl 6] List of length 2 or...
by Jenda (Abbot) on Oct 22, 2008 at 23:37 UTC

I'd actually rather sanitize the input

my %pairs = map {chomp; split /\t/, $_, 2} grep /.\t./, <FH>;
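A small sketch of what that pre-filter lets through. The sample lines are made up for illustration, and an array stands in for the filehandle.

```perl
use strict;
use warnings;

# Hypothetical input lines: only the first and last have a non-empty key
# and a non-empty value around the tab.
my @lines = ("alpha\t1\n", "beta\n", "\tno key\n", "gamma\t\n", "delta\t4\n");

# Keep only lines with at least one character on each side of a tab.
# "." does not match a newline, so "gamma\t\n" is rejected as well.
my @good  = grep { /.\t./ } @lines;
my %pairs = map { chomp; split /\t/, $_, 2 } @good;

print join(',', sort keys %pairs), "\n";   # alpha,delta
```

Since every surviving line is known to contain a tab with something on both sides, split always returns exactly two fields and no reference is needed.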

Jenda
Support Denmark!
Defend the free world!


Re: [Perl 6] List of length 2 or...
by ikegami (Patriarch) on Oct 22, 2008 at 16:53 UTC

map @$_ == 2 ? @$_ : (),
is
grep @$_ == 2,

But what kind of sanitation silently discards?


Re^2: [Perl 6] List of length 2 or...

by moritz (Cardinal) on Oct 22, 2008 at 16:58 UTC

grep @$_ == 2

$_

@$_

The equivalent is map @$_, grep @$_ == 2, I think.
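A quick way to check the equivalence, as a minimal sketch with made-up records:

```perl
use strict;
use warnings;

# Hypothetical records: array references of varying length.
my @records = (['a', 1], ['b'], [], ['c', 3]);

# One pass: the ternary both filters and flattens.
my @single  = map { @$_ == 2 ? @$_ : () } @records;

# Two passes: grep keeps the references, map flattens them.
my @chained = map { @$_ } grep { @$_ == 2 } @records;

# A bare grep returns the references themselves, not their contents.
my @refs    = grep { @$_ == 2 } @records;

print "@single\n";            # a 1 c 3
print "@chained\n";           # a 1 c 3
print scalar(@refs), "\n";    # 2
```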

