PerlMonks  

Re^5: Smart match in p5

by demerphq (Chancellor)
on Mar 14, 2005 at 16:24 UTC ( [id://439335]=note )


in reply to Re^4: Smart match in p5
in thread Smart match in p5

Double ~ is already extremely rare currently, and using ~~ such that it's parsed differently with my patch applied seems only possible in contrived situations.

Based on what analysis? Is this just your opinion based on your experience, or is it actually backed up by a sampling of Perl modules from CPAN?

My point here is that I think this is an interesting and clever idea that should go through the acceptance process, but IMO you will need better evidence than what you and Juerd have provided so far. I actually agree that this will probably prove to be a very rare construct, but you should have some solid numbers to back it up. Anyway, I hope you try to get this into blead; I think it would be nice.

---
demerphq

Replies are listed 'Best First'.
Re^6: Smart match in p5
by Juerd (Abbot) on Mar 14, 2005 at 16:44 UTC

    or is it actually backed up by a sampling of Perl modules from CPAN?

    CPAN is not a good source of information, as ~~ (as a short form of "scalar") is used mostly in one-liners. That use is still valid, except with the weird print syntax. But ~ was never meant to be used like this anyway; it just happens to be a reversible operation that works losslessly on both strings and numbers.

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      Thing is, it's not good enough to say "CPAN is not a good source of information", nor is it good enough to say "probably won't break anything".

      If CPAN isn't good enough, then what is better? If there isn't anything better, then CPAN is all you have. If you can turn around to P5P and say "gee whiz, in 5000 CPAN modules ~~ isn't used once", then you have an argument. If you say to them "CPAN isn't a good source of information as ~~ is only used in one-liners and this patch probably won't break anything", then I would have to assume they will not take your efforts seriously. Nor should they, and nor should you expect them to...

      ---
      demerphq

        I walked into this one by catching mention of '~~' and of a count of how many modules might contain it as the discussion was ending for a while in the CB, and was unable to come up with the numbers before it ended. Here, however, are my numbers and methods, in hopes they may be of help.

        • Version of perl: 5.8.3, installed on Mandrake 10.0 from RPM.
        • Command used to generate listing: perl -MCPAN -e 'autobundle;'
        • Number of modules installed (perl -ne 'print if (m/CONTENTS/..m/CONFIGURATION/ and not m/^\s+/ and not m/^=/);' ~/.cpan/Bundle/Snapshot_2005_03_14_00.pm | wc -l): 3_566
        • Number of files under /usr/lib/perl5 (find /usr/lib/perl5 -type f | wc -l): 8_982
        • Number of lines in files under /usr/lib/perl5 (find /usr/lib/perl5 -type f -exec cat "{}" \; | wc -l): 3_298_000
        • Number of files containing '~~' (find /usr/lib/perl5 -type f -exec grep -il '~~' "{}" \; | wc -l): 93
        • Number of lines containing '~~' (find /usr/lib/perl5 -type f -exec grep -i '~~' "{}" \; | wc -l): 170
        • Number of files containing '~~', by type (find /usr/lib/perl5/ -type f -exec grep -il '~~' "{}" \; | xargs file | gawk '{FS=":"; print $2;}' | perl -pe 's/^\s+//;' | sort | uniq -c | sort -n):
           1 ASCII Java program text
           1 ELF
           1 GIF image data, version 87a, 60 x 60
           1 GIF image data, version 89a, 60 x 60
           1 gzip compressed data, was "ArcBall.pm", from Unix, max compression
           1 gzip compressed data, was "Cartography.pm", from Unix, max compression
           1 gzip compressed data, was "Char.pm", from Unix, max compression
           1 gzip compressed data, was "Complex.pm", from Unix, max compression
           1 gzip compressed data, was "Core.pm", from Unix, max compression
           1 gzip compressed data, was "FlexRaw.pm", from Unix, max compression
           1 gzip compressed data, was "Image2D.pm", from Unix, max compression
           1 gzip compressed data, was "Misc.pm", from Unix, max compression
           1 gzip compressed data, was "PP.pm", from Unix, max compression
           1 gzip compressed data, was "Primitive.pm", from Unix, max compression
           1 gzip compressed data, was "Transform.pm", from Unix, max compression
           1 gzip compressed data, was "Window.pm", from Unix, max compression
           1 ISO-8859 C program text
           1 ISO-8859 English text, with CRLF line terminators
           3 data
           3 Perl5 module source text
          11 ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), not stripped
          13 ASCII English text, with CRLF line terminators
          20 ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), stripped
          25 ASCII English text

        In looking through those files (minus those ending in '.so'), all cases I could find of '~~' appeared either in format statements, as delimiters, or in comments.
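The find/grep survey above looks at every file under the tree, shared objects and images included. A narrower count could read only .pm and .pl files and skip full-line comments and POD. What follows is a minimal sketch of that idea, not the method used for the numbers above: count_tilde_lines and scan_tree are hypothetical names, and the filter does not recognise format pictures, quote delimiters or trailing comments, so any hits would still need to be read by hand.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: count lines containing '~~' in a chunk of Perl
# source, ignoring full-line comments and POD blocks.
sub count_tilde_lines {
    my ($source) = @_;
    my ($count, $in_pod) = (0, 0);
    for my $line (split /\n/, $source) {
        if ($line =~ /^=cut\b/)    { $in_pod = 0; next }   # POD block ends
        if ($line =~ /^=[a-zA-Z]/) { $in_pod = 1; next }   # POD block starts
        next if $in_pod;
        next if $line =~ /^\s*#/;                          # full-line comment
        $count++ if index($line, '~~') >= 0;
    }
    return $count;
}

# Hypothetical driver: walk a tree and return { path => hit count }
# for every .pm/.pl file with at least one hit.
sub scan_tree {
    my ($root) = @_;
    my %hits;
    find(sub {
        return unless -f $_ && /\.(?:pm|pl)\z/;
        open my $fh, '<', $_ or return;
        my $text = do { local $/; <$fh> };
        $text = '' unless defined $text;
        my $n = count_tilde_lines($text);
        $hits{$File::Find::name} = $n if $n;
    }, $root);
    return \%hits;
}

# Small in-memory sample: two code lines, one comment, one POD line.
my $sample = join "\n",
    'my $n = ~~@list;',
    '# ~~ in a comment',
    '=pod',
    '~~ in POD',
    '=cut',
    'print ~~localtime;';

print count_tilde_lines($sample), "\n";    # prints 2
```

Calling scan_tree('/usr/lib/perl5') would give a per-file count for the same tree that was surveyed above.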

        I do not know if this information is of use, but I hope it will prove helpful. (The list of modules and versions is below.)

        Update: 14 Mar 2005
        The listing was too long; therefore, the remainder of the list will be appended as a response to this node.

Re^6: Smart match in p5
by Anonymous Monk on Mar 14, 2005 at 17:36 UTC
    CPAN is irrelevant when it comes to getting an estimate. Sure, if it breaks a gazillion CPAN modules, it'll break a lot. But the vast majority of Perl code out there isn't on CPAN, and isn't written by people keeping in touch with Perl development or the Perl community.

    The vast majority of Perl code is written by anonymous coders. Good coders, bad coders, coders who know a lot of Perl, coders who use a subset, coders who code by example. It's even unknown which systems perl runs on, so any estimate of how many programs it will break is just guesswork.

      Sorry, but your argument amounts to "we will never know how likely it is to get cancer from smoking because the vast majority of people aren't in our study, and there are more people outside our study than in it, thus the data on how many people got cancer is totally useless".

      Good thing you aren't a doctor, eh?

      The fact is that if you check CPAN and find out that 100% of the modules on it use ~~, then it's a pretty darn good suggestion this will break a lot of things. If you check it and find out that it doesn't occur even once, then it's a pretty darn good suggestion it won't be found much in the wild. Personally, were I pumpking, I would feel a lot more at ease integrating a patch in the latter case, but certainly wouldn't in the former.

      Can we stop it with the "1 GB sample of code can't be used to establish usage frequencies" argument? It's a bit tired, and wasn't a good one in the first place anyway.

      ---
      demerphq

        Can we stop it with the "1 GB sample of code can't be used to establish usage frequencies" argument? It's a bit tired, and wasn't a good one in the first place anyway.

        It's a very good sample of code, when you sample modules. It is a fact that many Perl programs are written in an entirely different style. These quick sysadmin scripts, screen scrapers, data processors, etcetera, you don't find on CPAN. CPAN is mostly modules, and many people have a very different programming style for those. And since CPAN is public code, I think it is sane to assume people pay more attention to their style when writing for CPAN than for "nobody else will ever read or have to maintain this".

        On topic: there is no ~~ operator in Perl 5 yet. There is a ~ operator, and it can of course be stacked. The ~ operator does bitwise negation, and works on both numbers and strings, preserving the type. If you negate twice, you get the original input back. The side effects are interesting: it forces scalar context, and it stringifies things that aren't a number. Even though this is very interesting, it is a technique found mostly in golf and obfu. The readable, maintainable and documented way of forcing scalar context is the scalar operator. For stringification, "" . $foo and "$foo" are suggested and most commonly used. This is based on experience, not numbers.
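As an illustration of the stacked ~ described above, here is a small sketch using two unary ~ operators in term position, with plain ASCII data and without the newer bitwise feature enabled. The variable names are made up for the example.

```perl
use strict;
use warnings;

my @items = ('a', 'b', 'c');

# Unary ~ puts its operand in scalar context, and applying it twice
# restores the value, so ~~@items yields the element count here,
# just like scalar(@items).
my @via_tildes = (~~@items);
my @via_scalar = (scalar @items);
print "@via_tildes @via_scalar\n";    # prints "3 3"

# On a plain string, ~ complements each byte and the second ~ undoes it.
my $word       = "perl";
my $round_trip = ~~$word;
print "string survives the round trip\n" if $round_trip eq $word;

# The documented spellings mentioned above.
my $count = scalar @items;
my $text  = "" . $count;
```

Note that only small non-negative integers and byte strings are shown; the sketch makes no claim about other kinds of values.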

        ~ is a unary prefix operator, the proposed ~~ is an infix operator. In practice, this means the only ambiguity you can get is when ~~ (without any whitespace in between the two operators) is abused to force scalar context for the first argument of a subroutine/function that is called without parentheses. The patch xmath made naturally resolves this to the old behaviour, breaking absolutely nothing. This is in agreement with the existing precedence table, which would put ~~ at the same level as the other equality tests. Ambiguity now only exists when there can be a term between the function's name and the first argument, without anything separating it from the first argument except maybe some whitespace. This is found in print $fh ~~localtime, but given that ~~ is obfuscation and a golf idiom, known only by Perl hackers and not found used in CPAN modules at all, my opinion is that it is safe to assume nothing will break. This time, it's based on actual research instead of "just" our combined experience.

        Still, if despite all the unlikeliness something does break, it can be fixed in at least three ways. The first is writing what you mean instead of abusing ~, meaning you change ~~ to either scalar or "" . $foo. The second is simply inserting whitespace or the unary no-op operator + between the two tildes: ~ ~ or ~+~. The third is adding parens so that ~~ immediately follows (, which means it can no longer be interpreted as an infix operator.
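The three rewrites can be compared side by side. This sketch uses ordinary assignments rather than the print $fh position, so it only shows that the spellings produce the same value; localtime(0) is used so that every call sees the same timestamp, and the variable names are made up.

```perl
use strict;
use warnings;

# The golf idiom: two unary ~ operators forcing scalar context.
my $idiom = ~~localtime(0);

# 1. Say what you mean.
my $explicit = scalar localtime(0);

# 2. Separate the tildes, with whitespace or with unary plus.
my $spaced = ~ ~localtime(0);
my $plus   = ~+~localtime(0);

# 3. Parenthesise, so that ~~ directly follows an opening paren.
my ($parens) = (~~localtime(0));

for my $variant ($explicit, $spaced, $plus, $parens) {
    print $variant eq $idiom ? "same\n" : "different\n";
}
```

All four variants should compare equal to the idiom, since each one only changes how the two negations are written.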

        Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
