

by AppleFritter (Priest)
on Apr 27, 2014 at 20:42 UTC (#1084031=user)

Howdy, partner! Name's Apple Fritter, pleasure to meet y'all! I use Perl, but I don't know that much about it (yet). I'm trying to change that, so I frequent the Monastery, reading others' answers and code to learn, and providing my own answers and code to hone my skills.

Not affiliated with Tom Owad's

Note: I'm not active on Perlmonks anymore. Here's why, in a nutshell.

For new users:

N.B. when crossposting to several sites, it is considered polite to inform readers of this and provide links to avoid unnecessary/duplicated effort.

For established users:

The Monastery:

General infrastructure:

Perl culture:

Misc. (unordered, unsorted):

  • Earlier versions of this node (Wayback Machine)
  • Perl Reference Card v2 (PDF)
  • Perl Best Practices Reference Guide (PDF)
  • Rebase issues in Cygwin ("child_info_fork::abort: address space needed by 'IO.dll' (...) is already occupied" etc.): rebaseall
  • Dictionary of Algorithms, Data Structures, and Problems
  • Suggestions for working with poor code
  • Higher-order Perl:
  • The Timeline of Perl and its Culture
  • "Evil uses for Perl":
  • Different programming languages:
  • Rants and criticism:
  • Perl forks etc.:
    • dave_the_m, Re: What's the perl5's future?:

      [...] AFAIKT, stableperl isn't intended to be an ongoing fork of perl; its simply a snapshot of 5.22 created by Marc [Lehmann] with whatever commits broke his code removed.

      cperl is a recent fork of perl by Reini Urban who is incapable of working with others, so is writing a perl where he can do whatever he likes, unfettered by the difficulties of having to reach consensus. [...]

    • cperl
    • stableperl
    • RPerl
  • Games:
  • Useful tools, scripts and programs:
  • Perl Obfuscator
  • Seven Levels of Perl Mastery
  • unmaintainable code
  • Hiding the source code for a Perl program:
  • Symbol names: Almost 28 new names for 32 old marks, Re: Almost 28 new names for 32 old marks
  • I want more monkquips
  • Perl Humour
  • Interactive Perl shell (aka Read-Eval-Print Loop): Devel::REPL (includes the shell)
  • perlepigraphs - list of Perl release epigraphs
  • How's your Perl?
  • Re^7: About GD Image Data Output Methods - description of the GD image format
  • Dominus's file of Good Advice:

    1. You cannot just paste code with no understanding of what is going on and expect it to work.
    2. You can't just make shit up and expect the computer to know what you mean, Retardo!
    3. You said it didn't work, but you didn't say what it would have done if it *had* worked.
    4. What are you really trying to accomplish here?
    5. Who the fuck cares which one is faster?
    6. Now is the time in our program where you look at the manual.
    7. Look at the error message! Look at the error message!
    8. Looking for a compiler bug is the strategy of LAST resort. LAST resort.
    9. Premature optimization is the root of all evil.
    10. Bad programmer! No cookie!
    11. I see you omitted $! from the error message. It won't tell you what went wrong if you don't ask it to.
    12. You wrote the same thing twice here. The cardinal rule of programming is that you never ever write the same thing twice.
    13. Evidently it's important to you to get the wrong answer as quickly as possible.
    14. Gee, I don't know. I wonder what the manual says about that?
    15. Well, no duh. That's because you ignored the error message, dimwit.
    16. Only Sherlock Holmes can debug the program by pure deduction from the output. You are not Sherlock Holmes. Run the fucking debugger already.
    17. Always ignore the second error message unless the meaning is obvious.
    18. Read. Learn. Evolve.
    19. Well, then get one that *does* do auto-indent. You can't do good work with bad tools.
    20. No. You must believe the ERROR MESSAGE. You MUST believe the error message.
    21. The error message is the Truth. The error message is God.
    22. It could be anything. Too bad you didn't bother to diagnose the error, huh?
    23. You don't suppress error messages, you dumbass, you PAY ATTENTION and try to understand them.
    24. Never catch a signal except as a last resort.
    25. Well, if you don't know what it does, why did you put it in your program?
    26. Gosh, that wasn't very bright, was it?
    27. That's like taking a crap on someone's doorstep and then ringing the doorbell to ask for toilet paper.
    28. A good approach to that problem would be to hire a computer programmer.
    29. First get a book on programming. Then read it. Then write the program.
    30. First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
    31. Would you like to see my rate card?
    32. I think you are asking the wrong question here.
    33. Holy cow.
    34. Because it's a syntax error.
    35. Because this is Perl, not C.
    36. Because this is Perl, not Lisp.
    37. Because that's the way it is.
    38. Because.
    39. If you have `some weird error', the problem is probably with your frobnitzer.
    40. Because the computer cannot read your mind. Guess what? I cannot read your mind *either*.
    41. You said `It doesn't work'. The next violation will be punished by death.
    42. Of course it doesn't work! That's because you don't know what you are doing!
    43. Sure, but you have to have some understanding also.
    44. Ah yes, and you are the first person to have noticed this bug since 1987. Sure.
    45. Yes, that's what it's supposed to do when you say that.
    46. Well, what did you expect?
    47. Perhaps you have forgotten that this is an engineering discipline, not some sort of black magic.
    48. You know, this sort of thing is amenable to experimental observation.
    49. Perhaps your veeblefitzer is clogged.
    50. What happens when you try?
    51. Now you are just being superstitious.
    52. Your question has exceeded the system limit for pronouns in a single sentence. Please dereference and try again.
    53. In my experience that is a bad strategy, because the people who ask such questions are the ones who paste the answer into their program without understanding it and then complain that it `does not work'.
    54. Of course, this is a heuristic, which is a fancy way of saying that it doesn't work.
    55. If your function is written correctly, it will handle an empty array the same way as a nonempty array.
    56. When in doubt, use brute force.
    57. Well, it might be more intuitive that way, but it would also be useless.
    58. Show the code.
    59. The bug is in you, not in Perl.
    60. Cargo-cult.
    61. So you threw in some random punctuation for no particular reason, and then you didn't get the result you expected. Hmmmm.
    62. How should I know what is wrong when I haven't even seen the code? I am not clairvoyant.
    63. How should I know how to do what you want when you didn't say what you wanted to do?
    64. It's easy to get the *wrong* answer in O(1) time.
    65. I guess this just goes to show that you can lead a horse to water, but you can't make him drink it.
    66. You are a stupid asshole. Shut the fuck up.

  • Wikipedia on Smalltalk-80:

    Even the statement true become: false is valid in Smalltalk, although executing it is not recommended.

Monk quotes:

Do not fear death, you will re-awaken to a world built with Perfect Perl 7 and no Python.
-- boftx, Re^3: Using die() in methods

the moment you try to separate the physical construction of code -- kloc, function points, abstracts test quantities -- from the intellectual processes of gathering requirements; understanding work-patterns and flows; and imagining suitable, appropriate, workable algorithms to meet them; you do not have sufficient understanding of the process involved in code development to be making decisions about it.
-- BrowserUk, Re: Nobody Expects the Agile Imposition (Part VII): Metrics

You were unlucky in the sense that your program seems to have remained valid Perl even with all variables removed.
-- Corion, Re: [OneLiner] What am I doing wrong in my regex?

I insist on being paid to use Windows products, sir!
-- Your Mother, Re^3: PerlWizard - A free wizard for automatic Perl software code generation using simple forms

No further rational discussion is possible here because I find your preferred style utterly abhorrent :)
-- BrowserUk, Re^3: Porting (old) code to something else

Remember the Perl motto: when in doubt, use a hash!
-- Athanasius, Re^3: Need help in extracting timestamp from the line in a file

AppleFritter elsewhere:

Two monks sat together for lunch. The first monk said, "What do you see when you see me?"
The second replied, "I see a reflection of the Buddha."
The first, feeling nasty, said, "When I look at you, I see a pile of shit."
The second just smiled. The first turned angry. "Why are you smiling?"
The second replied, "What comes out of a man is a reflection of what's inside a man. I am filled with the Buddha nature, so everywhere I look, I see a reflection of the Buddha."

Posts by AppleFritter
Size-limited, fitness-based lists in Cool Uses for Perl
3 direct replies
by AppleFritter
on Aug 08, 2015 at 19:05

    Monks and monkettes! I recently found myself wondering, what are the longest words in the dictionary (/usr/share/dict, anyway)?

    This is easily found out, but it's natural to be interested not just in the longest word but (say) the top ten. And when your dictionary contains (say) eight words of length fifteen and six words of length fourteen, it's also natural to not want to arbitrarily select two of the latter, but list them all.

    I quickly decided I needed a type of list that would have a concept of the fitness of an item (not necessarily the length of a word), and try not to exceed a maximum size if possible (while retaining some flexibility). My CPAN search-fu is non-existent, but since it sounded like fun, I just rolled my own. Here's the first stab at what is right now called List::LimitedSize::Fitness (if anyone's got a better idea for a name, please let me know):

    This features both "flexible" and "strict" policies. With the former, fitness classes are guaranteed to never lose items, but the list as a whole might grow beyond the specified maximum size. With the latter, the list is guaranteed to never grow beyond the specified maximum size, but fitness classes might lose items. (Obviously you cannot have it both ways, not in general.)
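
    The module's own code is not reproduced here, so what follows is only a minimal sketch of the two policies as described above, not the author's implementation. The names new_list and add_item and the hash layout are hypothetical, and fitness is assumed to be numeric, with higher values being fitter.

```perl
use strict;
use warnings;

# Hypothetical constructor: a list is a plain hash with a size limit,
# a policy name, and items grouped into classes by fitness.
sub new_list {
    my ($max_size, $policy) = @_;
    return { max => $max_size, policy => $policy, classes => {}, size => 0 };
}

# Add an item, then trim the least fit class according to the policy.
sub add_item {
    my ($list, $item, $fitness) = @_;
    push @{ $list->{classes}{$fitness} }, $item;
    $list->{size}++;

    while ($list->{size} > $list->{max}) {
        my ($lowest) = sort { $a <=> $b } keys %{ $list->{classes} };
        my $class = $list->{classes}{$lowest};

        if ($list->{policy} eq 'strict') {
            # Strict: drop single items, so a class may lose members.
            pop @$class;
            $list->{size}--;
            delete $list->{classes}{$lowest} unless @$class;
        }
        else {
            # Flexible: drop the whole class, but only if the list
            # would still hold at least max items afterwards.
            last if $list->{size} - @$class < $list->{max};
            $list->{size} -= @$class;
            delete $list->{classes}{$lowest};
        }
    }
    return $list;
}
```

    With a limit of 3 and words scored by length, adding a, bb, cc, ddd and eee leaves the flexible list one item over the limit (both two-letter words survive), while the strict list stays at three items and the two-letter class loses a member.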

    Here's an example of the whole thing in action:

    This might output (depending on your dictionary):

    $ perl wordsEn.txt
    ..........
    length 21
    antienvironmentalists
    antiinstitutionalists
    counterclassification
    electroencephalograms
    electroencephalograph
    electrotheraputically
    gastroenterologically
    internationalizations
    mechanotheraputically
    microminiaturizations
    microradiographically
    length 22
    counterclassifications
    counterrevolutionaries
    electroencephalographs
    electroencephalography
    length 23
    disestablismentarianism
    electroencephalographic
    length 25
    antidisestablishmentarian
    length 28
    antidisestablishmentarianism
    19 words total (10 requested).
    $

    If you've got any thoughts, tips, comments, rotten tomatoes etc., send them my way! (...actually, forget about the rotten tomatoes.)

    Also, does anyone think this module would be useful to have on CPAN, in principle if not in its current state?

Resetting a flip-flop operator in Seekers of Perl Wisdom
1 direct reply
by AppleFritter
on Aug 06, 2015 at 06:52

    Greetings, esteemed monks! Allow this humble pony to drink the sweet nectar of knowledge from the font of your collective wisdom. (Or alternatively, how 'bout some hard cider?)

    I need to read a number of files. In each file, each line holds a piece of data, or a marker indicating the beginning or end of a section; I'm interested only in data in a specific section. Normally, I'd do something like this:

    foreach my $HANDLE (@HANDLES) {
        while(<$HANDLE>) {
            chomp;
            next unless /^PP_START$/ .. /^PP_END$/;
            # process line
        }
    }

    However, it turns out that in these log files, the section end marker may be omitted if there is no following section: the end of the file itself indicates the end of the section then.

    This wreaks havoc with the above logic, as the flip-flop operator, not having seen the marker, still evaluates to true when the outer loop moves on to the next file, and wrongly causes lines before the start marker in that file to be processed.

    Of course it would be trivial to add a flag indicating whether I'm in the right section, and reset that for each file. But doing that would essentially manually emulate the flip-flop operator, which strikes me as less than elegant. So I'm wondering -- is there a way to "reset" the flip-flop operator, as it were, so that it starts returning false again at the beginning of each new file?
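
    One idiom sometimes used for this situation (offered here as a sketch, not as the answer the post received) is to let end-of-file count as an implicit end marker, so the flip-flop is already false again when the next file starts. The helper name section_lines is hypothetical; eof with a filehandle argument is the standard built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the data lines found between PP_START and
# PP_END across several handles, treating end-of-file as an end marker.
sub section_lines {
    my @handles = @_;
    my @found;
    foreach my $handle (@handles) {
        while (my $line = <$handle>) {
            chomp $line;
            # The right-hand side becomes true on PP_END or on the last
            # line of the file, which switches the flip-flop off again.
            next unless $line =~ /^PP_START$/
                .. ($line =~ /^PP_END$/ || eof($handle));
            next if $line =~ /^PP_(?:START|END)$/;   # skip the markers
            push @found, $line;
        }
    }
    return @found;
}
```

    The last line of a file is still processed when it lies inside the section; only the state carried over into the next file changes.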

"Unrecognized character" while use utf8 is in effect in Seekers of Perl Wisdom
2 direct replies
by AppleFritter
on Apr 17, 2015 at 06:03

    Oh monks most tawny and tangy, whose wisdom and knowledge of all things Perl is unalienable and indefeasible, help me out, for I'm very much missing the obvious.

    As you will well know, Perl allows Unicode characters in variable names, so long as use utf8; is in effect. So the following snippet works as expected (apologies for the unresolved HTML entities, Perlmonks itself does not handle Unicode properly):

    my $&#x4EBA; = "World";
    say "Hello, $&#x4EBA;";

    However, the following does not:

    my $&#x1F310; = "World";
    say "Hello, $&#x1F310;";

    Perl 5.20.0 complains about this, saying:

    Unrecognized character \x{1f310}; marked by <-- HERE after my $<-- HERE near column 5 at line 9.

    This is even though the character is in Unicode 6.3.0, which Perl 5.20.0 supports.

    So why isn't it working? Help me out, fellow monks.
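
    A possible explanation, assuming the identifier rules described in perldata are what applies here: being part of a supported Unicode version is not enough, because the first character of an identifier also has to carry the XID_Start property (or be an underscore). The sketch below only tests that property; the helper name can_start_identifier is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the character could begin an identifier,
# going by the XID_Start-or-underscore rule described in perldata.
sub can_start_identifier {
    my ($char) = @_;
    return $char =~ /\A[\p{XID_Start}_]\z/ ? 1 : 0;
}

# U+4EBA is a CJK ideograph (a letter); U+1F310 is a pictographic symbol.
for my $code (0x4EBA, 0x1F310) {
    printf "U+%04X %s start an identifier\n", $code,
        can_start_identifier(chr $code) ? "can" : "cannot";
}
```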

perl 5.21.10 released in Perl News
1 direct reply
by AppleFritter
on Mar 20, 2015 at 17:21

    Perl 5.21.10, another development release, came out on March 20th (that's today!). Get it on CPAN or on metaCPAN while it's hot!

    And here's the perldelta as well:

    (This is my first time posting a piece of Perl news. If I broke anything, e.g. a link, please /msg me and I'll fix it.)

Identifying scripts (writing systems) in Cool Uses for Perl
2 direct replies
by AppleFritter
on Sep 16, 2014 at 17:32

    Dear monks and nuns, priests and scribes, popes and antipopes, saints and stowaways lurking in the monastery, lend me your ears. (I promise I'll return them.) I'm still hardly an experienced Perl (user|programmer|hacker), but allow me to regale you with a story of how Perl has been helping me Get Things Done™; a Cool Use for Perl, or so I think.

    I was recently faced with the problem of producing, given a number of lines each written in a specific script (i.e. writing system; Latin, Katakana, Cyrillic etc.), a breakdown of scripts used and how often they appeared. Exactly the sort of problem Perl was made for - and thanks to regular expressions and Unicode character classes, a breeze, right?

    I started by hardcoding a number of scripts to match my snippets of text against:

    my %scripts;
    foreach (@lines) {
        my $script =
            m/^\p{Script=Latin}*$/    ? "Latin"    :
            m/^\p{Script=Cyrillic}*$/ ? "Cyrillic" :
            m/^\p{Script=Han}*$/      ? "Han"      :
            # ...
            "(unknown)";
        $scripts{$script}++;
    }

    Obviously there's a lot of repetition going on there, and though I had a list of scripts for my sample data, I wasn't sure new and uncontemplated scripts wouldn't show up in the future. So why not make a list of all possible scripts, and replace the hard-coded list with a loop?

    my %scripts;
    LINE: foreach my $line (@lines) {
        foreach my $script (@known_scripts) {
            next unless $line =~ m/^\p{Script=$script}*$/;
            $scripts{$script}++;
            next LINE;
        }
        $scripts{'(unknown)'}++;
    }

    So far, so good, but now I needed a list of the scripts that Perl knew about. Not a problem, I thought, I'll just check perluniprops; the list of properties Perl knows about was staggering, but I eventually decided that any property of the form "\p{Script: ...}" would qualify, so long as it had short forms listed (which I took as an indication that that particular property was the "canonical" form for the script in question). After some reading and typing and double-checking, I ended up with a fairly long list:

    my @known_scripts = (
        "Arabic", "Armenian", "Avestan", "Balinese", "Bamum", "Batak",
        "Bengali", "Bopomofo", "Brahmi", "Braille", "Buginese", "Buhid",
        "Canadian_Aboriginal", "Carian", "Chakma", "Cham", "Cherokee",
        "Coptic", "Cuneiform", "Cypriot", "Cyrillic",
        # ...
    );

    Unfortunately, when I ran the resulting script, Perl complained:

    Can't find Unicode property definition "Script=Chakma" at (...) line (...)

    What had gone wrong? Versions, that's what: I'd looked at the online perluniprops page, which documents Perl 5.20.0, but this particular Perl was 5.14.2 and didn't know all the scripts that the newer version did, thanks to being built against an older Unicode version. Now, I could've just looked at the locally-installed version of the same perldoc page, but - wouldn't it be nice if the script automatically adapted itself to the Perl version it ran on? I sure reckoned it'd be.

    What scripts DID the various Perl versions recognize, anyway? What I ended up doing (perhaps there's an easier way) was to look at lib/unicore/Scripts.txt for versions 5.8, 5.10, ..., 5.20 in the Perl git repo (I skipped 5.6 and earlier, because a) the relevant file didn't exist in the tree yet back then, and b) those versions are ancient, anyway). And by "look at", I mean download (as scripts-58.txt etc.), and then process:

    $ for i in 8 10 12 14 16 18 20; do perl scripts-5$i.txt >5$i.lst; done
    $ for i in 8 10 12 14 16 18; do diff --unchanged-line-format= --new-line-format=%L 5$i.lst 5$((i+2)).lst >5$((i+2)).new; done
    $

    was a little helper script to extract script information (apologies for the confusing terminology, BTW):

    #!/usr/bin/perl

    use strict;
    use warnings;
    use feature qw/say/;

    my %scripts;
    while(<>) {
        next unless m/; ([A-Za-z_]*) #/;
        $scripts{$1}++;
    }

    $, = "\n";
    say sort { $a cmp $b } map { $_ = ucfirst lc; $_ =~ s/(?<=_)(.)/uc $1/ge; qq/"$_"/ } keys %scripts;

    I admit, I got lazy at this point and manually combined those files (58.lst, as well as, etc.) into a hash holding all the information, instead of having a script output it. Nonetheless, once this was done, I could easily load all the right scripts for a given Perl version:

    # New Unicode scripts added in Perl 5.xx
    my %uniscripts = (
        '8' => [
            "Arabic", "Armenian", "Bengali", "Bopomofo", "Buhid",
            "Canadian_Aboriginal", "Cherokee", "Cyrillic", "Deseret",
            "Devanagari", "Ethiopic", "Georgian", "Gothic", "Greek", "Gujarati",
            "Gurmukhi", "Han", "Hangul", "Hanunoo", "Hebrew", "Hiragana",
            "Inherited", "Kannada", "Katakana", "Khmer", "Lao", "Latin",
            "Malayalam", "Mongolian", "Myanmar", "Ogham", "Old_Italic", "Oriya",
            "Runic", "Sinhala", "Syriac", "Tagalog", "Tagbanwa", "Tamil",
            "Telugu", "Thaana", "Thai", "Tibetan", "Yi"
        ],
        '10' => [
            "Balinese", "Braille", "Buginese", "Common", "Coptic", "Cuneiform",
            "Cypriot", "Glagolitic", "Kharoshthi", "Limbu", "Linear_B",
            "New_Tai_Lue", "Nko", "Old_Persian", "Osmanya", "Phags_Pa",
            "Phoenician", "Shavian", "Syloti_Nagri", "Tai_Le", "Tifinagh",
            "Ugaritic"
        ],
        '12' => [
            "Avestan", "Bamum", "Carian", "Cham", "Egyptian_Hieroglyphs",
            "Imperial_Aramaic", "Inscriptional_Pahlavi",
            "Inscriptional_Parthian", "Javanese", "Kaithi", "Kayah_Li",
            "Lepcha", "Lisu", "Lycian", "Lydian", "Meetei_Mayek", "Ol_Chiki",
            "Old_South_Arabian", "Old_Turkic", "Rejang", "Samaritan",
            "Saurashtra", "Sundanese", "Tai_Tham", "Tai_Viet", "Vai"
        ],
        '14' => [
            "Batak", "Brahmi", "Mandaic"
        ],
        '16' => [
            "Chakma", "Meroitic_Cursive", "Meroitic_Hieroglyphs", "Miao",
            "Sharada", "Sora_Sompeng", "Takri"
        ],
        '18' => [
        ],
        '20' => [
        ],
    );

    (my $ver = $^V) =~ s/^v5\.(\d+)\.\d+$/$1/;
    my @known_scripts;
    foreach (keys %uniscripts) {
        next if $ver < $_;
        push @known_scripts, @{ $uniscripts{$_} };
    }

    print STDERR "Running on Perl $^V, ", scalar @known_scripts, " scripts known.\n";

    The number of scripts Perl supports this way WILL increase again soon, BTW. Perl 5.21.1 bumped the supported Unicode version to 7.0.0, adding another bunch of new scripts as a result:

    # tentative!
    '22' => [
        "Bassa_Vah", "Caucasian_Albanian", "Duployan", "Elbasan", "Grantha",
        "Khojki", "Khudawadi", "Linear_A", "Mahajani", "Manichaean",
        "Mende_Kikakui", "Modi", "Mro", "Nabataean", "Old_North_Arabian",
        "Old_Permic", "Pahawh_Hmong", "Palmyrene", "Pau_Cin_Hau",
        "Psalter_Pahlavi", "Siddham", "Tirhuta", "Warang_Citi"
    ],

    But that's still in the future. For now I just tested this on 5.14.2 and 5.20.0 (the two Perls I regularly use); it worked like a charm. All that was left to do was outputting those statistics:

    print "Found " . scalar keys(%scripts) . " scripts:\n";
    print "\t$_: " , $scripts{$_}, " line(s)\n" foreach(sort { $a cmp $b } keys %scripts);

    (You'll note that in the above two snippets, I'm using print rather than say, BTW. That's intentional: say is only available from Perl 5.10 on, and this script is supposed to be able to run on 5.8 and above.)

    Fed some sample data that I'm sure Perlmonks would mangle badly if I tried to post it, this produced the following output:

    Running on Perl v5.14.2, 95 scripts known.
    Found 18 scripts:
            Arabic: 21 line(s)
            Bengali: 2 line(s)
            Cyrillic: 12 line(s)
            Devanagari: 3 line(s)
            Georgian: 1 line(s)
            Greek: 1 line(s)
            Gujarati: 1 line(s)
            Gurmukhi: 1 line(s)
            Han: 29 line(s)
            Hangul: 3 line(s)
            Hebrew: 1 line(s)
            Hiragana: 1 line(s)
            Katakana: 1 line(s)
            Latin: 647 line(s)
            Sinhala: 1 line(s)
            Tamil: 4 line(s)
            Telugu: 1 line(s)
            Thai: 1 line(s)

    Problem solved! And not only that, it's futureproof now as well, adapting to additional scripts in my input data, and easily extended when new Perl versions support more scripts, while maintaining backward compatibility.

    What could still be done? Several things. First, I should perhaps find out if there's an easy way to get this information from Perl, without actually doing all the above.
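
    One candidate for such an easier way, sketched here without having been tried against the Perl versions discussed above, is the core module Unicode::UCD. Its charscripts() function returns a hash reference keyed by script name, built from the Unicode data of the running Perl. Whether every name it reports is accepted in \p{Script=...} on every Perl version is an assumption that would need checking.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscripts);

# charscripts() returns a hash reference keyed by script name, taken
# from the Unicode data that ships with the running Perl.
my @known_scripts = sort keys %{ charscripts() };

print STDERR "Running on Perl $^V, ", scalar @known_scripts,
    " scripts known.\n";
```

    This would replace the hand-built per-version hash, at the cost of depending on how a given Perl's Unicode::UCD spells the names.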

    Second, while Perl 5.6 and earlier aren't supported right now, they could be. Conveniently, the 3rd edition of Programming Perl documents Perl 5.6; the \p{Script=...} syntax for character classes doesn't exist yet, I think, but one could write \p{In...} instead, e.g. \p{InArabic}, \p{InTamil} and so on. Would this be worth it? Not for me, but the possibility is there if someone else ever had the need to run this on an ancient Perl. (Even more ancient Perls may not have the required level of Unicode support for this, though I wouldn't know for sure.)

    Lastly, since the point of this whole exercise was to identify writing systems used for snippets of text, there's room for optimization. Perhaps it would be faster to precompile a regular expression for each script, especially if @lines is very large. Most of the text I'm dealing with is in the Latin script; as such, I should perhaps test for that before anything else, and generally try to prioritize so that lesser-used scripts are pushed further down the list. Since I'm already keeping a running total of how often each script has been seen, this could even be done adaptively, though whether doing so would be worth the overhead in practice is another question, one that could only be answered by measuring.
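
    The precompilation idea can be sketched with qr//, as below. The three-entry script list and the name count_scripts are illustrative assumptions, and whether this is actually faster would still have to be measured, as noted above.

```perl
use strict;
use warnings;

# Illustrative short list; in the post this would be the full
# @known_scripts array for the running Perl.
my @known_scripts = qw(Latin Cyrillic Greek);

# Compile one pattern per script up front instead of interpolating
# the script name into a new pattern for every line.
my %script_re = map { $_ => qr/^\p{Script=$_}*$/ } @known_scripts;

# Hypothetical wrapper around the post's matching loop.
sub count_scripts {
    my @lines = @_;
    my %scripts;
    LINE: foreach my $line (@lines) {
        foreach my $script (@known_scripts) {
            next unless $line =~ $script_re{$script};
            $scripts{$script}++;
            next LINE;
        }
        $scripts{'(unknown)'}++;
    }
    return %scripts;
}
```

    Putting the most common script first in @known_scripts (Latin, for the data described here) keeps the inner loop short for most lines.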

    But neither speed nor support for ancient Perls is crucial to me, so I'm done. This was a fun little problem to work on, and I hope you enjoyed reading about it.
