Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Meditations

( #480=superdoc: print w/ replies, xml ) Need Help??

If you've discovered something amazing about Perl that you just need to share with everyone, this is the right place.

This section is also used for non-question discussions about Perl, and for any discussions that are not specifically programming related. For example, if you want to share or discuss opinions on hacker culture, the job market, or Perl 6 development, this is the place. (Note, however, that discussions about the PerlMonks web site belong in PerlMonks Discussion.)

Meditations is sometimes used as a sounding-board — a place to post initial drafts of perl tutorials, code modules, book reviews, articles, quizzes, etc. — so that the author can benefit from the collective insight of the monks before publishing the finished item to its proper place (be it Tutorials, Cool Uses for Perl, Reviews, or whatever). If you do this, it is generally considered appropriate to prefix your node title with "RFC:" (for "request for comments").

User Meditations
Why PerlMonks is an amazing place.
4 direct replies — Read more / Contribute
by Anonymous Monk
on Mar 03, 2015 at 01:34

    Hi,

    I know this is not a question, but as an anonymonk, I cannot post it anywhere else, so here goes.

    I was the same guy who asked a question on "Re learning Perl". I even gave a small code snippet I wrote just to show how much I remember. This was pretty late in the night. I woke up half expecting someone to shoo me away at worst, or un answered post at best, because I was not sure how my question would be perceived. To my amazement and satisfaction, not a *single* sullen response!!!. The monks who responded genuinely wanted to guide. They suggested tips and books. Truly, there is no other forum like this. I've been here so many times, asked so many questions, but not even once, I repeat, not even once was I rudely sent back. I've seen much much worse forums, but there is something different here at the Monastery. Whenever someone asks me why I use Perl, I tell them 1) Because this is the only scripting language I know (and like) and 2) that they should visit PerlMonks. It's the most amazing place to come to ask questions and get genuine answers.

    It's also here at PerlMonks that I learnt that it's ok if you do not remember syntax, you can look up the documentation, no one expects you to remember everything by heart. As long as you have the basic fundamentals clear, you can barge ahead, and then fill in the gaps. You monks may not believe it, but there are other places not so forgiving/understanding/helpful.

    I hope the place stays as wonderful as it is right now. Thank you Monks.

Acknowledgement of Contributions
1 direct reply — Read more / Contribute
by jmlynesjr
on Mar 02, 2015 at 14:24

    Monks:

    Unlike most(a lot?) of you, I am retired and I enjoy learning Perl and wxPerl. As such, my time is free, which gives me a great appreciation for the time contributed to the Monastery by those of you who do make your living by doing Perl.

    A few days ago I got a 911 from my daughter who is writing her Doctoral Dissertation in Economics. She needed data on all the power plants in the US. The available data came from the EIA and EPA. As governments are famous for, the plant identifiers between the agencies are different. She ended up with a 10,000 row spreadsheet to normalize.

    I remembered reading posts on working with Excel. A search of the Monastery turned up Excel To Tab Delimited using Spreadsheet::ParseExcel posted by upallnight.

    Within an hour I had installed the module from CPAN and had the sample code running against her data. Several iterations later, I could extract selected columns into a hash to determine the unique plant names and generate a file of edits compatible with Matlab. I still have 700 rows to manually edit, but Perl has already saved us a lot of time.

    Whether you post a complete solution or just a hint, you never know who might can benefit from your knowledge even years after your post.

    Thanks for all of your contributions.

    Update: Fixed typo in title.

    James

    There's never enough time to do it right, but always enough time to do it over...

Porting (old) code to something else
8 direct replies — Read more / Contribute
by Tux
on Feb 16, 2015 at 09:41

    As I am in the process of porting Text::CSV_XS to perl6, I have learned not only a lot about perl6, but also about loose ends.

    We all know by now that perl6 is reasonable likely to be available in September (of this year), and depending on what you hope or expect, you might be delighted, disappointed, disillusioned or ecstatic (or anything in between: YMMV).

    My goal was to be able to present a module working in perl6 that would provide the user with as much functionality possible of what Text::CSV_XS offers: flexible feature-rich safe and fast CSV parsing and generation.

    For now I have to drop the "fast" requirement, but I am convinced that the speed will pick up later.

    Text::CSV_XS currently offers a testsuite with 50171 tests, so my idea was that if I convert the test suite to perl6 syntax, it good very well serve a point of proof for whatever I wrote in perl6.

    There's a few things that you need to know about me and my attitude towards perl6 before you are able to value what has happened (at least I see this as a valueable path, you might not care at all).

    I do not like the new whitespace issues that perl6 imposes on the code. It strikes *me* as ugly and illogical. That is the main reason why I dropped interest in perl6 very early in the process. In Sofia however, I had a conversation (again) with a group of perl6 developers who now proclaimed that they could meet my needs as perl6 now has a thing called "slang", where the syntax rules can be lexically changed. Not only did they tell me it was possible, but early October 2014, Slang::Tuxic was created just for me and low and behold, I could write code in perl6 without the single big annoyance that drove me away in the first place. This is NOT a post to get you to use this slang, it is (amongst other things) merely a reason to show that perl6 is flexible enough to take away annoyences.

    Given that I now can write beautiful perl (against, beauty in the eyes of the beholder), I regained enjoyment again and John and I were so stupid to take the fact that perl6 actually can be used now, to promise to pick a module in the perl5 area and port it to perl5. XS being an extra hurdle, we aimed high and decided to "do" CSV_XS.

    So, I have 50000+ tests that I want to PASS (ideally) but I soon found out that with perl6 having type checking, some tests are useless, as the perl6 compiler already catches those for you. Compare that to use strict in perl5, so I can just delete all tests that pass wrong arguments to the different methods.

    Converting error-dealing stuff was fun too but I think that if you try to mimic what people expect in perl5, it is not too hard to get the best of both worlds: I'm quite happy already with error-handling as it stands.

    So, the real reason for this post is what I found to have no answer to, as it wasn't tested for or had tests that were bogus.

    • What should be done when parsing a single line of data is valid, but there is trailing data in the line after parsing it.?
      $csv->parse (qq{"foo",,bar,1,3.14,""\r\n"more data});
      parse is documented to parse the line and enable you to get the fields with fields. In contrast with getline, which acts on IO, it will just discard all data beyond the EOL sequence. I am uncertain if that is actually what it should do. Should it "keep" the rest for the next iteration? Should it be discarded? Should it cause a warning? Or an error?
    • What is the correct way to deal with single ESCapes (given that Escape is not the default " or otherwise equal to QUOtation). Here I mean ESCape being used on a spot where it is not required or expected without the option to accept them as literal.
      $csv->escape_char ("+"); $csv->parse ("+"); $csv->parse ("+a"); $csv->parse ("+\n");
      Leave the ESCape as is? Drop it, being special? Warn or Error?

    Questions like those slowed down the conversion as a whole, as I can now take my own decisions with sane defaults (like binary => True) instead of caring about backward compatibility issues.

    Anyway, feel free to check out what I have done so far on my git repo and I welcome comments (other than those on style) in the issues section. Feel free to join #csv on IRC to discuss. Hope to see you (or some of you) at the next Dutch Perl Workshop 2015 in Utecht.


    Enjoy, Have FUN! H.Merijn
Good programming practices and changes in requirements: how to maintain good code
7 direct replies — Read more / Contribute
by DanBev
on Feb 11, 2015 at 09:10

    Hi Monks!

    We know software engineering principles and how to write maintainable code, and it's all well and good.
    AFAIK, we should also know that in the real world, with real projects and real requisites, we have to find a middle between software engineering and "make things work".

    I don't want an ideology war, I know what should be better, but IMHO I think - probably I'm wrong- maintain real code in real world, according the engineering principles it's very very hard. Not impossible, but difficult. That's because in real world, requisites changes too fast, and projects are not "so" descriptive: in real world, in real companies, in the control room there aren't always project manager with competences in IT.

    So, you can try to write your perfect code but after release - no beta testing, it's horrible but in real companies can happen-, control room changes requisites and operation, and this must be done for "yesterday", and they changes again and again and again, because they don't know really what they want.
    In order to satisfy "everything at once" you see your almost-well-written-code fall into WTF-code: you can control it almost, but entropy grows.

    How do you manage this situations, what are your experiences about?

    Greetings.
In place editing without reading further
2 direct replies — Read more / Contribute
by trippledubs
on Jan 27, 2015 at 14:22

    In transitioning Solaris Sparc sun4u to newer sun4v architecture we found that, sometimes, the image of the old server would not install onto the new server. The image file contains 20-30 text lines describing the system that was imaged and then the image itself. This file is quite large in some cases, takes a long time to create, and is made during an outage.

    The fix, once the image is already made, is quite janky. You need to append the string 'sun4v' into the the field 'content_architectures=' at the 20th or so. The other part is you do not want to read the rest of the file. Someone came up with this and saved the day. What do you think? Was there a better approach? Is there a way to do this using command line arguments that makes sense?

RFC: automating modular classes
1 direct reply — Read more / Contribute
by Arunbear
on Jan 26, 2015 at 08:44

    Minions is yet another OOP automation module, roughly similar to Moo, but which has the addtional goal of putting Encapsulation centre stage. I wrote it because after reading How Large Does Your Project Have To Be to Justify Using Moose? (especially the comments by tye and JavaFan), I became increasingly disillusioned with the Moo(se) way of OOP (essentially OOP with no Encapsulation).

    Here is a sample that implements the fixed size queue from Re^5: The future of Perl? (that sub-thread also illustrates limitations of the Moo way of OOP)
    package FixedSizeQueue; use Minions interface => [qw( push pop size )], construct_with => { max_items => { assert => { positive_int => sub { $_[0] =~ /^\d+$/ && $_[0 +] > 0 } }, }, }, implementation => 'FixedSizeQueue::Default', ; 1; package FixedSizeQueue::Default; use Minions::Implementation has => { q => { default => sub { [ ] } }, max_size => { init_arg => 'max_items', }, }, ; sub size { my ($self) = @_; scalar @{ $self->{$__q} }; } sub push { my ($self, $val) = @_; log_info($self); push @{ $self->{$__q} }, $val; if ($self->size > $self->{$__max_size}) { $self->pop; } } sub pop { my ($self) = @_; log_info($self); shift @{ $self->{$__q} }; } sub log_info { my ($self) = @_; warn sprintf "[%s] I have %d element(s)\n", scalar(localtime), $se +lf->size; } 1;
    And a sample of usage:
    % reply -I lib 0> use FixedSizeQueue 1> my $q = FixedSizeQueue->new(max_items => 3) $res[0] = bless( { '932db126-' => 'FixedSizeQueue::__Private', '932db126-max_size' => 3, '932db126-q' => [] }, 'FixedSizeQueue::__Minions' ) 2> $q->can $res[1] = [ 'pop', 'push', 'size' ] 3> $q->push($_) for 1 .. 3 [Mon Jan 26 12:01:53 2015] I have 0 element(s) [Mon Jan 26 12:01:53 2015] I have 1 element(s) [Mon Jan 26 12:01:53 2015] I have 2 element(s) $res[2] = '' 4> $q->pop [Mon Jan 26 12:02:09 2015] I have 3 element(s) $res[3] = 1 5> $q $res[4] = bless( { '932db126-' => 'FixedSizeQueue::__Private', '932db126-max_size' => 3, '932db126-q' => [ 2, 3 ] }, 'FixedSizeQueue::__Minions' ) 6> $q->push($_) for 4 .. 6 [Mon Jan 26 12:02:55 2015] I have 2 element(s) [Mon Jan 26 12:02:55 2015] I have 3 element(s) [Mon Jan 26 12:02:55 2015] I have 4 element(s) [Mon Jan 26 12:02:55 2015] I have 3 element(s) [Mon Jan 26 12:02:55 2015] I have 4 element(s) $res[5] = '' 7> $q $res[6] = bless( { '932db126-' => 'FixedSizeQueue::__Private', '932db126-max_size' => 3, '932db126-q' => [ 4, 5, 6 ] }, 'FixedSizeQueue::__Minions' ) 8> $q->log_info() Can't locate object method "log_info" via package "FixedSizeQueue::__M +inions" at reply input line 1. 9>
    Not all Moo features are supported (for this early release I've focused on those I actually use). Important differences from Moo include
    1. Attributes can be safely accessed inside classes without the overhead of a function call
    2. As a consequence of 1., attributes need not be exposed via methods (unless there is a good reason to do so).
    3. No need to clean up animal droppings
    4. Private subroutines via the mechanism shown in Re: OO - best way to have protected methods (packages)
    5. Class methods are "class only", and object methods are "object only"
    6. No compatibility with Moose
    Feedback is much appreciated.
The Boy Scout Rule
9 direct replies — Read more / Contribute
by eyepopslikeamosquito
on Jan 25, 2015 at 04:59

    You've got your typical company started by ex-software salesmen, where everything is Sales Sales Sales and we all exist to drive more sales.

    On the other extreme you have typical software companies built by ex-programmers. These companies are harder to find because in most circumstances they keep quietly to themselves, polishing code in a garret somewhere, which nobody ever finds, and so they fade quietly into oblivion right after the Great Ruby Rewrite, their earth-changing refactoring-code code somehow unappreciated by The People.

    -- The Developer Abstraction Layer by Joel Spolsky

    Though my natural inclination is to be a bit OCD about keeping code clean, I concede that spending too much time and money on refactoring, writing programmer tools, and endlessly polishing code will likely lead to commercial failure. As will the converse, namely neglecting your developers and their code and architectures in favour of sales and marketing. Successful software companies tend to have a healthy balance.

    Refactoring

    Booking.com, perhaps the most commercially successful Perl-based company, has caused a bit of controversy over the years with their attitude towards refactoring. To give you a flavour, I present a couple of comments below:

    Booking is destroying my career because I am not allowed to do anything new. I am not allowed to use new technologies. I'm not allowed to "design" anything big. I am not allowed to write tests. I am allowed to copy that 500 line subroutine into another module. If people have done that several times before, maybe it should be refactored instead of duplicated? If you do that, you get in trouble. As one boss says, "we do not pay you to write nice code. We pay you to get job done."

    Management, and the term is quite lose when applied to Booking.com, sees no gain in refactoring code. By refactoring I'm talking about taking a few weeks to rewrite an existing piece of software. By definition refactoring doesn't bring new functionality so this is why management is reluctant to go down that road. We're quite lenient about code that gets added to the repo, as long as there's a business reason behind it. If a quick hack can be deployed live and increase conversion then it will be accepted. But rest assured that crappy code doesn't last long, specially if other devs have to use it or maintain it.

    -- from Truth about Booking.com (Blog)

    One of the posts specifically deals with the culture of "get it done and fast" and how they do not encourage refactoring or basic testing. I actually work in a Perl shop where management has the same kind of mentality, and it is slowly killing our efficiency.

    Regarding testing, it's true that we're not very unit testing focused. This is mainly because we've decided to spend most of the time/money/infrastructure that you might usually spend on unit testing on monitoring instead. If you have unit tests you still need monitoring, but in practice if your monitoring is good enough and you have an infrastructure to quickly rollout & rollback systems you can replace much of unit testing with monitoring.

    We're not adverse to refactoring when appropriate. But if you're going to propose rewriting some code here you'll actually have to make a compelling case for it which isn't just "the old code is hairy". Do you actually understand what it does? Maybe it's hairy and complex because it's solving a hairy and complex problem. Are you not aware of where this system fits into the big picture? We've also had code that's looks fantastic, had tests, used lots of best practices that we've had to throw away completely because it was implementing some idea that turned out to be plain stupid.

    -- from What exactly is up with Booking.com? (reddit)

    Opportunistic Refactoring and The Boy Scout Rule

    Some people object to such refactoring as taking time away from working on a valuable feature. But the whole point of refactoring is that it makes the code base easier to work with, thus allowing the team to add value more quickly. If you don't spend time on taking your opportunities to refactor, then the code base gradually degrades and you're faced with slower progress and difficult conversations with sponsors about refactoring iterations.

    There is a genuine danger of going down a rabbit hole here, as you fix one thing you spot another, and another, and before long you're deep in yak hair. Skillful opportunistic refactoring requires good judgement, where you decide when to call it a day. You want to leave the code better than you found it, but it can also wait for another visit to make it the way you'd really like to see it. If you always make things a little better, then repeated applications will make a big impact that's focused on the areas that are frequently visited - which are exactly the areas where clean code is most valuable.

    -- Opportunistic Refactoring (Martin Fowler)

    The Boy Scouts have a rule: "Always leave the campground cleaner than you found it"

    What if we followed a similar rule in our code: "Always check a module in cleaner than when you checked it out"

    -- The Boy Scout Rule (O'Reilly)

    At work, we perform opportunistic refactoring following the Boy Scout rule, trusting the judgement of developers. How do you do it at your workplace?

    Code Reviews

    By way of background, my company went agile about five years ago, at first with great zealotry, nowadays with more maturity and less dogma.

    Before check-in, all code must be reviewed, either continuously via pair programming, or via a lightweight code review (typically over-the-shoulder). We also have a coding standard, though it is not strongly enforced.

    To give a concrete example, during a code review the other day, I persuaded the author to eliminate unnecessary repetition by changing this snippet:

    my $config = <<'GROK'; ADD UDP_LISTENER ( 515 ) ADD UDP_LISTENER ( 616, 657 ) ADD UDP_LISTENER ( 987 ) GROK my @test_cases = ( { desc => "# Test 1", conf => $config, find => [ 'port = 515', 'port = 616', 'port = 657', 'port = 987' + ], }, );
    to:
    my $liststr = 'ADD UDP_LISTENER'; my @ports = ( 515, 616, 657, 987 ); my $config = <<"GROK"; $liststr ( $ports[0] ) $liststr ( $ports[1], $ports[2] ) $liststr ( $ports[3] ) GROK my @test_cases = ( { desc => "# Test 1", conf => $config, find => [ map { "port = $_" } @ports ], }, );

    What would you have done?

    I'm sure some other programmers at my company wouldn't have bothered suggesting any changes at all: after all, the code worked as is, it's pretty clear, plus "it's only a test script", so why bother?

    Though I felt the code was more maintainable with duplication eliminated, I had another motivation in this specific case: training. You see, the programmer in question was very new to Perl and, as I found out during the review, had never used map before! Training (and improved teamwork) are important benefits of code reviews.

    Eliminating unnecessary duplication and repetition is a common discussion topic during code review in my experience. (Note: I did not include this example to argue further about what DRY means exactly in Room 12A :). Other common discussion points during code review are:

    • Commenting.
    • Naming.
    • Clarity vs Cleverness.
    • Encapsulation.
    • Interfaces.
    • Error handling.
    • Testability. Is the code testable in isolation?
    • Supportability.
    • Portability.
    • Security.
    • Performance.
    Note that we do not normally discuss code layout because all code is pushed through Perl::Tidy before review.

    I'm interested to learn about your workplace experiences. In particular:

    • Do you have a coding standard? How is it enforced?
    • Do you do pair programming?
    • Do you do code reviews? Are they heavyweight (e.g. Fagan Inspection) or lightweight (e.g. over-the-shoulder)? Mandatory or optional?
    • What are common discussion points during your code reviews?

    Cleverness

    To finish, here's another one, derived from Clever vs. Readable.

    Would this statement pass your code review?

    my $value = [ $x => $y ] -> [ $y <= $x ];
    If not, would you suggest changing it to:
    my $value = $x < $y ? $x : $y;
    or:
    use List::Util qw(min); my $value = min( $x, $y );
    Or something else?

    References

Empty string but True
1 direct reply — Read more / Contribute
by Yary
on Jan 22, 2015 at 11:30
    I was looking at an old StackOverflow question about generating an explicit null XML namespace and had the thought, "maybe an object that stringifies to the empty string, but is true in every other context, will trick LibXML into doing what the seeker of wisdom required."

    It did not, nor did an earlier attempt using than Scalar::Util's dualvar. Still I liked my little Empty but True module enough to post it here. (Seems too useless/dangerous/not worth the bother to be on CPAN.)

    package MyEmptyTrueVar; our $singleton; use overload fallback => 'TRUE', '""' => sub { "" }, # Return empty string on stringification bool => sub { 1 }, # Return true in boolean context '0+' => sub { 1 }, # Return true in numeric context cmp => sub { !ref $_[1] }; # unequal to empty string, or any other +string bless $singleton=\$singleton;
    usage:
    say "Single str='$MyEmptyTrueVar::singleton', Num=", (0+$MyEmptyTrueVa +r::singleton) if $MyEmptyTrueVar::singleton && $MyEmptyTrueVar::singleton ne '' && '' ne $MyEmptyTrueVar::singleton;
    shows that Single str='', Num=1.
quickness is not so obvious
3 direct replies — Read more / Contribute
by DanBev
on Jan 22, 2015 at 04:00

    Computers can do so many operations per second, so fast, that sometimes are considered omnipotent. But, sometimes an approach a little too direct or dumb can transform our fastest machine in a cart ...

    This is not the discovery of the century, I know, and every good programmer pays attention to the issue, but sometimes I stop to think how is easy to turn a good-performing program in a disaster.

    Just yesterday, I had to check about 10,000 files in a directory from a database of about 20,000 records: the iterating solution a-query-by-file it take an hour; by loading the entire table in a hash and then looking information it take few seconds.
    So far, not the sense of life, but a thing on which I like to meditate.

What 'should' a professional PERL programmer know?
7 direct replies — Read more / Contribute
by perloHolic()
on Jan 09, 2015 at 07:08

    Hello fellow monks and monkesses.

    I am very new to the monastary as far as being a member goes, however have been a frequent part of the 'flock' if you like, for some time.

    My meditation today comes from my recent personal experience of applying for a proffessional role as a PERL programmer - which leads to my pondering, 'what do i have to know/should know in order to be qualified for such a position'? I have read many articles before such as Professional perl which are obviously inciteful in this area.

    I am really looking for specifics such as a minimum amount of required knowledge in certain areas such as OO Perl, or Database management with Perl, Optimisation, Threads or Web programming in Perl etc. I have only self tought knowledge of PERL, and of course with helpful textbooks, tutorials and articles in the monastary have grown this knowledge over time. I don't however, have any professional experience of PERL per say, other than small scripts I have written for my current job as a C language software Engineer.

    SO - I do have a relatively small amount of knowledge in some areas in comparison to, lets say, what I would class as the contrastingly very experienced monks around here, whom provide a vast array of knowledge in most if not all 'Areas' of PERL. I have yet to receive any feedback from my recent application, however 'I' feel like I may be underqualified. It would be very helpful if my fellow peers may be able to offer some incite into 'what the minimum requirements' may be for such a junior PERL programmer role.

    Apologies in advance if my question is unhelpful,in the wrong place, poorly titled or worded.

    Your ever faithfull functioning perloHolic.

RAM: It isn't free . . .
9 direct replies — Read more / Contribute
by sundialsvc4
on Jan 07, 2015 at 11:49

    In the earliest days of digital computing, memory was the most-scarce resource.   (After all, magnetic doughnuts could be made only so small, and they had to be strung upon three tiny wires by hand.)   Thus, when external storage devices – disk and tape and drum – were developed, they were referred to as “mass” storage.   Most of us remember the bad old days, of MS-DOS and even earlier, when programs had to be separated into “overlays” in order to be made to fit into the one-and-only place from which instructions could be executed.

    Well, fast-forward a few decades and “chips are cheap” now.   Gigabytes if not terabytes of semiconductor storage can fit into a space the size of a pencil eraser.   CPU architectures are especially designed to allow gigabytes of storage to be directly addressed.   Developers are eagerly taking advantage of this, because the days of “overlays” and then “virtual memory” (i.e. “paging”) appear to be long-ago and far-away.   After all, RAM has the unique advantage of being instantaneous.   If the data is available in RAM, disk-I/O does not occur.   (And, if the “disk” storage device is made of semiconductors, at least the “seek latency” and “rotational latency” does not happen, even though most semiconductor devices do have a certain form of “latency” of their own.)

    There is a fly in that ointment, however.   RAM capacity is not yet so enormous that concerns about virtual memory can be dismissed outright, especially in production situations where many possibly memory-hungry applications are running on the same box at the same time.   Virtual memory is still with us, and therefore we must be mindful of how to work “with” it and not “against” it, just as we were very-obliged to do in the 1970’s.

    When virtual memory is being used, “in-memory” data structures might involve disk I/O.   As you know, physical RAM is divided into equal-sized chunks called “pages,” and each page might be “resident” in memory or it might be “swapped out.”   When any virtual address is touched, a “page fault” might occur, and if so the process will be delayed until the necessary disk-I/O has been completed.   (And, in order to make room for the page, another page might have to be “stolen” from someone and written out to disk … thus, two or more disk-I/O’s must take place before the faulting process is allowed to proceed.)

    Virtual memory’s success relies upon the assumption that, while page-faults will occur, they will not occur so frequently that the disk-I/O delays add up too much in practice.   The term is “locality of reference,” and it means that programs typically make memory-references in very concentrated groups.   Once a page-fault is satisfied, things should settle-down for a while as the processes continue to refer, again and again, to the page(s) that have most recently been faulted-in.   “Least Recently Used (LRU)” pages, by comparison, are presumed to be good candidates for page-stealing.   The total set of pages that any process requires in order to run without delay, at any instant in time, is referred to as its current “working set” at that instant.

    There is, unfortunately, one data-structure mechanism in particular that flies in the face of “locality of reference,” and therefore of “small and tidy and predictable working-sets.”   That mechanism is:   the hash table.   Perl’s “hashref.”

    Hash tables work by permuting a key across some smaller key-space in order to arrive at a single “bucket” that is searched for the target value.   Hash functions are designed to spread the key values more or less uniformly, but randomly, across the key space.   Thus, the hash structure itself can represent a large working-set (although hash algorithm designers, including Perl’s, do seek to constrain this).   But in any case, the hash buckets also refer, by reference, to outside blocks of memory that are obtained using memory allocation functions e.g. “malloc().”   The memory addresses pointed-to by the (already, large) hash table will, over time, become quite-widely distributed.   And so we have a “very random-access” data structure:   a large hash-table referencing an even larger set of memory blocks whose individual addresses are not predictable.   (A highly volatile very-active data structure becomes less and less predictable as the hours and days go by.   Working-set sizes increase quickly.)

    (Perl’s designers know their stuff.   They know about these issues and carefully designed an industrial-strength system for all of us to enjoy.   We are well cared-for ... but the issues are still there, and, by definition, always will be.)

    Working-set sizes become very large, then.   So, what actually happens when such an application enters service in a production machine that’s using virtual memory?   Unfortunately, it becomes a million-pound elephant … using, shall we say, far more than its fair share of RAM.   A disproportionately large amount relative to the others.   And therefore, both a source of virtual-memory pressure and(!) an especially vulnerable victim of it.   If such a program is to run efficiently (as it was specifically designed to do), it must have “all that RAM.”   But, if it gets what it wants (and must have), the other processes can’t get (and keep) theirs.   Paging activity begins to increase, as does the number of processes that are stuck in page-wait and the frequency that each process is stuck in page-wait.   At a certain point, the processing grinds to a halt.   It “hits the wall.”   It is “thrashing.”   The offending application is especially taking it in the shorts ... being especially big and especially vulnerable, it is “punch-drunk.”   But it’s not the only one.   (If there were any subtle timing-related bugs in this or any other application, this is the time when those kinds of problems will really start to show up.)

    Given that, in a virtual memory setting, any “memory” reference can result in “disk” I/O, “memory” must in fact be treated as a “disk.”   Each memory-access, especially any access that might be widely-dispersed from other recent ones, must be considered as possibly taking several milliseconds to occur; not the microseconds or nanoseconds that are usually bantered-about by programmers who like to use the “time” command and discuss the third or fourth digit to the right of the decimal point.

    Software developers usually don’t experience these things personally when designing their software:   their systems are the biggest, fastest, and especially the fattest of all.   They’ve got two or three large monitors.   Multiple processors.   Gigs of RAM.   As much as the motherboard will support.   In short, a situation altogether unlike the rack mounted boxes where their brainchildren will labor out their appointed business duties.

    To run well, and to do so round-the-clock for days and weeks on end, all applications must be good virtual-memory citizens.   Whether their virtual memory allocations be large or small, their working-set sizes must be small … by design.   There are many possible ways to do that:   storing multiple entries in a single large structure rather than in many small ones; “batching” requests for even in-memory data stores; and, using disk-I/O directly instead of implicitly (as virtual-memory actually does).   All operating systems buffer disk-I/O operations, filling all available RAM with buffers but managing those buffers differently than they do VM page-frames.

    Probably the very worst thing that can happen to your program’s design is for it to be very splendid on your machine, but unworkable in production … or even, “a pain in the asterisk in production.”   This requires thinking of RAM as being “a thing that is not without-delay,” from the very earliest stages of your designs.

    HTH …

Dating .tar Archives
1 direct reply — Read more / Contribute
by oiskuu
on Dec 30, 2014 at 13:32

    Presenting two ways to skim tar format files: via direct parsing and using the specific module.

    The file date of an archive is useful to keep around for chronological listings, or determining its age at a glance. It is however often times lost as the files get downloaded, copied or moved. An obvious fix is to reset the date to that of the most recent member contained within. And a script to this end is what I implemented, years ago. If there is or was a proper tool for that already, I wouldn't know.

    But old TODOs came to my attention again recently. What better time to clean up some old code, perl-based and all? In particular, there was this bit to decompress the files with an external utility:

    ... ? ... : $file =~ /bz2$/i ? open($fh, '-|', 'bzcat', '--', $file) : open($fh, '<', $file);
    DOS-like logic, based on file suffix? Very un-unix and un-cool. IO::Uncompress::AnyUncompress to the rescue!

    Minutes later, there it is — the version II — shorter and neater by a fair bit.

    Hacking on them tar headers is entertaining for sure, but let's try Archive::Tar now — a module purpose-built for tasks like that. And behold: the version III.

    #! /usr/bin/perl # Usage: $0 [-z] [-n] files ... # -z also check gzip archive time # -n don't actually touch, show what would be done use Getopt::Std; getopts('zn', \my %opt); use List::Util q(max); use IO::Uncompress::AnyUncompress; use Archive::Tar; $Archive::Tar::WARN = 0; foreach (grep -f, @ARGV) { my $fh = new IO::Uncompress::AnyUncompress($_) or next; my $zt = $opt{z} && ($fh->getHeaderInfo||{})->{Time} || 0; my $tt = max map $_->{mtime}, Archive::Tar->list_archive($fh, +0, [q(mtime)]); my $t = max $zt, $tt; $t && ($t != (stat)[9]) && ($opt{n} || utime($t, $t, $_)) && printf "%-60s %s (%s)\n", $_, scalar localtime($t), ($t == $tt) ? q(tar time) : q(zip time); }
    One-third of the previous size! Cut loose the reporting, the gzip-time foo, and we'd arrive at a one-liner territory. But this brevity has some gotchas. Let's see:
    • Lots of memory is consumed reading big archives. Apparently list_archive method reads the uncompressed data in full. Is there no "metadata-only" flag one could peruse?
    • Another thing, list_archive has special cased the [q(name)] to return a flat list instead of hashes. Why not support both [qw(...)] and q(item) requests? Then one might simply write:
      my $t = max Archive::Tar->list_archive($file, 1, "mtime");
    • The lzma/xz modules need to be installed separately for those to work. Release 5.20.1 does not (yet?) include them.

    Giving it a second glance, the original script seems to do fine as it was. Some TODOs may stay a while longer, I think.

win32 exit code after a crash guide
2 direct replies — Read more / Contribute
by bulk88
on Dec 28, 2014 at 22:25
    Windows Perl doesn't have SIGSEGV or SIGILL or SIGBUS. This makes diagnosing a non-local crashed process very difficult. For example, on one of those CPANTesters boxes, you see
    t/middleware/connect.t ....... Dubious, test returned 5 (wstat 1280, 0x500) All 1 subtests passed Free to wrong pool 2e8aa90 not 6c4040 at C:/strawberry-perl-5.12.2.0/p +erl/site/lib/AnyEvent/HTTP.pm line 1083, <> line 6. t/middleware/loadbalancer.t .. Dubious, test returned 5 (wstat 1280, 0x500) No subtests run Free to wrong pool 2e145e0 not 34f90 at C:/strawberry-perl-5.12.2.0/pe +rl/site/lib/AnyEvent/HTTP.pm line 1083, <> line 6. t/middleware/rewrite.t ....... Dubious, test returned 5 (wstat 1280, 0x500) No subtests run
    What on earth is 0x500? If you do the perl on Unix routine of 0x500 >> 8, you get 5. What is 5?

    Referring to errno.h
    #define EIO 5
    C:\perl521\bin>perl -E"$! = 5; say $!" Input/output error
    If I didn't tell you it is already is a crash, you would have thought the perl app did "exit($!);".

    The answer is, the bytes selected by the mask 0xFF00, after the child crashed, are truncated NTSTATUS codes AKA EXCEPTION_* codes. I wrote a test script which shows what all the common Win32 crashes look like.
    C:\perl521\srcnewb4opt>perl -Ilib crashtest.pl disable_interrupts $? 9600 CHILD_ERROR_NATIVE 9600 illegal_instruction $? 1d00 CHILD_ERROR_NATIVE 1d00 deref_null $? 500 CHILD_ERROR_NATIVE 500 deref_neg1 $? 500 CHILD_ERROR_NATIVE 500 write_to_ro_mem $? 500 CHILD_ERROR_NATIVE 500 div_by_0 $? 9400 CHILD_ERROR_NATIVE 9400 call_c_debugger $? 300 CHILD_ERROR_NATIVE 300 C:\perl521\srcnewb4opt>
    0x96 = 0xC0000096 STATUS_PRIVILEGED_INSTRUCTION, valid machine op, but only allowed in kernel mode, not user mode

    0x1D = 0xC000001D STATUS_ILLEGAL_INSTRUCTION, this machine op doesn't exist on this CPU, you are probably trying to execute data pointer/garbage as a C function, without DEP

    0x5 = 0xC0000005 STATUS_ACCESS_VIOLATION, SEGV, bad address

    0x94 = 0xC0000094 STATUS_INTEGER_DIVIDE_BY_ZERO

    0x3 = 0x80000003 STATUS_BREAKPOINT explicit software call to C debugger, notice this code starts with 0x8, not 0xC, 0xC0000003 is STATUS_INVALID_INFO_CLASS, which means bad parameter to a function call, and will never cause an exception/crash



    Code used to generate above.

The Top Ten Perl Poems
2 direct replies — Read more / Contribute
by eyepopslikeamosquito
on Dec 26, 2014 at 02:04

    Following on from The Top Ten Perl Obfus, let's count down the top ten highest rated Perl Monks poems of all time.

    Since I cannot super-search by node reputation, please note that this list is based only on some non-exhaustive searching and my fallible memory. So it's quite likely I've overlooked a deserving poem. If so, please let us know, and I'll correct the root node. Note that, to make the top ten, a poem needs a reputation of at least 120.

    That said, please feel free to mention any poem you consider deserving of a wider audience, even if it does not meet the formal reputation criteria. For example, I'd like to recognize and congratulate liverpole for pulling off a brilliant stunt of posting a poem entitled 600000 nodes as the 600000th PerlMonks node!

    Unlike obfus, I discovered the top ten qualification criteria for poetry is not so clear-cut. For example, what many folks believe to be the finest Perl Monk node of all time, namely 1st Monasterians by Erudil, was posted not as a poem, but a meditation. Though somewhat poetic, I judged that this node did not qualify because it was not a Perl poem and was not posted in the Perl Poetry section. Curiously, a response to this node, namely Re: 1st Monasterians by japhy, did qualify because, though it too was not posted in the Poetry section, it was definitely a Perl poem. Conversely, though posted in the Perl Poetry section, I chose to disqualify Aaah, spring (A Very Special Perlmonks Contest) by boo_radley because it was a poetry competition, rather than a poem. Admittedly, these decisions were somewhat arbitrary, and someone else may have decided differently.

    Now to work.

    No 10: Stayin' Alive (with CPAN) by joecamel Feb 05 2004 rep:120+

    Well, you can tell by the way I use File::Lock
    I'm a Perl Monk: no time to talk
    Got DBI and Test::More,
    been reusin' code since version four

    You know it's all right. It's okay.
    With GD::Graph and Class::Flyweight.
    We don't have time to reinvent
    so we get it from CPAN.

    Whether you're a hacker or whether you're a slacker
    You're stayin' alive, stayin' alive.
    net communicatin' and input validatin',
    And we're stayin' alive, stayin' alive.
    Ah, ha, ha, ha, stayin' alive, stayin' alive.
    Ah, ha, ha, ha, stayin' alive.
    ...

    To the tune of Stayin' Alive by the Bee Gees.

Authentication with U2F Two-factor keys
No replies — Read more | Post response
by cavac
on Dec 19, 2014 at 07:43

    NOTE/EDIT: Package name will change to Crypt::U2F::Server (and Crypt::U2F::Server::Simple), because there will also be a client module to access the key itself.

    I just uploaded the first Alpha version of Crypt::U2F, which allows you to handle the server side cryptography of the FIDO alliance's Universal 2nd factor authentication method. See also here.

    This is the same one used by Google services and fully supported in Google Chrome.

    Internally, Crypt::U2F requires Yubico's libu2f-server library installed on your system. I implemented this in two Perl modules: Crypt::U2F is the low level module (sand subject to change), that let's you play around with the underlying library. Crypt::U2F::Simple is the one you should use in most cases.

    Let's have a look into the two examples provided with the tarball. For this to work, you need to install libu2f-server and also install libu2f-host, because we need the u2f-host binary to talk to the actual USB dongle. (I'm currently in the process of making a Perl module for libu2f-host as well, but this will only finish after the hollidays.)

    The whole thing is a two part process: First you have register a new key once, then you can authenticate as often as you like. Each part (registering, authentication) itself is a two-part process as well, first you generate a challenge and send it to the client, then you have to validate the response.

    Ok, let's start with registering a key. For this example, we pass around files to and from u2f-host and also save the registered keyHandle and public key into files as well. In a real world scenario, you will probably use HTTP and Javascript to communicate with the key and save keyHandle and the public key into a database. Here's the code:

    The reason we use Base64 is simple, yet annoying: Everything except the public key is either some sort of text or even ASCII JSON. The public key on the other hand is a binary blob. It's just a matter of convenience to turn it into Base64, because that we it works in textfiles and text columns in databases as well. It don't convert directly in the library, because that might make it problematic to cooperate with other implementations of U2F authentications that also use the original C library (which delivers a binary blob), including the u2f-server example binary that comes with it.

    All of the calls to Crypt::U2F::Simple may fail for one reason or another (including new() and DESTROY()), so make sure you check all the return values!

    Let's tackle the authentication part. We'll use the keyHandle.dat and publicKey.dat generated in the previous step:

    As you can see, the process is quite similar: We load the keyHandle.dat and publicKey.dat (the second one we decode_base64()) and initialize Crypt::U2F::Simple with it. Then we generate a challenge and verify it.

    If you want to make sure the verification step actually works, you can comment out the call can try to fuss the result of u2fhost in authReply.dat. Or just comment out the call to u2fhost after you you did one successfull authentication, this one should give you a u2fs_authentication_verify (-6): Challenge error.

    Limitations and Bugs: Currently (Version 0.10), each Challenge/Verify combo has to run in the same instance of the module. I'm still working on finding out how to fix that. Also, sometimes the USB keyfob seems to be in a strange state after plugging in, returning wrongly calculated authentication replies (at least mine does). Unplugging and replugging solves that problem.

    "For me, programming in Perl is like my cooking. The result may not always taste nice, but it's quick, painless and it get's food on the table."

Add your Meditation
Title:
Meditation:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":


  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others surveying the Monastery: (4)
    As of 2015-03-05 02:40 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      When putting a smiley right before a closing parenthesis, do you:









      Results (134 votes), past polls