Meditations


If you've discovered something amazing about Perl that you just need to share with everyone, this is the right place.

This section is also used for non-question discussions about Perl, and for any discussions that are not specifically programming related. For example, if you want to share or discuss opinions on hacker culture, the job market, or Perl 6 development, this is the place. (Note, however, that discussions about the PerlMonks web site belong in PerlMonks Discussion.)

Meditations is sometimes used as a sounding-board — a place to post initial drafts of perl tutorials, code modules, book reviews, articles, quizzes, etc. — so that the author can benefit from the collective insight of the monks before publishing the finished item to its proper place (be it Tutorials, Cool Uses for Perl, Reviews, or whatever). If you do this, it is generally considered appropriate to prefix your node title with "RFC:" (for "request for comments").

User Meditations
Data-driven Programming: fun with Perl, JSON, YAML, XML...
6 direct replies — Read more / Contribute
by eyepopslikeamosquito
on Apr 19, 2015 at 04:41

    The programmer at wit's end for lack of space can often do best by disentangling himself from his code, rearing back, and contemplating his data. Representation is the essence of programming.

    -- from The Mythical Man-Month by Fred Brooks

    Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

    -- Rob Pike

    As part of our build and test automation, I recently wrote a short Perl script for our team to automatically build and test specified projects before checkin.

    Lamentably, another team had already written a truly horrible Windows .BAT script to do just this. Since I find it intolerable to maintain code in a language lacking subroutines, local variables, and data structures, I naturally started by re-writing it in Perl.

    Focusing on data rather than code, it seemed natural to start by defining a table of properties describing what I wanted the script to do. Here is a cut-down version of the data structure I came up with:

    # Action functions (return zero on success).
    sub find_in_file {
        my $fname  = shift;
        my $str    = shift;
        my $nfound = 0;
        open( my $fh, '<', $fname ) or die "error: open '$fname': $!";
        while ( my $line = <$fh> ) {
            if ( $line =~ /$str/ ) {
                print $line;
                ++$nfound;
            }
        }
        close $fh;
        return $nfound;
    }
    # ...

    # ------------------------------------------------------------------------
    # Globals (mostly set by command line arguments)
    my $bldtype = 'rel';

    # ------------------------------------------------------------------------
    # The action table @action_tab below defines the commands/functions
    # to be run by this program and the order of running them.
    my @action_tab = (
        { id      => 'svninfo',
          desc    => 'svn working copy information',
          cmdline => 'svn info',
          workdir => '',
          logfile => 'minbld_svninfo.log',
          tee     => 1,
          prompt  => 0,
          run     => 1,
        },
        { id      => 'svnup',
          desc    => 'Run full svn update',
          cmdline => 'svn update',
          workdir => '',
          logfile => 'minbld_svnupdate.log',
          tee     => 1,
          prompt  => 0,
          run     => 1,
        },
        # ...
        { id      => "bld",
          desc    => "Build unit tests ${bldtype}",
          cmdline => qq{bldnt ${bldtype}dll UnitTests.sln},
          workdir => '',
          logfile => "minbld_${bldtype}bldunit.log",
          tee     => 0,
          prompt  => 0,
          run     => 1,
        },
        { id      => "findbld",
          desc    => 'Call find_strs_in_file',
          fn      => \&find_in_file,
          fnargs  => [ "minbld_${bldtype}bldunit.log", '[1-9][0-9]* errors' ],
          workdir => '',
          logfile => '',
          tee     => 1,
          prompt  => 0,
          run     => 1,
        }
        # ...
    );

    Generally, I enjoy using property tables like this in Perl. I find them easy to understand, maintain and extend. Plus, a la Pike above, focusing on the data first usually makes the coding a snap.

    Basically, the program runs a specified series of "actions" (either commands or functions) in the order specified by the action table. In the normal case, all actions in the table are run. Command line arguments can further be added to specify which parts of the table you want to run. For convenience, I added a -D (dry run) option to simply print the action table, with indexes listed, and a -i option to allow a specific range of action table indices to be run. A number of further command line options were added over time as we needed them.

    Initially, I started with just commands (returning zero on success, non-zero on failure). Later "action functions" were added (again returning zero on success and non-zero on failure).

    As the table grew over time, it became tedious and error-prone to copy and paste table entries. For example, if there are four different directories to be built, rather than having four entries in the action table that are identical except for the directory name, I wrote a function that takes a list of directories and returns an action table, as sketched below. None of this was planned; the script just evolved naturally over time.
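
    For illustration, a generator along these lines (names hypothetical, not the actual script) turns a list of directories into table entries; $bldtype is the global shown earlier:

    # Sketch: build one action table entry per directory, instead of
    # copy-pasting near-identical entries.
    sub make_build_actions {
        my @dirs = @_;
        return map { {
            id      => "bld_$_",
            desc    => "Build $_ ${bldtype}",
            cmdline => qq{bldnt ${bldtype}dll $_.sln},
            workdir => $_,
            logfile => "minbld_${bldtype}bld_$_.log",
            tee     => 0,
            prompt  => 0,
            run     => 1,
        } } @dirs;
    }

    # Splice the generated entries into the table:
    push @action_tab, make_build_actions(qw(Core Server Client Tools));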

    Now is the time to take stock; hence this meditation.

    Coincidentally, around the same time as I wrote my little script, we inherited an elaborate testing framework that specified tests via XML files. To give you a feel for these, here is a short excerpt:

    <Test>
      <Node>Muss</Node>
      <Query>Execute some-command</Query>
      <Valid>True</Valid>
      <MinimumRows>1</MinimumRows>
      <TestColumn>
        <ColumnName>CommandResponse</ColumnName>
        <MatchesRegex row="0">THRESHOLD STARTED.*Taffy</MatchesRegex>
      </TestColumn>
      <TestColumn>
        <ColumnName>CommandExitCode</ColumnName>
        <Compare function="Equal" row="0">0</Compare>
      </TestColumn>
    </Test>

    Now, while I personally detest using XML for these sorts of files, I felt the intent was good, namely to clearly separate the code from the data, thus allowing non-programmers to add new tests.

    Seeing all that XML at first made me feel disgusted ... then uneasy because my action table was embedded in the script rather than more cleanly represented as data in a separate file.

    To allow my script to be used by other teams, and by non-programmers, I need to make it easier to specify different action tables without touching the code. So I seek your advice on how to proceed (a sketch of the JSON option follows the list):

    • Encode the action table as an XML file.
    • Encode the action table as a YAML file.
    • Encode the action table as a JSON (JavaScript Object Notation) file.
    • Encode the action table as a "Perl Object Notation" file (and read/parse via string eval).
    • Turn the script and action table/s into Perl module/s.
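
    To make the options concrete, here is a minimal sketch of the JSON route (JSON::PP ships with Perl since 5.14; the file name here is made up). YAML would look much the same via YAML::XS::LoadFile:

    use JSON::PP;

    # Load the action table from an external JSON file, so other teams
    # can supply their own tables without touching the code.
    sub load_action_tab {
        my ($fname) = @_;
        open( my $fh, '<', $fname ) or die "error: open '$fname': $!";
        local $/;                       # slurp the whole file
        my $tab = decode_json(<$fh>);
        close $fh;
        ref $tab eq 'ARRAY' or die "error: '$fname': expected a JSON array";
        return @{$tab};
    }

    my @action_tab = load_action_tab('minbld_actions.json');

    One caveat whichever format wins: a plain data file cannot hold the fn => \&find_in_file entries, so function actions would need to be named in the file (say, fn => 'find_in_file') and resolved through a dispatch table in the script.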

    Another concern is that when you have thousands of actions, or thousands of tests, a lot of repetition creeps into the data files. Now dealing with repetition (DRY) in a programming language is trivial -- just use a function or a variable, say -- but what is the best way of dealing with unwanted repetition in XML, JSON and YAML data files? Suggestions welcome.
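
    For what it's worth, YAML at least has anchors and aliases built in for exactly this; merge keys ("<<") are a YAML 1.1 extension that not every loader supports, so check your module before relying on them. A sketch of the action table with a shared defaults block:

    defaults: &defaults
      workdir: ''
      tee:     1
      prompt:  0
      run:     1

    actions:
      - <<:      *defaults
        id:      svninfo
        desc:    svn working copy information
        cmdline: svn info
        logfile: minbld_svninfo.log
      - <<:      *defaults
        id:      svnup
        desc:    Run full svn update
        cmdline: svn update
        logfile: minbld_svnupdate.log

    JSON and XML have no equivalent mechanism, so repetition there usually ends up handled by a preprocessor or by generating the files from something else.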


Effective Automated Testing
No replies — Read more | Post response
by eyepopslikeamosquito
on Apr 18, 2015 at 04:50

    I'll be giving a talk at work about improving our test automation. Possible talk sections are listed below. Feedback on talk content and general approach is welcome, along with any automated testing anecdotes you'd like to share.

    Automation Benefits

    • Reduce cost.
    • Improve testing accuracy/efficiency.
    • Regression tests ensure new features don't break old ones. Essential for continuous delivery.
    • Automation is essential for tests that cannot be done manually: performance, reliability, stress/load testing, for example.
    • Psychological. More challenging/rewarding. Less tedious. Robots never get tired or bored.

    Automation Drawbacks

    • Opportunity cost of not finding bugs had you done more manual testing.
    • Automated test suite needs ongoing maintenance. So test code should be well-designed and maintainable; that is, you should avoid the common pitfall of "oh, it's only test code, so I'll just quickly cut n paste this code".
    • Cost of investigating spurious failures. It is wasteful to spend hours investigating a test failure only to find out the code is fine, the tests are fine, it's just that someone kicked out a cable. This has been a chronic nuisance for us, so ideas are especially welcome on techniques that reduce the cost of investigating test failures.
    • May give a false sense of security.
    • Still need manual testing. Humans notice flickering screens and a white form on a white background.

    When and Where Should You Automate?

    • Testing is essentially an economic activity. There are an infinite number of tests you could write. You test until you cannot afford to test any more. Look for value for money in your automated tests.
    • Tests have a finite lifetime. The longer the lifetime, the better the value.
    • The more bugs a test finds, the better the value.
    • Stable interfaces provide better value because it is cheaper to maintain the tests. Testing a stable API is cheaper than testing an unstable user interface, for instance.
    • Automated tests give great value when porting to new platforms.
    • Writing a test for customer bugs is good because it helps focus your testing effort around things that cost you real money and may further reduce future support call costs.

    Adding New Tests

    • Add new tests whenever you find a bug.
    • Around code hot spots and areas known to be complex, fragile or risky.
    • Where you fear a bug. A test that never finds a bug is poor value.
    • Customer focus. Add new tests based on what is important to the customer. For example, if your new release is correct but requires the customer to upgrade the hardware of 1000 nodes, they will not be happy.
    • Documentation-driven tests. Go through the user manual and write a test for each example given there.
    • Add tests (and refactor code if appropriate) whenever you add a new feature.
    • Boundary conditions.
    • Stress tests.
    • Big ones, but not too big. A test that takes too long to run is a barrier to running it often.
    • Tools. Code coverage tools tell you which sections of the code have not been tested. Other tools, such as static (e.g. lint) and dynamic (e.g. valgrind) code analyzers, are also useful.

    Test Infrastructure and Tools

    • Single step, automated build and test. Aim for continuous integration.
    • Clear and timely build/test reporting is essential.
    • Quarantine flaky failing tests quickly; run separately until solid, then return to main build. No broken windows.
    • Make it easy to find and categorize tests. Use test metadata.
    • Integrate automated tests with revision control, bug tracking, and other systems, as required.
    • Divide test suite into components that can be run separately and in parallel. Quick test turnaround time is crucial.

    Design for Testability

    • It is much easier/cheaper to write automated tests for systems that were designed with testability in mind in the first place.
    • Interfaces Matter. Make them: consistent, easy to use correctly, hard to use incorrectly, easy to read/maintain/extend, clearly documented, appropriate to audience, testable in isolation.
    • Dependency Injection is perhaps the most important design pattern in making code easier to test (see the sketch after this list).
    • Mock Objects are also frequently useful and are broader than just code. For example, I've written a number of mock servers in Perl (e.g. a mock SMTP server) so as to easily simulate errors, delays, and so on.
    • Consider ease of support and diagnosing test failures during design.
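
    A minimal sketch of the dependency injection idea in Perl (all names hypothetical): the object under test receives its collaborator instead of constructing one, so a test can hand it a mock that simulates failures on demand.

    package Mailer;
    sub new {
        my ( $class, %args ) = @_;
        return bless { smtp => $args{smtp} }, $class;   # dependency injected
    }
    sub send_report {
        my ( $self, $body ) = @_;
        return $self->{smtp}->send($body);
    }

    package MockSmtp;
    sub new  { my ( $class, $fail ) = @_; return bless { fail => $fail }, $class }
    sub send { $_[0]{fail} ? die "simulated SMTP failure\n" : 1 }

    package main;
    use Test::More tests => 2;
    ok( Mailer->new( smtp => MockSmtp->new(0) )->send_report('hi'),
        'report sent via injected client' );
    ok( !eval { Mailer->new( smtp => MockSmtp->new(1) )->send_report('hi') },
        'SMTP failure propagates' );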

    Test Driven Development (TDD)

    • Improved interfaces and design. Especially beneficial when writing new code. Writing a test first forces you to focus on interface. Hard to test code is often hard to use. Simpler interfaces are easier to test. Functions that are encapsulated and easy to test are easy to reuse. Components that are easy to mock are usually more flexible/extensible. Testing components in isolation ensures they can be understood in isolation and promotes low coupling/high cohesion.
    • Easier Maintenance. Regression tests are a safety net when making bug fixes. No tested component can break accidentally. No fixed bugs can recur. Essential when refactoring.
    • Improved Technical Documentation. Well-written tests are a precise, up-to-date form of technical documentation.
    • Debugging. Spend less time in crack-pipe debugging sessions.
    • Automation. Easy to test code is easy to script.
    • Improved Reliability and Security. How does the code handle bad input?
    • Easier to verify the component with memory checking and other tools (e.g. valgrind).
    • Improved Estimation. You've finished when all your tests pass. Your true rate of progress is more visible to others.
    • Improved Bug Reports. When a bug comes in, write a new test for it and refer to the test from the bug report.
    • Reduce time spent in System Testing.
    • Improved test coverage. If tests aren't written early, they tend never to get written. Without the discipline of TDD, developers tend to move on to the next task before completing the tests for the current one.
    • Psychological. Instant and positive feedback; especially important during long development projects.


the sorry state of Perl unit testing framework
6 direct replies — Read more / Contribute
by bulk88
on Apr 07, 2015 at 02:51
    Updated: Test::Tiny was benchmarked and analyzed; Test::More alpha release 1.301001_101 with the new backend was tried.

    Soon I will be going to QAH 2015 in Berlin. My number one goal is to get parallel testing for TAP::Harness working on Win32 Perl, but this post isn't about parallel testing. TAP::Harness stymied me many times when I tried to touch the codebase. It is militantly OOP, obfuscated with its own implementation of method dispatch, and written in a "declarative language" similar to a makefile. It has layers of faux-abstraction that superficially purport to support pluggability, yet don't allow anything but the current implementation to fit the abstraction layers. I believe it is a pedagogical exercise from an ivory tower; in plain terms, a homework assignment written to appeal to a professor's ego by showing off all the skills itemized on the syllabus.

    Researching the bloated design of TAP::Harness also brought me to investigate the other side of the TAP connection, Test::Simple/Test::More. I discovered it is equally inefficient.

    All these tests were done on a 2.6 GHz, 2-core machine running 32-bit WinXP. The primary tool I use for benchmarking in this post is called timeit.exe. It is a simple times()-like benchmark tool that asks the NT kernel for its counters after the process exits. The resolution of these counters is 15.625 ms, but the workloads I benchmark take seconds or minutes to complete, so 15.625 ms resolution isn't an issue.

    The workload I use is always 1 million tests. The tests are run by GenTAP.pm, which is from http://github.com/bulk88/Win32-APipe. GenTAP.pm and fastprint.t, fasttinyok.t and fastok.t should be portable and run on any Perl platform if you want to try reproducing these benchmarks yourself.

    I refrained from using nytprof in this write-up since nytprof has overhead, and questioning individual subs and lines of code is pointless if slowness is a conscious systemic design rule, not a couple of bad drive-by patches over the years. The output is redirected to a file, since this way there is no overhead from the Win32 console in writing to STDOUT.

    Test::More

    fastok.t calls Test::More's ok() from version 1.001014, 1 million times in a loop, with each test always passing and a randomized test name.

    timeit perl t\t\fastok.t > t.txt

    Version Number:   Windows NT 5.1 (Build 2600)
    Exit Time:        2:11 am, Saturday, April 4 2015
    Elapsed Time:     0:02:00.671
    Process Time:     0:01:59.234
    System Calls:     3664127
    Context Switches: 320686
    Page Faults:      948528
    Bytes Read:       3339101868
    Bytes Written:    100048020
    Bytes Other:      73765993

    That comes to 0.000119 seconds per ok() call, or about 0.1 milliseconds. This is significant: every 10,000 tests add a second of overhead. It also means you can't run more than roughly 10,000 tests per second per core no matter what you do. How often do you run "make test" and waiting for it to finish feels like filling a gas tank? How slow is travis or whatever CI solution you use?

    If your unit testing consists of code-generated permutations of parameters to your module, 10,000 or 100,000 tests are very easily reachable in one software project/module. Some very popular CPAN modules do this style of code-generated permutations of tests.

    Now what could be the fastest possible TAP generation? "type file_of_tap.txt" or "cat file_of_tap.txt" is cheating. The best-case scenario to compare Test::More's overhead against, under the same conditions, is a simple "print "ok ".(++$counter)." - $testname\n";" in a loop instead of "ok(1, $testname)", which is what fastprint.t does.

    timeit perl t\t\fastprint.t > t.txt

    Version Number:   Windows NT 5.1 (Build 2600)
    Exit Time:        2:19 am, Saturday, April 4 2015
    Elapsed Time:     0:00:02.156
    Process Time:     0:00:02.109
    System Calls:     21413
    Context Switches: 6818
    Page Faults:      14663
    Bytes Read:       270297
    Bytes Written:    60606608
    Bytes Other:      6971963


    0.0000021 seconds per DIY ok() call.
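
    For reference, the DIY baseline is nothing more than a loop like this (the real fastprint.t drives its output through GenTAP.pm, so treat this as a sketch of the idea):

    # Bare TAP emitter: no framework, just prints.
    my $counter = 0;
    for ( 1 .. 1_000_000 ) {
        my $testname = "random test name " . int( rand(1e9) );
        print "ok " . ( ++$counter ) . " - $testname\n";
    }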

    0.000119/0.0000021 is 57x more time. FIFTY SEVEN times more CPU. To summarize: if you use Test::More, you might as well imagine there is a gigabit ethernet cable and a UDP socket between your TAP emitter and TAP consumer.

    Now about memory use of Test::More. I modified fastok.t as such
    unshift(@INC, '.'); require 't/t/GenTAP.pm'; require Test::More; #load but dont call anything in Test::More #we want runtime mem overlead, not loadtime, otherwise the require wil +l happen #inside GenTAP if it isn't done here system('pause');#sample memory GenTAP(0, 0, 'ok', 1000000); system('pause');#sample memory

    before 3,828KB, after 397,828 KB, peak 397,840 KB.

    (397840-3828)/1000000 = 0.394012 KB per test emitted. 400 bytes per test. What on earth is in those 400 bytes? My test name passed to T::M::ok() is always 42 bytes long. Let's round that to the next 16-byte win32 malloc boundary: 48 + 12 (perl's win32 ithread malloc wrapper in vmem.h) + 16 (SV head) + 8 (SVPV body) + 4 (SV * somewhere else) = 88 bytes for storing the test name. Where did the other 300 bytes go? Why is Test::More saving the names of passing tests? Showing off your skills in LOC per hour for your CV? Writing job-for-life unmaintainable code? The TAP parser is responsible for maintaining TAP state, not the TAP emitter. A TAP emitter should have no memory increase between successive calls to ok().

    Test::More has no competitors on CPAN except for Test::Tiny, which makes no attempt at API compatibility but has a similar ok() sub. So using fasttinyok.t, which calls Test::Tiny's ok() sub 1 million times, I get

    timeit perl t\t\fasttinyok.t > t.txt

    Version Number:   Windows NT 5.1 (Build 2600)
    Exit Time:        7:42 pm, Tuesday, April 7 2015
    Elapsed Time:     0:00:05.218
    Process Time:     0:00:05.140
    System Calls:     49612
    Context Switches: 24005
    Page Faults:      17396
    Bytes Read:       146216
    Bytes Written:    57859498
    Bytes Other:      17639334

    Test::Tiny's ok() is (5.140/2.109 = 2.437) 2.4x slower than my ideal DIY ok() implementation, which, compared to Test::More's 57x, is a rounding error. Remember, Test::Tiny is a real, working CPAN module.

    About Test::Tiny's memory usage: using the same breakpoint positions, before 3,008 KB, after 3,028 KB; 3028-3008 = 20 KB. 20000/1000000 = 0.02 bytes per test, which is unmeasurably small. Basically no increase in memory usage, unlike the hundreds of MBs seen with Test::More.

    Even a drop-in replacement for Test::More that is 10x slower than the ideal DIY ok() implementation above, or in other words 4x slower than Test::Tiny, would still be over 5x faster than Test::More. Just about anything is faster than a sloth pulling a wagon. Something needs to be done about Test::More; the entire perl community relies on it, and it is unworkably slow. Either a drop-in replacement, or replacing all of the internals of Test::More with a simplified architecture.

    I was told to try an alpha release (1.301001_101) of Test::More, which includes a new backend that was hoped to improve its performance. I therefore benchmarked it.

    timeit perl t\t\fastok.t > t.txt

    Version Number:   Windows NT 5.1 (Build 2600)
    Exit Time:        11:37 pm, Tuesday, April 14 2015
    Elapsed Time:     0:02:10.859
    Process Time:     0:02:09.375
    System Calls:     2399091
    Context Switches: 238722
    Page Faults:      543275
    Bytes Read:       3031284
    Bytes Written:    103643867
    Bytes Other:      87114348

    The results are bad: 10 seconds more than the old stable 1.001014 Test::More, or 9% slower.

    TAP::Harness

    Onto the TAP consumer, TAP::Harness. For the next example, remember that fastprint.t takes 2 seconds of CPU to print its 1 million tests. I don't think fastprint.t's process time is included by the timeit.exe tool, but with the numbers shown, 2 seconds is a rounding error if it is.

    C:\sources\Win32-APipe>timeit C:\perl521\bin\perl.exe "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib\lib', 'blib\arch')" t\t\fastprint.t
    t\t\fastprint.t .. ok
    All tests successful.
    Files=1, Tests=1000000, 66 wallclock secs (65.86 usr +  0.13 sys = 65.98 CPU)
    Result: PASS

    Version Number:   Windows NT 5.1 (Build 2600)
    Exit Time:        3:00 am, Saturday, April 4 2015
    Elapsed Time:     0:01:06.406
    Process Time:     0:01:06.203
    System Calls:     483839
    Context Switches: 182749
    Page Faults:      78950
    Bytes Read:       62566241
    Bytes Written:    53961404
    Bytes Other:      11595189

    C:\sources\Win32-APipe>

    (60+6)/1000000 = 0.000066 seconds for TAP::Harness to process 1 TAP test. That is better than Test::More: parsing 1 test takes TAP::Harness 55% of the time it takes Test::More to emit 1 test.

    Now about the memory usage of TAP::Harness. Checking the process memory with Task Manager, a breakpoint right before the "test_harness" sub is called shows 5,868 KB; the Windows OS shows the process peaked at 106,368 KB; and a breakpoint right after the test_harness sub shows 96,636 KB. There are 2 memory problems here that need to be broken down.

    Problem 1: TAP::Harness uses about 100 bytes of memory for each *passing* test ((106368000-5868000)/1000000). The internal state of test results isn't a sparse array or linked list of failed tests, or a vec() or C bitfield; heck, it isn't even an @array with undef/uninit slices for successful tests, which would be 4 bytes per test on a 32-bit OS. It is 100 bytes per passing test. What is 100 bytes? 100/4 is 25 pointers/slice members. For reference, each scalar you create on a 32-bit OS is 4 slice members. And I will guess it uses a blessed hash with 1 or 2 hash keys for each test, without even looking at TAP::Harness's implementation.

    Problem 2: at the breakpoint after test_harness() was executed, memory had dropped from 106,368 KB to 96,636 KB. Only about 10 MB. What is inside the 96,636 - 5,868 = 90,768 KB?

    Here is the console log with the breakpoints (the "Press any key to continue . . ." lines) to show where memory was sampled.

    C:\sources\Win32-APipe>C:\perl521\bin\perl.exe "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; system 'pause'; test_harness(0, 'blib\lib', 'blib\arch'); system 'pause';" t\t\fastprint.t
    Press any key to continue . . .
    t\t\fastprint.t .. ok
    All tests successful.
    Files=1, Tests=1000000, 65 wallclock secs (65.05 usr +  0.11 sys = 65.16 CPU)
    Result: PASS
    Press any key to continue . . .

    C:\sources\Win32-APipe>
    Notice
    All tests successful.
    Files=1, Tests=1000000, 65 wallclock secs (65.05 usr +  0.11 sys = 65.16 CPU)
    Result: PASS

    was already printed, so what do those 90 MB contain? Why is TAP::Harness holding onto state after printing the final line? Surely this can't all be malloc fragmentation preventing a release of memory? Or was TAP::Harness written to leak memory on the theory that "the process will exit soon, don't waste CPU freeing anything", perhaps a legitimate reason? I doubt that was the intention of the person who designed TAP::Harness's OOP API.

    In combination, TAP::Harness + Test::More take 0.185 ms of overhead per ok() call. is(), which is more common than ok(), will probably take even more, so 0.185 ms per test is the current best-case scenario using the existing unit testing framework.

    Solutions

    Test::More

    Rewriting Test::Simple and Test::More back into standalone modules like they were 20 years ago, and removing their usage of Test::Builder, would be best, but that requires Test::More's authors and maintainers to agree that the code is deeply flawed and agree to replacing it.

    Summarizing some code in Test::Builder: do we really ever need to implement this, anywhere?

    sub in_todo {
        my ($todo, $caller);
        $todo = defined((caller ++$caller)[0])
                  ? (caller $caller)[0]->can('is_todo')
                      ? (caller $caller)[0]->is_todo
                          ? 1
                          : undef
                      : undef
                  : 0
            until defined $todo;
        $todo;
    }

    Since I expect significant protests from the people whose CVs depend on protecting their precious snowflakes (see this incredible post https://github.com/Test-More/test-more/issues/252 from 2012, where the then author says leaking memory is by design and will never change), a drop-in replacement for Test::More under a different namespace, patching dists away from T::M/T::B, and removing T::M/T::B from the Perl 5 core is probably the easiest way forward.

    TAP::Harness

    Rewriting TAP::Harness from scratch is probably the only solution, since a couple of 3rd-party modules are crazy enough to integrate themselves with it, like TAP::Harness::Archive. The typical "make test" has no use for TAP::Harness's OOP bloat, the only 2 options being TEST_VERBOSE on or off, and parallel or not.

    I have done nytprof-ing of TAP::Harness, but nothing is fixable there without admitting all the design rules are a list of what not to do.

    A simple design for a new harness would be: a TAP source class (usually a process class) that returns a stream class. The stream class returns a string name() (filename of the .t, an http:// URL of TAP, or a disk path of TAP), and returns multi-KB blocks and eventually undef as the EOS indicator; just 2 methods (see the sketch below). For passing tests store nothing (undef), or store a "sparse" range of passing tests as unblessed objects; store failed tests, unknown lines, and diag in a linked list for dumping/summing at the end. Rewrite the parser in XS to quickly search for newline and the "not ok" and "ok" tokens, maybe using the BM algo. Even for a PP version, use index and substr for a 1-pass parser through the block. If a TAP stream has all passing tests and reaches the end of the stream, all the passing tests are represented by 1 hash with 2 keys (start range, end range). This is a long shot, but ideally, pipes shouldn't even be used between a TAP consumer and TAP emitter. A future Test::More could communicate through shared memory through XS to a future TAP::Harness, with the IPC buffer maybe looking like a stream, or the TAP "parsing" (no TAP is generated) is "done" (there is no TAP, just an array of test records) on the client side in Test::More, which also gives the benefit of "out of sequence" TAP being impossible due to perl thread safety in Test::More::XS.
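
    To make that concrete, here is a rough sketch of the two-method stream interface (hypothetical names and code, not anything that exists today):

    package TAP::Mini::ProcessStream;

    # A TAP source: runs a .t file and exposes just name() and
    # read_block(), which returns multi-KB blocks and undef at EOS.
    sub new {
        my ( $class, $script ) = @_;
        open( my $fh, '-|', "$^X $script" ) or die "can't run '$script': $!";
        return bless { name => $script, fh => $fh }, $class;
    }
    sub name { $_[0]{name} }
    sub read_block {
        my ($self) = @_;
        my $n = read( $self->{fh}, my $buf, 65536 );
        return $n ? $buf : undef;
    }

    package main;
    my $stream = TAP::Mini::ProcessStream->new('t/basic.t');
    while ( defined( my $block = $stream->read_block ) ) {
        # one-pass scan of the block for "ok"/"not ok" tokens goes here;
        # store state only for failures, nothing per passing test
    }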

    Why is TAP::Harness's design flawed? I saw all of the following being done while stepping through and nytprof-ing TAP::Harness.

    Things not to do include:

    • No method accessors, callbacks, or method dispatchers in PP; no declarative syntax; no pluggable tokenizers; no roll-your-own RTTI, in fact no RTTI at all; just 3 classes.
    • Absolutely no "Harness::Data::Integer" class, since Perl isn't Javascript and will not JIT your Integer class into a scalar. Always use hash keys. Do not use classes and constructors if you can use integer bitfields or integer bools/!!0; this is Perl, not C++, not Java.
    • Do not build bool methods that should be bitfields, and that are aggregations of other bool methods, since you wind up with exponential method calls; and there is no caching, since caching is evil because it can't be plugged later on according to the OOP dogma, so the result is parsing a TAP line over and over.
    • Do not use "class factory" classes. There is no reason to have closured, dynamically generated, anonymous classes. If you need 2 classes both named "Record" because you are too incompetent to prefix "Customer::" or "Inventory::" to the word "Record", you should name classes after your pets; I hope your house doesn't have 2 Rustys. Perl has "packages"; do not invent your own.
    • Do not nest hashes inside hashes. Perl hashes aren't C structs where all the "."s in "pointer->a.b.c.d" are optimized away.
    • Do not write classes where the ctor does nothing except bless an empty hash, and then each method of the class checks if the object was "really ctored" and conditionally calls the real ctor, in some crazy attempt to optimize for bad callers that unnecessarily create objects, never call a single method on them, then dtor them.
    • Do not ask an object if it can() do anything; that means your objects are lying about their abstract base class. If you bought a car on ebay, and the seller mails you 4 tires with a $1K shipping charge, do you timidly do nothing and buy another car online?
    • Do not implement has_pending() with return !!scalar(shift->pending()); where pending() is return map{ref($_)->new($_)} @{shift->{queue}}.
    • Do not implement meaningless sort calls, such as in $new->{$_} = $self->{$_} foreach sort @clone;.
    • Do not collect error/fail state diagnostics and build error message strings when there is no error and you will never use that error message or state snapshot again.

Is pushing strict and warnings still relevant?
7 direct replies — Read more / Contribute
by stevieb
on Apr 06, 2015 at 17:47

    I've been away from Perl for some time now. After I decided to leave the Network Engineering field, many things changed.

    Before I left, I wrote some tutorials et al. (v5.10-ish) and did a preliminary examination report on Perl6, but since then, I've found a job where I've been pushed into Python.

    My question is, as I dabble here on PerlMonks and in some of my older code, I wonder if it is still important to remind people to use warnings and strictures, or am I getting old?

    -stevieb

Why Boyer-Moore, Horspool, alpha-skip et al. don't work for bit strings. (And is there an alternative that does?)
7 direct replies — Read more / Contribute
by BrowserUk
on Apr 05, 2015 at 07:16

    The basic premise of (the titled) fast string matching algorithms is that you preprocess the needle and construct a table of shifts or skips that allows you to skip over chunks of the haystack and thus speed things up.

    The following, I hope clear, explanation (they are few and far between) describes a variant called Quick Search.

    You start with a haystack and needle:

    00001110100111001011101101110001111010111100111110111111100000001000100100001010010101000001101000110010011000101101010110011011000000

    000010100101010000011010

    You inspect the needle and build the skip table:

    00001010 01010100 00011010

    00001010   shift 24
    01010100   shift 16
    00011010   shift  8
    xxxxxxxx   shift 32   (all other patterns not found in the needle allow maximum shift)

    And apply it. Try the pattern at the start of the haystack; it doesn't match, so look up the next 8 bits of the haystack in the table, and find the shift = 32:

    000011101001110010111011 01110001 111010111100111110111111100000001000100100001010010101000001101000110010011000101101010110011011000000
    000010100101010000011010 ????????   shift = 32

    So, apply the shift and try the needle at the new position. No match; look up the next 8 bits of the haystack to find a shift of 32:

    00001110100111001011101101110001111010111100111110111111100000001000100100001010010101000001101000110010011000101101010110011011000000
                                    000010100101010000011010????????   shift = 32

    Apply the shift, try the needle at the new position. No match, look up the next 8 bits, get the shift of 8:

    00001110100111001011101101110001111010111100111110111111100000001000100100001010010101000001101000110010011000101101010110011011000000
                                                                    000010100101010000011010????????   shift = 8

    Apply the shift, try the match, and ... success. Needle found.

    00001110100111001011101101110001111010111100111110111111100000001000100100001010010101000001101000110010011000101101010110011011000000
                                                                            000010100101010000011010

    Found at 72

    Now let's try that with the same haystack but another needle:

    10000101 00101010 00001101

    10000101   shift 24
    00101010   shift 16
    00001101   shift  8
    xxxxxxxx   shift 32

    00001110100111001011101101110001111010111100111110111111100000001000100100001010010101000001101000110010011000101101010110011011000000
    100001010010101000001101????????   shift 32
                                    100001010010101000001101????????   shift 32
                                                                    100001010010101000001101????????   shift 32
                                                                                                    100001010010101000001101????????   shift 32
                                                                                                                                    100001010010101000001101

    >>> Not found.

    Great! Four compares & four skips to discover the needle isn't in the haystack.

    Except that it is!

    00001110100111001011101101110001111010111100111110111111100000001000100100001010010101000001101000110010011000101101010110011011000000
                                                                           100001010010101000001101   (at offset 71)

    And that's why Boyer-Moore, Horspool, Alpha-Skip, Quick Search et. al. don't work for bit-strings.

    It doesn't matter what size you make the chunks of bits -- above I used 8 -- the skip table will only ever consider aligned groups of bits, but bit-strings inherently consist entirely of unaligned bits!

    (Don't believe me; just try it for yourself.)
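
    A quick way to do that: brute force (which, per the above, is all that reliably works here) slides the needle one position at a time, and on the bit strings as text that is exactly what Perl's index() does:

    # The haystack and the second needle from the example above.
    my $haystack = '0000111010011100101110110111000111101011110011111011111110000000100010'
                 . '0100001010010101000001101000110010011000101101010110011011000000';
    my $needle   = '100001010010101000001101';

    # index() tries every offset, aligned or not.
    my $pos = index( $haystack, $needle );
    print $pos >= 0 ? "Found at $pos\n" : "Not found\n";   # prints "Found at 71"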

    And the point of this meditation?

    To prove my instincts were right?

    Maybe, but mostly because I wanted to be wrong. I wanted there to be a better than brute force mechanism. And I'm still hoping that someone will point me at one.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
A Big "Thank You" To Strawberry Perl Folks.
No replies — Read more | Post response
by Anonymous Monk
on Mar 31, 2015 at 03:14

    Big Thank You to all the Strawberry Perl creators/Maintainers/Developers. You have created an awesome distribution. What is even more awesome is Strawberry Portable Perl. It has made my life simpler. No Admin Rights needed to install it. I have it running on our production servers. There are some applications which are using Perl, and I treat that as "System Perl". So no fiddling there. I first downloaded the portable perl to my workstation, installed a few modules, and simply copied the folder to the production server, where I had some scripts running. Worked like a charm. Beautiful.

    It also ended up installing gmake/dmake etc which were extremely useful. I use gVim on windows and was recently playing around with some plugins which required vimproc. Compiling it was easy peasy. All thanks to the extra goodies you folks have provided.

    Thank you all once again.

MJD's Contract Warnings - courtesy of Perlweekly
4 direct replies — Read more / Contribute
by ww
on Mar 30, 2015 at 08:10
RFC: Automatic logger module
2 direct replies — Read more / Contribute
by balajiram
on Mar 23, 2015 at 22:51
    I have written a Perl module that redirects the output of STDOUT and STDERR to a file very easily. For example:

    use Auto::Log;

    my $log = Auto::Log->new("some.log");

    # ...Write some code here...
    print "INFO: This gets printed to the log file";

    $log->DESTROY;

    One could also use the log file in tee mode using a simple option.

    I wanted to hear from users as to how they feel about this. What do you think? Do you think you'd like to use such a Perl module?

    The principal benefit of this module is that all a user ever needs to do is use plain print or warn statements. One doesn't have to open a new file with open() or anything. In my opinion this saves effort for a regular programmer who wants to get code written fast.
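
    For what it's worth, the core of such a module can be quite small. A minimal sketch of one possible implementation (my guess at the approach, not the actual Auto::Log code) that redirects both handles and restores them on DESTROY:

    package Auto::Log;
    use strict;
    use warnings;
    use IO::Handle;

    # Point STDOUT/STDERR at a log file, keeping dups of the originals.
    sub new {
        my ( $class, $file ) = @_;
        open( my $old_out, '>&', \*STDOUT ) or die "dup STDOUT: $!";
        open( my $old_err, '>&', \*STDERR ) or die "dup STDERR: $!";
        open( STDOUT, '>>', $file )    or die "open '$file': $!";
        open( STDERR, '>&', \*STDOUT ) or die "redirect STDERR: $!";
        STDOUT->autoflush(1);
        return bless { out => $old_out, err => $old_err }, $class;
    }

    # Restore the original handles.
    sub DESTROY {
        my ($self) = @_;
        open( STDOUT, '>&', $self->{out} ) or die "restore STDOUT: $!";
        open( STDERR, '>&', $self->{err} ) or die "restore STDERR: $!";
    }

    1;

    Tee mode is the harder part: a plain reopen can only point a handle at one destination, so teeing would need a tie, a fork, or something like File::Tee.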

    Let me know what you think though.

    Also, please do give some suggestions for the module name. Do you like the name 'Auto::Log'?

    Balaji

Perl and the Future
6 direct replies — Read more / Contribute
by hangon
on Mar 19, 2015 at 02:22

    Good evening Monks, I thought I'd try to stimulate some discussion on advocating and passing Perl along to the next generation.

    While Perl is not dead, as some advocates of other languages like to point out, it seems to have lost ground in some areas. Web development is one example, dating from when PHP became popular. Another is as a "scripting" interface for other applications; apparently Python is now the language du jour for that purpose, as well as for a good number of standalone open source programs.

    I get it that Perl is no longer the "cool" new language; that distinction was taken by Python, then Ruby, and possibly Lua will be next. I see an evolution here, where in the public perception these languages go from shiny & new, to mainstream, to old and somewhat unknown or overlooked.

    Today I had an interesting conversation with the technical programs coordinator for the public library system of a mid-sized U.S. city. Among other things, they hold classes on programming for people of all ages. There are visually oriented languages for teaching young children, and they teach Ruby, Python and Javascript to people from early teenagers through older adults. However, she seemed to be unfamiliar with Perl; I don't think she was more than vaguely aware of Perl's existence.

    So, it occurs to me that this may be a good place to advocate Perl. Perhaps experienced Perl programmers could volunteer to teach the language at public libraries or similar venues, where to that audience Perl would be new again. This could be a win-win where you advocate and teach your favorite language, while giving back to your community and the Perl community as well.

Using serialized data structure to change hash key
3 direct replies — Read more / Contribute
by docdurdee
on Mar 18, 2015 at 10:13
    I have on occasion had the need to dig down into a complex data structure to change a key (e.g. s/SOMEKEY/SomeKey/, etc.). The usual approach is digging down with loops. Here is a useful trick using a serialized data structure that has saved me some time here and there:

    use YAML::XS;

    my $yaml = Dump $hash;          # serialize the structure to YAML text
    $yaml =~ s/SOMEKEY/SomeKey/;    # edit the key as plain text
    $hash = Load $yaml;             # deserialize back into a hashref

    I posted this on Stack Overflow as an answer to an old question, but it didn't get many votes. So either it's a bad idea or the question was too old. Anyway, I post it here because I like this trick. Clearly, you should dump the structure and be sure that your replacement won't mess something else up. With power comes responsibility...
Happy PI day!
1 direct reply — Read more / Contribute
by docdurdee
on Mar 14, 2015 at 17:16
    perl -e '$runs = 10000000; do { $x = rand(1); $y = rand(1); $cnt++ if $x*$x + $y*$y > 1 } foreach (1 .. $runs); print 4*(($runs-$cnt)/$runs)."\n"'
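
    The one-liner estimates pi by Monte Carlo: it counts random points in the unit square that fall outside the quarter circle, so ($runs-$cnt)/$runs approximates pi/4. A longhand version of the same idea:

    use strict;
    use warnings;

    # Monte Carlo pi: the fraction of random points in the unit square
    # that land inside the quarter circle of radius 1 approaches pi/4.
    my $runs   = 10_000_000;
    my $inside = 0;
    for ( 1 .. $runs ) {
        my ( $x, $y ) = ( rand(1), rand(1) );
        $inside++ if $x * $x + $y * $y <= 1;
    }
    printf "%.6f\n", 4 * $inside / $runs;
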
Perl 5, Python, Rakudo, C/C++, Java, Lua?
6 direct replies — Read more / Contribute
by raiph
on Mar 11, 2015 at 22:44
Why PerlMonks is an amazing place.
4 direct replies — Read more / Contribute
by Anonymous Monk
on Mar 03, 2015 at 01:34

    Hi,

    I know this is not a question, but as an anonymonk, I cannot post it anywhere else, so here goes.

    I was the same guy who asked a question on "Re-learning Perl". I even gave a small code snippet I wrote just to show how much I remember. This was pretty late in the night. I woke up half expecting to be shooed away at worst, or an unanswered post at best, because I was not sure how my question would be perceived. To my amazement and satisfaction, not a *single* sullen response!!! The monks who responded genuinely wanted to guide. They suggested tips and books. Truly, there is no other forum like this. I've been here so many times, asked so many questions, but not even once, I repeat, not even once was I rudely sent back. I've seen much, much worse forums, but there is something different here at the Monastery. Whenever someone asks me why I use Perl, I tell them 1) because this is the only scripting language I know (and like) and 2) that they should visit PerlMonks. It's the most amazing place to come to ask questions and get genuine answers.

    It's also here at PerlMonks that I learnt that it's ok if you do not remember syntax, you can look up the documentation, no one expects you to remember everything by heart. As long as you have the basic fundamentals clear, you can barge ahead, and then fill in the gaps. You monks may not believe it, but there are other places not so forgiving/understanding/helpful.

    I hope the place stays as wonderful as it is right now. Thank you Monks.

Acknowledgement of Contributions
1 direct reply — Read more / Contribute
by jmlynesjr
on Mar 02, 2015 at 14:24

    Monks:

    Unlike most (a lot?) of you, I am retired and I enjoy learning Perl and wxPerl. As such, my time is free, which gives me a great appreciation for the time contributed to the Monastery by those of you who do make your living doing Perl.

    A few days ago I got a 911 from my daughter who is writing her Doctoral Dissertation in Economics. She needed data on all the power plants in the US. The available data came from the EIA and EPA. As governments are famous for, the plant identifiers between the agencies are different. She ended up with a 10,000 row spreadsheet to normalize.

    I remembered reading posts on working with Excel. A search of the Monastery turned up Excel To Tab Delimited using Spreadsheet::ParseExcel posted by upallnight.

    Within an hour I had installed the module from CPAN and had the sample code running against her data. Several iterations later, I could extract selected columns into a hash to determine the unique plant names and generate a file of edits compatible with Matlab. I still have 700 rows to manually edit, but Perl has already saved us a lot of time.

    Whether you post a complete solution or just a hint, you never know who might benefit from your knowledge, even years after your post.

    Thanks for all of your contributions.

    Update: Fixed typo in title.

    James

    There's never enough time to do it right, but always enough time to do it over...

Porting (old) code to something else
8 direct replies — Read more / Contribute
by Tux
on Feb 16, 2015 at 09:41

    As I am in the process of porting Text::CSV_XS to perl6, I have learned not only a lot about perl6, but also about loose ends.

    We all know by now that perl6 is reasonably likely to be available in September (of this year), and depending on what you hope or expect, you might be delighted, disappointed, disillusioned or ecstatic (or anything in between: YMMV).

    My goal was to be able to present a module working in perl6 that would provide the user with as much as possible of the functionality that Text::CSV_XS offers: flexible, feature-rich, safe and fast CSV parsing and generation.

    For now I have to drop the "fast" requirement, but I am convinced that the speed will pick up later.

    Text::CSV_XS currently offers a test suite with 50171 tests, so my idea was that if I converted the test suite to perl6 syntax, it could very well serve as a proof point for whatever I wrote in perl6.

    There are a few things that you need to know about me and my attitude towards perl6 before you are able to value what has happened (at least I see this as a valuable path; you might not care at all).

    I do not like the new whitespace issues that perl6 imposes on the code. It strikes *me* as ugly and illogical. That is the main reason why I dropped interest in perl6 very early in the process. In Sofia, however, I had a conversation (again) with a group of perl6 developers who now proclaimed that they could meet my needs, as perl6 now has a thing called a "slang", where the syntax rules can be lexically changed. Not only did they tell me it was possible, but early October 2014, Slang::Tuxic was created just for me and, lo and behold, I could write code in perl6 without the single big annoyance that drove me away in the first place. This is NOT a post to get you to use this slang; it is (amongst other things) merely a way to show that perl6 is flexible enough to take away annoyances.

    Given that I now can write beautiful perl (again, beauty is in the eye of the beholder), I regained enjoyment, and John and I were so stupid as to take the fact that perl6 actually can be used now and promise to pick a module from the perl5 area and port it to perl6. XS being an extra hurdle, we aimed high and decided to "do" CSV_XS.

    So, I have 50000+ tests that I want to PASS (ideally), but I soon found out that with perl6 having type checking, some tests are useless, as the perl6 compiler already catches those for you (compare it to use strict in perl5), so I can just delete all tests that pass wrong arguments to the different methods.

    Converting the error-handling stuff was fun too, but I think that if you try to mimic what people expect in perl5, it is not too hard to get the best of both worlds: I'm quite happy already with the error handling as it stands.

    So, the real reason for this post is the cases I found I had no answer to, because they weren't tested for or had bogus tests.

    • What should be done when parsing a single line of data is valid, but there is trailing data in the line after parsing it?

      $csv->parse (qq{"foo",,bar,1,3.14,""\r\n"more data});

      parse is documented to parse the line and enable you to get the fields with fields. In contrast with getline, which acts on IO, it will just discard all data beyond the EOL sequence. I am uncertain if that is actually what it should do. Should it "keep" the rest for the next iteration? Should it be discarded? Should it cause a warning? Or an error?

    • What is the correct way to deal with single ESCapes (given that the ESCape is not the default " or otherwise equal to QUOtation)? Here I mean an ESCape being used in a spot where it is not required or expected, without the option to accept it as a literal.

      $csv->escape_char ("+");
      $csv->parse ("+");
      $csv->parse ("+a");
      $csv->parse ("+\n");

      Leave the ESCape as is? Drop it, being special? Warn or error?

    Questions like those slowed down the conversion as a whole, as I can now make my own decisions with sane defaults (like binary => True) instead of caring about backward compatibility issues.

    Anyway, feel free to check out what I have done so far in my git repo, and I welcome comments (other than those on style) in the issues section. Feel free to join #csv on IRC to discuss. Hope to see you (or some of you) at the next Dutch Perl Workshop 2015 in Utrecht.


    Enjoy, Have FUN! H.Merijn
