PerlMonks  

The Monastery Gates


New Questions
Using negative lookahead
4 direct replies — Read more / Contribute
by ibm1620
on Oct 18, 2017 at 21:14
    I want to create a regex that will identify a string surrounded by quotes, and remove the quotes. If the quote symbol appears within the string, the match should fail. The quotes can be either ' or ". Eventually they might be multi-character strings (e.g. ''). I'm not concerned at this point about recognizing escaped embedded quotes. This is slightly contrived; I mostly want to understand why a negative lookahead isn't working the way I thought it would.

    I sure would appreciate being shown what I'm misunderstanding.

    #!/usr/bin/env perl
    use warnings;
    use strict;

    my @cases = (
        q{'abc"def'},
        q{'abc'},
        q{"abc"},
        q{''},
        q{'abc'def'},               # Want this to fail matching
        q{'This shouldn't match'},  # Want this to fail matching
        q{"This isn't a problem"},
        q{"abc},
        q{abc"},
        q{abc},
        q{'abc"},
        q{'ab''},                   # Want this to fail matching
    );
    strip_quotes($_) for @cases;

    # If we can remove a matching pair of single or double quotes from
    # a string, without the quote symbol also appearing within the string,
    # do so. Otherwise don't change the string.
    sub strip_quotes {
        my $line = shift;
        print "\n$line\n";

        # NO NEGATIVE LOOKAHEAD
        # This works except it allows an embedded delimiter
        if ( $line =~ m{^          # anchor
                        (          # capture delimiter in pos 1
                          ["']     # delim is single or double quote
                        )
                        (.*)       # anything
                        \g1$}x     # finally, the delim
           ) {
            print "  1- Got a match: delimiter was {$1}, body was {$2}\n";
        }
        else {
            print "  1- No match.\n";
        }

        # ATTEMPTING NEGATIVE LOOKAHEAD
        # This should fail if the delimiter is found in non-terminal pos.
        if ( $line =~ m{^              # anchor
                        (              # capture delimiter in pos 1
                          ["']         # delim is single or double quote
                        )
                        (.*(?!\g1))    # neg lookahead for delim
                        \g1$}x         # finally, the delim
           ) {
            print "  2- Got a match: delimiter was {$1}, body was {$2}\n";
        }
        else {
            print "  2- No match.\n";
        }
    }
    Result:
    'abc"def'
      1- Got a match: delimiter was {'}, body was {abc"def}
      2- No match.

    'abc'
      1- Got a match: delimiter was {'}, body was {abc}
      2- No match.

    "abc"
      1- Got a match: delimiter was {"}, body was {abc}
      2- No match.

    ''
      1- Got a match: delimiter was {'}, body was {}
      2- No match.

    'abc'def'
      1- Got a match: delimiter was {'}, body was {abc'def}
      2- No match.

    'This shouldn't match'
      1- Got a match: delimiter was {'}, body was {This shouldn't match}
      2- No match.

    "This isn't a problem"
      1- Got a match: delimiter was {"}, body was {This isn't a problem}
      2- No match.

    "abc
      1- No match.
      2- No match.

    abc"
      1- No match.
      2- No match.

    abc
      1- No match.
      2- No match.

    'abc"
      1- No match.
      2- No match.

    'ab''
      1- Got a match: delimiter was {'}, body was {ab'}
      2- No match.
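    In the second pattern, (?!\g1) is tested only once, at the position where .* stops, and the very next thing the pattern demands is \g1 at that same position, so the two requirements contradict each other and nothing can match. One common way around this, offered here as a sketch rather than the only fix, is to test the lookahead before every character of the body. The helper below reuses the name strip_quotes but returns the body instead of printing, which is a choice made for this example:

```perl
use strict;
use warnings;

# Returns the body if $line is wrapped in a matching pair of ' or "
# and the delimiter does not occur inside; otherwise returns nothing.
sub strip_quotes {
    my ($line) = @_;
    if ( $line =~ m{^
                    (["'])                 # capture delimiter in group 1
                    ( (?: (?!\g1) . )* )   # body: each char must not start a delimiter
                    \g1                    # closing delimiter
                    \z}xs ) {
        return $2;
    }
    return;
}

for my $case ( q{'abc"def'}, q{'abc'def'}, q{'ab''}, q{"This isn't a problem"} ) {
    my $body = strip_quotes($case);
    print "$case => ", ( defined $body ? "{$body}" : "no match" ), "\n";
}
```

    Because the lookahead refers to \g1 rather than a fixed character, the same pattern should also cope with multi-character delimiters once the first group is widened to allow them.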
Why doesn't this die with "Can't use an undefined value as an ARRAY reference"?
1 direct reply — Read more / Contribute
by kikuchiyo
on Oct 18, 2017 at 13:49

    Consider the following script:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Test::More;
    use Data::Dumper;

    my $hash = { '50' => [ 1 ] };
    print Dumper $hash;
    is(keys %{$hash}, 1, q/keys %{$hash} is 1/);
    is(scalar @{$hash->{'50'}}, 1, q/$hash->{'50'} is 1/);
    is(scalar @{$hash->{'100'}}, 0, q/$hash->{'100'} is 0/);
    print Dumper $hash;
    done_testing();

    With Perl 5.24.3 it runs to the end and all tests pass, even though I would expect it to die with a "Can't use an undefined value as an ARRAY reference" error when it tries to dereference $hash->{'100'}, which indeed does not exist.

    Compare with

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $hash = { '50' => [ 1 ] };
    print scalar @{$hash->{'100'}};

    which dies with the expected error.

    Under Perl 5.16 the first program also dies with the expected error. (This is how we initially noticed the problem: a program that was developed on 5.22+ needed to be ported to Centos 7 which has 5.16, and the tests began to fail there.)

    What is going on here?

    (Errata: Now I've run it with more Perl versions (perversions), and it doesn't die under perl 5.22 and above, but dies as expected under perl 5.20 and below.)
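
    Whatever explains the difference between versions, code that has to run on both 5.16 and 5.22+ can avoid depending on it by never dereferencing a possibly missing value. A minimal sketch, where count_for is a hypothetical helper name and the defined-or operator requires Perl 5.10 or later:

```perl
use strict;
use warnings;

my $hash = { '50' => [ 1 ] };

# Count the elements of the array ref stored under $key, treating a
# missing key as an empty list instead of dereferencing undef.
sub count_for {
    my ($h, $key) = @_;
    return scalar @{ $h->{$key} // [] };
}

print count_for($hash, '50'),  "\n";   # existing key
print count_for($hash, '100'), "\n";   # missing key
print exists $hash->{'100'} ? "autovivified\n" : "not autovivified\n";
```

    Since the hash element is only read, the missing key is not created as a side effect.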

Puzzle Regex: Letter Frequency Arithmetic Sequence
4 direct replies — Read more / Contribute
by QM
on Oct 17, 2017 at 11:03
    I ran across a blog post about an interesting word characteristic, and wondered if a regex can be written to match this (e.g., with the plan to search a dictionary file). I suspect the answer is no, without invoking the magic "code in a regex".

    Unfortunately, I don't have time now to try my hand at it, but I thought I'd post it here for everyone to have a go.

    Update: Possible puzzles:

    1) Find the longest words where each letter used has a different frequency.
    2) Find the longest words where letter frequencies are sequential (e.g., 3,4,5,6).
    3) Find the longest words where letter frequencies are sequential starting from 1.

    Use any dictionary you like. If you have a mahvelous dictionary, drop a link here.
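
    Leaving the pure-regex question open, the three checks can be sketched in plain Perl. The function names below are made up for this example, and only the letters a-z are counted:

```perl
use strict;
use warnings;

# Sorted list of how often each letter occurs in $word (case-insensitive).
sub letter_counts {
    my ($word) = @_;
    my %count;
    $count{$_}++ for lc($word) =~ /[a-z]/g;
    return sort { $a <=> $b } values %count;
}

# Puzzle 1: every letter used has a different frequency.
sub has_distinct_counts {
    my @c = letter_counts(shift);
    my %seen;
    $seen{$_}++ for @c;
    return @c > 0 && keys(%seen) == @c ? 1 : 0;
}

# Puzzles 2 and 3: frequencies form a run n, n+1, n+2, ...
# With $from_one true, the run must start at 1.
sub has_sequential_counts {
    my ($word, $from_one) = @_;
    my @c = letter_counts($word);
    return 0 unless @c;
    return 0 if $from_one && $c[0] != 1;
    for my $i (1 .. $#c) {
        return 0 unless $c[$i] == $c[$i - 1] + 1;
    }
    return 1;
}

for my $word (qw(banana sleeveless hello)) {
    printf "%-10s distinct=%d sequential=%d from_one=%d\n", $word,
        has_distinct_counts($word),
        has_sequential_counts($word),
        has_sequential_counts($word, 1);
}
```

    To search a dictionary, the same functions could be applied to each line of a word list, keeping the longest word that passes each test.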

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

"no warnings 'uninitialized'" failing
5 direct replies — Read more / Contribute
by jest
on Oct 16, 2017 at 10:05

    An older part of our codebase has a function for converting a data structure to XML, using XML::Simple (for some legacy reason), and saving it to a directory.

    At some point, we upgraded XML::Simple to 2.24, and it started throwing "Use of uninitialized value" errors. Whatever caused this was irrelevant to our purposes, so I put this in a block with "no warnings 'uninitialized';", and all was good.

    For no reason that I can tell--I didn't upgrade any package, or my Perl version (5.16.3 for this)--our test suite is now throwing the error again, though the "no warnings" is right there:

    use XML::Simple;
    my $xml;
    {
        # Suppress warnings from XML::Simple 2.24
        no warnings 'uninitialized';
        my $xmlout = XMLout($data, NoAttr => 1, RootName => undef, SuppressEmpty => 0);
        $xml = '<xml version="1.0" encoding="UTF-8">' . "\n$xmlout</xml>";
    }
    Result from test suite:
    Use of uninitialized value at /[path/to]/Data.pm line 205.
    Use of uninitialized value at /[path/to]/Data.pm line 205.
    Use of uninitialized value at /[path/to]/Data.pm line 205.
    [...]
    Where line 205 is the "my $xmlout..." from above.

    Can someone explain why I am getting a warning that I explicitly shut off on the immediately preceding line? And how I stop this? The tests pass, but it's distracting as hell to get 50 lines of this error every time we run it.
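
    This does not explain why the lexical "no warnings" is being bypassed, but one workaround that does not depend on where the warning is generated is a localized $SIG{__WARN__} handler around the call. A minimal sketch, in which noisy_call is a hypothetical stand-in for the XMLout() call so that the example runs without XML::Simple:

```perl
use strict;
use warnings;

# Hypothetical stand-in for XMLout(): emits the unwanted message and
# returns some XML text.
sub noisy_call {
    warn "Use of uninitialized value at Data.pm line 205.\n";
    return "<a>1</a>";
}

my $suppressed = 0;
my $xmlout;
{
    # The handler is only active inside this block
    local $SIG{__WARN__} = sub {
        my ($msg) = @_;
        if ( $msg =~ /^Use of uninitialized value/ ) {
            $suppressed++;      # swallow this one message type
            return;
        }
        warn $msg;              # pass every other warning through
    };
    $xmlout = noisy_call();
}

print "suppressed $suppressed warning(s), got: $xmlout\n";
```

    Filtering on the message text is blunt, so keeping the block as small as possible limits the risk of hiding an unrelated uninitialized-value warning.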

XML::Smart - undesired decoding of special XML characters
3 direct replies — Read more / Contribute
by NeedForPerl
on Oct 16, 2017 at 04:34

    Hi, Monks. I'm using XML::Smart and running into some problems if there are special characters in the XML document:

    If I create an XML::Smart object that encapsulates an XML document which contains special characters like "š" or "Ł", the function data() uses Base64 encoding, but I don't want that. So I decided to use the function with the argument "decode => 1".

    After that change everything works fine unless there are special XML characters like "<", ">" or "&" inside an XML element. I guess that the call of data(decode => 1) results in decoding "&amp;" to "&", for instance. Is it possible to avoid that behaviour?

    I used the function set_binary('FALSE') but somehow it didn't work:

    my $log = `svn log http://... --xml --revision 123`;
    my $test = XML::Smart->new($log);
    $test->{log}->{logentry}[0]->{msg}->set_binary('FALSE');
    print $test->data();

    I'm using version 1.78 of the module.

    I have tried to contact the author via mail using the e-mail address which can be found on the FAQ of XML::Smart. The E-Mail address doesn't exist anymore.

    Many thanks in advance.

Adding values in a variable
3 direct replies — Read more / Contribute
by bisimen
on Oct 15, 2017 at 14:23
    Hi, Monks. I'm very new to Perl and programming. But I'm stuck right now. First, the code:
    #!/usr/bin/perl
    use warnings;
    no warnings 'uninitialized';

    #Calculating Protein Mass

    ($input) = $ARGV[0];
    open ($in, "<", $input) || die "Can't open \"$input\".\nError = $!.\n";
    $protein = <$in>;

    my(%mass_values) = (
        'A' => 71.03711,
        'C' => 103.00919,
        'D' => 115.02694,
        'E' => 129.04259,
        'F' => 147.06841,
        'G' => 57.02146,
        'H' => 137.05891,
        'I' => 113.08406,
        'K' => 128.09496,
        'L' => 113.08406,
        'M' => 131.04049,
        'N' => 114.04293,
        'P' => 97.05276,
        'Q' => 128.05858,
        'R' => 156.10111,
        'S' => 87.03203,
        'T' => 101.04768,
        'V' => 99.06841,
        'W' => 186.07931,
        'Y' => 163.06333,
    );

    my $mass = "";
    for(my $i=0;$i<length($protein);$i+=1) {
        $codon = substr($protein,$i,1);
        $mass .= "$mass_values{$codon}\n";
    }
    print $mass;

    ($result) = $ARGV[1];
    open ($out, ">", $result) || die "Can't write to \"$result\".\nError = $!.\n";
    print $out "$mass";
    close "$out";
    exit;

    This program is for this problem: http://rosalind.info/problems/prtm/

    In order to run it, you need a protein string file, but you need to be registered to get one from rosalind, but here is the data string I'm using: https://justpaste.it/1cfuo

    Now, this program I've written is just a modification of some code I found here and there (and I kinda understand how it works). There are probably better ways of doing it, so just say if you see a quicker and easier solution!

    Anyway, I'm stuck at the part where I have to sum up all the numbers. Right now, if I run it, I will just get a long list of numbers like this:

    103.00919

    131.04049

    113.08406

    57.02146

    186.07931

    128.09496

    156.10111

    97.05276

    etc.

    But I've no idea how to sum it all up. I've tried to make it into an array, but then I get errors saying "this is not a number", because I think perl thinks all of the different numbers is one element, and not 1000 different elements...

    Anyway, I did complete the problem by "writing" (stole and modified) a separate program to sum it all up:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # given a file with a number on each row, print the sum of the numbers
    my $sum = 0;
    my $filename = "result.txt";
    open(my $fh, "<", $filename) or die "Could not open '$filename'\n";
    while (my $line = <$fh>) {
        $sum += $line;
    }
    print "The total value is $sum\n";

    (I did try to put the above code into the first program, so that the program writes to the results file, then opens the results file, and then does the addition. But for some reason, I get the wrong answer doing it this way, like it skips adding some numbers for some reason...)

    But I feel that this is something that can be done easily with just one perl program. But it was just my way of bodging it.

    Anyway, hope someone can correct me and explain my mistakes!
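
    The sum can be kept in a single program by adding each mass to a numeric total inside the loop, instead of appending it to a string. A minimal sketch using the same mass table; protein_mass is a name chosen for this example, and the sample string is the one from the Rosalind problem statement. As a guess about the skipped numbers in the combined version: close "$out" passes a string rather than the handle, so buffered output may not have reached the file before it was read back.

```perl
use strict;
use warnings;

# Monoisotopic mass table, same values as in the script above
my %mass_values = (
    A => 71.03711,  C => 103.00919, D => 115.02694, E => 129.04259,
    F => 147.06841, G => 57.02146,  H => 137.05891, I => 113.08406,
    K => 128.09496, L => 113.08406, M => 131.04049, N => 114.04293,
    P => 97.05276,  Q => 128.05858, R => 156.10111, S => 87.03203,
    T => 101.04768, V => 99.06841,  W => 186.07931, Y => 163.06333,
);

# Add up the mass of every residue; characters without a table entry
# (such as a trailing newline) are skipped.
sub protein_mass {
    my ($protein) = @_;
    my $total = 0;
    for my $residue (split //, $protein) {
        next unless exists $mass_values{$residue};
        $total += $mass_values{$residue};
    }
    return $total;
}

printf "%.3f\n", protein_mass("SKADYEK\n");   # prints 821.392
```

    Skipping unknown characters explicitly also removes the need for no warnings 'uninitialized', which was hiding the lookup of the newline at the end of the input line.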

Filehandle with DKIM::Verifier
2 direct replies — Read more / Contribute
by nifu
on Oct 14, 2017 at 19:32
    Hello, thanks for reading my question, and sorry for my bad English. I use a Perl script to analyze different headers from emails (saved as txt files). A cron job runs the script every $x hours to look for new emails in the directory, like perl script.pl $email.filename.txt. The Perl script must now check for a valid DKIM signature. I did some tests with Mail::DKIM::Verifier, but it only works when I read the file from stdin, like perl dkim.pl < email.txt. I don't know how to use the Perl module other than with $dkim->load(*STDIN);. I have tried to load the file with a filehandle, like $dkim->load(<fh>); or $dkim->load($filename);, but it doesn't work. Does anyone have an idea how I can solve this?
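
    Since the working call passes a handle (*STDIN), a lexical handle obtained from open() should be usable in the same place; <fh> instead reads lines from the handle and passes those, and $filename passes a plain string. The core-Perl sketch below only illustrates that distinction: load_message is a hypothetical stand-in for a method that expects a handle, and the in-memory "file" replaces the email on disk.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that, like load(*STDIN), expects
# a filehandle and reads the whole message from it.
sub load_message {
    my ($fh) = @_;
    my $lines = 0;
    while ( my $line = <$fh> ) {
        $lines++;
    }
    return $lines;
}

# In-memory file so the sketch runs without an email on disk; with a real
# file this would be: open my $fh, '<', $filename or die ...
my $email = "From: a\@example.com\nSubject: test\n\nbody\n";
open my $fh, '<', \$email or die "Cannot open in-memory file: $!";

my $count = load_message($fh);    # pass the handle itself, not <$fh>
close $fh;

print "read $count lines\n";
```

    Under that assumption, the script would open the file named in $ARGV[0] and hand the resulting handle to $dkim->load.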
Almost certainly a module documentation issue
3 direct replies — Read more / Contribute
by dd-b
on Oct 13, 2017 at 17:37

    The Email::Sender::Transport::Print module has very sparse documentation. It does say you can pass a file-handle object in as the attribute "fh". Looking at other Email::Sender::Transport implementations, it appears to me that passing a hash to new with the attributes as name/value pairs is the way it's done (there's a rather startlingly complete shortage of examples of Email::Sender::Transport::Print usage on the web).

    The code is:

    open (my $sm, '>> :encoding(UTF-8)', 'out.txt')
        or die "Failed to open out.txt: $!";
    $transport = Email::Sender::Transport::Print->new({
        fh => $sm,
    });

    ($transport is declared outside the bit quoted, but it is declared.)

    When I do that, I get the error "isa check for "fh" failed: GLOB(0x806897840) is not blessed". This seems to imply that it noticed that I was in fact passing something as attribute fh, at least!

    In the debugger, it seems to me that $sm, the thing passed in as the value of attribute fh, is in fact blessed.

      DB<3> x $sm
    0  GLOB(0x806897840)
       -> *main::$sm
             FileHandle({*main::$sm}) => fileno(7)
      DB<4> n
    isa check for "fh" failed: GLOB(0x806897840) is not blessed at (eval 78)[/home/ddb/perl5/lib/perl5/Sub/Quote.pm:3] line 59, <$if> line 2.
            eval {...} called at (eval 78)[/home/ddb/perl5/lib/perl5/Sub/Quote.pm:3] line 58
            Email::Sender::Transport::Print::new("Email::Sender::Transport::Print", HASH(0x80213c8e8)) called at /home/ddb/smbshare/Documents/Fandom/Minicon/Mc53/mc53_pr1_001.pl line 178M

    I am now wondering WTF. A preliminary attempt to look at the source code of the module foundered; it appears there is none, it's generated on-the-fly using moo or something? Which I'm not familiar with, and while in theory I could study it up and then figure out for sure what's expected, that's starting to look like a large amount of work.

    Ideas?
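
    One reading of the error is that the handle reference itself is unblessed: the debugger shows a plain GLOB reference, and the FileHandle it mentions is what sits inside the glob. The core-only sketch below compares a handle from open() with one from IO::File->new using Scalar::Util's blessed. Whether Email::Sender::Transport::Print then accepts the IO::File object is an assumption drawn from the error message, not something verified here; the temporary file stands in for out.txt.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);
use Scalar::Util qw(blessed);

# Temporary path so the sketch does not touch a real out.txt
my (undef, $path) = tempfile(UNLINK => 1);

# A lexical handle from open() is a plain GLOB reference, not an object
open(my $plain, '>>:encoding(UTF-8)', $path)
    or die "Failed to open $path: $!";

# IO::File->new returns a blessed handle that isa IO::Handle
my $object = IO::File->new($path, '>>:encoding(UTF-8)')
    or die "Failed to open $path: $!";

print "plain:  ", ( blessed($plain)  // 'not blessed' ), "\n";
print "object: ", ( blessed($object) // 'not blessed' ), "\n";
```

    If the assumption holds, passing fh => $object in place of fh => $sm would be the change to try.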

Questions regarding mixing up all the data from Image::ExifTool
3 direct replies — Read more / Contribute
by Buttonzz
on Oct 13, 2017 at 16:18

    Hello everybody and thanks for reading my problem.

    I'm new here and need some advice to overcome some trouble programming my EXIF editor tool with Perl and Perl/Tk.

    I'm dealing with two problems.

    I)

    I have a scrollable GridColumn with 2 columns. Before creating it, I create a menu where you can pick a file or directory, so I can't call $gc->refresh() in the menu's subs.

    If I start the application and pick a file, the GridColumn's data therefore won't update.

    Is there any way to create an event that "watches" whether $file or $folder changes from undef to something else? I skimmed some sites dealing with events but can't find anything that really fits (or I didn't dig deep enough).

    II)

    Dealing with Image::ExifTool: is there any 'easy' way to get all the info of more than one file (all files with EXIF info in a specific directory) together?

    If I solve those 2 problems I might be able to keep my appointment next Friday. It's an exercise I chose myself for the last course before my bachelor thesis.

    Unfortunately my kid had lots of trouble with teeth, so we had trouble with sleep, and after that we all got ill, so the deadline is getting close now and I'm missing some things. I'm sure my lecturer would give me another week, but I won't use this if I don't have to.

    In case you need more information, feel free to ask. I wanted to keep it quite theoretical because I don't want anyone to do my work - I just need some 'slaps on the back of my head' as we Germans say. :) Have a nice evening and thanks to anyone dealing with my post. :)

Screen scraping complex tables and divs
3 direct replies — Read more / Contribute
by parser
on Oct 13, 2017 at 14:53

    I have been screen scraping for a few years with WWW::Mechanize and HTML::TokeParser and they have served me well. However, I recently encountered a set of pages which use complex table structures and numerous tab divs. I need a module (or methodology) which will allow me to search for sections of HTML in a more jQuery find()-like manner rather than simply consuming tokens from a stream of HTML.

    I read through the post The State of Web spidering in Perl and, while helpful, the focus is more on spidering than scraping. I am interested in recommendations from the Monks if there are higher-order methods of finding constructs in HTML using Perl besides regular expressions and token parsing.

    I read Mahmoud's jquery module on CPAN with interest but it appears not to have been maintained since 2013 and I am uncertain it can query on table structures. To be fair, jQuery is limited on querying unlabeled table structures as well.

    Here is a small example of what I am trying to accomplish:
    1) Find the 6th and 9th rows in a named table (given an id) and pull out the name and value pairs.
    2) Slurp in every row in a named table and parse out the name value pairs.

    Cheers!
How to turn "HASH(0x1234567)" into a real HASH
3 direct replies — Read more / Contribute
by rodd
on Oct 13, 2017 at 12:45
    Monks,

    While debugging some code that's printing "HASH(0x9999999)" to stdout, I'd like to be able to parse that out and convert it to a real Perl hash to be able to identify the hash's origin from looking at its contents.

    I found I can peek into its data structure using the following:

    my ( $addr ) = 'HASH(0x9999999)' =~ /HASH\(0x(.*)\)/; $addr = hex $addr; my $hash_dump = pack 'L', $addr;

    But unfortunately $hash_dump is not a HASH but just memory garbage.

    How can I convert the address into a valid Perl HASH (or HASH ref)?
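
    Going from a bare address back to a live hash is not something plain Perl offers safely. If the code that creates the hashes can be touched, an alternative is to record each one under its stringified form and look the printed string up later. A minimal sketch; remember and %registry are names invented for this example:

```perl
use strict;
use warnings;

my %registry;   # maps "HASH(0x...)" strings to the hashes themselves

# Hypothetical helper: call it wherever a hash of interest is created
sub remember {
    my ($ref) = @_;
    $registry{"$ref"} = $ref;
    return $ref;
}

my $config  = remember({ origin => 'config loader', level => 3 });
my $printed = "$config";            # what ends up on stdout, e.g. HASH(0x...)

my $found = $registry{$printed};    # look the printed string up again
print "origin: $found->{origin}\n";
```

    The registry holds a reference to every recorded hash, so it keeps them alive; for debugging that is usually acceptable, and it guarantees the address still belongs to the same hash.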

New Meditations
[Perl 6]: Small discoveries V, True / False / FileNotFound
1 direct reply — Read more / Contribute
by holli
on Oct 19, 2017 at 13:54
    Omg, I love this. Did you ever have a clear, slick little function that needs to return a boolean, and you also want to communicate an error condition? You basically have the choice of returning two values, reversing the consuming condition (meaning an empty return value be considered true), or using a string reference as an argument to the function.

    Witness Perl 6:
    sub slick() {
        if do-stuff {
            return "SomeValue";
        }
        else {
            return "Some error message" but False;
        }
    }

    if my $result = slick {
        process( $result );
    }
    else {
        log-error( $result );
    }


    holli

    You can lead your users to water, but alas, you cannot drown them.
Be prepared for CSV injections in spreadsheet
3 direct replies — Read more / Contribute
by Tux
on Oct 18, 2017 at 07:34

    Read this article to get an idea of how dangerous it can be to blindly accept macros in spreadsheets. Be it MS Excel or Google spreadsheets, they all suffer.

    You cannot blame CSV for it. CSV is just passive data.

    Once you load or open a CSV file in something as dangerous as a spreadsheet program that allows formulas to be executed on open, all bets are off. Or are they?

    The upcoming Text::CSV_XS has added a new feature to optionally take action when a field contains a leading =, which to most spreadsheet programs indicates a formula.

    On both parsing and generating CSV, you will be able to specify what you want to do (where "formula" does not go beyond the fact that the field starts with a =):

    • Do nothing special (default behavior) and leave the text as-is
    • Die whenever a formula is seen
    • Croak when a formula is seen
    • Give a warning where a formula is seen
    • Replace all formulas with an empty string
    • Remove all formulas (replace with undef)

    Code speaks louder than words ...

    I'm pretty pleased with the diagnostics

    $ cat formula.csv
    a,b,c
    1,=2+3,4
    6,,7,=8+9,
    $ perl -MCSV -e'$_ = dcsv (in => "formula.csv", bom => 1, formula => "diag")'
    Field 2 (column: 'b') in record 1 contains formula '=2+3'
    Field 4 in record 2 contains formula '=8+9'

    Expect this to be available by next week.
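
    Until that release is available, the rule itself (a field that starts with =) is easy to illustrate. The sketch below is deliberately naive and is not how Text::CSV_XS does it: it splits on commas and assumes no quoted fields or embedded separators, and find_formulas is a name made up for this example.

```perl
use strict;
use warnings;

# Naive check: split on commas and report fields that start with "=".
# Assumes no quoted fields or embedded commas; a real CSV parser handles those.
sub find_formulas {
    my ($csv_text) = @_;
    my @hits;
    my @lines = split /\n/, $csv_text;
    shift @lines;                            # first line is the header
    my $record = 0;
    for my $line (@lines) {
        $record++;
        my @fields = split /,/, $line, -1;   # -1 keeps trailing empty fields
        for my $i (0 .. $#fields) {
            push @hits, [ $record, $i + 1, $fields[$i] ]
                if $fields[$i] =~ /^=/;
        }
    }
    return @hits;
}

my $csv = "a,b,c\n1,=2+3,4\n6,,7,=8+9,\n";
printf "Field %d in record %d contains formula '%s'\n", @{$_}[1, 0, 2]
    for find_formulas($csv);
```

    For real data, the parsing should be left to a CSV module so that quoted fields containing commas or newlines are handled correctly.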


    Enjoy, Have FUN! H.Merijn
Perl6 discoveries - floating-point
2 direct replies — Read more / Contribute
by Grimy
on Oct 18, 2017 at 06:59
    Anonymous Monk brought up a really interesting discovery here. Unfortunately, that thread got derailed, so I'm making a separate one, as suggested by Your Mother. One of the first things I found while testing is this really interesting tidbit:
    $ perl6 -e 'say 0.99999999999999999000001'
    1.000000000000000073886090
    $ perl6 -e 'say 0.99999999999999999000001 > 1'
    True
    But then I realized I was using an outdated Rakudo (2017.04). So I updated to 2017.09, and now those print 1 and False, respectively. There's still some interesting behavior in 2017.09, though:
    $ perl6 -e 'say 0.7777777777777777777770'
    0.77777777777777785672697
    $ perl6 -e 'say 0.7777777777777777777771'
    0.777777777777777767909129
    Note that the second number printed is strictly smaller than the first one, even though the second source number is strictly larger than the first one, spelled in the same fashion and to the same number of significant digits! However, comparison and subtraction still return exact results:
    $ perl6 -e 'say 0.7777777777777777777771 > 0.7777777777777777777770'
    True
    $ perl6 -e 'say 0.7777777777777777777771 - 0.7777777777777777777770'
    1e-22
    Okay, that's probably because one is a Num and the other is a Rat, so let's convert everything to Num explicitly:
    $ perl6 -e 'say Num(0.7777777777777777777770)'
    0.777777777777778
    $ perl6 -e 'say Num(0.7777777777777777777771)'
    0.777777777777778
    $ perl6 -e 'say Num(0.7777777777777777777770) > Num(0.7777777777777777777771)'
    True
    $ perl6 -e 'say Num(0.7777777777777777777770) - Num(0.7777777777777777777771)'
    1.11022302462516e-16
    Huh. Now they print the same, but they're still different numbers when compared. Note that the sign of the difference got switched:
    $ perl6 -e 'my $a = 0.7777777777777777777770; my $b = 0.7777777777777777777771; say $a <=> $b; say Num($a) <=> Num($b)'
    Less
    More
    Also interesting is that many Nums don't survive a round-trip to Str:
    $ perl6 -e 'my $a = Num(1/9); say $a == Num(Str($a))'
    False
    Can anyone point me to the Perl6 specs/docs/whatever that explain those behaviors?
Parsing HTML/XML with Regular Expressions
8 direct replies — Read more / Contribute
by haukex
on Oct 16, 2017 at 07:48

    Your employer/interviewer/professor/teacher has given you a task with the following specification:

    Given an XHTML file, find all the <div> tags with the class attribute "data" [1] and extract their id attribute as well as their text content, or an empty string if they have no content. The text content is to be stripped of all non-word characters (\W) and tags, text from nested tags is to be included in the output. There may be other divs, other tags, and other attributes present anywhere, but divs with the class data are guaranteed to have an id attribute and not be nested inside each other. The output of your script is to be a single comma-separated list of the form id=text, id=text, .... You are to write your code first, and then you will be given a test file, guaranteed to be valid and standards-conforming, for which the expected output of your program is "Zero=, One=Monday, Two=Tuesday, Three=Wednesday, Four=Thursday, Five=Friday, Six=Saturday, Seven=Sunday".

    [1] Update: Clarification: The class attribute should be exactly the string data (that is, ignoring the special treatment given to CSS classes). Examples below updated accordingly.

    Ok, you think, I know Perl is a powerful text processing language and regexes are great! And you write your code and it works well for the test cases you came up with. ... But did you think of everything? Here's the test file you end up getting:

    I encourage everyone to try and write a parser using your favorite module, be it:

    Honorable mention: Grimy for a regex solution ;-)

    I'll kick things off with Mojo::DOM (compacted somewhat, with potential for a lot more golfing or verboseness):

    Update 2017-10-18: Thank you very much to everyone who has replied and posted their solutions so far, keep em coming! :-)
