PerlMonks  

The Monastery Gates

by gods
on Mar 23, 1999 at 10:47 UTC


New Questions
How to remove an empty line from an XML file?
1 direct reply — Read more / Contribute
by ankit.tayal560
on Oct 01, 2016 at 01:13
    use strict;
    use warnings;
    use Data::Dumper;
    use XML::DOM;

    my $parser = new XML::DOM::Parser;
    my $doc = $parser->parsefile('C:\perl\perl_tests\xmlin2.xml') or die $!;
    my $root = $doc->getDocumentElement();
    my @address = $root->getElementsByTagName("address");

    foreach my $address (@address) {
        if ($address->getAttribute("name") eq "Joey") {
            if (my $item = $address->getElementsByTagName("item")->item(0)) {
                if (my $data = $item->getElementsByTagName("data")->item(0)) {
                    $item->removeChild($data);
                }
            }
        }
    }

    $doc->setXMLDecl($doc->createXMLDecl('1.0', 'UTF-8'));
    $doc->printToFile("C:/perl/perl_tests/xmlin2.xml");
    $doc->dispose;

    XML FILE ORIGINAL:

    <?xml version="1.0" encoding="UTF-8"?>
    <config logdir="var/log/foo/" debugfile="tmp/foo.debug">
      <server name="sahara" osname="solaris" osversion="2.6">
        <address name="ankit" id="70888"/>
        <address name="Joey" id="67890" flags="4">
          <item used="1" order="0">
            <data typeid="4"/>
          </item>
        </address>
      </server>
      <server name="gobi" osname="irix" osversion="6.5">
        <address name="anshul" id="70689"/>
      </server>
      <server name="kalahari" osname="linus" osversion="2.0.34">
        <address name="raghu" id="45678"/>
        <address name="lucky" id="67895"/>
      </server>
    </config>

    XML FILE AFTER MODIFICATION (note the blank line left inside item):

    <?xml version="1.0" encoding="UTF-8"?>
    <config logdir="var/log/foo/" debugfile="tmp/foo.debug">
      <server name="sahara" osname="solaris" osversion="2.6">
        <address name="ankit" id="70888"/>
        <address name="Joey" id="67890" flags="4">
          <item used="1" order="0">

          </item>
        </address>
      </server>
      <server name="gobi" osname="irix" osversion="6.5">
        <address name="anshul" id="70689"/>
      </server>
      <server name="kalahari" osname="linus" osversion="2.0.34">
        <address name="raghu" id="45678"/>
        <address name="lucky" id="67895"/>
      </server>
    </config>

    IDEAL MODIFIED XML FILE SHOULD BE:

    <?xml version="1.0" encoding="UTF-8"?>
    <config logdir="var/log/foo/" debugfile="tmp/foo.debug">
      <server name="sahara" osname="solaris" osversion="2.6">
        <address name="ankit" id="70888"/>
        <address name="Joey" id="67890" flags="4">
          <item used="1" order="0">
          </item>
        </address>
      </server>
      <server name="gobi" osname="irix" osversion="6.5">
        <address name="anshul" id="70689"/>
      </server>
      <server name="kalahari" osname="linus" osversion="2.0.34">
        <address name="raghu" id="45678"/>
        <address name="lucky" id="67895"/>
      </server>
    </config>

    I need to remove the empty line that is left behind when the data element is deleted, so that I get the ideal modified XML file. How can I do that? Any suggestions?
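For what it's worth, the blank line exists because removeChild() deletes only the data element node; the whitespace text node that indented it survives and prints as an empty line. One fix is to also remove that adjacent whitespace-only text node (check getPreviousSibling of $data for a whitespace TEXT_NODE before removing); another is to post-process the serialized output, e.g. with a helper like this (a sketch; the helper name is mine, not from the original code):

```perl
use strict;
use warnings;

# Sketch: drop lines that contain only whitespace from the serialized
# XML string before writing it back to the file.
sub strip_blank_lines {
    my ($xml) = @_;
    $xml =~ s/^[ \t]*\n//mg;    # delete whitespace-only lines
    return $xml;
}
```

Instead of printToFile(), one could then write strip_blank_lines($doc->toString) to the output file by hand.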

POE / Win32 / select ?
1 direct reply — Read more / Contribute
by soliplaya
on Sep 30, 2016 at 17:04
    Dear Monks (and maybe rcaputo in particular),
    Platform: osname=MSWin32, osvers=6.3, archname=MSWin32-x86-multi-thread uname='Win32 strawberry-perl-no64 5.18.4.1 #1 Thu Oct 2 16:30:08 +2014 i386'
    This code :
    log_msg(0, "$pfx: starting child with command [$trigger]") if $DEBUG;
    eval {
        $kid = POE::Wheel::Run->new(
            'CloseEvent'  => 'kidclose',
            'Program'     => $trigger,
            'StderrEvent' => 'kidstderr',
            'StdoutEvent' => 'kidstdout',
            'ErrorEvent'  => 'errorstate',
        );
    };
    results in a program crash (after 5 seconds), and in the following error in the Windows Application Event log :

    Application Failure perl.exe 5.18.4.1 in encoding.dll 0.0.0.0 at offset 0000374

    (in several versions of Perl, ActiveState and Strawberry). In the logfile of the program calling the above, there is this :
    [2016/09/30-21:53:09] [I] process(): starting child with command [E:\MIRA\strawberry\perl\bin\perl C:/EFS/bin/(myWheel).pl --debug 2 (...) "E:\MIRA\Migration\medfolio_test\NxExport\output\Alles/FOLDER_41486093_177671_4/OVERVIEW.xml"]
    [2016/09/30-21:53:14] [E] select: at E:/MIRA/strawberry/perl/site/lib/IO/Pipely.pm line 127.
    Which would correspond to this section in IO::Pipely.pm :
    120: my $hits = select( my $out_read  = $in_read,
    121:                    my $out_write = $in_write,
    122:                    undef,
    123:                    5
    124: );
    125: unless ($hits) {
    126:     next if ($! and ($! == EINPROGRESS) or ($! == EWOULDBLOCK));
    127:     die "select: $!" unless $hits;
    128: }
    The timeout value "5" above seems to match the five seconds between the two log messages (and indeed, if I change this value in Pipely.pm, the crash happens correspondingly later).

    The above happens on a Windows 2003 R2 Server. It is quite troubling, because the same calling program and the same "Wheel" program, running on the same data, under the same version of perl (and of POE), on another machine (Windows XP SP3), work perfectly fine. To add to the puzzle: another, older version of the same calling program, using the same Wheel-starting code shown above, on the same Win2003 server, has also worked perfectly fine for years. The only link I can think of with the Event log error mentioning "encoding.dll" is that the Wheel program at some point opens a file with UTF-16 encoding, like this:
    my $fh;
    unless (open($fh, '<:encoding(UTF-16)', $ov_file)) { ...
    but again, under Windows XP this works fine, with the same file opened and then parsed. (And before it does that, the Wheel prints some log messages, which normally appear in the same application log above, but in this case do not.) Humbly awaiting any ideas or suggestions, because I don't know where to look anymore.
Converting MIME From header to iso-8859-1
No replies — Read more | Post response
by ultranerds
on Sep 30, 2016 at 09:05
    Hi,

    I'm writing a system that will parse an email sent into a catchall email address. It's all working ok, but I'm having some fun with the base64 and QuotedPrint emails. Instead of trying to work it out myself (based on the enc type passed in), I've decided to opt for an existing module to work it out for me, and normalise it.

    Here is a sample script I have:
    my @emails = split /\n/, q|foo <andy@bar.com>
    =?utf-8?B?UGF1c2UgRG9yw6ll?= <pausedoree@gggg.com>
    =?UTF-8?Q?Village_Bambous_=2D_Chambre_d=27_H=C3=B4tes?= <village.bambous@ddd.com>
    =?utf-8?B?YmVybmFyZCB2ZXJpdMOp?= <naturedetente@fdd.fr>
    =?ISO-8859-1?B?TGHrdGl0aWE=?= Picot <villagabrielle@ffsdfsd.net>
    =?iso-8859-1?Q?Ancie_chambres_d=27h=F4tes?= <ancie.ha@dfdd.fr>|;

    use utf8;
    use Encode qw(encode decode);

    foreach (@emails) {
        $_ = decode('MIME-Header', $_);
        print "FOO: $_\n";
        print $IN->header;
        use Data::Dumper;
        print Dumper($_);
        print "FOO: " . utf8::is_utf8($_) . "\n";
        if (utf8::is_utf8($_)) {
            print "content..\n";
            $_ =~ s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1'})/eg;
        }
        print Dumper($_);
        print "FOO: " . utf8::is_utf8($_) . "\n";
        print "\n\n";
    }


    Here is a sample output:
    FOO: foo <andy@bar.com>
    Content-type: text/html; charset=iso-8859-1
    $VAR1 = 'foo <andy@bar.com>';
    FOO:
    $VAR1 = 'foo <andy@bar.com>';
    FOO:

    FOO: Pause Dorée <pausedoree@gggg.com>
    $VAR1 = "Pause Dor\x{e9}e <pausedoree\@gggg.com>";
    FOO: 1
    convert..
    $VAR1 = "Pause Dor\x{e9}e <pausedoree\@gggg.com>";
    FOO: 1

    FOO: Village Bambous - Chambre d' Hôtes <village.bambous@ddd.com>
    $VAR1 = "Village Bambous - Chambre d' H\x{f4}tes <village.bambous\@ddd.com>";
    FOO: 1
    convert..
    $VAR1 = "Village Bambous - Chambre d' H\x{f4}tes <village.bambous\@ddd.com>";
    FOO: 1

    FOO: bernard verité <naturedetente@fdd.fr>
    $VAR1 = "bernard verit\x{e9} <naturedetente\@fdd.fr>";
    FOO: 1
    convert..
    $VAR1 = "bernard verit\x{e9} <naturedetente\@fdd.fr>";
    FOO: 1

    FOO: Laëtitia Picot <villagabrielle@ffsdfsd.net>
    $VAR1 = "La\x{eb}titia Picot <villagabrielle\@ffsdfsd.net>";
    FOO: 1
    convert..
    $VAR1 = "La\x{eb}titia Picot <villagabrielle\@ffsdfsd.net>";
    FOO: 1

    FOO: Ancie chambres d'hôtes <ancie.ha@dfdd.fr>
    $VAR1 = "Ancie chambres d'h\x{f4}tes <ancie.ha\@dfdd.fr>";
    FOO: 1
    convert..
    $VAR1 = "Ancie chambres d'h\x{f4}tes <ancie.ha\@dfdd.fr>";
    FOO: 1
    I'm a bit confused about what encoding the string is in now, though, as utf8::is_utf8($_) still returns true, suggesting it is a utf8 string?

    Basically, the end goal is to have ANY encoding converted into iso-8859-1 (I know, I know, not ideal, but I'm dealing with a legacy system, and it would be months of work to convert the whole site to utf8)

    Thanks for any suggestions

    UPDATE: Interesting. The output of:

    $_ = decode('MIME-Header', $_);


    Seems to actually give back the string in internal encoding. If I encode it after using:
    if (utf8::is_utf8($the_from)) {
        $the_from = encode('iso-8859-1', $_);
    }
    That seems to do it. Does that look OK? I just don't want to bugger it up :)
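The round trip described in the update can be sketched like this (a minimal, self-contained example; the sample address is taken from the list above):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# decode() turns the MIME-encoded header into Perl's internal character
# string (where utf8::is_utf8 may report true); encode() then produces
# the ISO-8859-1 byte string the legacy system wants.
my $raw    = '=?utf-8?B?YmVybmFyZCB2ZXJpdMOp?= <naturedetente@fdd.fr>';
my $chars  = decode('MIME-Header', $raw);     # internal character string
my $latin1 = encode('iso-8859-1', $chars);    # ISO-8859-1 bytes
```

Note that characters with no ISO-8859-1 equivalent are substituted (with '?') by default; encode() takes a CHECK argument if you would rather die or escape them instead.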

    Cheers

    Andy
Hypothesis: some magic string optimization in perl kills my server from time to time
1 direct reply — Read more / Contribute
by leszekdubiel
on Sep 30, 2016 at 08:34
    Hello Perl Monks!

    Writing Perl feels like riding a vintage VW bus. Things don’t work the way you expect, but you can always feel the love. (Learned from article)

    I have a program that parses big strings -- 30MB of data. It makes intensive use of "\G" to continue parsing from the point where it last matched. Normally the program runs in 6 (six) seconds. But every few days it runs for 4 hours (yes... four hours) and consumes all the computation power on the server.
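For reference, \G with /gc resumes each match exactly at pos() and preserves pos() when a match fails, which is the mechanism the parsing functions below depend on. A minimal, self-contained illustration:

```perl
use strict;
use warnings;

# Each successful match with /g advances pos($s); \G anchors the next
# attempt at that position, and /c keeps pos() intact on failure so a
# parser can try an alternative pattern from the same spot.
my $s = "foo bar baz";
my @words;
while ($s =~ /\G\s*(\w+)/gc) {
    push @words, $1;
}
# pos($s) is now at the end of the string.
```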

    My program reads $big_string from file, encloses that string in parentheses (creating a new string), then passes reference to that newly created string to function "list_extr" which does parsing and returns deserialized data structure.

    Function "list_extr" gets reference to the big string it should parse. I have found that when called like this (interpolate, take reference):

    list_extr(\"($big_string)")

    or like this (combine with dot, get reference to whole combination):

    list_extr(  \( "(" . $big_string . ")" )      )

    or like this (interpolate, save in new variable, take ref to variable):

    my $s = "($big_string)"; list_extr(\$s)

    it is sometimes very, very slow.

    To solve the problem I have to pass it through sprintf:

    list_extr(\sprintf("%s", $big_string))

    This makes function "list_extr" work very fast (six seconds instead of a few hours).

    My goal is to take $big_string, add parentheses at the beginning and end, and pass a reference to the newly created string (enclosed in parentheses) to the function list_extr. I hope that's clear.

    I think the problem is some string optimization in perl. When using a string interpolated by perl, it doesn't create a new string, but somehow recomputes the parsing positions (pos $big_string, \G in the regex) -- and this takes a lot of computation. When using sprintf, perl doesn't do the optimization, but creates a new, plain, non-interpolated, non-combined, simple string. I think the optimization is sometimes applied and sometimes not -- which is why the problem occurs only once every few days. Below are the parsing functions.

    I have found that this solution sometimes is fast, sometimes slow:

    list_extr(\( "(" . $big_string . ")" ));

    and this solution is ALWAYS slow:

    my $ttt = "(" . $big_string . ")"; list_extr(\$ttt);

    # \G(?:\s|#.*$)* -- means start from last position \G,
    #                   skip spaces and comments till the end of line
    # ([[:alpha:]](?:_?[[:alnum:]])*) -- my identifier restrictions;
    #                   start with letter, then letters, underscores, digits;
    #                   but two underscores in a row not allowed,
    #                   underscore at the end not allowed

    sub list_extr {
        my ($a) = @_;
        ref $a eq 'SCALAR' or croak "wrong ref";
        my @l;
        $$a =~ /\G(?:\s|#.*$)*\(/mgc or croak "parse err";
        while ($$a =~ /\G(?:\s|#.*$)*([[:alpha:]](?:_?[[:alnum:]])*)(?:\s|#.*$)*/mgc) {
            push @l, {'name' => $1, 'parm' => parm_extr($a)};
        }
        $$a =~ /\G(?:\s|#.*$)*\)(?:\s|#.*$)*/mgc or croak "parse err";
        return \@l;
    }

    sub parm_extr {
        my ($a) = @_;
        ref $a eq 'SCALAR' or croak "wrong ref";
        my %p;
        $$a =~ /\G(?:\s|#.*$)*\(/mgc or croak "parse err";
        while ($$a =~ /\G(?:\s|#.*$)*([[:alpha:]](?:_?[[:alnum:]])*)(?:\s|#.*$)*/mgc) {
            my $n = $1;
            if ($$a =~ /\G([[:alpha:]](?:_?[[:alnum:]])*|"(?:[^\\"[:cntrl:]]+|\\[\\"nt])*")/mgc) {
                $p{$n} = $1;
            } elsif ($$a =~ /\G(?=[-+.\d])/mgc) {
                $p{$n} = numb_extr($a);
            } elsif ($$a =~ /\G(?=\()/mgc) {
                $p{$n} = parm_extr($a);
            } else {
                croak "parse err";
            }
        }
        $$a =~ /\G(?:\s|#.*$)*\)(?:\s|#.*$)*/mgc or croak "parse err";
        return \%p;
    }

    sub numb_extr {
        my ($a) = @_;
        ref $a eq 'SCALAR' or croak "wrong ref";
        $$a =~ /\G(?:\s|#.*$)*([-+]?\d*(\.\d*)?)/mgc or croak "parse err";
        my $n = $1;
        $n eq '0.0' and return 0;
        $n =~ /\A[-+](?!0.0\z)(?=[1-9]|0\.)\d+\.\d+(?<=[.\d][1-9]|\.0)\z/ or croak "parse err";
        length $n <= 15 + 2 or croak "numb too long";
        $n = 0 + $n;
        # 1234567890.12345
        abs $n > 99999999999999.9 and croak "numb out of range";
        return 0 + $n;
    }

dies with no message
2 direct replies — Read more / Contribute
by kp2a
on Sep 29, 2016 at 16:49
    My program dies at the first line after a use statement -- even a blank line -- with no message except the line number. How do I debug this? perl -c reports syntax OK for both the calling and the called program.
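One way to see where a silent die really originates (a sketch, not specific to the poster's program) is to promote every die into a full stack trace with a __DIE__ handler:

```perl
use strict;
use warnings;
use Carp;

# Any die anywhere in the program (including inside modules loaded by
# use) now reports a complete backtrace instead of a bare line number.
$SIG{__DIE__} = sub { Carp::confess(@_) };
```

The same effect is available without editing the code via the CPAN module Carp::Always (perl -MCarp::Always script.pl), or by stepping through with perl -d.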
eval system timeout child process
3 direct replies — Read more / Contribute
by dasibre
on Sep 29, 2016 at 15:22

    New to perl, trying to fix a zombie-process bug. I have a script that makes a timed system call in an eval block.

    my $timeout = timed_system_call("/subprocess_a.rb", 3);
    if ($timeout == 0) {
        print "success\n";
    } else {
        print "timed out\n";
    }

    sub timed_system_call {
        my $command = shift;
        my $timeout = shift;    # seconds
        my $alarm   = 0;
        eval {
            local $SIG{ALRM} = sub { $alarm = 1; kill -15, $child_pid; };
            alarm $timeout;
            system($command);
            alarm 0;
        };
        # if alarm is not 1, then something else caused exit, e.g. (ctrl-c)
        die $command . " failed with " . $@ if $@ && $alarm != 1;
        alarm 0;
        return $alarm;
    }
    I need to ensure the subprocess started by the system() call is killed after the timeout, without killing the parent process. On timeout, the program should continue to the else block.
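On Unix, one common pattern (a sketch under my own names, not the poster's code) is to fork, put the child in its own process group, and on timeout signal the whole group and then reap it with waitpid, so nothing is left as a zombie:

```perl
use strict;
use warnings;

# Sketch (Unix-specific): run a command with a timeout, killing the
# child's entire process group on expiry and always reaping it.
# Returns 1 on timeout, 0 otherwise.
sub run_with_timeout {
    my ($timeout, @cmd) = @_;
    my $pid = fork() // die "fork: $!";
    if ($pid == 0) {
        setpgrp(0, 0);              # child leads a new process group
        exec(@cmd) or exit 127;
    }
    my $timed_out = 0;
    eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        waitpid($pid, 0);           # reap -- this is what prevents zombies
        alarm 0;
    };
    if ($@ eq "timeout\n") {
        $timed_out = 1;
        kill 'TERM', -$pid;         # negative pid: signal the whole group
        waitpid($pid, 0);           # reap the killed child
    }
    return $timed_out;
}
```

The process group matters because system() and shell commands can spawn grandchildren; kill with a negative pid reaches all of them, where killing only the immediate child would not.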
Loop through array or filehandle
3 direct replies — Read more / Contribute
by markdibley
on Sep 29, 2016 at 09:57
    Hello

    I have a subroutine that loops through some data in a file until it has found the correct outcome and then stops. I simply pass a filehandle reference to the subroutine to do it.

    However, I now have a new data source that gives me the data in a smaller string. To apply the same routine I just need to split the data on \n and could pass an array reference to the subroutine.

    But is there a way to write a subroutine that takes either a filehandle ref or array ref to loop through?

    I am looking for a way that does not involve reading the whole file into a string (although I admit this may because I am old and started programming computers that had never heard of GB memory).

    Is it possible?
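One answer (a sketch; the helper name is mine) is to hide both sources behind a "give me the next line" closure, so the searching subroutine never needs to know which one it got and neither source is slurped whole:

```perl
use strict;
use warnings;

# Returns a closure that yields the next line (undef when exhausted),
# whether the source is an array ref of lines or a filehandle.
sub line_iterator {
    my ($source) = @_;
    if (ref $source eq 'ARRAY') {
        my $i = 0;
        return sub { $i < @$source ? $source->[$i++] : undef };
    }
    return sub { scalar readline($source) };    # assume a filehandle
}
```

The consuming routine then loops with while (defined(my $line = $next->())) { ... } regardless of the source.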
MCE -- how to know which function to use
1 direct reply — Read more / Contribute
by 1nickt
on Sep 28, 2016 at 17:57

    Hello all,

    I've read through the docs and made some experiments with basic usage of MCE, but I'm not sure if I'm barking up the wrong tree.

    I'm unclear on:

    • When to use mce_loop() vs. mce_map() vs. MCE::Shared
    • How to know how many workers to set as max

    I have an arrayref of hashrefs, and am outputting an arrayref of hashrefs. Processing each hashref is quite slow: takes about 0.1s. There are 7,500 hashes in the arrayref; that could grow to some tens of thousands.

    The code is running on an Ubuntu AWS instance whose lscpu outputs:

    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                2
    On-line CPU(s) list:   0,1
    Thread(s) per core:    2
    Core(s) per socket:    1
    Socket(s):             1

    The MCE manager is splitting the array into 25 chunks using 'auto'.

    I am seeing almost no difference in time taken to execute using MCE versus a sequential foreach loop, in fact the sequential loop appears faster, which I would not have expected.

    Although the CPU usage looks quite different:

    Benchmark: timing 5 iterations of MCE loop, MCE map, Sequential loop...
    MCE loop       : 83 wallclock secs ( 3.28 usr  0.19 sys + 30.94 cusr 24.89 csys = 59.30 CPU) @ 0.08/s (n=5)
    MCE map        : 75 wallclock secs ( 4.48 usr  0.28 sys + 41.29 cusr 37.93 csys = 83.98 CPU) @ 0.06/s (n=5)
    Sequential loop: 76 wallclock secs (37.79 usr + 28.91 sys = 66.70 CPU) @ 0.07/s (n=5)

    Am I missing something obvious? Or non-obvious? Doing something wrong? Can anyone shed any light please?


    The way forward always starts with a minimal test.
Parsing NE Command Line Arguments
2 direct replies — Read more / Contribute
by Scully4Ever
on Sep 28, 2016 at 10:45

    I wrote a script that uses Getopt::Long to get the command line arguments. Is there a way to allow the user to use "!=" in addition to "=", so the user could, for example, request data not equal to year 2016 as well as data equal to year 2016?
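Getopt::Long option names can't contain "!=", but the operator can travel inside the option's value and be parsed afterwards. A sketch (the option and helper names are mine, not from the original script):

```perl
use strict;
use warnings;
use Getopt::Long;

# After GetOptions('year=s' => \my $expr) collects the raw string,
# split a value like '2016', '=2016', or '!=2016' into an operator
# flag and the year itself.
sub parse_year_filter {
    my ($expr) = @_;
    my ($op, $year) = $expr =~ /\A(!=|=)?\s*(\d{4})\z/
        or die "can't parse year filter '$expr'\n";
    return { negate => (($op // '') eq '!=') ? 1 : 0, year => $year };
}
```

The script would then keep a row when ($row->{year} == $f->{year}) differs from $f->{negate} appropriately, e.g. invoked as script.pl --year '!=2016' (quoted so the shell leaves "!" alone).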

One-liner's quoting and subroutine
3 direct replies — Read more / Contribute
by reisinge
on Sep 27, 2016 at 01:42

    Dear Monks :-), I have two questions related to the following one-liner:

    find /opt/splunk/syslog/ -iname "*log*" -type f -mtime +30 | perl -wne 'BEGIN { $sum = 0 }; chomp; $sum += (stat)[7]; END { print "$sum\n" }'

    1) How should I quote it (escape the single quotes) when I want to execute it on a remote machine via SSH? Ex. ssh root@HOST 'ONE-LINER'

    2) Is there is a (smart) way to incorporate the following subroutine into the one-liner?

    sub scaleIt {
        my $size_in_bytes = shift;
        return unless defined $size_in_bytes;
        my ( $size, $n ) = ( $size_in_bytes, 0 );
        ++$n and $size /= 1024 until $size < 1024;
        return sprintf "%.0f%s", $size, (qw[ B KB MB GB TB ])[$n];
    }
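Regarding question 2: once a subroutine is involved, it may be simpler to promote the one-liner to a small script and sidestep both the quoting and the inlining problem. A sketch (my choice of plumbing: file names arrive as arguments, e.g. via find ... | xargs perl sum-logs.pl, where sum-logs.pl is a hypothetical name):

```perl
use strict;
use warnings;

# scaleIt as posted, unchanged.
sub scaleIt {
    my $size_in_bytes = shift;
    return unless defined $size_in_bytes;
    my ( $size, $n ) = ( $size_in_bytes, 0 );
    ++$n and $size /= 1024 until $size < 1024;
    return sprintf "%.0f%s", $size, (qw[ B KB MB GB TB ])[$n];
}

# Sum the sizes of the files named on the command line, print it scaled.
my $sum = 0;
$sum += (stat $_)[7] // 0 for @ARGV;
print scaleIt($sum), "\n";
```

As a script there is also nothing left to escape for ssh: ssh root@HOST then runs the find | xargs pipeline with only ordinary double quotes inside.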

    You can talk and talk and have great ideas, but if you don't have a way of sharing those ideas then you are just saying it in you own bubble. -- Tom Limoncelli
Putting an SQLite DB file into memory
3 direct replies — Read more / Contribute
by stevieb
on Sep 25, 2016 at 18:56

    Perlmonks is usually slow on weekends, so I thought I'd fire off another question.

    I have an SQLite database that will be used to store data every 3-10 seconds. This will be happening on a Raspberry Pi, so the 'disk' is an SD card. What I'm hoping to do is load that file into memory, and use it there, then create an event to write it to disk every X minutes, making it permanent (if data is somehow lost in the meantime, it isn't critical).

    Everything writing to the DB will be in Perl, and all processes will be within a single process umbrella.

    Is this a Linux question, or can this somehow be done (copy the db file to memory) within Perl, at the onset of my application load? If so, can I please get some recommendations on a practical way of doing this, or perhaps pointers to modules that may already do this?
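It can be done within Perl, assuming DBD::SQLite, whose online-backup wrappers copy a whole database between files and memory. A sketch (paths and the table are illustrative stand-ins, and the periodic timer is not shown):

```perl
use strict;
use warnings;
use DBI;
use File::Temp qw(tempfile);

# Stand-in for the on-disk database on the SD card.
my (undef, $disk_file) = tempfile(SUFFIX => '.db', UNLINK => 1);

# Seed the on-disk copy once so there is something to load.
my $seed = DBI->connect("dbi:SQLite:dbname=$disk_file", '', '', { RaiseError => 1 });
$seed->do('CREATE TABLE readings (t INTEGER, val REAL)');
$seed->disconnect;

# Work entirely in memory from here on.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 });
$dbh->sqlite_backup_from_file($disk_file);    # load the disk file at startup
$dbh->do('INSERT INTO readings VALUES (1, 20.5)');

# Every X minutes (event/timer not shown), persist the in-memory state:
$dbh->sqlite_backup_to_file($disk_file);
```

Since all writers live under one process umbrella, they would all have to share this one $dbh (or talk to the process that owns it); a second process opening ':memory:' gets its own separate database.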

shared array while use ForkManager
1 direct reply — Read more / Contribute
by mlin
on Sep 25, 2016 at 05:21
    Hi all,

    I am writing a script that uses fork to speed up my processing of N files. It works well now. However, I need the output information pushed into an array, as in my serial version, to be used later. Will there be a conflict when I use fork? It looks like this:
    foreach my $i (@group) {
        $pm->start and next;
        foreach my $term (@$i) {
            warn "No such file. - $term\n" and next if ! -e $term;
            ...
            $IsGood = ...;
            our @quality;
            push @quality, $IsGood;
        }
        $pm->finish;
    }
    $pm->wait_all_children;
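There is no conflict, but also no sharing: each child gets its own copy of @quality, and pushes in the child never reach the parent. Parallel::ForkManager's documented answer is its data-passing mechanism (return a reference via $pm->finish(0, \$data) in the child and collect it in a run_on_finish callback). The same idea in nothing but core Perl, as a self-contained sketch with stand-in data:

```perl
use strict;
use warnings;

# Each child writes its results down a pipe; the parent collects them.
# (This is the idea behind ForkManager's finish()/run_on_finish data
# passing, stripped to core fork and pipe.)
my @group = ([1, 2], [3, 4]);
my @quality;

my @readers;
for my $i (@group) {
    pipe(my $r, my $w) or die "pipe: $!";
    my $pid = fork() // die "fork: $!";
    if ($pid == 0) {                                 # child
        close $r;
        print {$w} "$_\n" for map { $_ * 10 } @$i;   # stand-in for $IsGood
        close $w;
        exit 0;
    }
    close $w;                                        # parent keeps reader
    push @readers, $r;
}
for my $r (@readers) {
    chomp(my @lines = <$r>);
    push @quality, @lines;                           # results land in the parent
}
wait() for @group;                                   # reap the children
```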
New Meditations
Don't post bad code!
5 direct replies — Read more / Contribute
by afoken
on Sep 26, 2016 at 16:46

    The thread Declaring and checking content of variables with consecutive names and especially the answer Re: Declaring and checking content of variables with consecutive names triggered this meditation.

    There is nothing special about this thread or this answer; it's just one of those FAQs (in this case "How can I get variable variable names?"). People explain how to do the job properly, and some other people can't resist showing that there is a way to force perl into the stupid variant. I vaguely remember threads where people tried to find even more stupid ways to "solve" a cleanly solvable problem.

    Yes, we can force perl into doing the most stupid things. Yes, it's cool to know how to mess with the inner workings of perl. But no, we should not show beginners the most dirty ways first. Not even with warnings not to use the dirty ways in production code. "Just do as I say, don't do as I do" is no good motto, not for teaching beginners.

    Why? Because there are lots of beginners out there who either don't have the time or are unwilling to learn how to use perl. They just want fast results; they don't care about maintainability or improving their skills. Imagine what happens when they get ten answers linking to FAQs, HOWTOs, and documentation, or showing examples of the right way that require a little bit of thinking -- and one or two answers showing how to abuse perl into the way they are currently thinking.

    "There is nothing humans would not do to avoid thinking."
    -- Found on a pinboard in a computer laboratory at my university

    This way, bad practices propagate, resulting in crappy perl scripts.

    Let's make it hard for the unwilling and the people in a hurry to find bad code, bad examples. Posting sane examples is good, improving a code example posted is even better.


    But what about golf?

    Perl golf is fun, true. Replacing pages and pages of code with 20 characters of "line noise" with the same result is deeply impressive. But do we have to show our golf skills in beginner threads? I don't think so. Create a new thread, link to the beginner's thread, and name it "Golf Challenge". Or at least start a golf posting by explaining that this is not a real answer, but a golf challenge for the experts.


    Are one-liners bad?

    (A meditation in a meditation)

    It depends. Short one-liners for one-time use are OK. But anything more complex than one or two, maybe three, explicit instructions (not counting the implicit loops of perl -p and perl -n) should be in a script. And if the one-liner is to be reused, it should be a script, too.

    Why?

    • One-liners require you to remember quite complex "line noise", so you very likely store them in a file anyway. What's the difference from a real script? The one-liner is just harder to use, since you need to copy and paste it.
    • Quoting rules differ with different shells (and different operating systems), so you need to adapt the one-liners, especially the arguments to perl -e and perl -E, to the shell currently in use. Scripts don't have this problem.
    • Scripts can have nice names. cleanup-foo is easier to remember, easier to type, and shows the intention more clearly than perl + 10 perl command line options + -e plus 80 characters of "line noise" in quotes differing from shell to shell.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
As of 2016-10-01 09:20 GMT