http://www.perlmonks.org?node_id=479

If you have a question on how to do something in Perl, or you need a Perl solution to an actual real-life problem, or you're unsure why something you've tried just isn't working... then this section is the place to ask. Post a new question!

However, you might consider asking in the chatterbox first (if you're a registered user). The response time tends to be quicker, and if it turns out that the problem/solutions are too much for the cb to handle, the kind monks will be sure to direct you here.

User Questions
reading input into a number of arrays
2 direct replies — Read more / Contribute
by Dannypje
on Feb 25, 2020 at 10:57
    Hi,

    probably the code below is not the best way to do it, but it's something I understand (or at least I thought I understood). Intention is to read 3 lines of characters (say 1111111, 2222222, 3333333) and store them in a 2 dimensional array $a(1,1) through $a(3,7) (I know, I know, I should start from 0, but I don't think that's the issue). The thing is, when I print the array _inside_ the loop, I nicely get 1111111, 2222222, 3333333 as output. However, when I try to print outside the loop, I get 3 times my last input (3333333, 3333333, 3333333). I don't understand how this happens. Please enlighten me. TIA
    Please note, the square brackets around the indices were lost in translation somewhere, the syntax is hence not the problem.

    for ($i=1;$i<=3;$i++)
    {
        $ingang=<>;
        chomp $ingang;

        ($a[$i,1],$a[$i,2],$a[$i,3],$a[$i,4],$a[$i,5],$a[$i,6],$a[$i,7])=split('',$ingang);
    }

    print "Resultaat\n";

    for ($p=1;$p<=3;$p++)
    {
        for ($j=1;$j<=7;$j++)
        {
            print $a[$p,$j];
        }
        print "\n";
    }
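A possible explanation, assuming the lost brackets were originally written as `$a[$i,1]`: Perl has no comma-separated multidimensional subscripts for arrays. Inside `[...]` the comma operator discards `$i` and keeps only the last value, so every pass writes to `$a[1]` through `$a[7]`, and the final input overwrites the earlier ones. A minimal sketch of the usual nested form `$grid[$i][$j]` follows; the names `@input` and `@grid` are illustrative, and the hard-coded list stands in for the three lines read from `<>`.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the three lines the original reads from <>.
my @input = ('1111111', '2222222', '3333333');

my @grid;
for my $i (0 .. $#input) {
    # Store one anonymous array of characters per row,
    # so each cell is addressed as $grid[$i][$j].
    $grid[$i] = [ split //, $input[$i] ];
}

print "Resultaat\n";
for my $row (@grid) {
    print join('', @$row), "\n";
}
```

Running with `use warnings` on the original form should also report the unsupported multidimensional syntax, which is a quick way to spot this kind of mistake.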

ne vs. ! eq
1 direct reply — Read more / Contribute
by boerni
on Feb 25, 2020 at 08:34
    Hey Monks, I never realized Perl behaves like this. There is a difference between (! 'a' eq 'b') and ('a' ne 'b').

    #!perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $s1 = 'bla';
    my $s2 = 'blu';
    my $r = $s1 eq $s2;
    print Dumper $r;

    if (! $s1 eq $s2) {
        print "$s1 and $s2 are not the same\n";
    } else {
        print "$s1 and $s2 are the same\n";
    }
    exit 0;

    prints:

    $VAR1 = '';
    bla and blu are the same

    The "correct" ($s1 ne $s2) works as expected. But why does (! $s1 eq $s2) not work?

    Probably there is a simple explanation but I don't know it... Maybe one of you Monks can explain this.

    Thank you
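The behaviour comes down to operator precedence: `!` binds more tightly than `eq`, so `! $s1 eq $s2` is parsed as `(!$s1) eq $s2`. Since `!'bla'` is the empty string, the comparison becomes `'' eq 'blu'`, which is false. A small sketch comparing the variants (the variable names other than `$s1` and `$s2` are illustrative):

```perl
use strict;
use warnings;

my ($s1, $s2) = ('bla', 'blu');

my $as_parsed   = ((!$s1) eq $s2);    # how "! $s1 eq $s2" is read
my $with_parens = !($s1 eq $s2);      # negate the whole comparison
my $with_not    = (not $s1 eq $s2);   # "not" binds more loosely than eq
my $with_ne     = ($s1 ne $s2);       # the direct operator

for my $pair (
    [ '(!$s1) eq $s2'   => $as_parsed ],
    [ '!($s1 eq $s2)'   => $with_parens ],
    [ 'not $s1 eq $s2'  => $with_not ],
    [ '$s1 ne $s2'      => $with_ne ],
) {
    printf "%-16s is %s\n", $pair->[0], $pair->[1] ? 'true' : 'false';
}
```

Only the first form is false for two different strings; the other three agree with each other.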

Comments regex
1 direct reply — Read more / Contribute
by kepler
on Feb 25, 2020 at 04:45

    Hi

    I'm trying to write a simple Perl script that reads a code file and removes all comments (JavaScript style), such as //... or /*... */, including /*... comments that continue over an unknown number of lines.

    Can someone give me a hand? Removing patterns like // or /*...*/ can be done, I think, with /\/+\*?\.*/gi, but random code contains awkward cases, for instance line breaks, or lines beginning with spaces or tabs. So perhaps I could use /([\ |\t]*)\/+\*?\.*/gi

    Best regards
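One common approach, sketched here under simplifying assumptions, is to match string literals and comments in a single substitution, keeping the strings and dropping the comments. That way a `//` inside a quoted URL is left alone. The function name `strip_js_comments` is hypothetical, and this sketch does not handle JavaScript regex literals or template literals, so a real tokenizer would be safer for arbitrary code.

```perl
use strict;
use warnings;

# Hypothetical helper: remove // and /* */ comments, keep quoted strings.
sub strip_js_comments {
    my ($src) = @_;
    $src =~ s{
        (  " (?: \\. | [^"\\] )* "     # double-quoted string, kept as is
         | ' (?: \\. | [^'\\] )* '     # single-quoted string, kept as is
        )
      | /\* .*? \*/                    # block comment, may span lines
      | // [^\n]*                      # line comment up to end of line
    }{ defined $1 ? $1 : '' }gsex;
    return $src;
}

my $code = <<'END_JS';
var a = 1; // trailing note
var url = "http://example.com"; /* spans
several lines */ var b = 2;
END_JS

print strip_js_comments($code);
```

The `/s` modifier lets `.` cross line breaks inside block comments, and `/x` allows the pattern to be laid out with whitespace and remarks.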

Filtering out stop words
5 direct replies — Read more / Contribute
by IB2017
on Feb 25, 2020 at 04:43

    Hello

    I used to decide whether a word should be excluded from processing by checking whether it is contained in a stop-word list. I used this method:

    my $CkDiscardCommonwords = 1;    # check if use stopwords or not
    my $term = "word";
    my $commonwordsRX = loadCommonWords();

    if ($CkDiscardCommonwords eq 1) {
        if ($term =~ /^(?:$commonwordsRX)$/) {
            return (0);
        }
    }

    sub loadCommonWords {
        my @commonwords;
        my $filename = "commonWords.txt";
        if (open my $FH, "<:encoding(UTF-8)", $filename) {
            while (my $line = <$FH>) {
                chomp $line;
                push @commonwords, $line;
            }
            close $FH;
        }
        my $commonwordsRX = join "|", map quotemeta, @commonwords;
        return $commonwordsRX;
    }

    Now my software has changed, and the list of common words saved in commonWords.txt may grow considerably. It used to be small (~300 words); now it could reach several thousand.

    I would like to hear what expert monks think about this implementation. Would a Regex constructed in this way cause problems when it grows? Should I choose another approach?
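Since the regex only ever tests for an exact whole-word match, one alternative is a hash lookup, whose cost does not depend on how long the list gets. This is a minimal sketch, not a drop-in replacement: the helper name `load_common_words` and the in-memory file standing in for commonWords.txt are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read one stop word per line into a lookup hash.
sub load_common_words {
    my ($fh) = @_;
    my %is_common;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;    # skip empty lines
        $is_common{$line} = 1;
    }
    return \%is_common;
}

# An in-memory file stands in for commonWords.txt in this sketch.
my $data = "the\nand\nof\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $is_common = load_common_words($fh);
close $fh;

for my $term (qw(the perl)) {
    print "$term: ", ($is_common->{$term} ? 'skip' : 'keep'), "\n";
}
```

With a real file, the same function would take a handle opened with the `<:encoding(UTF-8)` layer, as in the original code.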

Converting XLSX to CSV with Perl while maintaining the encoding
3 direct replies — Read more / Contribute
by AALB
on Feb 25, 2020 at 02:42

    I'm a BI developer working with perl scripts as my ETL - I receive data over email, take the file, parse it and push it into the DB. Most of the files are CSV, but occasionally I have an XLSX file.

    I've been using Spreadsheet::XLSX to convert, but I've noticed that the CSV output comes out with the wrong encoding (needs to be UTF8, because accents and foreign languages).

    That's the sub I'm using ($input_file is an Excel file), but I keep getting the data with the wrong characters.

    WHAT am I missing?
    Thanks a lot all!

    sub convert_to_csv {
        my $input_file = $_[0];
        my ( $filename, $extension ) = split( '\.', $input_file );
        open( format_file, ">:encoding(utf-8)", "$filename.csv" )
          or die "could not open out file $!\n";
        my $excel = Spreadsheet::XLSX->new($input_file);
        my $line;
        foreach my $sheet ( @{ $excel->{Worksheet} } ) {
            #printf( "Sheet: %s\n", $sheet->{Name} );
            $sheet->{MaxRow} ||= $sheet->{MinRow};
            foreach my $row ( $sheet->{MinRow} .. $sheet->{MaxRow} ) {
                $sheet->{MaxCol} ||= $sheet->{MinCol};
                foreach my $col ( $sheet->{MinCol} .. $sheet->{MaxCol} ) {
                    my $cell = $sheet->{Cells}[$row][$col];
                    if ($cell) {
                        my $trimcell;
                        $trimcell = $cell->value();
                        print STDERR "cell: $trimcell\n";    ## Just for the tests so I don't have to open the file to see if it's ok
                        $trimcell =~ s/^\s+|\s+$//g;         ## Just to make sure I don't have extra spaces
                        $line .= "\"" . $trimcell . "\",";
                    }
                }
                chomp($line);
                if ($line =~ /Grand Total/) { }    ##customized for the files
                else {
                    print format_file "$line\n";
                    $line = '';
                }
            }
        }
        close format_file;
    }

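One possible cause, which the post itself does not confirm, is double encoding: if `$cell->value()` returns UTF-8 encoded bytes rather than decoded characters, the `:encoding(utf-8)` output layer encodes those bytes a second time. The sketch below only demonstrates that effect with the core Encode module; the sample value in `$raw` is a hypothetical cell value, not data from the original file.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical cell value: the two UTF-8 bytes for "é", not yet decoded.
my $raw = "\xC3\xA9";

# What a utf-8 output layer does to undecoded bytes:
# each byte is treated as a character and encoded again.
my $double_encoded = encode('UTF-8', $raw);

# Decoding first yields a single character, which then encodes correctly.
my $text    = decode('UTF-8', $raw);
my $correct = encode('UTF-8', $text);

printf "double-encoded: %d bytes, correct: %d bytes\n",
    length($double_encoded), length($correct);
```

If that is what is happening, decoding each cell value before appending it to `$line` would be one thing to try.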
Perl implementation of IPFS
No replies — Read more | Post response
by igoryonya
on Feb 25, 2020 at 01:55
    Hello, I am wondering whether anybody knows of any attempts to implement IPFS (InterPlanetary File System), or to write bindings for it, in Perl.
    https://ipfs.io/
Leaking a file descriptor into a child to use with /proc/self/fd/3
4 direct replies — Read more / Contribute
by ewheeler
on Feb 24, 2020 at 19:42

    We are trying to open a file descriptor and access it from a forked + exec child. It works in bash as follows:

    # exec 3< /etc/passwd
    # grep root /proc/self/fd/3
    root:x:0:0:root:/root:/bin/bash

    When we test with perl using the code below, grep complains about fd/3 not existing.

    open(my $in, '/etc/passwd');
    if (!fork()) {
        exec("grep", "root", "/proc/self/fd/" . fileno($in));
    }
    close($in);

    What am I missing? Is fork or exec closing the leaked file descriptor?

    -Eric
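A likely explanation is Perl's close-on-exec handling: descriptors numbered above `$^F` (2 by default) are opened with the close-on-exec flag set, so they disappear at `exec`. The sketch below clears that flag with `fcntl` before forking. It assumes a Linux system with `/proc`, `/etc/passwd`, and `grep` available, as in the original post.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC);

open(my $in, '<', '/etc/passwd') or die "cannot open /etc/passwd: $!";

# Perl marks descriptors above $^F (default 2) as close-on-exec.
# Clear that flag so the descriptor stays open across exec().
my $flags = fcntl($in, F_GETFD, 0);
defined $flags or die "fcntl F_GETFD failed: $!";
fcntl($in, F_SETFD, $flags & ~FD_CLOEXEC) or die "fcntl F_SETFD failed: $!";

my $pid = fork();
defined $pid or die "fork failed: $!";
if ($pid == 0) {
    # Child: grep reads the inherited descriptor through /proc.
    exec('grep', 'root', '/proc/self/fd/' . fileno($in))
        or die "exec failed: $!";
}

# Parent: its own copy of the descriptor is no longer needed.
close($in);
waitpid($pid, 0);
```

An alternative with the same effect is to raise `$^F` (for example with `local $^F = 10;`) before the `open`, so the new descriptor is never marked close-on-exec in the first place.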

Finding dates from web pages
3 direct replies — Read more / Contribute
by cormanaz
on Feb 24, 2020 at 14:46
    Greetings monks. I have a bunch of URLs of news articles and need to get the publication dates from these, if available. There is a python library designed especially for this purpose. I'm wondering if there is any similar Perl module. I've searched around and the only thing I found was Web::Scraper which would take quite a bit of rules development to do the job. Am hoping maybe someone has done that work already.
HTML::HTML5::Parser weirdness
2 direct replies — Read more / Contribute
by djh
on Feb 23, 2020 at 11:06

    I'm trying to use HTML::HTML5::Parser to parse some HTML pages I stored in files. My program seems to work just fine except with one file and I'm baffled as to what's happening. My program sits in a loop processing files from a list. I've added debugging so it prints the name of the file and then a dump of the document as parsed and then it goes on to process the document except in this one case. So the relevant bit of code is:

    for my $filename (@files) {
        print "$BASE_DIR$filename\n";
        #next if $filename eq '2020-02-17-00:10:01.html';
        my $doc = $parser->parse_file($BASE_DIR . $filename);
        print "doc=", Dumper($doc), $doc->toString;

    and the output for the problematic file is:

    {my home directory}/met-office-datahub/met-office-forecasts/2020-02-17-00:10:01.html
    doc=$VAR1 = bless( do{\(my $o = '93912432739248')}, 'XML::LibXML::Document' );
    <?xml version="1.0" encoding="windows-1252"?>
    <html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>
    Can't call method "toString" on an undefined value ...

    Now I've checked the contents of that file and it actually starts (just like all the others):

    <!DOCTYPE html>
    <html lang="en">
    <head>
    <meta charset="utf-8"/>
    <meta http-equiv="X-UA-Compatible" content="IE=Edge">

    I can't figure out where that strange alleged file contents is coming from or why it affects just that file. In particular the weird <head/> and <body/> tags. I've searched for those strings in my home directory and in /usr/lib/perl5 and done a web search but haven't found anything.

    So I'd be very grateful if anybody has any ideas on techniques to figure out what the problem is, or happens to recognize it :)

CGI URL simple
2 direct replies — Read more / Contribute
by Anonymous Monk
on Feb 21, 2020 at 13:23

    Hello

    I have a very naive question about CGI. I have a small CGI script that generates a basic HTML page. The script is called by a URL such as

    http://mydomain.com/cgi-bin/mobile/generateHTML.pl

    Everything works fine. What I do not like is that in the URL bar I see this monster URL (http://mydomain.com/cgi-bin/mobile/generateHTML.pl). Is there any way to maybe reduce it to http://mydomain.com or something similar (of course without changing page)?

