PerlMonks  

Re: Matching specific strings (are subs a good idea?)

by Laurent_R (Canon)
on Sep 07, 2017 at 20:21 UTC ( [id://1198886]=note )


in reply to Matching specific strings (are subs a good idea?)

Hi zarath,

In addition to what my fellow monks have already said above (with which I totally agree), your algorithm is very inefficient. Basically, for every single record in your XML files, you read the full QR file sequentially (and spend quite some effort decoding each of its lines again and again). If your XML and/or QR files are large, this will take ages.

The best way to solve this type of problem in Perl is usually to start by reading the QR file, decoding each of its lines only once, storing its content in a hash, and closing the file. Only then do you read the XML file and look up the name, first name and email in the hash to get the missing information (the birth date). Hash lookups are very fast.

There are various ways to store the data in a hash, but the simplest implementation for a beginner would probably be to concatenate first name, name and email (with separators) into a string to form the hash keys, and to store the birth date as the value. Maybe something like this:

( "john|doe|john.d@hismail.com" => 20020924, "liz|schmoe|lizschmoe@hermail.uk" => 20040318, ... )
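
A minimal sketch of how such a hash might be built. Since the format of the QR file is not known, it assumes the lines have already been decoded into four fields; the `make_key` helper, the `@qr_records` sample data, the lower-casing and the `|` separator are all illustrative assumptions, not your actual code or data.

```perl
use strict;
use warnings;

# Hypothetical helper: build a lookup key from first name, name and email.
# Lower-casing and the '|' separator are assumptions; any separator that
# cannot occur in the data would do.
sub make_key {
    my ($first_name, $name, $email) = @_;
    return join '|', map { lc $_ } $first_name, $name, $email;
}

# Stand-in for the decoded QR lines: the real file format is unknown,
# so each record is assumed to be already split into four fields.
my @qr_records = (
    [ 'John', 'Doe',    'john.d@hismail.com',   20020924 ],
    [ 'Liz',  'Schmoe', 'lizschmoe@hermail.uk', 20040318 ],
);

# Populate the hash once: key is the concatenated string, value is the birth date.
my %birth_date_for;
for my $record (@qr_records) {
    my ($first_name, $name, $email, $birth_date) = @$record;
    $birth_date_for{ make_key($first_name, $name, $email) } = $birth_date;
}
```

Keeping the key construction in one small sub means the XML side can build its keys in exactly the same way, which is what makes the lookup reliable.
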
Then, when you read the XML file, you pick up the first name, name and email address, construct a string the same way you built the hash keys, and use it to look up the hash; you'll instantly find the birth date that you need to add to the output XML files.
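
The lookup step could look roughly like this. It is only a sketch: `@xml_records` is a hypothetical stand-in for whatever you extract from each XML record, the field names are made up, and the hash is filled by hand here instead of from the QR file.

```perl
use strict;
use warnings;

# Hash as it would look after reading the QR file (filled by hand here).
my %birth_date_for = (
    'john|doe|john.d@hismail.com'     => 20020924,
    'liz|schmoe|lizschmoe@hermail.uk' => 20040318,
);

# Hypothetical stand-in for the fields picked up from each XML record.
my @xml_records = (
    { first_name => 'liz', name => 'schmoe', email => 'lizschmoe@hermail.uk' },
    { first_name => 'bob', name => 'nobody', email => 'bob@example.com' },
);

my @results;
for my $rec (@xml_records) {
    # Build the key exactly the same way as the hash keys were built.
    my $key = join '|', @{$rec}{qw(first_name name email)};
    if (exists $birth_date_for{$key}) {
        push @results, "$rec->{email}: $birth_date_for{$key}";
    }
    else {
        push @results, "$rec->{email}: no match in QR data";
    }
}
print "$_\n" for @results;
```

Checking with `exists` before using the value lets you decide what to do with XML records that have no counterpart in the QR file, rather than silently writing an undefined birth date.
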

With relatively large files, this will be literally orders of magnitude faster than your repeated loops through the QR file, because each file is read only once instead of the QR file being re-read for every XML record. And the code will actually be simpler.

I can't show you this in detail right now because you haven't provided data examples. Please provide a sample of the QR file (just a few lines), and I (or some other monk) will be happy to show you how to do that. We'll most probably also be able to show you much simpler ways to extract the relevant information from the lines of the QR file (what I called "decoding the lines" above).
