PerlMonks  

Re: Search for account number in a file name

by kcott (Archbishop)
on Jun 21, 2013 at 01:31 UTC ( [id://1040050] )


in reply to Search for account number in a file name

For that specific task, I'd probably use index rather than a regex.

if (index($filename, $account) >= 0) {
    print "$filename\n";
}
else {
    print "No match for $account\n";
}

One consideration may be where $account appears in $filename.

You can Benchmark if speed is important.

By the way, the code you posted doesn't compile: you possibly meant \b instead of /b (which gives the syntax error: Unknown regexp modifier "/b" ...); however, fixing that gives: No match for 7766541.
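
One way \b can cause a miss like this is when the account number sits directly next to other word characters, such as padding zeros or an underscore. Whether that applies to the original post depends on the filename, which isn't shown here; the values below are hypothetical and only illustrate the mechanism.

```perl
use strict;
use warnings;

# Hypothetical values: the filename from the original post is not shown.
my $account  = '7766541';
my $filename = '0007766541_statement.txt';

# \b only matches at a word/non-word transition. Here the account number
# is preceded by '0' and followed by '_', both word characters, so neither
# \b can match.
my $bounded = $filename =~ /\b$account\b/ ? 1 : 0;

# Without \b, the account number is found anywhere in the name.
my $unbounded = $filename =~ /$account/ ? 1 : 0;

print "with \\b: $bounded, without \\b: $unbounded\n";
```

With these assumed values the bounded pattern fails and the unbounded one succeeds.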

-- Ken

Re^2: Search for account number in a file name
by Anonymous Monk on Jun 21, 2013 at 02:19 UTC
    Would "index" be faster than using a regular expression?
    Yes, I meant "\b \b".
      Would "index" be faster than using a regular expression?

      That would depend on a number of factors. I provided the Benchmark link so that you could determine this for yourself.

      Before worrying too much about speed, ask yourself how important that is. If you can process all your data in 100ms, how much effort are you prepared to put in to get it to run in, say, half that time, and would anyone notice the difference? What are you doing with the results? Sending them to a terminal, a file, a database, a printer, across a network: all of these will probably take much longer than any processing occurring in the CPU.

      If you're just looking for a function that searches for one string inside another, that's what index does and what I'd probably choose for that task; if patterns are involved, that's what a regexp engine does and, in that case, that's what I'd probably use.

      If you do decide to optimise, you need to start with a regex that works correctly and consistently. toolic has provided code (in Re: Search for account number in a file name) that returns a correct result for your single example based on $account appearing anywhere within $filename. You didn't specify anything beyond this; however, I raised the issue that its position might be meaningful.

      The regexp engine will typically find an anchored pattern faster than an unanchored one. If you know that $filename will always start with zero or more 0s immediately followed by $account, then you can write /^0*$account/ which would probably be faster than /$account/; if you know it will always start with exactly three zeros, then /^000$account/ may be faster still.
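
A minimal sketch of the two anchored patterns, using made-up filenames. The helper names are hypothetical, and \Q...\E is an added precaution (not part of the patterns above) so that any regex metacharacters in $account are treated literally; for a purely numeric account it changes nothing.

```perl
use strict;
use warnings;

# Hypothetical helper: filename starts with zero or more 0s, then the account.
sub starts_zero_padded {
    my ($filename, $acct) = @_;
    return $filename =~ /^0*\Q$acct\E/ ? 1 : 0;
}

# Hypothetical helper: filename starts with exactly three 0s, then the account.
sub starts_three_zeros {
    my ($filename, $acct) = @_;
    return $filename =~ /^000\Q$acct\E/ ? 1 : 0;
}

my $account = '7766541';    # assumed account number

# Assumed filenames, chosen to show where the two patterns differ.
for my $filename ('0007766541.txt', '7766541.txt', '00017766541.txt', 'x0007766541.txt') {
    printf "%-18s zero-padded: %d  three zeros: %d\n",
        $filename,
        starts_zero_padded($filename, $account),
        starts_three_zeros($filename, $account);
}
```

Only the first name satisfies both patterns; the second satisfies just the zero-or-more form, and the last two satisfy neither because something other than 0s precedes the account number.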

      Similarly, you'll need to look at the code you're using with index. If you have information regarding the position of $account, then maybe, instead of index($filename, $account) >= 0, you'd want index($filename, $account) >= 3 or index($filename, $account) == 3 or something else. Perhaps you'd use the optional third argument: index($filename, $account, $position).
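
The index variants can be compared side by side in a short sketch. The filename and account values are assumptions for illustration.

```perl
use strict;
use warnings;

my $account  = '7766541';           # assumed account number
my $filename = '0007766541.txt';    # assumed filename

# index returns the zero-based offset of the first match, or -1 if absent.
my $pos = index($filename, $account);

my $anywhere      = $pos >= 0 ? 1 : 0;    # found somewhere in the name
my $at_or_after_3 = $pos >= 3 ? 1 : 0;    # first match is at offset 3 or later
my $exactly_at_3  = $pos == 3 ? 1 : 0;    # first match is right after three characters

# Optional third argument: begin the search at the given offset.
my $from_3 = index($filename, $account, 3);
my $from_4 = index($filename, $account, 4);

print "pos=$pos anywhere=$anywhere at_or_after_3=$at_or_after_3 exactly_at_3=$exactly_at_3\n";
print "from_3=$from_3 from_4=$from_4\n";
```

Note that index($filename, $account, 3) is not the same test as index($filename, $account) == 3: the three-argument form reports the first match at or after offset 3, which may be further along the string.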

      When you have two (or more) pieces of code that are working correctly, then you can compare them. That's when you'd use Benchmark.
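
A comparison along those lines might look like the sketch below. It uses cmpthese from the core Benchmark module; the data values and candidate names are assumptions, and the timings will vary with Perl version, hardware and real data, so no results are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $account  = '7766541';           # assumed account number
my $filename = '0007766541.txt';    # assumed filename

# Candidate implementations; each returns true when the account is found.
my %candidates = (
    'index'    => sub { return index($filename, $account) >= 0 },
    'regex'    => sub { return $filename =~ /\Q$account\E/ },
    'anchored' => sub { return $filename =~ /^0*\Q$account\E/ },
);

# Check that every candidate gives the expected answer before timing it.
for my $name (sort keys %candidates) {
    die "$name did not match\n" unless $candidates{$name}->();
}

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese(-1, \%candidates);
```

Running this against a representative sample of real filenames, rather than a single string, would give a more meaningful comparison.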

      -- Ken
