Re: Need Speed:Search Tab-delimited File for pairs of names

by kcott (Chancellor)
on Dec 16, 2013 at 19:21 UTC

in reply to Need Speed:Search Tab-delimited File for pairs of names

G'day mnnb,

Welcome to the monastery.

I suspect you're very new to Perl and have guessed at most of the code you've posted here. Various inconsistencies also suggest you've borrowed code from other sources without understanding what it does.

Commenting out strict and warnings is a big mistake: it doesn't fix the reported issues; you've simply stuck your head in the sand and pretended they're not there.

I suggest you read perlintro. Follow the links you find there for further information on the various topics that are relevant to your current task and ask here if you don't understand.

From your problem description, here's how I might have tackled it.

#!/usr/bin/env perl

use strict;
use warnings;

my @match_letters = 'A' .. 'E';

# Dummy command line input: five pairs of names.
my @argv = qw{q w a s z x qwe wer zxcvb xcvbn};

# Precompile one [first, second] pair of patterns per match letter.
my @re_pairs = map { [ qr{$argv[$_ * 2]}, qr{$argv[$_ * 2 + 1]} ] } 0 .. 4;

while (<DATA>) {
    my $first_col = (split /\t/)[0];
    my $match_code = '';

    for my $i (0 .. 4) {
        # Both names of the pair must occur in the first column.
        if ($first_col =~ $re_pairs[$i][0] && $first_col =~ $re_pairs[$i][1]) {
            $match_code .= $match_letters[$i];
        }
    }

    print "$match_code: $_" if length $match_code;
}

__DATA__
qwerty	blah1
asdfgh	blah2
zxcvbn	blah3

Output:

AD: qwerty	blah1
B: asdfgh	blah2
CE: zxcvbn	blah3

As you can see, I've dummied up the file and command line input and have only produced basic output. This may not be exactly what you want but should provide some direction. Note, for instance, that I've captured only the first column rather than every tab-separated element; used a for loop instead of your deeply nested if statements; and printed the original line as read rather than attempting to recreate it from the split elements.

[For subsequent questions, please follow the guidelines in "How do I post a question effectively?": a better question gets better answers.]

-- Ken

Replies are listed 'Best First'.
Re^2: Need Speed:Search Tab-delimited File for pairs of names
by Laurent_R (Canon) on Dec 16, 2013 at 19:48 UTC

    Hi Ken,

    your code is obviously much shorter and cleaner than the original post, but using regexes rather than the index function is rather unlikely to improve performance, which is the OP's primary request. Or did I miss something?

      The journey to a better program (for some definition of 'better', in this case faster) begins with a program that works and that one can understand. As suggested elsewhere, the OP code is a spaghetti monster that dare not enable strictures and warnings lest it reveal a host of naughty practices and lurking bugs.

      kcott's shorter and cleaner code, assuming it actually does what mnnb wants, is much more likely to be a good starting point for improvement. I haven't studied it closely, but it seems to me that the regexes, if insufficiently speedy, could fairly easily be replaced by the use of index. In any event, while the use of regexes will not improve performance, it is also unlikely, IMHO, to significantly degrade it versus index in this case. But only benchmarking will determine the trade-offs.
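To make the suggested swap concrete, here is a minimal sketch of kcott's matching loop with the precompiled regexes replaced by index. It assumes the names are plain literal strings (index does no pattern matching); the helper name match_code and the inline sample lines are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @match_letters = 'A' .. 'E';

# Same dummy name pairs as in kcott's example.
my @names = qw{q w a s z x qwe wer zxcvb xcvbn};
my @pairs = map { [ @names[ $_ * 2, $_ * 2 + 1 ] ] } 0 .. 4;

# Hypothetical helper: returns the letters of every pair whose two
# names both occur as literal substrings of $first_col.
sub match_code {
    my ($first_col) = @_;
    my $code = '';
    for my $i (0 .. $#pairs) {
        my ($x, $y) = @{ $pairs[$i] };
        $code .= $match_letters[$i]
            if index($first_col, $x) >= 0 && index($first_col, $y) >= 0;
    }
    return $code;
}

# Sample tab-delimited lines standing in for the real file.
for my $line ("qwerty\tblah1", "asdfgh\tblah2", "zxcvbn\tblah3") {
    my $first_col = (split /\t/, $line)[0];
    my $code = match_code($first_col);
    print "$code: $line\n" if length $code;
}
```

Note that index returns -1 when the substring is absent, so the test must be ">= 0" rather than plain truthiness: a match at position 0 is a valid hit.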

      Update: Minor wording changes; no semantic change.

        I second the notion that regular expressions are a better choice, especially using precompiled patterns.

        I vaguely recall that an RE search without metacharacters should be fast. There is a short statement implying this in the Efficiency section of my Camel book.

        You can always do some performance benchmarking to verify.
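One possible way to run such a benchmark is the core Benchmark module. The sketch below compares precompiled regexes against index on the dummy data from kcott's example; the sub names count_regex and count_index are hypothetical, and timings on three short strings say little about a large real file, so treat this as a template rather than a result.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @names = qw{q w a s z x qwe wer zxcvb xcvbn};
my @re    = map { qr{\Q$_\E} } @names;    # \Q...\E: treat names literally
my @cols  = qw{qwerty asdfgh zxcvbn};     # stand-in for first-column values

# Count how many (column, name) combinations match, using regexes.
sub count_regex {
    my $hits = 0;
    for my $col (@cols) {
        for my $re (@re) {
            $hits++ if $col =~ $re;
        }
    }
    return $hits;
}

# The same count, using index for literal substring search.
sub count_index {
    my $hits = 0;
    for my $col (@cols) {
        for my $name (@names) {
            $hits++ if index($col, $name) >= 0;
        }
    }
    return $hits;
}

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    regex => \&count_regex,
    index => \&count_index,
});
```

Running the comparison against a representative slice of the real input, rather than toy strings, would give a more meaningful picture of the trade-off.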

        I definitely agree with you, AnomalousMonk, and my very first comment in my post above was that kcott's code was much cleaner and shorter.

      You are quite correct in that I haven't addressed mnnb's primary request; however, I did state my intention: "This may not be exactly what you want but should provide some direction: ...".

      There were so many issues with the posted code (e.g. "sub name_search(@_, $search_string) { ... }" and "$run_time = time() - our $start_run;") that I chose not to attempt to make this code (in its present form) faster as that didn't seem like a useful exercise.

      Beyond that, I can only echo what ++AnomalousMonk wrote in the first response to your comment.

      -- Ken

Re^2: Need Speed:Search Tab-delimited File for pairs of names
by educated_foo (Vicar) on Dec 17, 2013 at 03:48 UTC
    Commenting out strict and warnings is a big mistake: it doesn't fix the reported issues;
    Why not? I can't diagnose this, because I'm not running on Windows, but it seems like you're chanting boilerplate at the questioner. Which of his mistakes would be caught by adding those strings?

      The bad sub prototype and the interesting use of 'our'. That is assuming that variables are declared appropriately.
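As a rough illustration of what the pragmas report, the sketch below compiles two snippets inside string evals and collects the diagnostics. The first snippet is the sub declaration quoted earlier in the thread; the second uses a hypothetical undeclared variable, $undeclared_total, and is not taken from the OP's code. The exact wording of the messages varies between Perl versions, so none is reproduced here.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @warnings;
my $strict_error = '';

{
    # Collect warnings instead of letting them go to STDERR.
    local $SIG{__WARN__} = sub { push @warnings, @_ };

    # String evals inherit the enclosing strict/warnings settings.
    # The parenthesised part is parsed as a prototype, not a parameter list.
    eval q{ sub name_search(@_, $search_string) { return 1 } 1 }
        or push @warnings, "compile error: $@";

    # Hypothetical example: assigning to a variable that was never declared.
    eval q{ $undeclared_total = 1; 1 }
        or $strict_error = $@;
}

print "warning: $_" for @warnings;
print "strict:  $strict_error" if length $strict_error;
```

With the two use lines commented out, neither diagnostic would appear, which is the sense in which disabling them hides problems rather than fixing them.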

      True laziness is hard work
