Pattern Matching Confusion

by Anonymous Monk
on Aug 11, 2005 at 13:22 UTC
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks:

I'm having trouble with a pattern match. What I'm trying to do is look through a file of records, where each record contains multiple XML tags. I want to find the record containing a tag whose data is a specific length (in this case, 58 characters).

My basic code is:
# The filename should be the first argument
$file = $ARGV[0];
# The search string should be the second argument
$searchString = $ARGV[1];
$matchesFound = 0;
if (-e $file) {
    open (INPUTFILE, "<$file") or die "Can't open $file for input.\n";
    while ($record = <INPUTFILE>) {
        $_ = $record;
        if (/$searchString/g) {
            print $record;
            $matchesFound = $matchesFound + 1;
        }
    }
    close INPUTFILE;
    print "$matchesFound found.\n";
} else {
    print "$file doesn't exist.\n";
}

The problem is that I'm not sure exactly what expression I should feed to the script. Is there a way to match, say, exactly 58 consecutive alphanumeric or space characters?

Thanks in advance for your help.

Re: Pattern Matching Confusion
by DrWhy (Chaplain) on Aug 11, 2005 at 13:49 UTC
    If you are only looking for any tag content that is exactly 58 characters long, you could use the following regular expression:

    />[a-zA-Z0-9 ]{58}</

    And forget about passing in a search string as a command line argument.
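    A minimal sketch of that pattern dropped into a test, assuming each record sits on one line and the 58-character value is the whole text between a `>` and the next `<`. The helper name `has_58_char_field` and the sample records are hypothetical, for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the record contains a tag whose text is
# exactly 58 letters, digits, or spaces. Requiring '>' before and '<'
# after the run keeps longer or shorter runs from matching.
sub has_58_char_field {
    my ($record) = @_;
    return $record =~ />[a-zA-Z0-9 ]{58}</ ? 1 : 0;
}

# Hypothetical sample records built in memory for the demonstration.
my $hit  = '<rec><name>' . ('A' x 58) . '</name></rec>';
my $miss = '<rec><name>' . ('A' x 57) . '</name></rec>';

print has_58_char_field($hit),  "\n";   # 1
print has_58_char_field($miss), "\n";   # 0
```

    If the field may contain punctuation as well, `[^<>]{58}` in place of the character class would accept anything except tag delimiters.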

    In this case it may or may not be useful to actually parse the XML, but you might explore XML::Simple to see if it would help you here.


    "If God had meant for us to think for ourselves he would have given us brains. Oh, wait..."

Re: Pattern Matching Confusion
by bofh_of_oz (Hermit) on Aug 11, 2005 at 14:04 UTC
    Your code is trying to match a specific string, not a string of a certain length. If you want any string of a specific length between an opening tag and its matching closing tag, use a regex like this:

    if ($record =~ /<(.+)>([^<>]{58})<\/\1>/g) {print "$2\n";}
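    A small runnable sketch of that regex against a hypothetical sample record: `$1` is the tag name and `$2` the 58-character content. It assumes the opening tag has no attributes, because `\1` must repeat everything captured inside `<...>` for the closing tag to match, so `<desc lang="en">` would not be found. The sub name `field_of_58` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the content of the first element whose text
# is exactly 58 characters containing no '<' or '>', or undef if none.
# \1 requires the closing tag to repeat exactly what the opening tag held.
sub field_of_58 {
    my ($record) = @_;
    if ($record =~ /<(.+)>([^<>]{58})<\/\1>/) {
        return $2;
    }
    return;
}

# Hypothetical sample record with one 58-character field.
my $text   = 'x' x 58;
my $record = "<rec><id>7</id><desc>$text</desc></rec>";

my $found = field_of_58($record);
print defined $found ? "$found\n" : "no match\n";
```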

    An idea is not responsible for the people who believe in it...

Re: Pattern Matching Confusion
by Codon (Friar) on Aug 11, 2005 at 17:03 UTC
    I'd first recommend using an XML parser. Check CPAN for one that will easily handle the XML structure you are working with. Some are better suited to deep data structures, others to flatter ones; some parse the entire document at once (loading the whole thing into memory), while others parse it as they go. TMTOWTDI.

    On a completely separate note, but attempting to pass on more general Perl wisdom, you have a couple of lines of code that literally make me cringe.

    while ($record = <INPUTFILE>) {
        $_ = $record;
        if (/$searchString/g) {
            print $record;
            $matchesFound = $matchesFound + 1;
        }
    }
    Here, you explicitly assign the next line in your file to a variable, then assign that variable to $_ (the default variable) so you can do a pattern match on the default variable, and then print your explicit variable. You only need one of these two variables. Since you do not need to do anything specific with $record, you could just as easily do this:
    while (<INPUTFILE>) { ++$matchesFound && print if (/$searchString/); }
    Reading with while (<INPUTFILE>) assigns each line to $_ automatically, and print() with no arguments prints $_. You will increment $matchesFound and print $_ whenever $_ matches your pattern. Also note that you do not need the /g modifier, since you only need to know whether the line matches at all, not find every match.
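    A self-contained sketch of that $_-based loop. To run without a real file it reads from an in-memory filehandle (`open` on a scalar reference); the sub name `count_matching_lines` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: print and count the lines read from $fh that
# match $pattern. while (<$fh>) puts each line in $_, a bare /.../
# tests $_, and print with no arguments prints $_.
sub count_matching_lines {
    my ($fh, $pattern) = @_;
    my $matches_found = 0;
    while (<$fh>) {
        ++$matches_found && print if /$pattern/;
    }
    return $matches_found;
}

my $data = "<a>one</a>\n<b>two</b>\n<a>three</a>\n";
open my $fh, '<', \$data or die "Can't open in-memory data: $!";
my $count = count_matching_lines($fh, '<a>');
close $fh;
print "$count found.\n";
```

    The pre-increment is what makes the `&&` trick safe: `++$matches_found` is always at least 1, so `print` always runs.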

    Another way to do this (TMTOWTDI) is to assign the line to a real variable (as you do) but then use the =~ operator to pattern match against that variable:

    while ($record = <INPUTFILE>) { if ($record =~ /$searchString/) { print $record; $matchesFound++; } }

    Ivan Heffner
    Sr. Software Engineer, DAS Lead, Inc.
Re: Pattern Matching Confusion
by kirbyk (Friar) on Aug 11, 2005 at 17:14 UTC
    Also, standard advice: put this at the top of all your Perl files:
    use strict; use warnings;
    It won't be a problem here, but it's going to save you a lot of time and headaches later, so build the habit now.
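    For illustration, a sketch of how the original script might look with strict and warnings turned on: every variable is declared with `my`, and a lexical filehandle with three-argument `open` replaces the bareword INPUTFILE. Wrapping the search in a sub named `search_file` is an assumption made here so the logic can be exercised on its own; it is not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sub: print every line of $file matching $search_string
# and return how many matched.
sub search_file {
    my ($file, $search_string) = @_;
    open my $in, '<', $file or die "Can't open $file for input: $!\n";
    my $matches_found = 0;
    while (my $record = <$in>) {
        if ($record =~ /$search_string/) {   # no /g needed for a yes/no test
            print $record;
            $matches_found++;
        }
    }
    close $in;
    return $matches_found;
}

# Run as the original command-line tool only when given FILE and PATTERN.
if (@ARGV == 2) {
    my ($file, $search_string) = @ARGV;
    if (-e $file) {
        print search_file($file, $search_string), " found.\n";
    } else {
        print "$file doesn't exist.\n";
    }
}
```

    Invoked as, for example, `perl search.pl records.xml '>[a-zA-Z0-9 ]{58}<'` (file names hypothetical), it combines the replies above.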

    -- Kirby

Re: Pattern Matching Confusion
by phaylon (Curate) on Aug 11, 2005 at 13:31 UTC
    Is there a reason you don't want to use a parsing module for XML? There should be something on CPAN to fit your needs.

    Ordinary morality is for ordinary people. -- Aleister Crowley
Re: Pattern Matching Confusion
by newroz (Monk) on Aug 11, 2005 at 14:09 UTC

      You're trying to link to the length function at an external documentation site. That site is tremendously unstable these days, which is why the Monastery's POD linking mechanism defaults to a different one. Using square brackets, you can link to the POD like this: [doc://length], which renders as a link to length. Do be sure to spell it right when you link.

      if (length($var) eq 58) {
          # do something
      }

      ...should use ==, not eq, since you're doing a numeric comparison.
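      A minimal sketch of the numeric comparison, assuming the values to check are first pulled out of the record with a `>([^<>]*)<` capture (that extraction step is an assumption, not something from the thread). `length` returns a number, so `==` states the intent; `eq` happens to give the same answer here only because both sides stringify to "58". The sub name `fields_of_length` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every text run between '>' and '<' in a
# record and keep the ones whose length is exactly $want characters.
sub fields_of_length {
    my ($record, $want) = @_;
    my @hits;
    while ($record =~ />([^<>]*)</g) {
        my $text = $1;
        push @hits, $text if length($text) == $want;   # numeric comparison
    }
    return @hits;
}

my $record = '<rec><a>abc</a><b>' . ('z' x 58) . '</b></rec>';
my @found = fields_of_length($record, 58);
print scalar(@found), " field(s) of length 58\n";
```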

