PerlMonks  

Counting the start and end elements in a line

by rsriram (Hermit)
on Jun 27, 2006 at 10:06 UTC (#557749=perlquestion)

rsriram has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am writing a script in which I have to check whether the number of start and end tags match for each element in a tagged file. I have written this function in my script:

sub counttags()
{
   $stag=0;
   $etag=0;
   while ($_ =~ /<$_[0]>/g) {$stag++}
   while ($_ =~ /<\/$_[0]>/g) {$etag++}
   if($stag != $etag)
   {
      print "Number of Start and End tags for element $_[0] does not match";
   }
}

Using this, I have to check elements such as I and B. But I also have two elements named + and -. I call this function inside the program like this:

counttags(I);
counttags(B);

There is no problem with this part, but if I call it for + and - in the same way, it does not work. Can anyone tell me what is wrong with this?

Replies are listed 'Best First'.
Re: Counting the start and end elements in a line
by bart (Canon) on Jun 27, 2006 at 10:33 UTC
    Can anyone tell me what is wrong in this?
    quotemeta.

    You're probably trying to use special metacharacters for regexes as normal matching characters. BTW you should limit the scope of those variables. And don't use sub prototypes!! (the parens)

    You can do this:

    sub counttags {
        my $stag = 0;
        my $etag = 0;
        # quotemeta escapes regex metacharacters such as +
        my $tag = quotemeta($_[0]);
        while ($_ =~ /<$tag>/g)   { $stag++ }
        while ($_ =~ /<\/$tag>/g) { $etag++ }
        if ($stag != $etag) {
            print "Number of Start and End tags for element $_[0] does not match";
        }
    }

    And call your sub with quoted strings, not barewords.

    But as a test, this isn't sufficient. You probably should check whether your tags are balanced, not tag soup. I think you'd best use a stack.

    sub check_tags_balance {
        my @tagstack;
        while (/<(\/)?([^>]+)>/g) {
            unless ($1) {
                # opening tag
                push @tagstack, $2;
            }
            else {
                # closing tag
                unless (@tagstack and $2 eq pop @tagstack) {
                    print "Found an unbalanced closing tag $2\n";
                    return;
                }
            }
        }
        if (@tagstack) {
            print "Missing closing tags for @tagstack\n";
        }
        else {
            print "Tags balanced.\n";
        }
    }

    $_ = "I assume <b>everything</b> is <i><b>ok</b></i>";
    check_tags_balance();
Re: Counting the start and end elements in a line
by GrandFather (Sage) on Jun 27, 2006 at 10:27 UTC

    There are a couple of problems with your sub. You say it takes no parameters with sub counttags() and Perl will enforce that (if it notices at the right time). But you use $_[0] which accesses the first parameter.

    You use $_ but don't give it a value anywhere.

    The common wisdom is that parsing markup is hard, and if it conforms to HTML or XML specifications you are much better off using one of the numerous modules designed for the purpose, such as XML::Twig or HTML::TreeBuilder. All that aside, the following cleanup of your code and expansion into a runnable sample may be a good basis for rephrasing your actual question with some sample code to demonstrate your problem:

    use strict;
    use warnings;

    my $str = '<start> </begin> </this> </that> <begin> </start>';
    counttags ($str, 'begin');

    sub counttags {
        my ($test, $target) = @_;
        my $stag = $test =~ /<$target>/g;
        my $etag = $test =~ /<\/$target>/g;
        if ($stag != $etag) {
            print "Number of Start and End tags for element $target does not match";
        }
    }

    Prints:

    Number of Start and End tags for element begin does not match
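    A side note on counting, offered as a sketch rather than as part of the sample above: a /g match assigned to a scalar yields a true or false result, not a tally. To get an actual count, the match can be forced into list context with the empty-list assignment idiom. The helper name count_tag and its third parameter are assumptions made for this illustration.

```perl
use strict;
use warnings;

# count_tag is a hypothetical helper, not code from the thread.
# It counts non-overlapping occurrences of <name> or </name>.
sub count_tag {
    my ($text, $name, $closing) = @_;
    my $slash = $closing ? '/' : '';
    # The empty list forces list context on the match; assigning that
    # list to a scalar yields the number of matches found.
    # \Q...\E makes the tag name match literally, even for '+'.
    my $count = () = $text =~ /<$slash\Q$name\E>/g;
    return $count;
}

my $str = '<start> </begin> </this> </that> <begin> </start>';
printf "begin: %d opening, %d closing\n",
    count_tag($str, 'begin', 0), count_tag($str, 'begin', 1);
```

    Counted this way, the sample string has one opening and one closing begin tag, so a pure count comparison would not flag it even though the closing tag comes first.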

    DWIM is Perl's answer to Gödel
Re: Counting the start and end elements in a line
by davorg (Chancellor) on Jun 27, 2006 at 10:15 UTC
    it is not working

    Please give more information. "It is not working" isn't very helpful. What happens? Do you get the wrong answer? Does the program not compile? Does the program crash? Give an error message? Or does your computer burst into flames?

    If you're really calling your function like this:

    counttags(I); counttags(B);

    then I expect you'd get an error about barewords under use strict. If you go on to use:

    counttags(+); counttags(-);

    Then I strongly suspect that it won't even compile (I haven't tried it). You probably want to rewrite these lines as:

    counttags('I'); counttags('B'); counttags('+'); counttags('-');

    And if that fixes it (which it won't for the reasons that rminner points out), then you should probably get into the habit of using use strict and use warnings in your programs.
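    For anyone who wants to see the difference between the bareword and quoted forms without breaking a script, here is a minimal sketch. It compiles small snippets with a string eval so that failures can be captured and printed. The helper name try_compile is an assumption made for this illustration, and the snippets assign to a variable instead of calling counttags so that the example stays self-contained.

```perl
use strict;
use warnings;

# try_compile is a hypothetical helper: it compiles a snippet under
# strict and warnings and reports whether compilation succeeded.
sub try_compile {
    my ($code) = @_;
    my $ok = eval "use strict; use warnings; $code; 1";
    return $ok ? "ok\n" : "failed: $@";
}

# A bareword, a bare operator, and the two quoted forms.
for my $snippet (q{my $t = I}, q{my $t = (+)}, q{my $t = 'I'}, q{my $t = '+'}) {
    printf "%-14s %s", $snippet, try_compile($snippet);
}
```

    The two unquoted forms fail to compile, while the quoted strings are accepted.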

    Of course, this is all guesswork as I don't really have enough information to go on.

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      When I call the function

      counttags(+);

      When the script is compiled, I get an error:

      Syntax error at vtag.pl at line 275, near "+)"
      Execution of vtag.pl aborted due to compilation errors.

      On the other hand, if I have the code as counttags("+");, the $stag and $etag values remain 0.

      For you to understand better, the content in the file goes like:
      Die Errichtung <I>einer</I> Zweigniederlassung ist durch die Geschäftsführer anzumelden. Der Anmeldung ist eine öffentlich beglaubigte Abschrift des Gesellschaftsvertrages und der Liste der Gesellschafter beizufügen.<+>2</+>

      In my function, I am trying to find the number of <I> and </I>, <+> and </+> and check whether the opening and the closing tag count matches.

        When I call the function

        counttags(+);

        When the script is compiled, I get an error:

        Syntax error at vtag.pl at line 275, near "+)"
        Execution of vtag.pl aborted due to compilation errors.

        Yes. That's what I expected. You can't just drop an operator in the middle of some source code at random and expect the compiler to know what to do.

        On the other hand, if I have the code as counttags("+");, the $stag and $etag values remain 0.

        This is probably because of one of the other errors that have been pointed out to you in this thread. Developing with use strict and use warnings will almost certainly help you track down problems like this.


Re: Counting the start and end elements in a line
by rminner (Chaplain) on Jun 27, 2006 at 10:13 UTC
    At least the plus has a special meaning in a regex, therefore you should quote your variables by putting an \Q in front of them (if you want to match the string literally):
    sub counttags() {
        $stag=0;
        $etag=0;
        while ($_ =~ /<\Q$_[0]>/g) {$stag++}
        while ($_ =~ /<\/\Q$_[0]>/g) {$etag++}
        if($stag != $etag) {
            print "Number of Start and End tags for element $_[0] does not match";
        }
    }
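    To illustrate why both counters stay at 0 for the + element, here is a small sketch. Without quoting, the interpolated pattern <+> means "one or more < characters followed by >", which never matches the literal text <+>. The helper name count_matches and the shortened sample line are assumptions made for this illustration.

```perl
use strict;
use warnings;

# count_matches is a hypothetical helper: it returns how many times
# a compiled pattern matches in the given text.
sub count_matches {
    my ($text, $re) = @_;
    my $n = () = $text =~ /$re/g;
    return $n;
}

my $line = 'Text <I>einer</I> more text.<+>2</+>';
my $tag  = '+';

# Unquoted: the + acts as a quantifier on the preceding '<'.
my $unquoted = count_matches($line, qr/<$tag>/);
# Quoted: \Q...\E escapes the +, so it matches literally.
my $quoted   = count_matches($line, qr/<\Q$tag\E>/);

print "unquoted: $unquoted, quoted: $quoted\n";
```

    With the unquoted pattern, the opening and closing counts are both zero, so the comparison passes silently instead of reporting anything.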
Re: Counting the start and end elements in a line
by Fletch (Chancellor) on Jun 27, 2006 at 11:59 UTC

    Syntax and quotemeta issues aside, this is kind of a meaningless check. Your code will accept </B>foo<B> which (presuming this is some form of (SG|X)ML) I'm guessing isn't at all valid (let alone well formed :). Perhaps another tool would be better suited for this?
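    A quick sketch of that point, under the assumption that only one element name is checked at a time: a count-only test accepts the reversed input, while a simple depth counter rejects it. Both helper names, counts_agree and order_ok, are hypothetical and used only for this illustration. This is still not a substitute for a real parser.

```perl
use strict;
use warnings;

# Count-only check: true when opening and closing counts agree.
sub counts_agree {
    my ($text, $name) = @_;
    my $open  = () = $text =~ /<\Q$name\E>/g;
    my $close = () = $text =~ /<\/\Q$name\E>/g;
    return $open == $close ? 1 : 0;
}

# Order-aware check for a single element: the nesting depth must
# never drop below zero and must end at zero.
sub order_ok {
    my ($text, $name) = @_;
    my $depth = 0;
    while ($text =~ /<(\/?)\Q$name\E>/g) {
        $depth += $1 ? -1 : 1;
        return 0 if $depth < 0;    # closing tag before its opening tag
    }
    return $depth == 0 ? 1 : 0;
}

for my $sample ('<B>foo</B>', '</B>foo<B>') {
    printf "%-12s counts_agree=%d order_ok=%d\n",
        $sample, counts_agree($sample, 'B'), order_ok($sample, 'B');
}
```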

Re: Counting the start and end elements in a line
by esskar (Deacon) on Jun 27, 2006 at 11:47 UTC
    Well, this does not directly answer your question, but please use "use strict;" and also "use warnings;".

Node Type: perlquestion [id://557749]
Approved by GrandFather