PerlMonks  

section nesting

by anniyan (Monk)
on Jul 26, 2005 at 04:20 UTC (#478066=perlquestion)
anniyan has asked for the wisdom of the Perl Monks concerning the following question:

I am new to Perl. I am writing a validation tool to check section nesting; if the nesting is not correct, an error should be raised. Yesterday I posted the same question and asked for help with XML::Twig code. Is it possible to check for proper section nesting without using XML::Twig? If so, please assist me.

$q = '<sec id="00005"><no>1.1</no>
sfasfasdfadsfsdaf
<sec id="00010"><no>1.1.1</no>
sahfjsahdfasddfj
</sec>
<sec id="00015"><no>1.1.2</no>
sahfjsahdfasddfj
<sec id="00020"><no>1.1.2.1</no>
sahfjsahdfasddfj
</sec>
<sec id="00025"><no>1.1.2.2</no>
safksajdklfasd
</sec>
</sec>
<sec id="00015"><no>1.1.3</no>
sahfjsahdfasddfj
</sec>
</sec>
<sec id="00005"><no>1.2</no>
sfasfasdfadsfsdaf
........';

Regards,
Anniyan
(CREATED in HELL by DEVIL to s|EVILS|GOODS|g in WORLD)

Re: section nesting
by GrandFather (Cardinal) on Jul 26, 2005 at 04:41 UTC

    It's still not clear what you are trying to achieve. Do you want to validate that the XML is well formed (regardless of the content), or do you want to check that the content is consistent with the structure (so that 1.1.1 is nested two levels for example), or do you want to check that there is some semantic consistency (for example between id="xxx" and <no>yyy</no>)?

    Can you provide an example including well formed data and bad data with the output you wish to generate?


    Perl is Huffman encoded by design.

      GrandFather, thanks for your assistance. In this example the section closings are correctly nested:

      <sec id="00005"><no>1.1</no>
      sfasfasdfadsfsdaf
      <sec id="00010"><no>1.1.1</no>
      sahfjsahdfasddfj
      </sec>
      <sec id="00015"><no>1.1.2</no>
      sahfjsahdfasddfj
      <sec id="00020"><no>1.1.2.1</no>
      sahfjsahdfasddfj
      </sec>
      <sec id="00025"><no>1.1.2.2</no>
      safksajdklfasd
      </sec>
      </sec>
      <sec id="00015"><no>1.1.3</no>
      sahfjsahdfasddfj
      </sec>
      </sec>
      <sec id="00005"><no>1.2</no>....

      This is a bad example:

      <sec id="00005"><no>1.1</no>
      sfasfasdfadsfsdaf
      <sec id="00010"><no>1.1.1</no>
      sahfjsahdfasddfj
      </sec>
      <sec id="00015"><no>1.1.2</no>
      sahfjsahdfasddfj
      </sec>    # it should not be closed here (because the next sublevel opens here)
      <sec id="00020"><no>1.1.2.1</no>
      sahfjsahdfasddfj
      </sec>
      <sec id="00025"><no>1.1.2.2</no>
      safksajdklfasd
      </sec>    # it should be closed here
      <sec id="00015"><no>1.1.3</no>
      sahfjsahdfasddfj
      </sec>
      </sec>
      <sec id="00005"><no>1.2</no>....

      Here the second-level section should close only after the third-level sections are closed, so this is wrong nesting. The levels are identified by the dotted numbers:

      1.1 - first level
      1.1.1 - second level (first section's sub level)
      1.1.1.1 - third level (second section's sub level)
      1.1.1.1.(i) - fourth level (third section's sub level)
      1.1.1.1.(i).(a) - fifth level (fourth section's sub level)
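The numbering scheme above can be sketched as a small helper. This is only an illustration: `level_of` is a hypothetical name, and it assumes that every number has a leading chapter digit and that components are separated by single dots.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): nesting level implied by a
# dotted section number, e.g. "1.1" => 1 and "1.1.1" => 2.
sub level_of {
    my ($number) = @_;
    my @parts = split /\./, $number;
    return @parts - 1;    # the leading chapter digit does not count as a level
}

printf "%-16s level %d\n", $_, level_of($_)
    for '1.1', '1.1.1', '1.1.1.1', '1.1.1.1.(i)', '1.1.1.1.(i).(a)';
```

Comparing this implied level with the number of currently open `<sec>` elements is one way to spot a section that was closed too early.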

      Here the element is the same for all levels (<sec>), so the parser does not report an error. If the elements were different for subsections, such as <sec>, <sec1>, <sec2>, the parser would have reported an error. So here we have to find the nesting with the help of the numbers 1.1, 1.1.1, ...

      updated

      Regards,
      Anniyan
      (CREATED in HELL by DEVIL to s|EVILS|GOODS|g in WORLD)

Re: section nesting
by GrandFather (Cardinal) on Jul 26, 2005 at 07:15 UTC

    The following may be a useful starting point for you:

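    use strict;
    use warnings;

    # Expected number of the next section at each level; numbering starts at 1.1.
    # Input is read from the DATA filehandle (the script's __DATA__ section),
    # one tag per line.
    my @sectionCount = (1, 1);

    while (<DATA>) {
        chomp;
        if (/<\/sec>/) {
            # Leaving a section: drop its level, then advance the parent's counter.
            pop @sectionCount;
            ++$sectionCount[-1];
            next;
        }
        next if ! /<sec id=/;

        # Entering a section: its first child would be numbered ...1.
        push @sectionCount, 1;
        my ($secNum) = /<no>(.*?)<\/no>/;
        print "Section nesting error in or before $secNum\n"
            if $secNum ne join ".", @sectionCount[0..(@sectionCount-2)];
    }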
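To try the idea without a `__DATA__` section, the same counter-stack check can be wrapped in a sub that takes the lines as arguments. This is a minimal sketch, not the code as posted: `check_lines` is a hypothetical name, the sample is the bad example from earlier in the thread with the filler text left out, and it still assumes at most one `<sec ...>` or `</sec>` tag per line with `<no>` on the same line as its `<sec>`.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the counter-stack idea shown above.
# Returns the section numbers at which the nesting looks wrong.
sub check_lines {
    my @lines = @_;
    my @count = (1, 1);    # expected number of the next section, per level
    my @errors;
    for my $line (@lines) {
        if ($line =~ m{</sec>}) {
            pop @count;                 # leave the closed level
            ++$count[-1] if @count;     # next sibling in the parent level
            next;
        }
        next unless $line =~ /<sec id=/;
        push @count, 1;                 # first child of the new section
        my ($number) = $line =~ m{<no>(.*?)</no>};
        my $expected = join '.', @count[0 .. $#count - 1];
        push @errors, $number if $number ne $expected;
    }
    return @errors;
}

my @bad = (
    '<sec id="00005"><no>1.1</no>',
    '<sec id="00010"><no>1.1.1</no>',
    '</sec>',
    '<sec id="00015"><no>1.1.2</no>',
    '</sec>',                            # closed too early
    '<sec id="00020"><no>1.1.2.1</no>',
    '</sec>',
    '<sec id="00025"><no>1.1.2.2</no>',
    '</sec>',
    '<sec id="00015"><no>1.1.3</no>',
    '</sec>',
    '</sec>',
);
print "Section nesting error in or before $_\n" for check_lines(@bad);
```

On this sample the check reports 1.1.2.1, 1.1.2.2 and 1.1.3. The counters drift after the first misplaced `</sec>`, so later sections are flagged as well, which is why the message says "in or before".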

    Perl is Huffman encoded by design.

      GrandFather, I am stunned by your nice solution. Thanks a lot for spending so much time on it. ++. It works very well. I will definitely go through your logic, which should help me in the future.

      Regards,
      Anniyan
      (CREATED in HELL by DEVIL to s|EVILS|GOODS|g in WORLD)

        Warning: there is a reason why people pointed you to XML::Twig and the like. That's because parsing XML is hard. This may make it look easy, but it won't work unless the data is formatted with one tag per line, and no tags are allowed to span lines. That may work today, but may not tomorrow. XML::Twig will handle it either way; GrandFather's solution will not. This isn't a slight on GrandFather's Perl skill - you asked for something that didn't involve XML::Twig, and that's what he did. I just don't want you going out, using it, and then coming back and complaining about GrandFather's supposed lack of coding skill when it doesn't work. It works for the example given; you just need to know its assumptions.
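One way to relax the one-tag-per-line assumption without a module is to pull the tags out of the whole text with a global match. The sketch below is an illustration only: `check_string` is a hypothetical name, and it is still a regex approach rather than an XML parser. It assumes every tag is properly terminated with `>`, that `<no>` follows its `<sec ...>` directly, and that there are no comments or CDATA sections.

```perl
use strict;
use warnings;

# Hypothetical check_string(): the same counter-stack idea, applied to
# tags found anywhere in the text, so line breaks do not matter.
sub check_string {
    my ($text) = @_;
    my @count = (1, 1);    # expected number of the next section, per level
    my @errors;
    # $1 is "/" for a closing tag; $2 is the <no> text right after an opening tag.
    while ($text =~ m{<(/?)sec\b[^>]*>(?:\s*<no>(.*?)</no>)?}gs) {
        my ($closing, $number) = ($1, $2);
        if ($closing) {
            pop @count;
            ++$count[-1] if @count;
            next;
        }
        push @count, 1;
        my $expected = join '.', @count[0 .. $#count - 1];
        $number = '(unnumbered)' unless defined $number;
        push @errors, $number if $number ne $expected;
    }
    return @errors;
}

# Everything on one line, as in the $q string from the question.
my $good = '<sec id="00005"><no>1.1</no> text <sec id="00010"><no>1.1.1</no> text </sec> '
         . '<sec id="00015"><no>1.1.2</no> text </sec> </sec> <sec id="00005"><no>1.2</no> text </sec>';
my $bad  = '<sec id="00005"><no>1.1</no> text </sec> <sec id="00010"><no>1.1.1</no> text </sec>';

print "good: ", scalar(my @g = check_string($good)), " error(s)\n";
print "Section nesting error in or before $_\n" for check_string($bad);
```

This removes the dependence on line breaks, but the other caveats in the warning above still apply; a real parser such as XML::Twig remains the more robust choice for arbitrary input.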
