periapt has asked for the wisdom of the Perl Monks concerning the following question:
This is a difficult question to ask since I'm not sure of the terminology. Basically, I am looking for a way to parse what I would call a "loose" XML grammar: data is contained between nested tags just as in XML, but without the requirement to specify the sequence of subtags.
I'm a novice with regard to XML, but it seems that what I'm looking for is a more generalized grammar parser?
For example, this would be allowed:
    <toptag>
      <subtag1>element #1</subtag1>
      <subtag2>element #2</subtag2>
      <subtag3>element #3</subtag3>
    </toptag>
    <toptag>
      <subtag1>element #3</subtag1>
      <subtag2>element #2</subtag2>
      <subtag1>element #1</subtag1>
    </toptag>
    <toptag>
      <subtag2>element #2</subtag2>
      <subtag2>element #2</subtag2>
      <subtag2>element #2</subtag2>
    </toptag>
The trouble is that the subtags could occur in any order and in any number from 0 to unbounded.
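Well-formed XML by itself places no constraint on the order or number of child elements (only a DTD or schema does that), so the collection step can simply ignore order. Below is a minimal, core-Perl sketch of that step. It is regex-based rather than a real XML parser, and it assumes each block is closed with `</toptag>`, subtags carry no attributes, and there are no comments or CDATA sections; `parse_toptags` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper (regex sketch, not a real XML parser):
# collect the children of every <toptag>...</toptag> block into a hash of
# arrays keyed by child tag name. Order and repetition do not matter.
# Assumes one level of nesting, no attributes on subtags, no CDATA/comments.
sub parse_toptags {
    my ($text) = @_;
    my @records;
    while ($text =~ m{<toptag>(.*?)</toptag>}gs) {
        my $body = $1;    # copy before the inner match overwrites $1
        my %fields;
        while ($body =~ m{<(\w+)>(.*?)</\1>}gs) {
            push @{ $fields{$1} }, $2;    # one array per tag, in input order
        }
        push @records, \%fields;
    }
    return @records;
}

my $input = '<toptag><subtag2>b</subtag2><subtag1>a</subtag1><subtag2>c</subtag2></toptag>';
my ($rec) = parse_toptags($input);
print join(',', @{ $rec->{subtag2} }), "\n";    # prints b,c
```

Each record is just a hash of arrays, so "0 to unbounded" occurrences of a subtag show up as an empty, short, or long array rather than needing any grammar rule.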
Essentially, I want to build a hash of these tag elements and then go through the hash to build XML-compliant output.
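For the second half (turning such a hash back into XML-compliant output), a sketch under the same assumptions might look like the following. `record_to_xml` and `xml_escape` are hypothetical helpers; the values are assumed to be plain, unescaped text, and children are emitted in sorted tag order so the output sequence is predictable regardless of the input order.

```perl
use strict;
use warnings;

# Escape the five XML special characters in text content.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&apos;/g;
    return $s;
}

# Hypothetical helper: turn { tag => [values...] } into one wrapper element
# whose children appear in sorted tag order, repeated once per value.
sub record_to_xml {
    my ($wrapper, $fields) = @_;
    my $xml = "<$wrapper>";
    for my $tag (sort keys %$fields) {
        $xml .= "<$tag>" . xml_escape($_) . "</$tag>" for @{ $fields->{$tag} };
    }
    return $xml . "</$wrapper>";
}

print record_to_xml('toptag', { subtag2 => ['b', 'c'], subtag1 => ['a & z'] }), "\n";
# prints <toptag><subtag1>a &amp; z</subtag1><subtag2>b</subtag2><subtag2>c</subtag2></toptag>
```

Sorting the keys is only one possible policy; if the output has to match a particular schema, the tag order would come from that schema instead.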
This is somewhat out of my area, and I'm not sure that I'm asking the right questions when I research it. Any suggestions would be appreciated.
Further clarification:
Maybe this will help clarify. Consider it this way: a person is writing a text document and tags various words or phrases in it using a predefined set of tags. Different parts of the document may contain related tags. For example:
    <statement> This is the statement of <person id="001"><name>Joe Smith</name></person>.
    His mother's name is <parent><name>Betty</name></parent>.
    Joe is <person id="001"><age>15</age></person> years old. </statement>
The person {name/age} sub-elements could occur in any order. In fact, the parent/person elements could occur in any order. There might also be multiple person tag sets.
Ultimately, I want to parse the final document, build a hash from the tags and then process the hash to combine all the elements associated with person id="001" into a single data structure.
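Merging the fragments by `id` can be sketched the same way: the children of every `<person id="...">` fragment are pushed onto one hash keyed by that id, so the order in which the fragments (or their `name`/`age` children) appear does not matter. This is again a regex-only sketch that assumes `<person>` elements are never nested and hold only simple `<tag>text</tag>` children; `merge_persons` is a hypothetical name, and a real XML module would be the safer route for anything beyond that.

```perl
use strict;
use warnings;

my $doc = <<'END';
<statement> This is the statement of <person id="001"><name>Joe Smith</name></person>.
His mother's name is <parent><name>Betty</name></parent>.
Joe is <person id="001"><age>15</age></person> years old. </statement>
END

# Hypothetical helper (regex sketch): merge every <person id="..."> fragment
# into one hash per id. Assumes no nested <person> elements and only simple
# <tag>text</tag> children; <parent> and other elements are ignored.
sub merge_persons {
    my ($text) = @_;
    my %person;    # id => { tag => [values...] }
    while ($text =~ m{<person\s+id="([^"]*)">(.*?)</person>}gs) {
        my ($id, $body) = ($1, $2);
        while ($body =~ m{<(\w+)>(.*?)</\1>}gs) {
            push @{ $person{$id}{$1} }, $2;
        }
    }
    return \%person;
}

my $people = merge_persons($doc);
for my $id (sort keys %$people) {
    my $p = $people->{$id};
    print "$id: ", join(', ', map { "$_=@{ $p->{$_} }" } sort keys %$p), "\n";
}
# prints 001: age=15, name=Joe Smith
```

Multiple person tag sets simply become more keys in `%person`, and repeated children for the same id accumulate in their arrays.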
Update:
I've received several good suggestions and some good advice. XML::Simple seems the most promising at the moment. Of course, I'm open to more suggestions and I'd love to hear from someone who has tackled this problem before.
Well, I've got some exploration to do ...
PJ
use strict; use warnings; use diagnostics;
Re: tagged text parser by ikegami (Patriarch) on Oct 05, 2009 at 16:54 UTC
Re: tagged text parser by bart (Canon) on Oct 05, 2009 at 17:13 UTC
    by periapt (Hermit) on Oct 07, 2009 at 16:43 UTC
Re: tagged text parser by BioLion (Curate) on Oct 05, 2009 at 16:50 UTC
Re: tagged text parser by roboticus (Chancellor) on Oct 06, 2009 at 13:56 UTC
    by periapt (Hermit) on Oct 07, 2009 at 16:51 UTC
    by Your Mother (Archbishop) on Oct 07, 2009 at 17:34 UTC