in reply to
Re: RegEx Against Arbitrary XML Tags
in thread RegEx Against Arbitrary XML Tags
Yes, my intention was never to roll my own, and I am using Twig in parts of my code. The main reason I am now trying to roll my own is that all the XML differencing engines I have encountered share the same underlying logic, and I have looked at a bunch of them, from C-based to Java-based. Unfortunately they all lack the one thing I am truly looking for: not cascading changes through sibling elements when an element is deleted.
What I have found is that if an element has multiple sibling children and you delete one somewhere in the middle, while adding a new element to the siblings at the same time, the differencing engine doesn't merely report the deleted element and the added element. Instead it reports the element below the deleted one as a changed version of the deleted element, and that cascades down the siblings until the newly added element shows up as a change to what was previously the last sibling. This is very difficult to deal with when trying to maintain a representation of this data in an RDBMS: instead of a simple delete record and add record, you end up with multiple changes to existing records cascaded down the siblings, with nothing ever indicating that an element was deleted and an element was added. Somehow all the differencing engines appear to treat sibling element order as a key aspect of detecting changes, hence my intention to try to roll my own.
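A minimal sketch of the behaviour described above, where siblings are compared by content instead of by position, so a deletion in the middle does not shift every later sibling. The helper name `diff_siblings` and the sample `<host>` elements are made up for illustration, and the sketch assumes each sibling has already been serialized to a string that is unique among its siblings (identical siblings would be collapsed).

```perl
use strict;
use warnings;

# Hypothetical helper: report which siblings disappeared and which are new,
# ignoring the order in which they appear.
sub diff_siblings {
    my ($old, $new) = @_;    # array refs of serialized sibling elements

    # Build lookup tables keyed on the serialized element text.
    my (%in_old, %in_new);
    $in_old{$_}++ for @$old;
    $in_new{$_}++ for @$new;

    # Present before but not now: deleted. Present now but not before: added.
    my @deleted = grep { !$in_new{$_} } @$old;
    my @added   = grep { !$in_old{$_} } @$new;

    return (\@deleted, \@added);
}

# Assumed sample data: 'beta' is removed from the middle, 'delta' is appended.
my @old = ('<host>alpha</host>', '<host>beta</host>',  '<host>gamma</host>');
my @new = ('<host>alpha</host>', '<host>gamma</host>', '<host>delta</host>');

my ($deleted, $added) = diff_siblings(\@old, \@new);
print "deleted: $_\n" for @$deleted;
print "added:   $_\n" for @$added;
```

With this kind of comparison the untouched `gamma` element is not reported at all, which maps directly onto one delete record and one add record.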
The reason I am trying to understand the RegEx is to be able to detect tag patterns without having to know the contents of the tags. I don't want to write a matching pattern for every possible tag; that is possible, but I want it to work regardless of the tag name.
As far as the RegEx goes, what I want is a pattern that matches ^<ANYTHING>$ only, with no attributes. But when you run a RegEx against XML that may also look like <ANYTHING port="7777"> or <ANYTHING>someValue</ANYTHING> or <ANYTHING></ANYTHING>, a pattern like /^<(.*)>$/ doesn't just get the first example, it also grabs the second, third and fourth. The RegEx I am trying to work out should grab only the bare <ANYTHING>, and it's become harder than I had imagined.
So I have tried variations such as:
if($line =~ /^\s*<(\w+)>[^.+]/)
if($line =~ /^\s*<(\w+)>[^(\w*|\d*|<*)]/)
if($line =~ /^\s*<(\w+)>([^\w*]|[^\d*]|[^<*])/)
I'm just not sure how to get there with a RegEx pattern. More complex patterns are easier because you have more items to anchor against, but the simplest tag, <ANYTHING>, is much harder than I thought.
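One thing worth noting about the attempts above: inside square brackets, characters such as `.`, `*`, `+` and `|` are literal, so `[^.+]` means "one character that is neither a dot nor a plus" and still requires some character after the `>`. A possible alternative, offered only as a sketch, is to restrict the tag name to name characters and anchor the end of the line right after the `>`. The helper name `bare_open_tag` is hypothetical, and the character class `[\w:.\-]` is a rough approximation of XML name characters, not the full rule from the XML specification.

```perl
use strict;
use warnings;

# Hypothetical helper: return the tag name if the line holds nothing but a
# bare opening tag (no attributes, no content, no closing tag), else undef.
sub bare_open_tag {
    my ($line) = @_;

    # ^\s*          optional leading indentation
    # <([\w:.\-]+)  '<' followed by the tag name; spaces, quotes, '/' and '='
    #               are not allowed, so attributes and '</tag>' cannot match
    # >\s*$         '>' followed only by optional whitespace to end of line
    return $line =~ /^\s*<([\w:.\-]+)>\s*$/ ? $1 : undef;
}

# The four shapes from the post above.
for my $line (
    '<ANYTHING>',
    '<ANYTHING port="7777">',
    '<ANYTHING>someValue</ANYTHING>',
    '<ANYTHING></ANYTHING>',
) {
    my $name = bare_open_tag($line);
    printf "%-32s %s\n", $line, defined $name ? "match ($name)" : "no match";
}
```

Only the first line should be reported as a match. The key difference from /^<(.*)>$/ is that the capture can no longer run across a space or a second `<`, so the `>` right after the name must be the last non-whitespace character on the line.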