PerlMonks  

Re: Finding parentheses that won't match

by haj (Vicar)
on Nov 26, 2024 at 11:21 UTC


in reply to Finding parentheses that won't match

Without any known structure of the text there is no easy solution.

(Example: A non-matching closing parenthesis could be
  • 1.) Some sort of a list or
  • 2.) ASCII smileys like ':-)'.
Which ones are the non-matching ones?)

This block of text has one opening and four closing parentheses. A naive matching could match '(Example' with '1.)' and mark those in '2.)', ':-)' and 'ones?)' as non-matching, but that's probably not what you want. Here are some other edge cases:

  • <elem attr="(">)</elem> - matching or non-matching?
  • )( - matching or non-matching?
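
The naive matching described above can be sketched as a plain depth counter. This is only an illustration of why the approach falls short, not a recommended solution; the helper name naive_paren_check is hypothetical and not part of any module, and the sketch assumes the input is plain text with no markup awareness at all.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): scan plain text and track
# the nesting depth of parentheses.
# Returns (array ref of offsets of unmatched ')', number of unclosed '(').
sub naive_paren_check {
    my ($text) = @_;
    my $depth = 0;
    my @unmatched;
    while ($text =~ /([()])/g) {
        if ($1 eq '(') {
            $depth++;
        }
        elsif ($depth > 0) {
            $depth--;                          # closes the most recent '('
        }
        else {
            push @unmatched, pos($text) - 1;   # offset of the stray ')'
        }
    }
    return (\@unmatched, $depth);
}

my $example = <<'END';
(Example: A non-matching closing parenthesis could be
1.) Some sort of a list or
2.) ASCII smileys like ':-)'.
Which ones are the non-matching ones?)
END

my ($stray, $open) = naive_paren_check($example);
printf "stray closing: %d, unclosed opening: %d\n", scalar @$stray, $open;
# prints "stray closing: 3, unclosed opening: 0"

($stray, $open) = naive_paren_check(')(');
printf "stray closing: %d, unclosed opening: %d\n", scalar @$stray, $open;
# prints "stray closing: 1, unclosed opening: 1"
```

Fed the string <elem attr="(">)</elem> as plain text, this counter reports everything as balanced, because it cannot tell that the opening parenthesis sits in an attribute value. That is the reason for looking at the document structure instead.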

With a module like XML::LibXML::Reader you can traverse an XML document and keep track of the current parenthesis level, reporting whenever the level drops below 0. However, in my opinion such XML processing does not qualify as "easy".

Re^2: Finding parentheses that won't match
by LexPl (Sexton) on Nov 26, 2024 at 14:09 UTC

    Thanks for your thoughts! I fully agree that it's a rather tricky issue, especially given the fact that you will find listings with "a)", "b)", "c)" as item markers.

    How would you do it with a module like XML::LibXML::Reader?

      First: To work with XML::LibXML::Reader, your XML document has to be well-formed. Your example has a bad entity (as LanX has already pointed out), and the named entities are not declared.

      Without any attempt to make it pretty, here's a hack with your (corrected) example text. It shows a minimal context around each parenthesis found and indents according to level.

      use 5.032;
      use warnings;

      use XML::LibXML::Reader qw( :types );
      use open ':std', OUT => ':encoding(UTF-8)';

      my $data = \*DATA;
      my $reader = XML::LibXML::Reader->new( IO => $data );
      my $paren_level = 0;
      while ($reader->read) {
          next unless $reader->nodeType == XML_READER_TYPE_TEXT;
          my $text = $reader->value;
          while ($text =~ /([^() ]*\s*)([()])(\s*[^() ]*)/g) {
              if ($2 eq '(') {
                  say " " x ($paren_level++ * 4), "opening: '$&'";
              }
              else {
                  if ($paren_level < 1) {
                      say "non-matching: '$&'";
                  }
                  else {
                      say " " x (--$paren_level * 4), "closing: '$&'";
                  }
              }
          }
      }
      if ($paren_level) {
          say "At the end, $paren_level parentheses were left unclosed.";
      }
      __DATA__
      <?xml version="1.0"?>
      <!DOCTYPE rn [
      <!ENTITY auml "ä">
      <!ENTITY Auml "Ä">
      <!ENTITY ouml "ö">
      <!ENTITY Ouml "Ö">
      <!ENTITY uuml "ü">
      <!ENTITY Uuml "Ü">
      <!ENTITY sect "§">
      <!ENTITY emsp14 "&#8197;">
      <!ENTITY ldquor "„">
      <!ENTITY rdquor "”">
      ]>
      <rn>
      <rnnum>52</rnnum>
      <p>In Anlehnung an den Ansatz von Tokio 2013 <emph>definiert</emph> &sect;&emsp14;1 II&emsp14;XXX die nachhaltige Bewirtschaftung als eine Bewirtschaftung, die die sozialen und wirtschaftlichen Anspr&uuml;che an den Boden mit seinen &ouml;kologischen Funktionen in Einklang bringt. Es soll eine gleichberechtigte Berücksichtigung von &ouml;kologischen, &ouml;konomischen und sozialen Aspekten (&ldquor;Bedarfstripel&rdquor;<fn id="xxx">
      <p>Mayer, JJV 2022, 28, 29.</p>
      </fn>) f&uuml;r die Nutzung angestrebt werden. Ziel ist ein Ausgleich der drei &ldquor;S&auml;ulen&rdquor;, die prinzipiell gleichwertig sind. <fn id="yyy">
      <p>Reiter, in: BoGB, &Ouml;ffentliche Regeln, &sect;&emsp14;4 Rn.&emsp14;6&emsp14;f.; Mayer, NatBl. 1996, 1082, 1083.</p>
      </fn> Dabei muss allerdings der Grundgedanke der St&auml;rkung der &ouml;kologischen Funktionen im Sinne der Ressourcenschonung ber&uuml;cksichtigt werden.
      Die &ouml;kologischen Funktionen m&uuml;ssen mit den wirtschaftlichen und sozialen Interessen in Einklang gebracht werden.<fn id="zzz">
      <p>So auch in seiner Einleitung der Beschluss der 72. ROK vom 27.&emsp14;06. 2017, der in (&ldquor;Leitbilder und Handlungsstrategien f&uuml;r die Bodennutzung (in l&auml;ndlichen Gebieten)&rdquor;) sehr gut die aktuellen Herausforderungen beschreibt, die unterschiedlichen Aspekte nach dem Topziel der Nachhaltigkeit strukturiert und sie mit beispielhaften Handlungsans&auml;tzen versieht.</p>
      </fn> Wenn die &Ouml;kologie von vornherein gegen&uuml;ber den wirtschaftlichen und sozialen Anspr&uuml;chen zur&uuml;cktritt, verliert die Nachhaltigkeit ihre Orientierungsfunktion, und die Begrifflichkeit wird ad absurdum gef&uuml;hrt.</p>
      </rn>

      This example has no non-matching parentheses; the output reads:

      opening: 'Aspekten („Bedarfstripel”'
      closing: ') für'
      opening: 'in („Leitbilder'
          opening: 'Bodennutzung (in'
          closing: 'Gebieten)”'
      closing: ') sehr'
