PerlMonks  

regular expression help

by pip9ball (Acolyte)
on Jun 03, 2009 at 19:21 UTC (#768125=perlquestion)
pip9ball has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm in need of some regular expression help.

I'd like to create a regular expression that would match the following strings:

<*2>H<3:0>,<*2>I<3:0>,...,<*2>Z<2:0>
H<3:0>,<*2>I<3:0>,...,<*2>Z<2:0>

So the <*K> prefix is optional in each token, and there can be any number of tokens (separated by commas).

I have the following:
$str =~ /<\*\d+>([^<]+)<(\d+):(\d+)>,<\*\d+>([^<]+)<(\d+):(\d+)>/
but this will only match
<*K>A<N:M>,<*K>B<N:M>
and not
A<N:M>,<*K>B<N:M>.

Is such a regular expression possible?

Thanks!

Re: regular expression help
by kennethk (Abbot) on Jun 03, 2009 at 19:31 UTC
    Assuming I understand your spec, the solution to your issue would appear to be quantifiers, specifically '?', which matches 0 or 1 occurrences of a pattern, combined with the extended pattern '(?:...)', which allows grouping without capturing. For your provided examples, this works for me:

    $str =~ /(?:<\*\d+>)?([^<]+)<(\d+):(\d+)>,<\*\d+>([^<]+)<(\d+):(\d+)>/

Re: regular expression help
by ikegami (Pope) on Jun 03, 2009 at 19:54 UTC

    To match "X,X,X", the pattern will look something like /X(?:,X)*/. Since your "X" is rather long, you might want to consider splitting on commas first, then parsing each item in the list.

      ...consider splitting...
      Or define your X subpattern as a separate regex, then build the pattern with it:
      my $subpat = qr/(?:<\*\d+>)?([^<]+)<(\d+):(\d+)>/;
      $str =~ /$subpat(?:,$subpat)*/;
      However, the capturing parentheses are not going to capture everything this way.

      Caution: Contents may have been coded under pressure.

        However, the capturing parentheses are not going to capture everything this way.

        That's why I didn't recommend it. Perl's match operator can match extremely complex expressions (the initial purpose of regular expressions) and it can extract data from strings, but it's not so good at doing both at once. (At least not before 5.10. 5.10 added a bunch of tools that might help.)
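To illustrate the limitation discussed above: when a capturing group sits inside a repeated group, Perl keeps only the captures from the last repetition. The following is a minimal sketch, not code from the thread. It assumes a sample string like the one in the question with the "..." placeholder left out, and the names $item, $list, $valid and @fields are illustrative. It validates the whole list first and then extracts the fields with a separate /g match.

```perl
use strict;
use warnings;

my $str = '<*2>H<3:0>,<*2>I<3:0>,<*2>Z<2:0>';

# One list item: optional <*K> prefix, a name, then <N:M>.
my $item = qr/(?:<\*\d+>)?([^<,]+)<(\d+):(\d+)>/;

# Step 1: validate the whole list. After this match $1..$3 hold the
# first item, while $4..$6 (inside the repeated group) hold only the
# last repetition; the items in between are not available.
my $list  = qr/\A$item(?:,$item)*\z/;
my $valid = $str =~ $list ? 1 : 0;

# Step 2: extract every item with a global match, one item per iteration.
my @fields;
if ($valid) {
    while ( $str =~ /$item/g ) {
        push @fields, [ $1, $2, $3 ];
    }
}

print join( '/', @$_ ), "\n" for @fields;
```

Keeping validation and extraction apart means each pattern stays simple, at the cost of scanning the string twice.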

        Thanks for the suggestion...I will try doing something like this!
Re: regular expression help
by Perlbotics (Chancellor) on Jun 03, 2009 at 20:01 UTC

    Hi, if I understand your question correctly, you want to extract information from each token. So splitting the string into tokens first and then extracting any useful information one by one might be a suitable approach for you?

    use strict;

    sub extract {
        chomp(my $orig = shift);
        foreach ( split(/,/, $orig) ) {    # scan list of tokens
            if ( / <\*\d+> ([^<]+) < (\d+) : (\d+) > /x ) {
                # do something with extracted information
                print "got $1/$2/$3 from '$orig'\n";
            }
        }
    }

    extract($_) while (<DATA>);

    __DATA__
    <*2>H<3:0>,<*2>I<3:0>,...,<*2>Z<2:0>
    H<3:0>,<*2>I<3:0>,...,<*2>Z<2:0>
    Prints:
    got H/3/0 from '<*2>H<3:0>,<*2>I<3:0>,...,<*2>Z<2:0>'
    got I/3/0 from '<*2>H<3:0>,<*2>I<3:0>,...,<*2>Z<2:0>'
    got Z/2/0 from '<*2>H<3:0>,<*2>I<3:0>,...,<*2>Z<2:0>'
    got I/3/0 from 'H<3:0>,<*2>I<3:0>,...,<*2>Z<2:0>'
    got Z/2/0 from 'H<3:0>,<*2>I<3:0>,...,<*2>Z<2:0>'

      I like this approach and will see if I can restructure my code a bit to be able to do this. Thanks!
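Note that the pattern in the reply above requires the <*K> prefix, which is why H is missing from the printed results for the second string. A minimal variant, assuming the prefix should be optional as described in the original question, wraps it in (?:...)? and anchors each token. The helper name parse_tokens is hypothetical, and the sample strings omit the "..." placeholder.

```perl
use strict;
use warnings;

# Return [name, high, low] for every token in a comma-separated list.
# The <*K> prefix is optional; tokens that do not fit are skipped.
sub parse_tokens {
    my ($orig) = @_;
    my @found;
    foreach my $token ( split /,/, $orig ) {
        if ( $token =~ / \A (?: <\*\d+> )? ([^<]+) < (\d+) : (\d+) > \z /x ) {
            push @found, [ $1, $2, $3 ];
        }
    }
    return @found;
}

for my $line ( '<*2>H<3:0>,<*2>I<3:0>,<*2>Z<2:0>', 'H<3:0>,<*2>I<3:0>,<*2>Z<2:0>' ) {
    print "got $_->[0]/$_->[1]/$_->[2] from '$line'\n" for parse_tokens($line);
}
```

Returning the fields rather than printing inside the loop keeps the parsing reusable for whatever expansion step comes next.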
Re: regular expression help
by Anonymous Monk on Jun 04, 2009 at 06:53 UTC
    Cadence name and vector expressions may be more complex than the examples you have shown.

    Splitting and then parsing the angle bracket contents would be the better way.
      Agreed, Cadence bus syntax can be much more complex than
      this. However splitting on commas and parsing the contents
      in '<>' doesn't allow me to identify unique field types.

      Eg.
      XI0<15:0> HD_DEC0_B<(7:0)*2> HD_DEC1_B<15:0:2*2> <*2>H<3:0>,<*4>P<1:0>

      Because this is a Cadence arrayed instance, the expansion
      of each field type is different so I need a way to
      identify this.

      HD_DEC0_B<(7:0)*2> --> field_type1
      HD_DEC1_B<15:0:2*2> --> field_type2
      <*2>H<3:0>,<*4>P<1:0> --> field_type3

      There are many more different field_types that I need to be able to identify, the above is just an example.
      What this line expands to is:

      XI0_15 HD_DEC0_B_7 HD_DEC1_B_15 H_3
      XI0_14 HD_DEC0_B_6 HD_DEC1_B_15 H_2
      XI0_13 HD_DEC0_B_5 HD_DEC1_B_13 H_1
      XI0_12 HD_DEC0_B_4 HD_DEC1_B_13 H_0
      XI0_11 HD_DEC0_B_3 HD_DEC1_B_11 H_3
      XI0_10 HD_DEC0_B_2 HD_DEC1_B_11 H_2
      XI0_9 HD_DEC0_B_1 HD_DEC1_B_9 H_1
      XI0_8 HD_DEC0_B_0 HD_DEC1_B_9 H_0
      XI0_7 HD_DEC0_B_7 HD_DEC1_B_7 P_1
      XI0_6 HD_DEC0_B_6 HD_DEC1_B_7 P_0
      XI0_5 HD_DEC0_B_5 HD_DEC1_B_5 P_1
      XI0_4 HD_DEC0_B_4 HD_DEC1_B_5 P_0
      XI0_3 HD_DEC0_B_3 HD_DEC1_B_3 P_1
      XI0_2 HD_DEC0_B_2 HD_DEC1_B_3 P_0
      XI0_1 HD_DEC0_B_1 HD_DEC1_B_1 P_1
      XI0_0 HD_DEC0_B_0 HD_DEC1_B_1 P_0

      For field_type3, I already have a routine that will explode this type and populate an array.
      However I'd like to have a regular expression that will match
      all variations of field_type3 but not match my
      other field_types.

      Thanks!
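One possible way to recognise field_type3 while rejecting the other two examples is to anchor the whole field and allow only name<N:M> items, each with an optional <*K> prefix. This is a sketch under assumptions, not a complete classifier: is_field_type3 is a hypothetical name, only the field types shown in the post are considered, and a plain bus such as XI0<15:0> also satisfies the pattern because the prefix is optional, so it may need separate handling.

```perl
use strict;
use warnings;

# One item: optional <*K>, a name without brackets, commas or parens, then <N:M>.
my $item = qr/(?:<\*\d+>)?[^<>,()]+<\d+:\d+>/;

# Treat a field as field_type3 when it is a comma-separated list of such items.
sub is_field_type3 {
    my ($field) = @_;
    return $field =~ /\A$item(?:,$item)*\z/ ? 1 : 0;
}

for my $field ( 'HD_DEC0_B<(7:0)*2>', 'HD_DEC1_B<15:0:2*2>', '<*2>H<3:0>,<*4>P<1:0>' ) {
    printf "%-24s %s\n", $field, is_field_type3($field) ? 'field_type3' : 'other';
}
```

The \A and \z anchors are what keep the stride and parenthesised forms from matching on a partial substring.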
        The other field_types come about as a result of a combination of the bus syntax. For instance, you may combine field_type2 & field_type3 syntaxes and it'd still be valid bus syntax.

        Maybe you'd want to expand the bus string into its individual signals first. Cadence already has a SKILL function to do this.

        By the way, the expansions you have shown (with underscores) already have netlist name mappings performed so they can be taken into spectre without choking.

        Make sure you take the *2 and *4 multipliers into account. The expansion will be the cross product of each bus's individual signals (multiplied or not).
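As a rough sketch of the multiplier handling for field_type3 only, the routine below repeats each item's signals by its <*K> factor and joins name and index with an underscore, which matches the fourth column of the expansion listed earlier in the thread. The name expand_field_type3 is hypothetical, and the other syntaxes (strides, parenthesised repeats, combinations) are not handled.

```perl
use strict;
use warnings;

# Expand e.g. '<*2>H<3:0>,<*4>P<1:0>' into a flat list of signal names.
sub expand_field_type3 {
    my ($field) = @_;
    my @signals;
    foreach my $item ( split /,/, $field ) {
        my ( $mult, $name, $from, $to ) =
            $item =~ /\A(?:<\*(\d+)>)?([^<>]+)<(\d+):(\d+)>\z/
            or die "not a field_type3 item: '$item'\n";
        $mult = 1 unless defined $mult;    # no <*K> prefix means one copy

        # Walk the range in whichever direction it was written.
        my @bits  = $from >= $to ? reverse( $to .. $from ) : ( $from .. $to );
        my @names = map { $name . '_' . $_ } @bits;

        # Append the item's signals once per multiplier count.
        push @signals, @names for 1 .. $mult;
    }
    return @signals;
}

print "$_\n" for expand_field_type3('<*2>H<3:0>,<*4>P<1:0>');
```

For the example field this yields 16 names, H_3 down to H_0 twice followed by P_1, P_0 four times, so it lines up row for row with the 16-wide XI0<15:0> instance.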
