PerlMonks  

How we can do regex of this?

by xMiDgArDx (Initiate)
on Dec 23, 2012 at 14:03 UTC
xMiDgArDx has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to match these lines with a regex:
    Mod  Sub-Module                  Model              Serial
    ---- --------------------------- ------------------ -----------
      1  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1434RLPY
      3  7600 ES+ DFC XL             7600-ES+3CXL       JAE14520N29
      3  7600 ES+T 20x1GE SFP        76-ES+T-20GQ       JAE145301XM
      5  Policy Feature Card 3       7600-PFC3CXL       JAE14330E6J
      5  C7600 MSFC4 Daughterboard   7600-MSFC4         JAE14320QBE
      6  Policy Feature Card 3       7600-PFC3CXL       JAE14330EAO
      6  C7600 MSFC4 Daughterboard   7600-MSFC4         JAE14320QA8
      7  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1433QHBR
      8  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1433QXF9
I'm using this code:
    open(FILE,"show_module.dat");
    while(<FILE>){
        if ($_ =~ /^\s*(\d+)\s+(.*?) (\s*)([WS|76])(.*?)\s+(\w+)\s+(.*)\s+(.*)/) {
            print "SLOT = $1\n";
            print "Desc = $2\n";
            print "Model = $3\n";
        }
    }
    close(FILE);
But this regex doesn't match the 2nd and 3rd lines (7600 ES+ DFC XL / 7600 ES+T 20x1GE SFP). How can I match all the lines, including the ones with special characters? I think my problem is the "+" in these lines...
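A quick isolated check of the "+" theory, as a minimal sketch (the strings are copied from the table; the variable names are made up for illustration): a "+" in the text being searched is an ordinary character, and it only acts as a quantifier when it appears unescaped in the pattern itself.

```perl
use strict;
use warnings;

# A description copied from the sample table.
my $desc = '7600 ES+ DFC XL';

# The '+' in the data needs no special handling: this pattern never
# mentions '+' and still matches the whole description.
my $plain = $desc =~ /^(\d+)\s+(.+)$/ ? 1 : 0;

# Inside a pattern, an unescaped '+' means "one or more of the previous
# item", so /^ES+$/ looks for E followed by S's and fails on 'ES+'.
my $unescaped = 'ES+' =~ /^ES+$/ ? 1 : 0;

# To match a literal '+', escape it with a backslash ...
my $escaped = 'ES+' =~ /^ES\+$/ ? 1 : 0;

# ... or wrap an interpolated literal string in \Q...\E (quotemeta).
my $needle = 'ES+';
my $quoted = $desc =~ /\Q$needle\E/ ? 1 : 0;

print "plain=$plain unescaped=$unescaped escaped=$escaped quoted=$quoted\n";
```

If a field value ever has to be interpolated into a pattern, \Q...\E is usually the simplest way to keep its "+" (or any other metacharacter) literal.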

Re: How we can do regex of this?
by moritz (Cardinal) on Dec 23, 2012 at 14:12 UTC

    A regex isn't the right tool for the job. Since the table has fixed-width columns, use substr or unpack to extract the cells; perlpacktut has an example.

    If the column widths aren't known in advance, use a regex on the first line to find them, and use unpack/substr for the rest of the table.
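One way to sketch that idea, assuming the dashed separator line starts at column 0 and every column begins where a run of dashes begins: record the offset of each run with @- (the $-[0] entry holds where the latest match started), then turn the distances between offsets into an unpack template. The helper name template_from_separator is hypothetical, and the sample row is rebuilt with sprintf rather than read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): turn a separator line such
# as "---- -----" into an unpack template. Assumes the line starts at
# column 0 and each column begins where a run of dashes begins; the
# last column uses A*, which takes the rest of the line.
sub template_from_separator {
    my ($sep) = @_;
    my @starts;
    while ($sep =~ /-+/g) {
        push @starts, $-[0];    # offset where this run of dashes starts
    }
    # Width of each column = distance to the start of the next column.
    my @widths = map { $starts[$_ + 1] - $starts[$_] } 0 .. $#starts - 1;
    return join '', (map { "A$_" } @widths), 'A*';
}

# A separator with the sample table's dash counts (4, 27, 18, 11),
# built programmatically so the widths are exact.
my $sep = join ' ', map { '-' x $_ } 4, 27, 18, 11;
my $template = template_from_separator($sep);

# One data row in the same fixed-width layout (values from the table).
my $row = sprintf '%3s  %-27s %-18s %s',
    3, '7600 ES+ DFC XL', '7600-ES+3CXL', 'JAE14520N29';

my @cols = unpack $template, $row;
$cols[0] =~ s/^\s+//;    # A only strips trailing spaces, so trim the slot

print "$template\n";
print join('|', @cols), "\n";
```

For the sample table this yields A5A28A19A*, the same widths used in the unpack reply further down; the "+" characters in the data play no role at all.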

    Update: some example code. Trimming leading/trailing whitespace is left as an exercise for the reader.

    use 5.010;
    use strict;
    use warnings;

    my $head_line = <DATA>;
    my @c_idx;
    while ($head_line =~ /\S+/g) {
        push @c_idx, $-[0];
    }
    my $sep = <DATA>; # ignored;
    while (<DATA>) {
        chomp;
        my @columns;
        for my $idx (0..$#c_idx) {
            if ($idx == $#c_idx) {
                push @columns, substr $_, $c_idx[$idx];
            } else {
                push @columns, substr $_, $c_idx[$idx], $c_idx[$idx+1] - $c_idx[$idx];
            }
        }
        use Data::Dumper;
        print Dumper \@columns;
    }
    __DATA__
    Mod  Sub-Module                  Model              Serial
    ---- --------------------------- ------------------ -----------
      1  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1434RLPY
      3  7600 ES+ DFC XL             7600-ES+3CXL       JAE14520N29
      3  7600 ES+T 20x1GE SFP        76-ES+T-20GQ       JAE145301XM
      5  Policy Feature Card 3       7600-PFC3CXL       JAE14330E6J
      5  C7600 MSFC4 Daughterboard   7600-MSFC4         JAE14320QBE
      6  Policy Feature Card 3       7600-PFC3CXL       JAE14330EAO
      6  C7600 MSFC4 Daughterboard   7600-MSFC4         JAE14320QA8
      7  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1433QHBR
      8  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1433QXF9
      Somewhat simpler:
      use strict;
      use warnings;

      my @names   = split(' ', <DATA>);
      my @lengths = split(' ', <DATA>);
      my %sizes;
      @sizes{@names} = map { length } @lengths;

      sub strip_space {
          my ($str) = @_;
          $str =~ s/^\s+//;
          unpack 'A*', $str;
      }

      my %data;
      while (defined(my $line = <DATA>)) {
          my $pos = 0;
          foreach my $name (@names) {
              push @{$data{$name}}, strip_space(substr($line, $pos, $sizes{$name}));
              $pos += $sizes{$name} + 1;
          }
      }
      use Data::Dumper;
      print Dumper \%data;
      __END__
      Mod  Sub-Module                  Model              Serial
      ---- --------------------------- ------------------ -----------
        1  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1434RLPY
        3  7600 ES+ DFC XL             7600-ES+3CXL       JAE14520N29
        3  7600 ES+T 20x1GE SFP        76-ES+T-20GQ       JAE145301XM
        5  Policy Feature Card 3       7600-PFC3CXL       JAE14330E6J
        5  C7600 MSFC4 Daughterboard   7600-MSFC4         JAE14320QBE
        6  Policy Feature Card 3       7600-PFC3CXL       JAE14330EAO
        6  C7600 MSFC4 Daughterboard   7600-MSFC4         JAE14320QA8
        7  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1433QHBR
        8  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1433QXF9
Re: How we can do regex of this?
by toolic (Chancellor) on Dec 23, 2012 at 14:19 UTC
    Consider unpack instead of a regex:
    use warnings;
    use strict;

    while (<DATA>) {
        my @cols = unpack 'A5A28A19A*', $_;
        if ($cols[0] =~ /^[\d\s]+$/) {    # ignore header rows
            $cols[0] =~ s/\s//g;
            print "SLOT = $cols[0]\n";
            print "Desc = $cols[1]\n";
            print "Model = $cols[2]\n";
        }
    }
    __DATA__
    Mod  Sub-Module                  Model              Serial
    ---- --------------------------- ------------------ -----------
      1  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1434RLPY
      3  7600 ES+ DFC XL             7600-ES+3CXL       JAE14520N29
      3  7600 ES+T 20x1GE SFP        76-ES+T-20GQ       JAE145301XM
      5  Policy Feature Card 3       7600-PFC3CXL       JAE14330E6J
      5  C7600 MSFC4 Daughterboard   7600-MSFC4         JAE14320QBE
      6  Policy Feature Card 3       7600-PFC3CXL       JAE14330EAO
      6  C7600 MSFC4 Daughterboard   7600-MSFC4         JAE14320QA8
      7  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1433QHBR
      8  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1433QXF9

    Prints:

    SLOT = 1
    Desc = Distributed Forwarding Card
    Model = WS-F6700-DFC3CXL
    SLOT = 3
    Desc = 7600 ES+ DFC XL
    Model = 7600-ES+3CXL
    SLOT = 3
    Desc = 7600 ES+T 20x1GE SFP
    Model = 76-ES+T-20GQ
    SLOT = 5
    Desc = Policy Feature Card 3
    Model = 7600-PFC3CXL
    SLOT = 5
    Desc = C7600 MSFC4 Daughterboard
    Model = 7600-MSFC4
    SLOT = 6
    Desc = Policy Feature Card 3
    Model = 7600-PFC3CXL
    SLOT = 6
    Desc = C7600 MSFC4 Daughterboard
    Model = 7600-MSFC4
    SLOT = 7
    Desc = Distributed Forwarding Card
    Model = WS-F6700-DFC3CXL
    SLOT = 8
    Desc = Distributed Forwarding Card
    Model = WS-F6700-DFC3CXL
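The choice of A over a in that template matters: when unpacking, a returns each field exactly as stored, while A strips trailing whitespace and nulls. Neither removes leading spaces, which is why the slot column still needs the s/\s//g above. A small sketch with one row rebuilt via sprintf (assuming the same column widths as the sample table; variable names are illustrative):

```perl
use strict;
use warnings;

# One row in the same fixed-width layout as the sample table
# (assumed widths: 5, 28 and 19 characters, then the serial number).
my $row = sprintf '%3s  %-27s %-18s %s',
    5, 'Policy Feature Card 3', '7600-PFC3CXL', 'JAE14330E6J';

my @raw     = unpack 'a5a28a19a*', $row;    # fields keep their padding
my @trimmed = unpack 'A5A28A19A*', $row;    # trailing spaces removed

printf "a: [%s]\nA: [%s]\n", $raw[2], $trimmed[2];
# Leading spaces survive either way, so the slot field still has them.
printf "slot: [%s]\n", $trimmed[0];
```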
Re: How we can do regex of this?
by muba (Priest) on Dec 23, 2012 at 14:30 UTC

    It matches the second and third lines just fine.

    Proof:

    But in your regex I notice a few "funny" parts:

    (\s*)

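    Are you sure you want to capture the whitespace? I assume you don't. Capturing it also shifts the numbering: (\s*) becomes $3, so your "Model = $3" line prints that whitespace instead of the model.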

    ([WS|76])(.*?)

    Are you sure you mean [ and ] here? Square brackets make a character class (much like how \d means [0-9], or "any one digit", [WS] means "any one W or S"), not a capture group. Inside the brackets the | is literal, so [WS|76] matches exactly one character: W, S, |, 7 or 6. Assuming you meant (WS|76), do you really want to capture the WS or 76 separately from whatever follows? My guess is that you meant ((?:WS|76).*?) here, and I doubt you actually need to specify that this substring begins with WS or 76, so why not just simplify it to (\S+)?
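To see the difference concretely, here is a small sketch that runs both forms against values copied from the table (the hash names are illustrative). The serial number is included because the bracketed form happily matches its S.

```perl
use strict;
use warnings;
use 5.010;    # for the defined-or operator //

# Two model numbers and one serial number from the sample table.
my @samples = ('WS-F6700-DFC3CXL', '76-ES+T-20GQ', 'SAL1434RLPY');

my (%class, %alt);
for my $s (@samples) {
    # [WS|76] is a character class: it matches ONE character that is
    # W, S, |, 7 or 6 (the | is literal inside the brackets).
    ($class{$s}) = $s =~ /([WS|76])/;
    # (?:WS|76) is a non-capturing group with alternation: the
    # two-character sequence WS or the sequence 76.
    ($alt{$s}) = $s =~ /((?:WS|76))/;
    printf "%-17s class=%-5s alt=%s\n",
        $s, $class{$s} // 'none', $alt{$s} // 'none';
}
```

The character class matches one character wherever it first finds W, S, |, 7 or 6, so it can latch onto parts of the line you did not intend, such as the S at the start of a serial number.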

    \s+(.*)\s+(.*)

    I'm not sure which parts of the sample data this would match. Did you really intend to have it there?

    Modifying your regex according to the above assumptions, and tacking on a $ for good measure, I come up with /^\s*(\d+)\s+(.+?)\s*(\S+)\s+(\w+)$/. I also threw in a chomp because I don't want to be bothered with the newline at the end of each line read into $_, so my version of your script ends up looking like this:

    use strict;
    use warnings;

    while (<DATA>) {
        print;
        chomp;
        if ($_ =~ /^\s*(\d+)\s+(.+?)\s*(\S+)\s+(\w+)$/) {
            print "SLOT = $1\n";
            print "Desc = $2\n";
            print "Model = $3\n";
            print "\n";
        }
    }
    __DATA__
    Mod  Sub-Module                  Model              Serial
    ---- --------------------------- ------------------ -----------
      1  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1434RLPY
      3  7600 ES+ DFC XL             7600-ES+3CXL       JAE14520N29
      3  7600 ES+T 20x1GE SFP        76-ES+T-20GQ       JAE145301XM
      5  Policy Feature Card 3       7600-PFC3CXL       JAE14330E6J
      5  C7600 MSFC4 Daughterboard   7600-MSFC4         JAE14320QBE
      6  Policy Feature Card 3       7600-PFC3CXL       JAE14330EAO
      6  C7600 MSFC4 Daughterboard   7600-MSFC4         JAE14320QA8
      7  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1433QHBR
      8  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1433QXF9
    Mod  Sub-Module                  Model              Serial
    ---- --------------------------- ------------------ -----------
      1  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1434RLPY
    SLOT = 1
    Desc = Distributed Forwarding Card
    Model = WS-F6700-DFC3CXL

      3  7600 ES+ DFC XL             7600-ES+3CXL       JAE14520N29
    SLOT = 3
    Desc = 7600 ES+ DFC XL
    Model = 7600-ES+3CXL

      3  7600 ES+T 20x1GE SFP        76-ES+T-20GQ       JAE145301XM
    SLOT = 3
    Desc = 7600 ES+T 20x1GE SFP
    Model = 76-ES+T-20GQ

      5  Policy Feature Card 3       7600-PFC3CXL       JAE14330E6J
    SLOT = 5
    Desc = Policy Feature Card 3
    Model = 7600-PFC3CXL

      5  C7600 MSFC4 Daughterboard   7600-MSFC4         JAE14320QBE
    SLOT = 5
    Desc = C7600 MSFC4 Daughterboard
    Model = 7600-MSFC4

      6  Policy Feature Card 3       7600-PFC3CXL       JAE14330EAO
    SLOT = 6
    Desc = Policy Feature Card 3
    Model = 7600-PFC3CXL

      6  C7600 MSFC4 Daughterboard   7600-MSFC4         JAE14320QA8
    SLOT = 6
    Desc = C7600 MSFC4 Daughterboard
    Model = 7600-MSFC4

      7  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1433QHBR
    SLOT = 7
    Desc = Distributed Forwarding Card
    Model = WS-F6700-DFC3CXL

      8  Distributed Forwarding Card WS-F6700-DFC3CXL   SAL1433QXF9
    SLOT = 8
    Desc = Distributed Forwarding Card
    Model = WS-F6700-DFC3CXL

    Which sure looks good to me.
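For anyone reusing that final pattern, here is a sketch of the same regex spread out with the /x modifier so each piece can carry a comment. It is intended to match exactly what /^\s*(\d+)\s+(.+?)\s*(\S+)\s+(\w+)$/ matches, and it shares that pattern's assumption (true for the sample data) that model and serial numbers never contain spaces. The sample row is rebuilt with sprintf, and the variable names are illustrative.

```perl
use strict;
use warnings;

# The pattern from the reply above, spread out with /x: literal
# whitespace inside the pattern is ignored and # starts a comment.
my $row_re = qr{
    ^ \s*
    (\d+)      # capture 1: slot number (Mod column)
    \s+
    (.+?)      # capture 2: description, non-greedy
    \s*
    (\S+)      # capture 3: model, one run of non-space characters
    \s+
    (\w+)      # capture 4: serial number, the last word on the line
    $
}x;

# One row in the sample table's layout (values copied from the table).
my $row = sprintf '%3s  %-27s %-18s %s',
    3, '7600 ES+T 20x1GE SFP', '76-ES+T-20GQ', 'JAE145301XM';

my ($slot, $desc, $model, $serial) = $row =~ $row_re;
print "SLOT = $slot\nDesc = $desc\nModel = $model\nSerial = $serial\n";

# Header and separator lines do not start with a number, so they fail.
my $header_matches = 'Mod  Sub-Module' =~ $row_re ? 1 : 0;
my $sep_matches    = '---- ----------' =~ $row_re ? 1 : 0;
```

Because (.+?) is non-greedy, it grows only until the rest of the pattern can match, which leaves the last two space-separated fields for the model and serial. A description containing spaces is therefore not a problem, but a model number containing one would be.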

      I took your code and adapted parts of it into my script, and now it works fine!!! Thank you for the help! =)

Node Type: perlquestion [id://1010097]
Approved by Corion