Re^3: Regex help


There are a number of ways to do it with straight regular expressions, but the technology you are feeding this into (and thus the necessary input-output mapping) is unfamiliar to me. Parsing one of your lines may be as easy as /^\s*(\S+)\s+(\S+)\s*$/, though you may need the m modifier depending on context (for example, when the whole report sits in a single string). This particular expression will fail on every line of your sample other than the data lines, since all it does is (thanks to YAPE::Regex::Explain):

The regular expression:

(?m-isx:^\s*(\S+)\s+(\S+)\s*$)

matches as follows:
  
NODE                     EXPLANATION
----------------------------------------------------------------------
(?m-isx:                 group, but do not capture (with ^ and $
                         matching start and end of line) (case-
                         sensitive) (with . not matching \n)
                         (matching whitespace and # normally):
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  $                        before an optional \n, and the end of a
                           "line"
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

and can be shown to work on this example with

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

$_ = <<'EOT';
Port 1 Database Assignments

  Region    Data Type   # Records

  GLOBAL    --
  LOCAL     --
  BUF       --
  D1        Unused
  D2        Unused
  D3        Unused
  D4        Unused
  D5        Unused
  D6        Unused
  D7        Unused
  D8        Unused
  A1        Unused
  A2        Unused
  A3        Unused
  USER      Unused
EOT

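my %hash;
# /m lets ^ and $ match at every line boundary; /g steps through the string one match at a time.
while (/^\s*(\S+)\s+(\S+)\s*$/mg) {
   $hash{$1} = $2;   # Region => Data Type
}
print Dumper \%hash;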

However, it'll break pretty quickly if your input is not representative; e.g. if your Region or Data Type values contain whitespace (this looks fixed-width to me) or if # Records is not empty.
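If the report really is fixed-width, unpack is one way to sidestep the whitespace problem. What follows is only a minimal sketch: the column widths (a 2-space indent, 10 columns for Region, 12 for Data Type, the remainder for # Records) are assumptions measured from the sample header, and the `Bit Flags` row is a hypothetical line that does not appear in the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed layout, measured from the sample header (not confirmed):
# skip 2 columns, then Region (10), Data Type (12), # Records (rest).
# The 'A' format strips trailing spaces from each field.
my $template = 'x2 A10 A12 A*';

my @lines = (
    'Port 1 Database Assignments',
    '',
    '  Region    Data Type   # Records',
    '',
    '  GLOBAL    --',
    '  D1        Unused',
    '  D2        Bit Flags   128',    # hypothetical row, not from the post
);

my %row;
for my $line (@lines) {
    next unless $line =~ /^  \S/;     # keep only rows with the two-space indent
    my ($region, $type, $records) = unpack $template, $line;
    next if $region eq 'Region';      # skip the column header
    $records //= '';                  # guard against a missing trailing field
    $row{$region} = { type => $type, records => $records };
}

printf "%-8s type=[%s] records=[%s]\n", $_, $row{$_}{type}, $row{$_}{records}
    for sort keys %row;
```

Because the fields are cut by position rather than by whitespace, a Data Type such as `Bit Flags` survives intact. The cost is that the template must be checked against real output before it can be trusted.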

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.


Re^4: Regex help by jayto (Acolyte) on Jun 21, 2012 at 15:11 UTC
Thanks, this was really helpful; you pretty much did all the work for me. I am trying to understand your regular expression, which will take a bit. One more question: is there no way to record the "# Records" column cells as being blank?
Re^5: Regex help by kennethk (Abbot) on Jun 22, 2012 at 13:09 UTC
Recording the "# Records" column cells as being blank is easy; the hard part would be changing the expression to record non-blank entries. You could just add an empty pair of parentheses, a la `/^\s*(\S+)\s+(\S+)\s*()$/`. Given that none of the lines contain three non-space blocks, you might even say `/^\s*(\S+)\s+(\S+)\s*(\S*)\s*$/`. Keep in mind that because these are such general expressions, it is absolutely essential that you test them against real input and be skeptical of the results.
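The three-capture idea can be sketched as follows, applied one line at a time and written with the x modifier so each piece carries a comment. This assumes the intended quantifiers are `\s*` and `\S*` and that Data Type is always a single token; the row containing `42` is hypothetical and only there to show a non-blank count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $row_re = qr/
    ^ \s* (\S+)    # Region
      \s+ (\S+)    # Data Type, assumed to be a single token
      \s* (\S*)    # Records column, captured as '' when blank
      \s* $
/x;

my @lines = (
    '  Region    Data Type   # Records',   # five tokens, so it is skipped
    '  GLOBAL    --',
    '  D1        Unused',
    '  D2        Unused      42',          # hypothetical row with a count
);

my @rows;
for my $line (@lines) {
    next unless $line =~ $row_re;
    push @rows, { region => $1, type => $2, records => $3 };
}

printf "%-8s %-8s [%s]\n", @{$_}{qw(region type records)} for @rows;
```

Matching line by line, rather than against the whole report with the m and g modifiers, keeps `\s` from reaching across a newline and pairing tokens from two different lines.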

In Section Seekers of Perl Wisdom