PerlMonks  

Re: Regex help

by kennethk (Monsignor)
on Jun 21, 2012 at 13:49 UTC (#977641)


in reply to Regex help

What have you tried? What didn't work? What resources are you using? This smells a lot like homework. I would say that split is probably a more appropriate tool than regular expressions. For some great resources on learning Perl, see http://learn.perl.org/. And if you post some code as per How do I post a question effectively?, we'll be happy to help you debug.
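As a rough illustration of the split suggestion, here is a minimal sketch. It assumes each data line holds exactly two whitespace-separated fields; the helper name parse_with_split and the abbreviated sample are made up for this example and are not from the original poster's code.

```perl
use strict;
use warnings;

# Hypothetical helper: split each line on whitespace and keep only
# the lines that have exactly two fields (Region and Data Type).
sub parse_with_split {
    my ($text) = @_;
    my %assignments;
    for my $line (split /\n/, $text) {
        # split ' ' is the special awk-like mode: it skips leading
        # whitespace and splits on runs of whitespace.
        my @fields = split ' ', $line;
        next unless @fields == 2;
        $assignments{ $fields[0] } = $fields[1];
    }
    return \%assignments;
}

# Abbreviated sample in the shape of the data discussed in this thread.
my $sample = join "\n",
    'Port 1 Database Assignments',
    'Region Data Type # Records',
    'GLOBAL --',
    'D1 Unused',
    'USER Unused';

my $parsed = parse_with_split($sample);
print "$_ => $parsed->{$_}\n" for sort keys %$parsed;
```

Title and header lines are skipped only because they happen to have more than two fields, so this is as fragile as any other approach if the real output differs from the sample.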


#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.


Re^2: Regex help
by jayto (Acolyte) on Jun 21, 2012 at 14:05 UTC
    It's not homework, it's for my job. I'm trying to parse data output from a sel2030. I've never used regexes before, and I am having a hard time figuring out how to read down a list and get the parser to start at the correct place in the list. Is there any way besides using split? I am passing these regexes into XML attributes, so if there is a way to do it without using Perl commands, that would be optimal. Any ideas, great Monks?
      There are a number of ways to do it with straight regular expressions, but the technology you are feeding this into (and thus the necessary input-output mapping) is unfamiliar to me. Parsing one of your lines may be as easy as /^\s*(\S+)\s+(\S+)\s*$/, but maybe this needs the m modifier depending on context. This particular expression will fail on all lines other than your data lines, since all it does is (thanks to YAPE::Regex::Explain):
      The regular expression:

      (?m-isx:^\s*(\S+)\s+(\S+)\s*$)

      matches as follows:

      NODE                     EXPLANATION
      ----------------------------------------------------------------------
      (?m-isx:                 group, but do not capture (with ^ and $
                               matching start and end of line) (case-
                               sensitive) (with . not matching \n)
                               (matching whitespace and # normally):
      ----------------------------------------------------------------------
        ^                        the beginning of a "line"
      ----------------------------------------------------------------------
        \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                                 more times (matching the most amount
                                 possible))
      ----------------------------------------------------------------------
        (                        group and capture to \1:
      ----------------------------------------------------------------------
          \S+                      non-whitespace (all but \n, \r, \t, \f,
                                   and " ") (1 or more times (matching the
                                   most amount possible))
      ----------------------------------------------------------------------
        )                        end of \1
      ----------------------------------------------------------------------
        \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                                 more times (matching the most amount
                                 possible))
      ----------------------------------------------------------------------
        (                        group and capture to \2:
      ----------------------------------------------------------------------
          \S+                      non-whitespace (all but \n, \r, \t, \f,
                                   and " ") (1 or more times (matching the
                                   most amount possible))
      ----------------------------------------------------------------------
        )                        end of \2
      ----------------------------------------------------------------------
        \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                                 more times (matching the most amount
                                 possible))
      ----------------------------------------------------------------------
        $                        before an optional \n, and the end of a
                                 "line"
      ----------------------------------------------------------------------
      )                        end of grouping
      ----------------------------------------------------------------------
      and can be shown to work on this example with
      #!/usr/bin/perl -w
      use strict;
      use Data::Dumper;

      $_ = <<'EOT';
      Port 1 Database Assignments
      Region Data Type # Records
      GLOBAL --
      LOCAL --
      BUF --
      D1 Unused
      D2 Unused
      D3 Unused
      D4 Unused
      D5 Unused
      D6 Unused
      D7 Unused
      D8 Unused
      A1 Unused
      A2 Unused
      A3 Unused
      USER Unused
      EOT

      my %hash;
      while (/^\s*(\S+)\s+(\S+)\s*$/mg) {
          $hash{$1} = $2;
      }

      print Dumper \%hash;
      However, it'll break pretty quickly if your input is not representative; e.g. if your Region or Data Type values contain whitespace (this looks fixed width to me) or if # Records is not empty.
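One possible way to tolerate a populated # Records column is to match one line at a time and make a third field optional. This is only a sketch: the helper name parse_line, the hash keys, and the 'D2 A 12' line are invented for illustration, and it still assumes no field contains whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: match a single line (so \s cannot run across a
# newline) and treat the third column as optional.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ /^\s*(\S+)\s+(\S+)(?:\s+(\S+))?\s*$/;
    # $3 is undef when the "# Records" column is blank; store '' instead.
    return { region => $1, type => $2, records => defined $3 ? $3 : '' };
}

# The third line is made-up data showing a non-empty "# Records" value.
for my $line ('GLOBAL --', 'D1 Unused', 'D2 A 12') {
    my $row = parse_line($line) or next;
    printf "%s|%s|%s\n", @{$row}{qw(region type records)};
}
```

Note that any title or header line with exactly two or three whitespace-separated tokens would also match, so real input may need an extra check on the Region name.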


        Thanks, this was really helpful; you pretty much did all the work for me. I am trying to understand your regex, which will take a bit. One more question: is there no way to record the "# Records" column cells as being blank?
Re^2: Regex help
by dasgar (Deacon) on Jun 21, 2012 at 14:32 UTC
    I would say that split is probably a more appropriate tool than regular expressions.

    Not trying to start an argument, but doesn't split use a regular expression ("pattern") as its first parameter?

      From a pedantic perspective, yes, split uses a regular expression to determine how to break up the string. However, in my experience, "using regular expressions" in common usage is generally taken to mean using the bare expression for matching, capturing or substitution.

      So you are technically correct - the best kind of correct.
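To make the point about split's first argument concrete, here is a small sketch comparing a regex pattern with the special-case single-space string; the sample line is made up for illustration.

```perl
use strict;
use warnings;

my $line = '  D1    Unused  ';

# A regex as the first argument: the leading whitespace produces an
# empty first field (trailing empty fields are dropped).
my @by_regex = split /\s+/, $line;

# The special-case string ' ' skips leading whitespace before splitting.
my @by_space = split ' ', $line;

# prints "3 fields vs 2 fields"
print scalar(@by_regex), " fields vs ", scalar(@by_space), " fields\n";
```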


