
One more q: on: parsing file/regex question

by smackdab (Pilgrim)
on Oct 23, 2003 at 23:25 UTC (#301755)

in reply to parsing file/regex question

Thanks for all of the help on this so far...I took the suggestions and I expanded the sample program to see if that makes a difference...

I am hoping to get this as data driven as possible to reduce errors (especially when I cut-n-paste ;-)

I am looking to process some lines in a file and validate the text (I am not yet using Taint, but will at some point ;). My problem is how to validate \n or \t, as sometimes they are allowed in the text field.

The following code should work, but I just want to make sure that s/\\t/\t/g; (and the other substitutions that I might need) is the best way to go.

Thanks again for any help!!!!

    $PRE    = '\[\s*';
    $VALID1 = '[-a-zA-Z0-9_.* \t\n]';
    $VALID2 = '[-a-z0-9_.*\n]';
    $VALID3 = '[a-zA-Z]';
    $VALID4 = '[-a-zA-Z0-9]';
    $PST    = '\s*\]';

    while (<DATA>) {
        s/\\n/\n/g;   #Are these harmless if
        s/\\t/\t/g;   #not needed???
        print "yep\n" if m/$PRE($VALID1+)$PST
                           $PRE($VALID2+)$PST
                           $PRE($VALID3+)$PST
                           $PRE($VALID4+)$PST
                          /ox;
    }
    __DATA__
    [TEST \n DATA] [ TEST DATA ] [ 2345423 ] [ TEST DATA ]\n
    [TEST \n DATA] [ TEST DATA ] [ 2345423 ] [ TEST\tDATA ]\n
    [TEST \n DATA] [ TEST DATA ] [ 2345423 ] [ TEST DATA ]\n

Replies are listed 'Best First'.
Re: One more q: on: parsing file/regex question
by graff (Chancellor) on Oct 24, 2003 at 03:35 UTC
    You didn't say which (if any) of the three data records is supposed to yield "yep"... it looks like none of them will, because $VALID3 specifies letters only, and all three data lines have only digits in the third field. Also, for any of them to match, $PST should include "\s*" after the close bracket, as well as before it (or maybe this should be added before the open bracket in $PRE).
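
The point about "\s*" after the close bracket can be seen in a cut-down, two-field version of the pattern. This is only a sketch: the sub names match_tight and match_loose and the single $VALID class are made up for illustration and are not part of the original program.

```perl
use strict;
use warnings;

my $PRE   = '\[\s*';
my $VALID = '[-a-zA-Z0-9]';
my $PST   = '\s*\]';

# Under /x the line break and spaces between the two fields are ignored,
# so this pattern needs "]" to be followed immediately by "[".
sub match_tight {
    my ($line) = @_;
    return $line =~ m/$PRE($VALID+)$PST
                      $PRE($VALID+)$PST
                     /x ? 1 : 0;
}

# Same pattern with an explicit \s* between the fields.
sub match_loose {
    my ($line) = @_;
    return $line =~ m/$PRE($VALID+)$PST
                      \s*
                      $PRE($VALID+)$PST
                     /x ? 1 : 0;
}

for my $line ('[ abc ] [ 123 ]', '[ abc ][ 123 ]') {
    printf "%-16s tight=%d loose=%d\n",
        $line, match_tight($line), match_loose($line);
}
```

With a space between the bracketed fields, only the second form matches; with no space, both do.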

    You do have the right notion for converting a literal (two character) '\n' or '\t' into the corresponding whitespace character before matching.
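
As a small illustration of that conversion, and of the "harmless if not needed" question, the two substitutions can be wrapped in a helper. The name unescape is hypothetical; the substitutions themselves are the ones from the post.

```perl
use strict;
use warnings;

# Turn the two-character sequences \n and \t into real newline and tab
# characters. Text that contains neither sequence comes back unchanged.
sub unescape {
    my ($text) = @_;
    $text =~ s/\\n/\n/g;
    $text =~ s/\\t/\t/g;
    return $text;
}

my $raw   = 'TEST\tDATA';       # single quotes: backslash + t, 10 characters
my $fixed = unescape($raw);     # real tab, 9 characters
printf "before: %d chars, after: %d chars\n", length $raw, length $fixed;

my $plain = 'TEST DATA';
print "unchanged\n" if unescape($plain) eq $plain;
```

One thing to watch for, if it can occur in the data: an escaped backslash followed by "n" or "t" would also be rewritten by these substitutions.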

    Note that some portions of your regexes can be simplified:  [a-zA-Z0-9_] is really just "\w", and if you want to match space, newline and tab, you might as well just use "\s".
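
A quick way to compare the spelled-out class with the shorthand one is to run both over a few strings. The sketch below assumes whole-string validation (hence the \A and \z anchors, which are not in the original program), and the sub names are made up.

```perl
use strict;
use warnings;

# Spelled-out class from the post, anchored to cover the whole string.
my $long  = qr/\A[-a-zA-Z0-9_.* \t\n]+\z/;
# Shorthand version: \w stands in for a-zA-Z0-9_, \s for the whitespace.
my $short = qr/\A[-\w.*\s]+\z/;

sub ok_long  { return $_[0] =~ $long  ? 1 : 0 }
sub ok_short { return $_[0] =~ $short ? 1 : 0 }

for my $text ("TEST \n DATA", "a.b*c_d-9", "no!good") {
    (my $shown = $text) =~ s/\n/\\n/g;   # keep the report on one line
    printf "%-14s long=%d short=%d\n",
        $shown, ok_long($text), ok_short($text);
}
```

The two are not strictly identical: "\s" also accepts other whitespace such as carriage return and form feed, and "\w" can match non-ASCII word characters, so the shorthand is slightly more permissive.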

    Are $VALID1 and $VALID2 really supposed to accept periods and asterisks, as well as alphanumerics and whitespace? (Just checking... sometimes people tend to make the mistake of putting ".*" inside of square brackets when they really have something else in mind.)
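
To show the difference being asked about: inside square brackets, "." and "*" lose their special meaning and match only themselves. A minimal sketch, with made-up sub names:

```perl
use strict;
use warnings;

my $in_class = qr/\A[.*]+\z/;  # one or more literal dots or asterisks
my $outside  = qr/\A.*\z/;     # zero or more of any character except newline

sub only_dots_and_stars { return $_[0] =~ $in_class ? 1 : 0 }
sub anything            { return $_[0] =~ $outside  ? 1 : 0 }

for my $text ('.*.', 'abc') {
    printf "%-4s in_class=%d outside=%d\n",
        $text, only_dots_and_stars($text), anything($text);
}
```

So a class such as $VALID1 accepts dots and asterisks as ordinary characters in the data; it does not mean "any run of characters".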
