PerlMonks  

One more q: on: parsing file/regex question

by smackdab (Pilgrim)
on Oct 23, 2003 at 23:25 UTC (#301755)


in reply to parsing file/regex question

Thanks for all of the help on this so far...I took the suggestions and I expanded the sample program to see if that makes a difference...

I am hoping to get this as data driven as possible to reduce errors (especially when I cut-n-paste ;-)

I am looking to process some lines in a file and validate the text (I am not yet using taint mode, but will at some point ;). My problem is how to validate \n or \t, as they are sometimes allowed in the text field.

The following code should work, but I just want to make sure that s/\\t/\t/g; (and the other substitutions I might need) is the best way to go.

Thanks again for any help!!!!

$PRE    = '\[\s*';
$VALID1 = '[-a-zA-Z0-9_.* \t\n]';
$VALID2 = '[-a-z0-9_.*\n]';
$VALID3 = '[a-zA-Z]';
$VALID4 = '[-a-zA-Z0-9]';
$PST    = '\s*\]';

while (<DATA>)
{
    s/\\n/\n/g;   #Are these harmless if
    s/\\t/\t/g;   #not needed???

    print "yep\n" if m/$PRE($VALID1+)$PST
                       $PRE($VALID2+)$PST
                       $PRE($VALID3+)$PST
                       $PRE($VALID4+)$PST
                      /ox;
}
__DATA__
[TEST \n DATA] [ TEST DATA ] [ 2345423 ] [ TEST DATA ]\n
[TEST \n DATA] [ TEST DATA ] [ 2345423 ] [ TEST\tDATA ]\n
[TEST \n DATA] [ TEST DATA ] [ 2345423 ] [ TEST DATA ]\n


Re: One more q: on: parsing file/regex question
by graff (Chancellor) on Oct 24, 2003 at 03:35 UTC
    You didn't say which (if any) of the three data records is supposed to yield "yep"... it looks like none of them will, because $VALID3 specifies letters only, and all three data lines have only digits in the third field. Also, for any of them to match, $PST should include "\s*" after the close bracket, as well as before it (or maybe this should be added before the open bracket in $PRE).
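To see which fields are actually rejected, here is a minimal sketch that checks each bracketed field against its own class, one at a time. The helper name `failing_fields` and the field-splitting regex are hypothetical and not part of the original program; the four classes are copied from the post. On the first data record it suggests that fields 2 and 4 are rejected as well as field 3, since `$VALID2` has no upper-case letters or space and `$VALID4` has no space.

```perl
use strict;
use warnings;

# Character classes copied from the original post ($VALID1 .. $VALID4).
my @VALID = (
    '[-a-zA-Z0-9_.* \t\n]',
    '[-a-z0-9_.*\n]',
    '[a-zA-Z]',
    '[-a-zA-Z0-9]',
);

# Hypothetical helper: split one record into its bracketed fields and
# return the 1-based numbers of the fields that their class rejects.
sub failing_fields {
    my ($line) = @_;
    $line =~ s/\\n/\n/g;    # literal backslash-n becomes a real newline
    $line =~ s/\\t/\t/g;    # literal backslash-t becomes a real tab
    my @fields = $line =~ /\[\s*(.*?)\s*\]/gs;
    my @bad;
    for my $i (0 .. $#fields) {
        my $class = $VALID[$i];
        push @bad, $i + 1 unless $fields[$i] =~ /\A$class+\z/;
    }
    return @bad;
}

my $record = '[TEST \n DATA] [ TEST DATA ] [ 2345423 ] [ TEST DATA ]\n';
print "failing fields: @{[ failing_fields($record) ]}\n";   # failing fields: 2 3 4

# Under /x the blanks between groups in the pattern match nothing,
# so the gap between "]" and "[" needs an explicit \s*.
print "no gap allowed\n" unless '[a] [b]' =~ / \[a\] \[b\] /x;
print "gap allowed\n"    if     '[a] [b]' =~ / \[a\] \s* \[b\] /x;
```

Checking the fields separately like this is only a diagnostic aid; once the classes say what you intend, the single combined pattern (with the extra `\s*`) is still the natural way to validate a whole record.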

    You do have the right notion for converting a literal (two-character) '\n' or '\t' into the corresponding whitespace character before matching.
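Since the goal was to keep things data driven, the substitutions could be sketched with a lookup table instead of one `s///` per escape. The names `%ESCAPE` and `unescape` are hypothetical. The second call shows that the substitution leaves a string alone when there is nothing to convert, so it is harmless when not needed.

```perl
use strict;
use warnings;

# Hypothetical lookup table: letter after the backslash => real character.
my %ESCAPE = ( n => "\n", t => "\t" );

# Build the character class from the table keys, so supporting another
# escape is a one-line change to %ESCAPE. (This assumes the keys are
# plain letters; a key such as "]" or "-" would need quoting.)
my $keys = join '', keys %ESCAPE;

sub unescape {
    my ($text) = @_;
    $text =~ s/\\([$keys])/$ESCAPE{$1}/g;
    return $text;
}

print unescape('TEST\tDATA') eq "TEST\tDATA" ? "converted\n" : "unchanged\n";
print unescape('plain text') eq 'plain text' ? "untouched\n" : "altered\n";
```

One thing to watch: any backslash followed by `n` or `t` is converted, including ones that were meant literally (a Windows path such as `C:\temp`, for example), so this only suits data where those two-character sequences always stand for whitespace.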

    Note that some portions of your regexes can be simplified: for ASCII data, [a-zA-Z0-9_] is really just "\w", and if you want to match space, newline and tab, you might as well just use "\s" (which also accepts other whitespace, such as carriage return and form feed).
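A quick way to compare the shorthands with the explicit classes is to run both over a few sample characters. This is just a sketch; the `/a` modifier (available from Perl 5.14) is used here to restrict `\w` to ASCII, which is an assumption about the data rather than something the original program does.

```perl
use strict;
use warnings;

# With /a, \w accepts exactly the same characters as [a-zA-Z0-9_].
for my $ch ('a', 'Z', '7', '_', '-', '.') {
    my $explicit = $ch =~ /\A[a-zA-Z0-9_]\z/ ? 1 : 0;
    my $short    = $ch =~ /\A\w\z/a          ? 1 : 0;
    printf "%-3s explicit=%d \\w=%d\n", $ch, $explicit, $short;
}

# \s accepts space, tab and newline, but also carriage return and form feed.
my @names = ('space', 'tab', 'newline', 'carriage return', 'form feed');
my @chars = (' ', "\t", "\n", "\r", "\f");
for my $i (0 .. $#chars) {
    my $narrow = $chars[$i] =~ /\A[ \t\n]\z/ ? 1 : 0;
    my $short  = $chars[$i] =~ /\A\s\z/      ? 1 : 0;
    printf "%-15s narrow=%d \\s=%d\n", $names[$i], $narrow, $short;
}
```

If the extra whitespace characters are acceptable, `$VALID1` could then be written as `'[-\w.*\s]'`; if only space, tab and newline should pass, the explicit form is the safer choice.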

    Are $VALID1 and $VALID2 really supposed to accept periods and asterisks, as well as alphanumerics and whitespace? (Just checking... sometimes people tend to make the mistake of putting ".*" inside of square brackets when they really have something else in mind.)
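The difference is easy to check with a small sketch (the variable names `$class` and `$wild` are only for illustration): inside square brackets the period and asterisk are ordinary characters, while outside them `.*` means "any run of characters".

```perl
use strict;
use warnings;

# Inside [...] the dot and asterisk lose their special meaning,
# so this accepts only strings made of "." and "*".
my $class = qr/\A[.*]+\z/;

# Outside a class, .* matches any run of characters (except newline).
my $wild = qr/\A.*\z/;

for my $text ('.*', '***', 'abc', '') {
    printf "%-6s class: %-8s wild: %s\n", "'$text'",
        ($text =~ $class ? 'match' : 'no match'),
        ($text =~ $wild  ? 'match' : 'no match');
}
```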
