PerlMonks  

The regular expression you choose for any situation depends a great deal on how much you can trust your data to follow a pattern. Here are some examples (untested, but they should serve to illustrate):

  • Your original attempt is vague (it matches 'ABCD xyz 123') but works better if modified slightly by adding an anchor to the front, or a word boundary if you don't want to be stuck to that position:

    /\b\w{3}\s+\w{3}\s+\d+/

  • If you are certain that the date always starts the line, then the split is certainly a nice option as Trimbach said.

  • If you want to be more certain that you get a real date, you could do something like:

    /(Sun|Mon|Tue|Wed|Thu|Fri|Sat)\s(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{1,2}/i

    with the i option used if you can't trust the case of the letters. The alternations may make this slower than a character-class pattern, so if you run it against a lot of lines, that may cause problems. This should find a date anywhere in the line (anchor it with ^ if you don't want that, as Trimbach said), but it will also match text that is not followed by the time, timezone and year, so you might want to extend the regex to match those as an extra validity check. Even with all that specificity, this will still match "Wed Mar 98", which clearly isn't a date. To fix that, the numeric match could be changed to (0?[1-9]|[12][0-9]|3[01])\b, where the trailing \b stops it from accepting just the first digit of "98", but this is getting pretty messy!

  • For a simpler, more readable regex that is less precise, try:

    /[A-Z][a-z]{2}\s+[A-Z][a-z]{2}\s+\d{1,2}/

    or the slightly more specific but definitely funny looking

    /[SMTWF][uoehra][neduit]\s+[JFMASOND][aepuco][nbrylgptvc]\s+\d{1,2}/i
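
To see how the patterns in the list differ in practice, here is a minimal sketch that runs several of them against a few sample strings. The sample lines, the `%regex` keys and the `matching_patterns` helper are made up for illustration, and the "ranged" pattern is just one possible way to limit the day number to 1-31; none of this has been checked against the real data.

```perl
use strict;
use warnings;

# Alternations kept as plain strings so the /i flag on the outer
# pattern still applies to them after interpolation.
my $days   = 'Sun|Mon|Tue|Wed|Thu|Fri|Sat';
my $months = 'Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec';

# Hypothetical labels for the patterns being compared.
my %regex = (
    boundary => qr/\b\w{3}\s+\w{3}\s+\d+/,
    names    => qr/(?:$days)\s(?:$months)\s\d{1,2}/i,
    # Variant with a bounded day number and a trailing word boundary.
    ranged   => qr/\b(?:$days)\s+(?:$months)\s+(?:0?[1-9]|[12][0-9]|3[01])\b/,
    shape    => qr/[A-Z][a-z]{2}\s+[A-Z][a-z]{2}\s+\d{1,2}/,
);

# Return a comma-separated list of the pattern labels that match $line.
sub matching_patterns {
    my ($line) = @_;
    return join ',', grep { $line =~ $regex{$_} } sort keys %regex;
}

# Assumed sample input; the real log format may differ.
my @samples = (
    'Wed Mar 14 10:23:45 EST 2001',
    'ABCD xyz 123',
    'Wed Mar 98',
    'wed mar 14',
);

for my $line (@samples) {
    printf "%-30s => %s\n", $line, matching_patterns($line) || 'none';
}
```

With these samples, 'Wed Mar 98' should be accepted by every pattern except the ranged one, and 'ABCD xyz 123' should be rejected by all of them.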

So in summary, a regex will just match what you are telling it to look for (if present), which may very well not be a date. It may be wise to validate after the match, using something like Time::ParseDate, in which case you can choose a much simpler, less specific regex.
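
The match-then-validate idea could look something like the sketch below. To keep it free of non-core modules it does not call Time::ParseDate; it swaps in a simple hand-written lookup table instead, so treat it as an illustration of the approach rather than a replacement for a real date parser. The names `extract_date`, `%is_day` and `%days_in` are hypothetical, and since no year is captured, leap years are not checked.

```perl
use strict;
use warnings;

# Lookup tables used for validation after the loose match.
my %is_day  = map { $_ => 1 } qw(Sun Mon Tue Wed Thu Fri Sat);
my %days_in = (
    Jan => 31, Feb => 29, Mar => 31, Apr => 30, May => 31, Jun => 30,
    Jul => 31, Aug => 31, Sep => 30, Oct => 31, Nov => 30, Dec => 31,
);

# Hypothetical helper: returns (weekday, month, day) for the first
# candidate in $line that passes validation, or an empty list.
sub extract_date {
    my ($line) = @_;

    # Loose match: two capitalised three-letter words, then one or two
    # digits that are not followed by another digit.
    while ($line =~ /\b([A-Z][a-z]{2})\s+([A-Z][a-z]{2})\s+(\d{1,2})(?!\d)/g) {
        my ($dow, $mon, $day) = ($1, $2, $3);
        next unless $is_day{$dow} && $days_in{$mon};           # real names?
        next unless $day >= 1 && $day <= $days_in{$mon};       # day in range?
        return ($dow, $mon, $day + 0);
    }
    return;
}

# Assumed sample input; the real log format may differ.
for my $line ('Wed Mar 14 10:23:45 EST 2001', 'Wed Mar 98', 'Abc Xyz 12') {
    my @date = extract_date($line);
    print "$line => ", (@date ? "@date" : 'no valid date'), "\n";
}
```

The regex here stays deliberately loose and readable; the lookup tables do the work of rejecting things like "Wed Mar 98" that the pattern alone would let through.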

--
I'd like to be able to assign to an luser


In reply to Re: regex-matching the date by Albannach
in thread regex-matching the date by stuffy
