Will regex work here?

Sherlock has asked for the wisdom of the Perl Monks concerning the following question:

I'm rather new to Perl development and I've come across a situation where I think the use of some sort of regex might work, but I'm not very good with them yet, so I need a little help.

Here's the situation. I'm reading a list of appointments from an XML file and I'd like to remove any old dates from the file. Here's a sample of the XML to show you the schema:

<SCHEDULE>
   <DAY DATE="4/21/2001">
      <APPT>
         <START>10:00</START>
         <STOP>11:00</STOP>
      </APPT>
      <APPT>
         <START>12:00</START>
         <STOP>13:30</STOP>
      </APPT>
   </DAY>
   <DAY>
      ...
   </DAY>
   ...
</SCHEDULE>

What I'd like to accomplish is to read this file in (possibly into a single scalar, for simplicity) and remove all <DAY> nodes with a "DATE" attribute prior to a specific date, such as the current date.

I was thinking of using some permutation of the s/// operator to accomplish something like:

$xmlDoc =~ s/<DAY DATE="(Everything less than today)"...<\/DAY>//g;

This is where I'm having trouble. Can I use some sort of regex to pull this off, or is there possibly another, simpler way of accomplishing this?

Thanks,
- Sherlock

P.S.
I'd like to send a little "Thank you" out to OeufMayo for the XML::Parser Tutorial that really got me going on this project. ;-)


Re: Will regex work here? by strredwolf (Chaplain) on Apr 25, 2001 at 00:27 UTC
You really can't use a regexp across a \n, unless you suck the whole file in... then it gets rather nasty. The XML modules would help, as well as some code from my chatterbox client if you want to code it by hand.
-- $Stalag99{"URL"}="http://stalag99.keenspace.com";
Re: Re: Will regex work here? by Sherlock (Deacon) on Apr 25, 2001 at 00:31 UTC
For simplicity, the entire XML file is on one line. It isn't edited by hand, so that worked out fine. Therefore, there aren't any \n characters in the XML file. I just printed it that way so that people could understand the XML a little better. I should have stated that in my original post. Sorry.
- Sherlock
Re: Will regex work here? by chorg (Monk) on Apr 25, 2001 at 00:43 UTC
If you are using XML, you should be able to avoid the regex stuff. We all love regex, but the whole point of XML is to allow standardized (eventually) access to files and the like. My point here is that you should probably look to an XML solution.

XML::Parser may be needlessly low level for what you want to do, which is to perform an operation on a whole XML file, as opposed to working tag by tag. This sounds like a job for a DOM type parser to me. Check out T.J. Mather's modules. Since you want to slurp in the whole file, memory is not an issue.

Also, if you really want to use XML::Parser, you could read the file into your string, transform it SAX style on the fly into another variable, and do whatever you want to do with it.

Comment?
_______________________________________________
"Intelligence is a tool used to achieve goals, however goals are not always chosen wisely..."
Re: Re: Will regex work here? by Sherlock (Deacon) on Apr 25, 2001 at 00:52 UTC
Well, I'm able to parse the entire XML document already using XML::Parser. For this operation, I don't really need to view the contents of the file as XML at all - just as a single big string. I'd like to simply do a one-shot removal of all the old data and then read it in as XML. The option of using an XML based solution had occurred to me, and may quite possibly be what I end up using, but I was just interested in whether a regex could be used here or not. (Like I said, I'm still just learning regexes.)

Thanks for the link to Mather's modules - I haven't looked through them closely yet, but they look as if they might be very useful.
- Sherlock
Re: Will regex work here? by aardvark (Pilgrim) on Apr 25, 2001 at 02:04 UTC
Since you are only interested in certain nodes, I'd look at mirod's XML::Twig. I'm trying to learn it myself, but it sounds like it should do what you want. There is a tutorial for XML::Twig here.

I think you can also do something like this using an XSL Transformation. It sounds like you want to transform your old XML into a new XML document, based on the value of the DATE attribute. To do this you need a stylesheet and a transformation engine. There are many XSLT engines out there; you can use AxKit for an all-Perl solution or Xalan for a Java/Apache solution. There are many more that you can find here. I've been reading XSLT: Programmer's Reference for help on XSLT.

It is really interesting stuff, but I'd look to XML::Twig and AxKit to help you get your job done today. When you have some time, look at the other stuff. I hope this helps.

Get Strong Together!!
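For what it's worth, here is a minimal sketch of how the XML::Twig suggestion above might look. It assumes XML::Twig is installed from CPAN and that every DATE attribute is written as M/D/YYYY, as in the sample. The names date_key and prune_old_days are made up for illustration; new, parse, att, delete and sprint are XML::Twig methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: turn M/D/YYYY into a sortable YYYYMMDD number.
# Returns undef if the string is missing or does not look like a date.
sub date_key {
    my ($date) = @_;
    return unless defined $date;
    my ($m, $d, $y) = $date =~ m{^(\d{1,2})/(\d{1,2})/(\d{4})$}
        or return;
    return sprintf('%04d%02d%02d', $y, $m, $d);
}

# Hypothetical wrapper: drop every <DAY> whose DATE is before $cutoff.
sub prune_old_days {
    my ($xml, $cutoff) = @_;
    my $cutoff_key = date_key($cutoff)
        or die "Bad cutoff date: $cutoff\n";

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called once for each complete <DAY> element.
            DAY => sub {
                my ($t, $day) = @_;
                my $key = date_key($day->att('DATE'));
                $day->delete if defined $key && $key < $cutoff_key;
            },
        },
    );
    $twig->parse($xml);      # parse from a string
    return $twig->sprint;    # serialize what is left
}

my $xml = '<SCHEDULE>'
        . '<DAY DATE="4/21/2001"><APPT><START>10:00</START><STOP>11:00</STOP></APPT></DAY>'
        . '<DAY DATE="4/25/2001"><APPT><START>9:00</START><STOP>9:30</STOP></APPT></DAY>'
        . '</SCHEDULE>';

print prune_old_days($xml, '4/24/2001'), "\n";
```

A <DAY> with no DATE attribute, or one that doesn't parse, is left alone here; that is a design choice of this sketch, not something the thread specifies.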
Re: Will regex work here? by mlong (Sexton) on Apr 25, 2001 at 03:51 UTC
If you really want a regex, this might do the job. I have omitted the date comparison code (left as an exercise for the reader). Try this:

#!/usr/bin/perl
use strict;
use warnings;

my $file  = shift;
my $dDate = shift;

open(my $in, '<', $file) or die "Couldn't open file: $file for reading: $!\n";
my $slurp = join('', <$in>);
close($in);

# $1 contains the whole <DAY>...</DAY> match.
# $2 contains the date in the current match.
# The non-greedy .*? with /s stops at the first </DAY>, even across newlines.
$slurp =~ s{(<DAY DATE="(\d{1,2}/\d{1,2}/\d{4})">.*?</DAY>)}
           { isOlder($dDate, $2) ? '' : $1 }gse;

print $slurp;

sub isOlder {
    # implement your own date compare
    return 1;
}

I used the XML text from your message and put it in a file called test.txt. Usage was:

tester.pl test.txt '04/24/2001'

You may have to play with it a bit, but I think this should give you a good start.

Good luck.
-Matt
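The date comparison is left as an exercise above. Here is one way it might be filled in, as a sketch only: it assumes every date is written M/D/YYYY as in the sample, and it uses core Perl with no modules. isOlder keeps the name and argument order from the post; date_key and remove_old_days are made-up names for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn M/D/YYYY into a sortable YYYYMMDD number.
# Returns undef if the string does not look like a date.
sub date_key {
    my ($date) = @_;
    return unless defined $date;
    my ($m, $d, $y) = $date =~ m{^(\d{1,2})/(\d{1,2})/(\d{4})$}
        or return;
    return sprintf('%04d%02d%02d', $y, $m, $d);
}

# True only if $date is strictly earlier than $cutoff.
# Unparsable dates are treated as "not older", so nothing is deleted by accident.
sub isOlder {
    my ($cutoff, $date) = @_;
    my ($c, $d) = (date_key($cutoff), date_key($date));
    return defined($c) && defined($d) && $d < $c;
}

# Hypothetical wrapper around the substitution: remove each
# <DAY DATE="...">...</DAY> whose date is older than $cutoff.
sub remove_old_days {
    my ($xml, $cutoff) = @_;
    $xml =~ s{(<DAY\s+DATE="([^"]*)">.*?</DAY>)}
             { isOlder($cutoff, $2) ? '' : $1 }gse;
    return $xml;
}

my $xml = '<SCHEDULE>'
        . '<DAY DATE="4/21/2001"><APPT><START>10:00</START><STOP>11:00</STOP></APPT></DAY>'
        . '<DAY DATE="4/25/2001"><APPT><START>9:00</START><STOP>9:30</STOP></APPT></DAY>'
        . '</SCHEDULE>';

print remove_old_days($xml, '4/24/2001'), "\n";
```

Comparing the YYYYMMDD keys as numbers avoids the trap of comparing "4/21/2001" and "12/1/2000" as strings. The regex still assumes DATE is the only attribute on <DAY> and that <DAY> elements never nest; if either can change, a parser-based approach is the safer bet.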

