PerlMonks  

Will regex work here?

by Sherlock (Deacon)
on Apr 25, 2001 at 00:11 UTC ( [id://75246]=perlquestion )

Sherlock has asked for the wisdom of the Perl Monks concerning the following question:

I'm rather new to Perl development and I've come across a situation where I think the use of some sort of regex might work, but I'm not very good with them yet, so I need a little help.

Here's the situation. I'm reading a list of appointments from an XML file and I'd like to remove any old dates from the file. Here's a sample of the XML to show you the schema:
    <SCHEDULE>
      <DAY DATE="4/21/2001">
        <APPT>
          <START>10:00</START>
          <STOP>11:00</STOP>
        </APPT>
        <APPT>
          <START>12:00</START>
          <STOP>13:30</STOP>
        </APPT>
      </DAY>
      <DAY> ... </DAY>
      ...
    </SCHEDULE>
What I'd like to accomplish is to read this file in (possibly into a single scalar, for simplicity) and remove all <DAY> nodes with a "DATE" attribute prior to a specific date, such as the current date.

I was thinking of using some permutation of the s/// operator to accomplish something like:
    $xmlDoc =~ s/<DAY DATE="(Everything less than today)"...<\/DAY>//g;
This is where I'm having trouble. Can I use some sort of regex to pull this off, or is there possibly another, simpler way of accomplishing this?

Thanks,
- Sherlock

P.S.
I'd like to send a little "Thank you" out to OeufMayo for the XML::Parser Tutorial that really got me going on this project. ;-)

Replies are listed 'Best First'.
Re: Will regex work here?
by strredwolf (Chaplain) on Apr 25, 2001 at 00:27 UTC
    You really can't use a regexp across a \n, unless you suck the whole file in... then it gets rather nasty.

    The XML modules would help, as well as some code from my chatterbox client if you want to code it by hand.
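    For what it's worth, once the whole file is in one scalar, the /s modifier lets "." match newlines too, so a pattern can span lines. A minimal sketch, assuming the data has already been slurped (the sample string and the @days array are illustrative only; for a real file you would typically read it with local $/ undefined):

```perl
use strict;
use warnings;

# Sample data standing in for a slurped file; the newlines are deliberate.
my $xmlDoc = <<'XML';
<SCHEDULE>
<DAY DATE="4/21/2001">
<APPT><START>10:00</START><STOP>11:00</STOP></APPT>
</DAY>
<DAY DATE="4/25/2001">
<APPT><START>12:00</START><STOP>13:30</STOP></APPT>
</DAY>
</SCHEDULE>
XML

# With /s, "." also matches "\n"; the non-greedy ".*?" stops at the first </DAY>.
my @days = $xmlDoc =~ m{(<DAY\b.*?</DAY>)}gs;

print scalar(@days), " DAY nodes found\n";
```

    This only shows that matching across lines is possible. It does not handle nested or malformed markup, which is where the XML modules earn their keep.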

    --
    $Stalag99{"URL"}="http://stalag99.keenspace.com";

      For simplicity, the entire XML file is on one line. It isn't edited by hand, so that worked out fine. Therefore, there aren't any \n characters in the XML file. I just printed it that way so that people could understand the XML a little better.

      I should have stated that in my original post. Sorry.

      - Sherlock
Re: Will regex work here?
by chorg (Monk) on Apr 25, 2001 at 00:43 UTC
    If you are using XML, you should be able to avoid the regex stuff. We all love regex, but the whole point of XML is to allow standardized (eventually) access to files and the like.

    My point here is that you should probably look to an XML solution. XML::Parser may be needlessly low level for what you want to do, which is perform an operation on a whole XML file, as opposed to working tag by tag. This sounds like a job for a DOM-type parser to me. Check out T.J. Mather's modules. Since you want to slurp in the whole file, memory is not an issue.

    Also, if you really want to use XML::Parser, you could read the file into your string, transform it SAX style on the fly into another variable and do whatever you want to do with it.

    Comment?
    _______________________________________________
    "Intelligence is a tool used to achieve goals, however goals are not always chosen wisely..."

      Well, I'm able to parse the entire XML document already using XML::Parser. For this operation, I don't really need to view the contents of the file as XML at all - just as a single big string. I'd like to simply do a one-shot removal of all the old data and then read it in as XML.

      The option of using an XML-based solution had occurred to me, and it may quite possibly be what I end up using, but I was just interested in whether a regex could be used here or not. (Like I said, I'm still just learning regexes.)

      Thanks for the link to Mather's modules - I haven't looked through them closely yet, but they look as if they might be very useful.

      - Sherlock
Re: Will regex work here?
by aardvark (Pilgrim) on Apr 25, 2001 at 02:04 UTC
    Since you are only interested in certain nodes, I'd look at mirod's XML::Twig. I'm trying to learn it myself, but it sounds like it should do what you want. There is a tutorial for XML::Twig here.
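    A rough sketch of how that might look with XML::Twig, assuming the module is installed and that every DATE is in M/D/YYYY form. The date_key helper, the sample string, and the cutoff date are made up for illustration and are not from the original post:

```perl
use strict;
use warnings;
use XML::Twig;

# Sample document on a single line, as described in the question.
my $xml = '<SCHEDULE>'
        . '<DAY DATE="4/21/2001"><APPT><START>10:00</START><STOP>11:00</STOP></APPT></DAY>'
        . '<DAY DATE="4/25/2001"><APPT><START>12:00</START><STOP>13:30</STOP></APPT></DAY>'
        . '</SCHEDULE>';

# Hypothetical helper: turn "M/D/YYYY" into a sortable integer such as 20010421.
sub date_key {
    my ($date) = @_;
    my ($m, $d, $y) = $date =~ m{^(\d{1,2})/(\d{1,2})/(\d{4})$}
        or die "Unrecognized date: $date\n";
    return $y * 10000 + $m * 100 + $d;
}

my $cutoff = date_key('4/24/2001');   # assumed cutoff, standing in for "today"

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each completely parsed <DAY> element.
        DAY => sub {
            my ($t, $day) = @_;
            my $date = $day->att('DATE');
            $day->delete if defined $date && date_key($date) < $cutoff;
        },
    },
);

$twig->parse($xml);
my $result = $twig->sprint;   # serialize what is left of the document
print $result, "\n";
```

    The handler sees one DAY at a time, so the comparison is ordinary Perl code rather than something squeezed into a pattern.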

    I think you can also do something like this using an XSL Transformation. It sounds like you want to transform your old XML into a new XML document, based on the value of the DATE attribute. To do this you need a stylesheet and a transformation engine. There are many XSLT engines out there, you can use AxKit for an all-Perl solution or Xalan for a Java/Apache solution. There are many more that you can find here. I've been reading XSLT: Programmer's Reference for help on XSLT. It is really interesting stuff, but I'd look to XML::Twig and AxKit to help you get your job done today. When you have some time look at the other stuff.

    I hope this helps.

    Get Strong Together!!

Re: Will regex work here?
by mlong (Sexton) on Apr 25, 2001 at 03:51 UTC
    If you really want a regex, this might do the job. I have omitted the date comparison code (Left as an exercise for the reader). Try this:

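    #!/usr/bin/perl
    use strict;
    use warnings;

    my ($file, $dDate) = @ARGV;
    die "Usage: $0 file date\n" unless defined $file && defined $dDate;

    open(my $in, '<', $file) or die "Couldn't open file: $file for reading: $!\n";
    my $slurp = do { local $/; <$in> };   # slurp the whole file into one scalar
    close($in);

    # $1 contains the whole <DAY>...</DAY> node.
    # $2 contains the date of that node.
    # .*? with /s is non-greedy and crosses newlines, so each match stops at the first </DAY>.
    # /e runs the replacement as code: drop the node if it is older, otherwise put it back.
    $slurp =~ s{(<DAY DATE="(\d{1,2}/\d{1,2}/\d{4})">.*?</DAY>)}
               { isOlder($dDate, $2) ? '' : $1 }ges;

    print $slurp;

    sub isOlder {
        my ($cutoff, $date) = @_;
        # implement your own date compare; return true if $date is before $cutoff
        return 1;
    }

    I used the XML text from your message and put it in a file called test.txt. Usage was: tester.pl test.txt '04/24/2001'

    You may have to play with it a bit. The match is non-greedy, so each substitution stops at the first </DAY> instead of swallowing everything up to the last one, but isOlder is still a stub that treats every date as old. I think this should give you a good start. Good Luck.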
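    For the date compare left as an exercise, here is one possible sketch. It assumes every date is written as M/D/YYYY (with or without leading zeros) and that "older" means strictly before the cutoff. The date_key helper is a made-up name, not part of the script above:

```perl
use strict;
use warnings;

# Hypothetical helper: turn "M/D/YYYY" into a sortable integer such as 20010421.
sub date_key {
    my ($date) = @_;
    my ($m, $d, $y) = $date =~ m{^(\d{1,2})/(\d{1,2})/(\d{4})$}
        or die "Unrecognized date: $date\n";
    return $y * 10000 + $m * 100 + $d;
}

# True when $date falls strictly before $cutoff.
sub isOlder {
    my ($cutoff, $date) = @_;
    return date_key($date) < date_key($cutoff);
}

print isOlder('04/24/2001', '4/21/2001') ? "older\n" : "not older\n";
```

    Comparing the raw strings would not work here, because "4/21/2001" and "12/1/2000" do not sort chronologically as text. Converting to a year-month-day number first avoids that. It does not validate that the day and month are in range.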

    -Matt
