regex help please

by sitnalta (Initiate)
on Oct 27, 2006 at 18:28 UTC (#580985)
sitnalta has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I have been banging my head trying to get one regular expression to delete the following two examples completely.

Here are two examples of the text I want to completely remove. They all start with rcvtime and they all end with <0a>, but it MUST be the first <0a> and not the second. Any ideas?

Example text:



Thanks in advance. I appreciate any tips and tricks.

Replies are listed 'Best First'.
Re: regex help please
by chargrill (Parson) on Oct 27, 2006 at 18:53 UTC

    Hi sitnalta,

    Before jumping to any conclusions as to why the first response to your question should or should not work, please show us what you've tried, and what you expect as the results. I'm especially interested in why you would want to match only the first <0a> and not the second, especially considering your original statement is that you want them completely removed. Are you using a multi-line regex and you have data that you do NOT want to delete between those lines?

    Please see "I know what I mean. Why don't you?" for more information.

    s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)
      What I am attempting to do is take a log file, read it into Perl, output it in a more readable form, and eventually do some form of reporting on it. Who knows, maybe even drop some info into a database. This is all for the learning experience and simply for the challenge, since Perl is fun even though I am not too good with it.

      Here is what the original message looks like. Keep in mind I changed the original text message and dstaddr for personal reasons:

      15025 0 0 Note;MMG_MDR:ref=0:mdrname=DELIVERED:sysid=localhost/16503:svcname=GSM:host=atlmmg01:proc=SMEGwy-concen:rcvtime=20 06102600002184816-:sndtime=2006102600002199516-:rcvuser=SMPP-XMIT7:snduser=CHI_RT:rcvtype=SERVER:sndtype=CLIENT:rcvnet=SMPP SRV:sndnet=:rcvacc=localhost/16551:sndacc=CHI_RT:dstaddr=1234567890:orgaddr=1010100001:intmrf=0CD90A2E420D454032D51D5:extmrf= ABB0nyJk:msgstat=[0] Delivered:msgoper=SUBMIT:msgtype=NORMAL:usrinfo=:msglen=100:msgtext=FRM\3aJodi<0a>SUBJ\3alater<0a>MSG\3a TEXT OF MESSAGE GOES HERE - www .maybe-even-a-url .com –

      So what I noticed is that the common delimiter is “:”, so I figured I should learn how to use split and take advantage of that. After doing so, I wanted to see what it looked like if I searched for keywords in the message within the log, which I pumped into an array. This is where you will see if (/[Ss]hare/, /[Tt]ime/) {, whose keywords I eventually want to learn how to pass as command-line arguments. After searching for keywords, the following output was given:

      rcvtime=2006102611380086416-sndtime=2006102611380004316-msgtext=FRM\3aDang<0a>SUBJ\3alater<0a>MSG\3aMessage I looked for goes here - www.maybeevenaURL .com –
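The split-on-colon idea described above can be taken one step further by storing each name=value field in a hash, so individual fields can be picked out by name. The following is only a minimal sketch, assuming the record layout shown earlier in this post; the sub name parse_record and the shortened sample record are made up for illustration. Because the log escapes colons inside msgtext as \3a, splitting on ":" does not cut the message text in this sample.

```perl
use strict;
use warnings;

# Hypothetical helper: split one log record on ':' and collect the
# name=value fields into a hash. Fields without '=' are skipped.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my %field;
    for my $part (split /:/, $line) {
        # Split on the first '=' only, so values may contain '='.
        my ($name, $value) = split /=/, $part, 2;
        next unless defined $value;
        $field{$name} = $value;
    }
    return \%field;
}

# Shortened sample record (made up, modelled on the log line above).
my $record = 'Note;MMG_MDR:ref=0:rcvtime=2006102600002184816-'
           . ':sndnet=:msgtext=FRM\3aJodi<0a>SUBJ\3alater<0a>MSG\3aHello';

my $f = parse_record($record);
print "$_ => $f->{$_}\n" for sort keys %$f;
```

With the fields in a hash, reporting on a single field such as msgtext no longer needs a chain of substitutions to strip the others.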

      So that’s cool: I learned how cool split is and how to semi-work with it. Now I wanted to chop out the junk, so to speak. This is when I started to slowly carve it out with the following, before I got bored of doing it and realized I should just learn to do it correctly. Here are the examples of what I tried, which are commented out in the script:

      # s/rcvtime=[0-9]+//;
      # s/sndtime=[0-9]+//;
      # s/msgtext//;
      # s/FRM//;
      # s/3a[a-z,[A-Z]+//;

      Here is where I am at right now. All I am trying to do so far is learn to script with Perl. Suggestions are most welcome.

      #!/usr/bin/perl -w
      use strict;

      open(SPAMMESSEGES, "spammessages");
      while (<SPAMMESSEGES>) {
          my $spam = <SPAMMESSEGES>;
          my @spammer = split (/:/);
          foreach (@spammer) {
              if (/[Ss]hare/, /[Tt]ime/) {
                  # s/rcvtime=[0-9]+//;
                  # s/sndtime=[0-9]+//;
                  # s/msgtext//;
                  # s/FRM//;
                  # s/3a[a-z,[A-Z]+//;
                  s/rcvtime(.*)\SBJ>//;
                  print $_;
              }
          }
      }
        Your problem appears to be that
        while (<SPAMMESSEGES>) {
        assigns one line to $_, and then
        my $spam = <SPAMMESSEGES>;
        assigns the next line to $spam, so you are only processing every second line.
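One way to avoid the double read is to let the while condition be the only place that reads from the filehandle. The sketch below assumes that approach; the in-memory sample data and the variable names are made up for illustration and stand in for the real "spammessages" file. It also joins the two keyword tests with ||, since with the comma operator in a condition only the last pattern decides the result.

```perl
use strict;
use warnings;

# Stand-in for the log file: an in-memory filehandle with three records
# (sample data made up for illustration).
my $log = "dstaddr=1:msgtext=FRM\\3aA<0a>share this\n"
        . "dstaddr=2:msgtext=FRM\\3aB<0a>nothing here\n"
        . "dstaddr=3:msgtext=FRM\\3aC<0a>what time\n";
open(my $fh, '<', \$log) or die "Cannot open in-memory file: $!";

my @hits;
while (my $line = <$fh>) {      # the only read per iteration
    chomp $line;
    # '||' accepts the line if either keyword is present.
    next unless $line =~ /[Ss]hare/ || $line =~ /[Tt]ime/;
    push @hits, $line;
}
close $fh;

print "$_\n" for @hits;
```

Note that a pattern such as /[Tt]ime/ also matches field names like rcvtime and sndtime, so on real records the keyword test probably wants to look at the msgtext field only.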

Re: regex help please
by ikegami (Pope) on Oct 27, 2006 at 20:24 UTC
    From "not the second", I presume both lines are present in the variable being searched. (Previous posts assumed you were reading a line at a time.)

    The following two snippets will stop matching at the first <0a>, not the second.
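As a sketch of two common ways to stop at the first <0a> (not necessarily the exact snippets referred to above), one can use a non-greedy quantifier or a negative lookahead that refuses to step over <0a>. The sample string is modelled on the output shown earlier in the thread, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Sample record modelled on the output shown earlier in the thread.
my $text = 'rcvtime=2006102611380086416-sndtime=2006102611380004316-'
         . 'msgtext=FRM\3aDang<0a>SUBJ\3alater<0a>MSG\3aHello';

# 1. Non-greedy quantifier: .*? takes as few characters as possible,
#    so the match ends at the first <0a>.
(my $lazy = $text) =~ s/rcvtime.*?<0a>//;

# 2. Negative lookahead: each character is consumed only if <0a>
#    does not start at that position.
(my $tempered = $text) =~ s/rcvtime(?:(?!<0a>).)*<0a>//;

# For contrast, greedy .* runs to the last <0a> on the line.
(my $greedy = $text) =~ s/rcvtime.*<0a>//;

print "lazy:     $lazy\n";
print "tempered: $tempered\n";
print "greedy:   $greedy\n";
```

Both targeted forms leave everything after the first <0a> in place, while the greedy form also removes the SUBJ part.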

Re: regex help please
by Anonymous Monk on Oct 27, 2006 at 18:38 UTC
    $_ =~ s/rcvtime(.*)\<0a\>//;

      -- for useless use of capturing parens, and also for backslashing < and >

      I'm also thinking there's something the OP left out of the specification, because of the stipulation about "must match the first but not the second", so your solution is likely incomplete.

      s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)
        Good call...this might do the trick (assuming Roxy is static)

        while(<DATA>) {
            chomp;
            $_ =~ s/rcvtime.*<0a>// unless /Roxy/;
            print "$_\n";
        }
        __DATA__
        rcvtime=2006102600322813316-sndtime=2006102600323042116-msgtext=FRM\3aMatd<0a>
        rcvtime=2006102611373625516-sndtime=2006102611373640716-msgtext=FRM\3aRoxy<0a>

Approved by Corion