PerlMonks  

(Dermot) Re: Stripping page headers

by Dermot (Scribe)
on Sep 27, 2000 at 21:26 UTC (#34244)


in reply to Stripping page headers

As far as I can see, there are two main approaches to this problem. The first is to strip out the text you don't want, leaving the text you do want. The second is to ignore the text you don't want and use a regex to match the parts you do want (the records). I would be inclined to strip out the header using something along the lines of:
    #!/usr/bin/perl -w
    use strict;

    my $repfile = "sample.rep";
    my $report;

    undef $/;               # Allows whole file to be slurped
    open REPFILE, $repfile or die "Can't open file $repfile: $!\n";
    $report = <REPFILE>;    # All file now in report variable
                            # Only do this for reasonably sized
                            # report files or buy some memory :)
    close REPFILE;

    $report =~ s/^User Report//g;
    $report =~ s/^Other Header Stuff//g;

    print $report;
The second approach, building a regex to match the data you do want, is left as an exercise for the reader.
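For anyone attempting that exercise, here is a minimal sketch of the second approach. It assumes the record layout shown in the report excerpt posted further down this thread: a user line followed by a line starting with "->" and a numeric serial. The helper name `extract_records` and the hash keys are made up for illustration, and real reports may need a stricter pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: walk the report line by line and keep only the
# "user line + token line" pairs, ignoring everything else.
sub extract_records {
    my ($report) = @_;
    my (@records, $user);
    for my $line (split /\n/, $report) {
        if ($line =~ /^\s*->\s*(\d+)\s+(\S.*?)\s*$/) {
            # Token line with a numeric serial: pair it with the line before it.
            push @records, { user => $user, serial => $1, detail => $2 }
                if defined $user;
            undef $user;
        }
        elsif ($line =~ /^\s*->/) {
            # "-> Token Serial No." column header: the remembered line above
            # was a header too, so forget it.
            undef $user;
        }
        elsif ($line =~ /\S/) {
            # Any other non-blank line is a candidate user line.
            ($user = $line) =~ s/^\s+|\s+$//g;
        }
    }
    return @records;
}

# Sample data assumed from the excerpt in this thread.
my $sample = <<'END';
User Report Date: 09/26/2000 09:55:13
All Users Page: 1 of 114

User Name        Default Login Name        Default Shell Name
-> Token Serial No.         Replacement Last Login         Original Token Type
Temp 1             Temp1
-> 000050488538                 01/01/1986 00:00:00                 SoftID
END

for my $rec (extract_records($sample)) {
    print join(' | ', $rec->{user}, $rec->{serial}, $rec->{detail}), "\n";
}
```

Because page headers never contain a "->" line with a numeric serial, they simply fall through without being listed anywhere, which is the appeal of matching what you want rather than deleting what you don't.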

Replies are listed 'Best First'.
RE: Re: Stripping page headers
by Anonymous Monk on Sep 27, 2000 at 23:40 UTC
    I really like this approach, but the problem is it doesn't appear to be doing anything. The file I get out of it is identical to the original when compared. Where could the error be? (I copied it practically verbatim.) Thanks!
      If you're getting the same output as the input, it means the substitution is not happening. Post the s/// that you are using and the file you are running it on. One possible problem would be using ^User as the regex when there are spaces before the word User in the file, i.e. spaces between the start of the line, which is indicated by the caret (^) symbol, and the text. Not sure what else it could be. You could put an if around the substitution to see whether it is happening.
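One more likely cause is worth checking. Once `$/` is undefined, the whole file sits in a single string, and without the `/m` modifier `^` anchors only at the very start of that string, not at the start of each line. The sketch below combines that with the leading-space idea and with the suggestion to test whether each substitution fired. The helper name `strip_headers` is hypothetical, and removing the whole header line (rather than just the matched words) is an assumption about what is wanted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove header lines from a slurped report and warn
# about any header pattern that never matched.
sub strip_headers {
    my ($report, @headers) = @_;
    for my $header (@headers) {
        # /m       lets ^ match at the start of every line in the string
        # [ \t]*   tolerates spaces or tabs before the header text
        # \Q..\E   treats the header text literally
        # .*\n?    removes the rest of the line, including its newline
        my $count = ($report =~ s/^[ \t]*\Q$header\E.*\n?//mg);
        warn "No match for header '$header'\n" unless $count;
    }
    return $report;
}

# Sample data assumed from the report excerpt in this thread, with
# leading spaces added to exercise the whitespace handling.
my $sample = <<'END';
  User Report Date: 09/26/2000 09:55:13
  All Users Page: 1 of 114
  User Name        Default Login Name        Default Shell Name
  -> Token Serial No.         Replacement Last Login         Original Token Type
  Temp 1             Temp1
  -> 000050488538                 01/01/1986 00:00:00                 SoftID
END

print strip_headers($sample, 'User Report', 'All Users', 'User Name', '-> Token');
```

In Perl, `s///g` returns the number of substitutions made (or a false value when none were made), so the warning gives a quick way to see which pattern is not matching.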
        Let's try this again. hehe

        #!/usr/bin/perl -w
        use strict;

        my ($REPFILE, $report);

        undef $/;

        open REPFILE, "report.rpt" or die "Cant open $REPFILE: $!\n";

        $report = <REPFILE>;

        $report =~ s/^User Report//g;
        $report =~ s/^All Users//g;
        $report =~ s/^User Name//g;
        $report =~ s/^-> Token//g;

        print $report;

        close REPFILE;

        #First section of report.rpt follows

        User Report Date: 09/26/2000 09:55:13
        All Users Page: 1 of 114

        User Name        Default Login Name        Default Shell Name
        -> Token Serial No.         Replacement Last Login         Original Token Type
        Temp 1             Temp1
        -> 000050488538                 01/01/1986 00:00:00                 SoftID
        Temp 2             temp2
        -> 000050488537                 01/01/1986 00:00:00                 SoftID
        Temp 3             temp3
        -> 000050488536                 01/01/1986 00:00:00                 SoftID
        Here's what I got so far. Thanks for the assist again!

        #!/usr/bin/perl -w
        use strict;

        my ($REPFILE, $report);

        undef $/;

        open REPFILE, "report.rpt" or die "Cant open $REPFILE: $!\n";

        $report = <REPFILE>;

        $report =~ s/^User Report//g;
        $report =~ s/^All Users//g;
        $report =~ s/^User Name//g;
        $report =~ s/^-> Token//g;

        print $report;

        close REPFILE;

        #First section of report.rpt follows

        User Report Date: 09/26/2000 09:55:13
        All Users Page: 1 of 114
        User Name Default Login Name Default Shell Name
        -> Token Serial No. Replacement Last Login Original Token Type
        Temp 1 Temp1
        -> 000026546546 01/01/1986 00:00:00 SoftID
        Temp 2 temp2
        -> 000034535654 01/01/1986 00:00:00 SoftID
        Temp 3 temp3
        -> 000023465467 01/01/1986 00:00:00 SoftID
