
Hello all,

many thanks to you! I did as you advised, and now it works. I changed from

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use File::Find::Rule;
my @files = File::Find::Rule->file()
                 ->name('einzelergebnis*.html')
                 ->in( '/home/usr/perl/htmlfiles' );
foreach my $file(@files) {
        print $file, "\n";

}

to this


#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use File::Find::Rule;
my @files = File::Find::Rule->file()
                 ->name('einzelergebnis*.html')
                 ->in( '.' );
foreach my $file(@files) {
        print $file, "\n";

}

and then I got the following output:

htmlfiles/einzelergebnis80b5.html
htmlfiles/einzelergebnisa0ef.html
htmlfiles/einzelergebnis1b42.html
htmlfiles/einzelergebnis5960.html
htmlfiles/einzelergebnise523.html
htmlfiles/einzelergebnis2c7e.html
htmlfiles/einzelergebnisdf57.html
htmlfiles/einzelergebnis2b53-2.html
htmlfiles/einzelergebnisb1c0-2.html
htmlfiles/einzelergebnis8e8b.html
htmlfiles/einzelergebnisdcc1.html
htmlfiles/einzelergebnis1dae-2.html
htmlfiles/einzelergebnisa70d.html
htmlfiles/einzelergebnis3cec.html
htmlfiles/einzelergebnis3f1f.html
htmlfiles/einzelergebnis1d2b.html
htmlfiles/einzelergebnis396c.html
htmlfiles/einzelergebnis2592.html
htmlfiles/einzelergebnisdee0.html
htmlfiles/einzelergebnis987b-2.html
htmlfiles/einzelergebnise20b.html

...and 22 thousand more lines... ;-)
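
As far as I can tell, the leading htmlfiles/ appears because the search now starts at '.', so the paths come back relative to the current directory. If a later script needs absolute paths, the core module File::Spec can convert them. A minimal sketch: the sample path is one line of the output above, and the base directory /home/usr/perl is only my assumption, taken from the first version of the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Example values (assumptions): one relative path from the output above,
# and the directory the search was presumably started from.
my $relative = 'htmlfiles/einzelergebnis80b5.html';
my $base     = '/home/usr/perl';

# rel2abs() resolves the relative path against $base; without the
# second argument it would use the current working directory instead.
my $absolute = File::Spec->rel2abs($relative, $base);

print "$absolute\n";
```

Passing the base directory explicitly keeps the result independent of where the script happens to be started.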

This seems to be the starting point! Now I can continue figuring out how to configure Keath's script; see this link to another thread, which contains the little script (http://forums.devshed.com/showpost.php?p=2538358&postcount=12). As the previous thread is very long, I think it is worth beginning a new one. Note: many, many thanks to Keath and Axldrweil for their great and generous help!

So, having nailed down the I/O handle issues and the path names in general, the parser script now has to be configured.

Well, this means I have to set $file to the file or directory, including its path, and also define a path in $html_dir.
BTW: what does the array @html_files do?

Here is the full code of the HTML parser:


#!/usr/bin/perl
use strict;
use warnings;

use HTML::TokeParser;

my $file = 'school.html';
my $p = HTML::TokeParser->new($file) or die "Can't open: $!";

my %school;
while (my $tag = $p->get_tag('div', '/html')) {
    # first move to the right div that contains the information
    last if $tag->[0] eq '/html';
    next unless exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'inhalt_large';
    
    $p->get_tag('h1');
    $school{'location'} = $p->get_text('/h1');
    
    while (my $tag = $p->get_tag('div')) {
        last if exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'fusszeile';
        
        # get the school name from the heading
        next unless exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'fm_linkeSpalte';
        $p->get_tag('h2');
        $school{'name'} = $p->get_text('/h2');
        
        # verify format for school type
        $tag = $p->get_tag('span');
        unless (exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'schulart_text') {
            warn "unexpected format: parsing stopped";
            last;
        }
        $school{'type'} = $p->get_text('/span');
        
        # verify format for address
        $tag = $p->get_tag('p');
        unless (exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'einzel_text') {
            warn "unexpected format: parsing stopped";
            last;
        }
        $school{'address'} = clean_address($p->get_text('/p'));
        
        # find the description
        $tag = $p->get_tag('p');
        $school{'description'} = $p->get_text('/p');
    }
}

print qq/$school{'name'}\n/;
print qq/$school{'location'}\n/;
print qq/$school{'type'}\n/;

foreach (@{$school{'address'}}) {
    print "$_\n";
}

print qq/\nDescription: $school{'description'}\n/;

sub clean_address {
    my $text = shift;
    my @lines = split "\n", $text;
    foreach (@lines) {
        s/^\s+//;
        s/\s+$//;
    }
    return \@lines;
}
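
To run the parser over all the files found earlier instead of the single school.html, something like the following might work. It is only a sketch under assumptions: first_heading is a hypothetical helper that is not part of Keath's script, it extracts only the h1 text inside the inhalt_large div rather than the full record, and it assumes the script is started from the directory that contains htmlfiles/.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find::Rule;
use HTML::TokeParser;

# Collect the files exactly as in the search script above.
my @files = File::Find::Rule->file()
                 ->name('einzelergebnis*.html')
                 ->in('.');

foreach my $file (@files) {
    my $location = first_heading($file);
    if (defined $location) {
        print "$file: $location\n";
    }
    else {
        warn "no heading found in $file\n";
    }
}

# Hypothetical helper: returns the text of the first <h1> that follows
# <div id="inhalt_large">, or undef if the file cannot be opened or
# does not contain that structure.
sub first_heading {
    my ($file) = @_;

    my $p = HTML::TokeParser->new($file);
    unless ($p) {
        warn "Can't open $file: $!\n";
        return;
    }

    while (my $tag = $p->get_tag('div')) {
        next unless exists $tag->[1]{'id'}
            and $tag->[1]{'id'} eq 'inhalt_large';
        $p->get_tag('h1') or return;
        return $p->get_trimmed_text('/h1');
    }
    return;
}
```

Warning and moving on, rather than calling die, means one broken file does not stop the run over the remaining thousands.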

Note: I can provide much more information on what the script does!

I look forward to any and all help! This is a great place to share knowledge! Many, many thanks for this great place!
perlbeginner1

In reply to Re^7: path-names [a very easy question of a true beginner] by Perlbeginner1
in thread path-names [a very easy question of a true beginner] by Perlbeginner1
