PerlMonks  

path-names [a very easy question of a true beginner]

by Perlbeginner1 (Beadle)
on Oct 01, 2010 at 23:05 UTC (#863020=perlquestion)
Perlbeginner1 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, hello community,

I am new to Linux and new to Perl, too. I am trying to get this Perl script up and running. I have installed openSUSE 11.3.

What I want: I have a bunch of HTML files stored in a folder, and with the Perl script below I want to parse those HTML files.

I have stored the script in the following place:

Basisordner (German for "base folder") > user > perl >
My question is how to name the paths ...

a. to the folder that contains the HTML files that need to be parsed (I named this folder html.files)
b. to the file that has to be created...

I suggest that this file is also located in the same directory: Basisordner (German for "base folder") > user > perl >
I guess that this makes it easy...
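One common way to answer both path questions is to build the paths from the folder the script itself lives in. The sketch below assumes the HTML folder html.files sits next to the script and uses a hypothetical output file name, schools.txt; FindBin and File::Spec are core Perl modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use FindBin qw($Bin);    # $Bin: absolute path of the folder holding this script
use File::Spec;

# Assumed layout: html.files/ and the output file sit next to the script,
# e.g. /usr/perl/html.files and /usr/perl/schools.txt.
# 'schools.txt' is a hypothetical name; pick any name you like.
my $html_dir = File::Spec->catdir($Bin, 'html.files');
my $out_file = File::Spec->catfile($Bin, 'schools.txt');

print "reading HTML files from: $html_dir\n";
print "writing results to:      $out_file\n";
```

Because both paths depend only on where the script is stored, this works no matter which directory the script is started from.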

Please bear with me for the noob questions. If I have to explain more, please let me know!

I'd love to hear from you. Many thanks in advance for any and all help.

perlbeginner1

Here is the code:

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use HTML::TokeParser;
use File::Find::Rule;

# Folder with the HTML files to parse; adjust this to where they really are.
my $html_dir = '/usr/perl/html.files';

# Match the files by their extension, not by the folder name.
my @html_files = File::Find::Rule->file->name('*.html')->in($html_dir);

foreach my $file (@html_files) {
    my $p = HTML::TokeParser->new($file) or die "Can't open $file: $!";
    my %school;

    while (my $tag = $p->get_tag('div', '/html')) {
        # first move to the right div that contains the information
        last if $tag->[0] eq '/html';
        next unless exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'inhalt_large';

        $p->get_tag('h1');
        $school{'location'} = $p->get_text('/h1');

        while (my $tag = $p->get_tag('div')) {
            last if exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'fusszeile';

            # get the school name from the heading
            next unless exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'fm_linkeSpalte';
            $p->get_tag('h2');
            $school{'name'} = $p->get_text('/h2');

            # verify format for school type
            $tag = $p->get_tag('span');
            unless (exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'schulart_text') {
                warn "unexpected format in $file: parsing stopped";
                last;
            }
            $school{'type'} = $p->get_text('/span');

            # verify format for address
            $tag = $p->get_tag('p');
            unless (exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'einzel_text') {
                warn "unexpected format in $file: parsing stopped";
                last;
            }
            $school{'address'} = clean_address($p->get_text('/p'));

            # find the description
            $tag = $p->get_tag('p');
            $school{'description'} = $p->get_text('/p');
        }
    }

    # skip files that did not contain a school entry
    next unless exists $school{'name'};

    print "$school{'name'}\n";
    print "$school{'location'}\n";
    print "$school{'type'}\n";
    print "$_\n" foreach @{ $school{'address'} || [] };
    print "\nDescription: $school{'description'}\n\n";
}

# Split the address block into lines and trim whitespace from each line.
# Returns an array reference so it can be stored in the %school hash.
sub clean_address {
    my $text = shift;
    my @lines = split /\n/, $text;
    foreach (@lines) {
        s/^\s+//;
        s/\s+$//;
    }
    return \@lines;
}
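The script above prints its results to the screen. For question b, here is a minimal sketch of writing one line per school to an output file instead; write_school and schools.txt are hypothetical names, and the hash keys are assumed to match the parser, with address holding an array reference.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write one school as a single tab-separated line.
# Keys are assumed to match the parser (name, location, type, address,
# description), with 'address' holding an array reference.
sub write_school {
    my ($fh, $school) = @_;
    my @fields = (
        $school->{name}        // '',
        $school->{location}    // '',
        $school->{type}        // '',
        join(', ', @{ $school->{address} || [] }),
        $school->{description} // '',
    );
    s/[\t\n]+/ /g for @fields;    # tabs/newlines would break the one-line format
    print {$fh} join("\t", @fields), "\n";
}

# '>' creates the file (or empties an existing one); schools.txt is a
# hypothetical name, relative to the directory the script is started from.
my $out_file = 'schools.txt';
open my $out, '>', $out_file or die "Can't write $out_file: $!";
# Inside the foreach loop, after %school has been filled:
#     write_school($out, \%school);
close $out or die "Can't close $out_file: $!";
```

Opening the file once before the loop and closing it after the loop keeps all schools in a single file.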

Re: path-names [a very easy question of a true beginner]
by Khen1950fx (Canon) on Oct 02, 2010 at 03:56 UTC
    As regards File::Find::Rule, since you are dealing with a bunch of files, you'll get better results if you do a foreach. For example,
    #!/usr/bin/perl
    use strict;
    use warnings;
    use diagnostics;
    use File::Find::Rule;

    my @files = File::Find::Rule->file()
                                ->name('*.html')
                                ->in('/usr/local/apache/cgi-bin');
    foreach my $file (@files) {
        print $file, "\n";
    }
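File::Find::Rule comes from CPAN, so it may have to be installed first. If it is not available, the core module File::Find can do a similar recursive search. This is a rough sketch, not an exact equivalent; find_html_files is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;    # core module, no CPAN install needed

# Collect every regular file ending in .html below $dir, similar to
# File::Find::Rule->file->name('*.html')->in($dir).
sub find_html_files {
    my ($dir) = @_;
    my @found;
    # find() changes into each directory, so $_ is the bare file name
    # and $File::Find::name is the full path.
    find(sub { push @found, $File::Find::name if -f $_ && /\.html\z/ }, $dir);
    return sort @found;
}
```

Usage mirrors the example above: foreach my $file (find_html_files('/usr/local/apache/cgi-bin')) { print $file, "\n"; }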
      Hello Khen1950fx,

      many thanks for the reply; great to hear from you!

      My question is about the I/O handle and the path names. I have to find the right path names, ones that match the Linux conventions. My machine runs openSUSE Linux 11.3.

      I took your example and your hints and made some slight changes:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use diagnostics;
      use File::Find::Rule;

      my @files = File::Find::Rule->file()
                                  ->name('*.html')
                                  ->in('home/usr/perl/html.files');
      foreach my $file (@files) {
          print $file, "\n";
      }



      response:

      suse-linux:/usr/perl # perl perl_script_two.pl
      Can't stat home/usr/html.files: No such file or directory at /usr/lib/perl5/site_perl/5.12.1/File/Find/Rule.pm line 594
      suse-linux:/usr/perl #
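The "Can't stat" message is raised from inside File::Find::Rule because the folder does not exist at the path given. A small sketch of checking the folder yourself first, so the error names both the path and the current directory; check_dir is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Fail early with a clear message if the folder cannot be found,
# instead of getting "Can't stat ..." from deep inside a module.
sub check_dir {
    my ($dir) = @_;
    return 1 if -d $dir;
    die sprintf "Directory '%s' not found (current directory is %s)\n",
        $dir, getcwd();
}
```

Call check_dir($dir) right before File::Find::Rule->file()->name('*.html')->in($dir); for a relative path, the reported current directory shows where Perl actually looked.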


      I'd love to hear from you, and I appreciate any and all help!
      perlbeginner1
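        You're using a relative path: without the leading "/", 'home/usr/perl/html.files' is looked up relative to the current working directory. I tried it and got the same result. Use the absolute path, and it works :).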
        #!/usr/bin/perl
        use strict;
        use warnings;
        use diagnostics;
        use File::Find::Rule;

        my @files = File::Find::Rule->file()
                                    ->name('*.html')
                                    ->in('/home/usr/perl/html.files');
        foreach my $file (@files) {
            print $file, "\n";
        }
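To see where a relative path actually points, File::Spec->rel2abs resolves it against a base directory. This sketch assumes the script is started from /usr/perl, as in the shell prompt above; resolve is a hypothetical wrapper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Resolve $path the way the OS does: a path without a leading "/" is
# taken relative to $cwd (the current working directory).
sub resolve {
    my ($path, $cwd) = @_;
    return File::Spec->rel2abs($path, $cwd);
}

# Assumed working directory, taken from the prompt "suse-linux:/usr/perl #".
for my $path ('home/usr/perl/html.files', '/home/usr/perl/html.files') {
    my $kind = File::Spec->file_name_is_absolute($path) ? 'absolute' : 'relative';
    print "$path ($kind) -> ", resolve($path, '/usr/perl'), "\n";
}
```

Started from /usr/perl, the relative version points at /usr/perl/home/usr/perl/html.files, which is why the search could not find the folder.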
        Hello, now it is clear: I misunderstood the German word Basisordner.

        I thought that Basisordner in openSUSE Linux was exactly the home directory.

        That is not true: Basisordner is not "/home" but "/".

        Accordingly, I leave /home out of the script ;)

        Then we have:

        suse-linux:/usr/perl # ls -al /usr/perl/htmlfiles/

        Result:

        -rwxrwxrwx 1 root root 16855 Sep 22 02:37 einzelergebnisedf8.html
        -rwxrwxrwx 1 root root 16893 Sep 22 04:27 einzelergebnisedfe.html
        -rwxrwxrwx 1 root root 17035 Sep 22 02:55 einzelergebnisee02.html
        -rwxrwxrwx 1 root root 16926 Sep 22 03:38 einzelergebnisee05-2.html
        -rwxrwxrwx 1 root root 17042 Sep 22 01:03 einzelergebnisee05.html
        -rwxrwxrwx 1 root root 16986 Sep 22 03:10 einzelergebnisee06.html
        -rwxrwxrwx 1 root root 17784 Sep 22 03:43 einzelergebnisee08-2.html
        -rwxrwxrwx 1 root root 17016 Sep 21 23:55 einzelergebnisee08.html
        -rwxrwxrwx 1 root root 17456 Sep 22 00:08 einzelergebnisee0c.html
        -rwxrwxrwx 1 root root 17176 Sep 22 03:36 einzelergebnisee15.html
        -rwxrwxrwx 1 root root 17568 Sep 22 03:45 einzelergebnisee16.html
        -rwxrwxrwx 1 root root 17216 Sep 21 23:56 einzelergebnisee18.html
        -rwxrwxrwx 1 root root 17011 Sep 22 04:21 einzelergebnisee1b.html
        -rwxrwxrwx 1 root root 16898 Sep 22 01:02 einzelergebnisee24.html
        -rwxrwxrwx 1 root root 16992 Sep 22 04:32 einzelergebnisee29.html
        -rwxrwxrwx 1 root root 16898 Sep 22 04:13 einzelergebnisee2d.html
        -rwxrwxrwx 1 root root 17051 Sep 22 03:14 einzelergebnisee31.html
        -rwxrwxrwx 1 root root 16922 Sep 22 04:22 einzelergebnisee35.html
        -rwxrwxrwx 1 root root 17104 Sep 22 00:42 einzelergebnisee3d.html
        -rwxrwxrwx 1 root root 17113 Sep 22 03:03 einzelergebnisee3e.html
        -rwxrwxrwx 1 root root 16961 Sep 22 04:29 einzelergebnisee3f.html
        -rwxrwxrwx 1 root root 17040 Sep 22 03:40 einzelergebnisee45.html
        -rwxrwxrwx 1 root root 17027 Sep 22 00:03 einzelergebnisee4c.html
        -rwxrwxrwx 1 root root 16850 Sep 22 02:56 einzelergebnisee4f-2.html
        -rwxrwxrwx 1 root root 17053 Sep 22 03:55 einzelergebnisee4f-3.html
        -rwxrwxrwx 1 root root 17159 Sep 22 00:56 einzelergebnisee4f.html
        -rwxrwxrwx 1 root root 19650 Sep 21 23:49 einzelergebnisee55.html


        and so forth... more than 20,000 lines.
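With more than 20,000 files, a count is easier to check than a full listing. A small sketch that counts the .html files directly inside one folder; count_html is a hypothetical helper, and it does not descend into subfolders.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count regular files ending in .html directly inside $dir
# (subfolders are not searched).
sub count_html {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Can't open $dir: $!";
    my $count = grep { /\.html\z/ && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return $count;
}

# Folder from the command line, defaulting to the current directory.
print count_html($ARGV[0] // '.'), " HTML files\n";
```

For example, pass /usr/perl/htmlfiles as the first argument when running the script.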

        suse-linux:/usr/perl # cd usr/QUOTE


        Now we are a step ahead. That is great!

        perlbeginner1
Re: path-names [a very easy question of a true beginner]
by tinita (Parson) on Oct 03, 2010 at 13:04 UTC

Node Type: perlquestion [id://863020]
Approved by ww