PerlMonks  

path-names [a very easy question of a true beginner]

by Perlbeginner1 (Scribe)
on Oct 01, 2010 at 23:05 UTC (#863020, perlquestion)
Perlbeginner1 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, hello community,

I am new to Linux and new to Perl too. I am trying to get this Perl script up and running. I have installed openSUSE 11.3.

What is wanted: I have a bunch of HTML files stored in a folder. With the Perl script (see below) I want to parse the HTML files.

I have stored the script in the following place:

Basisordner (German word for base folder) > user > perl >
My question is how to name the paths ...

a. to the HTML folder that contains the HTML files that need to be parsed (I named this folder html.files)
b. to the output file that has to be created.

I suggest that this file is also located in the same directory: Basisordner (German word for base folder) > user > perl >
I guess that makes it easy...

Please bear with me for the noob questions. If I have to explain more, please let me know!

Love to hear from you. Many thanks in advance for any and all help.

perlbeginner1

See the code here:
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use HTML::TokeParser;

# my $file = 'school.html';
my @html_files = File::Find::Rule->file->name( '*.html.files' )->in( $html_dir );

my $p = HTML::TokeParser->new($file) or die "Can't open: $!";

my %school;
while (my $tag = $p->get_tag('div', '/html')) {
    # first move to the right div that contains the information
    last if $tag->[0] eq '/html';
    next unless exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'inhalt_large';
    $p->get_tag('h1');
    $school{'location'} = $p->get_text('/h1');
    while (my $tag = $p->get_tag('div')) {
        last if exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'fusszeile';
        # get the school name from the heading
        next unless exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'fm_linkeSpalte';
        $p->get_tag('h2');
        $school{'name'} = $p->get_text('/h2');
        # verify format for school type
        $tag = $p->get_tag('span');
        unless (exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'schulart_text') {
            warn "unexpected format: parsing stopped";
            last;
        }
        $school{'type'} = $p->get_text('/span');
        # verify format for address
        $tag = $p->get_tag('p');
        unless (exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'einzel_text') {
            warn "unexpected format: parsing stopped";
            last;
        }
        $school{'address'} = clean_address($p->get_text('/p'));
        # find the description
        $tag = $p->get_tag('p');
        $school{'description'} = $p->get_text('/p');
    }
}

print qq/$school{'name'}\n/;
print qq/$school{'location'}\n/;
print qq/$school{'type'}\n/;
foreach (@{$school{'address'}}) {
    print "$_\n";
}
print qq/\nDescription: $school{'description'}\n/;

# strip leading and trailing whitespace from each address line
sub clean_address {
    my $text = shift;
    my @lines = split "\n", $text;
    foreach (@lines) {
        s/^\s+//;
        s/\s+$//;
    }
    return \@lines;   # array reference, dereferenced in the print loop above
}

Re: path-names [a very easy question of a true beginner]
by Khen1950fx (Canon) on Oct 02, 2010 at 03:56 UTC
    As regards File::Find::Rule, since you are dealing with a bunch of files, you'll get better results if you do a foreach. For example,
    #!/usr/bin/perl
    use strict;
    use warnings;
    use diagnostics;
    use File::Find::Rule;

    my @files = File::Find::Rule->file()
                                ->name('*.html')
                                ->in( '/usr/local/apache/cgi-bin' );

    foreach my $file (@files) {
        print $file, "\n";
    }
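The foreach idea can be combined with the parser from the original question: find the files first, then open one HTML::TokeParser per file inside the loop. The following is only a minimal sketch. The helper names `first_heading` and `headings_in` and the default directory are assumptions for illustration, and it only pulls the first `<h1>` rather than the full school record.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find::Rule;
use HTML::TokeParser;

# Return the text of the first <h1> in one HTML file, or undef if there is none.
# (hypothetical helper, not part of the thread)
sub first_heading {
    my ($path) = @_;
    my $p = HTML::TokeParser->new($path)
        or die "Can't open $path: $!";
    return undef unless $p->get_tag('h1');
    return $p->get_trimmed_text('/h1');
}

# Find every *.html file below $html_dir and map each path to its heading.
# (hypothetical helper, not part of the thread)
sub headings_in {
    my ($html_dir) = @_;
    my @files = File::Find::Rule->file()->name('*.html')->in($html_dir);
    my %heading;
    foreach my $file (@files) {
        $heading{$file} = first_heading($file);
    }
    return \%heading;
}

# Assumed default directory; pass the real absolute path as the first argument.
my $html_dir = $ARGV[0] // '/usr/perl/htmlfiles';
if (-d $html_dir) {
    my $found = headings_in($html_dir);
    for my $file (sort keys %$found) {
        printf "%s: %s\n", $file, $found->{$file} // '(no <h1> found)';
    }
}
else {
    warn "$html_dir is not a directory, nothing to parse\n";
}
```

Checking the directory with `-d` before calling `in()` avoids the "Can't stat" warning when the path is wrong.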
      Hello khen1950fx,

      many thanks for the reply, great to hear from you!

      My question is about the path names. I have to find the right path names, ones that match the Linux conventions. My machine runs openSUSE Linux version 11.3.

      I took your example, made some slight changes, and came up with this:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use diagnostics;
      use File::Find::Rule;

      my @files = File::Find::Rule->file()
                                  ->name('*.html')
                                  ->in( 'home/usr/perl/html.files' );

      foreach my $file (@files) {
          print $file, "\n";
      }



      response:

      suse-linux:/usr/perl # perl perl_script_two.pl
      Can't stat home/usr/html.files: No such file or directory at /usr/lib/perl5/site_perl/5.12.1/File/Find/Rule.pm line 594
      suse-linux:/usr/perl #


      love to hear from you and appreciate any and all help!
      perlbeginner1
        You're using a relative path (no leading slash), so it is looked up starting from the current directory. I tried it and got the same error. Use the absolute path, and it works :).
        #!/usr/bin/perl
        use strict;
        use warnings;
        use diagnostics;
        use File::Find::Rule;

        my @files = File::Find::Rule->file()
                                    ->name('*.html')
                                    ->in( '/home/usr/perl/html.files' );

        foreach my $file (@files) {
            print $file, "\n";
        }
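To see why the missing leading slash matters, it can help to print what each spelling of the path resolves to. This is a small sketch using only core modules. The helper name `resolve_from` is an assumption, and `/usr/perl` is taken as the starting directory because that is where the shell prompt in the thread sits.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Turn $path into an absolute path as seen from directory $base.
# An already absolute $path is returned unchanged (apart from cleanup).
# (hypothetical helper, not part of the thread)
sub resolve_from {
    my ($path, $base) = @_;
    return File::Spec->rel2abs($path, $base);
}

my $base = '/usr/perl';   # assumed directory the script is started from

for my $path ('home/usr/perl/html.files', '/home/usr/perl/html.files') {
    my $kind = File::Spec->file_name_is_absolute($path) ? 'absolute' : 'relative';
    my $full = resolve_from($path, $base);
    my $state = (-d $full) ? 'exists' : 'missing';
    print "$path ($kind) resolves to $full [$state]\n";
}
```

On a Unix-like system the relative spelling resolves to /usr/perl/home/usr/perl/html.files, a directory that is unlikely to exist, while the absolute spelling is used as written.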
        Hello, now it is clear: I misunderstood the German word Basisordner.

        I thought the Basisordner in openSUSE Linux was exactly the home directory.

        That is not true: the Basisordner is not "/home" but "/".

        Accordingly I leave /home out of the script ;)

        Then we have:

        suse-linux:/usr/perl # ls -al /usr/perl/htmlfiles/


        results:

        -rwxrwxrwx 1 root root 16855 Sep 22 02:37 einzelergebnisedf8.html
        -rwxrwxrwx 1 root root 16893 Sep 22 04:27 einzelergebnisedfe.html
        -rwxrwxrwx 1 root root 17035 Sep 22 02:55 einzelergebnisee02.html
        -rwxrwxrwx 1 root root 16926 Sep 22 03:38 einzelergebnisee05-2.html
        -rwxrwxrwx 1 root root 17042 Sep 22 01:03 einzelergebnisee05.html
        -rwxrwxrwx 1 root root 16986 Sep 22 03:10 einzelergebnisee06.html
        -rwxrwxrwx 1 root root 17784 Sep 22 03:43 einzelergebnisee08-2.html
        -rwxrwxrwx 1 root root 17016 Sep 21 23:55 einzelergebnisee08.html
        -rwxrwxrwx 1 root root 17456 Sep 22 00:08 einzelergebnisee0c.html
        -rwxrwxrwx 1 root root 17176 Sep 22 03:36 einzelergebnisee15.html
        -rwxrwxrwx 1 root root 17568 Sep 22 03:45 einzelergebnisee16.html
        -rwxrwxrwx 1 root root 17216 Sep 21 23:56 einzelergebnisee18.html
        -rwxrwxrwx 1 root root 17011 Sep 22 04:21 einzelergebnisee1b.html
        -rwxrwxrwx 1 root root 16898 Sep 22 01:02 einzelergebnisee24.html
        -rwxrwxrwx 1 root root 16992 Sep 22 04:32 einzelergebnisee29.html
        -rwxrwxrwx 1 root root 16898 Sep 22 04:13 einzelergebnisee2d.html
        -rwxrwxrwx 1 root root 17051 Sep 22 03:14 einzelergebnisee31.html
        -rwxrwxrwx 1 root root 16922 Sep 22 04:22 einzelergebnisee35.html
        -rwxrwxrwx 1 root root 17104 Sep 22 00:42 einzelergebnisee3d.html
        -rwxrwxrwx 1 root root 17113 Sep 22 03:03 einzelergebnisee3e.html
        -rwxrwxrwx 1 root root 16961 Sep 22 04:29 einzelergebnisee3f.html
        -rwxrwxrwx 1 root root 17040 Sep 22 03:40 einzelergebnisee45.html
        -rwxrwxrwx 1 root root 17027 Sep 22 00:03 einzelergebnisee4c.html
        -rwxrwxrwx 1 root root 16850 Sep 22 02:56 einzelergebnisee4f-2.html
        -rwxrwxrwx 1 root root 17053 Sep 22 03:55 einzelergebnisee4f-3.html
        -rwxrwxrwx 1 root root 17159 Sep 22 00:56 einzelergebnisee4f.html
        -rwxrwxrwx 1 root root 19650 Sep 21 23:49 einzelergebnisee55.html


        and so forth ... more than 20,000 lines.

        suse-linux:/usr/perl # cd usr/QUOTE


        now we are a step ahead. That is great!

        perlbeginner1
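Part (b) of the original question, how to name the file that has to be created, is not answered in the replies. One possible approach, sketched here under assumptions, is to build the output path from the directory the script itself lives in, so it does not depend on where the script is started. The helper names `beside_script` and `write_lines` and the file name `schools.txt` are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use FindBin qw($Bin);
use File::Spec;

# Build an absolute path for a file that sits next to this script.
# (hypothetical helper, not part of the thread)
sub beside_script {
    my ($name) = @_;
    return File::Spec->catfile($Bin, $name);
}

# Write the given lines to $path, one per line, and return how many were written.
# (hypothetical helper, not part of the thread)
sub write_lines {
    my ($path, @lines) = @_;
    open my $out, '>', $path or die "Can't write $path: $!";
    print {$out} "$_\n" for @lines;
    close $out or die "Can't close $path: $!";
    return scalar @lines;
}

# 'schools.txt' is an assumed output name, not one used in the thread.
my $out_path = beside_script('schools.txt');
print "results would be written to $out_path\n";
```

Calling `write_lines($out_path, @results)` would then create the file next to the script. Opening with '>' overwrites an existing file, so use '>>' instead to append.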
Re: path-names [a very easy question of a true beginner]
by tinita (Parson) on Oct 03, 2010 at 13:04 UTC
