extracting from web and adding to csv file with their links

by programmer.perl (Beadle)
on Jul 05, 2012 at 17:23 UTC (#980120=perlquestion)
programmer.perl has asked for the wisdom of the Perl Monks concerning the following question:

Finally, zentara and I have finished scripting this project; the code is pasted below. One problem remains: how can I also add links to the NYSE.csv file? The links (Chart, Profile, More) for the first ten lines must be added. At the moment the text (Chart, Profile and More) has no links, and for the first ten rows of the NYSE.csv file this text (Chart, Profile, More) must carry links (see Any help will move this project forward. Thanks.

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(GET);
use HTML::TokeParser::Simple;

my $ua = LWP::UserAgent->new;

# Define user agent type
$ua->agent('MyApp/0.1 ');

my @requests = (
    '',
    'http://fina',
    '', +'',
);

my @lin = "";
my $file = "trash";
open (FH, ">$file");
select(FH);

# loop thru them
foreach my $requested ( @requests ) {
    print "STARTING $requested ###########################\n\n\n\n\n";

    # Request object
    my $req = GET $requested;

    # Make the request
    my $res = $ua->request($req);
    my $con = $res->content;
    # print "$con\n";

    my $p = HTML::TokeParser::Simple->new( \$con );
    while ( my $token = $p->get_token ) {
        # This prints all text in an HTM +L doc (i.e., it strips the HTML)
        next unless $token->is_text;
        print $token->as_is, ";";
    }
    print "ENDING $requested ###########################\n\n\n\n\n\n";
} # + end of loop

close (FH);
select (STDOUT);

open (FG, "<$file") || die "Can't open $file for a reading: $!\n";
while (<FG>) {
    push(@lin, $2) if $_ =~ /(Volume Leaders;US;NASDAQ;AMEX;NYSE;)(Sym +bo +l;Name.*)/g;
}
close(FG);

foreach my $vf (@lin) {
    $vf =~ s/; ;/;/g;
}
foreach my $vf (@lin) {
    $vf =~ s/;/|/g;
}
foreach my $vf (@lin) {
    $vf =~ s/Symbol\|Name\|Last Trade\|Change\|Volume\|Related Info\|/SYMB + +OL\|NAME\|LAST TRADE\|CHANGE\|VOLUME\|RELATED INFO\n/g;
    $vf =~ s/Chart\|, \|Profile\|, \|More\|/Chart, Profile, More\n/g;
    $vf =~ s/(\&amp)\|/\&/g;
    $vf =~ s/\| \(/ \(/g;
    $vf =~ s/\|(\d|\d\d|\d\d\d)\.(\d|\d\d|\d\d\d)\|(\d|\d\d|\d\d\d)\:(\d|\ + +d\d|\d\d\d)/\|$1\.$2 $3\:$4/g;
}

my $date = localtime;
$date =~ s/ /_/g;
my $us = "US-$date.csv";
my $nasdaq = "NASDAQ-$date.csv";
my $amex = "AMEX-$date.csv";
my $nyse = "NYSE-$date.csv";

open (US, ">$us") || die "US.csv: $!\n";
open (NASDAQ, ">$nasdaq") || die "NASDAQ.csv: $!\n";
open (AMEX, ">$amex") || die "AMEX.csv: $!\n";
open (NYSE, ">$nyse") || die "NYSE.csv: $!\n";

print US $lin[1];
print NASDAQ $lin[2];
print AMEX $lin[3];
print NYSE $lin[4];

close (US);
close (NASDAQ);
close (AMEX);
close (NYSE);

unlink $file; # delete a file 'trash'
exit 0;

Replies are listed 'Best First'.
Re: extracting from web and adding to csv file with their links
by zentara (Archbishop) on Jul 05, 2012 at 17:51 UTC
Re: extracting from web and adding to csv file with their links
by frozenwithjoy (Priest) on Jul 06, 2012 at 02:39 UTC

    Hi there. I was trying to run your script and found that the output files were empty. I tracked down the problem to Line 42:

    push(@lin, $2) if $_ =~ /(Volume Leaders;US;NASDAQ;AMEX;NYSE;)(Sym +bol;Name.*)/g;

    You need to get rid of the " +" in Sym +bol.
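To see why the stray " +" leaves the output files empty: without the /x modifier, a space in a Perl regex is a literal space and "+" means "one or more of the previous item", so "Sym +bol" only matches "Sym", at least one space, then "bol". A minimal sketch, assuming the flattened page text looks like the sample string below (the sample is made up for illustration):

```perl
use strict;
use warnings;

# Made-up sample shaped like the flattened page text the script searches.
my $text = 'Volume Leaders;US;NASDAQ;AMEX;NYSE;Symbol;Name;Last Trade;';

# Pattern as pasted: ' +' means "one or more spaces", so it needs 'Sym bol'.
my $broken = qr/(Volume Leaders;US;NASDAQ;AMEX;NYSE;)(Sym +bol;Name.*)/;

# Same pattern with the stray ' +' removed.
my $fixed = qr/(Volume Leaders;US;NASDAQ;AMEX;NYSE;)(Symbol;Name.*)/;

my $broken_matches = $text =~ $broken ? 1 : 0;
my $fixed_matches  = $text =~ $fixed  ? 1 : 0;

print "broken pattern matches: $broken_matches\n";
print "fixed pattern matches:  $fixed_matches\n";
```

Since the push only runs when the pattern matches, a pattern that never matches leaves @lin with nothing to print.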

    edit: there is another one (although benign) in line 54:

    $vf =~ s/Symbol\|Name\|Last Trade\|Change\|Volume\|Related Info\|/SYMB +OL\|NAME\|LAST TRADE\|CHANGE\|VOLUME\|RELATED INFO\n/g;

    edit2: and another one on line 57. This one screws up your regex:

    $vf =~ s/\|(\d|\d\d|\d\d\d)\.(\d|\d\d|\d\d\d)\|(\d|\d\d|\d\d\d)\:(\d|\ + +d\d|\d\d\d)/\|$1\.$2 $3\:$4/g; }

    To avoid these, be careful to click the 'Download' link before copying code from PM, since lines that get wrapped have " +" added at the break.

    Regarding that regex I pointed out, I'd like to recommend an easier to write/read/maintain way of doing this:

    $vf =~ s/\|(\d{1,3})\.(\d{1,3})\|(\d{1,3})\:(\d{1,3})/\|$1\.$2 $3\:$4/g;

    However, I suspect that you meant to use the following since $3 and $4 are matching a time:

    $vf =~ s/\|(\d{1,3})\.(\d{1,3})\|(\d{1,2})\:(\d{2})/\|$1\.$2 $3\:$4/g;

    Since you are matching 1, 2, or 3 digits, you can specify a range with curly braces instead of writing out each variation: \d{1,3} means "between one and three digits". Check out the QUANTIFIERS section of perlre for more info.
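A small sketch of the point about quantifiers, checking that the spelled-out alternation and the {1,3} range accept the same strings. The sample strings are made up, and the anchors are added here only so that each pattern is tested against the whole string:

```perl
use strict;
use warnings;

my $long  = qr/\A(?:\d|\d\d|\d\d\d)\z/;   # each variation written out
my $short = qr/\A\d{1,3}\z/;              # the same set, as a range quantifier
my $time  = qr/\A\d{1,2}:\d{2}\z/;        # tighter pattern for H:MM or HH:MM

# Made-up samples: three that should match, two that should not.
my @samples = ('7', '82', '120', '1234', '');

# Keep the samples on which both patterns give the same answer.
my @agree = grep { ($_ =~ $long ? 1 : 0) == ($_ =~ $short ? 1 : 0) } @samples;

printf "%d of %d samples agree\n", scalar @agree, scalar @samples;
print "4:00 looks like a time\n" if '4:00' =~ $time;
```

The tighter time pattern rejects inputs such as "123:00" that the {1,3} version would let through.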

Re: extracting from web and adding to csv file with their links
by frozenwithjoy (Priest) on Jul 06, 2012 at 03:28 UTC

    OK, since I'm not entirely sure what you need, perhaps you could elaborate. To start, these are the first few lines from one of the output files:

    SYMBOL|NAME|LAST TRADE|CHANGE|VOLUME|RELATED INFO
    BAC|Bank of America Corporation Com|7.82 4:00PM EDT|0.24 (2.98%)|120,161,969|Chart, Profile, More
    JPM|JP Morgan Chase & Co. Common St|34.38 4:02PM EDT|1.50 (4.18%)|58,529,307|Chart, Profile, More
    SIRI|Sirius XM Radio Inc.|2.09 4:00PM EDT|0.05 (2.45%)|52,421,104|Chart, Profile, More

    Do you want to change the plain text "Chart, Profile, More" into the same text with its links attached? If so, since you are only collecting the text and not the links from the Yahoo pages, your best bet is to harvest the symbols (e.g., BAC) from what you have and stick them into Yahoo's link template (e.g. for Charts). Assuming this is what you are trying to do, I wrote the following as a proof of concept and used it to replace print US $lin[1];:

    for my $lin_line ( split (/\n/, $lin[1]) ) {
        my ( $symbol ) = $lin_line =~ m/(\w*)|/;
        next if $symbol eq "SYMBOL";
        $lin_line =~ s|(Chart)|$1:$symbol +|;
    }
    continue {
        print US $lin_line, "\n";
    }

    The output looks like:

    If this is what you are going for, you should elaborate on it so that it reports links for 'Profile' and 'More' as well, for all your output files.
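One way the proof of concept might be extended to all three labels is sketched below. The URL templates are placeholders on example.com, not Yahoo's real paths, and add_links is a hypothetical helper name; both are assumptions made for illustration. The sketch splits each row on the pipe so that only the last field is rewritten, which avoids touching a company name that happens to contain "More".

```perl
use strict;
use warnings;

# Hypothetical URL templates: placeholders, NOT Yahoo's real paths.
my %template = (
    Chart   => 'http://example.com/chart?s=%s',
    Profile => 'http://example.com/profile?s=%s',
    More    => 'http://example.com/more?s=%s',
);

# Hypothetical helper: attach a link to each label in the last field
# of one pipe-delimited row. Header and empty rows are returned unchanged.
sub add_links {
    my ($row) = @_;
    my @fields = split /\|/, $row, -1;
    my $symbol = $fields[0];
    return $row if !defined $symbol || $symbol eq 'SYMBOL' || @fields < 2;

    # Rewrite only the labels we have a template for; keep anything else.
    $fields[-1] = join ', ',
        map { exists $template{$_} ? "$_: " . sprintf($template{$_}, $symbol) : $_ }
        split /,\s*/, $fields[-1];
    return join '|', @fields;
}

my $row = 'BAC|Bank of America Corporation Com|7.82 4:00PM EDT|0.24 (2.98%)|120,161,969|Chart, Profile, More';
print add_links($row), "\n";
```

To limit this to the first ten rows, as the original question asks, the caller could count data rows and stop calling add_links after the tenth.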

    One other thing... You are saving your files as csv, but the data you output is not comma-delimited (unless you only want 3 columns).
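If real comma-separated output is wanted, each pipe-delimited row would need its fields quoted, because values such as "120,161,969" and "Chart, Profile, More" contain commas themselves. A minimal sketch using only core Perl, with pipe_row_to_csv as a hypothetical helper name; for anything beyond simple rows, a dedicated module such as Text::CSV from CPAN is likely the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one pipe-delimited row into one CSV line.
# Fields containing a comma, double quote, or line break are wrapped in
# double quotes, and embedded double quotes are doubled.
sub pipe_row_to_csv {
    my ($row) = @_;
    my @fields = split /\|/, $row, -1;
    for my $field (@fields) {        # $field aliases the array element
        if ($field =~ /[",\r\n]/) {
            $field =~ s/"/""/g;
            $field = qq{"$field"};
        }
    }
    return join ',', @fields;
}

print pipe_row_to_csv('SIRI|Sirius XM Radio Inc.|52,421,104|Chart, Profile, More'), "\n";
```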

      Yes, that is what I was looking for! :-) (I want to change "Chart, Profile, More" into "Chart link, Profile, More".) It seems that my Perl script is not yet correct enough for this type of project... and in 2-3 years, when I have become a master of Perl, I will look at my present code and say 'how comical my code was'... I will pay more attention when copy-pasting code on PerlMonks. Thank you for the code; I added it to my script (in place of 'print NYSE...') and it is working...

        Great! Good luck on your project and feel free to send me any amazing stock trends you might uncover so I can retire and live on a sailboat.

Node Type: perlquestion [id://980120]
Approved by Old_Gray_Bear