PerlMonks  

Re^2: Parse file, split

by mewoq (Initiate)
on May 28, 2013 at 00:36 UTC ( [id://1035489]=note )


in reply to Re: Parse file, split
in thread Parse file, split

So here's the text file with the car names.
2011 Chevy Camaro
2011 Dodge Ram Crew Cab Short Bed
2011 Ford F150 Platinum
2011 Ford Flex
2011 Ford Transit
2011 GMC Cargo Van Extended
2011 Hyundai Genesis Coupe
2011 Kia Sol
2011 Nissan Cube
2011 Toyota Prius
These came from folder names (each line is the name of a folder containing images of that car) that I parsed into a text file. Later, after more changes, I plan to import this into a table in a db.

Replies are listed 'Best First'.
Re^3: Parse file, split
by AnomalousMonk (Archbishop) on May 28, 2013 at 01:29 UTC

    If the dataset is assumed valid and you avoid multi-word 'make' fields, LanX's approach would certainly seem to do the trick with this dataset. Note that $extended_model has to be 'fixed' (here, defaulted to an empty string with //=) if the field does not exist; otherwise it is undefined.

    >perl -wMstrict -le
    "my @records = (
       '2011 Chevy Camaro',
       '2011 Dodge Ram Crew Cab Short Bed',
       '2011 Ford F150 Platinum',
       '2011 GMC Cargo Van Extended',
       );
     ;;
     for my $record (@records) {
       my ($year, $make, $model, $extended_model) = split ' ', $record, 4;
       $extended_model //= '';
       print qq{'$year' '$make' '$model' '$extended_model'};
       }
    "
    '2011' 'Chevy' 'Camaro' ''
    '2011' 'Dodge' 'Ram' 'Crew Cab Short Bed'
    '2011' 'Ford' 'F150' 'Platinum'
    '2011' 'GMC' 'Cargo' 'Van Extended'

Re^3: Parse file, split
by Jim (Curate) on May 28, 2013 at 01:07 UTC

    Here's a demonstration of how using regular expression pattern matching instead of string split might be more correct, robust and extensible.

    #!perl
    use strict;
    use warnings;

    my $valid_vehicle_description_pattern = qr{
        ((?:19|20)\d\d)  # $1 is Year
        \s+
        (                # $2 is Make
            British\s+Leyland
          | Chev(?:y|rolet)
          | Dodge
          | Ford
          | (?:General\s+Motors|GMC?)
          | Hyundai
          | Kia
          | Nissan
          | Toyota
        )
        \s+
        (\S.*)           # $3 is Model
    }ix;

    while (my $vehicle = <DATA>) {
        chomp $vehicle;

        if ($vehicle =~ $valid_vehicle_description_pattern) {
            my ($year, $make, $model) = ($1, $2, $3);
            print "Year: $year\tMake: $make\tModel: $model\n";
        }
        else {
            warn "Invalid vehicle description: $vehicle\n";
        }
    }

    __DATA__
    1970 British Leyland Triumph Spitfire
    2011 CHEVROLET CAMARO
    2011 Chevy Camaro
    2011 Dodge Ram Crew Cab Short Bed
    2011 Ford F150 Platinum
    2011 Ford Flex
    2011 Ford Transit
    2011 GMC Cargo Van Extended
    2011 Hyundai Genesis Coupe
    2011 Kia Sol
    2011 Nissan Cube
    2011 Toyota Prius
    2015 Apple iCar

Re^3: Parse file, split
by LanX (Saint) on May 28, 2013 at 01:31 UTC
    > This came from the name of the folder (each of these are a folder name containing images of that car) that I parsed into a text file.

    Please! Just change the delimiter when writing to something impossible in filenames, like "\t" (hopefully) and your problems when reading are all gone!

    In comparison all other approaches are just insane hacks!

    Or just avoid any intermediate files.

    Cheers Rolf

    ( addicted to the Perl Programming Language)
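
A minimal sketch of the tab-delimiter idea above, assuming the fields are already known separately at the moment the file is written. The helper names `fields_to_line` and `line_to_fields` are made up for illustration and are not from the thread. Because a tab cannot normally appear in a folder name, a multi-word make or model survives the round trip intact.

```perl
use strict;
use warnings;

# Hypothetical helper: join already-separated fields into one tab-delimited line.
sub fields_to_line {
    my @fields = @_;
    return join "\t", @fields;
}

# Hypothetical helper: split a tab-delimited line back into its fields.
# The -1 limit keeps trailing empty fields instead of dropping them.
sub line_to_fields {
    my ($line) = @_;
    chomp $line;
    return split /\t/, $line, -1;
}

# Sample records taken from the vehicle names used in this thread.
my @vehicles = (
    [ 2011, 'Chevy',           'Camaro' ],
    [ 1970, 'British Leyland', 'Triumph Spitfire' ],
);

for my $vehicle (@vehicles) {
    my $line = fields_to_line(@$vehicle);
    my ($year, $make, $model) = line_to_fields($line);
    print "Year: $year\tMake: $make\tModel: $model\n";
}
```

If the fields are only available as one space-separated folder name, this does not help by itself; the name still has to be split once, by one of the other approaches in this thread, before it is written out.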

      Yeah, field limits did the trick

      split(/\t/, $_, 3) worked out.

      What I want to do now: I have a text file that lists the whole folder hierarchy. For example:

      C:\xx\yy\Desktop\BadWrap 2011\2011 Chevy Camaro
      C:\xx\yy\Desktop\BadWrap 2011\2011 Chevy Camaro\JPEG\11_Chevy_Camaro Driver.jpg

      I think I now want just two columns of info: one being the path and the second being the final folder/image. I don't want to modify the folder names/path, so I can't use \s+ or \t as the delimiter. Is there a way to split only on the last "\" in each line?
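
One possible way to do what is asked here, as a sketch rather than a definitive answer: a greedy `.*` in a regex consumes as much as it can, so the backslash that follows it is necessarily the last one in the line. The sub name `split_at_last_backslash` is made up for illustration, and the sample paths are the two lines quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a path at its last backslash.
# Returns (parent_path, last_component). If the path contains no
# backslash at all, the parent is returned as an empty string.
sub split_at_last_backslash {
    my ($path) = @_;
    # Greedy (.*) runs to the final backslash; [^\\]* is whatever follows it.
    if ($path =~ m{\A(.*)\\([^\\]*)\z}s) {
        return ($1, $2);
    }
    return ('', $path);
}

# Single-quoted strings keep the backslashes literal.
my @paths = (
    'C:\xx\yy\Desktop\BadWrap 2011\2011 Chevy Camaro',
    'C:\xx\yy\Desktop\BadWrap 2011\2011 Chevy Camaro\JPEG\11_Chevy_Camaro Driver.jpg',
);

for my $path (@paths) {
    my ($parent, $last) = split_at_last_backslash($path);
    print "$parent\t$last\n";
}
```

The folder names and paths themselves are left untouched; only the position of the final backslash decides where the two columns are divided.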

Re^3: Parse file, split
by JockoHelios (Scribe) on May 28, 2013 at 01:58 UTC

    I'm assuming then that each line is the name of a folder, spaces included. A bit of RegEx would do, as in the code below.

    I'm sure the RegEx section could be redone to be cleaner and smaller, but I'm not at that level yet :) so I just did it in separate steps.

    use strict;

    my $Car0 = "2011 Chevy Camaro";
    my $Car1 = "2011 Dodge Ram Crew Cab Short Bed";
    my $Car2 = "2011 Ford F150 Platinum";
    my $Car3 = "2011 Ford Flex";
    my $Car4 = "2011 Ford Transit";
    my $Car5 = "2011 GMC Cargo Van Extended";
    my $Car6 = "2011 Hyundai Genesis Coupe";
    my $Car7 = "2011 Kia Sol";
    my $Car8 = "2011 Nissan Cube";
    my $Car9 = "2011 Toyota Prius";

    my $OneCar = "";
    my $Year = 0;
    my $Make = "";
    my $Model = "";

    push( my @CarInfo, ( $Car0, $Car1, $Car2, $Car3, $Car4, $Car5, $Car6, $Car7, $Car8, $Car9 ) );

    foreach $OneCar( @CarInfo )
    {
        # initialize the variables
        $Year = $OneCar;
        $Make = $OneCar;
        $Model = $OneCar;

        # drop everything after the Year including the first space char
        $Year =~ s/\s.*//;

        # drop the year and the first space char
        $Make =~ s/\d*\s*//;

        # drop everything after the Make including the first space char
        $Make =~ s/\s.*//;

        # drop the year and the first space char, same as with $Make
        $Model =~ s/\d*\s//;

        # drop everything up to and including the first space char
        $Model =~ s/\w*\s//;

        print "$Year\t\t$Make\t\t$Model\n";
    }

    Dyslexics Untie !!!
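
For comparison, the separate substitution steps above can probably be collapsed into a single match with three capture groups. This is only a sketch under the same assumption as the stepwise version, namely that the make is always a single word; the sub name `parse_car` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "Year Make Model..." in one match.
# Assumes a numeric year, a single-word make, and the rest as the model.
# Returns an empty list when the string does not fit that shape.
sub parse_car {
    my ($car) = @_;
    my ($year, $make, $model) = $car =~ m{\A(\d+)\s+(\S+)\s+(.+)\z}
        or return;
    return ($year, $make, $model);
}

my @car_info = (
    '2011 Chevy Camaro',
    '2011 Dodge Ram Crew Cab Short Bed',
    '2011 GMC Cargo Van Extended',
);

for my $car (@car_info) {
    my ($year, $make, $model) = parse_car($car)
        or next;    # skip anything that does not parse
    print "$year\t\t$make\t\t$model\n";
}
```

A multi-word make such as "British Leyland" would still be split incorrectly here; handling that needs a list of known makes, as in the pattern-matching reply earlier in this thread.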
