|
Mad_Mac has asked for the
wisdom of the Perl Monks concerning the following question:
Oh wise PerlMonks,
I am working on a script to parse my mp3 files and write the id3 tag info to a database, mainly just for the exercise of doing it. My code is below. It works, but I'm pretty sure the way I am doing things is less efficient than it could be.
I'd love any suggestions on how to make this code run more quickly and efficiently.
BTW - if it matters, I am using ActivePerl on a Vista x64 box.
#!/usr/bin/perl
use strict;
use warnings;
use MP3::Tag;
use File::Find;
use DBI;
#set up sqlite connection
my $db = DBI->connect("dbi:SQLite:music.sqlite","","");
#set up variables
my $count = 0;
my @songinfo =
my $mp3 =
my @dirs = ("/users/mine/music");
# Get rid of the previous version of the table
$db->do("DROP TABLE IF EXISTS Songs") or DIE("Couldn't drop old table"
+);
# Recreate Table
$db->do("CREATE TABLE songs (id INTEGER PRIMARY KEY AUTOINCREMENT, Tra
+ck INTEGER, Title TEXT, Artist TEXT, Album TEXT, Year INTEGER, Genre
+TEXT, Path TEXT)") or DIE("Couldn't create new table");
#set up SQL statement
my $sth = $db->prepare('INSERT INTO Songs (Track, Title, Artist, Album
+, Year, Genre, Path) VALUES (?, ?, ?, ?, ?, ?, ?)');
# get total number of mp3's
my @songtotal = `dir c:\\users\\mine\\music\\*.mp3 /s`;
my @mp3total = split(/ /, trim($songtotal[(scalar @songtotal)-2]));
# look for files in each directory
find(\&displayMP3Info, @dirs);
# this function is called every time a file is found
sub displayMP3Info {
# if the file has an MP3 extension
if (/\.mp3$/i) {
# increment counter
$count++;
# create new MP3-Tag object
$mp3 = MP3::Tag->new($_);
# get tag information
$mp3->get_tags();
# check to see if an ID3v1 tag exists
if (exists $mp3->{ID3v1})
{
#if it's an ID3V1 Tag stick the parts of it into an array
+called songinfo
@songinfo = ($mp3->{ID3v1}->track, $mp3->{ID3v1}->title, $
+mp3->{ID3v1}->artist, $mp3->{ID3v1}->album, $mp3->{ID3v1}->year, $mp3
+->{ID3v1}->genre, $File::Find::name);
# put it in the database
$sth->execute($songinfo[0], $songinfo[1], $songinfo[2], $s
+onginfo[3], $songinfo[4], $songinfo[5], $songinfo[6]);
# print "\n$songinfo[0], $songinfo[1], $songinfo[2], $song
+info[3], $songinfo[4], $songinfo[5], $songinfo[6]\n";
# update the total on the screen
printf STDERR ("\r%02d of $mp3total[0] files.", $count);
}
# clean up
$mp3->close();
}
}
print "Done.\n";
# trim function to get rid of leading and trailing spaces
sub trim{
my $string = shift;
$string =~ s/^\s+|\s+$//g;
return $string;
}
Re: Improving SQL speed by MidLifeXis (Priest) on Nov 01, 2009 at 13:18 UTC |
| [reply] [d/l] [select] |
|
Thanks. Appreciate the tips.
| [reply] |
|
OK, so I worked through your suggestions:
- Thanks
- Got it. I was trying to be explicit in the debugging phase, but now that it's working I changed it per your suggestion.
- OK. I had them declared early on becuase I thought I was getting an error declaring them within the sub-function, but apparently not. I changed them as you recommended and it works fine.
- Thanks! Can the same trick be used on if (exists $mp3->{ID3v1}) {?
- Yea, Part of that is because I kept changing the way I wanted to use those variables as my code evolved ... part of it's because I've only been working in Perl again for a few weeks after a 10+ year hiatus, and part of it's becuase I occasionally get warnings from Perl that calling certain array variables like @songinfo[0] would be better referenced as $songinfo[0]. Which makes me think I don't really understand Perl variables yet. So I just keep mucking with it (and plagiarizing other peoples code) 'till it does what I want.
After making the changes you suggested, the code seems to be ~10% faster, but I'm not sure. I'm going to add a timer and run it the old way and with your changes. We'll see.
Thanks again for your suggestions.
Mad Mac
| [reply] [d/l] [select] |
|
| [reply] |
Re: Improving SQL speed by hesco (Deacon) on Nov 01, 2009 at 15:17 UTC |
Your title and question suggested to me a need for appropriate table indexing to improve database performance. But your code shows only a single INSERT on an unindexed table. An index slows down table INSERTs (but is often justifiable because of the needs of future queries on the data set). It is also faster than DROPping and rebuilding those indexes every time you do an INSERT, except for bulk INSERTs when that strategy can make sense.
If you are having some performance issues with this code, I doubt it is in the database interactions.
I'd suggest using Devel::NYTProf to start to get a handle on where the resources are being used. I'd guess that parsing your mp3 library is not a trivial task.
-- Hugh
if( $lal && $lol ) { $life++; }
if( $insurance->rationing() ) { $people->die(); }
| [reply] |
|
Thanks. I'll look into that module. I'm not having performance issues per se, but the code seems to parse ~15-20 songs per second, which is ~20 minutes for the 19k file in my mp3 library. Not terrible, but I have a feeling it can be faster. I don't notice an CPU or RAM hit while it's running, but I haven't looked closely.
Mad_Mac
| [reply] |
|
| [reply] |
Re: Improving SQL speed by gmargo (Pilgrim) on Nov 01, 2009 at 16:40 UTC |
One alternative is to process all the mp3 files,
and then write all the data simultaneously
in one call to execute_array,
instead of one call to execute for each mp3 file.
It's more work to set up (must accumulate all data in separate
arrays and use bind_param_array) but for a large
number of entries it will probably be faster.
| [reply] [d/l] [select] |
|
And if ganging up inserts still seems too slow, there's
apparently a bulk load command named ".import" that will load a
table from a file in one-shot:
The fact that there's a ".seperator" command makes it sound like
it's better suited to tsv files than csv, but you'd need to play
around with it to see. Most database bulk-loaders seem to work
with some version of csv, but frequently you'll need to massage
the csv first to get it in the form they expect: csv is a
standard without a standard.
| [reply] |
|
Thanks. I'd had skimmed some other pages that suggested for this type of thing it might be faster to write everything out to a CSV and the do a bulk upload, but I hadn't found a good example of the necessary syntax yet. I'll check out those links.
| [reply] |
Re: Improving SQL speed by bart (Canon) on Nov 01, 2009 at 22:19 UTC |
You may look into using transactions, and only commit every N entries, or every M seconds, whichever is faster. You may find that slowness is largely caused by (implicit or explicit) commit for every insert. | [reply] [d/l] [select] |
Re: Improving SQL speed by oha (Friar) on Nov 02, 2009 at 10:22 UTC |
| [reply] |
Re: Improving SQL speed by CountZero (Canon) on Nov 02, 2009 at 21:51 UTC |
Have a look at MP3::Find and MP3::Find::DB, it might give you some good ideas.
CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
| [reply] |
|
| [reply] |
Back to
Seekers of Perl Wisdom
|
|