PerlMonks  

ID3v2 TAG Footer Reading goes wrong

by thanos1983 (Pilgrim)
on Jan 06, 2014 at 05:14 UTC (#1069458, perlquestion)
thanos1983 has asked for the wisdom of the Perl Monks concerning the following question:

As a beginner programmer I am trying, step by step, to create an ID3v2 TAG reader that is as simple as possible. I have managed to make it work up to a point, but reading the footer goes badly wrong. In theory it looks correct, but in practice it is not. Any guidance or suggestion would be much appreciated.

I read the 10-byte Header correctly and print it.

As a second step I check whether the extended header exists. If it exists, I read it and print the output; if not, I proceed with SEEK_SET.

As a third and final step, I am trying to read the Footer, which contains all the information (Artist, Album, etc.). In theory the Footer is located directly after the Header if no Extended Header exists, so I use SEEK_SET at 10 bytes; otherwise I calculate the SEEK_SET offset from the Header size and the Extended Header size.

In the Footer processing I have created an until loop with the condition ($length_of_data == 0), trying to read all the predefined Header_Tags. On the last one I set ($length_of_data = 0), which in theory terminates the loop.

#!/usr/bin/perl
use warnings; # it warns about undefined values
use strict; # it's a lexically scoped declaration
use Data::Dumper;
use Fcntl qw( SEEK_SET );

$| = 1; #flushing output

my ( $lines , $type , $major_version , $revision_number , $flags , $size , $extended_size , $number_flags , $extended_flags ) = "\0";
my ( $frame_id , $frame_size , $frame_flags , $extended_header , $mp3_size , $length_of_data , $lines_0 , $lines_1 , $lines_2) = "\0";
my ( $lines_3 , $length , $characters , $i );
my @word = "\0" x 5;
my @memory = (0) x 5;

my $source = $ARGV[0]
    or die "Please provide one .mp3 file to open!\nCorrect syntax perl name of the program (e.g. Exercise3.pl) and name of the mp3 file (e.g. silence.mp3) $!\n";

open(FH, ,"<", $source)
    or die "Can not open file: $source $!\n";
binmode(FH); # Open in binary mode.

if (@ARGV > 1) {
    print "Please no more than one argument!\nCorrect syntax perl ".$source." and name of the mp3 file (e.g. silence.mp3)!\n";
    exit ();
}
else {
    print ("\nUser has chosen file: $source to open for reading!\n");

    # Header 10 Bytes in total 3 Bytes + 1 Byte + 1 Byte + 1 Byte + 4 Bytes = 10 Bytes
    seek( FH , 0 , SEEK_SET )
        or die "Could not seek: $!"; # Set pointer at the beggining of file (Define possition with SEEK_SET).
    read( FH , $lines , 3 ); # Read 24 bits (3 Bytes) ID3 and store the data in $lines. Header_ID
    ( $type ) = unpack ( "A3" , $lines ); # (A) text (ASCII) string, will be space padded.
    # print("This is Header_ID: $type\n");

    seek( FH , 3 , SEEK_SET )
        or die "Could not seek: $!"; # Based on possition 0 with SEEK_SET we move 3 byte.
    read( FH , $lines , 1 ); # Read 8 bits (1 Byte) and store data in $lines Version (1 Byte Major_Version).
    ( $major_version ) = unpack ( "h", $lines ); # (h) A hex string (low nybble first).
    # print("This is Major_Version: $major_version\n");

    seek( FH , 4 , SEEK_SET )
        or die "Could not seek: $!"; # Based on possition 0 with SEEK_SET we move 4 byte.
    read( FH , $lines , 1 ); # Read 8 bits (1 Byte) and store the data in $lines Version (1 Byte Revision_Number).
    ( $revision_number ) = unpack ( "h", $lines ); # (h) hex string (low nybble first).
    # print("This is Revision_Number: $revision_number\n");

    seek( FH , 5 , SEEK_SET )
        or die "Could not seek: $!"; # Based on possition 0 with SEEK_SET we move 5 byte.
    read( FH , $lines, 1 ); # Read 8 bits (1 Byte) and store the data in $lines Flags (1 Byte Flags).
    ( $flags ) = unpack ( "h" , $lines ); # (h) hex string (low nybble first).
    # print("This is Byte_Flags: $flags\n");

    print "TAG Detected: ".$type."v2.".$major_version.".".$revision_number."\n";

    if($flags == 0) {
        print("\nThe extended flags has no corresponding data: \$00 was detected. Proceeding!\n\n");
    }
    else {
        print("Flags are not empty, we have found these characters: $flags\n");
    }

    seek( FH , 6 , SEEK_SET )
        or die "Could not seek: $!"; # Based on possition 0 with SEEK_SET we move 10 byte.
    read( FH , $lines, 4 ); # Read 32 bits (4 Bytes) and store the data in $lines Size (4 Bytes Size).
    ( @memory ) = unpack ( "C4" , $lines ); # (I) An unsigned integer.
    #print Dumper(@memory);
    # print ("This is the content of lines_0: ".$memory[0]."\n");
    # print ("This is the content of lines_1: ".$memory[1]."\n");
    # print ("This is the content of lines_2: ".$memory[2]."\n");
    # print ("This is the content of lines_3: ".$memory[3]."\n");

    $mp3_size = ($memory[0] & 0xFF) |
                (( $memory[1] & 0xFF ) << 7) |
                (( $memory[2] & 0xFF ) << 14) |
                (( $memory[3] & 0xFF ) << 21);
    print Dumper($mp3_size);

    $length_of_data = $mp3_size;
    # End of Header 10 complete Bytes

    # At this point we want to make sure that we have an extended header (ID3v2 flags %abcd0000)
    # Bit 7 of (ID3v2 flags %abcd0000) if is 1 (active indicates that there is extended header if is 0
    # it means there is no extended header. If extended header exist proceed else skip.
    if (( $flags & (0b01000000) ) == 0b01000000 ) {
        # Begging Extended header (Optional not vital for correct parsing).
        # Extended Header in tppotal 6 Bytes, size 4 bytes memory size 4 Bytes is enough to read binary no characters
        # no need for binary to string conversion no need for terminating string character ('\0').
        # Emptying memory for future use.
        @memory = (0) x 5;
        read( FH , $lines, 4 ); # Read 32 bits (4 Bytes) and store the data in $lines Extended size.
        ( @memory ) = unpack ( "C4" , $lines ); # (I) An unsigned integer.
        print ("This is the extended size of lines_0: ".$memory[0]."\n");
        print ("This is the extended size of lines_1: ".$memory[1]."\n");
        print ("This is the extended size of lines_2: ".$memory[2]."\n");
        print ("This is the extended size of lines_3: ".$memory[3]."\n");

        # Due to Sync_safe remove the 0 from the beggining of each stored element and Bitwise,
        # although we are working with unsigned characters and integers it is a good practice.
        # Synchsafe integers are integers that keep its highest bit (bit 7) zeroed, making
        # seven bits out of eight available.
        $extended_size = ($memory[0] & 0xFF) |
                         (($memory[1] & 0xFF) << 7 ) |
                         (($memory[2] & 0xFF) << 14 ) |
                         (($memory[3] & 0xFF) << 21 );

        read( FH , $lines, 1 ); # Read 8 bits (1 Byte) and store the data in $lines Flags (1 Byte Flags).
        ( $number_flags ) = unpack ( "c" , $lines ); # (h) hex string (low nybble first).
        print("This is the number of flags: $number_flags\n");

        read( FH , $lines, 1 ); # Read 8 bits (1 Byte) and store the data in $lines Flags (1 Byte Flags).
        ( $extended_flags ) = unpack ( "C" , $lines ); # An unsigned character (usually 8 bits).
        print("This is the extended header flags: $extended_flags\n");
        print("This is the extended header size, after sync_safe: $extended_size\n");

        # From the stored value we subtract the Extended Header to get the total size so far.
        $length_of_data = $length_of_data - $extended_size;

        # Reposition the seek pointer after the Extended Header.
        seek( FH , $extended_size + $mp3_size , SEEK_SET )
            or die "Could not seek: $!"; # Based on possition 0 with SEEK_SET we move $extended_size + $mp3_size.
        # End of Extended Header (6 Bytes in total)
    }
    else {
        # Set the pointer after 10 Bytes.
        seek( FH , 10 , SEEK_SET )
            or die "Could not seek: $!"; # Based on position 0 with SEEK_SET we move 10 byte.
        # print("This is the length of data: ".$length_of_data."\n");

        until($length_of_data == 0) {
            # Begging of Mp3 Frame (10 Bytes in total), 4 Bytes Frame_ID + 4 Bytes Frame_Size + 2 Bytes Frame_Flags = 10 Bytes.
            # Loop through until the end of length of data previously measured.
            read( FH , $lines, 4 ); # Read 32 bits (4 Bytes) and store the data in $lines Extended Header.
            ( $frame_id ) = unpack ( "A4" , $lines ); # (c) signed char (8-bit) value.
            print("This is the frame_id: $frame_id\n");

            read( FH , $lines, 4 ); # Read 32 bits (4 Bytes) and store the data in $lines Extended Header.
            ( @memory ) = unpack ( "C4" , $lines ); # (I) An unsigned integer.
            $frame_size = ($memory[0] & 0xFF) |
                          (($memory[1] & 0xFF) << 7 ) |
                          (($memory[2] & 0xFF) << 14 ) |
                          (($memory[3] & 0xFF) << 21 );

            read( FH , $lines, 2 ); # Read 16 bits (2 Bytes) and store the data in $lines Frame Flags.
            ( $frame_flags ) = unpack ( "A2" , $lines ); # (c) signed char (8-bit) value.

            $length = length($frame_id);
            # print("This is the length of frame_id: ".$length."\n");

            foreach($frame_id) {
                if ( $frame_id eq "TALB") {
                    print "I have found one matching pattern: TALB\n";
                }
                elsif ( $frame_id eq "TCON") {
                    print "I have found one matching pattern: TCON\n";
                }
                elsif ( $frame_id eq "TIT2") {
                    print "I have found one matching pattern: TIT2\n";
                }
                elsif ( $frame_id eq "TPE1") {
                    print "I have found one matching pattern: TPE1\n";
                }
                elsif ( $frame_id eq "TRCK") {
                    print "I have found one matching pattern: TRCK\n";
                }
                elsif ( $frame_id eq "TYER") {
                    print "I have found one matching pattern: TYER\n";
                    $length_of_data = 0;
                }
                # End of Mp3 Frame (10 Bytes in total), 4 Bytes Frame_ID + 4 Bytes Frame_Size + 2 Bytes Frame_Flags = 10 Bytes.
            } # End of until
        } # End foreach
    } # End of else condition
}# End of Big else after argument condition

close (FH)
    or die "Can not close file: $source: $!\n";

$| = 1; #flushing output
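
The size fields in the script above are synchsafe integers: four bytes with seven payload bits each. ID3v2 stores them most significant byte first, whereas the script shifts the last byte the furthest, so the result is only right when the first three bytes are zero. A minimal sketch of the decoding under that assumption follows; `synchsafe_to_int` is a hypothetical helper name, not something from the original post.

```perl
use strict;
use warnings;

# Decode a 4-byte synchsafe integer: each byte carries 7 payload bits,
# and the FIRST byte is the most significant one (big-endian).
# synchsafe_to_int is a hypothetical name chosen for this sketch.
sub synchsafe_to_int {
    my ($bytes) = @_;
    my @b = unpack 'C4', $bytes;
    return ( ( $b[0] & 0x7F ) << 21 )
         | ( ( $b[1] & 0x7F ) << 14 )
         | ( ( $b[2] & 0x7F ) << 7 )
         |   ( $b[3] & 0x7F );
}

# 257 is stored as the bytes 00 00 02 01: (2 << 7) | 1 = 257.
printf "%d\n", synchsafe_to_int("\x00\x00\x02\x01");
```

Masking with 0x7F rather than 0xFF drops the top bit of each byte, which a well-formed synchsafe value never sets anyway.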

When I execute the code:

perl mp3.pl song.mp3

I get the following output in the terminal:

Use of uninitialized value $memory[0] in bitwise and (&) at Exercise_3.pl line 143.
Use of uninitialized value $memory[1] in bitwise and (&) at Exercise_3.pl line 143.
Use of uninitialized value $memory[2] in bitwise and (&) at Exercise_3.pl line 143.
Use of uninitialized value $memory[3] in bitwise and (&) at Exercise_3.pl line 143.

This repeats over and over. It is kind of strange, because if I put:

exit(0);

after the line:

print("This is the frame_id: $frame_id\n");

the output in the terminal is:

This is the frame_id: TPE1

Also, if I move the exit(0); command a bit further, to the end of the until loop, I get the following output:

I have found one matching pattern: TPE1

This makes me believe that the program operates correctly before it starts looping. When it starts looping it loses track of the SEEK_SET position; as a result it does not read the file in sequence but reads it over and over.

I tried to use the until condition with ($lines = <FH>), in order to take advantage of the SEEK_SET position at 10 bytes and make the program loop until the end of the file. But maybe I assumed wrong, because I got the same result as before.

So at this point I have run out of ideas and debugging approaches. The code should run on any operating system; if someone wants to try it, they only need to provide an mp3 file as the first argument, and in theory they should get the same output as me.

Any ideas why it is looping without printing all the $header_id values, given that it prints one of them?

I hope that providing my full code will help someone understand the structure and maybe spot possible small mistakes.

Thank you in advance for your time and effort; it means a lot to beginners to be able to ask someone who can provide assistance.
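
One plausible reading of the repeated warnings: the until loop only stops when a TYER frame is seen, so it can keep reading past the end of the file. At end of file `read` returns 0 and leaves the buffer empty, `unpack "C4"` on an empty string returns an empty list, and `$memory[0]` to `$memory[3]` are then undefined. The sketch below reproduces that on an in-memory file; `read_exact` is a hypothetical helper introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read exactly $n bytes, or return undef on a short read.
sub read_exact {
    my ( $fh, $n ) = @_;
    my $got = read( $fh, my $buf, $n );
    return unless defined $got && $got == $n;
    return $buf;
}

# A 4-byte in-memory "file" stands in for the MP3.
my $content = "TPE1";
open( my $fh, '<', \$content ) or die "Cannot open in-memory file: $!";
binmode $fh;

my $first  = read_exact( $fh, 4 );    # succeeds
my $second = read_exact( $fh, 4 );    # nothing left to read

# What the posted script effectively does once the file is exhausted:
my @memory = unpack 'C4', '';

printf "first=%s second=%s elements=%d\n",
    $first, ( defined $second ? $second : 'undef' ), scalar @memory;
```

Checking the return value of every `read` and leaving the loop on a short read would turn the endless stream of warnings into a clean stop.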

Re: ID3v2 TAG Footer Reading goes wrong
by no_slogan (Deacon) on Jan 06, 2014 at 15:28 UTC

    Please read How do I post a question effectively? We don't know what problem you're having.

    It looks like you're reading the frame header, but not the frame data. Your code will try to interpret the data from the first frame as another frame header.

    You can't count on the last frame having id "TYER".

    01000000 is an octal constant. You probably mean it to be binary, which would be written 0b01000000.
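
The difference between the two literals can be checked directly. This is a small sketch; `has_extended_header` is a hypothetical helper, and it assumes the flags byte was unpacked as a number with `"C"` (the posted script uses `"h"`, which yields a hex digit string instead).

```perl
use strict;
use warnings;

my $octal  = 01000000;      # leading 0: octal, i.e. 8**6
my $binary = 0b01000000;    # 0b prefix: binary, i.e. 2**6
printf "octal literal  = %d\n", $octal;
printf "binary literal = %d\n", $binary;

# Hypothetical helper: true if bit 6 of the flags byte is set.
# Assumes $flags is a plain number, e.g. from unpack 'C'.
sub has_extended_header {
    my ($flags) = @_;
    return ( $flags & 0b01000000 ) ? 1 : 0;
}

my $flags = unpack 'C', "\x40";    # example flags byte with bit 6 set
print "binary mask: ", has_extended_header($flags), "\n";
print "octal mask:  ", ( ( $flags & 01000000 ) ? 1 : 0 ), "\n";
```

The literals print as 262144 and 64. A one-byte value can never have bit 18 set, so the test against the octal constant is false for every possible flags byte.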

      To: no_slogan, I am sorry about the previous post; I did not explain how my code operates or what problems I am facing. Your suggestions were clear, I understand what you meant, and I have modified the code accordingly. Please take another look: I have added as much explanation as possible, along with the output that I am getting. Again, thank you for your time and effort to assist me.

        The basic problem still seems to be that you're not doing anything about the frame data. You might try something like this:

        $position = 10;
        $id3_end = 10 + $id3_size;
        while ($position < $id3_end) {
            seek FH, $position, SEEK_SET;
            # read $frame_id, $frame_size, and $frame_flags
            $position += 10 + $frame_size;
            last if $frame_id =~ /\W/ || $position > $id3_end;
            # do something with $frame_id
        }

        The "last if" line is a sanity check, because some ID3 writers leave a mess behind them.
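
The loop outlined in the reply could be fleshed out roughly as follows. This is a sketch under stated assumptions, not a complete ID3v2 reader: it assumes a v2.3 or v2.4 tag with no extended header and no unsynchronisation, it treats frame sizes as plain 32-bit big-endian integers for v2.3 and as synchsafe for v2.4, and the names `read_frames`, `synchsafe_to_int` and `make_frame` are hypothetical. It runs against a small synthetic tag built in memory rather than a real file.

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET );

# Decode a 4-byte synchsafe integer (7 payload bits per byte, big-endian).
sub synchsafe_to_int {
    my @b = map { $_ & 0x7F } unpack 'C4', shift;
    return ( $b[0] << 21 ) | ( $b[1] << 14 ) | ( $b[2] << 7 ) | $b[3];
}

# Walk the frames of a tag and return a list of [ frame_id, raw_data ] pairs.
sub read_frames {
    my ($fh) = @_;
    seek( $fh, 0, SEEK_SET ) or die "Could not seek: $!";
    read( $fh, my $header, 10 ) == 10 or die "Short read on tag header";
    my ( $magic, $major, $revision, $flags, $size ) = unpack 'A3 C C C a4', $header;
    die "No ID3v2 tag found" unless $magic eq 'ID3';

    my $id3_end  = 10 + synchsafe_to_int($size);
    my $position = 10;
    my @frames;
    while ( $position + 10 <= $id3_end ) {
        seek( $fh, $position, SEEK_SET ) or die "Could not seek: $!";
        read( $fh, my $frame_header, 10 ) == 10 or last;
        my ( $frame_id, $raw_size, $frame_flags ) = unpack 'a4 a4 n', $frame_header;
        last if $frame_id =~ /[^A-Z0-9]/;    # padding or garbage: stop
        my $frame_size = $major >= 4
            ? synchsafe_to_int($raw_size)    # v2.4: synchsafe frame size
            : unpack( 'N', $raw_size );      # v2.3: plain big-endian size
        $position += 10 + $frame_size;       # skip header AND frame data
        last if $position > $id3_end;        # sanity check from the reply
        read( $fh, my $data, $frame_size ) == $frame_size or last;
        push @frames, [ $frame_id, $data ];
    }
    return @frames;
}

# Build one text frame: encoding byte 0 (ISO-8859-1) followed by the text.
sub make_frame {
    my ( $id, $text ) = @_;
    my $body = "\0" . $text;
    return $id . pack( 'N', length $body ) . "\0\0" . $body;
}

# Synthetic v2.3 tag: two frames plus 20 bytes of padding.
my $body = make_frame( 'TPE1', 'Some Artist' )
         . make_frame( 'TALB', 'Some Album' )
         . ( "\0" x 20 );
my $len  = length $body;
my $tag  = 'ID3' . pack( 'C3', 3, 0, 0 )
         . pack( 'C4', map { ( $len >> $_ ) & 0x7F } 21, 14, 7, 0 )
         . $body;

open( my $fh, '<', \$tag ) or die "Cannot open in-memory tag: $!";
binmode $fh;
for my $frame ( read_frames($fh) ) {
    my ( $id, $data ) = @$frame;
    printf "%s => %s\n", $id, substr( $data, 1 );    # drop the encoding byte
}
```

The key point is the `$position += 10 + $frame_size` step: each iteration jumps over the frame data as well as the 10-byte frame header, so the next read starts at a real frame boundary. The loop ends on padding or at the end of the tag instead of depending on a particular last frame ID.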

Re: ID3v2 TAG Footer Reading goes wrong
by Corion (Pope) on Jan 06, 2014 at 15:34 UTC

    You are doing this as a programming exercise, but you are aware that ID3v2 points out already existing solutions?

      To: Corion, You are right, there are some existing modules that I could use. I am trying to make a mini version of them by writing the code myself, for training purposes. Nothing special, but unfortunately I ended up getting stuck.

Re: ID3v2 TAG Footer Reading goes wrong (more subs)
by Anonymous Monk on Jan 06, 2014 at 23:06 UTC

    I see you're doing it the hard way ... I don't have particular interest in ID3v2 format (and thus your program) but I did have these "helpers" lying around .. simplified my program , maybe they can do the same for you (and you can add more of these)

    GititID3v2( $file ); ... more subs :)

    sub ReadBytes {
        my( $fh, $bytes ) = @_;
        $bytes or Carp::croak 'Usage: ReadBytes( $filehandle, $bytes ) ';
        my $readed = read $fh, my($data) , $bytes;
        $readed == $bytes or Carp::warn "Only read($readed) but wanted($bytes): $! ## $^E ";
        $data;
    }
    sub Int8   { unpack 'c', $_[-1] }
    sub UInt8  { unpack 'C', $_[-1] }
    sub Int16  { unpack 's<', $_[-1] }
    sub UInt16 { unpack 'S<', $_[-1] }
    sub Int32  { unpack 'l<', $_[-1] }
    sub UInt32 { unpack 'L<', $_[-1] }
    sub Int64  { unpack( ( CAN_PACK_QUADS ? 'q<' : 'a8' ), $_[-1] ) }
    sub UInt64 { unpack( ( CAN_PACK_QUADS ? 'Q<' : 'a8' ), $_[-1] ) }
    sub ReadInt8   { Int8( ReadBytes( $_[-1], 8 /8 ) ); }
    sub ReadUInt8  { UInt8( ReadBytes( $_[-1], 8 /8 ) ); }
    sub ReadInt16  { Int16( ReadBytes( $_[-1], 16/8 ) ); }
    sub ReadUInt16 { UInt16( ReadBytes( $_[-1], 16/8 ) ); }
    sub ReadInt32  { Int32( ReadBytes( $_[-1], 32/8 ) ); }
    sub ReadUInt32 { UInt32( ReadBytes( $_[-1], 32/8 ) ); }
    sub ReadInt64  { Int64( ReadBytes( $_[-1], 64/8 ) ); }
    sub ReadUInt64 { UInt64( ReadBytes( $_[-1], 64/8 ) ); }

      Bah, typos/missing

      ReadBytes(\*STDOUT,10); ## a test :)
      sub ReadBytes {
          my( $fh, $bytes ) = @_;
          $bytes or Carp::croak 'Usage: ReadBytes( $filehandle, $bytes ) ';
          my $readed = read $fh, my($data) , $bytes;
          $readed == $bytes or Carp::carp "Only read($readed) but wanted($bytes): $! ## $^E ";
          $data;
      }
      use constant CAN_PACK_QUADS => !! eval { my $f = pack 'q'; 1 };
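
One caveat when applying helpers like these to ID3v2: the `<` modifier selects little-endian byte order, while ID3v2 stores its multi-byte integers most significant byte first. A hedged sketch of big-endian counterparts follows. The names `UInt16BE`, `UInt32BE`, `ReadUInt16BE` and `ReadUInt32BE` are hypothetical additions, and this variant of `ReadBytes` croaks on a short read instead of warning.

```perl
use strict;
use warnings;
use Carp ();

# Read exactly $bytes bytes from $fh, or croak (stricter than the original,
# which only warns on a short read).
sub ReadBytes {
    my ( $fh, $bytes ) = @_;
    $bytes or Carp::croak('Usage: ReadBytes( $filehandle, $bytes )');
    my $readed = read( $fh, my $data, $bytes );
    defined $readed && $readed == $bytes
        or Carp::croak( 'Only read('
            . ( defined $readed ? $readed : 'undef' )
            . ") but wanted($bytes)" );
    return $data;
}

# Big-endian ("network order") variants; hypothetical names for this sketch.
sub UInt16BE { unpack 'n', $_[-1] }
sub UInt32BE { unpack 'N', $_[-1] }
sub ReadUInt16BE { UInt16BE( ReadBytes( $_[-1], 16 / 8 ) ) }
sub ReadUInt32BE { UInt32BE( ReadBytes( $_[-1], 32 / 8 ) ) }

# Try them on an in-memory filehandle instead of a real file.
my $bytes = "\x00\x00\x01\x02\xAB\xCD";
open( my $fh, '<', \$bytes ) or die "Cannot open in-memory data: $!";
binmode $fh;
printf "%d\n",     ReadUInt32BE($fh);    # bytes 00 00 01 02
printf "0x%04X\n", ReadUInt16BE($fh);    # bytes AB CD
```

With big-endian unpacking the first four bytes come out as 258 and the next two as 0xABCD; the little-endian `L<` and `S<` templates would give different values for the same bytes.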
