
Cut exact time range from compressed audio

by Sixtease (Friar)
on Apr 07, 2012 at 19:03 UTC (#963937)
Sixtease has asked for the wisdom of the Perl Monks concerning the following question:

So, I am building a web application for transcribing speech. Mp3 and ogg/vorbis files reside in an external CDN. Each file holds about 1 hour of playback. The recordings get automatically transcribed by means of ASR (automatic speech recognition). The user listens to the audio and sees the transcription, which contains errors. The user selects a span of erroneously transcribed text and provides the correct transcription, which is sent to the server.

The server gets the filename of the audio, the timestamps of the start and end of the corrected passage, and the actual transcription.

Now the server does the forced alignment of the text to the audio, which means it must access the exact timespan of the audio file. And here comes the problem. The audio file is not on the server -- it is on a CDN. There's just too much data (~50GB) to store it on the web server.

So I thought I'd request just the needed part of the audio file via the HTTP Range header. But then you have a fragment of an mp3 or ogg file, and I don't know how to find out which exact time position in the whole file the fragment represents.

I tried using Audio::Mad and Ogg::Vorbis::Decoder, but both failed to decode file fragments. I would be thankful for insights and advice, not only on how to decode the audio file fragments but also on other possible solutions.

use strict; use warnings; print "Just Another Perl Hacker\n";

Re: Cut exact time range from compressed audio
by Corion (Pope) on Apr 07, 2012 at 19:12 UTC

    At least mp3, that is MPEG-1 Audio Layer 3, is divided up into "frames", small chunks of audio whose byte length follows from the bitrate and sample rate stated in that frame's header. You can create an index of these frames and then seek in the audio file to the desired location. Alternatively, you can just guess the location of the chunk and start reading a bit before it and a bit after its estimated end - every frame header starts with a sync pattern (eleven set bits, so 0xFF followed by a byte whose top three bits are set, e.g. ff fb) that will allow you to resynchronize to the stream. I'm not aware of any time code in the MPEG stream, so I recommend going with the separate index to reduce the webserver traffic. Maybe even consider adding an external API to request a certain time - seeking on the server is likely faster than requesting parts of the file via HTTP Range headers and hoping you found the correct offset.
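
To make the index idea concrete, here is a minimal sketch that scans a buffer for MPEG-1 Layer 3 frame headers and records the byte offset and start time of each frame. The sub names (parse_header, build_frame_index) are made up for illustration, and the sketch assumes a constant sample rate, skips free-format frames, and ignores other MPEG versions and layers as well as ID3 tags. Treat it as a starting point, not a tested indexer.

```perl
use strict;
use warnings;

# MPEG-1 Layer 3 tables; index 0 is "free format", which this sketch skips.
my @BITRATE_KBPS = (0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320);
my @SAMPLE_RATE  = (44100, 48000, 32000);
my $SAMPLES_PER_FRAME = 1152;

# Return { length, sample_rate } if a valid MPEG-1 Layer 3 header
# starts at byte $pos of $buf, otherwise return nothing.
sub parse_header {
    my ($buf, $pos) = @_;
    return if $pos + 4 > length $buf;
    my $h = unpack 'N', substr($buf, $pos, 4);
    return unless ($h >> 21) == 0x7FF;         # 11 sync bits
    return unless (($h >> 19) & 0x3) == 3;     # version: MPEG-1
    return unless (($h >> 17) & 0x3) == 1;     # layer: Layer 3
    my $br_idx = ($h >> 12) & 0xF;
    my $sr_idx = ($h >> 10) & 0x3;
    return if $br_idx == 0 || $br_idx == 15 || $sr_idx == 3;
    my $padding = ($h >> 9) & 0x1;
    my $rate    = $SAMPLE_RATE[$sr_idx];
    my $len     = int(144 * $BITRATE_KBPS[$br_idx] * 1000 / $rate) + $padding;
    return { length => $len, sample_rate => $rate };
}

# Walk the buffer and return an array ref of { offset, time } entries,
# one per frame. Bytes that do not start a frame are skipped one at a time.
sub build_frame_index {
    my ($buf) = @_;
    my @index;
    my ($pos, $samples) = (0, 0);
    while ($pos + 4 <= length $buf) {
        my $frame = parse_header($buf, $pos);
        my $next  = $frame ? $pos + $frame->{length} : 0;
        # Guard against false syncs inside audio data: accept a frame only
        # if another header follows it or the buffer ends before we can check.
        if ($frame && ($next + 4 > length($buf) || parse_header($buf, $next))) {
            push @index, { offset => $pos, time => $samples / $frame->{sample_rate} };
            $samples += $SAMPLES_PER_FRAME;
            $pos = $next;
        }
        else {
            $pos++;
        }
    }
    return \@index;
}
```

Run once per file (for example when the file is uploaded to the CDN), the resulting list of offsets and times can be stored next to the file name, so the server only has to look up the byte range for a given timestamp before issuing the Range request.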

    But for doing sub-second exact cutting (or playback), you will need to implement a way of seeking in the decompressed stream as well, because each frame represents a "large" amount of sound (1152 samples for MPEG-1 Layer 3, about 26 ms at 44.1 kHz). Maybe your playback libraries can start playback in the middle of a frame (after they read and discard the first part of the frame).
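
The arithmetic for that second step could look roughly like this. The sub name plan_cut and the default of two "warm-up" frames are assumptions for illustration: Layer 3 frames can borrow data from preceding frames (the bit reservoir), so a decoder usually needs a little lead-in before its output is correct, and how much is enough depends on the file. Real encoders and decoders also add a fixed delay of some hundreds of samples, which this sketch ignores.

```perl
use strict;
use warnings;

# Given a time span in seconds, work out which frames to fetch and how
# many decoded samples to throw away before the wanted audio begins.
# Assumes MPEG-1 Layer 3 (1152 samples per frame) and non-negative times.
sub plan_cut {
    my ($start_s, $end_s, $sample_rate, $warmup_frames) = @_;
    my $spf = 1152;
    $warmup_frames //= 2;    # assumed lead-in for the bit reservoir

    # Round the timestamps to whole sample positions.
    my $first_sample = int($start_s * $sample_rate + 0.5);
    my $last_sample  = int($end_s   * $sample_rate + 0.5);   # exclusive

    my $first_frame = int($first_sample / $spf);
    my $last_frame  = int(($last_sample - 1) / $spf);

    # Start a few frames early, but never before the start of the file.
    my $fetch_first = $first_frame - $warmup_frames;
    $fetch_first = 0 if $fetch_first < 0;

    return {
        fetch_first_frame => $fetch_first,
        fetch_last_frame  => $last_frame,
        skip_samples      => $first_sample - $fetch_first * $spf,
        keep_samples      => $last_sample - $first_sample,
    };
}
```

With a frame index at hand, fetch_first_frame and fetch_last_frame translate directly into the byte range to request; after decoding that fragment, drop skip_samples samples per channel and keep the next keep_samples.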
