PerlMonks  

Insane (?) Regexp-based jpeg (JFIF) extractor...

by blazar (Canon)
on Oct 30, 2008 at 15:13 UTC ( #720495=CUFP )

I personally believe that there are tons of better, more reliable, format-specific extractors of images for "container" files like pdf or (those damned) pps, either free (in one of the two senses of the word) or not. But yesterday I quickly needed to extract the jpeg images contained in one such file, so I checked JPEG @ Wikipedia for the header markers and concocted the following, possibly complete with some minimal YAGNI (I know I shouldn't) for future development: it worked for me!

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;
use open IO => ':raw';

use File::Basename;

my $progname;
BEGIN {
    ($progname) = fileparse $0, qr/\.pl/i;
}

local $/;
my %cnt;

while (<>) {
    while ( /(\xFF\xD8 .*? \xFF\xD9)/xsg ) {
        for my $name ($ARGV . ++$cnt{$ARGV} . '.jpeg') {
            open my $fh, '>', $name
                or die "Can't open `$name': $!\n";
            warn "[$progname] Creating `$name'\n";
            print $fh $1;
        }
    }
}

__END__
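For anyone who wants to poke at the matching step without touching files, here is a minimal sketch of the same idea isolated in a sub. The name extract_jpeg_candidates and the sample $blob are made up for illustration and are not part of the script above. It only looks for the SOI (FF D8) and EOI (FF D9) markers, so anything it returns is a candidate, not a validated image.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): return every
# SOI..EOI candidate found in a binary string.
sub extract_jpeg_candidates {
    my ($data) = @_;
    my @found;
    # \xFF\xD8 is the SOI marker, \xFF\xD9 the EOI marker.
    # /x permits the spaces in the pattern, /s lets . match "\n",
    # /g walks through all non-overlapping matches.
    while ( $data =~ /(\xFF\xD8 .*? \xFF\xD9)/xsg ) {
        push @found, $1;
    }
    return @found;
}

# Made-up test data: two fake "images" surrounded by junk bytes.
my $blob = "junk" . "\xFF\xD8AAA\xFF\xD9"
         . "more\njunk" . "\xFF\xD8B\nB\xFF\xD9" . "tail";

my @images = extract_jpeg_candidates($blob);
printf "%d candidate(s), lengths: %s\n",
    scalar @images, join ', ', map { length } @images;
```

Because the match is non-greedy, it stops at the first FF D9 it meets. A jpeg that embeds another complete jpeg stream (a thumbnail, say) may therefore come out truncated, which is one more reason to treat this as a quick hack.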
--
If you can't understand the incipit, then please check the IPB Campaign.

Replies are listed 'Best First'.
Re: Insane (?) Regexp-based jpeg (JFIF) extractor...
by ikegami (Pope) on Oct 31, 2008 at 09:15 UTC
    use open IO => ':raw'; doesn't affect <> so this won't work on Windows.

      I personally believe that you were very kind to let me know. But at the same time I must say that I have actually tested it several times (well, three files, actually) under Windows and it worked as expected: was I just lucky?

      So... I used open because it seemed the quickest and cleanest WTDI, but what do you propose as a workaround or solution? (Short of manually open()ing the files of course...)

      Update: I just actually checked with one single jpeg image and found an actual example that supports your claim. Incidentally, I would say that open.pm's docs do not make it clear enough that it won't work with <>.

      --
      If you can't understand the incipit, then please check the IPB Campaign.
        By the way, I found open doesn't work on open(my $fh, '-') either.
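On the <> issue raised in this subthread: one workaround is the one already mentioned in passing, namely opening each file explicitly with a :raw layer, so that no CRLF translation happens on Windows. The following is only a sketch under that assumption; slurp_raw is a hypothetical helper name, and the output naming simply mimics the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp one file as raw bytes, bypassing <> entirely.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path
        or die "Can't open `$path': $!\n";
    local $/;                  # slurp mode, restored when the sub returns
    my $data = <$fh>;
    close $fh;
    return $data;
}

for my $file (@ARGV) {
    my $data = slurp_raw($file);
    my $n    = 0;
    while ( $data =~ /(\xFF\xD8 .*? \xFF\xD9)/xsg ) {
        my $img  = $1;         # copy the capture before doing anything else
        my $name = $file . ++$n . '.jpeg';
        open my $out, '>:raw', $name
            or die "Can't open `$name': $!\n";
        print {$out} $img;
        close $out
            or die "Can't close `$name': $!\n";
        warn "Creating `$name'\n";
    }
}
```

The explicit '<:raw' and '>:raw' modes make the binary intent visible at each open, at the cost of losing the convenience of <>.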
Re: Insane (?) Regexp-based jpeg (JFIF) extractor...
by Discipulus (Abbot) on Oct 31, 2008 at 08:17 UTC
    ++ for you, and you should know that it runs fine on 5.8.8 too. Cheers

    Lor*
Re: Insane (?) Regexp-based jpeg (JFIF) extractor...
by wol (Hermit) on Oct 31, 2008 at 11:46 UTC
    Pure hackery. Never mind any parsing nonsense, just wade in and find the bytes :-)

    Also, many thanks for the YAGNI link - it, and the linked pages, might help back me up in negotiations with some of my $work colleagues about what we should (or rather shouldn't) do.

    --
    .sig : File not found.

Node Type: CUFP [id://720495]
Approved by Perlbotics
Front-paged by Arunbear