Bookmarking PDF by string

by ReverendDovie (Initiate)
on Jan 16, 2019 at 20:05 UTC

ReverendDovie has asked for the wisdom of the Perl Monks concerning the following question:

Hello, first time poster. Hope I get it right.

I'm trying to do the following:

1) Convert an HTML page to a PDF
2) Add the appropriate bookmarks in that PDF
3) Join it with a pre-fab "title page" PDF

I have found answers to 1 and 3 (I'm pretty sure, though I haven't fully tested them yet). Number 2 is giving me trouble. The bookmarking ability of PDF::Reuse comes close, but I want to bookmark a specific string rather than a page number: the PDF was only just generated in step 1, so I won't necessarily know the right page number.

Is there a way to do the bookmarking thing but to a specific string (which I can preset when building the HTML)?

Thank you

Replies are listed 'Best First'.
Re: Bookmarking PDF by string
by LanX (Archbishop) on Jan 16, 2019 at 22:13 UTC
    > Is there a way to do the bookmarking thing but to a specific string (which I can preset when building the HTML)?

    This is a very specific PDF question, with practically no Perl involved apart from the mention of PDF::Reuse.

    I think your first step should be to clarify if it's even possible.

    I'd suggest taking a look at pdflatex and family, because LaTeX is normally very strong at cross-referencing and at keeping the "hyper structure" of its document.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Perl is like chess, only without the dice

Re: Bookmarking PDF by string
by vr (Deacon) on Jan 17, 2019 at 00:04 UTC
    bookmarking ability of PDF::Reuse

    The POD is misleading: this simply jumps to the target page, at the same coordinates and zoom as on the currently active page. I doubt that's what is expected if the user wants to go to a particular line of text. So expect, at the least, to read the manuals and patch the source.

    Also, what kind of "specific string"? E.g. wkhtmltopdf adds bookmarks for what were HTML headers, and I think any decent converter has similar abilities. That said, extracting text (either plain text or XML with coordinates and attributes) from the PDF page by page, to find the page number containing the required string, is the obvious approach, and perhaps you are already testing this solution...
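A minimal sketch of the page-by-page search described above, assuming CAM::PDF is installed and that the marker strings survive text extraction intact (short, distinctive tokens with no spaces tend to be safest, since extracted text can be split by layout). The helper names `find_marker_pages` and `page_texts_from_pdf` and the `BM_` marker convention are made up for illustration; `numPages` and `getPageText` are CAM::PDF methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Given one text string per page (index 0 = page 1) and a list of marker
# strings, return a hashref mapping each marker to the first 1-based page
# number whose text contains it. Markers that never appear are omitted.
sub find_marker_pages {
    my ($page_texts, $markers) = @_;
    my %page_of;
    for my $i (0 .. $#{$page_texts}) {
        my $text = $page_texts->[$i] // '';
        for my $marker (@$markers) {
            next if exists $page_of{$marker};
            $page_of{$marker} = $i + 1 if index($text, $marker) >= 0;
        }
    }
    return \%page_of;
}

# Pull the text of every page out of a PDF with CAM::PDF.
# The module is loaded lazily so the pure search above works without it.
sub page_texts_from_pdf {
    my ($file) = @_;
    require CAM::PDF;
    my $doc = CAM::PDF->new($file) or die "Cannot open $file\n";
    return [ map { $doc->getPageText($_) } 1 .. $doc->numPages() ];
}

# Hypothetical command-line use: script.pl generated.pdf BM_CH1 BM_CH2
if (@ARGV >= 2) {
    my ($file, @markers) = @ARGV;
    my $found = find_marker_pages(page_texts_from_pdf($file), \@markers);
    printf "%s => page %d\n", $_, $found->{$_} for sort keys %$found;
}
```

The search is kept separate from the extraction so that a different extractor (for example one that emits XML with coordinates) could feed the same lookup.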

    Edit: PDF::API2::Outline also adds bookmarks.
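As a rough sketch of what adding bookmarks with PDF::API2 might look like once the target page numbers are known: the code below assumes a hash of bookmark titles to 1-based page numbers (however obtained) and adds one top-level outline entry per title. The helper names `ordered_bookmarks` and `add_bookmarks` are hypothetical. The method names `outlines`, `outline`, `title`, `dest`, `openpage` and `saveas` are the ones PDF::API2 documented around the time of this thread; newer releases may prefer other names, so check the POD for the installed version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a { title => page_number } hashref into a list of [title, page]
# pairs ordered by page, then by title, so bookmarks follow document order.
sub ordered_bookmarks {
    my ($page_of) = @_;
    return map  { [ $_, $page_of->{$_} ] }
           sort { $page_of->{$a} <=> $page_of->{$b} or $a cmp $b }
           keys %$page_of;
}

# Add one top-level bookmark per title to $in and write the result to $out.
# PDF::API2 is loaded lazily so the ordering helper works without it.
sub add_bookmarks {
    my ($in, $out, $page_of) = @_;
    require PDF::API2;
    my $pdf  = PDF::API2->open($in);
    my $root = $pdf->outlines;                  # root of the outline tree
    for my $bm (ordered_bookmarks($page_of)) {
        my ($title, $page_no) = @$bm;
        my $item = $root->outline;              # new child entry
        $item->title($title);
        $item->dest($pdf->openpage($page_no));  # jump to that page
    }
    $pdf->saveas($out);
    return;
}

# Hypothetical use: script.pl in.pdf out.pdf "Chapter 1=2" "Chapter 2=5"
if (@ARGV >= 3) {
    my ($in, $out, @pairs) = @ARGV;
    my %page_of = map { /^(.*)=(\d+)$/ ? ($1 => $2) : () } @pairs;
    add_bookmarks($in, $out, \%page_of);
}
```

This only targets whole pages; jumping to the exact position of a string on the page would additionally need the string's coordinates, which plain text extraction does not provide.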

Re: Bookmarking PDF by string
by karlgoethebier (Monsignor) on Jan 17, 2019 at 17:42 UTC
    "...preset when building the HTML"

    Ouch! This task is a pain in the ass.

    I had more or less success with a workflow like this:

    • DocBook as basic markup language
    • Oxygen for the transformation to HTML
    • dblatex for the transformation to LaTeX/PDF

    You may try to convert the XML directly to PDF with Oxygen but the layout with LaTeX is much better out-of-the-box.

    Anyway: you will need a lot of time and patience. See also.

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'
