<?xml version="1.0" encoding="windows-1252"?>
<node id="831521" title="Re: How to invoke pdftotext and extract first line of text from PDF file?" created="2010-03-28 19:23:21" updated="2010-03-28 19:23:21">
<type id="11">
note</type>
<author id="708738">
LanX</author>
<data>
<field name="doctext">
That's what I [id://831190|did]:
&lt;c&gt;
  # list form: no shell involved, so spaces or metacharacters in $file are safe
  open ( my $fh, "-|", "pdftotext", "-layout", $file, "-" ) or
    die "error extracting $file: $!";
&lt;/c&gt;&lt;P&gt;
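For the original question (the first line of text), a minimal sketch might look like the following. The sub names first_text_line and first_line_of_pdf are made up for illustration. I'm assuming that "first line" means the first line of pdftotext output containing any non-whitespace character, which need not be the visually topmost line on the page.

```perl
use strict;
use warnings;

# Return the first line that contains a non-whitespace character,
# with surrounding whitespace trimmed, or undef if there is none.
sub first_text_line {
    my ($fh) = @_;
    while ( defined( my $line = readline($fh) ) ) {
        next unless $line =~ /\S/;    # skip blank lines
        $line =~ s/^\s+|\s+$//g;      # trim leading/trailing whitespace
        return $line;
    }
    return;
}

# Hypothetical wrapper: run pdftotext (assumed to be in PATH) and
# return the first text line of its output.
sub first_line_of_pdf {
    my ($file) = @_;
    open( my $fh, "-|", "pdftotext", "-layout", $file, "-" )
        or die "error extracting $file: $!";
    my $line = first_text_line($fh);
    # We stop reading early, so close may report a failure from the
    # child process; that result is deliberately ignored here.
    close $fh;
    return $line;
}
```

Calling first_line_of_pdf($file) then returns that line, or undef if pdftotext produced no text at all.&lt;P&gt;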

But I really recommend using &lt;c&gt;pdftohtml -xml -stdout&lt;/c&gt; instead if you need reliable information about text position, page number and the font used (family, size and color).&lt;P&gt;


&lt;!-- Node text goes above. Div tags should contain sig only --&gt;
&lt;div class="pmsig"&gt;&lt;div class="pmsig-708738"&gt;
&lt;p&gt;Cheers Rolf
&lt;/div&gt;&lt;/div&gt;</field>
<field name="root_node">
831519</field>
<field name="parent_node">
831519</field>
</data>
</node>
