PerlMonks  

Re: Going from PDF to GEDCOM

by Anonymous Monk
on Nov 08, 2010 at 16:23 UTC (#870144)


in reply to Going from PDF to GEDCOM

First idea: run strings, count the number of occurrences of each word, and you'll get the most common words (burial/in/on/he/she/they/died/born/married).
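
A minimal sketch of the word-count idea, in Python. It assumes the text has already been pulled out of the PDF (for example with strings) and is available as a plain string; the function name common_words and the sample text are made up for illustration.

```python
import re
from collections import Counter


def common_words(text, top=10):
    """Return the `top` most frequent words in `text` as (word, count) pairs."""
    # Lowercase first so "He" and "he" are counted together;
    # digits and punctuation are ignored by the pattern.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top)


if __name__ == "__main__":
    # Hypothetical sample, not taken from the real file.
    sample = "He was born in 1900. She was born in 1902. He died in 1970."
    for word, count in common_words(sample, 5):
        print(word, count)
```

The words that float to the top of a real run are the candidates for the keyword list used in the later steps.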

To get sentences, slurp a page and split on a period that is not followed by a comma (or other punctuation).
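
One way to express "a period not followed by other punctuation" is a regex with a negative lookahead. This is only a sketch: the name split_sentences and the exact set of punctuation characters are assumptions, and abbreviations followed by a space (such as "Jan. 4") would still be split in the wrong place.

```python
import re

# A period ends a sentence only if it is NOT immediately followed
# by another punctuation mark (so "Mo.," stays inside its sentence).
SENTENCE_END = re.compile(r"\.(?![,;:.!?)])\s*")


def split_sentences(page):
    """Split a page of text into sentences, dropping empty pieces."""
    return [s.strip() for s in SENTENCE_END.split(page) if s.strip()]


if __name__ == "__main__":
    # Hypothetical sample page.
    page = "He was born in Springfield, Mo., in 1900. He died in 1970."
    for sentence in split_sentences(page):
        print(sentence)
```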

Then split into parts based on the common words and do something with them.
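
A rough sketch of splitting a sentence on the common words. The keyword list here is an assumed subset of the words mentioned above, and split_on_keywords is a hypothetical helper; a keyword that appears twice in one sentence simply overwrites the earlier entry.

```python
import re

# Assumed subset of the "common words"; extend after looking at real counts.
KEYWORDS = ("born", "died", "married", "burial")
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def split_on_keywords(sentence):
    """Return (lead_text, {keyword: text following that keyword})."""
    # With a capturing group, re.split keeps the keywords:
    # [lead, kw1, text1, kw2, text2, ...]
    parts = KEYWORD_RE.split(sentence)
    lead = parts[0].strip()
    events = {}
    for keyword, text in zip(parts[1::2], parts[2::2]):
        events[keyword.lower()] = text.strip(" ,.")
    return lead, events


if __name__ == "__main__":
    # Hypothetical sentence.
    lead, events = split_on_keywords(
        "John Smith was born 4 Jan 1900 in Ohio, died 2 Feb 1970 in Texas."
    )
    print(lead)
    print(events)
```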

But I've no idea how a sentence (or a bunch of them) translates into GEDCOM calls.

How did you generate the sentences in the first place? Reverse that process.


Re^2: Going from PDF to GEDCOM
by jedikaiti (Friar) on Nov 08, 2010 at 16:42 UTC

    Thanks! The getting-into-PDF process was actually automated by Family Tree Maker software, into which the GEDCOM file will be imported (once I find and reinstall it - STILL unpacking from moving in July!).

    Yes, I think I can start by separating individuals by looking for lines that begin with a number, a period, and a space. Well, most. Spouses may need to be identified using the common words method.
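
The "number, period, space" rule could be sketched like this. The record layout (a dict with "number" and "lines") and the name split_individuals are assumptions for illustration, and a wrapped line that happens to start with something like "4. " would be mistaken for a new person, which is part of the "well, most" caveat.

```python
import re

# A new individual starts on a line like "12. Some Name ..."
ENTRY_START = re.compile(r"^(\d+)\. (.*)")


def split_individuals(lines):
    """Group lines into records; each record starts at an 'N. text' line."""
    records = []
    for line in lines:
        match = ENTRY_START.match(line)
        if match:
            records.append(
                {"number": int(match.group(1)), "lines": [match.group(2).strip()]}
            )
        elif records and line.strip():
            # Continuation line: attach it to the most recent individual.
            records[-1]["lines"].append(line.strip())
        # Lines before the first numbered entry are ignored.
    return records


if __name__ == "__main__":
    # Hypothetical input lines.
    sample = [
        "1. John Smith, born 1900",
        "   married Mary Jones",
        "2. Anne Smith, born 1925",
    ]
    for record in split_individuals(sample):
        print(record["number"], record["lines"])
```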

    Gave it a little more thought late last night: most people in this file are listed twice, once as the children of their parents (short listing) and once in the more detailed individual listings. So I also need to match those up and account for possible identical names. I think I can do that by getting the basic info (name, DOB, DOD) from the short listing, then, for the individual listings, merging any records for which those 3 details match and treating anyone else as a new individual.
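
The matching plan might look roughly like the sketch below, keyed on the (name, DOB, DOD) triple. The dict field names "name", "dob" and "dod" and the function merge_listings are hypothetical, and the dates are compared as plain strings, so they would need to be normalised to one format first.

```python
def person_key(record):
    """The three details used to decide whether two listings are one person."""
    return (record["name"], record["dob"], record["dod"])


def merge_listings(short_listings, detailed_listings):
    """Merge detailed records into short ones when name, DOB and DOD all match."""
    people = {}
    for record in short_listings:
        people.setdefault(person_key(record), dict(record))
    for record in detailed_listings:
        key = person_key(record)
        if key in people:
            # Same person: add the extra details to the existing entry.
            people[key].update(record)
        else:
            # No match on all three details: treat as a new individual.
            people[key] = dict(record)
    return list(people.values())


if __name__ == "__main__":
    # Hypothetical records.
    short = [{"name": "John Smith", "dob": "1900", "dod": "1970"}]
    detailed = [
        {"name": "John Smith", "dob": "1900", "dod": "1970", "burial": "Ohio"},
        {"name": "John Smith", "dob": "1930", "dod": "1990"},
    ]
    for person in merge_listings(short, detailed):
        print(person)
```

Two people who share a name but differ in either date end up as separate entries, which is the same-name case mentioned above.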

    Thanks again!
    Kaiti
    Swiss Army Nerd
