Internally, a PDF resembles an XML tree, and Adobe has the Mars project to create a lossless, semi-open, XML-ish format that converts to PDF and back again. Strictly speaking, though, a PDF can NEVER be represented as a pure tree, because "Indirect Objects" let one node reference another, which creates circular paths. I've found this Acrobat add-on very good at showing the full PDF COS tree and allowing manual editing of it: http://www.windjack.com/product/pdfcanopener/ but it's not a FOSS tool. From a quick look on CPAN, there are many libraries that will give you access to a PDF's COS tree. Not all PDFs can be parsed automatically by software, however.
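To illustrate the point about indirect objects creating circular paths, here is a minimal sketch in Python. It does not read a real PDF: the `objects` table, the `("ref", n)` tuples standing in for an indirect reference like `n 0 R`, and the helper names `refs` and `find_cycle` are all made up for illustration. The one real-world detail is that a page normally points back to its parent page-tree node, which is already enough to form a loop.

```python
# Toy model of a PDF object table: object number -> dictionary.
# A value of the form ("ref", n) stands in for an indirect reference "n 0 R".
objects = {
    1: {"Type": "Catalog", "Pages": ("ref", 2)},
    2: {"Type": "Pages", "Kids": [("ref", 3)]},
    3: {"Type": "Page", "Parent": ("ref", 2), "Contents": ("ref", 4)},
    4: {"Type": "Stream"},
}


def refs(value):
    """Yield every object number referenced anywhere inside a value."""
    if isinstance(value, tuple) and len(value) == 2 and value[0] == "ref":
        yield value[1]
    elif isinstance(value, dict):
        for v in value.values():
            yield from refs(v)
    elif isinstance(value, list):
        for v in value:
            yield from refs(v)


def find_cycle(objects, start):
    """Depth-first search from `start`; return one cycle as a list, or None."""
    path = []        # object numbers on the current DFS path, in order
    on_path = set()  # same contents as `path`, for fast membership tests
    done = set()     # objects fully explored without finding a cycle

    def visit(n):
        if n in on_path:
            # We came back to an object we are still exploring: a cycle.
            return path[path.index(n):] + [n]
        if n in done:
            return None
        path.append(n)
        on_path.add(n)
        for m in refs(objects[n]):
            cycle = visit(m)
            if cycle:
                return cycle
        path.pop()
        on_path.remove(n)
        done.add(n)
        return None

    return visit(start)


print(find_cycle(objects, 1))  # [2, 3, 2]: Pages -> Page -> back to Pages
```

Any tool that walks the COS "tree" naively would recurse forever on a structure like this, so it has to track visited objects the way `done` and `on_path` do here.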
A PDF can be nothing more than one scanned 8.5x11 JPEG per page. A PDF's text might also look like perfect vector graphics (zoom to 1600%) and yet be impossible to highlight. I once opened such a file in a PDF editor, and EVERY character was made of dozens of vector graphics primitives. The file had been made in Adobe Illustrator, and somewhere during the conversion all the fonts were turned into vector graphics and were not text anymore. Try extracting text when the letter 'a' is 10 rectangles and Bezier curves, each an independent, individually editable shape. Your only choice might be to try OCRing it, since there is no text in the COS tree.
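One rough way to tell these cases apart is to look at which operators a page's content stream uses. The sketch below assumes you already have the content stream as decoded text (real streams are usually compressed, so a PDF library has to decode them first). `count_ops` is a hypothetical helper, and its tokenizer is deliberately naive: it ignores nested parentheses in strings, inline images, and so on. The operator names themselves (`Tj`, `TJ`, `'`, `"` for showing text; `m`, `l`, `c`, `v`, `y`, `h`, `re` for building paths) are the standard PDF ones.

```python
import re

TEXT_OPS = {"Tj", "TJ", "'", '"'}                 # text-showing operators
PATH_OPS = {"m", "l", "c", "v", "y", "h", "re"}   # path-construction operators


def count_ops(content):
    """Return (text_op_count, path_op_count) for a decoded content stream."""
    # Drop literal strings "( ... )" so words inside them are not
    # mistaken for operators. Naive: nested parentheses are not handled.
    s = re.sub(r"\((?:\\.|[^\\()])*\)", " ", content)
    # Drop hex strings "<...>" and treat array brackets as whitespace.
    s = re.sub(r"<[0-9A-Fa-f\s]*>", " ", s)
    s = s.replace("[", " ").replace("]", " ")

    text = paths = 0
    for tok in s.split():
        if tok in TEXT_OPS:
            text += 1
        elif tok in PATH_OPS:
            paths += 1
    return text, paths


# Made-up example streams for the three situations described above.
real_text = "BT /F1 12 Tf 72 720 Td (Hello a m l) Tj ET"
outlined = "10 10 m 20 20 l 30 30 40 40 50 50 c h f 5 5 10 10 re f"
scanned = "q 612 0 0 792 0 0 cm /Im0 Do Q"

print(count_ops(real_text))  # (1, 0): text operators present
print(count_ops(outlined))   # (0, 5): only path drawing
print(count_ops(scanned))    # (0, 0): just an image being painted
```

A page with no text operators at all, whether it is full of paths or just paints one image, is a strong hint that there is nothing to extract and OCR is the remaining option.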
Since this is the government, try thinking about "accessibility" support. Researching those routes should get you something that is supposed to be screen-reader friendly, which always means computer parsable. Your text files without the formulas might already meet ADA screen reader compatibility (I don't know), in which case you won't get anything better than that. The Federal Register is public domain, so you can just copy each formula out of the PDF as a bitmap or as vector graphics into the destination without the computer ever understanding it.
From a quick look at that PDF, the formulas are stored as text wherever the characters sit on the same line with the same font and font attributes. Sub/superscripts are done by making another text box with absolute positioning, and fraction lines are path shapes. In other words, the formulas are fundamentally unparsable: they are just a bunch of absolutely positioned text boxes. OCR is your only hope, but I don't think it will work for engineering formulas.
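To make that layout concrete, here is a small Python sketch of the kind of guessing a parser would be reduced to. The `Box` class, the `to_markup` helper, the coordinates, and the `tol` threshold are all invented for illustration; nothing here comes from the actual file. It assumes a single line of boxes and treats the largest font as the main baseline. That covers only the easiest case: fractions drawn as separate path shapes, stacked limits, and multi-line expressions are out of its reach.

```python
from dataclasses import dataclass


@dataclass
class Box:
    text: str
    x: float     # left edge, in points
    y: float     # baseline, in points (PDF y grows upward)
    size: float  # font size, in points


def to_markup(boxes, tol=0.5):
    """Join one line of boxes left to right, marking smaller boxes that
    sit above or below the main baseline as ^{...} or _{...}."""
    boxes = sorted(boxes, key=lambda b: b.x)
    # Assumption: the box with the largest font sits on the main baseline.
    base = max(boxes, key=lambda b: b.size)
    out = []
    for b in boxes:
        dy = b.y - base.y
        if b.size < base.size and dy > tol:
            out.append("^{" + b.text + "}")
        elif b.size < base.size and dy < -tol:
            out.append("_{" + b.text + "}")
        else:
            out.append(b.text)
    return "".join(out)


# Boxes arrive in arbitrary order, as they may in a content stream.
line1 = [Box("2", 131, 504, 7), Box("E = mc", 100, 500, 10)]
line2 = [Box("x", 100, 500, 10), Box(" + y", 109, 500, 10), Box("i", 105, 498, 7)]

print(to_markup(line1))  # E = mc^{2}
print(to_markup(line2))  # x_{i} + y
```

Even this toy depends on thresholds and on the boxes really belonging to one line, which is a decision the PDF itself never records.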