http://www.perlmonks.org?node_id=1020418

puterboy has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to use perl to parse my mysqldump backup files.

The format of the colum name heading list is:
(`col1_name`, `col2_name`, `col3_name`,...)
The format of a general table is:
(col1_entry, col2_entry, col3_entry),(col1_entry, col2_entry, col3_ent +ry), (col1_entry, col2_entry, col3_entry)...
where each parenthetical list represents a row and rows are separated by commas. Numerical entries can be written as-is, while string entries are enclosed in single quotes. Single quotes within a string can be escaped with a backslash. Commas and parenthesis are treated as string characters when within a quoted string.

. It seems like the most logical way to store the data would be either as an array of arrays (without any explicit column header names) or as an array of hashes where the hash is indexed by the column names.

I am interested in trying both ways. However, I don't know how best to parse the single quotes and backslashed quotes so that for example parentheses or commas or a backslashed quote within a single quoted string get properly treated as a string character and not as a seperater.

. I also would be interested in a slick way of slurping this in without having to use some generic heavy-duty perl parser module. It seems like the parsing shouldn't be too difficult since the rules are simple but I don't know the best way...

Has anyone either parsed mysqldump format (or similar formats) before?

Replies are listed 'Best First'.
Re: Parsing mysqldump files
by Anonymous Monk on Feb 24, 2013 at 20:45 UTC
      SQL::Statement looked great at first, HOWEVER, it doesn't seem to handle the MySQL syntax of putting backquotes around column and table names... so it signals a parsing error. It would be great if there were a MySQL version...

      DBIx::MyParse and DBIx::MyParsePP seem to be very low level parsers in that they create huge trees just parsing a simple MySQL INSERT statement and it wasn't/isn't obvious to me how to extract the high level aspects of the MySQL statement constructs from the generated parse hash.

      I'm really just trying to parse the array-of-array like structure used by the MySQL INSERT statement to insert multiple rows into a table. It may even be close enough to a perl array-of-arrays to use perl to parse it...
Re: Parsing mysqldump files
by Anonymous Monk on Feb 24, 2013 at 20:42 UTC
    Sure, just import into mysql, then you have a database to work with :D
Re: Parsing mysqldump files
by 7stud (Deacon) on Feb 25, 2013 at 03:47 UTC

    I also would be interested in a slick way of slurping this in without having to use some generic heavy-duty perl parser module.

    Slurp::MindReader might be a good choice.

    Numerical entries can be written as-is, while string entries are enclosed in single quotes. Single quotes within a string can be escaped with a backslash. Commas and parenthesis are treated as string characters when within a quoted string...Has anyone (parsed similar formats) before?

    The format is called CSV. See Text::CSV_XS.
      ENOGOODATHUMOR