PerlMonks  

RFC for users of Locale::Maketext::Lexicon

by clinton (Priest)
on Nov 23, 2008 at 11:47 UTC ( #725413=perlmeditation )

Hi all

I'm a co-maintainer of Locale::Maketext::Lexicon and I'd like to make a change to the format of the .po file created by Locale::Maketext::Extract / xgettext.pl.

Currently it writes out each entry in the .po file as:

    #: $filename_1:$line $filename_2:$line ...
    #. (vars passed to maketext sub)
    msgid "string to translate"
    msgstr ""

However, when I edit the file with POEdit, it rewrites the file as:

    #. (vars passed to maketext sub)
    #: $filename_1:$line
    #: $filename_2:$line
    ...
    msgid "string to translate"
    msgstr ""

... which causes a large unnecessary diff.

Looking at the gettext manual, having the vars comment before the file location seems to be consistent, but having the file references on new lines is controlled by the option --no-wrap. However, Locale::Maketext::Extract already wraps the msgids, so the behaviour is currently inconsistent.

What I'd like to know is: can I just make this change, or would it affect you with whatever client you use for editing the .po files? Should I be providing output format options instead?

thanks

Clint

Update: Turns out I hadn't read the gettext manual well enough. The msgids are only split onto separate lines if they contain embedded newlines, so L::M::Extract is correct in this regard. However, the default for gettext is to have the arguments comment before the file positions, and to put each file position onto a separate line.

In the interests of maintaining backward compatibility, I've added the option --wrap which will split the file positions onto separate lines. This has been released as v 0.75 of Locale::Maketext::Extract (CPAN currently syncing)

Re: RFC for users of Locale::Maketext::Lexicon
by GrandFather (Cardinal) on Nov 23, 2008 at 19:57 UTC

    I'm not a user of Locale::Maketext::*, but it seems to me that as soon as you offer to provide options to select the behavior you have really answered your question. The important question then is: what is the default behavior with legacy code, and again the answer is not far away - do what it does now.


    Perl reduces RSI - it saves typing
Re: RFC for users of Locale::Maketext::Lexicon
by graff (Chancellor) on Nov 24, 2008 at 06:05 UTC
    I haven't used any of these modules either, but based on the evidence you've shown, it seems like the "legacy" behavior of "Locale::Maketext::Extract / xgettext.pl" is sub-optimal (to put it nicely) because it is non-compliant with the other tools that it is supposed to interact with.

    So, speaking as an outsider, my vote would be to change the default behavior so as to make it "correct" (compliant with gettext and POEdit), but keep the legacy behavior as an option (in case anyone has actually created code dependent on that format), and be very clear about the change in the docs. (And mention that this "backward compatible" option will be deprecated.)

    Among the people who have used Locale::Maketext::Extract / xgettext.pl, I wonder how many of them have considered its original output format to be troublesome because of its divergence from the norm? For that matter, I wonder if this issue was involved in one of the POEdit change-log entries:

        Version 1.3.5
        -------------
        ...
        - fixed crash when loading some invalid PO files (#1495970)
        ...
      I haven't used any of these modules either, but based on the evidence you've shown, it seems like the "legacy" behavior of "Locale::Maketext::Extract / xgettext.pl" is sub-optimal (to put it nicely) because it is non-compliant with the other tools that it is supposed to interact with.
      No, that is way too strong. They are compatible: they both use "#:" to indicate the source (filename, line number), although one uses one line per entry and the other combines them in one line; and they both use "#." for variables. Their relative order is not actually important. (See the included doc Web localization in Perl in the distro)

      It's just that diff sees them as different.
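To make the compatibility point concrete, here is a minimal Python sketch that reads one entry in either layout into the same record. It is not part of Locale::Maketext::Lexicon: `parse_entry` and the sample file names are hypothetical, and it assumes single-line msgid/msgstr values and file names without spaces.

```python
def parse_entry(text):
    """Parse one PO entry into a record that ignores comment layout."""
    record = {"vars": [], "refs": set(), "msgid": None, "msgstr": None}
    for line in text.splitlines():
        if line.startswith("#."):
            # Extracted comment: the vars passed to the maketext sub.
            record["vars"].append(line[2:].strip())
        elif line.startswith("#:"):
            # One or more "file:line" references, whitespace separated.
            record["refs"].update(line[2:].split())
        elif line.startswith("msgid "):
            record["msgid"] = line[len("msgid "):]
        elif line.startswith("msgstr "):
            record["msgstr"] = line[len("msgstr "):]
    return record


# Hypothetical sample data in the two layouts discussed in this thread.
combined = '#: a.pl:1 b.pl:2\n#. (name)\nmsgid "Hello"\nmsgstr ""\n'
one_per_line = '#. (name)\n#: a.pl:1\n#: b.pl:2\nmsgid "Hello"\nmsgstr ""\n'

same = parse_entry(combined) == parse_entry(one_per_line)
```

Both layouts parse to an identical record here, while a line-based diff of the two strings would report every comment line as changed.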

      In fact, I wouldn't be too surprised if yet another tool would use yet another (compatible) format for the same data! So, making this module and this one tool compatible is likely not going to fix the problem. Not definitely, anyway.

      It's just a formatting choice. The difference between these formats is actually much smaller than the different output options for diff itself (for example: unified diff vs. normal diff), which you can control with command line switches.

      So I propose to either use an output format option, possibly in this module (you can use explicit control for every option, or a global named setting, or both), or add an external command line tool to convert any of these compatible outputs to a single common "standard" format. In fact, I think the latter is the most universal practical solution. You may add the formatting options to this module, and build this command line tool on that!
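As a rough illustration of the external converter idea, the following Python sketch rewrites every entry into one canonical layout: translator comments first, then `#.` comments, then `#:` references, then everything else. The names `normalize_entry` and `normalize_po` are hypothetical, the `wrap` flag only mimics the one-reference-per-line choice discussed in this thread, and the sketch assumes entries are separated by blank lines and that file names contain no spaces.

```python
import re


def normalize_entry(lines, wrap=True):
    """Reorder the lines of one PO entry into a fixed comment order."""
    head, extracted, refs, tail = [], [], [], []
    for line in lines:
        if line.startswith("#."):
            extracted.append(line)              # vars / extracted comments
        elif line.startswith("#:"):
            refs.extend(line[2:].split())       # file:line references
        elif line == "#" or line.startswith("# "):
            head.append(line)                   # translator comments
        else:
            tail.append(line)                   # flags, msgid, msgstr, ...
    if wrap:
        ref_lines = ["#: " + ref for ref in refs]
    elif refs:
        ref_lines = ["#: " + " ".join(refs)]
    else:
        ref_lines = []
    return head + extracted + ref_lines + tail


def normalize_po(text, wrap=True):
    """Normalize every blank-line-separated entry in a PO file's text."""
    blocks = re.split(r"\n\s*\n", text.strip())
    entries = [normalize_entry(block.splitlines(), wrap) for block in blocks]
    return "\n\n".join("\n".join(entry) for entry in entries) + "\n"
```

Running both the module's output and the editor's output through such a filter before committing would give diff a stable layout to compare, whichever tool wrote the file last.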

      I would not just change the format to match the "tool du jour". It won't solve anything in the long run. And no tool will barf on the data just because of this tiny format change — except diff. :)

Re: RFC for users of Locale::Maketext::Lexicon
by bart (Canon) on Nov 26, 2008 at 18:15 UTC
    I'm trying out Locale::Maketext::Lexicon 0.75 right now, and apparently it has a much worse behaviour than you describe here, regarding var comments: read_po() simply drops them! So if you read in a .po file with read_po() and immediately save it again with write_po(), they're gone! Which is not very nice...

    BTW why is there nothing foreseen in the whole suite to manually add a single lexicon entry? To fill in the translation for a phrase from a Perl script? As it is now, it makes a lot more sense to manually construct the output to write a .po file than to try to use this module, which kind of defeats its purpose, IMHO... The only thing it's useful for, impressive though it may be, is to parse files and generate stub files, with just the original text without translations. So you can only use it for the big stuff, but not for the little things.

      Strings get added and removed from your project. xgettext.pl needs to see all of the strings across the whole project to make a complete list of unique strings, the vars passed in, and where the strings are used.

      The purpose of the Locale::Maketext::Extract* modules within this distribution is to:

      • read in an existing .po file (if one exists)
      • parse all of the source code in whatever formats are supported and extract the original string, any arguments that are passed in, and the location of the string
      • write out a new .po file, merging in the new data, and removing strings that have not been translated and are no longer used.

      It retains any user generated comments and translations, but everything else is under the control of xgettext.pl, as per GNU's gettext utilities.

      It is not intended as a general PO-file manipulator.
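The merge rule described above can be sketched in Python as follows. This is an illustration of the stated behaviour only, not the module's actual code: `merge_entries` and the dictionary layout are hypothetical, and keeping translated strings that are no longer used is my reading of "removing strings that have not been translated and are no longer used".

```python
def merge_entries(existing, extracted):
    """Merge freshly extracted strings into an existing lexicon.

    existing:  msgid -> {"msgstr": str, "comments": [str, ...]}
    extracted: msgid -> {"refs": [str, ...], "vars": [str, ...]}
    """
    merged = {}
    # Every string found in the source is written out; its references and
    # vars always come from the new extraction, while any translation and
    # user comments are carried over from the old file.
    for msgid, found in extracted.items():
        old = existing.get(msgid, {})
        merged[msgid] = {
            "msgstr": old.get("msgstr", ""),
            "comments": old.get("comments", []),
            "refs": found["refs"],
            "vars": found["vars"],
        }
    # Strings no longer in the source survive only if they were translated.
    for msgid, old in existing.items():
        if msgid not in extracted and old.get("msgstr"):
            merged[msgid] = {
                "msgstr": old["msgstr"],
                "comments": old.get("comments", []),
                "refs": [],
                "vars": [],
            }
    return merged
```

Because references and vars are regenerated on every run, anything edited by hand in those lines would be overwritten, which matches the point that only comments and translations belong to the user.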

Node Type: perlmeditation [id://725413]
Approved by moritz
Front-paged by grinder