PerlMonks
Re^2: Matching alphabetic diacritics with Perl and Postgresql

by anonymized user 468275 (Curate)
on Jun 04, 2017 at 11:29 UTC [id://1192119]


in reply to Re: Matching alphabetic diacritics with Perl and Postgresql
in thread Matching alphabetic diacritics with Perl and Postgresql

And you are the winner! open my $fh, "<encode(UTF8)", $csvFile fixed it so that the queries now work. The owners of the original data were using UTF8 to put apostrophes in their database, or perhaps to write them in the CSV file. Writing them to my own database as ASCII was OK, but subsequently RSE's would only work if they were also constructed using UTF8. So provided Perl knows it's UTF8 from the outset, DBI constructs the queries correctly.
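For the record, here is a minimal self-contained sketch of the approach: reading the CSV through an explicit UTF-8 layer so DBI gets already-decoded strings. The file name and contents are illustrative, and the block writes its own sample file so it can run as-is:

```perl
use strict;
use warnings;

# Illustrative: create a small UTF-8 CSV so the example is self-contained.
my $csvFile = "sample.csv";
open my $out, ">:encoding(UTF-8)", $csvFile or die "write $csvFile: $!";
print $out qq{"caf\x{e9}","cr\x{e8}me"\n};
close $out;

# The fix: an explicit :encoding(UTF-8) layer makes Perl decode the
# bytes on read, so the strings handed to DBI are proper characters,
# not raw UTF-8 octets.
open my $fh, "<:encoding(UTF-8)", $csvFile or die "read $csvFile: $!";
while (my $line = <$fh>) {
    chomp $line;
    # ... build and run the query from the decoded fields ...
}
close $fh;
```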

One world, one people


Replies are listed 'Best First'.
Re^3: Matching alphabetic diacritics with Perl and Postgresql
by Tux (Canon) on Jun 04, 2017 at 11:37 UTC

    Two points.

    • What you posted is not what you mean; the correct syntax includes a colon and has a different spelling: open my $fh, "<:encoding(utf-8)"
    • Use a CSV parser that handles UTF-8, like Text::CSV_XS: my $aoh = csv (in => "file.csv", encoding => "utf-8");
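A sketch of the second suggestion, the Text::CSV_XS `csv` function. The round trip here (write then read) is illustrative so the example runs as-is; `headers => "auto"` is an added option that turns the first row into hash keys, giving an array of hashes rather than an array of arrays:

```perl
use strict;
use warnings;
use Text::CSV_XS qw(csv);

# Illustrative round trip: write a tiny UTF-8 CSV, then read it back.
csv (in => [["name"], ["caf\x{e9}"]], out => "file.csv", encoding => "utf-8");

# headers => "auto" makes the first row into hash keys (array of hashes);
# without it csv () returns an array of arrays.
my $aoh = csv (in => "file.csv", encoding => "utf-8", headers => "auto");
print $aoh->[0]{name}, "\n";
```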

    Enjoy, Have FUN! H.Merijn
      You are right - I did code it correctly in the .pl, but not in the post here (just mis-typed it from memory). Re Text::CSV: that's what I did at first, but having switched to a plain open and read so I could debug my draft-code issues with clarity, there is no reason to switch back to Text::CSV, given that the CSV file used is predictable enough to remove the first and last chars and then split /\"\,\s*\"/. You could argue that this is a "not invented here" approach, but I am even more loath to use CPAN sledgehammers to crack tiny little nuts where a few characters are all that are needed to avoid loading a module. Think: performance! In some cases it is less obvious whether to use the CPAN module, but this one seems clear enough, although I will move it to a utility module where it can readily be replaced with a use of Text::CSV if circumstances change.

      One world, one people

        There is a speed comparison page available. Your split will FAIL on one of the easiest pitfalls used for the timing. This perfectly formatted CSV line will break any split pattern, and it does not even contain embedded newlines:

        hello,","," ",world,"!"
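To make the failure concrete, here is a sketch comparing a real parser against the strip-and-split approach on that exact line. Text::CSV_XS recovers the five intended fields, because the quoted "," and " " are data, not separators; the naive split cannot, since not every field on the line is quoted:

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $line = 'hello,","," ",world,"!"';

# A real CSV parser sees five fields.
my $csv = Text::CSV_XS->new ({ binary => 1 });
$csv->parse ($line) or die scalar $csv->error_diag;
my @fields = $csv->fields;
# @fields is ("hello", ",", " ", "world", "!")

# The strip-first-and-last-char-then-split approach from above
# mangles this line, because it assumes every field is quoted.
my @naive = split /",\s*"/ => substr $line, 1, -1;
printf "parser: %d fields, naive split: %d pieces\n",
    scalar @fields, scalar @naive;
```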

        If you need correct CSV parsing of purely strict CSV, and that excludes space after the separator, use a module like Text::CSV::Easy_XS, which allows no deviation from the standard. If you require speed in addition to robustness, options (like space after the separator) and a usable interface, use Text::CSV_XS. If XS is not an option, use any of the _PP variants.

        The more data you have to parse, the happier a module will make you. Its loading time is far outweighed by the headaches it saves you in finding possible breakages.


        Enjoy, Have FUN! H.Merijn

        Think: performance!

        I'm pretty sure Tux has thought quite a bit about performance. Why don't you try benchmarking your routine against Text::CSV_XS? I think you may be surprised just how performant that CPAN sledgehammer is.
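A sketch of such a benchmark with the core Benchmark module, using a simple all-quoted line (the only shape the split handles at all); the iteration count is illustrative:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Text::CSV_XS;

my $line = '"one","two","three","four"';
my $csv  = Text::CSV_XS->new ({ binary => 1 });

# Compare the hand-rolled split against the XS parser on the
# same line; cmpthese prints a rate table for both.
cmpthese (10_000, {
    naive_split => sub {
        my @f = split /",\s*"/ => substr $line, 1, -1;
    },
    csv_xs => sub {
        $csv->parse ($line) or die;
        my @f = $csv->fields;
    },
});
```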


        The way forward always starts with a minimal test.
