Word Count and Match

PilotinControl has asked for the wisdom of the Perl Monks concerning the following question:

Re: Word Count and Match by davido (Cardinal) on Jan 07, 2021 at 21:22 UTC
So... an adaptation of the code you presented works as you seem to want:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %count;
my @words = qw(Bob Tom Dave John Jeremy Max Tom Harold Tom Bob Pete Pete Frank Tom);
my $wordcnt = 'Bob|Tom|Dave|Tom|Bob|Dave';

foreach my $word (@words) {
    if ($word =~ /($wordcnt)/io) {
        $count{$1}++;
    }
}

print "$_ => $count{$_}\n" for sort keys %count;
```

This produces:

```
Bob => 2
Dave => 1
Tom => 4
```

There's no extra output with some total count. Your problem would be a lot easier to diagnose if you supplied us with the following:

- A small amount of sample data.
- The smallest practical snippet of compilable code that we can run to demonstrate the failure you are seeing.
- The sample output you desire, given the sample input provided.

Currently we're debugging code that probably isn't the code you are running, and we're only able to guess at what the data looks like.

I don't think you need the `/o` modifier for your regular expression; it doesn't do anything useful nowadays. And it's probably better to use `$wordcnt = qr/Bob|Tom..../;` instead of regular quotes, though that's not strictly necessary.

You should probably surround your pattern match with `\b` anchors for word boundaries so that you don't match "Bob" for the name "Bobby". Your alternation is also silly; what are repeated alternates supposed to do?

I scanned back through some of your previous posts over the past seventeen years; you've been asked to provide real code and sample input and output in the past.

Dave
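A minimal sketch of those suggestions taken together, not the original poster's code (which hasn't been shown). The word list is reused from the example above with "Bobby" added as an assumption to exercise the `\b` anchors; `@names` is a hypothetical list of names to count. Matching is case-sensitive here; add `/i` if the data needs it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample words reused from the example above; "Bobby" is an added assumption
# to show what the word-boundary anchors prevent.
my @words = qw(Bob Tom Dave John Bobby Tom Harold Tom Bob Pete Frank Tom);

# Hypothetical list of names to count, duplicates included on purpose.
my @names = qw(Bob Tom Dave Tom Bob Dave);

# Drop the duplicates; they add nothing to an alternation.
my %seen;
my @unique = grep { !$seen{$_}++ } @names;

# Build the pattern once with qr//, escaping any regex metacharacters,
# and anchor it on word boundaries so "Bobby" does not count as "Bob".
my $alternation = join '|', map { quotemeta($_) } @unique;
my $name_re     = qr/\b($alternation)\b/;

my %count;
for my $word (@words) {
    $count{$1}++ if $word =~ $name_re;
}

print "$_ => $count{$_}\n" for sort keys %count;
```

With this sample data the counts come out as Bob 2, Dave 1 and Tom 4; "Bobby" is not counted.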
Re: Word Count and Match by kcott (Archbishop) on Jan 07, 2021 at 21:29 UTC
G'day PilotinControl,

I see:

- Partial code that no one can test.
- No input data.
- No output data.
- Useless error reports: "some weird happenings" and "it did nothing".

I think you've been here long enough (17 years and 210 posts) to know better.

My best guesses:

- Use Text::CSV to extract data. It'll run faster if you also have Text::CSV_XS installed.
- Remove duplicates from your regex alternation.
- Apply boundary assertions to your regex.
- Remove the '`i`' modifier from your regex.

Kindly read "How do I post a question effectively?" before posting again. Follow what it says if you do post again. You'll probably get better answers if you include an SSCCE.

-- Ken
Re^2: Word Count and Match by LanX (Saint) on Jan 07, 2021 at 21:52 UTC
> I think you've been here long enough (17 years and 210 posts) to know better.

Hmm ... compare, for instance, "Remove a Line".

Cheers Rolf
(addicted to the Perl Programming Language :)
Re^3: Word Count and Match by kcott (Archbishop) on Jan 07, 2021 at 22:22 UTC
++ I'm impressed that you could be bothered to go back through eight years of his posts to find that. It's pretty much the same code: `railcar` is now `word` (or `name` in a later posting), and `Boxcar` is now `Bob` (or `David` in the same later posting), and so on.

-- Ken
Re^4: Word Count and Match by LanX (Saint) on Jan 07, 2021 at 22:39 UTC
Re^5: Word Count and Match by kcott (Archbishop) on Jan 07, 2021 at 22:47 UTC
Re: Word Count and Match by eyepopslikeamosquito (Archbishop) on Jan 07, 2021 at 21:23 UTC
Your code is a mess. Please provide a Short, Self-Contained, Correct Example.

Counting the frequency of words in Perl is a FAQ, best solved with a hash. See for example:

- How do I count the frequency of words in a file and save them for later?
- Count the frequency of words (perl maven)
- Count the frequency of words (Stack Overflow)
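For readers who don't want to chase the links, the hash-based idiom looks roughly like this. It is only a sketch: the sample text is made up, and a real script would read the words from a file instead.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample text; in practice this would come from a file.
my $text = "the quick brown fox jumps over the lazy dog the end";

# One hash entry per distinct word, lowercased so "The" and "the" agree.
my %freq;
$freq{ lc $_ }++ for $text =~ /\w+/g;

# Most frequent first, ties broken alphabetically.
for my $word (sort { $freq{$b} <=> $freq{$a} || $a cmp $b } keys %freq) {
    printf "%-6s %d\n", $word, $freq{$word};
}
```

Here "the" is counted three times and every other word once.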
Re: Word Count and Match (updated) by AnomalousMonk (Archbishop) on Jan 07, 2021 at 21:26 UTC
`my $wordcnt='Bob|Tom|Dave|Tom|Bob|Dave';`

What is the significance of the repetition of words in `$wordcnt`? Repeated patterns in a regex alternation have no effect. The regexes `/Bob|Tom|Dave/` and `/Bob|Tom|Dave|Tom|Bob|Dave/` and `/Bob|Tom|Dave|Tom|Bob|Dave|Dave|Tom|Bob|Dave|Bob/` are equivalent.

> ... the code also gives a total word count ...

I don't see how the posted code does this.

> ... MATCH the wordcnt with the FIELD in the flat file.

This implies that each word being searched for occurs only once in a line from the flat file (or maybe in the entire flat file). Is this true?

Update: There are no boundary assertions in the OPed pattern, so `'TomTom'` will count `'Tom'` twice. Is this what you want?

Give a man a fish: `<%-{-{-{-<`
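Both points are easy to check. The sketch below assumes the text is scanned with a global match (`/g`), which is a guess at how the original code walks each line; the sample string is made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up sample line containing the doubled name.
my $text = 'Bob TomTom Dave Tom';

my $short = qr/Bob|Tom|Dave/;
my $long  = qr/Bob|Tom|Dave|Tom|Bob|Dave/;   # same alternates, repeated

# Repeated alternates change nothing: both patterns find the same matches.
my @short_hits = $text =~ /($short)/g;
my @long_hits  = $text =~ /($long)/g;
print "short:   @short_hits\n";   # Bob Tom Tom Dave Tom
print "long:    @long_hits\n";    # Bob Tom Tom Dave Tom

# With boundary assertions, "TomTom" no longer contributes any "Tom".
my @bounded = $text =~ /\b($short)\b/g;
print "bounded: @bounded\n";      # Bob Dave Tom
```

Without `\b`, "TomTom" contributes two hits for "Tom"; with it, only the standalone names are counted.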
Re: Word Count and Match by eyepopslikeamosquito (Archbishop) on Jan 07, 2021 at 21:37 UTC
One more thing: I noticed you're using the `/o` modifier here:

`if($word=~/($wordcnt)/io){`

You should remove it. See "regex "o" modifier", especially tye's response, and Aristotle's response: "Never use /o".

Updated: Added Aristotle's response.
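A small sketch of why `/o` is worth removing: it tells Perl to interpolate the variable only the first time the match is executed, so later changes to that variable are silently ignored. The names and the loop here are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my (@with_o, @without_o);

for my $name (qw(Bob Tom)) {
    # With /o the pattern is fixed as /Bob/ on the first pass and is
    # never rebuilt, so 'Tom' fails to match on the second pass too.
    push @with_o,    ('Tom' =~ /$name/o ? 1 : 0);

    # Without /o the pattern follows the current value of $name.
    push @without_o, ('Tom' =~ /$name/  ? 1 : 0);
}

print "with /o:    @with_o\n";      # 0 0
print "without /o: @without_o\n";   # 0 1
```

If the goal is to avoid recompiling a pattern, building it once with `qr//` does that without the stale-variable surprise.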