PerlMonks  

Word Count and Match

by PilotinControl (Pilgrim)
on Jan 07, 2021 at 20:05 UTC ( [id://11126550]=perlquestion )

PilotinControl has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality. You may see it by logging in.

Re: Word Count and Match
by davido (Cardinal) on Jan 07, 2021 at 21:22 UTC

    So... an adaptation of the code you presented works as you seem to want:

        #!/usr/bin/env perl
        use strict;
        use warnings;

        my %count;
        my @words = qw(Bob Tom Dave John Jeremy Max Tom Harold Tom Bob Pete Pete Frank Tom);
        my $wordcnt='Bob|Tom|Dave|Tom|Bob|Dave';
        foreach my $word (@words){
            if($word=~/($wordcnt)/io){
                $count{$1}++;
            }
        }
        print "$_ => $count{$_}\n" for sort keys %count;

    This produces:

        Bob => 2
        Dave => 1
        Tom => 4

    There's no extra output showing a total count.

    Your problem would be a lot easier to diagnose if you supplied us with the following:

    • A small amount of sample data.
    • The smallest practical snippet of compilable code that we can run to demonstrate the failure you are seeing.
    • The sample output you desire, given the sample input provided.

    Currently we're debugging code that probably isn't the code you are running, and we're only able to guess at what the data looks like.

    I don't think you need the /o modifier for your regular expression; it doesn't do anything useful nowadays. It's also probably better to use $wordcnt = qr/Bob|Tom..../; instead of regular quotes, though that's not strictly necessary. You should probably surround your pattern with \b anchors for word boundaries so that you don't match "Bob" in the name "Bobby". Your alternation is also silly; what are the repeated alternates supposed to do?
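    A minimal sketch of those suggestions applied to the adapted script above: a compiled qr// pattern, no /o, the duplicate alternates removed, and \b anchors around the alternation. The extra names "Bobby" and "Tomas" in the word list are hypothetical additions, there only to show that the boundaries stop partial-word matches.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical word list: "Bobby" and "Tomas" were added to show the
# effect of the \b word-boundary anchors.
my @words = qw(Bob Tom Dave John Bobby Tom Tomas Bob Pete Tom);

# Compiled once with qr//; no /o needed. Each name appears only once in
# the alternation, and \b on both sides means only whole words match.
my $wordcnt = qr/\b(Bob|Tom|Dave)\b/i;

my %count;
for my $word (@words) {
    # $1 holds the name that matched, used as the hash key.
    $count{$1}++ if $word =~ $wordcnt;
}

print "$_ => $count{$_}\n" for sort keys %count;
```

    With the anchors in place "Bobby" and "Tomas" are not counted, so this prints Bob => 2, Dave => 1 and Tom => 3.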

    I scanned back through some of your previous posts over the past seventeen years; you've been asked to provide real code and sample input and output in the past.


    Dave

    A reply falls below the community's threshold of quality. You may see it by logging in.
Re: Word Count and Match
by kcott (Archbishop) on Jan 07, 2021 at 21:29 UTC

    G'day PilotinControl,

    I see:

    • Partial code that no one can test.
    • No input data.
    • No output data.
    • Useless error reports: "some weird happenings" and "it did nothing".

    I think you've been here long enough (17 years and 210 posts) to know better.

    My best guesses:

    • Use Text::CSV to extract data. It'll run faster if you also have Text::CSV_XS installed.
    • Remove duplicates from your regex alternation.
    • Apply boundary assertions to your regex.
    • Remove the 'i' modifier from your regex.
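    A sketch of the last three guesses taken together, assuming the names to search for arrive as a list that may contain duplicates. The names @wanted and @fields and their values are hypothetical, and reading real fields with Text::CSV is left out so the sketch runs without non-core modules.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical search list, with duplicates as in the OP's $wordcnt.
my @wanted = qw(Bob Tom Dave Tom Bob Dave);

# Remove duplicates while keeping first-seen order.
my %seen;
my @unique = grep { !$seen{$_}++ } @wanted;

# Escape each name (harmless here, protective if a name ever contains
# regex metacharacters), join into one alternation, anchor with \b.
# No /i modifier, so matching is case-sensitive.
my $alt = join '|', map { quotemeta($_) } @unique;
my $re  = qr/\b($alt)\b/;

# Hypothetical field values; real data would come from Text::CSV.
my @fields = qw(Bob bob Tom Bobby Dave Tom);

my %count;
for my $field (@fields) {
    $count{$1}++ if $field =~ $re;
}

print "alternation: $alt\n";
print "$_ => $count{$_}\n" for sort keys %count;
```

    Because $1 holds the text exactly as it appeared in the data, a case-insensitive match would count "bob" and "Bob" under separate hash keys; that is one possible reason to drop the 'i' modifier.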

    Kindly read "How do I post a question effectively?" before posting again. Follow what it says if you do post again. You'll probably get better answers if you include an SSCCE.

    — Ken

      > I think you've been here long enough (17 years and 210 posts) to know better.

      hmm ... compare for instance Remove a Line

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        ++ I'm impressed that you could be bothered to go back through eight years of his posts to find that.

        It's pretty much the same code: railcar is now word (or name in later posting); and Boxcar is now Bob (or David in the same later posting) and so on.

        — Ken

Re: Word Count and Match
by eyepopslikeamosquito (Archbishop) on Jan 07, 2021 at 21:23 UTC
Re: Word Count and Match (updated)
by AnomalousMonk (Archbishop) on Jan 07, 2021 at 21:26 UTC
    my $wordcnt='Bob|Tom|Dave|Tom|Bob|Dave';

    What is the significance of the repetition of words in $wordcnt? Repeated patterns in a regex alternation have no effect. The regexes /Bob|Tom|Dave/ and /Bob|Tom|Dave|Tom|Bob|Dave/ and /Bob|Tom|Dave|Tom|Bob|Dave|Dave|Tom|Bob|Dave|Bob/ are equivalent.
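    A small check of that equivalence on a hypothetical word list: the same counting loop is run with the short alternation and with the OP's repeated one, and the per-name counts should come out identical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical word list for the comparison.
my @words = qw(Bob Tom Dave John Tom Harold Bob Pete Tom);

# Count how often each alternate is captured by the given pattern.
sub count_matches {
    my ($re) = @_;
    my %count;
    for my $word (@words) {
        $count{$1}++ if $word =~ $re;
    }
    return \%count;
}

my $short = count_matches(qr/(Bob|Tom|Dave)/);
my $long  = count_matches(qr/(Bob|Tom|Dave|Tom|Bob|Dave)/);

for my $name (sort keys %$short) {
    printf "%-4s short=%d long=%d\n", $name, $short->{$name}, $long->{$name};
}
```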

    ... the code also gives a total word count ...

    I don't see how the posted code does this.

    ... MATCH the wordcnt with the FIELD in the flat file.

    This implies that each word being searched for occurs only once in a line from the flatfile (or maybe in the entire flatfile). Is this true?

    Update: There are no boundary assertions in the OPed pattern, so 'TomTom' will count 'Tom' twice. Is this what you want?
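    A sketch of the 'TomTom' case. Since the original code isn't visible, it assumes the matching loop counts every occurrence within a field using m//g; the field values are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical field values.
my @fields = qw(TomTom Tom Tommy);

# Assumption: every occurrence in a field is counted via m//g.
sub count_all {
    my ($re) = @_;
    my %count;
    for my $field (@fields) {
        $count{$1}++ while $field =~ /$re/g;
    }
    return \%count;
}

my $plain    = count_all(qr/(Tom)/);      # no boundary assertions
my $anchored = count_all(qr/\b(Tom)\b/);  # whole words only

print "without \\b: Tom => $plain->{Tom}\n";
print "with \\b:    Tom => $anchored->{Tom}\n";
```

    Without boundaries Tom is counted four times (twice in "TomTom", once each in "Tom" and "Tommy"); with \b only the standalone "Tom" is counted.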


    Give a man a fish:  <%-{-{-{-<

Re: Word Count and Match
by eyepopslikeamosquito (Archbishop) on Jan 07, 2021 at 21:37 UTC

Node Type: perlquestion [id://11126550]
Approved by Bod