PerlMonks  

I second CountZero's objection to the lack of input data.

However, when I run your code against a test file (multi-char strings, some of which satisfy one or the other of two contradictory definitions of bigrams* and some non-bigram strings such as nobigram, each separated by 2 newlines), your code returns " : 1" for pairs of words, each word concatenated with the word that follows it. As a result, a word such as "nobigram" shows up twice: first appended to the preceding word and then prefixed to the following word.

# output:
Bigrams
aabaaacc : 1
ccnobigram : 1
nobigramabcda : 1
abcdabcdaaaa : 1
bcdaaaanobigram : 1
nobigrambbhbbbb : 1
bbhbbbbbbb : 1
bbbaba : 1
abaaaaabbb : 1
aaaabbbcccccccccc : 1
cccccccccccbbbaaaccc : 1
cbbbaaacccccbbaaccc : 1
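For what it's worth, that pattern is what a sliding window over adjacent words would produce. Here is a minimal sketch (in Python, purely for illustration; this is NOT your code, and the name adjacent_pairs is my own invention) that yields the same 12 lines from the 13 words in my test file, whose content is given at the end of this post:

```python
from collections import Counter

# The 13 whitespace-separated words from my test file.
WORDS = ("aabaaa cc nobigram abcda bcdaaaa nobigram bbhbbbb bbb aba "
         "aaaabbb cccccccccc cbbbaaaccc ccbbaaccc").split()


def adjacent_pairs(words):
    """Concatenate each word with the word that follows it and count the results."""
    return Counter(a + b for a, b in zip(words, words[1:]))


if __name__ == "__main__":
    print("Bigrams")
    for pair, count in adjacent_pairs(WORDS).items():
        print(f"{pair} : {count}")
```

Since no concatenated pair repeats in this data, every count comes out as 1, which may be all that the " : 1" is telling us.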

I cannot reproduce your "loop is ended after splitting" EXCEPT (and then not perfectly) by removing the doubled curly braces at lines 9-10 and 22-23, in which case only a single instance of the final two words of the sample data is returned (along with the " : 1"). (Update: That's not a correct count for either definition cited for a bigram.)

As to "why found is used," it seems possible, in the overly limited context you've provided, that it's intended to be a counter -- a variable in which to stash the number of bigrams found. I realize that seems excessively obvious, but, IMO, it's the only obvious possible answer.

In any case, if counting is your intent, please see http://search.cpan.org/~emorgan/Lingua-EN-Bigram-0.01/lib/Lingua/EN/Bigram.pm (or some fork for the language in which your interests lie).


*Definitions vary:

  • Wikipedia says "A bigram or digram is every sequence of two adjacent elements in a string of tokens, which are typically letters, syllables, or words; they are n-grams for n=2."
     
    while
     
  • The Free OnLine Dictionary defines a bigram as a two-letter word (FOL is NOT, IMO, a reliable source, but Merriam-Webster and others define bigram only for those using paid access or their (one-shot) free trial).

For clarity, here is the content (verbatim) of the text file:

aabaaa cc nobigram abcda bcdaaaa nobigram bbhbbbb bbb aba aaaabbb cccccccccc cbbbaaaccc ccbbaaccc
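To show how far apart the two definitions land on this data, here is a rough sketch (again in Python, only as an illustration; the function names are hypothetical, and treating single letters as the "tokens" of the Wikipedia definition is my assumption, since words or syllables would be equally valid):

```python
from collections import Counter

SAMPLE = ("aabaaa cc nobigram abcda bcdaaaa nobigram bbhbbbb bbb aba "
          "aaaabbb cccccccccc cbbbaaaccc ccbbaaccc")


def letter_bigrams(word):
    """Wikipedia-style definition, letters as tokens: every pair of adjacent letters."""
    return Counter(word[i:i + 2] for i in range(len(word) - 1))


def two_letter_words(text):
    """Free Online Dictionary-style definition: words exactly two letters long."""
    return [w for w in text.split() if len(w) == 2]


if __name__ == "__main__":
    print(letter_bigrams("aabaaa"))   # counts of aa, ab, ba within one word
    print(two_letter_words(SAMPLE))   # only the two-letter words in the file
```

Under the two-letter-word reading, only "cc" qualifies; under the adjacent-letters reading, every word of two or more letters (even "nobigram") contains bigrams. Which one you want to count is something only you can tell us.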

In reply to Re^3: please reply by ww
in thread please reply by an
