Re^5: Supervised machine learning algo for text matching across two files


by choroba (Cardinal)

on May 24, 2017 at 21:15 UTC ( [id://1191156]=note )

in reply to Re^4: Supervised machine learning algo for text matching across two files
in thread Supervised machine learning algo for text matching across two files

You can add a feature such as "if you split the long string into words using a dictionary and take their first letters, you get part of the abbreviation." Then let the algorithm decide whether it is useful. Similarly, you can train the algorithm on a large corpus of downloaded texts: the fact that two words tend to appear in the same article could serve as a feature, too (or at least some number expressing their collocability).
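To make the two ideas concrete, here is a rough sketch in Python (not Perl, purely for brevity). Everything in it is an assumption made for illustration: the names `segment`, `initials_feature` and `pmi` are hypothetical, the greedy longest-match split is just one simple way to cut a string into dictionary words, and pointwise mutual information over whole documents is only one possible "number expressing collocability". Returning 0.0 for pairs that never co-occur is likewise a simplification.

```python
import math


def segment(text, dictionary):
    """Greedy longest-match split of `text` into dictionary words.

    Characters at which no dictionary word starts are skipped.
    """
    text = text.lower()
    longest = max(map(len, dictionary), default=0)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for j in range(min(len(text), i + longest), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            i += 1  # no dictionary word starts here
    return words


def initials_feature(long_string, abbreviation, dictionary):
    """Share of the abbreviation covered by the words' first letters.

    Returns 0.0 unless the initials appear as a contiguous part of
    the abbreviation (assumed definition of "part of").
    """
    initials = "".join(w[0] for w in segment(long_string, dictionary))
    if initials and initials in abbreviation.lower():
        return len(initials) / len(abbreviation)
    return 0.0


def pmi(word_a, word_b, documents):
    """Pointwise mutual information (base 2) of two words, counting
    co-occurrence within the same document.

    `documents` is a list of token lists. Returns 0.0 when the words
    never share a document (an assumed fallback, not true PMI).
    """
    doc_sets = [set(d) for d in documents]
    n = len(doc_sets)
    count_a = sum(word_a in d for d in doc_sets)
    count_b = sum(word_b in d for d in doc_sets)
    both = sum(word_a in d and word_b in d for d in doc_sets)
    if not both:
        return 0.0
    return math.log2(both * n / (count_a * count_b))
```

Both values would simply be extra columns in the feature vector for each candidate pair; whether they carry any signal is left to the learner, as suggested above.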

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
