Evelina Gabasova (@evelgab)
"F# empowers users to tackle complex computing problems with simple, maintainable and robust code."
Recognizing languages
What is the language of this text?
A Csillagok haboruja egy uropera filmsorozatnak, irodalmi muveknek es szamitogepes jatekoknak a neve.
This is Hungarian, of course!
[ NEAREST NEIGHBOUR CLASSIFIER ]
Get sample text from Wikipedia pages (done)
Calculate features: frequencies of letter pairs
Compare languages using their features
Example using sample English text "the three"
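The letter pairs in this example can be counted with a few lines of code. The following is a minimal Python sketch, not the implementation from the talk: the names `letter_pairs` and `pair_counts` are hypothetical, and it assumes the text is lowercased and a space is written as `_`.

```python
from collections import Counter


def letter_pairs(text):
    """Return all overlapping two-character pairs, writing spaces as '_'."""
    # Assumption: lowercase the text and mark spaces with '_'.
    normalized = text.lower().replace(" ", "_")
    return [normalized[i:i + 2] for i in range(len(normalized) - 1)]


def pair_counts(text):
    """Count how often each letter pair occurs in the text."""
    return Counter(letter_pairs(text))


counts = pair_counts("the three")
print(letter_pairs("the three"))
print(counts["th"])  # 2
```

The phrase "the three" has nine characters, so it yields eight overlapping pairs, and `th` is the only pair that occurs twice.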
Now calculate probabilities of the pairs
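Turning counts into probabilities means dividing each pair's count by the total number of pairs. The following is a minimal Python sketch, assuming lowercased text with spaces written as `_`; `pair_probabilities` is a hypothetical helper name, and the values it produces for a single short phrase are only illustrative.

```python
from collections import Counter


def pair_probabilities(text):
    """Map each letter pair to its relative frequency in the text."""
    # Assumption: lowercase the text and mark spaces with '_'.
    normalized = text.lower().replace(" ", "_")
    pairs = [normalized[i:i + 2] for i in range(len(normalized) - 1)]
    if not pairs:
        return {}
    total = len(pairs)
    return {pair: count / total for pair, count in Counter(pairs).items()}


probabilities = pair_probabilities("the three")
print(probabilities["th"])  # 2 of 8 pairs: 0.25
```

A real feature vector would be computed from a much larger sample per language, so that the frequencies are stable.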
| | th | e_ | ee | el |
|---|---|---|---|---|
| English | 0.3 | 0.2 | 0.2 | 0.1 |
| Portuguese | 0.0 | 0.2 | 0.1 | 0.3 |
Distance is the sum of squares of differences.
| | th | e_ | ee | el |
|---|---|---|---|---|
| English | 0.3 | 0.2 | 0.2 | 0.1 |
| Portuguese | 0.0 | 0.2 | 0.1 | 0.3 |
| Difference | 0.3 | 0.0 | 0.1 | -0.2 |
Sum of squares: \(0.09+0.0+0.01+0.04 = 0.14\)
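The same arithmetic can be written as a small function. This is a minimal Python sketch using the English and Portuguese values from the table; the name `squared_distance` and the dictionary representation are assumptions made for illustration, and a pair missing from one language is treated as having probability 0.

```python
def squared_distance(features_a, features_b):
    """Sum of squared differences between two feature dictionaries."""
    # A pair that is absent from one dictionary counts as probability 0.
    all_pairs = set(features_a) | set(features_b)
    return sum(
        (features_a.get(pair, 0.0) - features_b.get(pair, 0.0)) ** 2
        for pair in all_pairs
    )


english = {"th": 0.3, "e_": 0.2, "ee": 0.2, "el": 0.1}
portuguese = {"th": 0.0, "e_": 0.2, "ee": 0.1, "el": 0.3}

print(round(squared_distance(english, portuguese), 2))  # 0.14
```

The result is rounded before printing because floating-point arithmetic leaves a tiny rounding error in the sum.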
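| | English | Spanish | Portuguese | Czech |
|---|---|---|---|---|
| Unknown text | 0.10 | 0.14 | 0.25 | 0.27 |

The smallest distance is to English (0.10), so the unknown text is classified as English.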
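The nearest neighbour decision is simply the language with the smallest distance. A minimal Python sketch using the four distances listed for the unknown text; `nearest_language` is a hypothetical name.

```python
def nearest_language(distances):
    """Return the language whose distance to the unknown text is smallest."""
    return min(distances, key=distances.get)


# Distances from the unknown text to each language, as listed above.
distances = {
    "English": 0.10,
    "Spanish": 0.14,
    "Portuguese": 0.25,
    "Czech": 0.27,
}

print(nearest_language(distances))  # English
```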
[ PERCEPTRON ]
[ LOGISTIC REGRESSION ]
\(f(x) = \frac{1}{1 + e^{-x}}\)
Initial weights can be generated randomly
Improve weights using gradient descent
Repeat recursively until a certain error level or number of steps is reached
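These three steps can be sketched on a toy two-class problem. The following Python sketch is an illustration under several assumptions, not the implementation from the talk: the slides do not say which error measure is used, so cross-entropy (log loss) is assumed here; a loop stands in for the recursion; and the learning rate, step limit, error threshold, seed, and data are all made up for the example.

```python
import math
import random


def logistic(x):
    """f(x) = 1 / (1 + e^(-x)), evaluated without overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict(weights, features):
    """Probability of class 1; weights[0] is the bias term."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return logistic(z)


def log_loss(weights, samples):
    """Mean cross-entropy error (assumed error measure)."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for features, label in samples:
        p = min(max(predict(weights, features), eps), 1.0 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(samples)


def train(samples, learning_rate=0.5, max_steps=5000, target_error=0.05, seed=0):
    """Gradient descent from random initial weights."""
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    # Step 1: random initial weights (bias plus one weight per feature).
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_features + 1)]
    # Step 3: stop at a small enough error or after max_steps updates.
    for _ in range(max_steps):
        if log_loss(weights, samples) <= target_error:
            break
        # Step 2: gradient of the mean log loss is mean((p - y) * x).
        gradient = [0.0] * len(weights)
        for features, label in samples:
            error = predict(weights, features) - label
            gradient[0] += error
            for i, x in enumerate(features):
                gradient[i + 1] += error * x
        weights = [
            w - learning_rate * g / len(samples)
            for w, g in zip(weights, gradient)
        ]
    return weights


# Toy data: one feature, class 0 for small values and class 1 for large ones.
samples = [([0.0], 0), ([1.0], 0), ([2.0], 1), ([3.0], 1)]
weights = train(samples)
print([round(predict(weights, x), 2) for x, _ in samples])
```

Fixing the seed makes the random initial weights reproducible between runs; a different seed gives different starting weights but should lead to a similar classifier on this data.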
FsLab Package www.fslab.org
@evelgab
evelina@evelinag.com
github.com/evelinag
evelinag.com