Exploring StackOverflow
Evelina Gabasova
@evelgab
?
Tags
- What are the most common tags?
- When do people ask questions?
Question: When?
Most weekend
- Minecraft : 1.19
- LWJGL : 1.12
- SFML (Simple and Fast Multimedia Library) : 1.06
- D : 1.04
- pygame : 1.03
Most weekday
- SQL Server Reporting Services 2008 R2 : 0.11
- Infragistics : 0.13
- SQL Server Reporting Services 2008 : 0.13
- Axapta : 0.13
- DocusignAPI : 0.14
Continuous integration
Who's your target user?
Question: Where?
Where?
- 5 277 833 users in total
- 769 541 filled in their location
7151 Mawson Station, Australian Antarctic Territory, Antarctica
Where?
\[n \times \frac{1}{\text{population}} \times \frac{\text{registered}}{\text{located}} \times 1{,}000{,}000\]
ppm (Programmers-per-million)
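The ppm formula above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the slides: it assumes `n` is the number of users who gave a location in a given country, and it takes the two totals (registered and located users) from the earlier slide. The function name and the example country are hypothetical.

```python
# Totals quoted on the "Where?" slide.
REGISTERED_USERS = 5_277_833
LOCATED_USERS = 769_541


def programmers_per_million(n, population,
                            registered=REGISTERED_USERS,
                            located=LOCATED_USERS):
    """Estimate programmers per million inhabitants.

    n          -- users who gave a location in this country (assumed meaning)
    population -- total population of the country
    """
    if population <= 0 or located <= 0:
        raise ValueError("population and located must be positive")
    # Scale up the located users to account for users with no location.
    scale = registered / located
    return n * scale / population * 1_000_000


# Hypothetical country: 1,000 located users, 10 million inhabitants.
print(programmers_per_million(1_000, 10_000_000))
```

The scaling step assumes that users who filled in a location are spread across countries the same way as those who did not, which is exactly the sampling-bias caveat raised on the next slide.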
Where really?
(Sampling bias)
Tags + Users
=
Communities
Evelina | 1 | 0 | 1 | 1 | 0
Krzysztof | 1 | 1 | 1 | 0 | 0
44 265 tags x 5 277 831 users
Users with more than 1,000 posts
Tags with more than 5,000 posts
807 tags, 1633 power users
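The filtering step above can be sketched as follows. This is only an illustration under stated assumptions: the input is a hypothetical dictionary of post counts per (user, tag) pair, and a matrix cell is 1 when the user has at least one post under the tag. The slides do not say how the cells were actually defined. The default thresholds are the ones quoted above.

```python
from collections import defaultdict


def build_user_tag_matrix(post_counts, min_user_posts=1000, min_tag_posts=5000):
    """Build a binary user x tag matrix from {(user, tag): post count}.

    Keeps only users with more than min_user_posts posts in total and
    tags with more than min_tag_posts posts in total.
    """
    user_totals = defaultdict(int)
    tag_totals = defaultdict(int)
    for (user, tag), count in post_counts.items():
        user_totals[user] += count
        tag_totals[tag] += count

    users = sorted(u for u, total in user_totals.items() if total > min_user_posts)
    tags = sorted(t for t, total in tag_totals.items() if total > min_tag_posts)

    # 1 if the user has any post under the tag, 0 otherwise (assumed encoding).
    matrix = [[1 if post_counts.get((u, t), 0) > 0 else 0 for t in tags]
              for u in users]
    return users, tags, matrix


# Hypothetical toy data with tiny thresholds, just to show the shape.
toy_counts = {("alice", "f#"): 3, ("alice", "r"): 1,
              ("bob", "f#"): 1, ("carol", "r"): 5}
users, tags, matrix = build_user_tag_matrix(toy_counts, min_user_posts=2,
                                            min_tag_posts=3)
for user, row in zip(users, matrix):
    print(user, row)
```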
t-SNE
t-distributed Stochastic Neighbour Embedding
t-SNE in R
library(tsne)
ts <- tsne(m, perplexity=20)
plot(ts)
t-SNE in F#
open RProvider
open RProvider.tsne
let ts = R.tsne(namedParams[ "X", box m; "perplexity", box 20])
R.plot(ts)
The best thing about R is that it was written by statisticians.
The worst thing about R is that it was written by statisticians.
Bo Cowgill, 2009
fable.io
Platzer, A. (2013). Visualization of SNPs with t-SNE. PLoS ONE, 8(2), e56883.
Quantifying Communities
No overlap
iOS - R
Django - middleware
Most Central
Based on number of connections
- arrays
- string
- performance
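"Based on number of connections" corresponds to what is usually called degree centrality. A minimal sketch, assuming an undirected network in which each edge links two connected tags. The edge list below is made up for illustration and is not taken from the Stack Overflow data.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Count distinct neighbours of each node in an undirected edge list."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}


def most_central(edges, top=3):
    """Return the `top` nodes by degree, ties broken alphabetically."""
    degrees = degree_centrality(edges)
    return sorted(degrees, key=lambda node: (-degrees[node], node))[:top]


# Hypothetical edges between tags.
toy_edges = [("arrays", "string"), ("arrays", "performance"),
             ("arrays", "c"), ("string", "performance")]
print(most_central(toy_edges))
```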
Communities
Clustering of nodes in a network
(depending on algorithm used)
✔ Tags
✔ Users
✔ Technologies
Questions and Answers
t-SNE
Point + local neighbourhood
⬇
2D embedding
Word2vec
Word + local context
⬇
vector-space embedding
Word2vec
F# is a functional language on the .NET platform.
Scala is a functional/object oriented language on the JVM.
Scala - JVM + .NET
=
C#, F#
F# - .NET
=
SML, OCaml, GHCi, Haskell, Idris
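The arithmetic above (subtract one word vector, add another, then look for the nearest remaining words) can be sketched with hand-made vectors. Everything here is an assumption for illustration: real word2vec vectors are learned from text and have hundreds of dimensions, whereas these three-dimensional toy vectors were chosen by hand so the example is easy to follow.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(vectors, start, minus, plus, top=2):
    """Rank words by similarity to vectors[start] - vectors[minus] + vectors[plus]."""
    query = [s - m + p for s, m, p in
             zip(vectors[start], vectors[minus], vectors[plus])]
    # The three input words are excluded from the candidates.
    candidates = [w for w in vectors if w not in (start, minus, plus)]
    return sorted(candidates, key=lambda w: cosine(query, vectors[w]),
                  reverse=True)[:top]


# Toy vectors; the axes loosely mean (functional, JVM, .NET). Made up by hand.
TOY_VECTORS = {
    "Scala": [1.0, 1.0, 0.0],
    "JVM": [0.0, 1.0, 0.0],
    ".NET": [0.0, 0.0, 1.0],
    "F#": [1.0, 0.0, 1.0],
    "C#": [0.3, 0.0, 1.0],
    "Java": [0.3, 1.0, 0.0],
    "Haskell": [1.0, 0.0, 0.0],
}

print(analogy(TOY_VECTORS, "Scala", "JVM", ".NET"))
```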
Is StackOverflow a meritocracy?
Regression
input data
⬇
score
Input data
18,100,293 rows
"Accepted"
"Question Score"
"Number of tags"
"Answer Count"
"Comment Count"
"Question Favourite Count"
"Question View Count"
"Author Reputation"
"Author Profile Views"
"Author Up Votes"
"Author Down Votes"
"Gold Badges"
"Silver Badges"
"Bronze Badges"
"Author Number of Tags"
"Time to Answer"
⬇
"Score"
linear regression, nonlinear regression, random forests, SVMs, neural networks
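As a reminder of what the simplest of these models does, here is a minimal sketch of ordinary least squares with a single predictor. It is not the analysis from the talk: the numbers are synthetic, only one input column is used, and the column names in the comments are placeholders.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data: think of xs as one input column and ys as "Score".
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, r_squared(xs, ys, slope, intercept))
```

Comparing how well each input explains the score on its own is one possible way to rank predictors. The slides do not say which method produced the ranking that follows.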
Most predictive
- Question favourites
- Question view count
- Accepted
Quality matters
... unless you are Jon Skeet
Technological side of things
Tool for the job
Data science side of things
Questions
Evelina Gabasova
@evelgab
evelinag.com