Exploring StackOverflow

Evelina Gabasova

@evelgab

?

Questions

Tags

  • What are the most common tags?
  • When do people ask questions?

Question: When?

Weekend index

Most weekend

  1. Minecraft : 1.19
  2. LWJGL : 1.12
  3. SFML (Simple and Fast Multimedia Library) : 1.06
  4. D : 1.04
  5. pygame : 1.03

Most weekday

  1. SQL Server Reporting Services 2008 R2 : 0.11
  2. Infragistics : 0.13
  3. SQL Server Reporting Services 2008 : 0.13
  4. Axapta : 0.13
  5. DocusignAPI : 0.14
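The slides do not define the weekend index, so the following is only a sketch under one plausible reading: the average number of questions per weekend day divided by the average per weekday. Under that assumption a value above 1 marks a tag that is asked about more on weekends, and a value near 0 marks a tag that is almost entirely weekday work. The function name and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def weekend_index(question_dates):
    """Average questions per weekend day divided by average per weekday.

    ASSUMPTION: the talk does not state the formula; this is one plausible reading.
    """
    counts = Counter(
        "weekend" if d.weekday() >= 5 else "weekday" for d in question_dates
    )
    per_weekend_day = counts["weekend"] / 2  # Saturday, Sunday
    per_weekday = counts["weekday"] / 5      # Monday to Friday
    if per_weekday == 0:
        raise ValueError("no weekday questions; index is undefined")
    return per_weekend_day / per_weekday


# Hypothetical tag: 2 questions on a Saturday, 1 question on each weekday.
sample = [date(2016, 1, 2)] * 2 + [date(2016, 1, 4 + i) for i in range(5)]
index = weekend_index(sample)
```

With equal activity per day the index comes out as 1, which is why the two lists above sit on either side of it.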

Most common tags

Functional languages

Continuous integration

Who's your target user?

?

Question: Where?

Where?

  • 5 277 833 users in total
  • 769 541 filled in their location

$HOME

83%

(Unfortunately) Germany

7151 Mawson Station, Australian Antarctic Territory, Antarctica

Where?

JavaScript

\[n \times \frac{1}{\text{population}} \times \frac{\text{registered}}{\text{located}} \times 1{,}000{,}000\]

ppm (Programmers-per-million)
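A minimal sketch of the formula above, using the two user totals from the "Where?" slide. It reads n as the number of located users in a country who are active in a tag; the slide does not define n, so that reading is an assumption. The function name and the example country are hypothetical.

```python
REGISTERED_USERS = 5_277_833  # users in total (from the slide)
LOCATED_USERS = 769_541       # users who filled in their location (from the slide)


def programmers_per_million(n, population):
    """n located users, scaled up to all registered users, per million inhabitants."""
    scale_up = REGISTERED_USERS / LOCATED_USERS  # corrects for missing locations
    return n * (1 / population) * scale_up * 1_000_000


# Hypothetical country: 100 located users of a tag, 1,000,000 inhabitants.
ppm = programmers_per_million(n=100, population=1_000_000)
```

The registered/located factor assumes users without a location are distributed like users with one, which is exactly where the sampling bias comes in.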

Where really?

F#

(Sampling bias)

Tags + Users

=

Communities

Tags define relations

             F#   C#   JS   R   Cobol
Evelina      1    0    1    1   0
Krzysztof    1    1    1    0   0

44 265 tags x 5 277 831 users

Users with more than 1,000 posts

Tags with more than 5,000 posts

807 tags, 1633 power users
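The user-by-tag matrix and the two filters can be sketched as follows. The two rows reproduce the Evelina and Krzysztof example from the slide; the helper names and the post counts passed to the filter are hypothetical, and the real data pipeline may well differ.

```python
def incidence_matrix(user_tags, tags):
    """One row per user, one column per tag: 1 if the user posted under the tag."""
    return {
        user: [1 if tag in used else 0 for tag in tags]
        for user, used in user_tags.items()
    }


def keep_above(post_counts, threshold):
    """Names (users or tags) with strictly more posts than the threshold."""
    return {name for name, count in post_counts.items() if count > threshold}


tags = ["F#", "C#", "JS", "R", "Cobol"]
user_tags = {
    "Evelina": {"F#", "JS", "R"},
    "Krzysztof": {"F#", "C#", "JS"},
}
matrix = incidence_matrix(user_tags, tags)

# Hypothetical post counts, filtered with the "more than 1,000 posts" rule.
power_users = keep_above({"alice": 2500, "bob": 40}, threshold=1000)
```

Filtering both axes before building the matrix is what brings 44 265 tags and millions of users down to a size an embedding method can handle.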

t-SNE

t-distributed Stochastic Neighbor Embedding

Embedding

t-SNE in R

library(tsne)

ts <- tsne(m, perplexity=20)

plot(ts)

t-SNE in F#

open RProvider
open RProvider.tsne

let ts = R.tsne(namedParams[ "X", box m; "perplexity", box 20])

R.plot(ts)

The best thing about R is that it was written by statisticians.

The worst thing about R is that it was written by statisticians.

Bo Cowgill, 2009

fable.io

Platzer, A. (2013). Visualization of SNPs with t-SNE. PLoS ONE, 8(2), e56883.

Quantifying Communities

Network

No overlap

iOS - R

Django - middleware

Most Central

Based on number of connections

  1. arrays
  2. string
  3. performance
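"Based on number of connections" corresponds to ranking nodes by degree. A minimal sketch, assuming the network is given as a list of undirected tag pairs; the edge list below is made up for illustration and is not the talk's data.

```python
from collections import defaultdict


def degree_ranking(edges):
    """Nodes sorted by number of distinct neighbours, highest first."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(neighbours, key=lambda node: (-len(neighbours[node]), node))


# Hypothetical edges: two tags are linked when they share users.
edges = [
    ("arrays", "string"),
    ("arrays", "performance"),
    ("arrays", "sorting"),
    ("string", "performance"),
]
ranking = degree_ranking(edges)
```

Other centrality measures (betweenness, eigenvector) can produce a different ordering; degree is simply the one the slide names.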

Communities

Clustering of nodes in a network

(depending on algorithm used)

✔ Tags

✔ Users

✔ Technologies


Questions and Answers

Word2vec

word embeddings

t-SNE

Point + local neighbourhood

2D embedding

Word2vec

Word + local context

vector-space embedding

Word2vec


F# is a functional language on the .NET platform.

Scala is a functional/object oriented language on the JVM.
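"Word + local context" can be illustrated by listing the (word, context word) pairs a skip-gram style model would be trained on. This is a sketch only: the window size of 2 and the whitespace tokenisation are assumptions, and the talk does not say which Word2vec variant or settings were used.

```python
def context_pairs(tokens, window=2):
    """All (word, neighbour) pairs where the neighbour is within `window` positions."""
    pairs = []
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((word, tokens[j]))
    return pairs


sentence = "F# is a functional language on the .NET platform"
pairs = context_pairs(sentence.split(), window=2)
```

Words that keep appearing with the same neighbours, such as "F#" and "Scala" next to "functional language", end up with similar vectors.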

Scala - JVM + .NET

=

C#, F#

F# - .NET

=

SML, OCaml, GHCi, Haskell, Idris
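The arithmetic on these slides is vector addition and subtraction followed by a nearest-neighbour lookup. The sketch below shows the mechanics on hand-made three-dimensional toy vectors (roughly "functional", "JVM", ".NET"). These numbers are invented purely for illustration and are not taken from a trained model or from the talk.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(vectors, plus, minus, top=2):
    """Words closest to sum(plus) - sum(minus), excluding the query words."""
    dims = len(next(iter(vectors.values())))
    query = [0.0] * dims
    for word in plus:
        query = [q + x for q, x in zip(query, vectors[word])]
    for word in minus:
        query = [q - x for q, x in zip(query, vectors[word])]
    candidates = [w for w in vectors if w not in plus and w not in minus]
    return sorted(candidates, key=lambda w: cosine(query, vectors[w]), reverse=True)[:top]


# TOY VECTORS (made up): [functional, JVM, .NET]
toy = {
    "Scala": [1.0, 1.0, 0.0],
    "JVM": [0.0, 1.0, 0.0],
    ".NET": [0.0, 0.0, 1.0],
    "F#": [1.0, 0.0, 1.0],
    "C#": [0.6, 0.0, 1.0],
    "Java": [0.2, 1.0, 0.0],
}
result = analogy(toy, plus=["Scala", ".NET"], minus=["JVM"])
```

With real embeddings the vectors have hundreds of dimensions and the neighbours are far noisier, but the lookup works the same way.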

Is

StackOverflow

a meritocracy?

Regression

input data

score

Input data

18,100,293 rows

  • "Accepted"
  • "Question Score"
  • "Number of tags"
  • "Answer Count"
  • "Comment Count"
  • "Question Favourite Count"
  • "Question View Count"
  • "Author Reputation"
  • "Author Profile Views"
  • "Author Up Votes"
  • "Author Down Votes"
  • "Gold Badges"
  • "Silver Badges"
  • "Bronze Badges"
  • "Author Number of Tags"
  • "Time to Answer"

"Score"

linear regression, nonlinear regression, random forests, SVMs, neural networks

Most predictive

  • Question favourites
  • Question view count
  • Accepted

Quality matters

... unless you are Jon Skeet

Technological side of things

Tool for the job

Data science side of things

Questions

Evelina Gabasova

@evelgab

evelinag.com