Comparing F# and C# with dependency networks

Fans of different programming languages always argue about the benefits of their language of choice. Objective criteria are hard to come by in a debate like this: terms like 'clarity' or 'maintainability' are too vague and subjective. What if we used some tools from network science to compare projects written in different languages?

In this blog post I use network analysis to investigate how complex dependency graphs are and if they differ between C# and F#. It turns out that F# and C# dependency networks have quite different structures and use different local network patterns. For example, I'll describe specific types of cyclic dependencies that frequently appear only in C# projects.

Examples of motifs on 3 and 4 nodes

This blog post is an addition to an excellent article by Scott Wlaschin on modularity and cyclic dependencies in real-world F# and C# projects, "Cycles and modularity in the wild". I wanted to look at the same data that Scott extracted in his article, but from a network analysis perspective.

Dependency networks

For my analysis, I extracted dependency networks from 40 different projects, half of them written in C# and half of them in F#. All the networks come from compiled assemblies that can be downloaded through NuGet. I used a method similar to Scott's, so head over to F# for fun and profit if you want a more detailed description. I'll give just a brief overview here.

Structure of a dependency network

A dependency network is formed by nodes and directed links between them.

Nodes in the dependency network are formed by

  • Classes in C#
  • Modules in F#

The compiler turns F# modules into static classes, so the two definitions should be roughly comparable, at least on the CIL level. Both types of nodes represent only top-level classes and modules; nested types are incorporated into their parent class or module. The networks analyzed in this blog post contain all the classes and modules from each project, not just the public ones.

Link from A to B

Links between the nodes represent dependencies. There is a link from A to B in the network if:

  • Class B inherits from class A or implements interface A.
  • Function in B calls a function or method from A.
  • Field, property, method or function in B references A as a parameter or as a return type.

Note that I switched the direction of the dependency arrows compared to the original article at F# for fun and profit. Links now point in the direction in which information is passed between nodes, which corresponds better to the logic of information flow in a program. For example, if there is a bug in a function, it will propagate along the dependency arrows into all nodes that call the function.
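To make the link direction concrete, here is a minimal sketch in plain Python (the original analysis did not use Python, and the names `build_network`, `affected_by` and the example classes are made up for illustration). It assumes the dependencies are already available as (dependent, dependency) pairs, and it ignores self-references, which is my assumption rather than something stated above.

```python
from collections import defaultdict

def build_network(dependencies):
    """Build a directed network from (dependent, dependency) pairs.

    A pair (b, a) means "b uses a" and produces a link a -> b,
    so links follow the direction in which information flows.
    """
    nodes = set()
    links = defaultdict(set)
    for dependent, dependency in dependencies:
        nodes.update((dependent, dependency))
        if dependent != dependency:  # assumption: self-references are ignored
            links[dependency].add(dependent)
    return nodes, dict(links)

def affected_by(links, start):
    """Return all nodes reachable from `start` by following the links."""
    seen, stack = set(), [start]
    while stack:
        for target in links.get(stack.pop(), ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

# Hypothetical project: Parser uses Lexer, Compiler uses Parser and Lexer.
uses = [("Parser", "Lexer"), ("Compiler", "Parser"), ("Compiler", "Lexer")]
nodes, links = build_network(uses)
print(affected_by(links, "Lexer"))  # nodes that a bug in Lexer could reach
```

With this convention, following the arrows from a node lists everything that could be affected by a bug in it.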

Projects under the spotlight

I expanded the list of analysed projects compared to the original analysis at F# for fun and profit. As before, the individual projects are not directly comparable. I hope that with more projects the differences average out and we get a bigger picture. The sample is still small, though, so the results remain biased.

Here are the 40 projects (individual dlls) that were included in the analysis (in no particular order):

C# projects:

Antlr, AutoMapper, Castle, elmah, EntityFramework, FParsecCS, log4net, MathNet.Numerics, SignalR, Bcl.Runtime, Owin, Cecil, Moq, Nancy, Newtonsoft.Json, Nuget, NUnit, SpecFlow, xunit, YamlDotNet

F# projects:

canopy, Deedle, Fake, Foq, FParsecFS, FsCheck, FSharp.Compiler.Service, FSharp.Core, FSharp.Data, FSharp.Data.Twitter, FSharpx, FsPowerPack, FsSql, FsUnit, FsYaml, Storm, TickSpec, WebSharper, WebSharper.Core, WebSharper.Html

Network statistics

The networks extracted from compiled project dlls have very different sizes. The following chart shows the number of nodes (classes or modules) and number of dependencies in each project. The axes in the figure are logarithmic so that we can put data with different scales into one picture.

Number of nodes vs. number of dependencies

Projects written in F# are generally smaller, while C# projects tend to be larger in both the number of nodes and the number of dependencies. Interestingly, the points fall approximately on a straight line. Because both axes are logarithmic, this indicates a power law relation between the number of nodes and the number of links in both F# and C# projects.
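A straight line on log-log axes means that the number of links grows roughly as c · nodes^k. As a rough illustration (not the fit used for the chart), the exponent k can be estimated with an ordinary least-squares line through the logarithms. The function name `fit_power_law` is made up, and the sample uses only a handful of (nodes, links) pairs taken from the statistics tables in this post, so the resulting numbers should not be read as a result.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = c * x**k by least squares on log(x) and log(y).

    Returns (c, k). All values must be positive, so projects with
    zero links would have to be left out.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    k = sxy / sxx                       # slope of the line on log-log axes
    c = math.exp(mean_y - k * mean_x)   # intercept, transformed back
    return c, k

# (nodes, links) for Antlr, AutoMapper, Castle, EntityFramework,
# Deedle and FSharp.Core, copied from the tables in this post.
sample = [(91, 257), (152, 549), (430, 1766), (1679, 11671), (95, 249), (154, 287)]
c, k = fit_power_law([n for n, _ in sample], [l for _, l in sample])
print(f"links ~ {c:.2f} * nodes^{k:.2f}")
```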

The next question we might ask is how complex the networks are. One measure of complexity in code dependency networks is how many dependencies are chained together in the graph. Long chains of dependencies increase the complexity of code. For example, bugs that get propagated through a long dependency path might affect a large part of the whole project. A standard measure for this is the network diameter. It is computed by looking at the shortest paths between all possible pairs of nodes in a network. The diameter is defined as the length of the longest of these paths. For diameters in C# and F# projects we get these box plots:

Network diameters

The diameter of the analyzed C# projects is on average more than double that of the F# projects. Diameters tend to grow with the number of nodes and links in each network: because C# projects have larger networks, their diameters are larger as well.
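The diameter as defined above can be computed with one breadth-first search per node. The sketch below is an illustration only: it assumes that paths follow the direction of the links and that pairs of nodes with no path between them are simply skipped, neither of which is spelled out in this post. The function name `diameter` and the input format (a dict mapping each node to the set of nodes it links to) are my own choices.

```python
from collections import deque

def diameter(nodes, links):
    """Length of the longest shortest path between connected node pairs."""
    longest = 0
    for source in nodes:
        # Breadth-first search gives shortest path lengths from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for target in links.get(current, ()):
                if target not in dist:
                    dist[target] = dist[current] + 1
                    queue.append(target)
        longest = max(longest, max(dist.values()))
    return longest

# A chain of three dependencies plus one standalone node.
nodes = {"A", "B", "C", "D", "E"}
links = {"A": {"B"}, "B": {"C"}, "C": {"D"}}
print(diameter(nodes, links))  # 3
```

Under these assumptions a network without any links has diameter 0, and a network with a single link has diameter 1, which agrees with the smallest projects in the tables.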

One aspect where F# and C# projects differ dramatically is the number of isolated nodes. These represent standalone modules or classes that do not have any dependency within the project. Here is a box plot showing the proportion of standalone nodes.

Isolated nodes

Isolated nodes appear much more frequently in F# projects than in C# projects. This is probably an effect of the different programming paradigms. An object-oriented language like C# might require the programmer to introduce more dependencies into the code. As a result, functional F# has cleaner modularity than C# on average.
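The proportion of isolated nodes is easy to compute once the network is available. Here is a small sketch, assuming the same hypothetical representation as a set of nodes plus a dict of outgoing links (the name `isolated_fraction` is made up):

```python
def isolated_fraction(nodes, links):
    """Fraction of nodes with no incoming and no outgoing links."""
    nodes = set(nodes)
    if not nodes:
        return 0.0
    connected = set()
    for source, targets in links.items():
        if targets:
            connected.add(source)
            connected.update(targets)
    return len(nodes - connected) / len(nodes)

# Three nodes and one link, the same shape as the Fake row in the F# table.
print(f"{isolated_fraction({'A', 'B', 'C'}, {'A': {'B'}}):.1%}")  # 33.3%
```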

Below are images of networks from two different projects as an example. There is Yaml.NET on the left and FSharp.Core on the right. The two projects are not comparable in terms of their scope. However, their networks have roughly the same number of nodes and similar diameter.

FSharp.Core has more isolated nodes that do not have any dependencies within the project, which seems to be typical for F# projects. Its densely connected core is also much smaller than that of the C# project. The two networks are meant just as an illustration of typical features of C# and F# dependency networks.

Yaml.NET network

FSharp.Core network

Here are the detailed numbers for the analyzed projects:

C# code statistics

| Project | Code size | Number of nodes | Number of links | Isolated nodes | Diameter |
|---|---|---|---|---|---|
| Antlr | 34344 | 91 | 257 | 8.8 % | 5 |
| AutoMapper | 34793 | 152 | 549 | 5.3 % | 8 |
| Castle | 112538 | 430 | 1766 | 5.6 % | 8 |
| elmah | 43728 | 116 | 300 | 7.8 % | 5 |
| EntityFramework | 1144189 | 1679 | 11671 | 4.7 % | 16 |
| FParsecCS | 32230 | 35 | 48 | 14.3 % | 3 |
| log4net | 102651 | 227 | 746 | 0.9 % | 10 |
| MathNet.Numerics | 492095 | 342 | 1285 | 5.6 % | 8 |
| SignalR | 63690 | 221 | 735 | 6.8 % | 11 |
| Bcl.Runtime | 73 | 8 | 2 | 62.5 % | 1 |
| Owin | 13376 | 55 | 98 | 10.9 % | 7 |
| Cecil | 100650 | 240 | 1145 | 5.0 % | 8 |
| Moq | 158417 | 541 | 1536 | 11.1 % | 14 |
| Nancy | 130818 | 369 | 1205 | 5.4 % | 12 |
| Newtonsoft.Json | 157716 | 237 | 1005 | 4.6 % | 13 |
| Nuget | 101586 | 229 | 943 | 2.2 % | 10 |
| NUnit | 45873 | 183 | 505 | 14.2 % | 7 |
| SpecFlow | 41187 | 242 | 578 | 2.5 % | 7 |
| xunit | 14590 | 72 | 209 | 1.4 % | 7 |
| YamlDotNet | 42372 | 161 | 550 | 2.5 % | 7 |

F# code statistics

| Project | Code size | Number of nodes | Number of links | Isolated nodes | Diameter |
|---|---|---|---|---|---|
| canopy | 23630 | 11 | 12 | 27.3 % | 2 |
| Deedle | 122918 | 95 | 249 | 18.9 % | 5 |
| Fake | 1395 | 3 | 1 | 33.3 % | 1 |
| Foq | 38532 | 40 | 75 | 5.0 % | 3 |
| FParsecFS | 45946 | 6 | 4 | 33.3 % | 2 |
| FsCheck | 76418 | 54 | 103 | 16.7 % | 5 |
| FSharp.Compiler.Service | 110523 | 42 | 23 | 50.0 % | 2 |
| FSharp.Core | 206348 | 154 | 287 | 40.3 % | 6 |
| FSharp.Data | 135001 | 94 | 173 | 8.5 % | 6 |
| FSharp.Data.Twitter | 10372 | 20 | 29 | 25.0 % | 3 |
| FSharpx | 290577 | 175 | 77 | 56.0 % | 2 |
| FsPowerPack | 102878 | 93 | 68 | 46.2 % | 4 |
| FsSql | 15311 | 13 | 14 | 0.0 % | 4 |
| FsUnit | 1580 | 2 | 0 | 100.0 % | 0 |
| FsYaml | 14573 | 8 | 10 | 12.5 % | 3 |
| Storm | 55072 | 67 | 195 | 3.0 % | 5 |
| TickSpec | 27970 | 34 | 48 | 5.9 % | 3 |
| WebSharper | 43747 | 56 | 22 | 57.1 % | 2 |
| WebSharper.Core | 83201 | 12 | 13 | 25.0 % | 2 |
| WebSharper.Html | 14152 | 19 | 37 | 10.5 % | 2 |

Network motifs

So far we have looked at some global properties of dependency networks. Now we turn to more local features. Motifs are small recurring patterns of links between nodes that appear in real-life networks. For example, there has been a lot of research on motifs in gene regulatory networks and their functional meaning. We can apply the same approach to our dependency networks to see if there are any typical patterns.

Motif finding in general networks is computationally hard because it involves identifying graph isomorphisms. The larger the motif, the harder it is to find it in a network. In this analysis, I looked only at motifs on three and four nodes. I used the igraph package in R with F# RProvider. The motif finding function from igraph counts the number of times each possible motif on three or four nodes appears in a given network.

Motifs on 3 nodes

Motifs on 3 nodes

There are 13 possible motifs on three nodes. I computed how many times each of these motifs appears in the project networks. Because each network has a different size, the counts were normalized with respect to the total number of motifs in each network. The following bar plot compares the average frequencies of all the motifs.

Average motif profiles on 3 nodes in C# and F# projects

Motif profiles in C# and F# projects

Motifs number 1, 2, 4 and 5 are the most common in both C# and F# projects. The two languages seem to differ only in how often each of them appears. The results seem quite intuitive because these motifs look like standard patterns that would be expected in a software project. Note that the bar plot shows only the average frequencies, and the variance between individual projects is quite high. A summary of the results for each project is available here.
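To show what the counting and normalization in this section amount to, here is a brute-force sketch in plain Python. It is not what igraph does internally and it will be slow on large networks. It checks every triple of nodes, keeps the connected ones, and groups them by a canonical label so that isomorphic patterns are counted together. The labels it produces are arbitrary tuples and do not correspond to the motif numbers used in the figures. All function names are made up for this example.

```python
from collections import Counter
from itertools import combinations, permutations

def canonical_form(triple, edges):
    """Label that is identical for all isomorphic 3-node patterns."""
    codes = []
    for perm in permutations(triple):
        # Encode which of the six possible directed links are present.
        codes.append(tuple((perm[i], perm[j]) in edges
                           for i in range(3) for j in range(3) if i != j))
    return min(codes)

def is_connected(triple, edges):
    """Three nodes are connected if at least two of the pairs are linked."""
    linked_pairs = sum((x, y) in edges or (y, x) in edges
                       for x, y in combinations(triple, 2))
    return linked_pairs >= 2

def motif_profile(nodes, edges):
    """Relative frequency of each 3-node motif in a directed network."""
    edges = set(edges)
    counts = Counter()
    for triple in combinations(sorted(nodes), 3):
        if is_connected(triple, edges):
            counts[canonical_form(triple, edges)] += 1
    total = sum(counts.values())
    return {motif: count / total for motif, count in counts.items()}

# Hypothetical network with four nodes and four links.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
for motif, frequency in motif_profile("abcd", edges).items():
    print(motif, round(frequency, 3))
```

In this small example there are three connected triples: two simple chains and one triangle without a cycle, so the profile contains two motifs with frequencies 2/3 and 1/3.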

Motifs that are C#-specific

What is interesting is that there are several motifs that appear in many C# projects but they are not in any of the analyzed F# projects. Here they are:

C# only motifs

Additionally, motif number 12 appears just once in FSharp.Core and nowhere else among the F# projects. What all these motifs have in common is that they contain cyclic dependencies. Scott Wlaschin wrote a nice blog post on why cyclic dependencies are evil. Simply said, they add complexity, mess up the structure of code and complicate maintenance. So, this is how the evil cyclic dependencies look in real-world projects. Especially motif number 13 with full connectivity looks like something that should be avoided. How frequent are these cyclic motifs?

| Motif | Number of projects |
|---|---|
| 8 | 13 |
| 9 | 9 |
| 10 | 14 |
| 13 | 4 |

The table shows how many projects contain each of the C#-specific motifs. Motifs number 8 and 10 appear in the majority of the analysed C# networks, which means they are quite widespread. Fortunately, the most entangled motif, number 13, is the least common one and occurs in only 4 projects. There are no motifs that appear only in F# projects.
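Whether a given pattern of links contains a cyclic dependency can be checked mechanically with a depth-first search. The sketch below is a generic illustration (the function name `has_cycle` and the node names are made up). It treats two nodes that depend on each other as a cycle too, and the fully connected example corresponds to the pattern with full connectivity discussed above.

```python
def has_cycle(edges):
    """Return True if the directed links contain a cycle."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}

    def visit(node):
        state[node] = IN_PROGRESS
        for neighbour in graph[node]:
            # Reaching a node that is still being explored closes a cycle.
            if state[neighbour] == IN_PROGRESS:
                return True
            if state[neighbour] == UNVISITED and visit(neighbour):
                return True
        state[node] = DONE
        return False

    return any(state[node] == UNVISITED and visit(node) for node in graph)

chain = [("A", "B"), ("B", "C")]
fully_connected = [(x, y) for x in "ABC" for y in "ABC" if x != y]
print(has_cycle(chain))            # False
print(has_cycle(fully_connected))  # True
```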

Motifs on 4 nodes

I will not give the full analysis of motifs on 4 nodes because there are 199 of them. However, there are a few interesting things to point out. Again, F# and C# share the most common motifs which look like patterns that we would expect to see:

Most common motifs on 4 nodes
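The numbers of possible motifs quoted in this post (13 on three nodes and 199 on four nodes) can be reproduced by brute force: generate every directed graph on n nodes, keep the connected ones, and count them up to relabeling of the nodes. The sketch below does exactly that. The function names are made up, and the approach only works for very small n because the number of graphs explodes quickly.

```python
from itertools import permutations, product

def weakly_connected(n, edges):
    """True if all n nodes are joined when link direction is ignored."""
    neighbours = {i: set() for i in range(n)}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    reached, stack = {0}, [0]
    while stack:
        for v in neighbours[stack.pop()]:
            if v not in reached:
                reached.add(v)
                stack.append(v)
    return len(reached) == n

def count_motifs(n):
    """Number of connected directed patterns on n nodes, up to isomorphism."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    perms = list(permutations(range(n)))
    seen = set()
    for bits in product((0, 1), repeat=len(pairs)):
        edges = [pair for pair, bit in zip(pairs, bits) if bit]
        if not weakly_connected(n, edges):
            continue
        # The smallest relabeled edge list serves as a canonical label.
        seen.add(min(tuple(sorted((p[i], p[j]) for i, j in edges))
                     for p in perms))
    return len(seen)

print(count_motifs(3))  # 13
print(count_motifs(4))  # 199
```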

And again, some motifs appear exclusively in C# projects. This time there are 129 motifs that are C#-only, and no motifs that appear just in F# projects. These are the most common C#-specific ones:

C#-specific motifs on 4 nodes

These motifs are also quite widespread. The first one appears in 14 projects, the rest of them in 13 projects. Finally, what about the most complex motif on 4 nodes?

The most complex motif on 4 nodes

It turns out that this motif appears in 3 of the C# projects (specifically in EntityFramework, Mono.Cecil and Newtonsoft.Json). This pattern looks like quite a poor design choice.

Explore motifs in your projects

If you want to find the motif profile of your own project, this FsLab Journal shows how to run the analysis. Source code from the Journal is available here. You can also download the full source code that replicates the results from this blog post from my GitHub page.

Summary

In this blog post, I looked at dependency networks in several C# and F# projects. The analysis shows some similarities and differences between the two programming languages. In general, C# projects tend to be larger, with more classes and dependencies. They also have longer chains of dependencies on average. Real world F# projects are smaller with cleaner modularity.

I also described recurring patterns (motifs) that appear in dependency networks. The most common motifs are similar in C# and F# projects. However, most C# projects contain motifs with complicated cyclic dependencies that do not appear in F# at all. Cyclic dependencies in general complicate the code and obscure the dependency structure.

This analysis is still very limited. For example we can debate if the dependency networks are well defined with respect to both languages to be truly comparable. Nevertheless, it seems that this type of analysis can reveal some aspects of dependency networks.

In general, it seems that most C# projects would be harder to maintain because of all the cyclic dependencies and more complex structure overall. The question is whether it is a feature of the language itself that encourages programmers to create more complex systems.

I also presented a poster on this topic at Cambridge Networks Day 2014.

Correction 13/6/2014: Relation between number of nodes and number of links is a power law function.
