Growing Networks…

Could it be? Yes. Yes I think it is! Actual content! I never thought this day would come.

Last semester I took a course from Mark Newman on Network Theory. This class was awesome.

Network theory is a hot topic these days and is being used to study everything from friendship networks on Facebook to the spread of swine flu in epidemiology. One of the earlier contributions to the field were preferential attachment models.

Originally created to study citation networks (one academic paper cites another academic paper), this model is driven by the rule, I attach to you based on the number of attachments you already have. This idea was so successful because it was able to reproduce a number of features of the actual data, namely the power law seen in the degree distribution of these networks.

In other words, the distribution of the number of citations papers have is such that most papers have very few, but a non-vanishing amount will have a very large number of citations. Power laws are found in lots of places, perhaps most notably in the distribution of wealth. Some may recall the 80-20 rule, stating roughly 80% of the world’s wealth is concentrated in the hands of 20% of the world’s population.

Anyways, without adequate explanation i’ll just throw some math out there. The Barabasi-Albert model is essentially a special case of preferential attachment models. In this model, vertices are added one by one to a network and each is allowed to connect to exactly $$c$$ other vertices, where connections are weighted so that a node with more connections is more likely to be connected to. In this model, connections are undirected.

Skipping ahead to the good stuff, we can write down the degree distribution for this model.

$$p_k = frac{2c(c+1)}{k(k+1)(k+2)}$$

Here $$p_k$$ is the fraction of vertices that have $$k$$ connections. To see where the power law comes in, if we take the limit where $$k$$ becomes large, we fine,

$$p_kapprox k^{-3}$$.

So, what does this all mean. Well Professor Newman has done quite a bit of work involving citations networks and one of the major take-aways is that earlier papers have a major “first-mover” advantage as in they get cited more simply because they are the only things around to cite. This means the papers in the distribution with a huge number citations are most likely papers published in the very beginning of a field.

Depending on how cynical you are, you could say that these papers are either cited a lot because they are good, important papers, or they were only cited because there was nothing better to cite.

In any case, aside from all of the math, I am also interested in visualizing this information. One of our homework problems over the semester was to write a computer program that generated networks grown using this Barabasi-Albert Model.

I took my skeleton for that program and set about making a sketch in Processing that would actually draw the networks that got generated, rather than store them as a list of numbers. Obviously there isn’t enough screen real-estate to throw 1 million nodes out there, but 4000 or so still looks pretty cool.

Here is the result:
Growing Networks

The nodes towards the top can be interpreted as the “first” papers in the field. Notice they have A LOT of citations. The graph at the bottom is a histogram that shows (roughly speaking) the number of citations each vertex has, but the vertices are ordered based on when they were created. As you can see ones created in the very beginning have WAY more connections than anyone else.

Click the above image for the high-resolution version.

For contrast, I also wrote a program that simply generates random graphs (without this preferential attachment business). In math speak i generated a graph $$G(n,p)$$ where $$n$$ is the number of nodes and $$p$$ is the probability that any pair is connected. So for this network, we just throw down a bunch of nodes and then go through each of the $$frac{1}{2}n(n-1)$$ pairs of nodes and add an edge with probability $$p$$.

In the limit where $$n$$ becomes large, the degree distribution (the fraction of vertices that have a given number of connections) is given by a Poisson distribution. Even for a network of 4000 nodes, we begin to see this distribution. As an added measure I have also changed the node size to correspond to the number of connections it has. Thus larger nodes have more connections. Notice that with this type of growth very few nodes have over 7 or 8 connections.

Here is the random graph:
Growing Networks | Random Graph

Click the above image for the high-resolution version.

If anyone wants the source (its probably terribly written) feel free to e-mail me or request in the comments.

Leave a Reply