Graphs in nature

Examples

May be directed or undirected. Citation graph can be considered directed. Coauthorship graphs. Query graphs. Online social network websites

Affiliation networks

A bipartite graph G’=(A,C,E) of actors A, communities C and edges E. Can implicitly define social network amongst A: by folding G’/ forming edges between actors depending on shared memberships in \(c \in C\).

Static patterns

Small diameter

Small world phenomenon in social networks: remarkably short paths connect very unrelated people. Eg: A certain developmental psychologist has Erdos number 3. 6 degrees of separation idea.

Two randomly selected web pages are connected is around .35.

So, effective diameter of the graph is small.

In and out Degree distribution

Heavy tailed distributions. Folks look at shape of the graph, and observe a distribution pattern.

May follow power law distribution over a large degree-range; observed in online social networks, citation graphs. So, number of nodes with degree d \(\propto d^{-\gb}\). \(\gb\) may or mayn’t vary over time.

Else DGX distribution.

Else log normal distribution.

In-degree distributuion is more skewed, usually.

Scree plot

Plot of ew vs rank using log vs log scale. Approximately obeys power law: You see a straight line.

Clustering coefficient

In social networks, most new edges form to close triangles. Let degree(v)=d. Clustering coefficient \(C_{d}(v) = \frac{\Del(v)}{\nbd(v)(\nbd(v)-1)/2}\), where \(\Del(v) = \triangle\) centered at v: the fraction of potential \(\triangle\)’s centered at v which actually exist.

\(C_{d} = avg_{v:deg(v)=d}(C_{d}(v))\). Gets smaller with increasing degree. Follows power law distribution: \(C_{d} \propto d^{-1}\).

Global clustering coefficient \(C = avg_{v}(C_{d}(v))\). It is a measure of transitivity in the network. C in real networks is significantly higher than for random networks, conditioned on same degree distribution.

Indicates some degree of hierarchial and community structure. The low-degree nodes belong to very dense sub-graphs and those sub-graphs are connected to each other through hubs.

Community structure

2 meanings

Communities amongst V observed in real world G, due to homophily. In the internet, many dense bipartitite subgraphs observed (Border).

Communities in the social networking websites often don’t correspond to dense subgraphs of the social network.

Hierarchical structure

Found in social network, internet: a scale-free self similar structure observed.

Fractal or onion structure

Similar structure observed as you keep zooming in on a part of G.

Group size: Power law distribution. Number of affiliations of nodes of a certain degree: exponential distribution. \cite{zhelevaSocialAffiliationNw} Group membership: a high fraction of singletons: especially in small groups, decreases with group size. With groups size, highest degree of member nodes increases.

Core-periphery structure

found in internet autonomous systems.

Network motifs

Basic building blocks of complex networks. Frequency of occurance compared to random graphs, function then hypothesized. Graph isomporphism checks feasible only for motifs of size \(\leq 5\).

Eg: gene regulatory networks.

Microscopic properties

Most (70 to 90%) edges are bidirectional \cite{leskovecThesis}..

Other properties

Eigenvector components distribution skewed.

Resilience to random node attacks, but suceptibility to high degree hub node attacks.

Stress.

Temporal evolution patterns

% See \cite{LeskovecThesis}.

Densification

Polynomial densification: \(|E(t)| \propto |V(t)^{\ga}|\) with \(\ga >1\). So, avg degree increases with time.

Relationship with power law degree distribution

\(\gb \in (1,2) \implies \ga = 2/\gb\).

\(\gb >2, \ga \implies\) \(\gb(n) = \frac{4n^{\ga-1}-1}{2n^{\ga-1}-1}\) changes in a certain way over time.

Both observed in actual graphs \cite{leskovecThesis}.

Shrinking diameter

Diameter shrinks over time, asymptotically levels off.

Densification alone doesn’t not result in this (experiment on \(G_{n,p}\)). Densification with particularly evolving degree sequence result in this. Nothing special about edge attachments: make random graphs with same degree sequences. \cite{leskovecThesis}

Node arrival function

N: time \(\to \set{0,1}\). High variation among networks: exponential to sublinear.

Edge initiation process

Node lifetime L

After a certain time, node does not form new edges. Spike at L=0, as many people join the network and never visit the networking website again. Best modelled by \(Pr(L = l) = ke^{-lk}\): exponential distribution.

Age

Edge initiation rate seems to stay constant accross ages.

Graph shows spike at age=0, as many people join the network and never visit the networking website again.

Time gap G between edges

G follows power law with exponential cutoff. \(p(x) \propto x^{-\ga}e^{-x\gb}\). \(\ga\) stays constant with time, but \(\gb\) grows linearly.

Edge (u, v) destination selection process

Most (30 to 60%) new edges close triangles: local. Pr(v is k hops away) \(\propto e^{-ck}\); Pr(certain v: k hops away) is doubly inverse exponential as number of verteces k hops away itself grows exponentially. \cite{leskovecThesis}

For the creation of the first edge: \(Pr(degree(v) = d) \propto d^{t}\), where t = 1 or 0.9 etc..

Computer networks

Eg: Internet.

Important matrices

Matrices showing delay, traffic etc..

Matrix completion problems

Often these have missing values. So, use matrix completion approaches.

Peculiarities

Maybe try to embed network: but often triangle inequality and symetry doesn’t hold.

Missing values tend to be highly structured. But many matrix completion methods assume that missing values tend to be uniformly distributed.