On the evolution of some random graphs
by
Phil Pollett
Department of Mathematics
The University of Queensland
A RANDOM GRAPH
The construction. A random (undirected) graph with n vertices is constructed in the following way: pairs of vertices are selected one at a time in such a way that each pair has the same probability of being selected on any given occasion, and each selection is made independently of previous selections. If the vertex pair {x,y} is selected, then an edge is constructed which connects x and y.
Are multiple edges possible? In my model, yes! For example, if the vertex pair {x,y} were to be selected k times, there would be k edges connecting x and y: a multiple edge contributing cycles of length 2.
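The construction can be sketched in a few lines of Python. This is only an illustration: the function names random_multigraph and multiplicities are my own, and vertices are assumed to be labelled 0,...,n-1.

```python
import random
from collections import Counter

def random_multigraph(n, m, rng=None):
    """Select m vertex pairs {x, y} (x != y) uniformly and independently.

    Returns a list of m edges; repeated pairs are kept, so multiple
    edges are possible.
    """
    rng = rng or random.Random()
    edges = []
    for _ in range(m):
        # sample() returns two distinct vertices; every unordered pair
        # is equally likely on every draw.
        x, y = rng.sample(range(n), 2)
        edges.append((min(x, y), max(x, y)))
    return edges

def multiplicities(edges):
    """Count how many times each vertex pair was selected."""
    return Counter(edges)
```

With 5 vertices there are only 10 possible pairs, so selecting 20 pairs must produce at least one multiple edge.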
ASYMPTOTIC BEHAVIOUR
Suppose that m edges have been selected. We shall be concerned with the behaviour of the graph in the limit as n and m become large, but in such a way that m=O(n).
The problem. Our problem is to determine the limiting probability that the graph is acyclic.
Motivation. Havas and Majewski present an algorithm for minimal perfect hashing (used for memory-efficient storage and fast retrieval of items from static sets) based on this random graph. Their algorithm is optimal when the graph is acyclic.
WHY ACYCLIC?
Consider a set W of m words (or keys). Every bijection h: W -> I, where I={0,1,...,m-1}, is called a minimal perfect hash function. HM find hash functions of the form
h(w) = (g(f_1(w)) + g(f_2(w))) mod m,
where f_1 and f_2 map keys to integers (they identify the pair of vertices of the graph corresponding to the edge w) and g maps integers to I.
Given f_1 and f_2, can g be chosen so that h is a bijection?
If the graph is acyclic then, yes, it is easy to construct g from h. Traverse the graph: if vertex w is reached from vertex u then set
g(w) = (h(e) - g(u)) mod m,
where e=(u,w).
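The traversal can be sketched as follows. This is a minimal illustration, not HM's code: the name assign_g is hypothetical, the edge for key number i is assumed to be edges[i], and the target hash value of that key is assumed to be i. The root of each component is given the value 0.

```python
from collections import defaultdict

def assign_g(n, edges):
    """Try to choose g so that (g[u] + g[v]) % m == i for edge i = (u, v).

    Returns the list g, or None if the graph contains a cycle
    (including a multiple edge, which is a cycle of length 2).
    """
    m = len(edges)
    adj = defaultdict(list)
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    g = [None] * n
    for root in range(n):
        if g[root] is not None:
            continue
        g[root] = 0                      # free choice at the root of each tree
        stack = [(root, -1)]             # (vertex, edge used to reach it)
        while stack:
            u, via = stack.pop()
            for w, i in adj[u]:
                if i == via:
                    continue             # do not walk back along the same edge
                if g[w] is not None:
                    return None          # w reached a second time: a cycle
                g[w] = (i - g[u]) % m    # forces (g[u] + g[w]) % m == i
                stack.append((w, i))
    return g
```

On an acyclic graph every vertex is reached exactly once, so each value of g is set once and never contradicted; on a cyclic graph the last edge of a cycle would impose a second, usually conflicting, value.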
EFFICIENCY
HM's algorithm generates f_1 and f_2 at random until an acyclic graph is found:
f_j(w) = (sum_{i=1}^{|w|} T_j(i, w[i])) mod n, j=1,2,
where T_1 and T_2 are tables of random integers and w[i] denotes the i-th character (an integer) of key w.
The efficiency of the algorithm is determined by the probability p that the graph is acyclic: successive attempts are independent, so the expected number of iterations needed to find an acyclic graph will be 1/p (typically between 2 and 3).
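The retry loop can be sketched like this. It is an illustration under assumptions of my own: keys are ASCII strings, each of the two mappings (called f1 and f2 here) is driven by a freshly drawn table of random integers, and acyclicity is tested with a union-find structure. The names make_table, f, is_acyclic and find_acyclic_mapping are hypothetical.

```python
import random

def make_table(max_len, alphabet, n, rng):
    """Table of random integers in [0, n), indexed by (position, character)."""
    return [[rng.randrange(n) for _ in range(alphabet)] for _ in range(max_len)]

def f(table, key, n):
    """Map a key to a vertex by summing table entries over its characters."""
    return sum(table[i][ord(ch)] for i, ch in enumerate(key)) % n

def is_acyclic(n, edges):
    """Union-find test: an edge inside one component closes a cycle."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:                        # also rejects u == v
            return False
        parent[ru] = rv
    return True

def find_acyclic_mapping(keys, n, rng=None, max_tries=1000):
    """Draw new random tables until the graph on n vertices is acyclic."""
    rng = rng or random.Random()
    max_len = max(len(k) for k in keys)
    for tries in range(1, max_tries + 1):
        t1 = make_table(max_len, 128, n, rng)
        t2 = make_table(max_len, 128, n, rng)
        edges = [(f(t1, k, n), f(t2, k, n)) for k in keys]
        if is_acyclic(n, edges):
            return edges, tries
    raise RuntimeError("no acyclic graph found")
```

The returned count of tries is one sample of the number of iterations whose expectation is discussed above.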
EVALUATING p
Theorem. If n and m tend to infinity in such a way that m ~ cn, where c is a positive constant, the limiting probability p that the graph is acyclic is given by
p = e^c sqrt(1-2c) if c<1/2, and p = 0 if c>=1/2.
Proof. On request. It uses results from [HM] and Erdős and Rényi.
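The limiting probability can be checked roughly by simulation. The sketch below estimates the chance that a graph with n vertices and m=cn uniformly selected pairs is acyclic, and compares it with e^c sqrt(1-2c). That closed form should be read as an assumption here: it is what one obtains by treating the numbers of cycles of length k=2,3,... as independent Poisson variables with means (2c)^k/(2k). The function names are my own.

```python
import math
import random

def is_acyclic(n, edges):
    """Union-find test: an edge inside one component closes a cycle."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

def estimate_p(n, c, trials, rng):
    """Monte Carlo estimate of Pr(graph with m = cn edges is acyclic)."""
    m = int(c * n)
    hits = 0
    for _ in range(trials):
        # Each edge is a uniformly chosen pair of distinct vertices.
        edges = [tuple(rng.sample(range(n), 2)) for _ in range(m)]
        hits += is_acyclic(n, edges)
    return hits / trials

def heuristic_p(c):
    """Assumed limit: exp(-sum_{k>=2} (2c)^k/(2k)) = e^c sqrt(1-2c) for c < 1/2."""
    return math.exp(c) * math.sqrt(1 - 2 * c) if c < 0.5 else 0.0
```

For finite n the estimate will differ a little from the limit, so only rough agreement should be expected.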
SKETCH PROOF
Let be the number of cycles of length k and let . Following [HM] write
Now let , so that and
ER show that the distribution of is asymptotically Poisson: in particular,
It follows that
So, formally,
and hence
By Fatou's Lemma, we always have
from which it follows immediately that
this argument is valid even if the sum in (2) is divergent. We deduce immediately that if c>=1/2, then p=0.
When c<1/2, we have and
From Markov's inequality we have and so By Lemma 2 of [HM], we have, for each fixed , that as . In particular, for each , the sequence is bounded above by . It follows that is bounded above by . Further, since ,
Thus, by Dominated Convergence, we have
and, hence, p = e^c sqrt(1-2c).
THE FIVE STAGES OF EVOLUTION
PRIMORDIAL STEW: m(n)=o(n)
If m(n)=o(n), then (with limiting probability 1) all components are trees.
Trees of order k appear when m reaches order n^((k-2)/(k-1)). In particular, if m(n) ~ r n^((k-2)/(k-1)), where r>0, the number of trees of order k has a (limiting) Poisson distribution with mean (2r)^(k-1) k^(k-2)/k!.
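Finally, if m(n)/n^((k-2)/(k-1)) -> infinity, the number of trees of order k is asymptotically normally distributed with mean and variance equal to
M(n) = n (k^(k-2)/k!) (2m/n)^(k-1) exp(-2km/n).
To be precise, the number of trees of order k, centred at M(n) and scaled by sqrt(M(n)), converges in distribution to a standard normal variable. This result holds in the next two stages of evolution; we only require M(n) -> infinity.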
SPOOKY: m(n) ~ cn, where 0<c<1/2
Cycles of all orders start to appear: the number of cycles of order k has a (limiting) Poisson distribution with mean (2c)^k/(2k).
Furthermore, with limiting probability 1, all components are either trees or consist of exactly one cycle (k vertices and k edges), the number of the latter having a Poisson distribution with mean
((2c exp(-2c))^k/(2k)) (1 + k + k^2/2! + ... + k^(k-3)/(k-3)!),
where k is the order of the cycle.
The largest component is a tree; it has approximately
(1/a)(log n - (5/2) log log n), where a = 2c - 1 - log(2c),
vertices (with probability tending to 1).
A MONSTER APPEARS: m(n) ~ cn, where c>=1/2
When m(n) ~ n/2 (c=1/2), the largest component has (with probability tending to 1) of order n^(2/3) vertices. When m(n) ~ cn with c>1/2, a giant component appears: the largest component in the graph has approximately G(c) n vertices, where G(c)=1-X(c)/(2c) and
X(c) = sum_{k>=1} (k^(k-1)/k!) (2c exp(-2c))^k.
Note that G(1/2)=0 and G(c) -> 1 as c -> infinity.
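The size of the giant component is easy to evaluate numerically. The sketch below assumes that X(c) is the Erdős and Rényi series sum_{k>=1} (k^(k-1)/k!) (2c exp(-2c))^k and simply truncates it; the truncation length (terms) is an arbitrary choice of mine, and the series converges slowly when c is very close to 1/2.

```python
import math

def X(c, terms=2000):
    """Truncated series sum_{k>=1} k^(k-1)/k! * (2c e^(-2c))^k, for c > 0."""
    z = 2 * c * math.exp(-2 * c)
    total = 0.0
    for k in range(1, terms + 1):
        # Work with logarithms so that k^(k-1) and k! do not overflow.
        log_term = (k - 1) * math.log(k) - math.lgamma(k + 1) + k * math.log(z)
        total += math.exp(log_term)
    return total

def G(c, terms=2000):
    """Limiting fraction of vertices in the largest component."""
    return 1 - X(c, terms) / (2 * c)
```

For example, G(1.0) is about 0.797, so with m=n edges roughly four fifths of the vertices lie in the giant component, while for c below 1/2 the series sums to 2c and G(c) vanishes.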
Almost all the other vertices belong to trees: the total number of vertices belonging to trees is almost surely n(1-G(c))+o(n).
For c>1/2, the expected number of components in the graph is asymptotically
(n/(2c)) (X(c) - X(c)^2/2).
CONNECTEDNESS: m(n) ~ c n log n, where c<=1/2
The graph is becoming connected: if
m(n) = (1/(2k)) n log n + ((k-1)/(2k)) n log log n + yn + o(n),
then (with probability tending to 1) there are only trees of order at most k outside the giant component, the limiting distribution of the number of trees of order k being Poisson with mean exp(-2ky)/(k k!). For example (k=1), if
m(n) = (1/2) n log n + yn + o(n),
there are (almost surely) only isolated vertices outside the giant component, the number of these having a limiting Poisson distribution with mean exp(-2y). And the chance that the graph is indeed connected tends to exp(-exp(-2y)) (which itself tends to 1 as y grows).
ASYMPTOTIC REGULARITY: m(n) ~ w(n) n log n, where w(n) -> infinity
The whole graph becomes regular: with probability tending to 1, the graph becomes connected and the orders (degrees) of all vertices are asymptotically equal.