The Johnson-Lindenstrauss lemma asserts that a set of $n$ points in any high-dimensional Euclidean space can be mapped down into an $O(\varepsilon^{-2}\log n)$-dimensional Euclidean space such that the distance between any two points changes by only a factor of $1 \pm \varepsilon$, for any $0 < \varepsilon < 1$.
Johnson and Lindenstrauss {cite} proved a fundamental mathematical result: any $n$-point set in any Euclidean space can be embedded in $O(\varepsilon^{-2}\log n)$ dimensions without distorting the distances between any pair of points by more than a factor of $1 \pm \varepsilon$, for any $0 < \varepsilon < 1$. The original proof of Johnson and Lindenstrauss was much simplified by Frankl and Maehara {cite}, using geometric insights and refined approximation techniques.
Suppose we have a set of $n$ $d$-dimensional points and we map them down to $k = C\varepsilon^{-2}\log n$ dimensions, for an appropriate constant $C$. Define $T\colon \mathbb{R}^d \to \mathbb{R}^k$ as the linear map, that is, if $x \in \mathbb{R}^d$, then $T(x) = Ax$. For example, $A$ could be a $k \times d$ random matrix.
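As a concrete illustration, here is a minimal sketch of such a map, assuming (one common instantiation, not necessarily the one intended above) that $A$ has i.i.d. Gaussian entries scaled by $1/\sqrt{k}$; the helper name `random_projection` is ours.

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Map the rows of X (n points in R^d) to R^k via T(x) = A x.

    Assumption: A is a k x d matrix of i.i.d. N(0, 1/k) entries, one
    standard way to realize a Johnson-Lindenstrauss map.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    A = rng.normal(size=(k, d)) / np.sqrt(k)
    return X @ A.T  # row i becomes A @ X[i]

# Example: 100 points in R^1000 mapped down to R^50.
X = np.random.default_rng(1).normal(size=(100, 1000))
Y = random_projection(X, k=50)
```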
The general proof framework
All known proofs of the Johnson-Lindenstrauss lemma proceed according to the following scheme: for given $n$ and an appropriate $k = O(\varepsilon^{-2}\log n)$, one defines a suitable probability distribution $F$ on the set of all linear maps $T\colon \mathbb{R}^d \to \mathbb{R}^k$. Then one proves the following statement:
Statement: If $T\colon \mathbb{R}^d \to \mathbb{R}^k$ is a random linear mapping drawn from the distribution $F$, then for every vector $x \in \mathbb{R}^d$ we have
$$\Pr\bigl[(1-\varepsilon)\lVert x\rVert \le \lVert T(x)\rVert \le (1+\varepsilon)\lVert x\rVert\bigr] \ge 1 - \frac{1}{n^2}.$$
Having established this statement for the considered distribution $F$, the JL result follows easily: we choose $T$ at random according to $F$. Then for every pair of points $x, y$ in our set, using the linearity of $T$ and the above Statement with $x - y$ in place of $x$, we get that $T$ fails to satisfy
$$(1-\varepsilon)\lVert x - y\rVert \le \lVert T(x) - T(y)\rVert \le (1+\varepsilon)\lVert x - y\rVert$$
with probability at most $n^{-2}$. Consequently, the probability that any of the $\binom{n}{2}$ pairwise distances is distorted by $T$ by more than $1 \pm \varepsilon$ is at most $\binom{n}{2}\, n^{-2} < \frac{1}{2}$. Therefore, a random $T$ works with probability at least $\frac{1}{2}$.
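The union-bound argument is easy to test empirically. The following sketch (our own illustration, with $A$ again Gaussian and a hypothetical choice of constants) draws one random map and reports the worst distortion over all $\binom{n}{2}$ pairs; with $k$ on the order of $\varepsilon^{-2}\log n$ it should fall below $\varepsilon$ on most draws.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, d, eps = 100, 1000, 0.2
k = 500                                   # roughly C * eps**-2 * log(n), C ~ 4
X = rng.normal(size=(n, d))               # n points in R^d
A = rng.normal(size=(k, d)) / np.sqrt(k)  # random linear map T(x) = A x
Y = X @ A.T

# Worst relative change over all n-choose-2 pairwise distances.
worst = max(
    abs(np.linalg.norm(Y[i] - Y[j]) - np.linalg.norm(X[i] - X[j]))
    / np.linalg.norm(X[i] - X[j])
    for i, j in combinations(range(n), 2)
)
print(f"max pairwise distortion: {worst:.3f} (target eps = {eps})")
```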
Given an $m \times n$ matrix $A$ and an $n \times p$ matrix $B$, we present two simple and intuitive algorithms to compute an approximation $P$ to the product $AB$, with provable bounds for the norm of the "error matrix" $P - AB$. Both algorithms run in $O(msp + mn + np)$ time, where $s$ is the number of sampled columns. In both algorithms, we randomly pick $s$ columns of $A$ to form an $m \times s$ matrix $S$ and the corresponding rows of $B$ to form an $s \times p$ matrix $R$. After scaling the columns of $S$ and the rows of $R$, we multiply them together to obtain our approximation $P$. The choice of the probability distribution we use for picking the columns of $A$ and the scaling are the crucial features which enable us to give fairly elementary proofs of the error bounds.
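A minimal sketch of this column/row-sampling scheme follows. The probabilities $p_i \propto \lVert A^{(i)}\rVert \, \lVert B_{(i)}\rVert$ and the $1/\sqrt{s\,p_i}$ scaling are the standard choices in this line of work and are an assumption here, since the passage above does not spell them out; with that scaling, $SR$ is an unbiased estimator of $AB$.

```python
import numpy as np

def approx_matmul(A, B, s, seed=0):
    """Approximate A @ B by sampling s columns of A and rows of B.

    Assumed choices (not spelled out in the text above): sample index i
    with probability p_i ~ |A[:, i]| * |B[i, :]|, and scale the picked
    column/row pair by 1 / sqrt(s * p_i) so that E[S @ R] = A @ B.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    w = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = w / w.sum()
    idx = rng.choice(n, size=s, replace=True, p=p)
    scale = 1.0 / np.sqrt(s * p[idx])
    S = A[:, idx] * scale            # m x s: scaled columns of A
    R = B[idx, :] * scale[:, None]   # s x p: scaled rows of B
    return S @ R                     # the approximation P

# The 2-norm of the error matrix P - A @ B shrinks as s grows.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 200))
B = rng.normal(size=(200, 30))
for s in (20, 100, 200):
    P = approx_matmul(A, B, s)
    print(s, np.linalg.norm(P - A @ B, 2))
```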
Our first algorithm can be implemented without storing the matrices $A$ and $B$ in Random Access Memory, provided we can make two passes through the matrices (stored in external memory). The second algorithm has a smaller bound on the 2-norm of the error matrix, but requires storage of $A$ and $B$ in RAM. We also present a fast algorithm that "describes" $P$ as a sum of rank-one matrices if $B = A^{\mathsf{T}}$.
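For the $B = A^{\mathsf{T}}$ case, the rank-one description is easy to write down explicitly. The sketch below is our own illustration under the same assumed sampling scheme, which here reduces to $p_i \propto \lVert A^{(i)}\rVert^2$: each sampled column $a_i$ contributes the term $a_i a_i^{\mathsf{T}} / (s\,p_i)$.

```python
import numpy as np

def rank_one_description(A, s, seed=0):
    """Return s rank-one matrices whose sum approximates A @ A.T.

    Assumption: with B = A.T the weights |A[:, i]| * |B[i, :]| reduce to
    |A[:, i]|**2; each draw i contributes outer(a_i, a_i) / (s * p_i).
    """
    rng = np.random.default_rng(seed)
    w = np.linalg.norm(A, axis=0) ** 2
    p = w / w.sum()
    idx = rng.choice(A.shape[1], size=s, replace=True, p=p)
    return [np.outer(A[:, i], A[:, i]) / (s * p[i]) for i in idx]

# Summing the terms recovers the approximation P to A @ A.T.
A = np.random.default_rng(2).normal(size=(40, 300))
P = sum(rank_one_description(A, s=150))
print(np.linalg.norm(P - A @ A.T, 2))
```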