Given an arbitrary set of points in the plane, we would expect most pairs of points to determine distinct lines–this is the “generic” case. The obvious counterexample is when all of the points lie on a single line. Beck’s lemma says that one of these two situations must occur (up to constants) in any finite set of points in the plane.

Beck’s Lemma: Let {P} be a subset of {\mathbb{R}^n} with {|P|=n}. Then one of the following holds:

  1. There is a line containing {cn} points of {P} for some constant {c},
  2. or there are {c'n^2} distinct lines containing at least two points of {P} each.

Note that (2) is the generic case, while (1) is a counterexample to the generic case.

Proof of Beck’s lemma: We say that {a,b\in P} are {j}-connected if the line determined by {a} and {b} contains between {>2^j} and {\leq 2^{j+1}} points of {P}.

Let {L_j} be the set of lines determined by {P} containing more than {2^j} points of {P}. Then by the Szemerédi-Trotter theorem:

\displaystyle |L_j|2^j\leq I(P,L_j)\leq\max\{4n+|L_j|,4n^{2/3}|L_j|^{2/3}+|L_j|\}.

If the max is {4n+|L_j|}, then we have {|L_j|\leq 4n/(2^j-1)} and if the max is {4n^{2/3}|L_j|^{2/3}+|L_j|}, then we have {|L_j|\leq 64n^2/(2^{j}-1)^3}. Thus

\displaystyle |L_j|\lesssim\frac n{2^j}+\frac{n^2}{2^{3j}}.

If {a} and {b} are {j}-connected, then they lie on a line of {L_j}. Thus the number of lines created by a {j}-connected pair is at most {L_j}. Each such line contains at most {{2^{j+1}\choose 2}} pairs of points, so

\displaystyle \# j\mbox{-connected pairs}\lesssim 2^jn+\frac{n^2}{2^j}.

Let {C} be a positive constant that we will determine later. From now on we focus our attention on {j} such that {C\leq 2^j\leq n/C}. For this range of {j}, the number of {j}-connected pairs is bounded above by {n^2/C}:

\displaystyle \sum_{C\leq 2^j\leq n/C }2^jn+\frac{n^2}{2^j}\lesssim \frac{n^2}C,

where we bounded the sum of first term by the highest term (i.e. {1+2+\cdots+2^j\leq 2^{j+1}}) and we bounded the second term using the geometric series ({1+2^{-1}+2^{-2}+\cdots = 2}).

Since there are {{n\choose 2}=n(n-1)/2} pairs of points, if {C} is large enough then there are more than {n^2/4} pairs of points that are not {j}-connected for {C\leq 2^j\leq n/C}. If one pair is above this range, then the line that they determine contains at least {n/C} points, which gives us (1). Otherwise we have more than {n^2/4} pairs below this region and so we have option (2). QED

Beck’s original proof did not use the Szemeredi-Trotter theorem—in fact, Beck’s paper was published in the same issue of Combinatorica as Szemeredi and Trotter’s paper. Instead he proved the bound on the number of lines containing between 2^j and 2^{j+1} points:

\displaystyle |L_j|-|L_{2j}|\lesssim\frac{n^2}{2^{(2+\delta)j}},

where \delta=1/20 and 1\leq 2^j\leq(2n)^{1/2}. This estimate is sufficient for our argument.