Partitioning Sets and Families (8)

Let’s focus on Lemma A.1 of Paper S4 that shows the number bound indicating the chart we made last time. To show (4) there, the formulation of Lemma A.2 in Appendix of Paper 3 is not quite relevant although it is ok for Lemma 4.1 on p. 12 to refer to. It’s good if it’s changed into the formulation of Lemma A.1, covering a broader range of applications such as showing (4) to resolve our initial core growth problem.

Before breaking down its proof, let’s get convinced that \( {m \choose q}{n-m \choose d-q} \) regarded as a function of \( q \in [0, m] \) takes maximum around \( q \simeq \frac{md}{n} = \frac{m}{r} \) given that \( m \ll d \ll n \). This is because \[ {m \choose q+1} = \frac{m-q}{q+1} {m \choose q}, \quad \textrm{and} \quad {n-m \choose d-q-1} =\frac{d-q}{n-m+q+1}{n-m \choose d-q}, \] so \( {m \choose q+1}{n-m \choose d-q-1} = t {m \choose q} {n-m \choose d-q} \) where \[ t= \frac{(m-q)(d-q)}{(q+1)(n-m+q+1)} = \frac{md}{qn} \frac{\left[1-O(q/m) \right]\left[1-O(q/d) \right]}{\left[1 + O(1/q) \right] \left[1 – O(m/n) \right]} \simeq \frac{md}{qn}, \] Here the approximate equality is due to the extra condition \( 1 \ll q \ll m \) added to \( m \ll d \ll n \). So, if \( q > m/r + o(1) \), then \( {m \choose q}{n-m \choose d-q} \) increases when \( q\) is incremented, and decreases if \( q < m/r – o(1) \). (Asymptotic notations here work just as well as 8/11/23. )

If \( q = O(1) \), the value of \(t \) is \( \frac{m}{O(r)} \) so it has the same asymptotic behavior upon the incrementation. If \( q \ge \epsilon m \), then \( {m \choose q} \) increases by less than \( \epsilon^{-1} \), and \( {n-m \choose d-q} \) reduces by an approximate factor \( r^{-1} \) not affecting the asymptotic decrease rate. Therefore, we conclude that \( {m \choose q}{n-m \choose d-q} \) takes maximum at either \( \lfloor m/r \rfloor \) or \( \lceil m/r \rceil \) when \( m \ll d \ll n \): if \(q \le \lfloor m/r \rfloor -1 \) the function still increases, and if \(q \ge \lceil m/r \rceil +1 \), it decreases. There’s a subtlety at \( q= \lceil m/r \rceil \), but you can check that \( q= \lceil m/r \rceil~ \Rightarrow~ t<1 \) by a bit more involved argument, so you know it can’t take maximum at \( q \ge \lceil m/r \rceil +1 \).

As said before, we are analyzing the distribution of the summand \( {m \choose q}{n-m \choose d-q} \) of the Vandermode’s identity \( \sum_{q=0}^m {m \choose q}{n-m \choose d-q} = {n \choose d} \). This is equivalent to the following visualization involving random \(d \)-sets:

It’s good to know that random \( d \)-sets are distributed in \( X \) in the above sense. Each element of \( U \) is chosen almost independently so it gets in \( X’ \) with probability \( \simeq m/n \), and in \( X- X’ \) with probability \( \simeq (n-m)/n \). (Element placements are not completely independent because you can’t place one where it’s already taken. But \(n \) is much larger than \( d \) giving a small difference from \(d \) independent trials.) So a typical \(U \) must have \( \frac{|U \cap X’|}{|U- X’|} \) around \( \frac{m}{n-m} \). This insight allows us to derive (2) on 5/27/24 and Remark 4 on 6/10/24, where we can just consider typical \(U \) with \( m_*/2 < q < 2m_* \) because all others have a negligible occurrence probability.

A lot of statistical distributions, not just combinatorial ones, have a similar tendency that event occurrence probabilities are concentrated in a small range of a random variable. This is often due to the nature of some binomial coefficients.

Here is our formulation of the lemma:

So if the lemma is true, we have (4) last time stepping forward to resolve our problem. We’ll start seeing its proof next time.

Links for mobile devices

Proof of the EGT (4)

Our second preparation to figure out the double mark lemma is about the relationship between the given \( \Gamma \)-condition and double marks. It’s discussed in Remarks D)-F) on p. 5 of Paper S3.

We consider the (m-j)-neighbor pairs (U, V) such that \( U, V \in {\mathcal F} \) and |U – V|= m-j, denoting their family by \( {\mathcal P}_j \). Each \( (U, V) \in {\mathcal P}_j \) creates \( {n – 2m+j \choose l-2m+j} \) double marks (U, V, Y) since \( |U \cup V| = 2m-j\), so $$ |{\mathcal D}| = \sum_{j=0}^m |{\mathcal P}_j| {n – 2m+j \choose l-2m+j} = \sum_{Y \in {X \choose l}} \left| {\mathcal F} \cap {Y \choose m} \right|^2,$$ where \( \left| {\mathcal F} \cap {Y \choose m} \right|^2 \) equals the number of double marks of the l-set Y.

By the given \( \Gamma(b) \)-condition of \({\mathcal F} \) with \(b = 5 \gamma mn/ l\), $$ (3) \qquad |{\mathcal P}_j| < b^{-j} {m \choose j} |{\mathcal F}|^2, \qquad \textrm{for all}~ j \in [m], $$ since for each \( U \in {\mathcal F}\), there are less than \( b^{-j} |{\mathcal F}| \cdot {m \choose j}\) sets \( V \in {\mathcal F}\) such that |V – U|= m-j. By the approximation (2) seen in the 8/4 post, $$ (4) \qquad |{\mathcal D}| < \sum_{j=0}^m b^{-j} {m \choose j} |{\mathcal F}|^2 {n – 2m+j \choose l-2m+j} $$ $$ = \sum_{j=0}^m b^{-j} {m \choose j} {n \choose m}^2 e^{-2 \kappa ({\mathcal F})} {n – 2m+j \choose l-2m+j} $$ $$ \simeq {n \choose m} e^{-2 \kappa ({\mathcal F})} \sum_{j=0}^m {m \choose j} {n-m \choose m-j} {n – 2m+j \choose l-2m+j} \left( \frac{n}{b m} \right)^j. $$ This is the main observation for us trying to find a good upper bound on \( |{\mathcal D}| \).

(3) is derived from the \( \Gamma(b) \)-condition only, so we can even say that it itself is the given constraint of \( {\mathcal F} \) that implies (the claims of) the double mark lemma and Corollary 3.4. In addition, this alone implies the collorary claim: $$ (5) \qquad \sum_{T \in {X \choose j}} |{\mathcal F}[T]|^2 < \left( \frac{m}{b} \right)^j |{\mathcal F}|^2 = \left( \frac{5 \gamma n}{l} \right)^{-j} |{\mathcal F}|^2, \quad \textrm{for all}~ j \in [m], $$ whose LHS is the second moment for j-shadows we mentioned in the 6/22 post.

It is true because \( |{\mathcal P}_j| \le \sum_{T \in {X \choose j}} |{\mathcal F}[T]|^2 \), as each \( |{\mathcal F}[T]|^2 \) counts all the pairs \( (U, V) \in {\mathcal F} \times {\mathcal F} \) such that \( U \cap V \) contains the j-set T. So if (5), $$ |{\mathcal P}_j| < \left( \frac{m}{b} \right)^j |{\mathcal F}|^2 \qquad \textrm{for all}~ j \in [m]. $$ We won’t actually use \( {m \choose j} = (j!)^{-1} \prod_{i=0}^{m-1} (m-i) \) in the proof but just \( {m \choose j} \le m^j \), so the condition is practically the same as (3) for us here. Hence, (5) alone means the double mark lemma and Corollary 2.3. We’ll revisit this topic on the collective \( \Gamma \)-condition (5) of \( {\mathcal F} \) later.

Links for mobile devices