Ball Divergence (BD) is a nonparametric two-sample statistic that quantifies the discrepancy between two probability measures $\mu$ and $\nu$ on a metric space $(V, \rho)$.[1] It is defined by integrating the squared difference of the measures over all closed balls in $V$. Let $\bar B(u, r) = \{x \in V : \rho(x, u) \le r\}$ be the closed ball of radius $r$ centered at $u$. Equivalently, one may set $\bar B(u, v) = \bar B(u, \rho(u, v))$ and write $\mu(\bar B(u, v))$ for the mass that $\mu$ assigns to it. The Ball divergence is then defined by

$$D(\mu, \nu) = \iint_{V \times V} \bigl[\mu - \nu\bigr]^2\!\bigl(\bar B(u, v)\bigr) \, \bigl(\mu(du)\,\mu(dv) + \nu(du)\,\nu(dv)\bigr).$$
This measure can be seen as an integral of Harald Cramér's distance over all possible pairs of points. By summing squared differences of $\mu$ and $\nu$ over balls of all scales, BD captures both global and local discrepancies between the distributions, yielding a robust, scale-sensitive comparison. Moreover, since BD is defined as the integral of a squared measure difference, it is always non-negative, and $D(\mu, \nu) = 0$ if and only if $\mu = \nu$.
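For intuition, the defining double integral can be evaluated exactly when both measures are discrete with a common finite support. The following sketch (the function name and setup are illustrative, not from the original paper) computes $D(\mu, \nu)$ in that special case:

```python
import numpy as np

def ball_divergence_discrete(points, p, q):
    """Exact BD for discrete measures p, q on a common finite support.

    points: (N, dim) array of support points; p, q: probability vectors.
    """
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # inside[i, j, k] is True when point k lies in the closed ball
    # centered at point i with radius d(point i, point j)
    inside = dist[:, None, :] <= dist[:, :, None]
    mu_ball = inside.astype(float) @ p   # mu(B(u_i, d(u_i, u_j)))
    nu_ball = inside.astype(float) @ q   # nu(B(u_i, d(u_i, u_j)))
    diff_sq = (mu_ball - nu_ball) ** 2
    # integrate over all pairs (u, v) against mu x mu + nu x nu
    return np.sum(diff_sq * (np.outer(p, p) + np.outer(q, q)))

# Identical measures give D = 0; moving mass makes D strictly positive.
pts = np.array([[0.0], [1.0], [2.0]])
print(ball_divergence_discrete(pts, np.array([0.5, 0.3, 0.2]),
                               np.array([0.5, 0.3, 0.2])))  # 0.0
print(ball_divergence_discrete(pts, np.array([0.8, 0.1, 0.1]),
                               np.array([0.1, 0.1, 0.8])))  # > 0
```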
Testing for equal distributions
We next give a sample version of Ball Divergence. For convenience, the Ball Divergence can be decomposed into two parts:

$$A(\mu, \nu) = \iint_{V \times V} \bigl[\mu - \nu\bigr]^2\!\bigl(\bar B(u, v)\bigr) \, \mu(du)\,\mu(dv)$$

and

$$C(\mu, \nu) = \iint_{V \times V} \bigl[\mu - \nu\bigr]^2\!\bigl(\bar B(u, v)\bigr) \, \nu(du)\,\nu(dv).$$

Thus

$$D(\mu, \nu) = A(\mu, \nu) + C(\mu, \nu).$$
Let

$$\delta(x, y, z) = I\bigl(z \in \bar B(x, \rho(x, y))\bigr)$$

indicate whether the point $z$ lies in the closed ball $\bar B(x, \rho(x, y))$, where $I(\cdot)$ denotes the indicator function. Given two independent samples $\{X_1, \ldots, X_n\}$ from $\mu$ and $\{Y_1, \ldots, Y_m\}$ from $\nu$, define

$$A^X_{ij} = \frac{1}{n} \sum_{u=1}^{n} \delta(X_i, X_j, X_u), \qquad A^Y_{ij} = \frac{1}{m} \sum_{v=1}^{m} \delta(X_i, X_j, Y_v),$$

where $A^X_{ij}$ is the proportion of the sample from the probability measure $\mu$ located in the ball $\bar B(X_i, \rho(X_i, X_j))$ and $A^Y_{ij}$ is the proportion of the sample from the probability measure $\nu$ located in the same ball. Meanwhile,

$$C^X_{kl} = \frac{1}{n} \sum_{u=1}^{n} \delta(Y_k, Y_l, X_u), \qquad C^Y_{kl} = \frac{1}{m} \sum_{v=1}^{m} \delta(Y_k, Y_l, Y_v)$$

are the proportions of the samples from the probability measures $\mu$ and $\nu$, respectively, located in the ball $\bar B(Y_k, \rho(Y_k, Y_l))$. The sample versions of $A(\mu, \nu)$ and $C(\mu, \nu)$ are as follows:

$$A_{n,m} = \frac{1}{n^2} \sum_{i,j=1}^{n} \bigl(A^X_{ij} - A^Y_{ij}\bigr)^2, \qquad C_{n,m} = \frac{1}{m^2} \sum_{k,l=1}^{m} \bigl(C^X_{kl} - C^Y_{kl}\bigr)^2.$$

Finally, the sample ball divergence is

$$D_{n,m} = A_{n,m} + C_{n,m}.$$
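These formulas translate directly into vectorized NumPy code. The following sketch (the function name is illustrative, not an API of any published package) builds the four proportion matrices with broadcasting and averages the squared differences:

```python
import numpy as np
from scipy.spatial.distance import cdist

def sample_ball_divergence(X, Y):
    """Sample ball divergence D_{n,m} for samples X (n, dim) and Y (m, dim)."""
    dXX = cdist(X, X)   # dXX[i, j] = rho(X_i, X_j)
    dXY = cdist(X, Y)   # dXY[i, v] = rho(X_i, Y_v)
    dYY = cdist(Y, Y)   # dYY[k, l] = rho(Y_k, Y_l)

    # A^X_{ij}, A^Y_{ij}: proportions of each sample falling in the
    # closed ball centered at X_i with radius rho(X_i, X_j)
    AX = (dXX[:, None, :] <= dXX[:, :, None]).mean(axis=2)
    AY = (dXY[:, None, :] <= dXX[:, :, None]).mean(axis=2)
    # C^X_{kl}, C^Y_{kl}: proportions falling in the closed ball
    # centered at Y_k with radius rho(Y_k, Y_l)
    CX = (dXY.T[:, None, :] <= dYY[:, :, None]).mean(axis=2)
    CY = (dYY[:, None, :] <= dYY[:, :, None]).mean(axis=2)

    A_nm = ((AX - AY) ** 2).mean()   # average over all pairs (i, j)
    C_nm = ((CX - CY) ** 2).mean()   # average over all pairs (k, l)
    return A_nm + C_nm
```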
It can be proved that $D_{n,m}$ is a consistent estimator of $D(\mu, \nu)$. Moreover, if $n/(n+m) \to \tau$ for some $\tau \in (0, 1)$, then under the null hypothesis $H_0: \mu = \nu$ the statistic $\tfrac{nm}{n+m} D_{n,m}$ converges in distribution to a mixture of chi-squared distributions, whereas under the alternative hypothesis $\sqrt{\tfrac{nm}{n+m}}\bigl(D_{n,m} - D(\mu, \nu)\bigr)$ converges in distribution to a normal distribution.[1]
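Because the weights of the limiting chi-squared mixture depend on the unknown underlying distribution, the null distribution is commonly approximated by permutation in practice. A minimal sketch, assuming the sample_ball_divergence function from the previous snippet:

```python
import numpy as np

def bd_permutation_test(X, Y, n_perm=199, seed=0):
    """Two-sample permutation p-value based on the sample ball divergence."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X, Y])
    n = len(X)
    observed = sample_ball_divergence(X, Y)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))  # shuffle the group labels
        if sample_ball_divergence(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)      # permutation p-value
```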
Properties
1. The square root of Ball Divergence is a symmetric divergence but not a metric, because it does not satisfy the triangle inequality.
2. It can be shown that Ball divergence, the energy distance[2], and the maximum mean discrepancy (MMD)[3] are unified within the variogram framework; for details, see Remark 2.4 in [1].
Homogeneity test
Ball divergence admits a straightforward extension to the K-sample setting. Suppose $\mu_1, \ldots, \mu_K$ are $K$ probability measures on a Banach space $V$. Define the K-sample BD by

$$D(\mu_1, \ldots, \mu_K) = \sum_{1 \le k < l \le K} D(\mu_k, \mu_l).$$

It then follows from Theorems 1 and 2 of Pan et al.[1] that $D(\mu_1, \ldots, \mu_K) = 0$ if and only if $\mu_1 = \mu_2 = \cdots = \mu_K$.
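A minimal sketch of the K-sample statistic, again assuming the sample_ball_divergence function from the earlier snippet:

```python
from itertools import combinations

def k_sample_ball_divergence(samples):
    """K-sample BD: sum of the pairwise sample ball divergences."""
    return sum(sample_ball_divergence(Xk, Xl)
               for Xk, Xl in combinations(samples, 2))
```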
By employing closed balls to define a metric distribution function, one obtains an alternative homogeneity measure.[4]
Given a probability measure $\mu$ on a metric space $(M, d)$, its metric distribution function (MDF) is defined by

$$F_\mu(u, r) = \mu\bigl(\bar B(u, r)\bigr),$$

where $\bar B(u, r) = \{x \in M : d(u, x) \le r\}$ is the closed ball of radius $r$ centered at $u$, and

$$F_\mu(u, r) = \mathbb{E}\bigl[I\bigl(d(u, X) \le r\bigr)\bigr], \qquad X \sim \mu.$$

If $X_1, \ldots, X_n$ are i.i.d. draws from $\mu$, the empirical version is

$$\hat F_{\mu, n}(u, r) = \frac{1}{n} \sum_{i=1}^{n} I\bigl(d(u, X_i) \le r\bigr).$$
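The empirical MDF is a one-line computation for samples in Euclidean space (an illustrative helper; any metric could replace the Euclidean norm):

```python
import numpy as np

def empirical_mdf(sample, u, r):
    """Fraction of `sample` (n, dim) falling in the closed ball B(u, r)."""
    return np.mean(np.linalg.norm(sample - u, axis=1) <= r)
```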
Based on these, the homogeneity measure based on the MDF, also called the metric Cramér-von Mises (MCVM) statistic, is

$$\mathrm{MCVM}(\mu_k, \bar\mu) = \int_M \bigl[F_{\mu_k}(u, r) - F_{\bar\mu}(u, r)\bigr]^2 \, \bar\mu(du),$$

where $\bar\mu = \sum_{k=1}^{K} \pi_k \mu_k$ is the mixture of $\mu_1, \ldots, \mu_K$ with weights $\pi_k > 0$, and $\sum_{k=1}^{K} \pi_k = 1$.

The overall MCVM is then

$$\mathrm{MCVM}(\mu_1, \ldots, \mu_K) = \sum_{k=1}^{K} \pi_k \, \mathrm{MCVM}(\mu_k, \bar\mu).$$

The empirical MCVM is given by

$$\widehat{\mathrm{MCVM}}_n = \sum_{k=1}^{K} \hat\pi_k \, \frac{1}{n} \sum_{j=1}^{n} \bigl[\hat F_{\mu_k, n_k}(Z_j, r) - \hat F_{\bar\mu, n}(Z_j, r)\bigr]^2,$$

where $\{X_{k,1}, \ldots, X_{k,n_k}\}$ is an i.i.d. sample from $\mu_k$, $Z_1, \ldots, Z_n$ denotes the pooled sample of size $n = \sum_k n_k$, $\hat\pi_k = n_k / n$, and $\hat F_{\bar\mu, n}$ is the empirical MDF of the pooled sample. A practical choice for the radius $r$ is the median of the squared pairwise distances within the pooled sample.
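The following sketch combines these pieces into an empirical MCVM statistic at a single radius. It assumes the fixed-radius form given above and the empirical_mdf helper from the previous snippet; taking the square root of the median squared distance to obtain a radius is an interpretive choice, not prescribed by the source:

```python
import numpy as np
from scipy.spatial.distance import pdist

def empirical_mcvm(samples, r=None):
    """Empirical MCVM over a list of (n_k, dim) samples at one radius r."""
    pooled = np.vstack(samples)
    n = len(pooled)
    if r is None:
        # radius derived from the median-of-squared-distances heuristic
        r = np.sqrt(np.median(pdist(pooled) ** 2))
    weights = [len(s) / n for s in samples]   # pi_k = n_k / n
    # MDF of the pooled sample approximates the mixture MDF
    F_mix = np.array([empirical_mdf(pooled, u, r) for u in pooled])
    stat = 0.0
    for pi_k, Xk in zip(weights, samples):
        F_k = np.array([empirical_mdf(Xk, u, r) for u in pooled])
        stat += pi_k * np.mean((F_k - F_mix) ** 2)
    return stat
```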
References
- ^ a b Pan, Wenliang; Tian, Yuan; Wang, Xueqin; Zhang, Heping (2018-06-01). "Ball Divergence: Nonparametric two sample test". The Annals of Statistics. 46 (3): 1109–1137. doi:10.1214/17-AOS1579. ISSN 0090-5364. PMC 6192286. PMID 30344356.
- ^ Székely, Gábor J.; Rizzo, Maria L. (August 2013). "Energy statistics: A class of statistics based on distances". Journal of Statistical Planning and Inference. 143 (8): 1249–1272. doi:10.1016/j.jspi.2013.03.018. ISSN 0378-3758.
- ^ Gretton, Arthur; Borgwardt, Karsten M.; Rasch, Malte; Schölkopf, Bernhard; Smola, Alexander J. (2007-09-07), "A Kernel Method for the Two-Sample-Problem", Advances in Neural Information Processing Systems 19, The MIT Press, pp. 513–520, doi:10.7551/mitpress/7503.003.0069, hdl:1885/37327, ISBN 978-0-262-25691-9, retrieved 2024-06-28
- ^ Wang, X.; Zhu, J.; Pan, W.; Zhu, J.; Zhang, H. (2023). "Nonparametric Statistical Inference via Metric Distribution Function in Metric Spaces". Journal of the American Statistical Association. 119 (548): 2772–2784. doi:10.1080/01621459.2023.2277417.