Mathematical Background

Uniformly Most Accurate (UMA)

The standard notion of optimality for 1-sided confidence bounds is that of being uniformly most accurate (UMA). Here we present the UMA condition for a lower confidence bound (Eq. 3.22 of [Lehmann]); the condition for upper confidence bounds is analogous. For test statistic \(T\), a lower confidence bound \(\underline{p}(T)\) satisfying

\[\mathbb{P}_p [\underline{p}(T) \le p] \ge 1-\alpha \quad \forall p\]

and, amongst all lower confidence bounds \(\underline{p}^\prime(T)\) satisfying this coverage condition, minimizes

\[\mathbb{P}_p [\underline{p}^\prime(T) \le p_0] \quad \forall p_0 < p\]

is a UMA lower confidence bound at confidence level \(1-\alpha\).

Uniformly Most Accurate Unbiased (UMAU)

A UMAU 2-sided confidence interval satisfies the property of unbiasedness (Eq. 5.39 of [Lehmann]):

\[\mathbb{P}_p [\underline{p}(T) \le p \le \overline{p}(T)] \ge \mathbb{P}_p [\underline{p}(T) \le p^\prime \le \overline{p}(T)] \quad \forall p^\prime \neq p\]

and amongst all unbiased confidence intervals is UMA. Unbiasedness ensures that the true unknown parameter has the highest probability of being included in the interval.

UMA Lower Bound

The background for the UMA lower bound is from [Vincent]. Let \(\textup{bin}(k,n,p)\) and \(\textup{Bin}(k,n,p)\) denote the binomial probability mass function (PMF) and CDF, with \(k\) successes, \(n\) trials, and success probability \(p\). The floor function \(\lfloor x \rfloor\) rounds the argument \(x\) down to the nearest integer. Now, define a test statistic \(T = U + \sum_{i=1}^n X_i\), where \(U \sim \mathcal{U}[0,1]\) and \(X_i \in \{0,1\}\) are the observed outcomes (1 for success, 0 for failure). Then \(T\) is a continuous random variable with probability density

\[f_p(t) = {n \choose \lfloor t \rfloor} p^{\lfloor t \rfloor} (1-p)^{n - \lfloor t \rfloor}.\]

Then, we can compute the CDF as follows,

\[\begin{split}F_p(t) = \int_0^{t} {n \choose \lfloor x \rfloor} p^{\lfloor x \rfloor} (1-p)^{n - \lfloor x \rfloor} dx \\ = \textup{Bin}(\lfloor t \rfloor - 1, n, p) + (t - \lfloor t \rfloor) \cdot \textup{bin}(\lfloor t \rfloor, n, p).\end{split}\]

\(F_p(t)\) is continuous in both \(t\) and \(p\).
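For concreteness, here is a minimal Python sketch of this CDF. It assumes scipy is available; the function name randomized_cdf is our choice, not the repository's API.

```python
import math

from scipy.stats import binom


def randomized_cdf(t: float, n: int, p: float) -> float:
    """CDF F_p(t) of the randomized statistic T = U + sum_i X_i, for t in [0, n+1]."""
    if t <= 0.0:
        return 0.0
    if t >= n + 1:
        return 1.0
    k = math.floor(t)
    # Bin(floor(t) - 1, n, p) + (t - floor(t)) * bin(floor(t), n, p)
    return binom.cdf(k - 1, n, p) + (t - k) * binom.pmf(k, n, p)
```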

Then, by Corollary 3.5.1 of [Lehmann], we have the UMA lower confidence bound,

\[\begin{split}\underline{p}(t) = \begin{cases} 0 \quad &\text{if } t < 1-\alpha \\ 1 \quad &\text{if } t > n + 1-\alpha \\ p^* \quad &\text{s.t. } F_{p^*}(t) = 1-\alpha \text{ otherwise.} \end{cases}\end{split}\]

This bound satisfies \(\mathbb{P}_p[\underline{p}(T) \le p] = 1-\alpha\) for all \(p\); the randomization makes the coverage exact.

The bisection method can be used to solve \(F_{p^*}(t) = 1-\alpha\), as \(F_p(t)\) is monotonically decreasing in \(p\). Taking limits \(p \rightarrow 0^+\) and \(p \rightarrow 1^-\), one can show that \(F_p(t) = 1-\alpha\) has no solution when \(t < 1-\alpha\) or \(t > n+1-\alpha\); we thus set \(\underline{p}\) to zero or one, respectively, in these cases. This bound is well-established (see Example 3.5.2 of [Lehmann]). The Clopper-Pearson bound can be obtained by setting \(U = 0\) in the test statistic.
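A sketch of the resulting bound, using plain bisection on \(p\) and reusing randomized_cdf from above (the tolerance and names are our choices):

```python
def uma_lower_bound(t: float, n: int, alpha: float, tol: float = 1e-12) -> float:
    """UMA lower confidence bound: solve F_p(t) = 1 - alpha for p by bisection."""
    if t < 1 - alpha:
        return 0.0
    if t > n + 1 - alpha:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # F_p(t) is decreasing in p: if F is still too large, p must increase.
        if randomized_cdf(t, n, mid) > 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```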

Next, employing the Ghosh-Pratt identity, we compute the expected shortage \(\mathbb{E}_p[(p - \underline{p})^+]\), the average amount by which the lower bound falls short of the true \(p\). A useful identity is

\[\mathbb{P}_p[\underline{p} \le p_0] = \mathbb{P}_p[T \le t^*(p_0)]\]

where \(t^*(p_0)\) is the unique value that satisfies

\[F_{p_0}(t^*) = 1-\alpha.\]

Then, we can compute expected shortage as

\[\textup{ES}(p) = \int_0^p \mathbb{P}_p[\underline{p} \le p_0] dp_0 = \int_0^p \mathbb{P}_p[T \le t^*(p_0)] dp_0 = \int_0^p F_p(t^*(p_0)) dp_0.\]

We evaluate this integral using standard numerical software.
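A numerical sketch of this computation, reusing randomized_cdf from above (scipy's quad and brentq; the function names are ours):

```python
from scipy.integrate import quad
from scipy.optimize import brentq


def t_star(p0: float, n: int, alpha: float) -> float:
    """The unique t* with F_{p0}(t*) = 1 - alpha; F is increasing in t on [0, n+1]."""
    return brentq(lambda t: randomized_cdf(t, n, p0) - (1 - alpha), 0.0, n + 1.0)


def expected_shortage(p: float, n: int, alpha: float) -> float:
    """ES(p): integral of F_p(t*(p0)) over p0 in [0, p]."""
    value, _ = quad(lambda p0: randomized_cdf(t_star(p0, n, alpha), n, p), 0.0, p)
    return value
```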

Next, we find the maximum expected shortage (MES) via global optimization. Specifically, consider the following reformulation of expected shortage,

\[\textup{ES}(p_1, p_2) = \int_0^{p_1} F_{p_2}(t^*(p_0)) dp_0.\]

In this form, expected shortage is increasing in \(p_1\) and decreasing in \(p_2\); this property is known as mixed monotonicity. Note that for \(p_1 = p_2\) we recover the original function. Mixed monotonic functions can be globally optimized using the branch and bound procedure given in [Matthiesen].
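The sketch below illustrates the branch and bound idea on the diagonal; it is a simplification of, not a substitute for, the algorithm in [Matthiesen], and the interface is our choice. On a subinterval \([a, b]\), mixed monotonicity gives the upper bound \(\textup{ES}(p, p) \le \textup{ES}(b, a)\), which drives the pruning.

```python
import heapq


def maximize_mixed_monotone(f, lo=0.0, hi=1.0, tol=1e-4):
    """Maximize f(p, p) on [lo, hi] when f(p1, p2) is increasing in its first
    argument and decreasing in its second (mixed monotonic).  On [a, b],
    f(b, a) upper-bounds the diagonal; diagonal evaluations are lower bounds."""
    best = max(f(lo, lo), f(hi, hi))
    boxes = [(-f(hi, lo), lo, hi)]  # max-heap on the upper bound
    while boxes:
        neg_ub, a, b = heapq.heappop(boxes)
        if -neg_ub <= best + tol:
            break  # no remaining box can improve on the incumbent
        m = 0.5 * (a + b)
        best = max(best, f(m, m))
        for a2, b2 in ((a, m), (m, b)):
            ub = f(b2, a2)
            if ub > best + tol:
                heapq.heappush(boxes, (-ub, a2, b2))
    return best
```

Here \(f\) is the two-argument \(\textup{ES}(p_1, p_2)\): in the expected_shortage sketch above, integrate to \(p_1\) while evaluating \(F_{p_2}\) in the integrand.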

UMA Upper Bound

We construct a UMA upper bound on the probability of success \(p\) by first computing a UMA lower bound \(\underline{q}\) on the probability of failure \(q = 1 - p\) and then applying the identity

\[\overline{p} = 1 - \underline{q}.\]

To construct \(\underline{q}\), we simply use the procedure described above, but with the test statistic counting failures instead of successes: \(T^\prime = U + \sum_{i=1}^n (1 - X_i)\).
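Given the uma_lower_bound sketch above, this is a one-liner; the convention that t_fail counts failures is spelled out in the comment:

```python
def uma_upper_bound(t_fail: float, n: int, alpha: float) -> float:
    """UMA upper bound on p via a UMA lower bound on q = 1 - p.
    t_fail = (n - k) + u, where k counts successes and u is a uniform draw."""
    return 1.0 - uma_lower_bound(t_fail, n, alpha)
```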

UMAU 2-Sided Bound

From [Blyth] we know that a parameter value is in the 2-sided confidence interval (\(p_0 \in [\underline{p}, \overline{p}]\)) if and only if

\[n_l + \gamma_l \le K + U \le n_u + \gamma_u\]

where \(K\) is the number of observed successes in \(n\) trials and \(U\) is a uniform random variable (thus the test statistic is \(T = K + U\)), and the integers \(n_l, n_u\) and values \(\gamma_l, \gamma_u \in [0,1]\) uniquely satisfy the relationships:

\[\begin{split}\begin{gather} 0 \le n_l \le n_u \le n+1 \\ \gamma_l = \frac{(n_u - np_0)(\mathbb{P}_{p_0}[n_l \le K \le n_u -1] - (1-\alpha)) - n_l(1-p_0)\mathbb{P}_{p_0}[K=n_l] + n_u(1-p_0)\mathbb{P}_{p_0}[K=n_u]} {(n_u - n_l) \mathbb{P}_{p_0}[K = n_l]} \\ \gamma_u = \frac{(n_l - np_0)(\mathbb{P}_{p_0}[n_l \le K \le n_u -1] - (1-\alpha)) - n_l(1-p_0)\mathbb{P}_{p_0}[K=n_l] + n_u(1-p_0)\mathbb{P}_{p_0}[K=n_u]} {(n_u - n_l) \mathbb{P}_{p_0}[K = n_u]} \end{gather}\end{split}\]

Thus, given an observed test statistic \(T = t\), we find \(\underline{p}\) as the smallest value of \(p_0\) for which the above conditions have a solution, and \(\overline{p}\) as the largest such value.
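As a rough illustration of this search, the sketch below scans integer pairs \((n_l, n_u)\) and returns the first pair whose \(\gamma_l, \gamma_u\) land in \([0,1]\) (Python with scipy; names ours). Pairs whose PMF denominator vanishes, including \(n_u = n+1\), are skipped for simplicity, so boundary cases need separate handling.

```python
from scipy.stats import binom


def interval_endpoints_at(p0: float, n: int, alpha: float):
    """Solve the Blyth-Hutchinson relations at p0, returning
    (n_l + gamma_l, n_u + gamma_u), or None if the scan finds no pair."""
    for n_l in range(0, n + 1):
        for n_u in range(n_l + 1, n + 2):
            d_l = (n_u - n_l) * binom.pmf(n_l, n, p0)
            d_u = (n_u - n_l) * binom.pmf(n_u, n, p0)
            if d_l == 0.0 or d_u == 0.0:
                continue  # degenerate pair: a pmf denominator vanishes
            # P[n_l <= K <= n_u - 1]
            mid = binom.cdf(n_u - 1, n, p0) - binom.cdf(n_l - 1, n, p0)
            common = (n_u * (1 - p0) * binom.pmf(n_u, n, p0)
                      - n_l * (1 - p0) * binom.pmf(n_l, n, p0))
            g_l = ((n_u - n * p0) * (mid - (1 - alpha)) + common) / d_l
            g_u = ((n_l - n * p0) * (mid - (1 - alpha)) + common) / d_u
            if 0.0 <= g_l <= 1.0 and 0.0 <= g_u <= 1.0:
                return n_l + g_l, n_u + g_u
    return None


def contains(p0: float, t: float, n: int, alpha: float) -> bool:
    """Is p0 inside the 2-sided interval for the observed statistic t = k + u?"""
    ends = interval_endpoints_at(p0, n, alpha)
    return ends is not None and ends[0] <= t <= ends[1]
```

\(\underline{p}\) and \(\overline{p}\) can then be located by bisecting on \(p_0\) over this membership test.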

Then, employing the Ghosh-Pratt identity, we can compute the expected width \(\mathbb{E}_p[\overline{p} - \underline{p}]\) (here \(n_l, \gamma_l, n_u, \gamma_u\) are functions of \(p_0\)) as

\[\begin{split}\begin{gather} \textup{EW}(p) = \int_0^1 \mathbb{P}_p[\underline{p} \le p_0 \le \overline{p}] dp_0 = \int_0^1 \mathbb{P}_p[n_l + \gamma_l \le T \le n_u + \gamma_u] dp_0 \\ = \int_0^1 \left( F_p(n_u + \gamma_u) - F_p(n_l + \gamma_l) \right) dp_0 \end{gather}\end{split}\]

Similar to expected shortage (and expected excess, its analogue for upper bounds), expected width also has a mixed monotonic form:

\[\textup{EW}(p_1, p_2) = \int_0^1 \left( F_{p_2}(n_u + \gamma_u) - F_{p_1}(n_l + \gamma_l) \right) dp_0.\]

Thus, we can compute the maximum expected width (MEW) using mixed monotonic programming.
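Numerically, expected width can be evaluated like expected shortage, reusing randomized_cdf and interval_endpoints_at from the sketches above (the zero fallback for degenerate \(p_0\) is our simplification):

```python
from scipy.integrate import quad


def expected_width(p: float, n: int, alpha: float) -> float:
    """EW(p): integrate F_p(n_u + gamma_u) - F_p(n_l + gamma_l) over p0 in [0, 1]."""
    def integrand(p0: float) -> float:
        ends = interval_endpoints_at(p0, n, alpha)
        if ends is None:
            return 0.0  # degenerate p0 skipped by the scan above
        return randomized_cdf(ends[1], n, p) - randomized_cdf(ends[0], n, p)

    value, _ = quad(integrand, 0.0, 1.0, limit=200)
    return value
```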

Clopper-Pearson Lower Bound

In the tradeoff_table.ipynb notebook we compare the MES of the UMA lower bound with that of the Clopper-Pearson lower bound. Computation of MES for the Clopper-Pearson bound can be accomplished in a similar manner to that of the UMA bound. The Clopper-Pearson lower confidence bound is computed as

\[\underline{p}_{cp} = \min \{p \mid \textup{Bin}(k-1, n, p) \le 1-\alpha \}\]

where \(k\) is the number of successes and \(n\) is the number of trials. It follows that

\[\begin{split}\begin{gather} \underline{p}_{cp} \le p_0 \iff k \le t_{cp}^*(p_0) + 1 \\ t_{cp}^*(p_0) = \max \{t \in \{0,1,\ldots,n\} \mid \textup{Bin}(t, n, p_0) \le 1-\alpha \}. \end{gather}\end{split}\]
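A small sketch of both quantities (scipy; names ours). For \(k > 0\), the \(p\) solving \(\textup{Bin}(k-1, n, p) = 1-\alpha\) coincides with the \(\alpha\)-quantile of a \(\textup{Beta}(k, n-k+1)\) distribution, the standard way to compute the Clopper-Pearson bound:

```python
from scipy.stats import beta, binom


def cp_lower_bound(k: int, n: int, alpha: float) -> float:
    """Clopper-Pearson lower bound: smallest p with Bin(k-1, n, p) <= 1 - alpha."""
    return 0.0 if k == 0 else beta.ppf(alpha, k, n - k + 1)


def t_star_cp(p0: float, n: int, alpha: float) -> int:
    """Largest t in {0, ..., n} with Bin(t, n, p0) <= 1 - alpha (-1 if none)."""
    qualifying = [t for t in range(n + 1) if binom.cdf(t, n, p0) <= 1 - alpha]
    return max(qualifying) if qualifying else -1
```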

Thus, employing the Ghosh-Pratt identity, the expected shortage for the Clopper-Pearson lower bound is

\[\textup{ES}_{cp}(p) = \int_0^p \mathbb{P}_p[\underline{p}_{cp} \le p_0] dp_0 = \int_0^p \mathbb{P}_p[k \le t_{cp}^*(p_0) + 1] dp_0 = \int_0^p \textup{Bin}(t_{cp}^*(p_0) + 1, n, p) dp_0.\]

It is best to evaluate this integral without general-purpose integration software: the integrand is piecewise constant in \(p_0\), so the integral reduces to a finite sum over the breakpoints of \(t_{cp}^*\). In addition, the expected shortage for the Clopper-Pearson lower confidence bound also has a mixed monotonic form:

\[\textup{ES}_{cp}(p_1, p_2) = \int_0^{p_1} \textup{Bin}(t_{cp}^*(p_0) + 1, n, p_2) dp_0.\]

We can use this form to find the MES for the Clopper-Pearson lower bound, as done in the tradeoff_table.ipynb notebook.
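Because the integrand is constant between consecutive breakpoints \(b_t\) solving \(\textup{Bin}(t, n, b_t) = 1-\alpha\), the integral is a finite sum. A sketch (names ours; the \(t_{cp}^* = -1\) convention matches t_star_cp above):

```python
from scipy.optimize import brentq
from scipy.stats import binom


def es_cp(p1: float, p2: float, n: int, alpha: float) -> float:
    """Exact mixed monotonic ES_cp(p1, p2) as a finite sum over the breakpoints
    of t*_cp; only t < n can satisfy Bin(t, n, p) <= 1 - alpha."""
    breakpoints = sorted(
        brentq(lambda p: binom.cdf(t, n, p) - (1 - alpha), 0.0, 1.0)
        for t in range(n)
    )
    edges = [0.0] + [b for b in breakpoints if b < p1] + [p1]
    total = 0.0
    for i in range(len(edges) - 1):
        # On (edges[i], edges[i+1]) we have t*_cp(p0) = i - 1, so the
        # integrand is Bin(t*_cp + 1, n, p2) = Bin(i, n, p2).
        total += (edges[i + 1] - edges[i]) * binom.cdf(i, n, p2)
    return total
```

Passing this to the branch and bound sketch above, e.g. maximize_mixed_monotone(lambda p1, p2: es_cp(p1, p2, n, alpha)), gives the MES.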

References

[Vincent]

Vincent, Joseph A., et al. “How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation.” arXiv preprint arXiv:2405.05439 (2024).

[Lehmann]

E. L. Lehmann and J. P. Romano, Testing Statistical Hypotheses. Springer, 2022, vol. 4.

[Matthiesen]

B. Matthiesen, C. Hellings, E. A. Jorswieck, and W. Utschick, “Mixed Monotonic Programming for Fast Global Optimization,” IEEE Transactions on Signal Processing, vol. 68, pp. 2529–2544, 2020.

[Blyth]

Blyth, Colin R., and David W. Hutchinson. “Table of Neyman-shortest unbiased confidence intervals for the binomial parameter.” Biometrika 47.3/4 (1960): 381-391.