Mathematical Background

Uniformly Most Accurate (UMA)

The standard notion of optimality for 1-sided confidence bounds is that of being uniformly most accurate (UMA). Here we present the UMA condition for a lower confidence bound (Eq. 3.22 of [Lehmann]); the condition for upper confidence bounds is analogous. For test statistic \(T\), a lower confidence bound \(\underline{p}(T)\) satisfying

\[\mathbb{P}_p [\underline{p}(T) \le p] \ge 1-\alpha \quad \forall p\]

and, amongst all lower confidence bounds \(\underline{p}^\prime(T)\) satisfying this coverage condition, minimizes

\[\mathbb{P}_p [\underline{p}^\prime(T) \le p_0] \quad \forall p_0 < p\]

is a UMA lower confidence bound at confidence level \(1-\alpha\).

Uniformly Most Accurate Unbiased (UMAU)

A UMAU 2-sided confidence interval satisfies the property of unbiasedness (Eq. 5.39 of [Lehmann]):

\[\mathbb{P}_p [\underline{p}(T) \le p \le \overline{p}(T)] \ge \mathbb{P}_p [\underline{p}(T) \le p^\prime \le \overline{p}(T)] \quad \forall p^\prime \neq p\]

and amongst all unbiased confidence intervals is UMA. Unbiasedness ensures that the true unknown parameter has the highest probability of being included in the interval.

UMA Lower Bound

The background for the UMA lower bound is from [Vincent]. Let \(\textup{bin}(k,n,p)\) and \(\textup{Bin}(k,n,p)\) denote the binomial probability mass function (PMF) and CDF, with \(k\) successes, \(n\) trials, and success probability \(p\). The floor function \(\lfloor x \rfloor\) rounds the argument \(x\) down to the nearest integer. Now, define a test statistic \(T = U + \sum_{i=1}^n X_i\), where \(U \sim \mathcal{U}[0,1]\) and \(X_i \in \{0,1\}\) are the observed outcomes (1 for success, 0 for failure). Then \(T\) is a continuous random variable with probability density

\[f_p(t) = {n \choose \lfloor t \rfloor} p^{\lfloor t \rfloor} (1-p)^{n - \lfloor t \rfloor}.\]

Then, we can compute the CDF as follows,

\[\begin{split}F_p(t) = \int_0^{t} {n \choose \lfloor x \rfloor} p^{\lfloor x \rfloor} (1-p)^{n - \lfloor x \rfloor} dx \\ = \textup{Bin}(\lfloor t \rfloor - 1, n, p) + (t - \lfloor t \rfloor) \cdot \textup{bin}(\lfloor t \rfloor, n, p).\end{split}\]

\(F_p(t)\) is continuous in both \(t\) and \(p\).
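For concreteness, here is a minimal Python sketch of this CDF. It assumes scipy is available; the function name randomized_cdf is our choice, not the repository's API.

```python
import math

from scipy.stats import binom


def randomized_cdf(t: float, n: int, p: float) -> float:
    """CDF F_p(t) of the randomized statistic T = U + sum_i X_i, for t in [0, n+1]."""
    if t <= 0.0:
        return 0.0
    if t >= n + 1:
        return 1.0
    k = math.floor(t)
    # Bin(floor(t) - 1, n, p) + (t - floor(t)) * bin(floor(t), n, p)
    return binom.cdf(k - 1, n, p) + (t - k) * binom.pmf(k, n, p)
```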

Then, by Corollary 3.5.1 of [Lehmann], we have the UMA lower confidence bound,

\[\begin{split}\underline{p}(t) = \begin{cases} 0 \quad &\text{if } t < 1-\alpha \\ 1 \quad &\text{if } t > n + 1-\alpha \\ p^* \quad &\text{s.t. } F_{p^*}(t) = 1-\alpha \text{ otherwise.} \end{cases}\end{split}\]

This bound satisfies \(\mathbb{P}_p[\underline{p}(T) \le p] = 1-\alpha\) for all \(p\); the randomization makes the coverage exact.

The bisection method can be used to solve \(F_{p^*}(t) = 1-\alpha\), as \(F_p(t)\) is monotonically decreasing in \(p\). Taking limits \(p \rightarrow 0^+\) and \(p \rightarrow 1^-\), one can show that \(F_p(t) = 1-\alpha\) has no solution when \(t < 1-\alpha\) or \(t > n+1-\alpha\); we thus set \(\underline{p}\) to zero or one, respectively, in these cases. This bound is well-established (see Example 3.5.2 of [Lehmann]). The Clopper-Pearson bound can be obtained by setting \(U = 0\) in the test statistic.
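A sketch of the resulting bound, using plain bisection on \(p\) and reusing randomized_cdf from above (the tolerance and names are our choices):

```python
def uma_lower_bound(t: float, n: int, alpha: float, tol: float = 1e-12) -> float:
    """UMA lower confidence bound: solve F_p(t) = 1 - alpha for p by bisection."""
    if t < 1 - alpha:
        return 0.0
    if t > n + 1 - alpha:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # F_p(t) is decreasing in p: if F is still too large, p must increase.
        if randomized_cdf(t, n, mid) > 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```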

Next, employing the Ghosh-Pratt identity, we compute the expected shortage \(\mathbb{E}_p[(p - \underline{p})^+]\), the average amount by which the lower bound falls short of the true \(p\). A useful identity is

\[\mathbb{P}_p[\underline{p} \le p_0] = \mathbb{P}_p[T \le t^*(p_0)]\]

where \(t^*(p_0)\) is the unique value that satisfies

\[F_{p_0}(t^*) = 1-\alpha.\]

Then, we can compute expected shortage as

\[\textup{ES}(p) = \int_0^p \mathbb{P}_p[\underline{p} \le p_0] dp_0 = \int_0^p \mathbb{P}_p[T \le t^*(p_0)] dp_0 = \int_0^p F_p(t^*(p_0)) dp_0.\]

We evaluate this integral using standard numerical software.
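A numerical sketch of this computation, reusing randomized_cdf from above (scipy's quad and brentq; the function names are ours):

```python
from scipy.integrate import quad
from scipy.optimize import brentq


def t_star(p0: float, n: int, alpha: float) -> float:
    """The unique t* with F_{p0}(t*) = 1 - alpha; F is increasing in t on [0, n+1]."""
    return brentq(lambda t: randomized_cdf(t, n, p0) - (1 - alpha), 0.0, n + 1.0)


def expected_shortage(p: float, n: int, alpha: float) -> float:
    """ES(p): integral of F_p(t*(p0)) over p0 in [0, p]."""
    value, _ = quad(lambda p0: randomized_cdf(t_star(p0, n, alpha), n, p), 0.0, p)
    return value
```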

Next, we find the maximum expected shortage (MES) via global optimization. Specifically, consider the following reformulation of expected shortage,

\[\textup{ES}(p_1, p_2) = \int_0^{p_1} F_{p_2}(t^*(p_0)) dp_0.\]

In this form, expected shortage is increasing in \(p_1\) and decreasing in \(p_2\); this property is known as mixed monotonicity. Note that for \(p_1 = p_2\) we recover the original function. Mixed monotonic functions can be globally optimized using the branch and bound procedure given in [Matthiesen].
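The sketch below illustrates the branch and bound idea on the diagonal; it is a simplification of, not a substitute for, the algorithm in [Matthiesen], and the interface is our choice. On a subinterval \([a, b]\), mixed monotonicity gives the upper bound \(\textup{ES}(p, p) \le \textup{ES}(b, a)\), which drives the pruning.

```python
import heapq


def maximize_mixed_monotone(f, lo=0.0, hi=1.0, tol=1e-4):
    """Maximize f(p, p) on [lo, hi] when f(p1, p2) is increasing in its first
    argument and decreasing in its second (mixed monotonic).  On [a, b],
    f(b, a) upper-bounds the diagonal; diagonal evaluations are lower bounds."""
    best = max(f(lo, lo), f(hi, hi))
    boxes = [(-f(hi, lo), lo, hi)]  # max-heap on the upper bound
    while boxes:
        neg_ub, a, b = heapq.heappop(boxes)
        if -neg_ub <= best + tol:
            break  # no remaining box can improve on the incumbent
        m = 0.5 * (a + b)
        best = max(best, f(m, m))
        for a2, b2 in ((a, m), (m, b)):
            ub = f(b2, a2)
            if ub > best + tol:
                heapq.heappush(boxes, (-ub, a2, b2))
    return best
```

Here \(f\) is the two-argument \(\textup{ES}(p_1, p_2)\): in the expected_shortage sketch above, integrate to \(p_1\) while evaluating \(F_{p_2}\) in the integrand.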

UMA Upper Bound

We construct a UMA upper bound on the probability of success \(p\) by first computing a UMA lower bound \(\underline{q}\) on the probability of failure \(q = 1 - p\) and then applying the identity

\[\overline{p} = 1 - \underline{q}.\]

To construct \(\underline{q}\), we simply use the procedure described above, but with the test statistic counting failures instead of successes: \(T^\prime = U + \sum_{i=1}^n (1 - X_i)\).
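Given the uma_lower_bound sketch above, this is a one-liner; the convention that t_fail counts failures is spelled out in the comment:

```python
def uma_upper_bound(t_fail: float, n: int, alpha: float) -> float:
    """UMA upper bound on p via a UMA lower bound on q = 1 - p.
    t_fail = (n - k) + u, where k counts successes and u is a uniform draw."""
    return 1.0 - uma_lower_bound(t_fail, n, alpha)
```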

UMAU 2-Sided Bound

From [Blyth] we know that a parameter value is in the 2-sided confidence interval (\(p_0 \in [\underline{p}, \overline{p}]\)) if and only if

\[n_l + \gamma_l \le K + U \le n_u + \gamma_u\]

where \(K\) is the number of observed successes in \(n\) trials and \(U\) is a uniform random variable (thus the test statistic is \(T = K + U\)), and the integers \(n_l, n_u\) and values \(\gamma_l, \gamma_u \in [0,1]\) uniquely satisfy the relationships:

\[\begin{split}\begin{gather} 0 \le n_l \le n_u \le n+1 \\ \gamma_l = \frac{(n_u - np_0)(\mathbb{P}_{p_0}[n_l \le K \le n_u -1] - (1-\alpha)) - n_l(1-p_0)\mathbb{P}_{p_0}[K=n_l] + n_u(1-p_0)\mathbb{P}_{p_0}[K=n_u]} {(n_u - n_l) \mathbb{P}_{p_0}[K = n_l]} \\ \gamma_u = \frac{(n_l - np_0)(\mathbb{P}_{p_0}[n_l \le K \le n_u -1] - (1-\alpha)) - n_l(1-p_0)\mathbb{P}_{p_0}[K=n_l] + n_u(1-p_0)\mathbb{P}_{p_0}[K=n_u]} {(n_u - n_l) \mathbb{P}_{p_0}[K = n_u]} \end{gather}\end{split}\]

Thus, given an observed test statistic \(T = t\), we find \(\underline{p}\) as the smallest value of \(p_0\) for which the above conditions have a solution, and \(\overline{p}\) as the largest such value.
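As a rough illustration of this search, the sketch below scans integer pairs \((n_l, n_u)\) and returns the first pair whose \(\gamma_l, \gamma_u\) land in \([0,1]\) (Python with scipy; names ours). Pairs whose PMF denominator vanishes, including \(n_u = n+1\), are skipped for simplicity, so boundary cases need separate handling.

```python
from scipy.stats import binom


def interval_endpoints_at(p0: float, n: int, alpha: float):
    """Solve the Blyth-Hutchinson relations at p0, returning
    (n_l + gamma_l, n_u + gamma_u), or None if the scan finds no pair."""
    for n_l in range(0, n + 1):
        for n_u in range(n_l + 1, n + 2):
            d_l = (n_u - n_l) * binom.pmf(n_l, n, p0)
            d_u = (n_u - n_l) * binom.pmf(n_u, n, p0)
            if d_l == 0.0 or d_u == 0.0:
                continue  # degenerate pair: a pmf denominator vanishes
            # P[n_l <= K <= n_u - 1]
            mid = binom.cdf(n_u - 1, n, p0) - binom.cdf(n_l - 1, n, p0)
            common = (n_u * (1 - p0) * binom.pmf(n_u, n, p0)
                      - n_l * (1 - p0) * binom.pmf(n_l, n, p0))
            g_l = ((n_u - n * p0) * (mid - (1 - alpha)) + common) / d_l
            g_u = ((n_l - n * p0) * (mid - (1 - alpha)) + common) / d_u
            if 0.0 <= g_l <= 1.0 and 0.0 <= g_u <= 1.0:
                return n_l + g_l, n_u + g_u
    return None


def contains(p0: float, t: float, n: int, alpha: float) -> bool:
    """Is p0 inside the 2-sided interval for the observed statistic t = k + u?"""
    ends = interval_endpoints_at(p0, n, alpha)
    return ends is not None and ends[0] <= t <= ends[1]
```

\(\underline{p}\) and \(\overline{p}\) can then be located by bisecting on \(p_0\) over this membership test.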

Then, employing the Ghosh-Pratt identity, we can compute the expected width \(\mathbb{E}_p[\overline{p} - \underline{p}]\) (here \(n_l, \gamma_l, n_u, \gamma_u\) are functions of \(p_0\)) as

\[\begin{split}\begin{gather} \textup{EW}(p) = \int_0^1 \mathbb{P}_p[\underline{p} \le p_0 \le \overline{p}] dp_0 = \int_0^1 \mathbb{P}_p[n_l + \gamma_l \le T \le n_u + \gamma_u] dp_0 \\ = \int_0^1 \left( F_p(n_u + \gamma_u) - F_p(n_l + \gamma_l) \right) dp_0 \end{gather}\end{split}\]

Similar to expected shortage (and expected excess, its analogue for upper bounds), expected width also has a mixed monotonic form:

\[\textup{EW}(p_1, p_2) = \int_0^1 \left( F_{p_2}(n_u + \gamma_u) - F_{p_1}(n_l + \gamma_l) \right) dp_0.\]

Thus, we can compute the maximum expected width (MEW) using mixed monotonic programming.
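Numerically, expected width can be evaluated like expected shortage, reusing randomized_cdf and interval_endpoints_at from the sketches above (the zero fallback for degenerate \(p_0\) is our simplification):

```python
from scipy.integrate import quad


def expected_width(p: float, n: int, alpha: float) -> float:
    """EW(p): integrate F_p(n_u + gamma_u) - F_p(n_l + gamma_l) over p0 in [0, 1]."""
    def integrand(p0: float) -> float:
        ends = interval_endpoints_at(p0, n, alpha)
        if ends is None:
            return 0.0  # degenerate p0 skipped by the scan above
        return randomized_cdf(ends[1], n, p) - randomized_cdf(ends[0], n, p)

    value, _ = quad(integrand, 0.0, 1.0, limit=200)
    return value
```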

Clopper-Pearson Lower Bound

In the tradeoff_table.ipynb notebook we compare the MES of the UMA lower bound with that of the Clopper-Pearson lower bound. Computation of MES for the Clopper-Pearson bound can be accomplished in a similar manner to that of the UMA bound. The Clopper-Pearson lower confidence bound is computed as

\[\underline{p}_{cp} = \min \{p \mid \textup{Bin}(k-1, n, p) \le 1-\alpha \}\]

where \(k\) is the number of successes and \(n\) is the number of trials. It follows that

\[\begin{split}\begin{gather} \underline{p}_{cp} \le p_0 \iff k \le t_{cp}^*(p_0) + 1 \\ t_{cp}^*(p_0) = \max \{t \in \{0,1,\ldots,n\} \mid \textup{Bin}(t, n, p_0) \le 1-\alpha \}. \end{gather}\end{split}\]
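A small sketch of both quantities (scipy; names ours). For \(k > 0\), the \(p\) solving \(\textup{Bin}(k-1, n, p) = 1-\alpha\) coincides with the \(\alpha\)-quantile of a \(\textup{Beta}(k, n-k+1)\) distribution, the standard way to compute the Clopper-Pearson bound:

```python
from scipy.stats import beta, binom


def cp_lower_bound(k: int, n: int, alpha: float) -> float:
    """Clopper-Pearson lower bound: smallest p with Bin(k-1, n, p) <= 1 - alpha."""
    return 0.0 if k == 0 else beta.ppf(alpha, k, n - k + 1)


def t_star_cp(p0: float, n: int, alpha: float) -> int:
    """Largest t in {0, ..., n} with Bin(t, n, p0) <= 1 - alpha (-1 if none)."""
    qualifying = [t for t in range(n + 1) if binom.cdf(t, n, p0) <= 1 - alpha]
    return max(qualifying) if qualifying else -1
```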

Thus, employing the Ghosh-Pratt identity, the expected shortage for the Clopper-Pearson lower bound is

\[\textup{ES}_{cp}(p) = \int_0^p \mathbb{P}_p[\underline{p}_{cp} \le p_0] dp_0 = \int_0^p \mathbb{P}_p[k \le t_{cp}^*(p_0) + 1] dp_0 = \int_0^p \textup{Bin}(t_{cp}^*(p_0) + 1, n, p) dp_0.\]

It is best to evaluate this integral without general-purpose integration software: the integrand is piecewise constant in \(p_0\), so the integral reduces to a finite sum over the breakpoints of \(t_{cp}^*\). In addition, the expected shortage for the Clopper-Pearson lower confidence bound also has a mixed monotonic form:

\[\textup{ES}_{cp}(p_1, p_2) = \int_0^{p_1} \textup{Bin}(t_{cp}^*(p_0) + 1, n, p_2) dp_0.\]

We can use this form to find the MES for the Clopper-Pearson lower bound, as done in the tradeoff_table.ipynb notebook.
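Because the integrand is constant between consecutive breakpoints \(b_t\) solving \(\textup{Bin}(t, n, b_t) = 1-\alpha\), the integral is a finite sum. A sketch (names ours; the \(t_{cp}^* = -1\) convention matches t_star_cp above):

```python
from scipy.optimize import brentq
from scipy.stats import binom


def es_cp(p1: float, p2: float, n: int, alpha: float) -> float:
    """Exact mixed monotonic ES_cp(p1, p2) as a finite sum over the breakpoints
    of t*_cp; only t < n can satisfy Bin(t, n, p) <= 1 - alpha."""
    breakpoints = sorted(
        brentq(lambda p: binom.cdf(t, n, p) - (1 - alpha), 0.0, 1.0)
        for t in range(n)
    )
    edges = [0.0] + [b for b in breakpoints if b < p1] + [p1]
    total = 0.0
    for i in range(len(edges) - 1):
        # On (edges[i], edges[i+1]) we have t*_cp(p0) = i - 1, so the
        # integrand is Bin(t*_cp + 1, n, p2) = Bin(i, n, p2).
        total += (edges[i + 1] - edges[i]) * binom.cdf(i, n, p2)
    return total
```

Passing this to the branch and bound sketch above, e.g. maximize_mixed_monotone(lambda p1, p2: es_cp(p1, p2, n, alpha)), gives the MES.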

References

[Vincent]

Vincent, Joseph A., et al. “How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation.” arXiv preprint arXiv:2405.05439 (2024).

[Lehmann]

E. L. Lehmann and J. P. Romano, Testing Statistical Hypotheses. Springer, 2022, vol. 4.

[Matthiesen]

B. Matthiesen, C. Hellings, E. A. Jorswieck, and W. Utschick, “Mixed Monotonic Programming for Fast Global Optimization,” IEEE Transactions on Signal Processing, vol. 68, pp. 2529–2544, 2020.

[Blyth]

Blyth, Colin R., and David W. Hutchinson. “Table of Neyman-shortest unbiased confidence intervals for the binomial parameter.” Biometrika 47.3/4 (1960): 381-391.