meta-analysis.io

Statistical methods reference

Every statistical method, estimator, test, and visualization implemented in meta-analysis.io, with the exact formulas used, defaults, and primary references. This page is intended to be citable by users and reviewable by journal editors.

Every method described here is covered by an internal automated regression suite that runs before each release.

1. Effect size measures

Each study contributes an effect size y_i and its sampling variance v_i. The form of (y, v) depends on the outcome type.

Binary outcomes (2×2 tables)

Log odds ratio — default for binary outcomes.

yi = log((a·d)/(b·c))
vi = 1/a + 1/b + 1/c + 1/d

where a, b, c, d are the four cells of the 2×2 table (events and non-events in the treatment and control arms).
Continuity correction applied per §4 when any cell is zero.
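As an illustration, the log odds ratio with the default 'constant' correction could be sketched in TypeScript as follows. The function name logOddsRatio and the cell layout (a, b = events and non-events in the treatment arm; c, d = the same in the control arm) are assumptions made for this example, not the engine's actual API.

```typescript
// Hypothetical helper, not the engine's API.
// Assumed layout: a, b = treatment events / non-events; c, d = control events / non-events.
function logOddsRatio(
  a: number, b: number, c: number, d: number, cc = 0.5,
): { yi: number; vi: number } {
  // 'constant' strategy: add cc to all four cells only when at least one cell is zero.
  const add = a === 0 || b === 0 || c === 0 || d === 0 ? cc : 0;
  const a1 = a + add;
  const b1 = b + add;
  const c1 = c + add;
  const d1 = d + add;
  return {
    yi: Math.log((a1 * d1) / (b1 * c1)),
    vi: 1 / a1 + 1 / b1 + 1 / c1 + 1 / d1,
  };
}

const orExample = logOddsRatio(10, 90, 5, 95);
console.log(orExample.yi.toFixed(4), orExample.vi.toFixed(4));
```

A table without zero cells passes through unchanged; a table such as (0, 10, 5, 5) is shifted to (0.5, 10.5, 5.5, 5.5) before the logarithm is taken.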

Log risk ratio

yi = log(p1/p2)
vi = (1 − p1)/e1 + (1 − p2)/e2,   with p1 = e1/n1, p2 = e2/n2

Risk difference

yi = p1 − p2
vi = p1(1 − p1)/n1 + p2(1 − p2)/n2    (no continuity correction)

Peto odds ratio — for very rare events; uses the hypergeometric distribution. No continuity correction (by design).

O = e1
E = n1 · (e1 + e2) / N
V = n1 · n2 · (e1 + e2) · (N − e1 − e2) / (N² · (N − 1))
yi = (O − E) / V    (log Peto OR approximation)
vi = 1 / V
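The Peto formulas above translate almost line for line into code. The sketch below is illustrative only; the function name petoLogOddsRatio and the argument order (events and sample size per arm) are assumptions.

```typescript
// Hypothetical helper: Peto log odds ratio from events (e) and sample sizes (n) per arm.
function petoLogOddsRatio(
  e1: number, n1: number, e2: number, n2: number,
): { yi: number; vi: number } {
  const N = n1 + n2;
  const O = e1;                          // observed events in arm 1
  const E = (n1 * (e1 + e2)) / N;        // expected events under the null
  const V = (n1 * n2 * (e1 + e2) * (N - e1 - e2)) / (N * N * (N - 1)); // hypergeometric variance
  return { yi: (O - E) / V, vi: 1 / V };
}

console.log(petoLogOddsRatio(5, 100, 10, 100));
```

Because O − E and V are both finite when one arm has zero events, no continuity correction is needed as long as at least one event occurs across the two arms.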

Deeks JJ. Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes. Statistics in Medicine 2002.
Cochrane Handbook for Systematic Reviews of Interventions, §10.

Continuous outcomes

Mean difference (MD)

yi = m1 − m2
vi = s1²/n1 + s2²/n2

Standardized mean difference (SMD) — three variants

The app supports three SMD calculations. Default is Hedges' g, which matches metafor::escalc(measure="SMD") and is the recommended convention for meta-analysis. The variance/SE formula differs across all three — not just the point estimate — so the choice propagates through the inverse-variance weights.

pooled SD = sqrt( ((n1-1)·s1² + (n2-1)·s2²) / (n1+n2-2) )
df = n1 + n2 - 2

Hedges' g  (default; bias-corrected)
  J = 1 - 3/(4·df - 1)
  g = (m1 - m2) / pooled SD · J
  V(g) = 1/n1 + 1/n2 + g² / (2·(n1+n2))
  Reference: Hedges & Olkin 1985; metafor::escalc(measure="SMD")

Cohen's d  (uncorrected)
  d = (m1 - m2) / pooled SD
  V(d) = 1/n1 + 1/n2 + d² / (2·(n1+n2))
  Reference: Cohen 1988; Borenstein 2009 §4.20.
  Inflates effects in small studies — included for legacy / teaching use.

Glass's Δ  (control SD only)
  Δ = (m1 - m2) / s_control     (arm 2 = control)
  V(Δ) = 1/n1 + 1/n2 + Δ² / (2·(n2 - 1))
  Reference: Hedges 1981 Eq 6; Hedges & Olkin 1985 §5.5 Eq 30.
  Use when the treatment plausibly alters within-arm variance, so the
  pooled SD is itself a function of the effect.
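The three SMD variants differ only in the standardizer, the correction factor, and the last variance term, which a short sketch makes explicit. The names standardizedMeanDifference and SmdVariant are hypothetical; the formulas are the ones listed above, with arm 2 treated as control.

```typescript
type SmdVariant = "hedges" | "cohen" | "glass";

// Hypothetical helper implementing the three formulas listed above.
function standardizedMeanDifference(
  m1: number, s1: number, n1: number,
  m2: number, s2: number, n2: number,
  variant: SmdVariant = "hedges",
): { yi: number; vi: number } {
  if (variant === "glass") {
    const delta = (m1 - m2) / s2; // control SD only (arm 2 = control)
    return { yi: delta, vi: 1 / n1 + 1 / n2 + (delta * delta) / (2 * (n2 - 1)) };
  }
  const df = n1 + n2 - 2;
  const pooledSd = Math.sqrt(((n1 - 1) * s1 * s1 + (n2 - 1) * s2 * s2) / df);
  const d = (m1 - m2) / pooledSd;
  // Large-df approximation to Hedges' J; Cohen's d leaves the estimate uncorrected.
  const J = variant === "hedges" ? 1 - 3 / (4 * df - 1) : 1;
  const es = d * J;
  return { yi: es, vi: 1 / n1 + 1 / n2 + (es * es) / (2 * (n1 + n2)) };
}

console.log(standardizedMeanDifference(10, 2, 20, 8, 2, 20, "hedges"));
console.log(standardizedMeanDifference(10, 2, 20, 8, 2, 20, "cohen"));
```

With n1 = n2 = 20 and a raw difference of one pooled SD, the correction shrinks the estimate by about 2% (J = 1 − 3/151).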

Two implementation notes worth disclosing in your manuscript when relevant:

Hedges' small-sample correction J. The exact form is J(df) = Γ(df/2) / [√(df/2)·Γ((df−1)/2)]. We use the standard large-df approximation J(df) ≈ 1 − 3/(4·df − 1), which agrees with the exact form to ~10⁻⁴ at df = 38 (n₁ = n₂ = 20) and tightens further as df grows. Same approximation used by metafor.
Glass's Δ variance assumes σ_E = σ_C. The analytical V(Δ) above is derived under equal within-arm variances, even though the very motivation for choosing Glass's Δ is that the treatment may have altered within-arm variance. The heteroscedastic case has no closed-form variance — Bonett's SMDH or non-parametric bootstrap would be the proper fix. The formula shipped here matches metafor and every other major package; it is a known limitation of the analytical SE under exactly the conditions where Glass's Δ is most warranted. Document the assumption when reporting.

Ratio of means (ROM) — log-scale analysis; requires positive means.

yi = log(m1 / m2)
vi = (CV1)²/n1 + (CV2)²/n2,    CV = SD/mean    (delta method)

Borenstein M et al. Introduction to Meta-Analysis. Wiley 2009. Eq. 4.24.
Friedrich JO et al. Ratio of means for analyzing continuous outcomes in meta-analysis performed as well as mean difference methods. J Clin Epidemiol 2011.

Proportions

Four transformations are available; Freeman-Tukey double arcsine is the default.

PFT (Freeman-Tukey double arcsine):
  yi = arcsin(sqrt(x / (n+1))) + arcsin(sqrt((x+1) / (n+1)))
  vi = 1 / (n + 0.5)
  Back-transform (Miller 1978): requires harmonic mean of Ns.

PLO  (logit):    yi = log(p/(1-p)),  vi = 1/e + 1/(n-e)    (CC when e ∈ {0, n})
PAS  (arcsine):  yi = arcsin(sqrt(p)),  vi = 1/(4n)
PRAW (raw):      yi = p,  vi = p(1-p)/n

Freeman MF, Tukey JW. Transformations related to the angular and the square root. Ann Math Stat 1950.
Miller JJ. The inverse of the Freeman-Tukey double arcsine transformation. The American Statistician 1978.
Barendregt JJ et al. Meta-analysis of prevalence. J Epidemiol Community Health 2013.

Correlation

Fisher's z transformation (ZCOR) stabilizes the variance of the correlation; the raw correlation (COR) is also supported.

ZCOR: yi = 0.5 · log((1 + r)/(1 - r)),  vi = 1/(n - 3)
COR:  yi = r,  vi = (1 - r²)² / (n - 1)    (Olkin & Pratt)

Diagnostic test accuracy (DTA)

From the 2×2 table (TP, FP, FN, TN):

logit sensitivity:  yi = log(Sens/(1-Sens)),    vi = 1/TP + 1/FN   (with CC)
logit specificity:  yi = log(Spec/(1-Spec)),    vi = 1/TN + 1/FP   (with CC)
log DOR:            yi = log((TP·TN)/(FP·FN)),  vi = 1/TP + 1/FP + 1/FN + 1/TN

Hazard ratio (IV)

Converts reported HR with 95% CI to the log scale.

yi = log(HR)
SE = (log(CI_upper) - log(CI_lower)) / (2 · z_0.975)
vi = SE²
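A small sketch of this conversion, assuming a symmetric 95% CI on the log scale. The function name logHazardRatioFromCI and the hard-coded critical value 1.959964 (z_0.975) are choices made for the example.

```typescript
// Hypothetical helper: recover log(HR) and its SE from a reported HR and CI.
// zCrit defaults to z_0.975 = 1.959964 (an assumption matching a 95% CI).
function logHazardRatioFromCI(
  hr: number, ciLower: number, ciUpper: number, zCrit = 1.959964,
): { yi: number; se: number; vi: number } {
  const se = (Math.log(ciUpper) - Math.log(ciLower)) / (2 * zCrit);
  return { yi: Math.log(hr), se, vi: se * se };
}

console.log(logHazardRatioFromCI(0.8, 0.64, 1.0));
```

If a paper reports a 90% or 99% interval instead, pass the matching critical value rather than the default.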

Generic inverse variance

Any effect measure with pre-computed effectSize and standardError. Used when you already have (y, v) from an external source (e.g., adjusted regression coefficients).

2. Pooling & heterogeneity

Inverse-variance pooling

Fixed effect:     w_i = 1/v_i
Random effects:   w_i = 1/(v_i + tau²)
Pooled estimate:  mu_hat = sum(w_i · y_i) / sum(w_i)
Pooled SE:        sqrt(1 / sum(w_i))
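The pooling step can be sketched in a few lines. The name poolInverseVariance is hypothetical; passing tau2 = 0 gives the fixed-effect result and any positive tau2 gives the random-effects result.

```typescript
// Hypothetical helper: inverse-variance pooling with weights 1 / (v_i + tau2).
function poolInverseVariance(
  studies: { yi: number; vi: number }[], tau2 = 0,
): { mu: number; se: number } {
  let sumW = 0;
  let sumWY = 0;
  studies.forEach((s) => {
    const w = 1 / (s.vi + tau2);
    sumW += w;
    sumWY += w * s.yi;
  });
  return { mu: sumWY / sumW, se: Math.sqrt(1 / sumW) };
}

const demoStudies = [{ yi: 0.2, vi: 0.04 }, { yi: 0.6, vi: 0.01 }];
console.log(poolInverseVariance(demoStudies));        // fixed effect
console.log(poolInverseVariance(demoStudies, 0.05));  // random effects with an assumed tau2
```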

Heterogeneity estimators for τ²

Eight commonly cited estimators are implemented. REML is the default; it is among the estimators recommended by Veroniki et al. (2016).

DerSimonian-Laird (DL):
  tau²_DL = max(0, (Q - (k-1)) / C),    C = sum(w_i) - sum(w_i²)/sum(w_i),
  where w_i are fixed-effect weights.

REML (restricted ML, Fisher scoring):
  Iterate tau²_{n+1} = max(0, tau²_n + score / fisherInfo) where
    score      = sum(w_i² (y_i - mu_hat)²) - sum(w_i) + sum(w_i²)/sum(w_i)
    fisherInfo = tr(P²) = sum(w²) - 2·sum(w³)/sum(w) + (sum(w²))²/(sum(w))²

ML (full maximum likelihood, Fisher scoring):
  Same Fisher information as REML, but score drops the +sum(w²)/sum(w) term.
  Biased downward relative to REML under small k.

PM (Paule-Mandel):
  Bisect until Q(tau²) = k - 1, where Q uses random-effects weights.
  Q is monotone decreasing in tau², so bisection is robust.

EB (empirical Bayes; Morris 1983):
  Iterate tau²_{n+1} = max(0, tau²_n + (Q - (k-1)) · k / ((k-1) · C)).
  Same target as PM, different update rule; converges to the same value.

SJ (Sidik-Jonkman):
  Closed form using a preliminary tau²_0 = sample variance of y_i:
  tau² = tau²_0 · sum( w_i* · (y_i - y_bar*)² ) / (k - 1),   w_i* = 1/(v_i + tau²_0),
  where y_bar* is the w_i*-weighted mean of the y_i.

HE (Hedges & Olkin):
  tau² = (1/(k-1)) · sum( (y_i - y_bar)² ) - (1/k) · sum(v_i)

HS (Hunter-Schmidt):
  tau² = max(0, (Q - k) / sum(w_i))      w_i = 1/v_i (fixed-effect weights)
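As a worked illustration of the REML update listed above, here is a minimal Fisher-scoring loop. The function name remlTau2, the starting value of 0, the iteration cap, and the tolerance are assumptions made for the sketch; the shipping engine's start value and stopping rule are not documented in this section.

```typescript
// Hypothetical sketch of the REML Fisher-scoring iteration described above.
function remlTau2(
  studies: { yi: number; vi: number }[], maxIter = 100, tol = 1e-10,
): number {
  let tau2 = 0; // assumed starting value
  for (let iter = 0; iter < maxIter; iter++) {
    const rows = studies.map((s) => ({ y: s.yi, w: 1 / (s.vi + tau2) }));
    const sumW = rows.reduce((acc, r) => acc + r.w, 0);
    const sumW2 = rows.reduce((acc, r) => acc + r.w * r.w, 0);
    const sumW3 = rows.reduce((acc, r) => acc + r.w * r.w * r.w, 0);
    const mu = rows.reduce((acc, r) => acc + r.w * r.y, 0) / sumW;
    // score = sum(w^2 (y - mu)^2) - sum(w) + sum(w^2)/sum(w)
    const score =
      rows.reduce((acc, r) => acc + r.w * r.w * (r.y - mu) * (r.y - mu), 0) -
      sumW + sumW2 / sumW;
    // fisherInfo = tr(P^2)
    const fisherInfo = sumW2 - (2 * sumW3) / sumW + (sumW2 * sumW2) / (sumW * sumW);
    const next = Math.max(0, tau2 + score / fisherInfo);
    if (Math.abs(next - tau2) < tol) return next;
    tau2 = next;
  }
  return tau2;
}

const equalVarianceStudies = [0, 1, 2, 3].map((y) => ({ yi: y, vi: 0.5 }));
console.log(remlTau2(equalVarianceStudies));
```

When all v_i are equal to v, the REML solution reduces to max(0, s² − v), where s² is the sample variance of the y_i, which gives a convenient analytical check for the iteration.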

DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials 1986.
Paule RC, Mandel J. Consensus values and weighting factors. J Res Natl Bur Stand 1982.
Morris CN. Parametric empirical Bayes inference: theory and applications. JASA 1983.
Sidik K, Jonkman JN. A comparison of heterogeneity variance estimators in combining results of studies. Stat Med 2007.
Hedges LV, Olkin I. Statistical Methods for Meta-Analysis. Academic Press 1985.
Hunter JE, Schmidt FL. Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Sage 2004.
Viechtbauer W. Bias and efficiency of meta-analytic variance estimators in the random-effects model. J Educ Behav Stat 2005.
Veroniki AA et al. Methods to estimate the between-study variance and its uncertainty in meta-analysis. Res Synth Methods 2016.

Heterogeneity statistics

Q  = sum(w_i · (y_i - y_bar_FE)²)    (uses fixed-effect weights)
df = k - 1
p  = 1 - CDF_chi²(Q, df)    (regularized lower incomplete gamma)
I² = max(0, 100 · (Q - df) / Q)    (Higgins & Thompson 2002)
H² = Q / df
tau² (from DL or REML),  tau = sqrt(tau²)
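The statistics above, together with the DerSimonian-Laird estimate from the previous subsection, can be computed in one pass over the fixed-effect weights. This is a sketch with a hypothetical function name (heterogeneityStats); the p-value is left out because it needs the chi-square CDF.

```typescript
// Hypothetical helper: Q, I², H² and the DL estimate of tau² from fixed-effect weights.
function heterogeneityStats(studies: { yi: number; vi: number }[]) {
  const k = studies.length;
  const sumW = studies.reduce((acc, s) => acc + 1 / s.vi, 0);
  const sumW2 = studies.reduce((acc, s) => acc + 1 / (s.vi * s.vi), 0);
  const muFE = studies.reduce((acc, s) => acc + s.yi / s.vi, 0) / sumW;
  const Q = studies.reduce((acc, s) => acc + ((s.yi - muFE) * (s.yi - muFE)) / s.vi, 0);
  const df = k - 1;
  const I2 = Q > 0 ? Math.max(0, (100 * (Q - df)) / Q) : 0; // percent
  const H2 = Q / df;
  const C = sumW - sumW2 / sumW;
  const tau2DL = Math.max(0, (Q - df) / C);
  return { Q, df, I2, H2, tau2DL };
}

console.log(heterogeneityStats([0, 1, 2, 3].map((y) => ({ yi: y, vi: 0.5 }))));
```

For this toy input (four studies, v_i = 0.5) the arithmetic is easy to follow by hand: Q = 5 / 0.5 = 10, df = 3, so I² = 100 · 7/10 = 70 and tau²_DL = 7 / 6.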

Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Statistics in Medicine 2002.

3. Inference — CIs, p-values, HKSJ, prediction intervals

Default inference

z = mu_hat / SE(mu_hat)
p-value = 2 · (1 - Phi(|z|))
CI = mu_hat ± z_{1-α/2} · SE(mu_hat)

Knapp-Hartung-Sidik-Jonkman (HKSJ)

Robust inference for random-effects meta-analysis, particularly under small k or severe heterogeneity. Applied when the user enables knha in settings.

SE_HKSJ = sqrt( (1/(k-1)) · sum(w_i · (y_i - mu_hat)²) / sum(w_i) )
Use t-distribution with k-1 df:
  CI = mu_hat ± t_{k-1, 1-α/2} · SE_HKSJ
  p  = 2 · (1 - F_t(|mu_hat/SE_HKSJ|; k-1))
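A sketch of the HKSJ interval under stated assumptions: the function name hksjInterval is hypothetical, tau2 is taken as already estimated, and the t critical value is supplied by the caller because the engine's qt routine is not reproduced here.

```typescript
// Hypothetical helper: HKSJ standard error and CI for a random-effects estimate.
// tCrit is t_{k-1, 1-alpha/2}, passed in by the caller.
function hksjInterval(
  studies: { yi: number; vi: number }[], tau2: number, tCrit: number,
): { mu: number; se: number; lower: number; upper: number } {
  const k = studies.length;
  const rows = studies.map((s) => ({ y: s.yi, w: 1 / (s.vi + tau2) }));
  const sumW = rows.reduce((acc, r) => acc + r.w, 0);
  const mu = rows.reduce((acc, r) => acc + r.w * r.y, 0) / sumW;
  const weightedSS = rows.reduce((acc, r) => acc + r.w * (r.y - mu) * (r.y - mu), 0);
  const se = Math.sqrt(weightedSS / ((k - 1) * sumW));
  return { mu, se, lower: mu - tCrit * se, upper: mu + tCrit * se };
}

// Four studies, tau2 = 7/6, and t_{3, 0.975} = 3.182446.
console.log(hksjInterval([0, 1, 2, 3].map((y) => ({ yi: y, vi: 0.5 })), 7 / 6, 3.182446));
```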

Hartung J, Knapp G. A refined method for the meta-analysis of controlled clinical trials with binary outcome. Statistics in Medicine 2001.
Sidik K, Jonkman JN. A simple confidence interval for meta-analysis. Statistics in Medicine 2002.
IntHout J, Ioannidis JPA, Borm GF. The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method. BMC Med Res Methodol 2014.

Prediction interval (random-effects)

PI = mu_hat ± t_{k-2, 1-α/2} · sqrt(SE² + tau²)    (IntHout et al. 2016)
Computed when model = random, PI is requested, and k ≥ 3.

IntHout J et al. Plea for routinely presenting prediction intervals in meta-analysis. BMJ Open 2016.

Numerical distributions

All distribution functions ship inside the engine (no runtime R dependency):

qnorm:  Wichura (1988) AS-241 rational approximation, double precision.
pnorm:  Abramowitz & Stegun 7.1.26 rational approximation to the error function (absolute error below 1.5e-7).
pchisq: regularized lower incomplete gamma P(a,x) = gamma(a,x)/Gamma(a),
        via series + continued fraction (Numerical Recipes-style).
pt:     incomplete beta function via Lentz's continued fraction.
qt:     Newton-Raphson on pt starting from qnorm approximation.

4. Zero-cell continuity correction

For 2×2 outcomes (OR, RR), four user-selectable strategies are supported when a cell contains 0 events (Sweeting, Sutton & Lambert 2004).

'constant'       (default, Haldane):  add cc (0.5) to ALL four cells when any cell is zero.

'treatment-arm' (reciprocal rule):   cc_1 = 2·cc·n_1/(n_1+n_2),  cc_2 = 2·cc·n_2/(n_1+n_2)
                                     → respects group-size imbalance; equals 'constant' if balanced.

'empirical'     (data-adaptive):     cc_1 = K_hat·R/(R·(1+K_hat)),  cc_2 = K_hat/(R·(1+K_hat)),
                                     where R = n_1/n_2 and K_hat is a preliminary fixed-effect
                                     pooled OR from zero-cell-free studies.

'none'          (drop):              no correction. Zero-cell studies yield NaN and are
                                     excluded from the pool.

Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Statistics in Medicine 2004.

5. Sensitivity analyses

Leave-one-out (LOO)

Re-runs the full meta-analysis with each study excluded in turn. Reports changeInEstimate and changePercent; studies with |change%| > 10% are flagged as influential.

Cumulative meta-analysis

Sequential pooling as each additional study is added (default ordering by year; also by precision or effect). Useful for detecting when evidence becomes conclusive.

Subgroup analysis

Pool each subgroup separately under the chosen model → (mu_hat_g, SE_g)

Test for subgroup differences (Borenstein et al. 2009, Ch. 19):
  w_g            = 1 / SE_g²
  mu_hat_grand   = sum(w_g · mu_hat_g) / sum(w_g)
  Q_between      = sum( w_g · (mu_hat_g − mu_hat_grand)² )
  df_between     = (number of subgroups) − 1    (incl. single-study groups)
  p              = 1 − CDF_chi²(Q_between, df_between)

Uses each subgroup's SE under the active pooling model (RE under random-effects, HKSJ-adjusted if HKSJ is enabled). This matches RevMan and the standard metafor/meta byvar workflow. The naïve identity Q_total − Σ Q_within only equals Q_between under fixed-effects and drifts under random-effects when within-group τ² > 0. Single-study subgroups ARE included in the df and on the forest plot (they contribute their point estimate without a diamond); dropping them — a known bug in several tools — biases the interaction test.
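The subgroup-difference statistic can be sketched as below. The function name subgroupDifferenceTest is hypothetical, and each subgroup is assumed to arrive already pooled as (mu, se) under the active model; the p-value is then read from the chi-square distribution with df_between degrees of freedom.

```typescript
// Hypothetical helper: Q_between and its degrees of freedom from pooled subgroup results.
function subgroupDifferenceTest(
  groups: { mu: number; se: number }[],
): { qBetween: number; df: number } {
  const rows = groups.map((g) => ({ mu: g.mu, w: 1 / (g.se * g.se) }));
  const sumW = rows.reduce((acc, r) => acc + r.w, 0);
  const grand = rows.reduce((acc, r) => acc + r.w * r.mu, 0) / sumW;
  const qBetween = rows.reduce((acc, r) => acc + r.w * (r.mu - grand) * (r.mu - grand), 0);
  // Every subgroup counts toward df, including single-study subgroups.
  return { qBetween, df: groups.length - 1 };
}

console.log(subgroupDifferenceTest([{ mu: 0.2, se: 0.1 }, { mu: 0.6, se: 0.1 }]));
```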

6. Influence diagnostics

All diagnostics follow Viechtbauer & Cheung (2010) for meta-analysis, adapted from the standard regression diagnostics.

Per-study measures

Hat value (leverage):  h_i = w_i / sum(w_j)    (random-effects weights)
Studentized residual:  r_i = (y_i - mu_hat) / sqrt( v_i + tau² - h_i · (v_i + tau²) )
Cook's distance:       D_i = (mu_hat - mu_hat_(-i))² / Var(mu_hat)
DFBETAS:                (mu_hat - mu_hat_(-i)) / SE(mu_hat_(-i))    (standardized by LOO SE)
DFFITS:                 r_i · sqrt( h_i / (1 - h_i) )
COVRATIO:               Var(mu_hat_(-i)) / Var(mu_hat)

Flagging thresholds: Cook's D > 4/k, leverage > 2/k, |DFBETAS| > 2/sqrt(k), |studentized residual| > 2.

Baujat plot

x = heterogeneity contribution = 100 · w_i · (y_i - y_bar_FE)² / Q
y = influence on pooled estimate = (mu_hat - mu_hat_(-i))² / Var(mu_hat)

GOSH (Graphical display of study heterogeneity)

Fits many random subsets of studies; for each subset plots (mu_hat, I²). Used to detect multimodal or clustered heterogeneity structures.

Viechtbauer W, Cheung MW-L. Outlier and influence diagnostics for meta-analysis. Res Synth Methods 2010.
Baujat B et al. A graphical method for exploring heterogeneity in meta-analyses. Stat Med 2002.
Olkin I, Dahabreh IJ, Trikalinos TA. GOSH — a graphical display of study heterogeneity. Res Synth Methods 2012.

7. Publication & small-study bias

Asymmetry tests

Egger:           Regress y_i/SE_i on 1/SE_i; intercept ≠ 0 indicates asymmetry.
Begg:            Kendall's tau-b rank correlation between standardized effect and variance.
Peters:          Weighted linear regression of log(OR) on 1/n (for binary outcomes).
Harbord:         Score test; reduces anti-conservatism of Egger for sparse binary data.
Rücker:          Arcsine-transformed rank correlation for binary outcomes.
Thompson-Sharp:  Effect regressed on variance with precision weighting.
Deeks:           ESS-based funnel asymmetry test for DTA (log DOR vs 1/sqrt(ESS)).
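To make the Egger regression concrete, here is a minimal ordinary-least-squares sketch of y_i/SE_i on 1/SE_i. The function name eggerRegression is hypothetical, and the sketch returns only the coefficients; the significance test on the intercept (a t-test with k − 2 df) is omitted.

```typescript
// Hypothetical helper: Egger regression coefficients via ordinary least squares.
function eggerRegression(
  studies: { yi: number; sei: number }[],
): { intercept: number; slope: number } {
  const k = studies.length;
  // Standardized effect z = y/SE regressed on precision x = 1/SE.
  const pts = studies.map((s) => ({ x: 1 / s.sei, z: s.yi / s.sei }));
  const meanX = pts.reduce((acc, p) => acc + p.x, 0) / k;
  const meanZ = pts.reduce((acc, p) => acc + p.z, 0) / k;
  const sxx = pts.reduce((acc, p) => acc + (p.x - meanX) * (p.x - meanX), 0);
  const sxz = pts.reduce((acc, p) => acc + (p.x - meanX) * (p.z - meanZ), 0);
  const slope = sxz / sxx;
  return { intercept: meanZ - slope * meanX, slope };
}

// Effects constructed as y = 0.2 + 1.5 * SE, so the small-study effect is built in.
console.log(eggerRegression([
  { yi: 0.35, sei: 0.1 },
  { yi: 0.5, sei: 0.2 },
  { yi: 0.8, sei: 0.4 },
]));
```

In this parameterization the slope estimates the underlying effect and the intercept captures the asymmetry, which is why the toy data recover an intercept of 1.5 and a slope of 0.2.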

Trim-and-fill (Duval & Tweedie)

L₀+ estimator with reflection: iteratively identifies and imputes potentially missing studies on the under-represented side of the funnel, then recomputes the adjusted pooled estimate.

Fail-safe N

Rosenthal's formula for the number of null-result studies needed to overturn the finding.

PET-PEESE

Two-step regression: PET (Precision-Effect Test) as a significance test for a nonzero effect after accounting for small-study effects; PEESE as the adjusted estimator when PET indicates a real effect.

P-value methods

P-curve (Simonsohn et al. 2014): tests for evidential value and p-hacking
                 by examining the distribution of significant p-values.
P-uniform (van Assen et al. 2015): conditional p-values for bias-corrected
                 effect estimate and test for publication bias.
Z-curve (Brunner & Schimmack 2020): EM-based estimator of mean observed
                 power and replication rate from the Z-value distribution.

Selection model

Step-function Vevea-Hedges selection model (simplified implementation). Estimates the selection parameter δ — the relative probability of non-significant vs significant studies being observed.

Egger M et al. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997.
Begg CB, Mazumdar M. Operating characteristics of a rank correlation test for publication bias. Biometrics 1994.
Peters JL et al. Comparison of two methods to detect publication bias in meta-analysis. JAMA 2006.
Harbord RM, Egger M, Sterne JA. A modified test for small-study effects in meta-analyses of controlled trials with binary endpoints. Stat Med 2006.
Rücker G et al. Arcsine test for publication bias in meta-analyses with binary outcomes. Stat Med 2008.
Deeks JJ, Macaskill P, Irwig L. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol 2005.
Duval S, Tweedie R. Trim and fill: a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics 2000.
Stanley TD, Doucouliagos H. Meta-regression approximations to reduce publication selection bias. Res Synth Methods 2014. [PET-PEESE]
Simonsohn U, Nelson LD, Simmons JP. P-curve: a key to the file-drawer. JEP:General 2014.
van Assen MALM, van Aert RCM, Wicherts JM. Meta-analysis using effect size distributions of only statistically significant studies. Psychol Methods 2015. [P-uniform]
Brunner J, Schimmack U. Estimating population mean power under conditions of heterogeneity and selection for significance. Meta-Psychology 2020. [Z-curve]
Vevea JL, Hedges LV. A general linear model for estimating effect size in the presence of publication bias. Psychometrika 1995.

8. Meta-regression

Weighted least-squares regression of study effect sizes on one or more moderators. Reports slope, intercept, weighted R², Q-tests for model (Q_M) and residual heterogeneity (Q_E). Inference uses pt and pchisq from the engine — the same regularized gamma / incomplete beta implementations used elsewhere.

slope     = (sum(w)·sum(wXY) - sum(wX)·sum(wY)) / (sum(w)·sum(wX²) - (sum(wX))²)
intercept = (sum(wY) - slope · sum(wX)) / sum(w)
MSE       = SS_residual / (n - p)
SE(slope) = sqrt( MSE · sum(w) / (sum(w)·sum(wX²) - (sum(wX))²) )
R²        = 1 - SS_residual / SS_total
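The closed-form expressions above cover the single-moderator case. A sketch, assuming p = 2 coefficients (intercept and slope) and a hypothetical function name weightedRegression:

```typescript
// Hypothetical helper: single-moderator weighted least squares using the sums above.
function weightedRegression(
  points: { x: number; y: number; w: number }[],
): { slope: number; intercept: number; seSlope: number } {
  const sw = points.reduce((acc, p) => acc + p.w, 0);
  const swx = points.reduce((acc, p) => acc + p.w * p.x, 0);
  const swy = points.reduce((acc, p) => acc + p.w * p.y, 0);
  const swxy = points.reduce((acc, p) => acc + p.w * p.x * p.y, 0);
  const swx2 = points.reduce((acc, p) => acc + p.w * p.x * p.x, 0);
  const denom = sw * swx2 - swx * swx;
  const slope = (sw * swxy - swx * swy) / denom;
  const intercept = (swy - slope * swx) / sw;
  const ssResidual = points.reduce((acc, p) => {
    const resid = p.y - intercept - slope * p.x;
    return acc + p.w * resid * resid;
  }, 0);
  const mse = ssResidual / (points.length - 2); // n - p with p = 2
  return { slope, intercept, seSlope: Math.sqrt((mse * sw) / denom) };
}

console.log(weightedRegression([
  { x: 0, y: 0, w: 1 },
  { x: 1, y: 1, w: 1 },
  { x: 2, y: 1, w: 1 },
]));
```

At least three studies are needed for the residual degrees of freedom to be positive.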

9. Diagnostic test accuracy (DTA)

Univariate pooling

Logit-scale inverse-variance meta-analysis for sensitivity, specificity, and log DOR separately. Back-transformed via the inverse logit, with likelihood ratios derived from the pooled Sens/Spec and CIs from the delta method.

Pooled LR+ = Sens / (1 - Spec)
Pooled LR- = (1 - Sens) / Spec
Var(log LR+) = (1-Sens)² · Var(logit Sens) + Spec² · Var(logit Spec)    (delta method)
Var(log LR-) = Sens²     · Var(logit Sens) + (1-Spec)² · Var(logit Spec)
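A sketch of the delta-method step, taking the pooled sensitivity and specificity (on the probability scale) and their logit-scale variances as inputs. The function name pooledLikelihoodRatios and the default critical value 1.959964 are assumptions for the example.

```typescript
// Hypothetical helper: pooled LR+ / LR- with delta-method CIs on the log scale.
function pooledLikelihoodRatios(
  sens: number, spec: number,
  varLogitSens: number, varLogitSpec: number,
  zCrit = 1.959964,
) {
  const lrPos = sens / (1 - spec);
  const lrNeg = (1 - sens) / spec;
  const varLogLrPos = (1 - sens) * (1 - sens) * varLogitSens + spec * spec * varLogitSpec;
  const varLogLrNeg = sens * sens * varLogitSens + (1 - spec) * (1 - spec) * varLogitSpec;
  // CI is built on the log scale, then exponentiated.
  const ci = (lr: number, v: number): [number, number] => [
    lr * Math.exp(-zCrit * Math.sqrt(v)),
    lr * Math.exp(zCrit * Math.sqrt(v)),
  ];
  return {
    lrPos, lrNeg, varLogLrPos, varLogLrNeg,
    ciPos: ci(lrPos, varLogLrPos),
    ciNeg: ci(lrNeg, varLogLrNeg),
  };
}

console.log(pooledLikelihoodRatios(0.9, 0.8, 0.04, 0.04));
```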

Bivariate random-effects model (Reitsma 2005)

Joint modeling of (logit Sens, logit Spec) accounting for within- and between-study covariance. Iterative reweighted least squares alternating GLS for mu_hat and method-of-moments updates for Sigma (diagonal τ²_s, τ²_sp; off-diagonal ρ·τ_s·τ_sp). All five parameters update jointly; converges to tol = 1e-7.

y_i = (logit Sens_i, logit Spec_i)' ~ N(mu, S_i + Sigma)
GLS: mu_hat = (sum W_i)^-1 · sum(W_i · y_i),    with W_i = (S_i + Sigma)^-1
MoM updates for tau²_s, tau²_sp (generalized DL), and rho via weighted Pearson on residuals.

SROC (Reitsma):
  E[logit Sens | logit Spec] = mu_s + (rho · tau_s / tau_sp) · (logit Spec - mu_sp)
Confidence ellipse: 95% region from eigendecomposition of V(mu_hat);  chi²_{2, 0.95} = 5.991
Prediction ellipse: 95% region for a new study, using V(mu_hat) + Sigma

Reitsma JB et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 2005.
Chu H, Cole SR. Bivariate meta-analysis of sensitivity and specificity with sparse data: a generalized linear mixed model approach. J Clin Epidemiol 2006.

DTA-specific visualizations

SROC plot with summary point, confidence & prediction ellipses
Fagan nomogram (pre-test prob × LR → post-test prob)
LR scattergram with diagnostic utility zones
ROC ellipse, coupled forest, Deeks funnel

10. Network meta-analysis (NMA)

Frequentist weighted least squares (Rücker 2012)

beta_hat = (X'WX)^-1 · X'W y
V(beta_hat) = (X'WX)^-1    ← exposed as NMAResults.networkCovariance
tau² estimated via Jackson's empirical Bayes.

Monte Carlo SUCRA (Salanti et al. 2011)

Proper multivariate sampling from the network estimates using Cholesky decomposition of networkCovariance. 10,000 draws with a seeded xorshift + Box-Muller RNG for reproducibility.

theta = mu_hat + L · z,    L · L' = Sigma,    z ~ N(0, I)
For each draw, rank treatments; accumulate rank frequencies.
SUCRA_i = (1/(n-1)) · sum_{k=1..n-1} sum_{r=1..k} P(rank_i = r)

Reference treatment (if any) has effect ≡ 0 and participates in ranking.
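Once the rank frequencies have been accumulated, the SUCRA formula is a sum of cumulative rank probabilities. The sketch below covers only that final step, not the Monte Carlo sampling; the function name sucraFromRankProbabilities and the input layout (rankProbs[i][r] = probability that treatment i takes rank r + 1, with rank 1 best) are assumptions.

```typescript
// Hypothetical helper: SUCRA from a matrix of rank probabilities.
// rankProbs[i][r] = P(treatment i has rank r + 1); rank 1 is best.
function sucraFromRankProbabilities(rankProbs: number[][]): number[] {
  return rankProbs.map((row) => {
    const n = row.length;
    let cumulative = 0;
    let total = 0;
    // Sum the cumulative probabilities for ranks 1..n-1.
    row.slice(0, n - 1).forEach((p) => {
      cumulative += p;
      total += cumulative;
    });
    return total / (n - 1);
  });
}

console.log(sucraFromRankProbabilities([
  [1, 0, 0],
  [0, 1, 0],
  [0, 0, 1],
]));
```

A treatment that is always ranked first gets SUCRA = 1, one that is always last gets 0, and the values across all treatments average 0.5.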

Rücker & Schwarzer P-score

P-score_i = (1/(n-1)) · sum_{j ≠ i} P(treatment i better than j)
          = (1/(n-1)) · sum_{j ≠ i} Phi( (mu_hat_i - mu_hat_j) / sqrt(SE_i² + SE_j²) )

Reported alongside SUCRA; the two are asymptotically equivalent under normality.
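The P-score formula can be sketched as follows, using the pairwise standard error sqrt(SE_i² + SE_j²) exactly as written above and treating larger effects as better. The names normalCdf and pScores are hypothetical, and the normal CDF here uses the Abramowitz & Stegun 7.1.26 approximation as a stand-in for the engine's pnorm.

```typescript
// Standard normal CDF via the Abramowitz & Stegun 7.1.26 erf approximation.
function normalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = t * (0.254829592 + t * (-0.284496736 +
    t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Hypothetical helper: P-scores, assuming a larger mu means a better treatment.
function pScores(treatments: { mu: number; se: number }[]): number[] {
  const n = treatments.length;
  return treatments.map((ti, i) => {
    const total = treatments.reduce((acc, tj, j) => {
      if (i === j) return acc;
      const seDiff = Math.sqrt(ti.se * ti.se + tj.se * tj.se);
      return acc + normalCdf((ti.mu - tj.mu) / seDiff);
    }, 0);
    return total / (n - 1);
  });
}

console.log(pScores([{ mu: 0.5, se: 0.2 }, { mu: 0.1, se: 0.2 }, { mu: 0, se: 0.2 }]));
```

If smaller values are better for the outcome, flip the sign of the difference before applying the CDF.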

Inconsistency testing

Global inconsistency: Q_inc from design-by-treatment interaction. Per-comparison node-splitting (Dias et al. 2010) via implied indirect estimate:

Given direct (d_d, v_d) and network (d_n, v_n):
  v_i = 1 / (1/v_n - 1/v_d),    d_i = v_i · (d_n/v_n - d_d/v_d)
  z   = (d_d - d_i) / sqrt(v_d + v_i),    p = 2·(1 - Phi(|z|))
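The back-calculation above is short enough to sketch directly. The function name impliedIndirect is hypothetical; it returns the implied indirect estimate and the z statistic, leaving the p-value to the normal CDF.

```typescript
// Hypothetical helper: implied indirect estimate and inconsistency z statistic.
function impliedIndirect(
  dDirect: number, vDirect: number, dNetwork: number, vNetwork: number,
): { dIndirect: number; vIndirect: number; z: number } {
  // Precisions subtract: 1/v_indirect = 1/v_network - 1/v_direct.
  const vIndirect = 1 / (1 / vNetwork - 1 / vDirect);
  const dIndirect = vIndirect * (dNetwork / vNetwork - dDirect / vDirect);
  const z = (dDirect - dIndirect) / Math.sqrt(vDirect + vIndirect);
  return { dIndirect, vIndirect, z };
}

console.log(impliedIndirect(0.5, 0.04, 0.4, 0.02));
```

The calculation only makes sense when the network variance is smaller than the direct variance, that is, when some indirect evidence actually exists for the comparison.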

Reference-standard tracking (DTA NMA)

For DTA NMA, the engine tallies which reference standard each study used and sets mixed: true when more than one standard appears. Pooling across different reference standards can bias the results, so the UI emits a warning, and an optional restrictToReferenceStandard argument filters the network to one standard.

Rücker G. Network meta-analysis, electrical networks and graph theory. Res Synth Methods 2012.
Salanti G, Ades AE, Ioannidis JPA. Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: an overview and tutorial. J Clin Epidemiol 2011.
Rücker G, Schwarzer G. Ranking treatments in frequentist network meta-analysis works without resampling methods. BMC Med Res Methodol 2015.
Dias S et al. Checking consistency in mixed treatment comparison meta-analysis. Stat Med 2010.
Nyaga VN et al. ANOVA model for network meta-analysis of diagnostic test accuracy data. Stat Methods Med Res 2018.

11. Trial sequential analysis (TSA)

Required Information Size (RIS):
  baseIS  = 4 · ((z_alpha + z_beta) / delta)²
  adjusted = baseIS / (1 - D²),    where D² = I²    (diversity adjustment, Wetterslev 2009)

Boundaries:
  O'Brien-Fleming:  alpha spent = 2 · (1 - Phi(z_{alpha/2} / sqrt(t)))
  Pocock:           alpha spent = alpha · log(1 + (e-1)·t)
  Haybittle-Peto:   fixed z = 3.29

Z-curve: cumulative z-score from the running REML meta-analysis as each
         study is added (ordered by year). Crossings of the efficacy,
         futility, or harm boundaries trigger stopping.
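A sketch of the information-size and alpha-spending formulas above. All function names are hypothetical; the z quantiles are passed in by the caller, t is the information fraction in (0, 1], and the normal CDF uses the Abramowitz & Stegun 7.1.26 approximation as a stand-in for the engine's pnorm.

```typescript
// Standard normal CDF via the Abramowitz & Stegun 7.1.26 erf approximation.
function standardNormalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = t * (0.254829592 + t * (-0.284496736 +
    t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Diversity-adjusted required information size; i2 is a proportion in [0, 1).
function requiredInformationSize(zAlpha: number, zBeta: number, delta: number, i2: number): number {
  const ratio = (zAlpha + zBeta) / delta;
  const baseIS = 4 * ratio * ratio;
  return baseIS / (1 - i2);
}

// O'Brien-Fleming-type spending at information fraction t.
function alphaSpentOBrienFleming(t: number, zAlphaHalf = 1.959964): number {
  return 2 * (1 - standardNormalCdf(zAlphaHalf / Math.sqrt(t)));
}

// Pocock-type spending at information fraction t.
function alphaSpentPocock(t: number, alpha = 0.05): number {
  return alpha * Math.log(1 + (Math.E - 1) * t);
}

console.log(requiredInformationSize(1.959964, 0.841621, 0.5, 0.5));
console.log(alphaSpentOBrienFleming(0.25), alphaSpentPocock(0.25));
```

Both spending functions reach the full alpha at t = 1, but O'Brien-Fleming spends far less of it early on, which is why its early boundaries are so much stricter than Pocock's.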

Brok J, Thorlund K, Gluud C, Wetterslev J. Trial sequential analysis reveals insufficient information size and potentially false positive results. J Clin Epidemiol 2008.
Wetterslev J, Thorlund K, Brok J, Gluud C. Estimating required information size by quantifying diversity in random-effects model meta-analyses. BMC Med Res Methodol 2009.

12. Visualizations

All plots are rendered natively as SVG and exportable to PNG/SVG.

Classic

Forest plot (study-level + pooled diamond, FE and RE summaries), subgroup forest, cumulative forest, leave-one-out forest, funnel plot with contour enhancement, L'Abbé plot, Galbraith plot.

Modern & enhanced

Sunset (power-enhanced) funnel — contours of statistical power
Albatross plot: p-value vs sample size with effect-size contours (contour formulas: n = 4(z/d)² for SMD, n = 16(z/logOR)² for OR/RR, n = (z/atanh(r))² + 3 for correlation)
P-curve, Z-curve, drapery — p-value distribution methods
Baujat, GOSH — heterogeneity exploration
Heterogeneity pie, sample-size histogram

DTA-specific

SROC plot (with confidence + prediction ellipses), Fagan nomogram, LR scattergram, ROC ellipse, coupled forest, Deeks funnel.

NMA-specific

Network plot, league table, rankogram (from Monte Carlo rank probabilities), SUCRA bar chart, net heat plot, node-splitting comparison, contribution matrix.

Reporting

PRISMA 2020 flow diagram, risk-of-bias summary (QUADAS-2 / RoB2 traffic-lights).

13. Exports & reproducibility

DOCX and PDF manuscript exports with methods narrative
R code export that reproduces the analysis using metafor / meta / netmeta
CSV and RevMan import for study data
Monte Carlo routines use a seeded xorshift + Box-Muller RNG — rerunning any analysis gives byte-identical results.
Every engine function is pure TypeScript with no runtime R/Python dependency; the full pipeline executes in the browser.

14. Validation

Every estimator, test, and back-transformation documented above is covered by an internal automated regression suite. Each release is required to reproduce expected outputs within tight numerical tolerance before it ships, across a range of scenarios including analytical-truth edge cases (homogeneous data, minimum k), high-heterogeneity data, sparse/zero-cell designs, subgroup and single-study-subgroup handling, bivariate DTA convergence, NMA consistency, and publication-bias detection.

Pooled estimates, standard errors, and heterogeneity statistics are calibrated against published reference values from established implementations where applicable. Results are reproducible given the same input data and settings.

For specific methodological questions or validation requests beyond what is described here, please contact the team.

Maintained by the meta-analysis.io team. Methods are cited using their canonical primary references; implementation details may be refined over time, but the formulas above are exactly what the shipping engine computes.

Questions or corrections? hello@meta-analysis.io