
Monte Carlo Integration: Correctness, Convergence, and Application to the Rendering Equation

Correctness of Monte Carlo Integration

Given a probability density function $p(x)$ over the interval $[a, b]$ and an integrand $f(x)$ on the same interval, we draw $N$ samples $\{X_1, X_2, \dots, X_N\}$ from $p(x)$.

Define $F_N$:

$$F_N = \frac{1}{N} \sum_{i=1}^N \frac{f(X_i)}{p(X_i)}$$

Expected value of $F_N$:

$$\begin{aligned} E[F_N] &= E\left[\frac{1}{N} \sum_{i=1}^N \frac{f(X_i)}{p(X_i)}\right] \\ &= \frac{1}{N} \sum_{i=1}^N E\left[\frac{f(X_i)}{p(X_i)}\right] \\ &= \frac{1}{N} \sum_{i=1}^N \int_a^b \frac{f(x)}{p(x)} p(x) dx \\ &= \int_a^b f(x) dx. \end{aligned}$$

Thus $F_N$ is an unbiased estimator of the integral, and by the law of large numbers it converges to the true value as $N \to \infty$:

$$\int_a^b f(x) dx = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N \frac{f(X_i)}{p(X_i)}.$$
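A minimal numerical sketch of this estimator, assuming $f(x) = x^2$ on $[0, 1]$ (true integral $1/3$) and a uniform pdf $p(x) = 1$:

import numpy as np

# Minimal sketch: estimate the integral of f(x) = x^2 over [0, 1]
# (true value 1/3) using a uniform pdf p(x) = 1.
rng = np.random.default_rng(0)

def f(x):
    return x ** 2

N = 100_000
X = rng.uniform(0.0, 1.0, size=N)   # X_i ~ p(x) = 1 on [0, 1]
F_N = np.mean(f(X) / 1.0)           # F_N = (1/N) * sum_i f(X_i) / p(X_i)
print(F_N)                          # should be close to 0.3333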

Convergence and Variance Analysis

Let $Y = \frac{f(X)}{p(X)}$. The variance of $F_N$ is derived as:

$$\begin{aligned} \sigma^2(F_N) &= \sigma^2\left[\frac{1}{N} \sum_{i=1}^N Y_i\right] \\ &= \frac{1}{N^2} \sum_{i=1}^N \sigma^2[Y_i] \\ &= \frac{1}{N^2}(N \cdot \sigma^2[Y]) \\ &= \frac{1}{N} \sigma^2[Y]. \end{aligned}$$

Hence, the standard deviation:

$$\sigma(F_N) = \frac{\sigma[Y]}{\sqrt{N}}.$$
  • As $N \to \infty$, $\sigma(F_N) \to 0$; together with $E[F_N] = \int_a^b f(x) \, dx$, this shows $F_N$ converges to the true integral.
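A quick numerical check of the $\sigma[Y]/\sqrt{N}$ rate, as a sketch reusing the same toy integrand and uniform pdf as above: quadrupling $N$ should roughly halve the spread of $F_N$.

import numpy as np

# Empirical check of sigma(F_N) = sigma[Y] / sqrt(N): quadrupling N
# should roughly halve the spread of the estimator.
rng = np.random.default_rng(1)

def estimate(n):
    x = rng.uniform(0.0, 1.0, size=n)
    return np.mean(x ** 2)          # uniform pdf, so p(x) = 1

for n in (1_000, 4_000, 16_000):
    runs = np.array([estimate(n) for _ in range(2_000)])
    print(n, runs.std())            # std shrinks by about 2x per 4x in N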

Impact of variance ($\sigma^2[Y]$):

  • Smoothness of $Y$: if $Y = f(X)/p(X)$ varies strongly, $\sigma^2[Y]$ is large and convergence is slow.
  • Choice of $p(x)$: if $p(x) \propto f(x)$ (for non-negative $f$), then $Y$ is constant, $\sigma^2[Y] = 0$, and convergence is optimal.

Monte Carlo Integration for the Rendering Equation

Rendering equation:

$$L_o(x, \omega_o) = L_e(x, \omega_o) + \int_{\Omega} L_i(x, \omega_i) f(x, \omega_i, \omega_o) \cos \theta_i \, \mathrm{d}\omega_i$$

Monte Carlo approximation:

$$L_o(x, \omega_o) \approx L_e(x, \omega_o) + \frac{1}{N} \sum_{j=1}^N \frac{L_i(x, \omega_j) f(x, \omega_j, \omega_o) \cos \theta_j}{\text{pdf}(\omega_j)}$$
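A sketch of evaluating this sum at a single shading point, assuming (purely for illustration) a Lambertian BRDF $f = \text{albedo}/\pi$, constant incident radiance $L_i$, and uniform hemisphere sampling with $\text{pdf}(\omega) = 1/(2\pi)$; under these assumptions the reflected term equals $\text{albedo} \cdot L_i$, which the estimate should approach.

import numpy as np

# Sketch of the Monte Carlo sum for one shading point, assuming a
# Lambertian BRDF f = albedo / pi, constant incident radiance L_i, and
# uniform hemisphere sampling with pdf(w) = 1 / (2 * pi). Under these
# assumptions the reflected part of the integral equals albedo * L_i.
rng = np.random.default_rng(2)
albedo, L_i, N = 0.8, 2.0, 50_000

cos_theta = rng.uniform(0.0, 1.0, size=N)    # uniform hemisphere: cos(theta) ~ U[0, 1]
pdf = 1.0 / (2.0 * np.pi)
brdf = albedo / np.pi

L_o = np.mean(L_i * brdf * cos_theta / pdf)  # emission term omitted
print(L_o, albedo * L_i)                     # the two values should agree closely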

Key Insights

  1. Variance reduction:

    • If $\text{pdf}(\omega_i)$ is chosen to closely match the product $L_i(x, \omega_i) f(x, \omega_i, \omega_o) \cos \theta_i$, variance is minimized.
  2. Importance sampling:

    • Select $\text{pdf}(\omega_i) \propto L_i(x, \omega_i) f(x, \omega_i, \omega_o) \cos \theta_i$ for optimal convergence and stability.

Importance Sampling

  • Principle:
    Importance sampling selects a probability density function $p(x)$ that closely resembles or is proportional to the integrand $f(x)$. This reduces variance and improves convergence speed.

  • Monte Carlo Approximation with Importance Sampling:

    $$I = \int f(x) \, dx \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)} \quad \text{where } x_i \sim p(x)$$
    • A good choice of $p(x)$ minimizes variance.
    • Optimal case: $p(x) \propto f(x)$ (for non-negative $f$), which results in zero variance.
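The effect of the pdf choice can be checked numerically. A sketch, assuming $f(x) = x^2$ on $[0, 1]$: a uniform pdf is compared against $p(x) = 3x^2$, which is proportional to $f$ and makes the ratio $f/p$ constant.

import numpy as np

# Compare estimator spread for f(x) = x^2 on [0, 1] under a uniform pdf
# versus p(x) = 3 x^2, which is proportional to f.
rng = np.random.default_rng(3)
N, runs = 1_000, 2_000

def uniform_estimate():
    x = rng.uniform(0.0, 1.0, size=N)
    return np.mean(x ** 2 / 1.0)                       # p(x) = 1

def proportional_estimate():
    x = rng.uniform(0.0, 1.0, size=N) ** (1.0 / 3.0)   # inverse CDF of p(x) = 3x^2
    return np.mean(x ** 2 / (3.0 * x ** 2))            # constant ratio 1/3

print(np.std([uniform_estimate() for _ in range(runs)]))       # noticeably > 0
print(np.std([proportional_estimate() for _ in range(runs)]))  # essentially 0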

Resampled Importance Sampling (RIS)

  • Motivation:
    When directly sampling from a complex or unnormalized distribution $g(x)$ is difficult, Resampled Importance Sampling (RIS) provides an effective alternative.
    RIS generates approximate samples for $g(x)$ by leveraging another easier-to-sample distribution $p(x)$.

Steps for RIS

  1. Generate candidate samples:
    • Draw $M$ samples $\{x_1, x_2, \dots, x_M\}$ from a simple distribution $p(x)$.
  2. Compute weights for candidates:
    • Assign weights to each sample: $$w_j = \frac{g(x_j)}{p(x_j)}, \quad j \in [1, M]$$
  3. Resample $N$ samples:
    • Resample $\{y_1, y_2, \dots, y_N\}$ from the candidate set $\{x_1, x_2, \dots, x_M\}$ based on the weights $w_j$.

    • The new samples $\{y_i\}$ approximate the distribution $g(x)$.
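The three steps translate directly into code. A minimal sketch, assuming a uniform proposal $p(x) = 1$ on $[0, 1]$ and an unnormalized target $g(x) = x^2$ (both chosen only for illustration):

import numpy as np

# RIS steps 1-3 on a toy problem: proposal p(x) = 1 on [0, 1] and
# unnormalized target g(x) = x^2.
rng = np.random.default_rng(4)
M, N = 1_000, 16

def g(x):
    return x ** 2                   # unnormalized target

# 1. Candidate samples from the easy distribution p(x).
x = rng.uniform(0.0, 1.0, size=M)

# 2. Candidate weights w_j = g(x_j) / p(x_j), with p(x) = 1 here.
w = g(x) / 1.0

# 3. Resample N points with probability proportional to the weights;
#    the y_i approximately follow the normalized version of g(x).
y = rng.choice(x, size=N, p=w / w.sum())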

Monte Carlo Integration with RIS

  • Estimator with Resampled Weights:
    If $g(x)$ is normalized and its samples are available, the Monte Carlo estimator can be written as:

    $$\hat{I}_{MC} = \frac{1}{N} \sum_{i=1}^{N} \frac{f(y_i)}{g(y_i)}$$
  • Estimator with Candidate Weights:
    When $g(x)$ is unnormalized, RIS corrects the estimator with a weighting term $\omega^*$ computed from the $M$ candidates:

    $$\hat{I}_{RIS} = \frac{1}{N} \sum_{i=1}^{N} \frac{f(y_i)}{g(y_i)} \cdot \omega^*$$
    • Weighting factor: $$\omega^* = \frac{1}{M} \sum_{j=1}^{M} w(x_j)$$
  • Final RIS Formula:

    $$\hat{I}_{RIS} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{f(y_i)}{g(y_i)} \right) \cdot \frac{1}{M} \sum_{j=1}^{M} w(x_j)$$
  • Conditions for Unbiasedness

    1. $g(x) > 0$ and $p(x) > 0$ wherever $f(x) \neq 0$.
    2. Both $M > 0$ (number of candidates) and $N > 0$ (number of resampled points).
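Putting the pieces together, the following sketch estimates $I = \int_0^1 f(x)\,dx$ for $f(x) = x^3$ (true value $1/4$), assuming a uniform proposal $p(x) = 1$ and the unnormalized target $g(x) = x$; both conditions above hold, since $g$ and $p$ are positive wherever $f$ is nonzero.

import numpy as np

# Full RIS estimator on a toy integral: f(x) = x^3 on [0, 1] (true value
# 0.25), proposal p(x) = 1, unnormalized target g(x) = x.
rng = np.random.default_rng(5)
M, N = 4_000, 32

def f(x): return x ** 3
def g(x): return x

x = rng.uniform(0.0, 1.0, size=M)          # candidates from p
w = g(x) / 1.0                             # w_j = g(x_j) / p(x_j)
y = rng.choice(x, size=N, p=w / w.sum())   # resampled points

omega_star = w.mean()                      # (1/M) * sum_j w(x_j)
I_ris = np.mean(f(y) / g(y)) * omega_star
print(I_ris)                               # close to 0.25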

Proof of Unbiasedness for RIS Sampling

To prove that the estimator is unbiased, we need to demonstrate that its expected value equals the true integral. The core idea is the law of total expectation (tower property):

$$E[XY] = E_X \left( E_Y(Y \mid X) \cdot X \right)$$

Starting with the RIS estimator, we have:

$$E\left( \hat{L}_{\text{ris}} \right) = E\left[ E\left( \frac{1}{N} \sum_{i=1}^N \frac{f(Y_i)}{g(Y_i)} \;\middle|\; X_1, \dots, X_M \right) \cdot \frac{1}{M} \sum_{j=1}^M w(X_j) \right]$$

Since the $Y_i$ are identically distributed given the candidates $X_1, \dots, X_M$, the inner average reduces to a single resampled $Y$:

$$E\left( \hat{L}_{\text{ris}} \right) = E\left[ E\left( \frac{f(Y)}{g(Y)} \;\middle|\; X_1, \dots, X_M \right) \cdot \frac{1}{M} \sum_{j=1}^M w(X_j) \right]$$

Given $Y$ is sampled from $X_1, \dots, X_M$, the inner expectation can be expressed as a sum over the weights of the samples:

$$E\left( \hat{L}_{\text{ris}} \right) = E\left[ \sum_{k=1}^M \left( \frac{f(X_k)}{g(X_k)} \cdot \frac{w(X_k)}{\sum_{j=1}^M w(X_j)} \right) \cdot \frac{1}{M} \sum_{j=1}^M w(X_j) \right]$$

The normalizing denominator $\sum_{j=1}^M w(X_j)$ cancels with the factor $\frac{1}{M} \sum_{j=1}^M w(X_j)$, leaving:

$$E\left( \hat{L}_{\text{ris}} \right) = E\left[ \frac{1}{M} \sum_{k=1}^M \frac{f(X_k)}{g(X_k)} \cdot w(X_k) \right]$$

Since the $X_k$ are i.i.d. samples from $p$, this is simply the expectation over a single sample $X \sim p$. Using $w(X) = \frac{g(X)}{p(X)}$:

$$E\left( \hat{L}_{\text{ris}} \right) = E\left( \frac{f(X)}{g(X)} \cdot w(X) \right) = E\left( \frac{f(X)}{g(X)} \cdot \frac{g(X)}{p(X)} \right) = \int \frac{f(x)}{p(x)} \, p(x) \, dx = \int f(x) \, dx$$

Thus, the RIS estimator is unbiased: its expected value equals the true integral.

Applications in Ray Tracing

  • Integrand:
    The rendering equation integrand for Monte Carlo evaluation:

    $$L_i(x, \omega_i) \, f(x, \omega_i, \omega_o) \cos \theta_i$$
  • Process:

    • Generate $M$ candidate directions $\omega_j$ using a simple distribution $p(\omega_j)$.
    • Compute weights $w_j = \frac{g(\omega_j)}{p(\omega_j)}$.
    • Resample $N = 1$ direction for each query to compute an unbiased estimate of the integral.
  • Outcome:
    RIS reduces variance and ensures unbiased estimation, even when $g(x)$ is not normalized.
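A hypothetical sketch of this $N = 1$ process is given below; `sample_direction`, `pdf_value`, `target_g`, and `visibility` are placeholders standing in for renderer-specific routines, not a real API.

import numpy as np

# Hypothetical sketch of RIS with N = 1 for one shading query. The
# helpers below are placeholders, not a real renderer API.
rng = np.random.default_rng(6)
M = 32

def sample_direction():
    return rng.uniform(0.0, 1.0, size=2)      # placeholder direction encoding

def pdf_value(direction):
    return 1.0 / (2.0 * np.pi)                # e.g. a uniform hemisphere pdf

def target_g(direction):
    return 1.0 + direction[0]                 # stand-in for L_i * f * cos(theta)

def visibility(direction):
    return 1.0                                # stand-in for a shadow ray

candidates = [sample_direction() for _ in range(M)]
weights = np.array([target_g(d) / pdf_value(d) for d in candidates])
chosen = candidates[rng.choice(M, p=weights / weights.sum())]

# Unbiased estimate: (f(y) / g(y)) * (1/M) * sum_j w_j, where the true
# integrand f includes the visibility term that the target g omits.
f_y = target_g(chosen) * visibility(chosen)
estimate = (f_y / target_g(chosen)) * weights.mean()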

Reservoir Sampling and Weighted Reservoir Sampling

Reservoir Sampling

Reservoir Sampling is a family of algorithms for randomly selecting $N$ elements from a stream of data (or from a set of size $M$ that is too large to store in memory). Typically $N$ is small, while $M$ is large or unknown in advance.

Basic Algorithm

Given a sequence of $M$ elements, where $M$ may be unknown in advance because the data is streamed, we want to sample $N$ elements uniformly.

  1. Initialization:

    • Create an array that can hold $N$ elements.
    • Populate this array with the first $N$ elements from the stream.
  2. Subsequent Elements:

    • Starting from element $N+1$, accept each new element with probability $\frac{N}{M_{\text{so far}}}$, where $M_{\text{so far}}$ is the total number of elements seen so far (including the new one).
    • If the new element is accepted, it replaces a slot chosen uniformly at random from the reservoir, so every element currently in the reservoir is equally likely to be replaced.
  3. Final Result:

    • Once all elements have been processed, the array will contain $N$ elements, which are the required samples.
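A compact sketch of these steps:

import random

# Basic reservoir sampling: keep N items from a stream of unknown length,
# each retained with probability N / M.
def reservoir_sample(stream, n):
    reservoir = []
    for m, item in enumerate(stream, start=1):
        if m <= n:
            reservoir.append(item)                 # fill with the first N items
        elif random.random() < n / m:              # accept with probability N / M_so_far
            reservoir[random.randrange(n)] = item  # replace a uniformly chosen slot
    return reservoir

print(reservoir_sample(range(1, 101), 5))          # 5 items, each kept with probability 5/100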

Derivation of the Probability

For each element in the reservoir, the probability of it being retained is derived as follows:

  • Before any new elements are encountered, all elements in the reservoir have a probability of 1 of being retained.

  • At step $N+1$, the probability that an element in the reservoir is replaced by the $N+1$-th element is:

    $$\text{probability of replacement} = \frac{N}{N+1} \times \frac{1}{N} = \frac{1}{N+1}$$

    Therefore, the probability of not being replaced is:

    $$1 - \frac{N}{N+1} \times \frac{1}{N} = \frac{N}{N+1}$$
  • At step $N+2$, the probability of not being replaced is:

    $$1 - \frac{N}{N+2} \times \frac{1}{N} = \frac{N+1}{N+2}$$
  • This process continues, and after $M$ elements have been processed, the probability of an element being retained in the reservoir is:

    $$\text{probability of being retained} = 1 \times \frac{N}{N+1} \times \frac{N+1}{N+2} \times \frac{N+2}{N+3} \times \ldots \times \frac{M-1}{M}=\frac{N}{M}$$

Hence, every element has an equal probability of $\frac{N}{M}$ of being retained in the final sample. (An element first seen at step $m > N$ is accepted with probability $\frac{N}{m}$ and survives the remaining steps with probability $\frac{m}{m+1} \times \cdots \times \frac{M-1}{M} = \frac{m}{M}$, again giving $\frac{N}{M}$.)

When the total number of elements, $M_{\text{so far}}$, is unknown, this probability becomes $\frac{N}{M_{\text{so far}}}$ at any point during the sampling process.

Weighted Reservoir Sampling

Weighted Reservoir Sampling extends the basic algorithm by allowing each element to have an associated weight $w_i$, which determines its likelihood of being included in the sample.

Pseudocode for Weighted Reservoir Sampling

# Reservoir Sampling Algorithm with Weights (runnable Python version)
import random

class Reservoir:
    def __init__(self):
        self.y = None        # output sample
        self.w_sum = 0.0     # sum of weights seen so far
        self.M = 0           # number of samples seen so far

    def update(self, x_i, w_i):
        self.w_sum += w_i
        self.M += 1
        # Keep the new element with probability w_i / w_sum.
        if random.random() < w_i / self.w_sum:
            self.y = x_i

def reservoirSampling(S, weight):
    # weight(x) supplies each element's weight.
    r = Reservoir()
    for x in S:
        r.update(x, weight(x))
    return r
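A quick check of the code above, using each element's numeric value as its weight (an arbitrary choice for illustration): each element should be selected roughly in proportion to its weight.

import collections

# Frequency check: with weights 1, 2, 3, 4, the selection counts should
# come out roughly in the ratio 1 : 2 : 3 : 4.
counts = collections.Counter()
for _ in range(10_000):
    r = reservoirSampling([1, 2, 3, 4], weight=float)
    counts[r.y] += 1
print(counts)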

Derivation of the Probability

To understand why, in Weighted Reservoir Sampling, element $x_i$ ends up as the final reservoir sample with probability $\frac{w_i}{w_{\text{total}}}$, let's break it down step by step.

Definitions

  • $S = \{x_1, x_2, \dots, x_M\}$: A sequence of elements.
  • $w_i$: Weight associated with element $x_i$.
  • $w_{\text{sofar}}^{(j)} = \sum_{k=1}^{j} w_k$: Cumulative weight of the elements processed up to and including $x_j$.
  • $y$: Current reservoir sample.

Key Idea

Each element $x_i$ has a weight $w_i$, which determines the probability it gets selected into the reservoir at the time it is processed. For $x_i$ to remain in the reservoir, it must survive the updates from all subsequent elements $x_{i+1}, x_{i+2}, \dots, x_M$.

Selection Probability for $x_i$

  1. Initial Selection
    When $x_i$ is processed, it has a probability of being selected as $y$:

    $$P(\text{select } x_i \text{ at step } i) = \frac{w_i}{w_{\text{sofar}}^{(i)}}$$

    Here, $w_{\text{sofar}}^{(i)}$ is the sum of weights of the elements processed up to and including $x_i$.

  2. Survival Probability
    After $x_i$ is selected, it must survive replacement by $x_{i+1}, x_{i+2}, \dots, x_M$. At each subsequent step $j$, $x_i$ is replaced with probability:

    $$P(\text{replace } x_i \text{ at step } j) = \frac{w_j}{w_{\text{sofar}}^{(j)}}$$

    where $w_{\text{sofar}}^{(j)}$ is the cumulative weight after including $x_j$.

    The probability that $x_i$ survives step $j$ is:

    $$P(\text{survive step } j) = 1 - \frac{w_j}{w_{\text{sofar}}^{(j)}} = \frac{w_{\text{sofar}}^{(j)} - w_j}{w_{\text{sofar}}^{(j)}}$$
  3. Combined Survival Probability
    For $x_i$ to remain the final reservoir sample, it must survive all replacements from $x_{i+1}$ to $x_M$. The overall survival probability is:

    $$P(\text{survive } x_{i+1}, \dots, x_M) = \prod_{j=i+1}^M \left( \frac{w_{\text{sofar}}^{(j)} - w_j}{w_{\text{sofar}}^{(j)}} \right)$$
  4. Final Probability for $x_i$
    Combining the probability of being selected at step $i$ with the probability of surviving subsequent steps, we have:

    $$P(x_i \text{ is final sample}) = P(\text{select } x_i \text{ at step } i) \cdot P(\text{survive } x_{i+1}, \dots, x_M)$$

    Substituting the expressions:

    $$P(x_i \text{ is final sample}) = \frac{w_i}{w_{\text{sofar}}^{(i)}} \cdot \prod_{j=i+1}^M \left( \frac{w_{\text{sofar}}^{(j)} - w_j}{w_{\text{sofar}}^{(j)}} \right)$$

Simplification Using Recursion

The term $\frac{w_{\text{sofar}}^{(j)} - w_j}{w_{\text{sofar}}^{(j)}}$ can be rewritten as:

$$\frac{w_{\text{sofar}}^{(j)} - w_j}{w_{\text{sofar}}^{(j)}} = \frac{w_{\text{sofar}}^{(j-1)}}{w_{\text{sofar}}^{(j)}}$$

This reflects the fact that removing the weight $w_j$ leaves the cumulative weight of the previous step. Substituting this into the product:

$$\prod_{j=i+1}^M \left( \frac{w_{\text{sofar}}^{(j)} - w_j}{w_{\text{sofar}}^{(j)}} \right) = \prod_{j=i+1}^M \left( \frac{w_{\text{sofar}}^{(j-1)}}{w_{\text{sofar}}^{(j)}} \right)$$

This is a telescoping product, where intermediate terms cancel out, leaving:

$$\prod_{j=i+1}^M \left( \frac{w_{\text{sofar}}^{(j-1)}}{w_{\text{sofar}}^{(j)}} \right) = \frac{w_{\text{sofar}}^{(i)}}{w_{\text{sofar}}^{(M)}}$$

Substituting Back

Substituting this result into the original formula:

$$P(x_i \text{ is final sample}) = \frac{w_i}{w_{\text{sofar}}^{(i)}} \cdot \frac{w_{\text{sofar}}^{(i)}}{w_{\text{sofar}}^{(M)}}$$

The $w_{\text{sofar}}^{(i)}$ terms cancel, leaving:

$$P(x_i \text{ is final sample}) = \frac{w_i}{w_{\text{sofar}}^{(M)}}$$

Since $w_{\text{sofar}}^{(M)} = w_{\text{total}}$, we conclude:

$$P(x_i \text{ is final sample}) = \frac{w_i}{w_{\text{total}}}$$

Combining Weighted Reservoir Sampling with RIS

The following steps illustrate how this combination works:

  1. For each pixel, build a stream of candidate samples, including candidates reused from neighboring pixels (spatial reuse) and from previous frames (temporal reuse).
  2. Feed each candidate, together with its weight, into the pixel's reservoir, accumulating the weight sum and the total candidate count $M$.
  3. After all updates, apply the unbiased RIS estimator to the surviving reservoir sample (a sketch follows this list).
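A minimal sketch of this combination on the toy problem from the RIS example (proposal $p(x) = 1$ on $[0, 1]$, unnormalized target $g(x) = x$, integrand $f(x) = x^3$ with true integral $1/4$), streaming candidates through the `Reservoir` class defined earlier; spatial and temporal reuse are omitted:

import random

# Streaming RIS via a weighted reservoir, using the Reservoir class from
# the weighted reservoir sampling section above.
def f(x): return x ** 3
def g(x): return x          # unnormalized target; proposal p(x) = 1

r = Reservoir()
for _ in range(10_000):     # candidates arrive one at a time
    x = random.random()     # x ~ p(x) = 1 on [0, 1]
    r.update(x, g(x) / 1.0) # reservoir weight w = g(x) / p(x)

# Single-sample RIS estimate: f(y)/g(y) * (1/M) * sum of weights.
estimate = f(r.y) / g(r.y) * (r.w_sum / r.M)
print(estimate)             # about 0.25 on average (single sample, so noisy)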

ReSTIR GI: Path Resampling for Real-Time Path Tracing