Monte Carlo Integration: Correctness, Convergence, and Application to the Rendering Equation
Correctness of Monte Carlo Integration
Given a probability density function $p(x)$ over the interval $[a, b]$, we draw $N$ independent samples $\{X_1, X_2, \dots, X_N\}$ from $p(x)$.
Define the Monte Carlo estimator $F_N$:
$$F_N = \frac{1}{N} \sum_{i=1}^N \frac{f(X_i)}{p(X_i)}$$
The expected value of $F_N$ is:
$$\begin{aligned} E[F_N] &= E\left[\frac{1}{N} \sum_{i=1}^N \frac{f(X_i)}{p(X_i)}\right] \\ &= \frac{1}{N} \sum_{i=1}^N E\left[\frac{f(X_i)}{p(X_i)}\right] \\ &= \frac{1}{N} \sum_{i=1}^N \int_a^b \frac{f(x)}{p(x)} p(x) dx \\ &= \int_a^b f(x) dx. \end{aligned}$$
Thus $F_N$ is an unbiased estimator of the integral, and by the law of large numbers it converges to the true value as $N \to \infty$:
$$\int_D f(x)\, dx = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N \frac{f(X_i)}{p(X_i)},$$
where $D$ denotes the integration domain (here, $[a, b]$).
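As a concrete illustration, here is a minimal Python sketch of the estimator $F_N$. The integrand $f(x) = x^2$ on $[0, 2]$ and the uniform density are assumptions chosen purely for the example.

```python
import random

def mc_integrate(f, p, sample_p, N=100_000):
    """Estimate an integral via F_N = (1/N) * sum of f(X_i) / p(X_i), with X_i ~ p."""
    total = 0.0
    for _ in range(N):
        x = sample_p()           # draw X_i from the density p
        total += f(x) / p(x)     # accumulate f(X_i) / p(X_i)
    return total / N

# Example (assumed for illustration): integrate f(x) = x^2 over [0, 2].
# True value is 8/3 ≈ 2.667; the uniform density on [a, b] is p(x) = 1/(b - a).
a, b = 0.0, 2.0
print(mc_integrate(lambda x: x * x,
                   lambda x: 1.0 / (b - a),
                   lambda: random.uniform(a, b)))
```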
Convergence and Variance Analysis
Let $Y = \frac{f(X)}{p(X)}$, where $X \sim p$. The variance of $F_N$ is derived as:
$$\begin{aligned} \sigma^2(F_N) &= \sigma^2\left[\frac{1}{N} \sum_{i=1}^N Y_i\right] \\ &= \frac{1}{N^2} \sum_{i=1}^N \sigma^2[Y_i] \\ &= \frac{1}{N^2}(N \cdot \sigma^2[Y]) \\ &= \frac{1}{N} \sigma^2[Y]. \end{aligned}$$
(The second equality uses the independence of the $Y_i$.) Hence, the standard deviation is:
$$\sigma(F_N) = \frac{\sigma[Y]}{\sqrt{N}}.$$
- As $N \to \infty$, $\sigma(F_N) \to 0$, so the estimator converges, with error decreasing at a rate of $O(1/\sqrt{N})$.
Impact of variance ($\sigma^2[Y]$):
- Smoothness of $Y$: If $Y$ has large variations, $\sigma^2[Y]$ increases, slowing convergence.
- Choice of $p(x)$: If $p(x) \propto f(x)$ (possible when $f \geq 0$), then $Y = f(X)/p(X)$ is constant, so $\sigma^2[Y] = 0$ and convergence is optimal. A numerical illustration of the $1/\sqrt{N}$ error scaling follows below.
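The $1/\sqrt{N}$ behaviour can be checked numerically with a small sketch (the integrand and sample counts below are illustrative assumptions, reusing the $f(x) = x^2$ example): the spread of repeated estimates should roughly halve each time $N$ is quadrupled.

```python
import random, statistics

def F_N(N, a=0.0, b=2.0):
    # Monte Carlo estimate of the integral of x^2 over [a, b] with uniform p(x) = 1/(b - a)
    total = 0.0
    for _ in range(N):
        x = random.uniform(a, b)
        total += (x * x) * (b - a)   # f(x) / p(x)
    return total / N

for N in (100, 400, 1600, 6400):
    runs = [F_N(N) for _ in range(200)]
    print(N, statistics.stdev(runs))  # shrinks roughly like 1 / sqrt(N)
```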
Monte Carlo Integration for the Rendering Equation
Rendering equation:
$$L_o(x, \omega_o) = L_e(x, \omega_o) + \int_{\Omega} L_i(x, \omega_i) f(x, \omega_i, \omega_o) \cos \theta_i \, \mathrm{d}\omega_i$$
Monte Carlo approximation:
$$L_o(x, \omega_o) \approx L_e(x, \omega_o) + \frac{1}{N} \sum_{j=1}^N \frac{L_i(x, \omega_j) f(x, \omega_j, \omega_o) \cos \theta_j}{\text{pdf}(\omega_j)}$$
Key Insights
- Variance reduction: If $\text{pdf}(\omega_j)$ is chosen to closely match the product $L_i(x, \omega_j) f(x, \omega_j, \omega_o) \cos \theta_j$, the variance of the estimator is greatly reduced.
- Importance sampling: Select $\text{pdf}(\omega_j) \propto L_i(x, \omega_j) f(x, \omega_j, \omega_o) \cos \theta_j$ for optimal convergence and stability; in practice the pdf is often matched to the factors that are known in closed form, such as the cosine term (see the sketch below).
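As one concrete (and deliberately simplified) instance, the sketch below importance-samples only the cosine factor, using the standard cosine-weighted hemisphere pdf $\text{pdf}(\omega) = \cos\theta / \pi$. The callables `incident_radiance` and `brdf` are hypothetical placeholders, not part of any particular renderer.

```python
import math, random

def sample_cosine_hemisphere():
    """Sample a direction with pdf(omega) = cos(theta) / pi (local frame, z = surface normal)."""
    u1, u2 = random.random(), random.random()
    r, phi = math.sqrt(u1), 2.0 * math.pi * u2
    z = math.sqrt(max(0.0, 1.0 - u1))            # cos(theta)
    return (r * math.cos(phi), r * math.sin(phi), z), z / math.pi  # direction, pdf value

def shade(incident_radiance, brdf, N=64):
    """N-sample MC estimate of the reflection integral (emission term omitted).
    incident_radiance(omega) and brdf(omega) are placeholder callables."""
    total = 0.0
    for _ in range(N):
        omega, pdf = sample_cosine_hemisphere()
        cos_theta = omega[2]
        # (L_i * f * cos_theta) / pdf, mirroring the estimator above
        total += incident_radiance(omega) * brdf(omega) * cos_theta / pdf
    return total / N

# Toy usage: constant sky radiance and a Lambertian BRDF (albedo / pi); expected result 0.8.
print(shade(lambda w: 1.0, lambda w: 0.8 / math.pi))
```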
Importance Sampling
- Principle: Importance sampling selects a probability density function $p(x)$ that closely resembles, or is proportional to, the integrand $f(x)$. This reduces variance and improves convergence speed.
- Monte Carlo Approximation with Importance Sampling:
$$I = \int f(x) \, dx \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)} \quad \text{where } x_i \sim p(x)$$
- A good choice of $p(x)$ minimizes variance.
- Optimal case: $p(x) \propto f(x)$, which results in zero variance (a short 1D comparison follows below).
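A short 1D comparison, with illustrative choices $f(x) = 3x^2$ on $[0, 1]$ (true integral 1) and the importance density $p(x) = 2x$ sampled by inverse CDF:

```python
import math, random, statistics

f = lambda x: 3.0 * x * x                      # integrand on [0, 1]; true integral = 1

def estimate_uniform(N):
    # p(x) = 1 on [0, 1]
    return sum(f(random.random()) for _ in range(N)) / N

def estimate_importance(N):
    # p(x) = 2x on [0, 1]; inverse-CDF sampling gives x = sqrt(u)
    total = 0.0
    for _ in range(N):
        x = math.sqrt(1.0 - random.random())   # in (0, 1], avoids division by zero
        total += f(x) / (2.0 * x)
    return total / N

runs_u = [estimate_uniform(1000) for _ in range(200)]
runs_i = [estimate_importance(1000) for _ in range(200)]
print("uniform    std:", statistics.stdev(runs_u))   # larger spread
print("importance std:", statistics.stdev(runs_i))   # smaller, since p roughly follows f
```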
Resampled Importance Sampling (RIS)
- Motivation: When directly sampling from a complex or unnormalized distribution $g(x)$ is difficult, Resampled Importance Sampling (RIS) provides an effective alternative: it generates approximate samples from $g(x)$ by leveraging another, easier-to-sample distribution $p(x)$.
Steps for RIS
- Generate candidate samples:
- Draw $M$ samples $\{x_1, x_2, \dots, x_M\}$ from a simple distribution $p(x)$.
- Compute weights for candidates:
- Assign weights to each sample: $$w_j = \frac{g(x_j)}{p(x_j)}, \quad j \in [1, M]$$
- Resample $N$ samples:
- Resample $\{y_1, y_2, \dots, y_N\}$ from the candidate set $\{x_1, x_2, \dots, x_M\}$ with probabilities proportional to the weights $w_j$.
- The new samples $\{y_i\}$ approximately follow the target distribution $g(x)$.
Monte Carlo Integration with RIS
- Estimator with Resampled Weights: If $g(x)$ is normalized and samples from it are available, the Monte Carlo estimator can be written as:
$$\hat{I}_{MC} = \frac{1}{N} \sum_{i=1}^{N} \frac{f(y_i)}{g(y_i)}$$
- Estimator with Candidate Weights: When $g(x)$ is unnormalized, RIS introduces a weighting factor $\omega^*$ computed from the $M$ candidates:
$$\hat{I}_{RIS} = \frac{1}{N} \sum_{i=1}^{N} \frac{f(y_i)}{g(y_i)} \cdot \omega^*$$
- Weighting factor:
$$\omega^* = \frac{1}{M} \sum_{j=1}^{M} w(x_j)$$
- Final RIS Formula:
$$\hat{I}_{RIS} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{f(y_i)}{g(y_i)} \right) \cdot \frac{1}{M} \sum_{j=1}^{M} w(x_j)$$
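A minimal, self-contained sketch of the full RIS pipeline in 1D. The integrand $f$, the unnormalized target $g$, and the uniform proposal $p$ below are illustrative assumptions; $g$ is deliberately left unnormalized.

```python
import random

f = lambda x: x * x          # integrand on [0, 1]; true integral = 1/3
g = lambda x: x * x + 0.1    # unnormalized target (assumption; positive wherever f != 0)
p = lambda x: 1.0            # easy-to-sample proposal: uniform on [0, 1]

def ris_estimate(M=32, N=4):
    xs = [random.random() for _ in range(M)]         # 1) M candidates from p
    ws = [g(x) / p(x) for x in xs]                   # 2) weights w_j = g(x_j) / p(x_j)
    ys = random.choices(xs, weights=ws, k=N)         # 3) resample N points by weight
    # 4) RIS estimator: (1/N) sum f(y)/g(y)  *  (1/M) sum w_j
    return (sum(f(y) / g(y) for y in ys) / N) * (sum(ws) / M)

# Averaging many independent RIS estimates should approach the true value 1/3.
print(sum(ris_estimate() for _ in range(20000)) / 20000)
```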
Conditions for Unbiasedness
- $g(x) > 0$ and $p(x) > 0$ wherever $f(x) \neq 0$.
- Both $M > 0$ (number of candidates) and $N > 0$ (number of resampled points).
Proof of Unbiasedness for RIS Sampling
To prove that the estimator is unbiased, we need to demonstrate that its expected value equals the true value. The core idea is based on the property:
$$E[XY] = E_X\!\left[\, X \cdot E[\,Y \mid X\,] \,\right]$$
i.e., the law of total expectation applied to a product whose first factor depends only on $X$. Starting with the RIS estimator (recall $w(X_j) = g(X_j)/p(X_j)$), we have:
$$E\left( \hat{L}_{\text{ris}} \right) = E\left[ E\left( \frac{1}{N} \sum_{i=1}^N \frac{f(Y_i)}{g(Y_i)} \;\middle|\; X_1, \dots, X_M \right) \cdot \frac{1}{M} \sum_{j=1}^M w(X_j) \right]$$
Since each $Y_i$ is drawn from the candidate set in the same way, the inner conditional expectation is identical for every $i$, so the average over $i$ simplifies to:
$$E\left( \hat{L}_{\text{ris}} \right) = E\left[ E\left( \frac{f(Y)}{g(Y)} \;\middle|\; X_1, \dots, X_M \right) \cdot \frac{1}{M} \sum_{j=1}^M w(X_j) \right]$$
Because $Y$ is selected from $\{X_1, \dots, X_M\}$ with probability proportional to its weight, the inner expectation expands into a weighted sum over the candidates:
$$E\left( \hat{L}_{\text{ris}} \right) = E\left[ \sum_{k=1}^M \left( \frac{f(X_k)}{g(X_k)} \cdot \frac{w(X_k)}{\sum_{j=1}^M w(X_j)} \right) \cdot \frac{1}{M} \sum_{j=1}^M w(X_j) \right]$$
The normalizing denominator $\sum_{j=1}^M w(X_j)$ cancels with the factor $\frac{1}{M} \sum_{j=1}^M w(X_j)$, leaving:
$$E\left( \hat{L}_{\text{ris}} \right) = E\left[ \frac{1}{M} \sum_{k=1}^M \frac{f(X_k)}{g(X_k)} \cdot w(X_k) \right]$$
Since the candidates $X_k$ are i.i.d. draws from $p$, this is an average of $M$ identical expectations, so it reduces to the expectation over a single sample $X \sim p$:
$$E\left( \hat{L}_{\text{ris}} \right) = E\left( \frac{f(X)}{g(X)} \cdot w(X) \right) = \int \frac{f(x)}{g(x)} \cdot \frac{g(x)}{p(x)} \, p(x) \, dx = \int f(x) \, dx$$
Thus, the RIS estimator is unbiased: its expected value equals the true value of the integral.
Applications in Ray Tracing
- Integrand: The rendering-equation integrand to be evaluated by Monte Carlo,
$$L_i(x, \omega_i) \, f(x, \omega_i, \omega_o) \cos \theta_i,$$
serves as the (unnormalized) target $g(\omega_i)$ for RIS.
- Process:
- Generate $M$ candidate directions $\omega_j$ using a simple distribution $p(\omega_j)$.
- Compute weights $w_j = \frac{g(\omega_j)}{p(\omega_j)}$.
- Resample $N = 1$ direction for each query to compute an unbiased estimate of the integral.
- Outcome: RIS reduces variance and ensures unbiased estimation, even when the target $g$ is not normalized (a sketch of this procedure follows below).
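A compact sketch of this process for a single shading query. The target function passed in stands for $L_i \cdot f \cdot \cos\theta$ and is a hypothetical placeholder; uniform hemisphere sampling with $p(\omega) = 1/(2\pi)$ is an assumed proposal.

```python
import math, random

def sample_uniform_hemisphere():
    """Proposal p(omega) = 1 / (2*pi) over the hemisphere (local frame, z = surface normal)."""
    u1, u2 = random.random(), random.random()
    z = u1                                               # cos(theta), uniform in [0, 1)
    r, phi = math.sqrt(max(0.0, 1.0 - z * z)), 2.0 * math.pi * u2
    return (r * math.cos(phi), r * math.sin(phi), z)

def ris_shading_sample(g, M=16):
    """RIS with N = 1: g(omega) is a placeholder for L_i * f * cos_theta."""
    p = 1.0 / (2.0 * math.pi)
    candidates = [sample_uniform_hemisphere() for _ in range(M)]
    weights = [g(w) / p for w in candidates]
    y = random.choices(candidates, weights=weights, k=1)[0]
    # Here the function being integrated is g itself, so f(y)/g(y) = 1 and the
    # estimate reduces to the average candidate weight; y is the direction to trace.
    return y, sum(weights) / M

# Toy target (assumption): radiance increasing toward the zenith.
direction, estimate = ris_shading_sample(lambda w: max(w[2], 0.0) + 0.05)
print(direction, estimate)
```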
Reservoir Sampling and Weighted Reservoir Sampling
Reservoir Sampling
Reservoir Sampling is a class of algorithms for randomly selecting $N$ elements from a stream of data (or from a set of size $M$ that is too large to store entirely in memory). Typically $N$ is small and $M$ is large or unknown in advance.
Basic Algorithm
Given a sequence of $M$ elements, or a stream whose total size is unknown, we want to sample $N$ elements uniformly.
- Initialization:
- Create an array that can hold $N$ elements.
- Populate this array with the first $N$ elements from the stream.
- Subsequent Elements:
- Starting from element $N+1$, accept each new element with probability $\frac{N}{M_{\text{so far}}}$, where $M_{\text{so far}}$ is the total number of elements seen so far (including the new one).
- If accepted, the new element replaces a slot chosen uniformly at random, so every element currently in the reservoir is equally likely to be replaced.
- Final Result:
- Once all elements have been processed, the array contains the $N$ required samples (a runnable sketch follows below).
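A minimal runnable sketch of this basic (unweighted) algorithm; the stream and reservoir size are illustrative.

```python
import random

def reservoir_sample(stream, N):
    """Uniformly sample N elements from an iterable of unknown length (Algorithm R)."""
    reservoir = []
    for i, x in enumerate(stream, start=1):      # i = number of elements seen so far
        if i <= N:
            reservoir.append(x)                  # fill the reservoir with the first N elements
        elif random.random() < N / i:            # keep the new element with probability N / i
            reservoir[random.randrange(N)] = x   # it replaces a uniformly chosen slot
    return reservoir

print(reservoir_sample(range(1, 101), 5))        # 5 elements drawn uniformly from 1..100
```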
Derivation of the Probability
For each of the first $N$ elements (those initially placed in the reservoir), the probability of being retained is derived as follows:
- Before any new elements are encountered, every element in the reservoir is retained with probability 1.
- At step $N+1$, the probability that a given reservoir element is replaced by the $(N+1)$-th element is:
$$\text{probability of replacement} = \frac{N}{N+1} \times \frac{1}{N} = \frac{1}{N+1}$$
Therefore, the probability of not being replaced is:
$$1 - \frac{N}{N+1} \times \frac{1}{N} = \frac{N}{N+1}$$
- At step $N+2$, the probability of not being replaced is:
$$1 - \frac{N}{N+2} \times \frac{1}{N} = \frac{N+1}{N+2}$$
- Continuing in this way, after all $M$ elements have been processed, the probability that such an element is still in the reservoir is:
$$\text{probability of being retained} = 1 \times \frac{N}{N+1} \times \frac{N+1}{N+2} \times \frac{N+2}{N+3} \times \cdots \times \frac{M-1}{M} = \frac{N}{M}$$
A similar argument applies to an element arriving at position $i > N$: it enters the reservoir with probability $\frac{N}{i}$ and then survives the remaining steps with probability $\frac{i}{i+1} \times \cdots \times \frac{M-1}{M} = \frac{i}{M}$, again giving $\frac{N}{M}$. Hence, every element has an equal probability of $\frac{N}{M}$ of being retained in the final sample.
When the total number of elements is unknown (streaming), the same argument shows that at any point during the process each element seen so far is in the reservoir with probability $\frac{N}{M_{\text{so far}}}$.
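As a quick sanity check with small numbers (chosen only for illustration), take $N = 2$ and $M = 4$. An element placed in the initial reservoir is retained with probability
$$1 \times \frac{2}{3} \times \frac{3}{4} = \frac{1}{2} = \frac{N}{M}.$$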
Weighted Reservoir Sampling
Weighted Reservoir Sampling extends the basic algorithm by allowing each element to have an associated weight $w_i$, which determines its likelihood of being included in the sample.
Pseudocode for Weighted Reservoir Sampling
```python
# Reservoir Sampling Algorithm with Weights (reservoir size 1)
import random

class Reservoir:
    def __init__(self):
        self.y = None        # Output sample
        self.w_sum = 0.0     # Sum of weights seen so far
        self.M = 0           # Number of samples seen so far

    def update(self, x_i, w_i):
        self.w_sum += w_i
        self.M += 1
        # Replace the current sample with probability w_i / w_sum
        if random.random() < w_i / self.w_sum:
            self.y = x_i

def reservoir_sampling(S, weight):
    r = Reservoir()
    for x in S:              # stream over all M elements of S
        r.update(x, weight(x))
    return r
```
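A quick empirical check of the sketch above, reusing the `Reservoir` class (the element weights are illustrative): over many runs, each element should be selected with frequency close to $w_i / w_{\text{total}}$, which is exactly what is derived next.

```python
from collections import Counter

# Elements with weights 1, 2, 3, 4 (illustrative); w_total = 10.
items = [("a", 1.0), ("b", 2.0), ("c", 3.0), ("d", 4.0)]
counts = Counter()
runs = 100_000
for _ in range(runs):
    r = Reservoir()
    for x, w in items:
        r.update(x, w)
    counts[r.y] += 1

# Expected frequencies: a ≈ 0.1, b ≈ 0.2, c ≈ 0.3, d ≈ 0.4
print({k: round(v / runs, 3) for k, v in sorted(counts.items())})
```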
Derivation of the Probability
To understand why the probability that element $x_i$ ends up as the final sample in Weighted Reservoir Sampling is $\frac{w_i}{w_{\text{total}}}$, let's break it down step by step.
Definitions
- $S = \{x_1, x_2, \dots, x_M\}$: A sequence of elements.
- $w_i$: Weight associated with element $x_i$.
- $w_{\text{sofar}}^{(j)} = \sum_{k=1}^{j} w_k$: Total weight of the elements processed up to and including $x_j$.
- $y$: Current reservoir sample.
Key Idea
Each element $x_i$ has a weight $w_i$, which determines the probability it gets selected into the reservoir at the time it is processed. For $x_i$ to remain in the reservoir, it must survive the updates from all subsequent elements $x_{i+1}, x_{i+2}, \dots, x_M$.
Selection Probability for $x_i$
- Initial Selection: When $x_i$ is processed, it is selected as $y$ with probability
$$P(\text{select } x_i \text{ at step } i) = \frac{w_i}{w_{\text{sofar}}^{(i)}}$$
where $w_{\text{sofar}}^{(i)}$ is the sum of the weights of the elements processed up to and including $x_i$.
- Survival Probability: After $x_i$ is selected, it must survive replacement by $x_{i+1}, x_{i+2}, \dots, x_M$. At each subsequent step $j$, $x_i$ is replaced with probability
$$P(\text{replace } x_i \text{ at step } j) = \frac{w_j}{w_{\text{sofar}}^{(j)}}$$
where $w_{\text{sofar}}^{(j)}$ is the cumulative weight after including $x_j$. The probability that $x_i$ survives step $j$ is:
$$P(\text{survive step } j) = 1 - \frac{w_j}{w_{\text{sofar}}^{(j)}} = \frac{w_{\text{sofar}}^{(j)} - w_j}{w_{\text{sofar}}^{(j)}}$$
- Combined Survival Probability: For $x_i$ to remain the final reservoir sample, it must survive all replacements from $x_{i+1}$ to $x_M$. The overall survival probability is:
$$P(\text{survive } x_{i+1}, \dots, x_M) = \prod_{j=i+1}^M \left( \frac{w_{\text{sofar}}^{(j)} - w_j}{w_{\text{sofar}}^{(j)}} \right)$$
- Final Probability for $x_i$: Combining the probability of being selected at step $i$ with the probability of surviving all subsequent steps:
$$P(x_i \text{ is final sample}) = P(\text{select } x_i \text{ at step } i) \cdot P(\text{survive } x_{i+1}, \dots, x_M)$$
Substituting the expressions above:
$$P(x_i \text{ is final sample}) = \frac{w_i}{w_{\text{sofar}}^{(i)}} \cdot \prod_{j=i+1}^M \left( \frac{w_{\text{sofar}}^{(j)} - w_j}{w_{\text{sofar}}^{(j)}} \right)$$
Simplification Using Recursion
The term $\frac{w_{\text{sofar}}^{(j)} - w_j}{w_{\text{sofar}}^{(j)}}$ can be rewritten as:
$$\frac{w_{\text{sofar}}^{(j)} - w_j}{w_{\text{sofar}}^{(j)}} = \frac{w_{\text{sofar}}^{(j-1)}}{w_{\text{sofar}}^{(j)}}$$
This reflects the fact that removing the weight $w_j$ leaves the cumulative weight of the previous step. Substituting this into the product:
$$\prod_{j=i+1}^M \left( \frac{w_{\text{sofar}}^{(j)} - w_j}{w_{\text{sofar}}^{(j)}} \right) = \prod_{j=i+1}^M \left( \frac{w_{\text{sofar}}^{(j-1)}}{w_{\text{sofar}}^{(j)}} \right)$$
This is a telescoping product in which the intermediate terms cancel, leaving:
$$\prod_{j=i+1}^M \left( \frac{w_{\text{sofar}}^{(j-1)}}{w_{\text{sofar}}^{(j)}} \right) = \frac{w_{\text{sofar}}^{(i)}}{w_{\text{sofar}}^{(M)}}$$
Substituting Back
Substituting this result into the original formula:
$$P(x_i \text{ is final sample}) = \frac{w_i}{w_{\text{sofar}}^{(i)}} \cdot \frac{w_{\text{sofar}}^{(i)}}{w_{\text{sofar}}^{(M)}}$$
The $w_{\text{sofar}}^{(i)}$ terms cancel, leaving:
$$P(x_i \text{ is final sample}) = \frac{w_i}{w_{\text{sofar}}^{(M)}}$$
Since $w_{\text{sofar}}^{(M)} = w_{\text{total}}$, we conclude:
$$P(x_i \text{ is final sample}) = \frac{w_i}{w_{\text{total}}}$$
Combining Weighted Reservoir Sampling with RIS
Weighted reservoir sampling allows RIS to process its $M$ candidates as a stream, without storing them all. The following steps illustrate how this combination works (a sketch follows the list):
- For each pixel, generate a stream of candidate samples (candidates from neighboring pixels and from previous frames can be reused as well).
- For each candidate, update the reservoir with the sample, its weight, and the running count of candidates seen so far.
- After all updates, evaluate the unbiased RIS estimator using the reservoir's surviving sample and accumulated weight sum.
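A compact sketch of this streaming combination, in the spirit of the steps above. The 1D integrand $f$, unnormalized target $g$, and uniform proposal are illustrative assumptions carried over from the earlier RIS sketch; the reservoir update mirrors the weighted-reservoir pseudocode.

```python
import random

f = lambda x: x * x          # integrand on [0, 1]; true integral = 1/3
g = lambda x: x * x + 0.1    # unnormalized target (assumption)
p = lambda x: 1.0            # streaming proposal: uniform on [0, 1]

class Reservoir:
    def __init__(self):
        self.y, self.w_sum, self.M = None, 0.0, 0

    def update(self, x, w):
        self.w_sum += w
        self.M += 1
        if random.random() < w / self.w_sum:    # keep x with probability w / w_sum
            self.y = x

def streaming_ris_estimate(M=32):
    r = Reservoir()
    for _ in range(M):                  # stream the candidates; none are stored
        x = random.random()             # candidate drawn from p
        r.update(x, g(x) / p(x))        # weight w = g(x) / p(x)
    # RIS estimator with N = 1: f(y)/g(y) * (1/M) * sum of weights
    return f(r.y) / g(r.y) * (r.w_sum / r.M)

# Averaging many streaming estimates should again approach 1/3.
print(sum(streaming_ris_estimate() for _ in range(20000)) / 20000)
```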