The aim of the stors
package is to generate samples from
uni- and multimodal distributions using the Rejection Sampling
technique. Stors stands out for its ability to leverage the system’s
capabilities to optimize the sampling process.
Stors maximizes the sampling speed from the proposal distribution by constructing steps with uniform areas that cover the target distribution from which sampling is needed. To address potential efficiency issues in heavy and light-tailed distributions, we switch to Adaptive Rejection Sampling (ARS) for the tails, specifically for the proposals created by the user. Fortunately, since ARS is used only for the tails, it eliminates the requirement for log concavity in most distributions, allowing us to cover a wide range of distributions that have log-concave tails such as normal multimodal distrebutions.
To use the stors
package, we need to differentiate
between two phases. The first phase involves building the proposal for
the proposal distribution. The second step is caching this proposal in
memory for sampling. This ensures optimum speed performance, as the
proposal data is constructed independently from the sampling process.
Most R base sampling functions are already covered in the stors package,
making sampling straightforward.
To sample from any built-in distribution, a proposal for each distribution is pre-optimized using 4091 steps when the package is loaded for the first time. This optimization process constructs the proposal and stores it in the internal R data directory. The proposal is then cached in memory for efficient sampling whenever needed.
You can also visualize the proposal data using the function to better understand its structure.
Here’s an example of optimizing a proposal for the Normal distribution (note: this process is performed automatically when the package is installed and loaded for the first time):
library(stors)
# Optimize a proposal for the Normal distribution with 4091 steps
proposal <- srnorm_optimize(steps = 4091)
# Visualize the optimized proposal
plot(proposal)
With the optimized proposal ready, sampling from a Normal distribution with specific and is straightforward using the function.
# Generating 10 samples from the standard Normal distribution
standard_normal_samples <- srnorm(10, mean = 2, sd = 1)
print(standard_normal_samples)
#> [1] 2.5074614 2.2062952 1.2156599 1.0447653 4.0094671 1.4974208 2.3262260
#> [8] 1.9863643 1.5214892 0.8225843
On the other hand, users can optimize a proposal for a specific Normal distribution by calling and passing the desired and arguments. In this case, a unique proposal will be optimized and cached in memory. To sample from this custom proposal, the user can use the function by simply specifying the sample size.
# Optimize a custom proposal for N(3, 2)
proposal <- srnorm_optimize(mean = 3, sd = 2, steps = 2000)
# Generate a sample of size 10^3 from N(3, 2)
sample <- srnorm_custom(10^3)
# Visualize the generated sample
hist(sample, main = "Histogram of Samples from N(3, 2)", xlab = "Value", col = "skyblue", border = "white")
To sample from a truncated Normal distribution, users need to optimize either the scalable proposal or a custom proposal and specify the truncation bounds using the (lower bound) and (upper bound) arguments.
# Define the truncation bounds
lower_bound <- -1
upper_bound <- 1
# Optimize the scalable proposal with truncation bounds
proposal <- srnorm_optimize(xl = lower_bound, xr = upper_bound, steps = 4091)
# Generate samples from the truncated standard Normal distribution
sample <- srnorm(10^3)
hist(sample, main = "Truncated Standard Normal (-1, 1)", xlab = "Value", col = "lightblue", border = "white")
# Generate samples from the truncated Normal distribution N(2, 1)
sample <- srnorm(10^3, mean = 2)
hist(sample, main = "Truncated Normal N(2, 1) (-1, 1)", xlab = "Value", col = "lightgreen", border = "white")
The same process can be applied to a custom proposal. A custom proposal can be truncated and used for sampling by following this example:
# Define the truncation bounds
lower_bound <- 2
upper_bound <- 6
# Optimize a custom proposal with truncation bounds and mean = 4
proposal <- srnorm_optimize(mean = 4, xl = lower_bound, xr = upper_bound, steps = 4091)
# Generate samples from the truncated Normal distribution
sample <- srnorm_custom(10^3)
hist(sample, main = "Truncated Custom Normal (2, 6)", xlab = "Value", col = "lightblue", border = "white")
The functions not only optimize the proposal but also return a detailed list describing the properties of the optimized proposal. This list provides insights into the proposal’s structure and characteristics. You can use the base function to view these details or the function to visualize the proposal.
# Optimize a proposal for the Laplace distribution
proposal <- srlaplace_optimize(steps = 4091)
# View the details of the optimized proposal
print(proposal)
#>
#> =========================================
#> Proposal Summary
#> -----------------------------------------
#> Total steps: 4,091
#> Steps range: [-6.611093, 6.335299]
#> Sampling efficiency: 99.84%
#> =========================================
# Visualize the proposal
plot(proposal)
In cases where the proposal is dense and difficult to visualize in its entirety, you can focus on a specific region by using the function with the and parameters. These parameters allow you to set limits on the x-axis, making it easier to zoom in on areas of interest within the proposal.
# Optimize a proposal for the Normal distribution
proposal <- srlaplace_optimize(steps = 1020)
# Visualize the entire proposal
plot(proposal, main = "Full Proposal Visualization")
# Focus on a specific region of the proposal (e.g., x-axis range -2 to 2)
plot(proposal, x_min = -1, x_max = 1, main = "Zoomed-In Proposal (-1, 1)")
These functionalists of the proposal_optimizers and the subsequent visualization tools provide users with in-depth insights into the characteristics of the generated proposal, enabling them to understand and fine-tune the sampling process.
If the user is not satisfied with the automatically optimized
proposal, the proposal_optimizers functions offers several arguments for
customization. One such argument is target_sample_size
,
which defaults to 1000. If the user wishes to optimize the proposal for
significantly larger or smaller sample sizes, this argument can be
altered.
Another important argument is the pre-acceptance threshold
theta
, which essentially is the ratio between the lower and
upper linear bounds for each step. This is typically set to a very small
number close to zero. The reason is that the cost of rejecting samples,
even in heavy-tailed distributions, is usually less than the cost of
using alternative exact sampling methods (like Inverse Transform) for
dense proposals, as the probability of hitting these inefficient steps
is low. However, for very large sample sizes (e.g., 10^9), these
inefficient steps might be hit more frequently, leading to an increase
in sampling time due to lower efficiency. In such cases, the user can
set a higher pre-acceptance threshold using the theta
parameter.
For users who are less familiar with the target distribution, or want
to focus on a specific range of the distribution, the
proposal_range
argument can be used. This parameter limits
the step construction to a specified range on the x-axis, offering more
control over the proposal structure.
# Customizing the proposal for a specific sample size
custom_proposal <- srnorm_optimize(target_sample_size = 10000)
print(custom_proposal)
#>
#> =========================================
#> Proposal Summary
#> -----------------------------------------
#> Total steps: 512
#> Steps range: [-3.502872, 3.502872]
#> Sampling efficiency: 98.98%
#> =========================================
plot(custom_proposal)
# Customizing the proposal with a specific pre-acceptance threshold
custom_proposal_high_theta <- srnorm_optimize(theta = 0.9)
print(custom_proposal_high_theta)
#>
#> =========================================
#> Proposal Summary
#> -----------------------------------------
#> Total steps: 1,934
#> Steps range: [-2.659710, 2.659710]
#> Sampling efficiency: 99.73%
#> =========================================
plot(custom_proposal_high_theta)
# Customizing the proposal within a specific range
custom_proposal_range <- srnorm_optimize(proposal_range = c(-1, 1))
print(custom_proposal_range)
#>
#> =========================================
#> Proposal Summary
#> -----------------------------------------
#> Total steps: 90
#> Steps range: [-1.017693, 1.017693]
#> Sampling efficiency: 86.04%
#> =========================================
plot(custom_proposal_range)
In addition to the target_sample_size
,
theta
, and proposal_range
parameters, users
have the flexibility to directly control the number of steps in the
proposal. This can be particularly useful when users have specific
requirements or preferences for the granularity of the proposal. To
specify the number of steps, the steps
argument can be
passed to the proposal_optimizer function.
By setting the steps
parameter, users can determine the
resolution of the proposal, which can impact the balance between
computational efficiency and the granularity of the sampling. A higher
number of steps typically provides a more detailed proposal, potentially
leading to more precise sampling, but at the cost of increased memory
usage and possibly longer computation times.
# Customizing the proposal with a specific number of steps
custom_proposal_steps <- srnorm_optimize(steps = 50)
plot(custom_proposal_steps)
The user can also create a symmetric proposal for built-in distributions by setting the argument to the value around which the target density is symmetric.
Example: Creating a Symmetric Proposal for the Standard Normal Distribution
In this example, we create a symmetric proposal for the standard Normal distribution. Before doing so, we first delete all previously constructed non-symmetric proposals (both scaled and custom) using the function.
#> proposal number 1 DELETED !
#> proposal number 2 DELETED !
#> `geom_line()`: Each group consists of only one observation.
#> ℹ Do you need to adjust the group aesthetic?
Sampling from Truncated Distributions One of the standout features of STORS is its ability to sample efficiently from truncated distributions. To sample from a truncated distribution, the user simply needs to specify the upper and lower truncation bounds using the (lower bound) and (upper bound) arguments in the function.
Example: Sampling from a Truncated Standard Normal Distribution In this example, we demonstrate how to sample from a truncated standard Normal distribution with truncation bounds at and .
#> proposal number 1 DELETED !
#> `geom_line()`: Each group consists of only one observation.
#> ℹ Do you need to adjust the group aesthetic?
#> `geom_line()`: Each group consists of only one observation.
#> ℹ Do you need to adjust the group aesthetic?
There are many built-in functions in stors, the list below shows all available functions that can be used directly by the user after optimizing the proposals:
#> srnorm
#> srlaplace
#> srexp
#> srchisq
#> srgamma
#> srbeta
#> srpareto
For each built-in distribution, there is a corresponding proposal distribution optimizer that can be invoked using the format . This function optimizes the proposal for the specified distribution, enabling efficient sampling.
For distributions not covered by the built-in functions and that
satisfy the log concavity condition, users have the flexibility to
create their own proposals using the build_proposal()
function. This function is designed to handle a variety of distributions
by accepting a range of parameters tailored to the specific needs of the
distribution being sampled from.
The following example demonstrates the process of sampling from a bimodal normal distribution. Let’s assume that the user wants to sample from a bimodal distribution, which is a combination of two normal distributions f(x) = 0.5w1(x) + 0.5w2(x), where w1(x) ∼ N(0, 1) and w2(x) ∼ N(4, 1):
$$ f(x) = 0.5 \frac{1}{\sqrt{2 \pi}} \exp\left(-\frac{x^2}{2}\right) + 0.5 \frac{1}{\sqrt{2 \pi}} \exp\left(-\frac{(x - 4)^2}{2}\right) $$
To sample from the tails of f(x) using ARS, we need the log transformation of f(x), h(x) = log(f(x)), and its first derivative $h_\text{prime}(x) = \frac{d}{dx} h(x)$. Also, the modes of f(x) are required to build the steps part of the proposal distribution, which in this case are 0.00134865 and 3.99865.
modes_bi <- c(0.00134865, 3.99865)
f_bi <- function(x) {
0.5 * (sqrt(2 * pi))^(-1) * exp(-(x^2) / 2) + 0.5 * (sqrt(2 * pi))^(-1) * exp(-((x - 4)^2) / 2)
}
h_bi <- function(x) log(f_bi(x))
h_prime_bi <- function(x) {
(-(exp(-1 / 2 * (-4 + x)^2) * 0.5 * (-4 + x)) / sqrt(2 * pi) - (exp(-x^2 / 2) * 0.5 * x) / sqrt(2 * pi)) / ((exp(-x^2 / 2) * 0.5) / sqrt(2 * pi) + (exp(-1 / 2 * (-4 + x)^2) * 0.5) / sqrt(2 * pi))
}
bi_proposal <- build_proposal(lower = -Inf, upper = Inf, modes = modes_bi, f = f_bi, h = h_bi, h_prime = h_prime_bi)
plot(bi_proposal)
After building the desired proposal with
build_proposal()
, users can explore its properties using
the print()
function and visualize its structure using the
plot()
function. This step is similar to what was done
previously with the built-in proposal and helps in understanding the
proposal’s configuration and ensuring that it aligns with the
requirements of the target distribution.
# Printing the properties of the custom proposal
print(bi_proposal)
#>
#> =========================================
#> Proposal Summary
#> -----------------------------------------
#> Total steps: 4,315
#> Steps range: [-3.888020, 7.887194]
#> Sampling efficiency: 99.79%
#> =========================================
# Visualizing the custom proposal
plot(bi_proposal)
Once the proposal is configured and validated, it can be passed to
the build_sampler()
function. build_sampler()
caches the proposal in memory, providing fast access for sampling, and
returns a sampler function for the target distribution. This sampler
function can then be used to generate samples as needed.
# Creating a sampler function using the custom proposal
bi_sampler <- build_sampler(bi_proposal)
plot(bi_proposal)
# Generating samples from the target distribution
sample <- bi_sampler(10^3)
hist(sample, main = "Sample Generated From Multi-modal Distribution ", xlab = "Value", col = "lightblue", border = "white")
Sampling from a Truncated User-Defined Distribution To sample from a truncated user-defined distribution, simply provide the lower and upper truncation bounds using the (lower bound) and (upper bound) arguments. Below, we demonstrate this process using a previously created multi-modal distribution.
Example: Sampling from a Truncated Multi-Modal Distribution
# Define modes for the multi-modal distribution
modes_bi <- c(0.00134865, 3.99865)
# Build a truncated proposal for the multi-modal distribution
bi_trunc_proposal <- build_proposal(
lower = -1,
upper = 6,
modes = modes_bi,
f = f_bi,
h = h_bi,
h_prime = h_prime_bi,
steps = 2040
)
# Visualize the truncated proposal
plot(bi_trunc_proposal, main = "Truncated Multi-Modal Proposal (-1, 6)", xlab = "Value", col = "skyblue")
#> `geom_line()`: Each group consists of only one observation.
#> ℹ Do you need to adjust the group aesthetic?
# Create a sampler for the truncated distribution
bi_sampler_trunc_sampler <- build_sampler(bi_trunc_proposal)
#> Detected cached proposal for this distribution already ... replacing with new proposal.
# Generate 10^3 samples from the truncated distribution
sample <- bi_sampler_trunc_sampler(10^3)
# Generate 10^3 samples from the truncated distribution
sample <- bi_sampler_trunc_sampler(10^3)
# Visualize the generated samples
hist(
sample,
main = "Histogram of Samples from Truncated Multi-Modal Distribution",
xlab = "Value",
col = "lightgreen",
border = "white"
)
The proposal stored in bi_proposal
will not be preserved
across sessions. Therefore, if the user is satisfied with the sampling
results and wishes to store the proposal for future use, they should
save the proposal using the save_proposal()
function. This
function takes the optimized proposal and a filename chosen by the user
to save the proposal in the internal R data directory on their machine.
This is recommended if the user plans to use the proposal frequently
across sessions, to avoid re-optimizing the proposal in future
sessions.
To load and use the proposal in future sessions, the user only needs
to use the load_proposal()
function.
In case the user wants to view a list of all previously stored
proposals, the print_proposals()
function will provide the
user with all stored proposals.
print_proposals()
#>
#> =====================================
#> Proposals Data:
#> Name | Size (KB) | Date
#> ---------------------------------------------
#> bimodal_proposal | 132.16 | 2025-02-21 18:26:40
#> bimodal_proposal_1 | 132.16 | 2025-02-21 18:26:40
#> bimodal_proposal_2 | 132.16 | 2025-02-21 18:26:40
#> bimodal_proposal_3 | 132.16 | 2025-02-21 18:26:40
#> ---------------------------------------------
#> Total Size: | 528.63
#> =====================================
Be aware that the saved proposal can be quite large in size,
depending on the user’s system. Therefore, if the user wants to delete a
proposal, the delete_proposal()
function can be used by
providing the proposal’s name.
delete_proposal("bimodal_proposal")
#> bimodal_proposal.rdsproposal deleted successfully
print_proposals()
#>
#> =====================================
#> Proposals Data:
#> Name | Size (KB) | Date
#> ---------------------------------------------
#> bimodal_proposal_1 | 132.16 | 2025-02-21 18:26:40
#> bimodal_proposal_2 | 132.16 | 2025-02-21 18:26:40
#> bimodal_proposal_3 | 132.16 | 2025-02-21 18:26:40
#> ---------------------------------------------
#> Total Size: | 396.47
#> =====================================