---
title: "stors package"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{stors package}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## Introduction
The aim of the `stors` package is to generate samples from uni- and multimodal distributions using the Rejection Sampling technique. Stors stands out for its ability to leverage the system's capabilities to optimize the sampling process.
Stors maximizes the sampling speed from the proposal distribution by constructing steps with uniform areas that cover the target distribution from which sampling is needed. To address potential efficiency issues in heavy and light-tailed distributions, we switch to Adaptive Rejection Sampling (ARS) for the tails, specifically for the proposals created by the user. Fortunately, since ARS is used only for the tails, it eliminates the requirement for log concavity in most distributions, allowing us to cover a wide range of distributions that have log-concave tails such as normal multimodal distrebutions.
To use the `stors` package, we need to differentiate between two phases. The first phase involves building the proposal for the proposal distribution. The second step is caching this proposal in memory for sampling. This ensures optimum speed performance, as the proposal data is constructed independently from the sampling process. Most R base sampling functions are already covered in the stors package, making sampling straightforward.
## Sampling from Built-in Distrubutions
To sample from any built-in distribution, a proposal for each distribution is pre-optimized using 4091 steps when the package is loaded for the first time. This optimization process constructs the proposal and stores it in the internal R data directory. The proposal is then cached in memory for efficient sampling whenever needed.
You can also visualize the proposal data using the \code{plot()} function to better understand its structure.
Here’s an example of optimizing a proposal for the Normal distribution (note: this process is performed automatically when the package is installed and loaded for the first time):
```{r setup, echo = TRUE}
library(stors)
# Optimize a proposal for the Normal distribution with 4091 steps
proposal <- srnorm_optimize(steps = 4091)
# Visualize the optimized proposal
plot(proposal)
```
With the optimized proposal ready, sampling from a Normal distribution with specific \code{mean} and \code{sd} is straightforward using the \code{srnorm} function.
```{r , echo = TRUE}
# Generating 10 samples from the standard Normal distribution
standard_normal_samples <- srnorm(10, mean = 2, sd = 1)
print(standard_normal_samples)
```
On the other hand, users can optimize a proposal for a specific Normal distribution by calling \code{srnorm_optimize()} and passing the desired \code{mean} and \code{sd} arguments. In this case, a unique proposal will be optimized and cached in memory. To sample from this custom proposal, the user can use the \code{srnorm_custom()} function by simply specifying the sample size.
```{r , echo = TRUE}
# Optimize a custom proposal for N(3, 2)
proposal <- srnorm_optimize(mean = 3, sd = 2, steps = 2000)
# Generate a sample of size 10^3 from N(3, 2)
sample <- srnorm_custom(10^3)
# Visualize the generated sample
hist(sample, main = "Histogram of Samples from N(3, 2)", xlab = "Value", col = "skyblue", border = "white")
```
To sample from a truncated Normal distribution, users need to optimize either the scalable proposal or a custom proposal and specify the truncation bounds using the \code{xl} (lower bound) and \code{xr} (upper bound) arguments.
```{r , echo = TRUE}
# Define the truncation bounds
lower_bound <- -1
upper_bound <- 1
# Optimize the scalable proposal with truncation bounds
proposal <- srnorm_optimize(xl = lower_bound, xr = upper_bound, steps = 4091)
# Generate samples from the truncated standard Normal distribution
sample <- srnorm(10^3)
hist(sample, main = "Truncated Standard Normal (-1, 1)", xlab = "Value", col = "lightblue", border = "white")
# Generate samples from the truncated Normal distribution N(2, 1)
sample <- srnorm(10^3, mean = 2)
hist(sample, main = "Truncated Normal N(2, 1) (-1, 1)", xlab = "Value", col = "lightgreen", border = "white")
```
The same process can be applied to a custom proposal. A custom proposal can be truncated and used for sampling by following this example:
```{r , echo = TRUE}
# Define the truncation bounds
lower_bound <- 2
upper_bound <- 6
# Optimize a custom proposal with truncation bounds and mean = 4
proposal <- srnorm_optimize(mean = 4, xl = lower_bound, xr = upper_bound, steps = 4091)
# Generate samples from the truncated Normal distribution
sample <- srnorm_custom(10^3)
hist(sample, main = "Truncated Custom Normal (2, 6)", xlab = "Value", col = "lightblue", border = "white")
```
The \code{proposal_optimizer} functions not only optimize the proposal but also return a detailed list describing the properties of the optimized proposal. This list provides insights into the proposal's structure and characteristics. You can use the base \code{print()} function to view these details or the \code{plot()} function to visualize the proposal.
```{r , echo = TRUE}
# Optimize a proposal for the Laplace distribution
proposal <- srlaplace_optimize(steps = 4091)
# View the details of the optimized proposal
print(proposal)
# Visualize the proposal
plot(proposal)
```
In cases where the proposal is dense and difficult to visualize in its entirety, you can focus on a specific region by using the \code{plot()} function with the \code{x_min} and \code{x_max} parameters. These parameters allow you to set limits on the x-axis, making it easier to zoom in on areas of interest within the proposal.
```{r , echo = TRUE}
# Optimize a proposal for the Normal distribution
proposal <- srlaplace_optimize(steps = 1020)
# Visualize the entire proposal
plot(proposal, main = "Full Proposal Visualization")
# Focus on a specific region of the proposal (e.g., x-axis range -2 to 2)
plot(proposal, x_min = -1, x_max = 1, main = "Zoomed-In Proposal (-1, 1)")
```
These functionalists of the proposal_optimizers and the subsequent visualization tools provide users with in-depth insights into the characteristics of the generated proposal, enabling them to understand and fine-tune the sampling process.
If the user is not satisfied with the automatically optimized proposal, the proposal_optimizers functions offers several arguments for customization. One such argument is `target_sample_size`, which defaults to 1000. If the user wishes to optimize the proposal for significantly larger or smaller sample sizes, this argument can be altered.
Another important argument is the pre-acceptance threshold `theta`, which essentially is the ratio between the lower and upper linear bounds for each step. This is typically set to a very small number close to zero. The reason is that the cost of rejecting samples, even in heavy-tailed distributions, is usually less than the cost of using alternative exact sampling methods (like Inverse Transform) for dense proposals, as the probability of hitting these inefficient steps is low. However, for very large sample sizes (e.g., 10^9), these inefficient steps might be hit more frequently, leading to an increase in sampling time due to lower efficiency. In such cases, the user can set a higher pre-acceptance threshold using the `theta` parameter.
For users who are less familiar with the target distribution, or want to focus on a specific range of the distribution, the `proposal_range` argument can be used. This parameter limits the step construction to a specified range on the x-axis, offering more control over the proposal structure.
```{r , echo = TRUE}
# Customizing the proposal for a specific sample size
custom_proposal <- srnorm_optimize(target_sample_size = 10000)
print(custom_proposal)
plot(custom_proposal)
# Customizing the proposal with a specific pre-acceptance threshold
custom_proposal_high_theta <- srnorm_optimize(theta = 0.9)
print(custom_proposal_high_theta)
plot(custom_proposal_high_theta)
# Customizing the proposal within a specific range
custom_proposal_range <- srnorm_optimize(proposal_range = c(-1, 1))
print(custom_proposal_range)
plot(custom_proposal_range)
```
In addition to the `target_sample_size`, `theta`, and `proposal_range` parameters, users have the flexibility to directly control the number of steps in the proposal. This can be particularly useful when users have specific requirements or preferences for the granularity of the proposal. To specify the number of steps, the `steps` argument can be passed to the proposal_optimizer function.
By setting the `steps` parameter, users can determine the resolution of the proposal, which can impact the balance between computational efficiency and the granularity of the sampling. A higher number of steps typically provides a more detailed proposal, potentially leading to more precise sampling, but at the cost of increased memory usage and possibly longer computation times.
```{r , echo = TRUE}
# Customizing the proposal with a specific number of steps
custom_proposal_steps <- srnorm_optimize(steps = 50)
plot(custom_proposal_steps)
custom_proposal_steps <- srnorm_optimize(steps = 500)
plot(custom_proposal_steps)
```
The user can also create a symmetric proposal for built-in distributions by setting the \code{symmetric} argument to the value around which the target density is symmetric.
Example: Creating a Symmetric Proposal for the Standard Normal Distribution
In this example, we create a symmetric proposal for the standard Normal distribution. Before doing so, we first delete all previously constructed non-symmetric proposals (both scaled and custom) using the \code{delete_build_in_proposal()} function.
```{r, echo = FALSE}
# Delete all non-symmetric proposals for the Normal distribution
delete_build_in_proposal(sampling_function = "srnorm", proposal_type = "scaled")
delete_build_in_proposal(sampling_function = "srnorm", proposal_type = "custom")
# Create a symmetric proposal for the standard Normal distribution
proposal <- srnorm_optimize(symmetric = TRUE, steps = 2040)
# Visualize the symmetric proposal
plot(proposal)
```
Sampling from Truncated Distributions
One of the standout features of STORS is its ability to sample efficiently from truncated distributions. To sample from a truncated distribution, the user simply needs to specify the upper and lower truncation bounds using the \code{xl} (lower bound) and \code{xr} (upper bound) arguments in the \code{proposal_optimizer} function.
Example: Sampling from a Truncated Standard Normal Distribution
In this example, we demonstrate how to sample from a truncated standard Normal distribution with truncation bounds at \code{-1} and \code{1}.
```{r, echo = FALSE}
# Delete symmetric proposals for the Normal distribution
delete_build_in_proposal(sampling_function = "srnorm", proposal_type = "scaled")
# Optimize the proposal for a truncated standard Normal distribution
proposal <- srnorm_optimize(mean = 0, sd = 1, xl = -1, xr = 1, steps = 4020)
# Visualize the truncated proposal
plot(proposal, main = "Truncated Standard Normal Proposal (-1, 1)", xlab = "Value", col = "lightblue")
# Generate 10^4 samples from the truncated distribution
sample <- srnorm_custom(10^4)
# Visualize the generated samples
hist(sample, main = "Histogram of Samples from Truncated Standard Normal", xlab = "Value", col = "lightgreen", border = "white")
```
There are many built-in functions in stors, the list below shows all available functions that can be used directly by the user after optimizing the proposals:
```{r, echo = FALSE}
funcs <- character()
for (name in names(stors:::built_in_proposals)) {
funcs <- message(funcs, " ", name)
}
```
For each built-in distribution, there is a corresponding proposal distribution optimizer that can be invoked using the format \code{"sampling_function_name"_optimize()}. This function optimizes the proposal for the specified distribution, enabling efficient sampling.
## Sampling from Users' Distributions
For distributions not covered by the built-in functions and that satisfy the log concavity condition, users have the flexibility to create their own proposals using the `build_proposal()` function. This function is designed to handle a variety of distributions by accepting a range of parameters tailored to the specific needs of the distribution being sampled from.
The following example demonstrates the process of sampling from a bimodal normal distribution. Let's assume that the user wants to sample from a bimodal distribution, which is a combination of two normal distributions $f(x) = 0.5 w_1(x) + 0.5 w_2(x)$, where $w_1(x) \sim N(0, 1)$ and $w_2(x) \sim N(4, 1)$:
$$
f(x) = 0.5 \frac{1}{\sqrt{2 \pi}} \exp\left(-\frac{x^2}{2}\right) + 0.5 \frac{1}{\sqrt{2 \pi}} \exp\left(-\frac{(x - 4)^2}{2}\right)
$$
To sample from the tails of $f(x)$ using ARS, we need the log transformation of $f(x)$, $h(x) = log(f(x))$, and its first derivative $h_\text{prime}(x) = \frac{d}{dx} h(x)$. Also, the modes of $f(x)$ are required to build the steps part of the proposal distribution, which in this case are 0.00134865 and 3.99865.
```{r , echo = TRUE}
modes_bi <- c(0.00134865, 3.99865)
f_bi <- function(x) {
0.5 * (sqrt(2 * pi))^(-1) * exp(-(x^2) / 2) + 0.5 * (sqrt(2 * pi))^(-1) * exp(-((x - 4)^2) / 2)
}
h_bi <- function(x) log(f_bi(x))
h_prime_bi <- function(x) {
(-(exp(-1 / 2 * (-4 + x)^2) * 0.5 * (-4 + x)) / sqrt(2 * pi) - (exp(-x^2 / 2) * 0.5 * x) / sqrt(2 * pi)) / ((exp(-x^2 / 2) * 0.5) / sqrt(2 * pi) + (exp(-1 / 2 * (-4 + x)^2) * 0.5) / sqrt(2 * pi))
}
bi_proposal <- build_proposal(lower = -Inf, upper = Inf, modes = modes_bi, f = f_bi, h = h_bi, h_prime = h_prime_bi)
plot(bi_proposal)
```
After building the desired proposal with `build_proposal()`, users can explore its properties using the `print()` function and visualize its structure using the `plot()` function. This step is similar to what was done previously with the built-in proposal and helps in understanding the proposal's configuration and ensuring that it aligns with the requirements of the target distribution.
```{r , echo = TRUE}
# Printing the properties of the custom proposal
print(bi_proposal)
# Visualizing the custom proposal
plot(bi_proposal)
```
Once the proposal is configured and validated, it can be passed to the `build_sampler()` function. `build_sampler()` caches the proposal in memory, providing fast access for sampling, and returns a sampler function for the target distribution. This sampler function can then be used to generate samples as needed.
```{r , echo = TRUE}
# Creating a sampler function using the custom proposal
bi_sampler <- build_sampler(bi_proposal)
plot(bi_proposal)
# Generating samples from the target distribution
sample <- bi_sampler(10^3)
hist(sample, main = "Sample Generated From Multi-modal Distribution ", xlab = "Value", col = "lightblue", border = "white")
```
Sampling from a Truncated User-Defined Distribution
To sample from a truncated user-defined distribution, simply provide the lower and upper truncation bounds using the \code{lower} (lower bound) and \code{upper} (upper bound) arguments. Below, we demonstrate this process using a previously created multi-modal distribution.
Example: Sampling from a Truncated Multi-Modal Distribution
```{r , echo = TRUE}
# Define modes for the multi-modal distribution
modes_bi <- c(0.00134865, 3.99865)
# Build a truncated proposal for the multi-modal distribution
bi_trunc_proposal <- build_proposal(
lower = -1,
upper = 6,
modes = modes_bi,
f = f_bi,
h = h_bi,
h_prime = h_prime_bi,
steps = 2040
)
# Visualize the truncated proposal
plot(bi_trunc_proposal, main = "Truncated Multi-Modal Proposal (-1, 6)", xlab = "Value", col = "skyblue")
# Create a sampler for the truncated distribution
bi_sampler_trunc_sampler <- build_sampler(bi_trunc_proposal)
# Generate 10^3 samples from the truncated distribution
sample <- bi_sampler_trunc_sampler(10^3)
# Generate 10^3 samples from the truncated distribution
sample <- bi_sampler_trunc_sampler(10^3)
# Visualize the generated samples
hist(
sample,
main = "Histogram of Samples from Truncated Multi-Modal Distribution",
xlab = "Value",
col = "lightgreen",
border = "white"
)
```
## Saving, Loading and Deleting Users' Proposals
The proposal stored in `bi_proposal` will not be preserved across sessions. Therefore, if the user is satisfied with the sampling results and wishes to store the proposal for future use, they should save the proposal using the `save_proposal()` function. This function takes the optimized proposal and a filename chosen by the user to save the proposal in the internal R data directory on their machine. This is recommended if the user plans to use the proposal frequently across sessions, to avoid re-optimizing the proposal in future sessions.
```{r , echo = TRUE}
save_proposal(bi_proposal, "bimodal_proposal")
```
```{r , include = FALSE}
for (i in 1:3) {
save_proposal(bi_proposal, paste0("bimodal_proposal_", i))
}
```
To load and use the proposal in future sessions, the user only needs to use the `load_proposal()` function.
```{r , echo = TRUE}
bi_proposal <- load_proposal("bimodal_proposal")
```
In case the user wants to view a list of all previously stored proposals, the `print_proposals()` function will provide the user with all stored proposals.
```{r , echo = TRUE}
print_proposals()
```
Be aware that the saved proposal can be quite large in size, depending on the user's system. Therefore, if the user wants to delete a proposal, the `delete_proposal()` function can be used by providing the proposal's name.
```{r , echo = TRUE}
delete_proposal("bimodal_proposal")
print_proposals()
```
```{r , include = FALSE}
srnorm_optimize() # optimize the scaled non-symmetric proposal, since we have deleted it earlier
for (i in 1:3) {
delete_proposal(paste0("bimodal_proposal_", i))
}
for (name in names(stors:::built_in_proposals)) {
fun_name <- paste0(name, "_optimize")
do.call(fun_name, list(steps = 4091))
}
```