Optimising the rsims package for Fast Backtesting in R – Part II

This version of cash_backtest takes a long data frame of prices and weights by date, which is a very convenient format for data analysis. We can make such a data frame of randomly generated prices and weights:

# function for generating prices from GBM process
gbm_sim <- function(nsim = 100, t = 25, mu = 0, sigma = 0.1, S0 = 100, dt = 1./365) {
  # matrix of random draws - one for each day for each simulation
  epsilon <- matrix(rnorm(t*nsim), ncol = nsim, nrow = t)
  # get GBM paths
  gbm <- exp((mu - sigma * sigma / 2) * dt + sigma * epsilon * sqrt(dt)) - 1
  # convert to price paths
  gbm <- apply(rbind(rep(S0, nsim), gbm + 1), 2, cumprod)
  # return a (t + 1) x nsim matrix of prices; the first row is S0
  return(gbm)
}
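# load the packages that provide n_distinct(), mutate(), pivot_longer() and %>%
library(dplyr)
library(tidyr)

# generate prices and weights
years <- 20
universe <- 100
x <- 1
tickers <- vector() 
# build one random five-letter ticker per asset in the universe
repeat {
  tickers[[x]] <- paste0(sample(LETTERS, 5, replace = TRUE), collapse = "")
  x <- x + 1
  if(x == universe + 1) 
    break
}
# random tickers can collide, so fail early if they are not all unique
stopifnot(n_distinct(tickers) == universe)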
date <- seq(as.numeric(as.Date("1980-01-01")), as.numeric(as.Date("1980-01-01"))+(years*365))
prices <- cbind(date, gbm_sim(nsim = universe, t = years*365, mu = 0.1, sigma = 0.1))
colnames(prices) <- c("date", tickers)
weights <- cbind(date, rbind(rep(0, universe), matrix(rnorm(years*365*universe), nrow = years*365)))
colnames(weights) <- c("date", tickers)
# reshape the wide price and weight matrices to long format and join them
backtest_df_long <- prices %>% 
  as.data.frame() %>% 
  mutate(date = as.Date(date, origin ="1970-01-01")) %>% 
  pivot_longer(-date, names_to = "ticker", values_to = "price") %>% 
  left_join(
    weights %>% 
      as.data.frame() %>% 
      mutate(date = as.Date(date, origin ="1970-01-01")) %>% 
      pivot_longer(-date, names_to = "ticker", values_to = "theo_weight"),
    by = c("date", "ticker")
  )
head(backtest_df_long)
#> # A tibble: 6 x 4
#>   date       ticker price theo_weight
#>   <date>     <chr>  <dbl>       <dbl>
#> 1 1980-01-01 TVEZI    100           0
#> 2 1980-01-01 XIERO    100           0
#> 3 1980-01-01 XGYMU    100           0
#> 4 1980-01-01 PMVPF    100           0
#> 5 1980-01-01 KNCIP    100           0
#> 6 1980-01-01 JEBOY    100           0
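
As a quick check on what the price simulator produces, here is a minimal sketch that calls gbm_sim on a small example. The function is repeated so the snippet runs on its own; the seed and the small nsim and t values are arbitrary choices for illustration. It shows that the result has t + 1 rows (the starting price is prepended) and that with sigma = 0 each path reduces to deterministic growth at rate mu.

```r
# Copy of gbm_sim from the listing above, repeated so this sketch is self-contained
gbm_sim <- function(nsim = 100, t = 25, mu = 0, sigma = 0.1, S0 = 100, dt = 1./365) {
  epsilon <- matrix(rnorm(t * nsim), ncol = nsim, nrow = t)
  gbm <- exp((mu - sigma * sigma / 2) * dt + sigma * epsilon * sqrt(dt)) - 1
  gbm <- apply(rbind(rep(S0, nsim), gbm + 1), 2, cumprod)
  return(gbm)
}

set.seed(42)  # arbitrary seed, only to make the draw repeatable
paths <- gbm_sim(nsim = 3, t = 10)
dim(paths)    # t + 1 rows (initial price plus 10 steps), one column per simulation
paths[1, ]    # every path starts at S0

# With sigma = 0 the random term vanishes, so prices follow S0 * exp(mu * dt * k)
det_paths <- gbm_sim(nsim = 2, t = 5, mu = 0.1, sigma = 0)
expected <- 100 * exp(0.1 * (1 / 365) * 0:5)
all.equal(det_paths[, 1], expected)
```

The extra row is why 20 years of 365 daily steps lines up with a date vector of 7301 elements.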

backtest_df_long has prices and weights for 100 tickers over 7301 days:

backtest_df_long %>% 
  summarise(
    num_days = n_distinct(date),
    num_tickers = n_distinct(ticker)
  )
#> # A tibble: 1 x 2
#>   num_days num_tickers
#>      <int>       <int>
#> 1     7301         100
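
Counting distinct dates and tickers does not by itself confirm that every date-ticker pair appears exactly once, which is what a long-format backtest input would normally need. The sketch below shows one way to check that on a toy data set. The tickers AAAAA and BBBBB and all the numbers are made up for illustration, and the checks are an assumption about what is worth verifying rather than something rsims requires.

```r
library(dplyr)
library(tidyr)

# Toy wide data: 3 dates x 2 hypothetical tickers
wide_prices <- data.frame(
  date = as.Date("1980-01-01") + 0:2,
  AAAAA = c(100, 101, 102),
  BBBBB = c(100, 99, 98)
)
wide_weights <- data.frame(
  date = as.Date("1980-01-01") + 0:2,
  AAAAA = c(0, 0.5, -0.5),
  BBBBB = c(0, -0.5, 0.5)
)

# Same reshape-and-join pattern as for backtest_df_long
toy_long <- wide_prices %>%
  pivot_longer(-date, names_to = "ticker", values_to = "price") %>%
  left_join(
    wide_weights %>%
      pivot_longer(-date, names_to = "ticker", values_to = "theo_weight"),
    by = c("date", "ticker")
  )

# Each date-ticker pair should appear exactly once
dupes <- toy_long %>% count(date, ticker) %>% filter(n > 1)
stopifnot(nrow(dupes) == 0)

# A complete panel has (number of dates) x (number of tickers) rows
stopifnot(nrow(toy_long) == n_distinct(toy_long$date) * n_distinct(toy_long$ticker))

# Every price row should have found a matching weight in the join
stopifnot(!anyNA(toy_long$theo_weight))
```

The same three checks can be run against backtest_df_long before handing it to the backtest.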

To get some insight into how quickly the backtest runs and where the bottlenecks are, profvis is your friend:

library(profvis)
# profile a single call, sampling the call stack every 10 ms
profvis({cash_backtest_original(backtest_df_long)}, interval = 0.01)

(To get deeper insight, you can extract the function logic as a series of expressions and pass these directly to profvis, but this shortcut is fine for our purposes.)
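
To illustrate the idea of profiling a function's logic as a series of top-level expressions, here is a rough sketch. It uses a hypothetical toy workload (growing a data frame row by row), not anything from rsims, and it swaps in base R's Rprof sampling profiler so that it runs without extra packages. The same expressions could be placed inside a profvis({ ... }) block instead.

```r
# Hypothetical toy workload (not part of rsims): grow a data frame row by row
n <- 2000
prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.01)  # start sampling the call stack every 10 ms

# The "function body", written out as individual top-level expressions
out <- data.frame(i = integer(0), value = numeric(0))
for (i in seq_len(n)) {
  out <- rbind(out, data.frame(i = i, value = i^2))
}
total <- sum(out$value)

Rprof(NULL)  # stop profiling

# Summarise time by function; summaryRprof can fail if the run was too
# short to record any samples, so the call is wrapped in tryCatch
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) head(prof$by.self)
```

Because the statements sit at the top level rather than inside one function call, the profile attributes time to each step separately, which is the extra detail the parenthetical refers to.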

profvis tells us that cash_backtest_original took about 7.5 seconds to run and that most of the time was spent messing around with data frames:

[Figure: profvis output for cash_backtest_original]

Stay tuned for the next installment, in which Kris will discuss how we can benefit from switching from data frames to matrices.

