A lot of my work recently has focused on using forward simulations to test demographic models. Often, this requires thousands (or more) of simulations. Here, I'd like to describe my approach to running these simulations and keeping track of everything.
We use a cluster with the Slurm job queue system, and while each cluster has its own quirks, the basic ideas are mostly the same. First, we use SLiM to simulate our populations. The basic pipeline is:
slim | msstats > sim.stats.$JOBID.txt
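To run many of these in parallel, each pipeline invocation can be wrapped in a Slurm batch script that tags its output with the job ID. The sketch below uses a stand-in `echo` in place of the real `slim`/`msstats` pipeline so it runs anywhere; the `--array` range, config file names, and the `prep.awk` filter are hypothetical, and the `local` fallback just lets the script run outside Slurm.

```shell
#!/bin/bash
#SBATCH --job-name=slim-sims
#SBATCH --array=1-1000    # hypothetical: one array task per simulation

# Tag every output file with the job/array ID so the config, the raw
# simulation output, and the msstats output can all be matched up later.
# The "local" fallback lets the script run outside Slurm for testing.
JOBID="${SLURM_ARRAY_TASK_ID:-${SLURM_JOB_ID:-local}}"

# Stand-in for the real pipeline, which would look something like:
#   slim config.$JOBID.txt | awk -f prep.awk | msstats > sim.stats.$JOBID.txt
echo "stats for job $JOBID" > "sim.stats.$JOBID.txt"
```

Submitting this with `sbatch` launches one independent task per simulation, and every file on disk carries the ID of the task that produced it.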
SLiM uses a configuration file to define its models, which complicates things a little. To keep the model flexible (e.g., drawing the mutation rate from a distribution or from data, and setting up complex population scenarios), I have written an R script that generates a configuration file for SLiM to read (can be read here).
This R script has a few features. First, it records the job ID from the queueing system so that output file names stay consistent across the pipeline. Next, each numerical value is set independently, allowing any of them to be drawn from a distribution.
It also has a function to shape how the population grows over time. For example, in addition to choosing the initial and final population sizes, we can choose the trajectory the population takes to reach that final size, such as linear or exponential growth.
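The two ideas above — drawing parameters from distributions and shaping the growth trajectory — can be sketched in a few lines of shell/awk (the real script does this in R, and the parameter names and ranges here are made up for illustration):

```shell
# Hypothetical parameter names and ranges; the real R script draws these
# values and writes the full SLiM configuration file.
N0=1000     # initial population size
NF=10000    # final population size
T=100       # generations of growth

# Draw a mutation rate uniformly on a log10 scale between 1e-8 and 1e-7
# (illustrative; the real pipeline may draw from data instead).
awk 'BEGIN { srand(); printf "%g\n", 10^(-8 + rand()) }' > mu.txt

# Exponential growth: N(t) = N0 * (NF/N0)^(t/T), so N(0) = N0 and N(T) = NF.
# A linear alternative would be N(t) = N0 + (NF - N0) * t/T.
awk -v n0="$N0" -v nf="$NF" -v T="$T" 'BEGIN {
    for (t = 0; t <= T; t += 25)
        printf "generation %d: N = %d\n", t, n0 * (nf/n0)^(t/T)
}' > growth.txt

cat mu.txt growth.txt
```

Because every value is drawn fresh per job, each Slurm task ends up simulating a slightly different model, which is exactly what you want when exploring a parameter space.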
After the configuration file is written to disk, SLiM is called from within R and the simulation starts. The simulation output is piped to an awk script that prepares it for msstats, and the msstats output is saved under the same job ID.
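The awk step can be illustrated with a stand-in for `slim` that mixes log lines with ms-style sample output (the format msstats consumes); the filter below simply keeps everything from the `//` marker onward. This is only a sketch — the real awk script's rules depend on SLiM's actual output format, and the `simulate` function and file names here are invented.

```shell
# A stand-in for `slim config.txt`: emits a log line followed by
# ms-style sample output. The real output format differs; this only
# illustrates the "filter, then tag with the job ID" pattern.
simulate() {
    printf '%s\n' \
        '#OUT: 1000 S' \
        '//' \
        'segsites: 2' \
        'positions: 0.1 0.7' \
        '01' \
        '11'
}

# Keep only the ms-style block (everything from "//" on), as in:
#   slim config.txt | awk ... | msstats > sim.stats.$JOBID.txt
JOBID="${SLURM_JOB_ID:-local}"
simulate | awk '/^\/\//{keep=1} keep' > "sim.ms.$JOBID.txt"
```

The filtered file could then be fed to msstats, whose output is written to a file carrying the same job ID, closing the loop between configuration, simulation, and summary statistics.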
This approach allows for flexible simulations and makes it easy to perform thousands of them. The pipeline is available on GitHub (extensive documentation pending): https://github.com/arundurvasula/domestication_sims