rsyncrosim: introduction to pipelines
This vignette will cover how to implement model pipelines using the rsyncrosim package within the SyncroSim software framework. For an overview of SyncroSim and rsyncrosim, as well as a basic usage tutorial for rsyncrosim, see the Introduction to rsyncrosim vignette. To learn how to use iterations in the rsyncrosim interface, see the rsyncrosim: introduction to uncertainty vignette.
helloworldPipeline
To demonstrate how to link models in a pipeline using the rsyncrosim interface, we will need the helloworldPipeline SyncroSim package. helloworldPipeline was designed to be a simple package to introduce pipelines to SyncroSim modeling workflows. Models (i.e., Transformers) connected by pipelines allow the user to implement multiple Transformers in a modeling workflow and access intermediate outputs of a Transformer without having to create multiple Scenarios.
The package takes three inputs from the user: mMean, mSD, and b. For each iteration, a value m, representing the slope, is sampled from a normal distribution with mean mMean and standard deviation mSD. The b value represents the intercept. In the first model in the pipeline, these input values are run through a linear model, y = mt + b, where t is time, and the y value is returned as output. The second model takes y as input and calculates the cumulative sum of y over time, returning a new variable, yCum, as output.
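As a point of reference, here is a minimal sketch in plain R of the calculations the two Transformers perform, using the example input values entered later in this vignette (mMean = 2, mSD = 4, b = 3) and timesteps 1 through 10. It is illustrative only and is not the package's own code.
# Illustrative sketch of the pipeline calculations (not package code)
t <- 1:10                         # timesteps
m <- rnorm(1, mean = 2, sd = 4)   # slope sampled once per iteration
b <- 3                            # intercept
y <- m * t + b                    # first Transformer: linear model y = mt + b
yCum <- cumsum(y)                 # second Transformer: cumulative sum of y over time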
For more details on the different features of the helloworldPipeline SyncroSim package, consult the SyncroSim Enhancing a Package: Linking Models tutorial.
Before using rsyncrosim you will first need to download and install the SyncroSim software. Versions of SyncroSim exist for both Windows and Linux.
You will need to install the rsyncrosim R package, either from CRAN or from the rsyncrosim GitHub repository. Versions of rsyncrosim are available for both Windows and Linux.
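For reference, a typical CRAN installation is shown below; the GitHub line is commented out and assumes the remotes package is installed and that the repository is syncrosim/rsyncrosim.
# Install rsyncrosim from CRAN
install.packages("rsyncrosim")
# Or, install the development version from GitHub (assumes the remotes package
# and that the repository is syncrosim/rsyncrosim)
# remotes::install_github("syncrosim/rsyncrosim")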
In a new R script, load the rsyncrosim package.
# Load R package for working with SyncroSim
library(rsyncrosim)
## Warning: package 'rsyncrosim' was built under R version 4.3.3
session()
Finish setting up the R environment for the rsyncrosim workflow by creating a SyncroSim Session object. Use the session() function to connect R to your installed copy of the SyncroSim software.
mySession <- session("path/to/install_folder") # Create a Session based SyncroSim install folder
mySession <- session() # Using default install folder (Windows only)
mySession # Displays the Session object
## class : Session
## filepath [character]: C:/Program Files/SyncroSim
## silent [logical] : TRUE
## printCmd [logical] : FALSE
## condaFilepath [NULL]:
Use the version() function to ensure you are using the latest version of SyncroSim.
version(mySession)
## [1] "2.5.11"
addPackage()
Install helloworldPipeline using the rsyncrosim function addPackage(). This function takes a package name as input and then queries the SyncroSim package server for the specified package.
# Install helloworldPipeline
addPackage("helloworldPipeline")
## Package <helloworldPipeline> installed
helloworldPipeline should now be included in the package list returned by the package() function in rsyncrosim:
# Get list of installed packages
package()
## name description version
## 1 helloworldPipeline Example demonstrating how to use pipelines 1.1.0
## location status
## 1 C:\\Users\\sarah\\SyncroSim\\Packages\\helloworldPipeline OK
When creating a new modeling workflow from scratch, we need to create objects of the following scopes: Library, Project, and Scenario. For more information on these scopes, see the Introduction to rsyncrosim vignette.
# Create a new Library
myLibrary <- ssimLibrary(name = "helloworldLibrary.ssim",
session = mySession,
package = "helloworldPipeline",
overwrite = TRUE)
# Open the default Project
myProject = project(ssimObject = myLibrary, project = "Definitions")
# Create a new Scenario (associated with the default Project)
myScenario = scenario(ssimObject = myProject, scenario = "My first scenario")
datasheet()
View the Datasheets associated with your new Scenario using the
datasheet()
function from rsyncrosim
.
# View all Datasheets associated with a Library, Project, or Scenario
datasheet(myScenario)
## scope name displayName
## 1 library core_Backup Backup
## 2 library core_CondaConfig Conda Configuration
## 3 library core_JlConfig Julia Configuration
## 4 library core_LNGPackage Last Known Good Packages
## 5 library core_Multiprocessing Multiprocessing
## 6 library core_Options Options
## 7 library core_ProcessorGroupOption Processor Group Options
## 8 library core_ProcessorGroupValue Processor Group Values
## 9 library core_PyConfig Python Configuration
## 10 library core_RConfig R Configuration
## 11 library core_Settings Settings
## 12 library core_SysFolder Folders
## 13 library corestime_Options Spatial Options
## 14 project core_AutoGenTag Auto Generation Tags
## 15 project core_RunSchedulerOption Run Scheduler Options
## 16 project core_RunSchedulerScenario Run Scheduler Scenarios
## 17 project core_StageName Stage Groups
## 18 project core_StageValue Stages by Group
## 19 project core_Transformer Stages
## 20 project corestime_Charts Charts
## 21 project corestime_DistributionType Distributions
## 22 project corestime_ExternalVariableType External Variables
## 23 project corestime_MapFacet Map Faceting
## 24 project corestime_Maps Maps
## 25 scenario core_AutoGenTagValue Auto Generation Tag Values
## 26 scenario core_Pipeline Pipeline
## 27 scenario corestime_DistributionValue Distributions
## 28 scenario corestime_External External
## 29 scenario corestime_ExternalVariableValue External Variables
## 30 scenario corestime_Multiprocessing Spatial Multiprocessing
## 31 scenario helloworldPipeline_InputDatasheet InputDatasheet
## 32 scenario helloworldPipeline_IntermediateDatasheet IntermediateDatasheet
## 33 scenario helloworldPipeline_OutputDatasheet OutputDatasheet
## 34 scenario helloworldPipeline_RunControl Run Control
From the list of Datasheets above, we can see that there are four Datasheets specific to the helloworldPipeline package, including an Input Datasheet, an Intermediate Datasheet, and an Output Datasheet. These three Datasheets are connected by Transformers. The values from the Input Datasheet are used as the input for the first Transformer, which transforms the input data to output data through a series of model calculations. The output data from the first Transformer is contained within the Intermediate Datasheet. The values from the Intermediate Datasheet are then used as input for the second Transformer. The output from the second Transformer is stored in the Output Datasheet.
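Because the full Datasheet list is long, it can be convenient to filter the summary data frame returned by datasheet() down to the package-specific Datasheets. The following is a sketch using base R and the column names shown in the output above.
# List only the helloworldPipeline Datasheets for this Scenario (illustrative)
sheetSummary <- datasheet(myScenario)
subset(sheetSummary, grepl("helloworldPipeline", name))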
datasheet() and addRow()
Currently our input Scenario Datasheets are empty! We need to add some values to our input Datasheet (InputDatasheet) and Run Control Datasheet (RunControl) so we can run our model. Since this package uses pipelines, we also need to add some information to the core Pipeline Datasheet to specify which Transformers are run in which order.
Input Datasheet
First, assign the contents of the input Datasheet to a new data frame variable using datasheet(), then check the columns that need input values.
# Load input Datasheet to a new R data frame
myInputDataframe <- datasheet(myScenario,
name = "helloworldPipeline_InputDatasheet")
# Check the columns of the input data frame
str(myInputDataframe)
## 'data.frame': 0 obs. of 3 variables:
## $ mMean: num
## $ mSD : num
## $ b : num
The input Datasheet requires three values:
mMean: the mean of the slope normal distribution.
mSD: the standard deviation of the slope normal distribution.
b: the intercept of the linear equation.
Add these values to a new data frame, then use the addRow() function from rsyncrosim to update the input data frame.
# Create input data and add it to the input data frame
myInputRow <- data.frame(mMean = 2, mSD = 4, b = 3)
myInputDataframe <- addRow(myInputDataframe, myInputRow)
# Check values
myInputDataframe
## mMean mSD b
## 1 2 4 3
Finally, save the updated R data frame to a SyncroSim Datasheet using saveDatasheet().
# Save input R data frame to a SyncroSim Datasheet
saveDatasheet(ssimObject = myScenario, data = myInputDataframe,
name = "helloworldPipeline_InputDatasheet")
## Datasheet <helloworldPipeline_InputDatasheet> saved
RunControl Datasheet
The RunControl Datasheet provides information about how many time steps and iterations to use in the model. Here, we set the number of iterations, as well as the minimum and maximum time steps for our model. Let's take a look at the columns that need input values.
# Load RunControl Datasheet to a new R data frame
runSettings <- datasheet(myScenario, name = "helloworldPipeline_RunControl")
# Check the columns of the RunControl data frame
str(runSettings)
## 'data.frame': 0 obs. of 3 variables:
## $ MaximumIteration: num
## $ MinimumTimestep : num
## $ MaximumTimestep : num
The RunControl Datasheet requires the following three columns:
MaximumIteration: total number of iterations to run the model for.
MinimumTimestep: the starting time point of the simulation.
MaximumTimestep: the end time point of the simulation.
Note: A fourth, hidden column, MinimumIteration, also exists in the RunControl Datasheet (default = 1).
We'll add this information to a new data frame and then add it to the RunControl data frame using addRow().
# Create RunControl data and add it to the RunControl data frame
runSettingsRow <- data.frame(MaximumIteration = 5,
MinimumTimestep = 1,
MaximumTimestep = 10)
runSettings <- addRow(runSettings, runSettingsRow)
# Check values
runSettings
## MaximumIteration MinimumTimestep MaximumTimestep
## 1 5 1 10
Finally, save the R data frame to a SyncroSim Datasheet using saveDatasheet().
# Save RunControl R data frame to a SyncroSim Datasheet
saveDatasheet(ssimObject = myScenario, data = runSettings,
name = "helloworldPipeline_RunControl")
## Datasheet <helloworldPipeline_RunControl> saved
Pipeline Datasheet
We must modify a third Datasheet to be able to use the output of one Transformer as the input of a second Transformer. To implement pipelines in our package, we need to specify the order in which to run the Transformers in our pipeline by editing the Pipeline Datasheet. The Pipeline Datasheet is part of the built-in SyncroSim core, so we access it using the "core_" prefix with the datasheet() function. From viewing the structure of the Pipeline Datasheet, we know that StageNameID is a factor with two levels: "First Model" and "Second Model". We will set the data for this Datasheet such that "First Model" is run first, then "Second Model". This way, the output from "First Model" is used as the input for "Second Model".
# Load Pipeline Datasheet to a new R data frame
myPipelineDataframe <- datasheet(myScenario, name = "core_Pipeline")
# Check the columns of the Pipeline data frame
str(myPipelineDataframe)
# Create Pipeline data and add it to the Pipeline data frame
myPipelineRow <- data.frame(StageNameID = c("First Model", "Second Model"),
RunOrder = c(1, 2))
myPipelineDataframe <- addRow(myPipelineDataframe, myPipelineRow)
# Check values
myPipelineDataframe
# Save Pipeline R data frame to a SyncroSim Datasheet
saveDatasheet(ssimObject = myScenario, data = myPipelineDataframe,
name = "core_Pipeline")
run()
We will now run our Scenario using the run() function in rsyncrosim. If we have a large modeling workflow and we want to parallelize the run using multiprocessing, we can set the jobs argument to a value greater than one.
# Run the first Scenario we created
myResultScenario <- run(myScenario, jobs = 5)
## [1] "Running scenario [1] My first scenario"
Once the run is complete, we can compare the original Scenario to the Results Scenario to see which Datasheets have been modified. Using the datasheet() function with the optional argument set to TRUE, we see that data has been added to both the Intermediate and Output Datasheets after running the Scenario (see the data column below).
# Datasheets for original Scenario
datasheet(myScenario, optional = TRUE)
## scope package name
## 3 library core core_Backup
## 4 library core core_CondaConfig
## 5 library core core_JlConfig
## 6 library core core_LNGPackage
## 7 library core core_Multiprocessing
## 8 library core core_Options
## 10 library core core_ProcessorGroupOption
## 11 library core core_ProcessorGroupValue
## 12 library core core_PyConfig
## 13 library core core_RConfig
## 16 library core core_Settings
## 19 library core core_SysFolder
## 30 library corestime corestime_Options
## 1 project core core_AutoGenTag
## 14 project core core_RunSchedulerOption
## 15 project core core_RunSchedulerScenario
## 17 project core core_StageName
## 18 project core core_StageValue
## 20 project core core_Transformer
## 21 project corestime corestime_Charts
## 22 project corestime corestime_DistributionType
## 25 project corestime corestime_ExternalVariableType
## 27 project corestime corestime_MapFacet
## 28 project corestime corestime_Maps
## 2 scenario core core_AutoGenTagValue
## 9 scenario core core_Pipeline
## 23 scenario corestime corestime_DistributionValue
## 24 scenario corestime corestime_External
## 26 scenario corestime corestime_ExternalVariableValue
## 29 scenario corestime corestime_Multiprocessing
## 31 scenario helloworldPipeline helloworldPipeline_InputDatasheet
## 32 scenario helloworldPipeline helloworldPipeline_IntermediateDatasheet
## 33 scenario helloworldPipeline helloworldPipeline_OutputDatasheet
## 34 scenario helloworldPipeline helloworldPipeline_RunControl
## displayName isSingle isOutput displayMember data
## 3 Backup TRUE FALSE N/A NA
## 4 Conda Configuration TRUE FALSE N/A NA
## 5 Julia Configuration TRUE FALSE N/A NA
## 6 Last Known Good Packages FALSE FALSE N/A NA
## 7 Multiprocessing TRUE FALSE N/A NA
## 8 Options TRUE FALSE N/A NA
## 10 Processor Group Options TRUE FALSE N/A NA
## 11 Processor Group Values FALSE FALSE N/A NA
## 12 Python Configuration TRUE FALSE N/A NA
## 13 R Configuration TRUE FALSE N/A NA
## 16 Settings FALSE FALSE N/A NA
## 19 Folders TRUE FALSE N/A NA
## 30 Spatial Options TRUE FALSE N/A NA
## 1 Auto Generation Tags FALSE FALSE Name NA
## 14 Run Scheduler Options TRUE FALSE N/A NA
## 15 Run Scheduler Scenarios FALSE FALSE N/A NA
## 17 Stage Groups FALSE FALSE Name NA
## 18 Stages by Group FALSE FALSE StageValueID NA
## 20 Stages FALSE FALSE TransformerDisplayName NA
## 21 Charts FALSE FALSE Name NA
## 22 Distributions FALSE FALSE Name NA
## 25 External Variables FALSE FALSE Name NA
## 27 Map Faceting FALSE FALSE N/A NA
## 28 Maps FALSE FALSE Name NA
## 2 Auto Generation Tag Values FALSE FALSE N/A FALSE
## 9 Pipeline FALSE FALSE N/A FALSE
## 23 Distributions FALSE FALSE N/A FALSE
## 24 External TRUE FALSE N/A FALSE
## 26 External Variables FALSE FALSE N/A FALSE
## 29 Spatial Multiprocessing TRUE FALSE N/A FALSE
## 31 InputDatasheet TRUE FALSE N/A TRUE
## 32 IntermediateDatasheet FALSE FALSE N/A FALSE
## 33 OutputDatasheet FALSE FALSE N/A FALSE
## 34 Run Control TRUE FALSE N/A TRUE
## scenario
## 3 NA
## 4 NA
## 5 NA
## 6 NA
## 7 NA
## 8 NA
## 10 NA
## 11 NA
## 12 NA
## 13 NA
## 16 NA
## 19 NA
## 30 NA
## 1 NA
## 14 NA
## 15 NA
## 17 NA
## 18 NA
## 20 NA
## 21 NA
## 22 NA
## 25 NA
## 27 NA
## 28 NA
## 2 1
## 9 1
## 23 1
## 24 1
## 26 1
## 29 1
## 31 1
## 32 1
## 33 1
## 34 1
# Datasheets for Results Scenario
datasheet(myResultScenario, optional = TRUE)
## scope package name
## 3 library core core_Backup
## 4 library core core_CondaConfig
## 5 library core core_JlConfig
## 6 library core core_LNGPackage
## 7 library core core_Multiprocessing
## 8 library core core_Options
## 10 library core core_ProcessorGroupOption
## 11 library core core_ProcessorGroupValue
## 12 library core core_PyConfig
## 13 library core core_RConfig
## 16 library core core_Settings
## 19 library core core_SysFolder
## 30 library corestime corestime_Options
## 1 project core core_AutoGenTag
## 14 project core core_RunSchedulerOption
## 15 project core core_RunSchedulerScenario
## 17 project core core_StageName
## 18 project core core_StageValue
## 20 project core core_Transformer
## 21 project corestime corestime_Charts
## 22 project corestime corestime_DistributionType
## 25 project corestime corestime_ExternalVariableType
## 27 project corestime corestime_MapFacet
## 28 project corestime corestime_Maps
## 2 scenario core core_AutoGenTagValue
## 9 scenario core core_Pipeline
## 23 scenario corestime corestime_DistributionValue
## 24 scenario corestime corestime_External
## 26 scenario corestime corestime_ExternalVariableValue
## 29 scenario corestime corestime_Multiprocessing
## 31 scenario helloworldPipeline helloworldPipeline_InputDatasheet
## 32 scenario helloworldPipeline helloworldPipeline_IntermediateDatasheet
## 33 scenario helloworldPipeline helloworldPipeline_OutputDatasheet
## 34 scenario helloworldPipeline helloworldPipeline_RunControl
## displayName isSingle isOutput displayMember data
## 3 Backup TRUE FALSE N/A NA
## 4 Conda Configuration TRUE FALSE N/A NA
## 5 Julia Configuration TRUE FALSE N/A NA
## 6 Last Known Good Packages FALSE FALSE N/A NA
## 7 Multiprocessing TRUE FALSE N/A NA
## 8 Options TRUE FALSE N/A NA
## 10 Processor Group Options TRUE FALSE N/A NA
## 11 Processor Group Values FALSE FALSE N/A NA
## 12 Python Configuration TRUE FALSE N/A NA
## 13 R Configuration TRUE FALSE N/A NA
## 16 Settings FALSE FALSE N/A NA
## 19 Folders TRUE FALSE N/A NA
## 30 Spatial Options TRUE FALSE N/A NA
## 1 Auto Generation Tags FALSE FALSE Name NA
## 14 Run Scheduler Options TRUE FALSE N/A NA
## 15 Run Scheduler Scenarios FALSE FALSE N/A NA
## 17 Stage Groups FALSE FALSE Name NA
## 18 Stages by Group FALSE FALSE StageValueID NA
## 20 Stages FALSE FALSE TransformerDisplayName NA
## 21 Charts FALSE FALSE Name NA
## 22 Distributions FALSE FALSE Name NA
## 25 External Variables FALSE FALSE Name NA
## 27 Map Faceting FALSE FALSE N/A NA
## 28 Maps FALSE FALSE Name NA
## 2 Auto Generation Tag Values FALSE FALSE N/A FALSE
## 9 Pipeline FALSE FALSE N/A FALSE
## 23 Distributions FALSE FALSE N/A FALSE
## 24 External TRUE FALSE N/A FALSE
## 26 External Variables FALSE FALSE N/A FALSE
## 29 Spatial Multiprocessing TRUE FALSE N/A FALSE
## 31 InputDatasheet TRUE FALSE N/A TRUE
## 32 IntermediateDatasheet FALSE FALSE N/A TRUE
## 33 OutputDatasheet FALSE FALSE N/A FALSE
## 34 Run Control TRUE FALSE N/A TRUE
## scenario
## 3 NA
## 4 NA
## 5 NA
## 6 NA
## 7 NA
## 8 NA
## 10 NA
## 11 NA
## 12 NA
## 13 NA
## 16 NA
## 19 NA
## 30 NA
## 1 NA
## 14 NA
## 15 NA
## 17 NA
## 18 NA
## 20 NA
## 21 NA
## 22 NA
## 25 NA
## 27 NA
## 28 NA
## 2 1
## 9 1
## 23 1
## 24 1
## 26 1
## 29 1
## 31 1
## 32 1
## 33 1
## 34 1
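Because these summary tables are long, one option is to pull out just the name and data columns for the package Datasheets. The following is a sketch using base R on the summary data frame returned above.
# Show only the data column for the helloworldPipeline Datasheets (illustrative)
resultsSheets <- datasheet(myResultScenario, optional = TRUE)
resultsSheets[grepl("helloworldPipeline", resultsSheets$name), c("name", "data")]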
The next step is to view the output Datasheets added to the Results Scenario when it was run.
datasheet()
First, we will view the intermediate output Datasheet from the Results Scenario. We can load the result tables using the datasheet() function. The Intermediate Datasheet corresponds to the results from the first model.
# Results of first Scenario
resultsSummary <- datasheet(myResultScenario,
name = "helloworldPipeline_IntermediateDatasheet")
# View results table
head(resultsSummary)
## Iteration Timestep y
## 1 1 1 4.9203
## 2 1 2 6.8406
## 3 1 3 8.7609
## 4 1 4 10.6812
## 5 1 5 12.6015
## 6 1 6 14.5218
We can see that for every timestep in an iteration we have a new value of y corresponding to y=mt+b.
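To visualize these intermediate results, one option is a quick base R plot of y against time, with one line per iteration; this sketch is not part of the vignette's original workflow.
# Plot y over time, one line per iteration (illustrative)
plot(resultsSummary$Timestep, resultsSummary$y, type = "n",
     xlab = "Timestep", ylab = "y")
for (i in unique(resultsSummary$Iteration)) {
  iterRows <- resultsSummary[resultsSummary$Iteration == i, ]
  lines(iterRows$Timestep, iterRows$y)
}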
datasheet()
Now, we will view the final output Datasheet from the Results Scenario. Again, we will use datasheet() to load the result table. The Output Datasheet corresponds to the results from the second model.
# Results of first Scenario
resultsSummary <- datasheet(myResultScenario,
name = "helloworldPipeline_OutputDatasheet")
# View results table
head(resultsSummary)
## [1] Iteration Timestep yCum
## <0 rows> (or 0-length row.names)
For each timestep in an iteration, the Output Datasheet records a new value of yCum, representing the cumulative value of y over time.
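As a sanity check, yCum can also be recomputed directly from the Intermediate Datasheet by taking the cumulative sum of y within each iteration. This is a sketch using base R; it reloads the intermediate results because resultsSummary was overwritten above, and it assumes rows are ordered by Timestep within each iteration, as in the output shown earlier.
# Recompute the cumulative sum of y within each iteration (illustrative)
intermediateResults <- datasheet(myResultScenario,
                                 name = "helloworldPipeline_IntermediateDatasheet")
intermediateResults$yCumCheck <- ave(intermediateResults$y,
                                     intermediateResults$Iteration,
                                     FUN = cumsum)
head(intermediateResults)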