INCLUDE
The INCLUDE command allows the contents of one DAG file to be parsed as if they were physically included in the referencing DAG file. The syntax for INCLUDE is:
INCLUDE FileName
For example, if we have two DAG files like this:
# File name: foo.dag
JOB A A.sub
INCLUDE bar.dag
# File name: bar.dag
JOB B B.sub
JOB C C.sub
this is equivalent to the single DAG file:
JOB A A.sub
JOB B B.sub
JOB C C.sub
Note that the included file must itself use proper DAG syntax. Even so, an included file that is valid on its own can still cause a parse error in the combined DAG, for example when two included files define nodes with the same name.
INCLUDEs can be nested to any depth (be sure not to create a cycle of includes!).
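The textual-inclusion behavior described above can be modeled with a short Python sketch. This is only an illustration of the semantics, not how DAGMan itself parses files: the function names expand_includes and duplicate_nodes are hypothetical, the files are held in an in-memory dictionary so that no assumption is made about how DAGMan resolves file paths, and only the upper-case spellings INCLUDE and JOB used in the examples are recognized.

```python
def expand_includes(name, files, _stack=()):
    """Return the lines of `name` with each INCLUDE line replaced by the
    (recursively expanded) lines of the file it names.

    `files` maps a file name to its text; `_stack` holds the chain of files
    currently being expanded so that a cycle of includes can be reported.
    """
    if name in _stack:
        chain = " -> ".join(_stack + (name,))
        raise ValueError(f"cycle of includes: {chain}")
    expanded = []
    for line in files[name].splitlines():
        words = line.split()
        if len(words) == 2 and words[0] == "INCLUDE":
            # Splice the included file in at this position.
            expanded.extend(expand_includes(words[1], files, _stack + (name,)))
        else:
            expanded.append(line)
    return expanded


def duplicate_nodes(lines):
    """Return node names that appear in more than one JOB line."""
    seen, dupes = set(), []
    for line in lines:
        words = line.split()
        if len(words) >= 3 and words[0] == "JOB":
            if words[1] in seen:
                dupes.append(words[1])
            seen.add(words[1])
    return dupes


# The foo.dag / bar.dag example from above, held in memory.
files = {
    "foo.dag": "JOB A A.sub\nINCLUDE bar.dag\n",
    "bar.dag": "JOB B B.sub\nJOB C C.sub\n",
}
print("\n".join(expand_includes("foo.dag", files)))
```

Run on the foo.dag and bar.dag example, the sketch prints the three JOB lines of the equivalent single DAG file. Passing the expanded lines to duplicate_nodes gives a quick pre-submission check for the same-name collision mentioned above.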
Example: Using INCLUDE to simplify multiple similar workflows
One use of the INCLUDE command is to simplify the DAG files when we have a single workflow that we want to run on a number of data sets. In that case, we can do something like this:
# File name: workflow.dag
# Defines the structure of the workflow
JOB Split split.sub
JOB Process00 process.sub
...
JOB Process99 process.sub
JOB Combine combine.sub
PARENT Split CHILD Process00 ... Process99
PARENT Process00 ... Process99 CHILD Combine
# File name: split.sub
executable = my_split
input = $(dataset).phase1
output = $(dataset).phase2
...
# File name: data57.vars
VARS Split dataset="data57"
VARS Process00 dataset="data57"
...
VARS Process99 dataset="data57"
VARS Combine dataset="data57"
# File name: run_dataset57.dag
INCLUDE workflow.dag
INCLUDE data57.vars
Then, to run our workflow on dataset 57, we run the following command:
$ condor_submit_dag run_dataset57.dag
This avoids having to duplicate the JOB and PARENT/CHILD commands for every dataset; we can just reuse the workflow.dag file in combination with a dataset-specific vars file.
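Since the per-dataset files follow a fixed pattern, they could also be generated by a small script. The following Python sketch is one possible way to do that and is not part of HTCondor. The helper name write_dataset_files is hypothetical, and the list of node names is passed in by the caller because the full list is elided in workflow.dag above. The demonstration assumes the elided nodes are numbered consecutively from Process00 to Process99.

```python
import tempfile
from pathlib import Path


def write_dataset_files(number, nodes, directory="."):
    """Write data<number>.vars and run_dataset<number>.dag into `directory`.

    `nodes` is the list of node names defined in workflow.dag. Returns the
    path of the generated top-level DAG file.
    """
    dataset = f"data{number}"
    out = Path(directory)
    vars_file = out / f"{dataset}.vars"
    dag_file = out / f"run_dataset{number}.dag"

    # One VARS line per node, all pointing at the same dataset.
    vars_file.write_text(
        "".join(f'VARS {node} dataset="{dataset}"\n' for node in nodes)
    )
    # The top-level DAG only pulls in the shared workflow and the vars file.
    # workflow.dag is referenced by bare name, as in the example above.
    dag_file.write_text(f"INCLUDE workflow.dag\nINCLUDE {vars_file.name}\n")
    return dag_file


# Assumption: the elided nodes are Process00 through Process99.
nodes = ["Split"] + [f"Process{i:02d}" for i in range(100)] + ["Combine"]

# Write into a temporary directory so the demonstration leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    dag = write_dataset_files(57, nodes, tmp)
    print(dag.read_text(), end="")
```

In real use the files would be written next to workflow.dag and the returned DAG file submitted with condor_submit_dag as shown above.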