The INCLUDE command causes the contents of one DAG file to be parsed as if they were physically included in the referencing DAG file. The syntax for INCLUDE is:

INCLUDE filename
For example, if we have two DAG files like this:
# File name: foo.dag
JOB A A.sub
INCLUDE bar.dag
# File name: bar.dag
JOB B B.sub
JOB C C.sub
this is equivalent to the single DAG file:
JOB A A.sub
JOB B B.sub
JOB C C.sub
Note that the included file must itself be in valid DAG syntax. Even a syntactically valid included file can still cause a parse error; for example, if the including and included files define nodes with the same name.
INCLUDEs can be nested to any depth (be sure not to create a cycle of includes!).
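The rules above (in-place expansion, no duplicate node names, no include cycles) can be illustrated with a small sketch. This is not DAGMan's actual parser, just a self-contained model of the INCLUDE semantics; the function names and the dict standing in for the file system are assumptions for illustration:

```python
# Sketch of INCLUDE expansion semantics (NOT DAGMan's real parser):
# splice included files in place, reject cycles of includes, and
# reject duplicate JOB names -- both are parse errors in DAGMan.
def expand_dag(path, files, _stack=None):
    """Return the flattened list of DAG lines for `path`.

    `files` maps file names to contents, standing in for the file
    system so the sketch is self-contained.
    """
    _stack = _stack or []
    if path in _stack:
        raise ValueError("cycle of INCLUDEs: " + " -> ".join(_stack + [path]))
    lines = []
    for line in files[path].splitlines():
        tokens = line.split()
        if tokens and tokens[0].upper() == "INCLUDE":
            # Recurse, remembering the chain of files we are inside.
            lines += expand_dag(tokens[1], files, _stack + [path])
        elif line.strip():
            lines.append(line.strip())
    return lines

def check_unique_jobs(lines):
    """Raise if any node name is defined more than once."""
    seen = set()
    for line in lines:
        tokens = line.split()
        if tokens[0].upper() == "JOB":
            if tokens[1] in seen:
                raise ValueError(f"node {tokens[1]} defined twice")
            seen.add(tokens[1])
    return lines

files = {
    "foo.dag": "JOB A A.sub\nINCLUDE bar.dag",
    "bar.dag": "JOB B B.sub\nJOB C C.sub",
}
print(check_unique_jobs(expand_dag("foo.dag", files)))
# -> ['JOB A A.sub', 'JOB B B.sub', 'JOB C C.sub']
```

Expanding foo.dag from the earlier example yields exactly the three-node DAG shown above, which is the equivalence the INCLUDE command provides.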
Example: Using INCLUDE to simplify multiple similar workflows
One use of the INCLUDE command is to simplify the DAG files when we have a single workflow that we want to run on a number of data sets. In that case, we can do something like this:
# File name: workflow.dag
# Defines the structure of the workflow
JOB Split split.sub
JOB Process00 process.sub
...
JOB Process99 process.sub
JOB Combine combine.sub
PARENT Split CHILD Process00 ... Process99
PARENT Process00 ... Process99 CHILD Combine
# File name: split.sub
executable = my_split
input = $(dataset).phase1
output = $(dataset).phase2
...
# File name: data57.vars
VARS Split dataset="data57"
VARS Process00 dataset="data57"
...
VARS Process99 dataset="data57"
VARS Combine dataset="data57"
# File name: run_dataset57.dag
INCLUDE workflow.dag
INCLUDE data57.vars
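Because the per-dataset files follow a fixed pattern, they can be generated mechanically. A minimal sketch (the helper name and the node list are assumptions for illustration, not part of HTCondor):

```python
# Hypothetical helper (not part of HTCondor) that builds the contents
# of the dataset-specific files shown above for any dataset name.
NODES = ["Split"] + [f"Process{i:02d}" for i in range(100)] + ["Combine"]

def dataset_files(dataset):
    """Return (run_<dataset>.dag contents, <dataset>.vars contents)."""
    dag = "INCLUDE workflow.dag\n" + f"INCLUDE {dataset}.vars"
    vars_file = "\n".join(f'VARS {node} dataset="{dataset}"' for node in NODES)
    return dag, vars_file

dag, vars_file = dataset_files("data57")
print(dag)
# -> INCLUDE workflow.dag
#    INCLUDE data57.vars
```

Writing these two strings to run_data57.dag and data57.vars reproduces the hand-written files above; repeating the call for each dataset name covers the whole collection.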
Then, to run our workflow on dataset 57, we run the following command:
$ condor_submit_dag run_dataset57.dag
This avoids having to duplicate the JOB and PARENT/CHILD commands for every dataset - we can simply re-use the workflow.dag file in combination with a dataset-specific vars file.