xAH_run.py

xAH_run.py is the xAODAnaHelpers macro written fully in python. The goal is to make it easier for a user to spin up an analysis without (potentially) writing any C++ code at all!

Introduction

An analysis job is defined by a few key things:

- the files to run over
- where to run the code
- what algorithms to run

and a few other minor features such as submission directory or how many events to run. Primarily, these three things listed above are all you need to get started. xAH_run.py manages all of these for you.

A configuration file, written in json or python, is used to specify what algorithms to run, and in what order. You pass in a list of files you want to run over to the script itself, as well as where to run the code. It will take care of the rest for you.

Getting Started

To get started, we assume you are a little bit familiar with xAODAnaHelpers and AnalysisBase in general. Recall that when you compile a set of packages, you generate a namespace under ROOT that all your algorithms are loaded into, so you can create an algorithm with something like ROOT.AlgorithmName() and then start configuring it. In fact, this is how one normally does it within python. If you wrapped the entire algorithm in a namespace, it is reachable the same way, with something like ROOT.Namespace.AlgorithmName().
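As a rough illustration of that lookup, the sketch below resolves a dotted algorithm name by walking attributes one at a time. It deliberately uses a stand-in object instead of the real ROOT module so that it runs without PyROOT; `resolve_algorithm`, `fake_root`, and the `MyAnalysis` namespace are hypothetical names chosen for this example, not part of xAODAnaHelpers.

```python
from functools import reduce
from types import SimpleNamespace


def resolve_algorithm(root, dotted_name):
    """Walk 'Namespace.AlgorithmName' one attribute at a time, starting at root."""
    return reduce(getattr, dotted_name.split("."), root)


class JetHistsAlgo:
    """Stand-in for a compiled algorithm class (hypothetical, not the real one)."""


# Stand-in for the ROOT module: one top-level class and one wrapped in a namespace.
fake_root = SimpleNamespace(
    JetHistsAlgo=JetHistsAlgo,
    MyAnalysis=SimpleNamespace(JetHistsAlgo=JetHistsAlgo),
)

top_level = resolve_algorithm(fake_root, "JetHistsAlgo")()
namespaced = resolve_algorithm(fake_root, "MyAnalysis.JetHistsAlgo")()
```

With a compiled release, the same helper could in principle be handed the real ROOT module in place of `fake_root`.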

A simple plotting example

To get started, let’s just ask a simple question: “How can I make plots of Anti-Kt, R=0.4, LC-calibrated jets?” Let’s assume xAODAnaHelpers has already been checked out and everything is compiled. We only need to know the three key things.

What algorithms to run

We will run 2 algorithms. First is BasicEventSelection to filter/clean events. The second is JetHistsAlgo which will allow us to plot the jets we want. So start with the template JSON file:

[
  { "class": "BasicEventSelection",
    "configs": {
    }
  },
  {
    "class": "JetHistsAlgo",
    "configs": {
    }
  }
]

This gets us started. We make a list of the algorithms that we want to run, and they run in the order listed. Each entry in the list is a dictionary with two keys: class, which names the algorithm to run, and configs, which holds a dictionary of configurations to pass into that algorithm. An equivalent script in python looks like

from xAODAnaHelpers import Config
c = Config()

c.setalg("BasicEventSelection", {})
c.setalg("JetHistsAlgo", {})
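Before handing a JSON configuration to the script, it can help to check that it has the shape described above. The following is a minimal sketch using only the standard library; `load_algorithms` is a hypothetical helper for this example and is not how xAH_run.py itself parses the file.

```python
import json

TEMPLATE = """
[
  {"class": "BasicEventSelection", "configs": {}},
  {"class": "JetHistsAlgo", "configs": {}}
]
"""


def load_algorithms(text):
    """Return (class name, configs) pairs in the order they appear in the list."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        raise ValueError("top level of the configuration must be a list")
    algorithms = []
    for index, entry in enumerate(entries):
        missing = {"class", "configs"} - set(entry)
        if missing:
            raise ValueError(f"entry {index} is missing keys: {sorted(missing)}")
        algorithms.append((entry["class"], entry["configs"]))
    return algorithms


algorithms = load_algorithms(TEMPLATE)
for name, configs in algorithms:
    print(name, configs)
```

Because JSON lists keep their order, the pairs come back in the same order the algorithms are meant to run.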

Next, we should probably add some obvious configurations that work for us. I look up the header files of each and decide to flesh it out as below:

[
  { "class": "BasicEventSelection",
    "configs": {
      "m_truthLevelOnly": false,
      "m_applyGRLCut": true,
      "m_GRLxml": "$ROOTCOREBIN/data/xAODAnaHelpers/data12_8TeV.periodAllYear_DetStatus-v61-pro14-02_DQDefects-00-01-00_PHYS_StandardGRL_All_Good.xml",
      "m_doPUreweighting": false,
      "m_vertexContainerName": "PrimaryVertices",
      "m_PVNTrack": 2,
      "m_name": "myBaseEventSel"
    }
  },
  {
    "class": "JetHistsAlgo",
    "configs": {
      "m_inContainerName": "AntiKt4EMTopoJets",
      "m_detailStr": "kinematic",
      "m_name": "NoPreSel"
    }
  }
]

and I save this into xah_run_example.json. If you want more variables in your plots, add other possibilities in the detailStr field, separated by a space. Equivalently in python

from xAODAnaHelpers import Config
c = Config()

c.setalg("BasicEventSelection", {"m_truthLevelOnly": False,
                                 "m_applyGRLCut": True,
                                 "m_GRLxml": "$ROOTCOREBIN/data/xAODAnaHelpers/data12_8TeV.periodAllYear_DetStatus-v61-pro14-02_DQDefects-00-01-00_PHYS_StandardGRL_All_Good.xml",
                                 "m_doPUreweighting": False,
                                 "m_vertexContainerName": "PrimaryVertices",
                                 "m_PVNTrack": 2,
                                 "m_name": "myBaseEventSel"})
c.setalg("JetHistsAlgo", {"m_inContainerName": "AntiKt4EMTopoJets",
                          "m_detailStr": "kinematic",
                          "m_name": "NoPreSel"})

The similarity is on purpose, to make it incredibly easy to switch back and forth between the two formats.
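One detail to watch when switching formats is that JSON spells booleans true and false, while python needs True and False. As a hedged sketch of the conversion, the hypothetical helper `json_to_python` below renders a JSON algorithm list as the text of a python configuration. It only assumes the `Config` and `setalg` names used in the examples above, and it generates source text without importing xAODAnaHelpers.

```python
import json

JSON_CONFIG = """
[
  {"class": "BasicEventSelection",
   "configs": {"m_applyGRLCut": true, "m_PVNTrack": 2, "m_name": "myBaseEventSel"}},
  {"class": "JetHistsAlgo",
   "configs": {"m_inContainerName": "AntiKt4EMTopoJets", "m_name": "NoPreSel"}}
]
"""


def json_to_python(text):
    """Render a JSON algorithm list as the source of an equivalent python config."""
    lines = ["from xAODAnaHelpers import Config", "c = Config()", ""]
    for entry in json.loads(text):
        # repr() writes values decoded from JSON true/false as python True/False.
        lines.append(f"c.setalg({entry['class']!r}, {entry['configs']!r})")
    return "\n".join(lines)


python_source = json_to_python(JSON_CONFIG)
print(python_source)
```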

Running the script

I pretty much have everything I need to work with. So, I run the following command

xAH_run.py --files file1.root file2.root --config xah_run_example.json direct

which will run over two ROOT files locally (direct), using the configuration we made. Running with the python form of the configuration is just as easy

xAH_run.py --files file1.root file2.root --config xah_run_example.py direct

We’re all done! That was easy.
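If you launch jobs from another python script, the same command line can be assembled programmatically. The sketch below only builds and prints the argument list and does not execute anything. `build_command` is a hypothetical helper, and the optional `--submitDir` and `--nevents` flags are the ones listed in the option reference further down.

```python
import shlex


def build_command(files, config, driver="direct", submit_dir=None, nevents=None):
    """Assemble an xAH_run.py argument list; nothing is executed here."""
    command = ["xAH_run.py", "--files", *files, "--config", config]
    if submit_dir is not None:
        command += ["--submitDir", submit_dir]
    if nevents is not None:
        command += ["--nevents", str(nevents)]
    # The driver sub-command comes last, after the script's own options.
    command.append(driver)
    return command


command = build_command(["file1.root", "file2.root"], "xah_run_example.json")
print(shlex.join(command))
```

The resulting list could then be passed to `subprocess.run` in an environment where xAH_run.py is on the path.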

Configuring Samples

Sample configuration can be done with a python script like so

from xAODAnaHelpers import Config
c = Config()

c.sample(410000, foo='bar', hello='world')
c.sample("p9495", foo='bar', hello='world', b=1, c=2.0, d=True)

where the pattern specified in Config::sample will be searched for inside the name of the dataset (not the name of the file!). Specifically, we just do something like if pattern in sample.name() in order to flag that sample. Given this, you can make this pattern generic enough to apply a configuration to a specific p-tag, or to a specific dataset ID (DSID) as well. The above will produce the following output when running

[WARNING]  No matching sample found for pattern 410000
[INFO   ]  Setting sample metadata for example.sample.p9495.root
[INFO   ]       - sample.meta().setDouble(c, 2.0)
[INFO   ]       - sample.meta().setString(foo, bar)
[INFO   ]       - sample.meta().setInteger(b, 1)
[INFO   ]       - sample.meta().setString(hello, world)
[INFO   ]       - sample.meta().setBool(d, True)

which should make it easy for you to understand what options are being set and for which sample.
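To make the matching rule concrete, here is a small stand-alone sketch that mimics it with plain strings in place of real samples. The mapping from python types to setter names is inferred from the log output above rather than taken from the implementation, and `setter_for` and `plan_metadata` are hypothetical names.

```python
def setter_for(value):
    """Pick the metadata setter name by python type (bool is checked before int)."""
    if isinstance(value, bool):
        return "setBool"
    if isinstance(value, int):
        return "setInteger"
    if isinstance(value, float):
        return "setDouble"
    return "setString"


def plan_metadata(sample_names, pattern, **options):
    """List the (sample, setter, key, value) calls a pattern would trigger."""
    calls = []
    for name in sample_names:
        # Plain substring test against the dataset name, as described above.
        if str(pattern) in name:
            for key, value in options.items():
                calls.append((name, setter_for(value), key, value))
    return calls


samples = ["example.sample.p9495.root"]
print(plan_metadata(samples, 410000, foo="bar", hello="world"))
for call in plan_metadata(samples, "p9495", foo="bar", b=1, c=2.0, d=True):
    print(call)
```

The bool check comes first because `True` is also an instance of `int` in python, which would otherwise send boolean options to the integer setter.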

Configuration Details

As mentioned previously, there are multiple facets to xAH_run.py. The section below details the configurations that are possible for the script itself, not for the algorithms you use. For details on what can be configured in an algorithm, look up the header files of the algorithms themselves.

For everything listed below, the script contains all this information and is self-documenting. Simply type

xAH_run.py -h

to see all the help information.

Note

The {driver} option tells the script where to run the code. There are lots of supported drivers and more can be added if you request it. For more information, you can type xAH_run.py -h drivers to see the available drivers.

API Reference

Note

If you are using a CMake-based release, or you have argcomplete in your python environment, you can enable automatic completion of the options. For example, running something like this:

eval "$(register-python-argcomplete xAH_run.py)"

Spin up an analysis instantly!

usage: xAH_run.py --files ... file [file ...]
                  --config path/to/file.json
                  [options]
                  driver [driver options]
Options:
-h, --help show this help message and exit. You can also pass in the name of a subsection.
--files input file(s) to read. This gives all the input files for the script to use. Depending on the other options specified, these could be DQ2 sample names, local paths, or text files containing a list of filenames/paths.
--config configuration for the algorithms. This tells the script which algorithms to load, configure, run, and in which order. Without it, it becomes a headless chicken.
--submitDir=submitDir
 Output directory to store the output.
--nevents=0 Number of events to process for all datasets. (0 = no limit)
--skip=0 Number of events to skip at start for all datasets. (0 = no limit)
-f=False, --force=False
 Overwrite previous directory if it exists.
--version 01-00-00
--mode=class
 Run using class access mode, branch access mode, or athena access mode. Possible choices: class, branch, athena

--treeName=CollectionTree
 Tree Name to run on
--isMC=False Running MC
--isAFII=False Running on AFII
--extraOptions=
 Pass in extra options straight into the python config file. These can be accessed by using argparse: `parser.parse_args(shlex.split(args.extra_options))`.
--inputList=False
 If enabled, will read in a text file containing a list of paths/filenames.
--inputTag= A wildcarded name of input files to run on.
--inputDQ2=False
 [DEPRECATION] Use inputRucio instead.
--inputRucio=False
 If enabled, will search using Rucio. Can be combined with `--inputList`.
--inputEOS=False
 If enabled, will search using EOS. Can be combined with `--inputList` and `--inputTag`.
--inputSH=False
 If enabled, will assume the input file is a directory of ROOT files of saved SH instances to use. Call SH::SampleHandler::load() on it.
--scanXRD=False
 If enabled, will search the xrootd server for the given pattern
-l=info, --log-level=info
 Logging level. See https://docs.python.org/3/howto/logging.html for more info.
--stats=False If enabled, will print variable usage statistics.
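The `--extraOptions` string described above can be unpacked inside a python configuration file with argparse and shlex. The sketch below is a stand-alone illustration: in a real configuration the string would come from the script's parsed arguments, whereas here a literal stand-in is used, and the `--jetContainer` and `--detail` flags and their values are hypothetical.

```python
import argparse
import shlex

# Stand-in for args.extra_options, which the script would normally provide.
extra_options = '--jetContainer AntiKt4EMTopoJets --detail "kinematic clean"'

parser = argparse.ArgumentParser(description="options for this config (hypothetical)")
parser.add_argument("--jetContainer", default="AntiKt4EMTopoJets")
parser.add_argument("--detail", default="kinematic")

# shlex.split honours shell-style quoting, so the quoted value stays one argument.
opts = parser.parse_args(shlex.split(extra_options))
print(opts.jetContainer, opts.detail)
```

Using `shlex.split` rather than `str.split` matters whenever a value contains spaces, such as a detail string with several entries.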
Sub-commands:
direct

Run your jobs locally.

usage: xAH_run.py --files ... file [file ...]
                  --config path/to/file.json
                  [options]
                  direct [direct options]
Options:
--optSubmitFlags
 the name of the option for supplying extra submit parameters to batch systems
--optEventsPerWorker
 the name of the option for selecting the number of events per batch job. (only BatchDriver and derived drivers). warning: this option will be ignored unless you have called SH::scanNEvents first.
--optFilesPerWorker
 the name of the option for selecting the number of files per batch job. (only BatchDriver and derived drivers).
--optDisableMetrics
 the option to turn off collection of performance data
--optPrintPerFileStats
 the option to turn on printing of i/o statistics at the end of each file. warning: this is not supported for all drivers.
--optRemoveSubmitDir
 the name of the option for overwriting the submission directory. If you set this to a non-zero value it will remove any existing submit-directory before trying to create a new one. You can also use -f/--force in xAH_run.py.
--optBatchSharedFileSystem=False
 enable to signify whether your batch driver is running on a shared filesystem
--optBatchWait=False
 submit using the submit() command. This causes the code to wait until all jobs are finished and then merge all of the outputs automatically
--optBatchShellInit=
 extra code to execute on each batch node before starting EventLoop
prooflite

Run your jobs using ProofLite

usage: xAH_run.py --files ... file [file ...]
                  --config path/to/file.json
                  [options]
                  prooflite [prooflite options]
Options:
--optSubmitFlags
 the name of the option for supplying extra submit parameters to batch systems
--optEventsPerWorker
 the name of the option for selecting the number of events per batch job. (only BatchDriver and derived drivers). warning: this option will be ignored unless you have called SH::scanNEvents first.
--optFilesPerWorker
 the name of the option for selecting the number of files per batch job. (only BatchDriver and derived drivers).
--optDisableMetrics
 the option to turn off collection of performance data
--optPrintPerFileStats
 the option to turn on printing of i/o statistics at the end of each file. warning: this is not supported for all drivers.
--optRemoveSubmitDir
 the name of the option for overwriting the submission directory. If you set this to a non-zero value it will remove any existing submit-directory before trying to create a new one. You can also use -f/--force in xAH_run.py.
--optBatchSharedFileSystem=False
 enable to signify whether your batch driver is running on a shared filesystem
--optBatchWait=False
 submit using the submit() command. This causes the code to wait until all jobs are finished and then merge all of the outputs automatically
--optBatchShellInit=
 extra code to execute on each batch node before starting EventLoop
--optPerfTree the option to turn on the performance tree in PROOF. if this is set to 1, it will write out the tree
--optBackgroundProcess
 the option to do processing in a background process in PROOF
prun

Run your jobs on the grid using prun. Use prun --help for descriptions of the options.

usage: xAH_run.py --files ... file [file ...]
                  --config path/to/file.json
                  [options]
                  prun [prun options]
Options:
--optSubmitFlags
 the name of the option for supplying extra submit parameters to batch systems
--optEventsPerWorker
 the name of the option for selecting the number of events per batch job. (only BatchDriver and derived drivers). warning: this option will be ignored unless you have called SH::scanNEvents first.
--optFilesPerWorker
 the name of the option for selecting the number of files per batch job. (only BatchDriver and derived drivers).
--optDisableMetrics
 the option to turn off collection of performance data
--optPrintPerFileStats
 the option to turn on printing of i/o statistics at the end of each file. warning: this is not supported for all drivers.
--optRemoveSubmitDir
 the name of the option for overwriting the submission directory. If you set this to a non-zero value it will remove any existing submit-directory before trying to create a new one. You can also use -f/--force in xAH_run.py.
--optBatchSharedFileSystem=False
 enable to signify whether your batch driver is running on a shared filesystem
--optBatchWait=False
 submit using the submit() command. This causes the code to wait until all jobs are finished and then merge all of the outputs automatically
--optBatchShellInit=
 extra code to execute on each batch node before starting EventLoop
--optGridDestSE
 Undocumented
--optGridSite Undocumented
--optGridCloud Undocumented
--optGridExcludedSite
 Undocumented
--optGridNGBPerJob=2
 Undocumented
--optGridMemory
 Undocumented
--optGridMaxCpuCount
 Undocumented
--optGridNFiles
 Undocumented
--optGridNFilesPerJob
 Undocumented
--optGridNJobs Undocumented
--optGridMaxFileSize
 Undocumented
--optGridMaxNFilesPerJob
 Undocumented
--optGridUseChirpServer
 Undocumented
--optGridExpress
 Undocumented
--optGridNoSubmit
 Undocumented
--optGridMergeOutput
 Undocumented
--optTmpDir Undocumented
--optRootVer Undocumented
--optCmtConfig Undocumented
--optGridDisableAutoRetry
 Undocumented
--optOfficial Undocumented
--optVoms Undocumented
--optGridOutputSampleName=user.%nickname%.%in:name[2]%.%in:name[3]%.%in:name[6]%.%in:name[7]%_xAH
 Define output grid sample name
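The default value of `--optGridOutputSampleName` is built from placeholders. The sketch below shows one plausible reading of them, under the assumption that `%in:name[N]%` stands for the N-th dot-separated field of the input dataset name, counted from 1, and that `%nickname%` is your grid nickname. This is an interpretation for illustration only, not the grid driver's actual code; `expand_output_name`, the dataset name, and the nickname are made up.

```python
import re


def expand_output_name(pattern, input_name, nickname):
    """Expand %nickname% and %in:name[N]% (assumed 1-based, dot-separated fields)."""
    fields = input_name.split(".")

    def field(match):
        index = int(match.group(1))
        # Assumption: placeholders without a matching field are left untouched.
        return fields[index - 1] if 1 <= index <= len(fields) else match.group(0)

    expanded = pattern.replace("%nickname%", nickname)
    return re.sub(r"%in:name\[(\d+)\]%", field, expanded)


pattern = "user.%nickname%.%in:name[2]%.%in:name[3]%.%in:name[6]%.%in:name[7]%_xAH"
# Made-up dataset name with seven dot-separated fields, for illustration only.
input_name = "scope.123456.shortName.step.FORMAT.tags.extra"
print(expand_output_name(pattern, input_name, nickname="jdoe"))
```

Under these assumptions the example expands to user.jdoe.123456.shortName.tags.extra_xAH.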
condor

Flock your jobs to condor

usage: xAH_run.py --files ... file [file ...]
                  --config path/to/file.json
                  [options]
                  condor [condor options]
Options:
--optSubmitFlags
 the name of the option for supplying extra submit parameters to batch systems
--optEventsPerWorker
 the name of the option for selecting the number of events per batch job. (only BatchDriver and derived drivers). warning: this option will be ignored unless you have called SH::scanNEvents first.
--optFilesPerWorker
 the name of the option for selecting the number of files per batch job. (only BatchDriver and derived drivers).
--optDisableMetrics
 the option to turn off collection of performance data
--optPrintPerFileStats
 the option to turn on printing of i/o statistics at the end of each file. warning: this is not supported for all drivers.
--optRemoveSubmitDir
 the name of the option for overwriting the submission directory. If you set this to a non-zero value it will remove any existing submit-directory before trying to create a new one. You can also use -f/--force in xAH_run.py.
--optBatchSharedFileSystem=False
 enable to signify whether your batch driver is running on a shared filesystem
--optBatchWait=False
 submit using the submit() command. This causes the code to wait until all jobs are finished and then merge all of the outputs automatically
--optBatchShellInit=
 extra code to execute on each batch node before starting EventLoop
--optCondorConf=stream_output = true
 the name of the option for supplying extra parameters for condor systems
lsf

Flock your jobs to lsf

usage: xAH_run.py --files ... file [file ...]
                  --config path/to/file.json
                  [options]
                  lsf [lsf options]
Options:
--optSubmitFlags
 the name of the option for supplying extra submit parameters to batch systems
--optEventsPerWorker
 the name of the option for selecting the number of events per batch job. (only BatchDriver and derived drivers). warning: this option will be ignored unless you have called SH::scanNEvents first.
--optFilesPerWorker
 the name of the option for selecting the number of files per batch job. (only BatchDriver and derived drivers).
--optDisableMetrics
 the option to turn off collection of performance data
--optPrintPerFileStats
 the option to turn on printing of i/o statistics at the end of each file. warning: this is not supported for all drivers.
--optRemoveSubmitDir
 the name of the option for overwriting the submission directory. If you set this to a non-zero value it will remove any existing submit-directory before trying to create a new one. You can also use -f/--force in xAH_run.py.
--optBatchSharedFileSystem=False
 enable to signify whether your batch driver is running on a shared filesystem
--optBatchWait=False
 submit using the submit() command. This causes the code to wait until all jobs are finished and then merge all of the outputs automatically
--optBatchShellInit=
 extra code to execute on each batch node before starting EventLoop
--optResetShell=False
 the option to reset the shell on the worker nodes
slurm

Flock your jobs to SLURM

usage: xAH_run.py --files ... file [file ...]
                  --config path/to/file.json
                  [options]
                  slurm [slurm options]
Options:
--optSubmitFlags
 the name of the option for supplying extra submit parameters to batch systems
--optEventsPerWorker
 the name of the option for selecting the number of events per batch job. (only BatchDriver and derived drivers). warning: this option will be ignored unless you have called SH::scanNEvents first.
--optFilesPerWorker
 the name of the option for selecting the number of files per batch job. (only BatchDriver and derived drivers).
--optDisableMetrics
 the option to turn off collection of performance data
--optPrintPerFileStats
 the option to turn on printing of i/o statistics at the end of each file. warning: this is not supported for all drivers.
--optRemoveSubmitDir
 the name of the option for overwriting the submission directory. If you set this to a non-zero value it will remove any existing submit-directory before trying to create a new one. You can also use -f/--force in xAH_run.py.
--optBatchSharedFileSystem=False
 enable to signify whether your batch driver is running on a shared filesystem
--optBatchWait=False
 submit using the submit() command. This causes the code to wait until all jobs are finished and then merge all of the outputs automatically
--optBatchShellInit=
 extra code to execute on each batch node before starting EventLoop
--optSlurmAccount=atlas
 the name of the account to use
--optSlurmPartition=shared-chos
 the name of the partition to use
--optSlurmRunTime=24:00:00
 the maximum runtime
--optSlurmMemory=1800
 the maximum memory usage of the job (MB)
--optSlurmConstrain=
 the extra constraints on the nodes
--optSlurmExtraConfigLines=
 the extra config lines to pass to SLURM
--optSlurmWrapperExec=
 the wrapper around the run script
local

Run using the LocalDriver

usage: xAH_run.py --files ... file [file ...]
                  --config path/to/file.json
                  [options]
                  local [local options]
Options:
--optSubmitFlags
 the name of the option for supplying extra submit parameters to batch systems
--optEventsPerWorker
 the name of the option for selecting the number of events per batch job. (only BatchDriver and derived drivers). warning: this option will be ignored unless you have called SH::scanNEvents first.
--optFilesPerWorker
 the name of the option for selecting the number of files per batch job. (only BatchDriver and derived drivers).
--optDisableMetrics
 the option to turn off collection of performance data
--optPrintPerFileStats
 the option to turn on printing of i/o statistics at the end of each file. warning: this is not supported for all drivers.
--optRemoveSubmitDir
 the name of the option for overwriting the submission directory. If you set this to a non-zero value it will remove any existing submit-directory before trying to create a new one. You can also use -f/--force in xAH_run.py.
--optBatchSharedFileSystem=False
 enable to signify whether your batch driver is running on a shared filesystem
--optBatchWait=False
 submit using the submit() command. This causes the code to wait until all jobs are finished and then merge all of the outputs automatically
--optBatchShellInit=
 extra code to execute on each batch node before starting EventLoop