GPI classes¶
-
class
AfsToken
¶ An object specifying the requirements of an AFS token
Plugin category: CredentialRequirement
-
class
VomsProxy
¶ An object specifying the requirements of a VOMS proxy file
Plugin category: CredentialRequirement
-
identity
¶ Identity for the proxy {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
vo
¶ Virtual Organisation for the proxy. Defaults to LCG/VirtualOrganisation {'protected': 0, 'defvalue': None, 'changable_at_resubmit': 0}
-
role
¶ Role that the proxy must have {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
group
¶ Group for the proxy - either “group” or “group/subgroup” {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
-
class
MetadataDict
¶ MetadataDict class
Class that represents the dictionary of metadata.
Plugin category: metadata
-
class
MultiPostProcessor
¶ Contains and executes many postprocessors. This is the object which is attached to a job. Should behave like a list to the user.
Plugin category: postprocessor
-
class
TextMerger
¶ Merger class for text
TextMerger will append specified text files in the order that they are encountered in the list of Jobs. Each file will be separated by a header giving some very basic information about the individual files.
Usage:
    tm = TextMerger()
    tm.files = ['job.log', 'results.txt']
    tm.overwrite = True  # False by default
    tm.ignorefailed = True  # False by default

    # This will produce the specified files:
    j = Job()
    j.outputsandbox = ['job.log', 'results.txt']
    j.splitter = SomeSplitter()
    j.postprocessors = [tm]
    j.submit()
The merge object will be used to merge the output of each subjob into j.outputdir. This will be run when the job completes. If the ignorefailed flag has been set then the merge will also be run as the job enters the killed or failed states.
The above merger object can also be used independently to merge a list of jobs or the subjobs of a single job.
    # tm defined above
    tm.merge(j, outputdir='~/merge_dir')
    tm.merge([... list of jobs ...], '~/merge_dir', ignorefailed=True, overwrite=False)
If ignorefailed or overwrite are set then they override the values set on the merge object.
If outputdir is not specified, the default location specified in the [Mergers] section of the .gangarc file will be used.
For large text files it may be desirable to compress the merge result using gzip. This can be done by setting the compress flag on the TextMerger object. In this case, the merged file will have a ‘.gz’ appended to its filename.
A summary of all the files merged will be created for each entry in files. This will be created when the merge of those files completes successfully. The name of this is the same as the output file, with the ‘.merge_summary’ extension appended and will be placed in the same directory as the merge results.
Plugin category: postprocessor
-
files
¶ A list of files to merge. {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
ignorefailed
¶ Jobs that are in the failed or killed states will be excluded from the merge when this flag is set to True. {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
overwrite
¶ The default behaviour for this Merger object. Will overwrite output files. {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
compress
¶ Output should be compressed with gzip. {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
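The behaviour described above (appending text files in order, with a small header separating each file, and optional gzip compression) can be sketched in plain Python. This is an illustration of the idea only, not Ganga's implementation; the function name and header format are invented for the example.

```python
import gzip
import os

def merge_text_files(paths, output_file, compress=False):
    """Append text files in order, separating each with a basic header.
    With compress=True the result is gzipped and '.gz' is appended,
    mirroring the TextMerger description (illustrative sketch only)."""
    out_name = output_file + '.gz' if compress else output_file
    opener = gzip.open if compress else open
    with opener(out_name, 'wt') as out:
        for path in paths:
            # Basic per-file header, as TextMerger inserts between files
            out.write('# ----- %s (%d bytes) -----\n' % (path, os.path.getsize(path)))
            with open(path) as f:
                out.write(f.read())
    return out_name
```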
-
-
class
RootMerger
¶ Merger class for ROOT files
RootMerger will use the version of ROOT configured in the .gangarc file to add together histograms and trees using the ‘hadd’ command provided by ROOT. Further details of the hadd command can be found in the ROOT documentation.
Usage:
    rm = RootMerger()
    rm.files = ['hist.root', 'trees.root']
    rm.overwrite = True  # False by default
    rm.ignorefailed = True  # False by default
    rm.args = '-f2'  # pass arguments to hadd

    # This will produce the specified files:
    j = Job()
    j.outputsandbox = ['hist.root', 'trees.root']
    j.splitter = SomeSplitter()
    j.postprocessors = [rm]
    j.submit()
The merge object will be used to merge the output of each subjob into j.outputdir. This will be run when the job completes. If the ignorefailed flag has been set then the merge will also be run as the job enters the killed or failed states.
The above merger object can also be used independently to merge a list of jobs or the subjobs of a single job.
    # rm defined above
    rm.merge(j, outputdir='~/merge_dir')
    rm.merge([... list of jobs ...], '~/merge_dir', ignorefailed=True, overwrite=False)
If ignorefailed or overwrite are set then they override the values set on the merge object.
A summary of all the files merged will be created for each entry in files. This will be created when the merge of those files completes successfully. The name of this is the same as the output file, with the ‘.merge_summary’ extension appended and will be placed in the same directory as the merge results.
If outputdir is not specified, the default location specified in the [Mergers] section of the .gangarc file will be used.
Plugin category: postprocessor
-
files
¶ A list of files to merge. {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
ignorefailed
¶ Jobs that are in the failed or killed states will be excluded from the merge when this flag is set to True. {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
overwrite
¶ The default behaviour for this Merger object. Will overwrite output files. {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
args
¶ Arguments to be passed to hadd. {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
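The hadd invocation that RootMerger performs has a standard shape: `hadd [options] target sources...`, with `-f` as hadd's force-overwrite flag. A sketch of how such a command line could be assembled (an assumed form for illustration, not Ganga's exact code):

```python
def build_hadd_command(output_file, input_files, args=None, overwrite=False):
    """Assemble a ROOT 'hadd' command line as a list suitable for
    subprocess: hadd [args] [-f] target source1 source2 ...
    (illustrative sketch, not Ganga's implementation)."""
    cmd = ['hadd']
    if args:
        cmd.extend(args.split())   # e.g. '-f2' from the args attribute
    if overwrite:
        cmd.append('-f')           # hadd's force-overwrite flag
    cmd.append(output_file)        # target comes before the sources
    cmd.extend(input_files)
    return cmd
```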
-
-
class
CustomMerger
¶ User tool for writing custom merging tools with Python
Allows a script to be supplied that performs the merge of some custom file type. The script must be a python file which defines the following function:
    def mergefiles(file_list, output_file):
        # perform the merge
        if not success:
            return -1
        else:
            return 0
This module will be imported and used by the CustomMerger. The file_list is a list of paths to the files to be merged. output_file is a string path for the output of the merge. This file must exist by the end of the merge or the merge will fail. If the merge cannot proceed, then the function should return a non-zero integer. If the merger is in the file mymerger.py, the usage can be
    cm = CustomMerger()
    cm.module = '~/mymerger.py'
    cm.files = ['file.txt']

    # This will call the merger once all jobs are finished.
    j = Job()
    j.outputsandbox = ['file.txt']
    j.splitter = SomeSplitter()
    j.postprocessors = [cm]
    j.submit()
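A complete mergefiles function following the contract described above, using plain concatenation as the custom merge step. This is one possible implementation for illustration; any file type and merge logic could be substituted.

```python
def mergefiles(file_list, output_file):
    """Concatenate the input files into output_file.
    Returns 0 on success and a non-zero integer on failure,
    as the CustomMerger contract requires (illustrative example)."""
    try:
        with open(output_file, 'w') as out:
            for path in file_list:
                with open(path) as f:
                    out.write(f.read())
    except (IOError, OSError):
        # Merge cannot proceed: signal failure with a non-zero return
        return -1
    return 0
```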
Clearly this tool is provided for advanced ganga usage only, and should be used with this in mind.
Plugin category: postprocessor
-
files
¶ A list of files to merge. {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
ignorefailed
¶ Jobs that are in the failed or killed states will be excluded from the merge when this flag is set to True. {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
overwrite
¶ The default behaviour for this Merger object. Will overwrite output files. {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
module
¶ Path to a python module to perform the merge. {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
class
SmartMerger
¶ Allows the different types of merge to be run according to file extension in an automatic way.
SmartMerger accepts a list of files which it will delegate to individual Merger objects based on the file extension of the file. The mapping between file extensions and Merger objects can be defined in the [Mergers] section of the .gangarc file. Extensions are treated in a case-insensitive way. If a file extension is not recognized then the file will be ignored if the ignorefailed flag is set; otherwise the merge will fail.
Example:
    sm = SmartMerger()
    sm.files = ['stderr', 'histo.root', 'job.log', 'summary.txt', 'trees.root', 'stdout']
    sm.merge([... list of jobs ...], outputdir='~/merge_dir')  # also accepts a single Job
If outputdir is not specified, the default location specified in the [Mergers] section of the .gangarc file will be used.
If files is not specified, then it will be taken from the list of jobs given to the merge method. Only files which appear in all jobs will be merged.
Mergers can also be attached to Job objects in the same way as other Merger objects.
    # sm defined above
    j = Job()
    j.splitter = SomeSplitter()
    j.postprocessors = [sm]
    j.submit()
Plugin category: postprocessor
-
files
¶ A list of files to merge. {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
ignorefailed
¶ Jobs that are in the failed or killed states will be excluded from the merge when this flag is set to True. {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
overwrite
¶ The default behaviour for this Merger object. Will overwrite output files. {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
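The delegation logic described above amounts to a case-insensitive lookup from file extension to merger type. A standalone sketch of that dispatch; the extension map here is hypothetical and stands in for whatever the [Mergers] configuration section defines.

```python
import os

# Hypothetical mapping for illustration; in Ganga this comes from the
# [Mergers] section of the .gangarc file.
MERGER_BY_EXT = {'.root': 'RootMerger', '.txt': 'TextMerger', '.log': 'TextMerger'}

def pick_merger(filename, ignorefailed=False):
    """Return the merger name for a file, treating extensions
    case-insensitively. An unrecognized extension is ignored (None)
    only when ignorefailed is set; otherwise the merge fails."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in MERGER_BY_EXT:
        return MERGER_BY_EXT[ext]
    if ignorefailed:
        return None
    raise ValueError('no merger configured for %r' % filename)
```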
-
-
class
FileChecker
¶ Checks if a string is in a file. self.searchStrings are the strings you would like to search for. self.files are the files you would like to check. self.failIfFound (default = True) decides whether to fail the job if a string is found. If you set this to False, the job will fail if the string isn't found. self.filesMustExist toggles whether to fail the job if a specified file doesn't exist (default is True).
Plugin category: postprocessor
-
checkSubjobs
¶ Run on subjobs {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
checkMaster
¶ Run on master {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
files
¶ File to search in {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
filesMustExist
¶ Toggle whether to fail job if a file isn’t found. {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
searchStrings
¶ String to search for {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
failIfFound
¶ Toggle whether job fails if string is found or not found. {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
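The pass/fail decision described above reduces to a string search combined with the failIfFound flag. A minimal standalone sketch of that logic (an illustration, not the Ganga implementation):

```python
def file_check_passes(path, search_strings, fail_if_found=True):
    """Return True if the job should pass the check.
    With fail_if_found=True the job fails when any search string is
    present; with fail_if_found=False it fails when none is present
    (illustrative sketch of FileChecker's decision logic)."""
    with open(path) as f:
        content = f.read()
    found = any(s in content for s in search_strings)
    return not found if fail_if_found else found
```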
-
-
class
CustomChecker
¶ User tool for writing a custom check with Python. Make a file, e.g. customcheck.py. In that file, define something like:

    def check(j):
        if j has passed:
            return True
        else:
            return False
When the job is about to be completed, Ganga will call this function and fail the job if False is returned.
Plugin category: postprocessor
-
checkSubjobs
¶ Run on subjobs {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
checkMaster
¶ Run on master {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
module
¶ Path to a python module to perform the check. {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
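A concrete check function following the contract above, deciding from the job's exit code. The `backend.exitcode` attribute is assumed here for illustration; any property of the job could drive the decision.

```python
def check(j):
    """Pass the job only if its backend exit code is zero.
    Ganga calls this function as the job is about to complete and
    fails the job if False is returned (illustrative example)."""
    return getattr(j.backend, 'exitcode', None) == 0
```

Save this in a file, set `module` to its path, and attach the CustomChecker to a job's postprocessors as with the other checkers.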
-
class
RootFileChecker
¶ Checks ROOT files to see if they are zombies. For the master job, it also checks whether merging was performed correctly. self.files are the files you would like to check. self.filesMustExist toggles whether to fail the job if a specified file doesn't exist (default is True).
Plugin category: postprocessor
-
checkSubjobs
¶ Run on subjobs {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
checkMaster
¶ Run on master {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
files
¶ File to search in {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
filesMustExist
¶ Toggle whether to fail job if a file isn’t found. {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
checkMerge
¶ Toggle whether to check the merging procedure {'protected': 0, 'defvalue': True, 'changable_at_resubmit': 0}
-
-
class
Notifier
¶ Object which emails a user about job status after jobs have finished. The default behaviour is to email when a job has failed or when a master job has completed. Notes: Ganga must be running to send the email, so this object is only really useful if you have a Ganga session running in the background (e.g. a screen session). It will not send emails about failed subjobs if autoresubmit is on.
Plugin category: postprocessor
-
verbose
¶ Email on subjob completion {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
address
¶ Email address {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
-
class
File
¶ Represent the files, both local and remote and provide an interface to transparently get access to them.
Typically in the context of job submission, the files are copied to the directory where the application runs on the worker node. The ‘subdir’ attribute influences the destination directory. The ‘subdir’ feature is not universally supported, however, and needs a review.
Plugin category: files
-
name
¶ path to the file source {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
subdir
¶ destination subdirectory (a relative path) {‘protected’: 0, ‘defvalue’: ‘.’, ‘changable_at_resubmit’: 0}
-
class
ShareDir
¶ Represents the directory used to store resources that are shared amongst multiple Ganga objects.
Currently this is only used in the context of the prepare() method for certain applications, such as the Executable() application. A single (“prepared”) application can be associated to multiple jobs.
Plugin category: shareddirs
-
name
¶ path to the file source {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
subdir
¶ destination subdirectory (a relative path) {‘protected’: 0, ‘defvalue’: ‘.’, ‘changable_at_resubmit’: 0}
-
associated_files
¶ A list of files associated with the sharedir {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
class
LocalFile
¶ LocalFile represents the base class for output files, such as MassStorageFile, LCGSEFile, etc.
Plugin category: gangafiles
-
namePattern
¶ pattern of the file name {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
localDir
¶ local dir where the file is stored, used from get and put methods {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
compressed
¶ whether the output file should be compressed before sending somewhere {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
-
class
MassStorageFile
¶ MassStorageFile represents a class marking a file to be written into mass storage (like Castor at CERN)
Plugin category: gangafiles
-
namePattern
¶ pattern of the file name {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
localDir
¶ local dir where the file is stored, used from get and put methods {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
joboutputdir
¶ outputdir of the job with which the outputsandbox file object is associated {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
locations
¶ list of locations where the outputfiles are uploaded {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
outputfilenameformat
¶ keyword path to where the output should be uploaded, i.e. /some/path/here/{jid}/{sjid}/{fname}, if this field is not set, the output will go in {jid}/{sjid}/{fname} or in {jid}/{fname} depending on whether the job is split or not {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
inputremotedirectory
¶ Directory on mass storage where the file is stored {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
failureReason
¶ reason for the upload failure {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
compressed
¶ whether the output file should be compressed before sending somewhere {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
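The outputfilenameformat keywords behave like simple placeholder substitution. A sketch of the path construction based on the description of the {jid}/{sjid}/{fname} defaults above (assumed semantics, not Ganga's code):

```python
def build_upload_path(fname, jid, sjid=None, outputfilenameformat=None):
    """Expand the {jid}/{sjid}/{fname} placeholders. When no format is
    given, fall back to the documented defaults: {jid}/{sjid}/{fname}
    for split jobs, {jid}/{fname} otherwise (illustrative sketch)."""
    if outputfilenameformat is None:
        outputfilenameformat = ('{jid}/{sjid}/{fname}' if sjid is not None
                                else '{jid}/{fname}')
    return outputfilenameformat.format(jid=jid, sjid=sjid, fname=fname)
```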
-
class
SharedFile
¶ SharedFile. Special case of MassStorageFile for a filesystem locally accessible through the standard lsb commands.
Plugin category: gangafiles
-
namePattern
¶ pattern of the file name {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
localDir
¶ local dir where the file is stored, used from get and put methods {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
joboutputdir
¶ outputdir of the job with which the outputsandbox file object is associated {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
locations
¶ list of locations where the outputfiles are uploaded {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
outputfilenameformat
¶ keyword path to where the output should be uploaded, i.e. /some/path/here/{jid}/{sjid}/{fname}, if this field is not set, the output will go in {jid}/{sjid}/{fname} or in {jid}/{fname} depending on whether the job is split or not {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
inputremotedirectory
¶ Directory on mass storage where the file is stored {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
failureReason
¶ reason for the upload failure {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
compressed
¶ whether the output file should be compressed before sending somewhere {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
class
LCGSEFile
¶ LCGSEFile represents a class marking an output file to be written into LCG SE
Plugin category: gangafiles
-
namePattern
¶ pattern of the file name {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
localDir
¶ local dir where the file is stored, used from get and put methods {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
joboutputdir
¶ outputdir of the job with which the outputsandbox file object is associated {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
se
¶ the LCG SE hostname {‘protected’: 0, ‘defvalue’: ‘srm-public.cern.ch’, ‘changable_at_resubmit’: 0}
-
se_type
¶ the LCG SE type {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
se_rpath
¶ the relative path to the file from the VO directory on the SE {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
lfc_host
¶ the LCG LFC hostname {‘protected’: 0, ‘defvalue’: ‘lfc-dteam.cern.ch’, ‘changable_at_resubmit’: 0}
-
srm_token
¶ the SRM space token, meaningful only when se_type is set to srmv2 {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
SURL
¶ the LCG SE SURL {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
port
¶ the LCG SE port {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
locations
¶ list of locations where the outputfiles were uploaded {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
failureReason
¶ reason for the upload failure {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
compressed
¶ whether the output file should be compressed before sending somewhere {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
credential_requirements
¶ {‘protected’: 0, ‘defvalue’: ‘VomsProxy’, ‘changable_at_resubmit’: 0}
-
-
class
GoogleFile
¶ The GoogleFile outputfile type allows for files to be directly uploaded, downloaded, removed and restored from the GoogleDrive service. It can be used as part of a job to output data directly to GoogleDrive, or standalone through the Ganga interface.
example job:

    j = Job(application=Executable(exe=File('/home/hep/hs4011/Tests/testjob.sh'), args=[]),
            outputfiles=[GoogleFile('TestJob.txt')])
    j.submit()

    ### This job will automatically upload the outputfile 'TestJob.txt' to GoogleDrive.

example of standalone submission:

    g = GoogleFile('TestFile.txt')
    g.localDir = '~/TestDirectory'  ### The file's location must be specified for standalone submission
    g.put()  ### The put() method uploads the file to GoogleDrive directly
The GoogleFile outputfile is also compatible with the Dirac backend, making outputfiles from Dirac-run jobs upload directly to GoogleDrive.
The first time GoogleFile is used for upload or download, an interactive authentication process will start.
Plugin category: gangafiles
-
namePattern
¶ pattern of the file name {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
localDir
¶ local dir where the file is stored, used from get and put methods {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
failureReason
¶ reason for the upload failure {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
compressed
¶ whether the output file should be compressed before sending somewhere {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
downloadURL
¶ download URL assigned to the file upon upload to GoogleDrive {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
-
class
JobTime
¶ Job timestamp access. In development
¶ Changes in the status of a Job are timestamped: a datetime object is stored in the dictionary named ‘timestamps’, in Coordinated Universal Time (UTC). More information on datetime objects can be found at:
http://docs.python.org/library/datetime.html
Datetime objects can be subtracted to produce a ‘timedelta’ object. More information about these can be found at the above address. ‘+’, ‘*’, and ‘/’ are not supported by datetime objects.
Datetime objects can be formatted into strings using the .strftime(format_string) method and the strftime codes, e.g.:
%Y -> year as integer
%a -> abbreviated weekday name
%M -> minutes as integer
The full list can be found at: http://docs.python.org/library/datetime.html#strftime-behavior
Standard status types with built-in access methods are: ‘new’, ‘submitted’, ‘running’, ‘completed’, ‘killed’, ‘failed’.
These return a string with default format %Y/%m/%d @ %H:%M:%S. A custom format can be specified in the argument.
Any information stored within the timestamps dictionary can also be extracted in the same way as it would be for a standard Python dictionary.
For a table display of the Job’s timestamps use .time.display(). For timestamps details from the backend use .time.details()
Plugin category: jobtime
-
timestamps
¶ Dictionary containing timestamps for job {‘protected’: 0, ‘defvalue’: {}, ‘changable_at_resubmit’: 0}
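The datetime arithmetic and formatting described above work the same way in plain Python; a standalone example not tied to a Ganga session (the timestamp values are invented for illustration):

```python
from datetime import datetime, timezone

# Two UTC timestamps, as would be stored in the 'timestamps' dictionary.
submitted = datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
completed = datetime(2024, 5, 1, 14, 30, 0, tzinfo=timezone.utc)

# Subtracting datetimes yields a timedelta ('+', '*', '/' are unsupported).
elapsed = completed - submitted

# Formatting with strftime, using JobTime's default format.
formatted = completed.strftime('%Y/%m/%d @ %H:%M:%S')
```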
-
-
class
EmptyDataset
¶ Documentation missing.
Plugin category: datasets
-
class
GangaDataset
¶ Class for handling generic datasets of input files
Plugin category: datasets
-
files
¶ list of file objects that will be the inputdata for the job {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
treat_as_inputfiles
¶ Treat the inputdata as inputfiles, i.e. copy the inputdata to the WN {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
-
class
TaskChainInput
¶ Dummy dataset to map the output of a transform to the input of another transform
Plugin category: datasets
-
input_trf_id
¶ Input Transform ID {‘protected’: 0, ‘defvalue’: -1, ‘changable_at_resubmit’: 0}
-
single_unit
¶ Create a single unit from all inputs in the transform {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
use_copy_output
¶ Use the copied output instead of default output (e.g. use local copy instead of grid copy) {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
include_file_mask
¶ List of Regular expressions of which files to include for input {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
exclude_file_mask
¶ List of Regular expressions of which files to exclude for input {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
-
class
TaskLocalCopy
¶ Dummy dataset to force Tasks to copy the output from a job to local storage somewhere
Plugin category: datasets
-
local_location
¶ Local location to copy files to {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
include_file_mask
¶ List of Regular expressions of which files to include in copy {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
exclude_file_mask
¶ List of Regular expressions of which files to exclude from copy {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
files
¶ List of successfully downloaded files {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
-
class
Local
¶ Run jobs in the background on local host.
The job is run in the workdir (usually in /tmp).
Plugin category: backends
-
id
¶ Process id. {‘protected’: 1, ‘defvalue’: -1, ‘changable_at_resubmit’: 0}
-
exitcode
¶ Process exit code. {‘protected’: 1, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
workdir
¶ Working directory. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
actualCE
¶ Hostname where the job was submitted. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
nice
¶ adjust process priority using nice -n command {‘protected’: 0, ‘defvalue’: 0, ‘changable_at_resubmit’: 0}
-
force_parallel
¶ should jobs really be submitted in parallel {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
batchsize
¶ Run a maximum of this number of subjobs in parallel. If value is negative use number of available CPUs {‘protected’: 0, ‘defvalue’: -1, ‘changable_at_resubmit’: 0}
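The batchsize rule above (a negative value means one subjob per available CPU) can be sketched as follows; the helper name is invented for illustration:

```python
import os

def resolve_batchsize(batchsize):
    """Return the number of subjobs to run in parallel: negative values
    mean 'use the number of available CPUs' (illustrative sketch of the
    batchsize semantics described above)."""
    return os.cpu_count() if batchsize < 0 else batchsize
```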
-
-
class
LCG
¶ LCG backend - submit jobs to the EGEE/LCG Grid using gLite middleware.
If the input sandbox exceeds the limit specified in the ganga configuration, it is automatically uploaded to a storage element. This overcomes sandbox size limits on the resource broker.
For gLite middleware, bulk (faster) submission is supported, so splitting jobs may be more efficient than submitting bunches of individual jobs.
For more options see help on LCGRequirements.
See also: http://cern.ch/glite/documentation
Plugin category: backends
-
CE
¶ Request a specific Computing Element {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
jobtype
¶ Job type: Normal, MPICH {‘protected’: 0, ‘defvalue’: ‘Normal’, ‘changable_at_resubmit’: 0}
-
requirements
¶ Requirements for the resource selection {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
sandboxcache
¶ Interface for handling oversized input sandbox {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
id
¶ Middleware job identifier {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
status
¶ Middleware job status {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
middleware
¶ Middleware type {‘protected’: 0, ‘defvalue’: ‘GLITE’, ‘changable_at_resubmit’: 0}
-
exitcode
¶ Application exit code {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
exitcode_lcg
¶ Middleware exit code {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
reason
¶ Reason for the current job status {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
perusable
¶ Enable the job perusal feature of GLITE {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
actualCE
¶ Computing Element where the job actually runs. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
credential_requirements
¶ {‘protected’: 0, ‘defvalue’: VomsProxy(), ‘changable_at_resubmit’: 0}
-
-
class
CREAM
¶ CREAM backend - direct job submission to gLite CREAM CE
Plugin category: backends
-
CE
¶ CREAM CE endpoint {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
jobtype
¶ Job type: Normal, MPICH {‘protected’: 0, ‘defvalue’: ‘Normal’, ‘changable_at_resubmit’: 0}
-
requirements
¶ Requirements for the resource selection {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
sandboxcache
¶ Interface for handling oversized input sandbox {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
id
¶ Middleware job identifier {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
status
¶ Middleware job status {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
exitcode
¶ Application exit code {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
exitcode_cream
¶ Middleware exit code {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
actualCE
¶ The CREAM CE where the job actually runs. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
reason
¶ Reason for the current job status {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
workernode
¶ The worker node on which the job actually runs. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
isbURI
¶ The input sandbox URI on CREAM CE {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
osbURI
¶ The output sandbox URI on CREAM CE {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
credential_requirements
¶ {‘protected’: 0, ‘defvalue’: VomsProxy(), ‘changable_at_resubmit’: 0}
-
-
class
ARC
¶ ARC backend - direct job submission to an ARC CE
Plugin category: backends
-
CE
¶ ARC CE endpoint {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
jobtype
¶ Job type: Normal, MPICH {‘protected’: 0, ‘defvalue’: ‘Normal’, ‘changable_at_resubmit’: 0}
-
requirements
¶ Requirements for the resource selection {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
sandboxcache
¶ Interface for handling oversized input sandbox {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
id
¶ Middleware job identifier {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
status
¶ Middleware job status {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
exitcode
¶ Application exit code {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
exitcode_arc
¶ Middleware exit code {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
actualCE
¶ The ARC CE where the job actually runs. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
queue
¶ The queue to send the job to. {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
xRSLextras
¶ Extra things to put into the xRSL for submission. {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
reason
¶ Reason for the current job status {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
workernode
¶ The worker node on which the job actually runs. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
isbURI
¶ The input sandbox URI on ARC CE {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
osbURI
¶ The output sandbox URI on ARC CE {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
verbose
¶ Use verbose options for ARC commands {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
credential_requirements
¶ {‘protected’: 0, ‘defvalue’: VomsProxy(), ‘changable_at_resubmit’: 0}
-
-
class
Condor
¶ Condor backend - submit jobs to a Condor pool.
For more options see help on CondorRequirements.
Plugin category: backends
-
requirements
¶ Requirements for selecting execution host {‘protected’: 0, ‘defvalue’: ‘CondorRequirements’, ‘changable_at_resubmit’: 0}
-
env
¶ Environment settings for execution host {‘protected’: 0, ‘defvalue’: {}, ‘changable_at_resubmit’: 0}
-
getenv
¶ Flag to pass current environment to execution host {‘protected’: 0, ‘defvalue’: ‘False’, ‘changable_at_resubmit’: 0}
-
rank
¶ Ranking scheme to be used when selecting execution host {‘protected’: 0, ‘defvalue’: ‘Memory’, ‘changable_at_resubmit’: 0}
-
submit_options
¶ Options passed to Condor at submission time {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
id
¶ Condor jobid {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
status
¶ Condor status {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
cputime
¶ CPU time used by job {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
actualCE
¶ Machine where job has been submitted {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
shared_filesystem
¶ Flag indicating if Condor nodes have shared filesystem {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
universe
¶ Type of execution environment to be used by Condor {‘protected’: 0, ‘defvalue’: ‘vanilla’, ‘changable_at_resubmit’: 0}
-
globusscheduler
¶ Globus scheduler to be used (required for Condor-G submission) {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
globus_rsl
¶ Globus RSL settings (for Condor-G submission) {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
spool
¶ Spool all required input files, job event log, and proxy over the connection to the condor_schedd. Required for EOS, see: http://batchdocs.web.cern.ch/batchdocs/troubleshooting/eos_submission.html {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
accounting_group
¶ Provide an accounting group for this job. {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
cdf_options
¶ Additional options to set in the CDF file given by a dictionary {‘protected’: 0, ‘defvalue’: {}, ‘changable_at_resubmit’: 0}
-
-
class
Interactive
¶ Run jobs interactively on local host.
An Interactive job prints its output directly on the screen and takes input from the keyboard, so it may be interrupted with Ctrl-C
Plugin category: backends
-
id
¶ Process id {‘protected’: 1, ‘defvalue’: 0, ‘changable_at_resubmit’: 0}
-
status
¶ Backend status {‘protected’: 1, ‘defvalue’: ‘new’, ‘changable_at_resubmit’: 0}
-
exitcode
¶ Process exit code {‘protected’: 1, ‘defvalue’: 0, ‘changable_at_resubmit’: 0}
-
workdir
¶ Work directory {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
actualCE
¶ Name of machine where job is run {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
-
class
LSF
¶ LSF backend - submit jobs to Load Sharing Facility.
Plugin category: backends
-
queue
¶ queue name as defined in your local Batch installation {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
extraopts
¶ extra options for Batch. See help(Batch) for more details {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 1}
-
id
¶ Batch id of the job {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
exitcode
¶ Process exit code {‘protected’: 1, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
actualqueue
¶ queue name where the job was submitted. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
actualCE
¶ hostname where the job is/was running. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
-
class
PBS
¶ PBS backend - submit jobs to Portable Batch System.
Plugin category: backends
-
queue
¶ queue name as defined in your local Batch installation {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
extraopts
¶ extra options for Batch. See help(Batch) for more details {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 1}
-
id
¶ Batch id of the job {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
exitcode
¶ Process exit code {‘protected’: 1, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
actualqueue
¶ queue name where the job was submitted. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
actualCE
¶ hostname where the job is/was running. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
-
class
SGE
¶ SGE backend - submit jobs to Sun Grid Engine.
Plugin category: backends
-
queue
¶ queue name as defined in your local Batch installation {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
extraopts
¶ extra options for Batch. See help(Batch) for more details {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 1}
-
id
¶ Batch id of the job {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
exitcode
¶ Process exit code {‘protected’: 1, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
actualqueue
¶ queue name where the job was submitted. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
actualCE
¶ hostname where the job is/was running. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
-
class
Slurm
¶ Slurm backend - submit jobs to Slurm.
Plugin category: backends
-
queue
¶ queue name as defined in your local Batch installation {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
extraopts
¶ extra options for Batch. See help(Batch) for more details {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 1}
-
id
¶ Batch id of the job {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
exitcode
¶ Process exit code {‘protected’: 1, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
actualqueue
¶ queue name where the job was submitted. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
actualCE
¶ hostname where the job is/was running. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
-
class
Remote
¶ Remote backend - submit jobs to a Remote pool.
¶ The Remote backend works as an SSH tunnel to a remote site, where a Ganga session is opened and the job is submitted there using the specified remote_backend. It is (in theory!) transparent to the user and should allow submission of any jobs to any backends that are already possible in GangaCore.
NOTE: Due to the file transfers required, there can be some slow down during submission and monitoring
E.g. 1 - Hello World example submitted to local backend:
j = Job(application=Executable(exe='/bin/echo', args=['Hello World']), backend="Remote")
j.backend.host = "bluebear.bham.ac.uk"                         # Host name
j.backend.username = "slatermw"                                # User name
j.backend.ganga_cmd = "/bb/projects/Ganga/runGanga"            # Ganga command line on remote site
j.backend.ganga_dir = "/bb/phy/slatermw/gangadir/remote_jobs"  # Where to store the jobs
j.backend.remote_backend = Local()
j.submit()
E.g. 2 - Root example submitted to PBS backend:
r = Root()
r.version = '5.14.00'
r.script = 'gengaus.C'

j = Job(application=r, backend="Remote")
j.backend.host = "bluebear.bham.ac.uk"
j.backend.username = "slatermw"
j.backend.ganga_cmd = "/bb/projects/Ganga/runGanga"
j.backend.ganga_dir = "/bb/phy/slatermw/gangadir/remote_jobs"
j.outputsandbox = ['gaus.txt']
j.backend.remote_backend = PBS()
j.submit()
E.g. 3 - Athena example submitted to LCG backend NOTE: you don’t need a grid certificate (or UI) available on the local machine, just the remote machine
j = Job()
j.name = 'Ex3_2_1'
j.application = Athena()
j.application.prepare(athena_compile=False)
j.application.option_file = '/disk/f8b/home/mws/athena/testarea/13.0.40/PhysicsAnalysis/AnalysisCommon/UserAnalysis/run/AthExHelloWorld_jobOptions.py'

j.backend = Remote()
j.backend.host = "bluebear.bham.ac.uk"
j.backend.username = "slatermw"
j.backend.ganga_cmd = "/bb/projects/Ganga/runGanga"
j.backend.ganga_dir = "/bb/phy/slatermw/gangadir/remote_jobs"
j.backend.environment = {'ATLAS_VERSION': '13.0.40'}  # Additional environment variables
j.backend.remote_backend = LCG()
j.backend.remote_backend.CE = 'epgce2.ph.bham.ac.uk:2119/jobmanager-lcgpbs-short'

j.submit()
E.g. 4 - Hello World submitted at CERN on LSF using atlas startup
j = Job()
j.backend = Remote()
j.backend.host = "lxplus.cern.ch"
j.backend.username = "mslater"
j.backend.ganga_cmd = "ganga"
j.backend.ganga_dir = "/afs/cern.ch/user/m/mslater/gangadir/remote_jobs"
j.backend.pre_script = ['source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.csh']  # source the ATLAS setup script before running Ganga
j.backend.remote_backend = LSF()
j.submit()
Plugin category: backends
-
remote_backend
¶ specification of the resources to be used (e.g. batch system) {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
host
¶ The remote host and port number (‘host:port’) to use. Default port is 22. {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
ssh_key
¶ Set to the location of the ssh key to use for authentication, e.g. /home/mws/.ssh/id_rsa. Note: make sure ‘key_type’ is also set correctly. {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
key_type
¶ Set to the type of ssh key to use (if required). Possible values are ‘RSA’ and ‘DSS’. {‘protected’: 0, ‘defvalue’: ‘RSA’, ‘changable_at_resubmit’: 0}
-
username
¶ The username at the remote host {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
ganga_dir
¶ The directory to use for the remote workspace, repository, etc. {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
ganga_cmd
¶ Command line to start ganga on the remote host {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
environment
¶ Overrides any environment variables set in the job {‘protected’: 0, ‘defvalue’: {}, ‘changable_at_resubmit’: 0}
-
pre_script
¶ Sequence of commands to execute before running Ganga on the remote site {‘protected’: 0, ‘defvalue’: [‘’], ‘changable_at_resubmit’: 0}
-
remote_job_id
¶ Remote job id. {‘protected’: 1, ‘defvalue’: 0, ‘changable_at_resubmit’: 0}
-
exitcode
¶ Application exit code {‘protected’: 1, ‘defvalue’: 0, ‘changable_at_resubmit’: 0}
-
actualCE
¶ Computing Element where the job actually runs. {‘protected’: 1, ‘defvalue’: 0, ‘changable_at_resubmit’: 0}
-
-
class
Executable
¶ Executable application – running arbitrary programs.
When you want to run an exact copy of your script on a worker node, specify it as a File object. Ganga will then ship it in a sandbox:

app.exe = File('/path/to/my/script')

When you want to execute a command on the worker node, specify it as a string. Ganga will call the command with its full path on the worker node:

app.exe = '/bin/date'

A command string may be either an absolute path ('/bin/date') or a command name ('echo'). Relative paths ('a/b') or directory paths ('/a/b/') are not allowed because they have no meaning on the worker node where the job executes.

The arguments may be specified in the following way:

app.args = ['-v', File('/some/input.dat')]

This will yield the following shell command: executable -v input.dat. The input.dat will be automatically added to the input sandbox.

If only one argument is specified, the following abbreviation may be used:

app.args = '-v'
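The argument rendering described above can be sketched in plain Python. This is an illustration only: `SandboxFile` and `render_command` are hypothetical stand-ins, not part of the Ganga API.

```python
import os

class SandboxFile:
    """Hypothetical stand-in for Ganga's File object."""
    def __init__(self, path):
        self.path = path

def render_command(exe, args):
    # File arguments are shipped in the input sandbox, so only their
    # basename appears on the worker-node command line; strings and
    # numbers are passed through as-is.
    rendered = [os.path.basename(a.path) if isinstance(a, SandboxFile) else str(a)
                for a in args]
    return " ".join([exe] + rendered)

print(render_command('executable', ['-v', SandboxFile('/some/input.dat')]))
# -> executable -v input.dat
```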
Plugin category: applications
-
exe
¶ A path (string) or a File object specifying an executable. {‘protected’: 0, ‘defvalue’: ‘echo’, ‘changable_at_resubmit’: 0}
-
args
¶ List of arguments for the executable. Arguments may be strings, numerics or File objects. {‘protected’: 0, ‘defvalue’: [‘Hello World’], ‘changable_at_resubmit’: 0}
-
env
¶ Dictionary of environment variables that will be replaced in the running environment. {‘protected’: 0, ‘defvalue’: {}, ‘changable_at_resubmit’: 0}
-
platform
¶ Platform where the job will be executed, for example “x86_64-centos7-gcc8-opt” {‘protected’: 0, ‘defvalue’: ‘ANY’, ‘changable_at_resubmit’: 0}
-
is_prepared
¶ Location of shared resources. Presence of this attribute implies the application has been prepared. {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
hash
¶ MD5 hash of the string representation of applications preparable attributes {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
class
Root
¶ Root application – running ROOT
To run a job in ROOT you need to specify the CINT script to be executed. Additional files required at run time (shared libraries, source files, other scripts, Ntuples) should be placed in the inputsandbox of the job. Arguments can be passed onto the script using the ‘args’ field of the application.
Defining a Simple Job:
As an example the script analysis.C in the directory ~/abc might contain:
void analysis(const char* type, int events) {
    std::cout << type << " " << events << std::endl;
}
To define an LCG job on the Ganga command line with this script, running in ROOT version 6.04.02 with the arguments 'MinBias' and 10, you would do the following:

r = Root()
r.version = '6.04.02'
r.script = '~/abc/analysis.C'
r.args = ['MinBias', 10]

j = Job(application=r, backend=LCG())
Using Shared Libraries:
If you have private shared libraries that should be loaded you need to include them in the inputsandbox. Files you want back as a result of running your job should be placed in your outputsandbox.
The shared library mechanism is particularly useful in order to create a thin wrapper around code that uses precompiled libraries, or that has not been designed to work in the CINT environment.
For more detailed instructions, see the following Wiki page:
https://twiki.cern.ch/twiki/bin/view/ArdaGrid/HowToRootJobsSharedObject
A summary of this page is given below:
Consider the following CINT script, runMain.C, that makes use of a ROOT-compatible shared library:

void runMain() {
    // set up main, e.g. command line opts
    char* argv[] = {"runMain.C", "--muons", "100"};
    int argc = 3;

    // load the shared library
    gSystem->Load("libMain");

    // run the code
    Main m(argv, argc);
    int returnCode = m.run();
}

The class Main is as follows and has been compiled into a shared library, libMain.so.

Main.h:

#ifndef MAIN_H
#define MAIN_H
#include "TObject.h"

class Main : public TObject {
public:
    Main() {}  // needed by Root IO
    Main(char* argv[], int argc);
    int run();
    ClassDef(Main, 1)  // needed for CINT
};
#endif
Main.cpp:

#include <iostream>
using std::cout;
using std::endl;
#include "Main.h"

ClassImp(Main)  // needed for CINT

Main::Main(char* argv[], int argc) {
    // do some setup, command line opts etc.
}

int Main::run() {
    cout << "Running Main..." << endl;
    return 0;
}

To run this on LCG, a Job could be created as follows:

r = Root()
r.version = '5.12.00'  # version must be on LCG external site
r.script = 'runMain.C'

j = Job(application=r, backend=LCG())
j.inputsandbox = ['libMain.so']
It is a requirement that your script contains a function with the same name as the script itself and that the shared library file is built to be binary compatible with the Grid environment (e.g. same architecture and version of gcc). As shown above, the wrapper class must be made CINT compatible. This restriction does not, however, apply to classes used by the wrapper class. When running remote (e.g. LCG) jobs, the architecture used is ‘slc3_ia32_gcc323’ if the Root version is 5.16 or earlier and ‘slc4_ia32_gcc34’ otherwise. This reflects the availability of builds on the SPI server:
http://service-spi.web.cern.ch/service-spi/external/distribution/
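The version-to-architecture rule above can be sketched as a small helper. The function name is made up for illustration; the tag strings are the ones quoted in the text.

```python
def lcg_architecture(root_version):
    """Pick the build architecture for a remote (e.g. LCG) Root job.

    Per the rule above: 'slc3_ia32_gcc323' for Root 5.16 or earlier,
    'slc4_ia32_gcc34' otherwise. Versions are compared numerically on
    their first two components.
    """
    major, minor = (int(x) for x in root_version.split('.')[:2])
    if (major, minor) <= (5, 16):
        return 'slc3_ia32_gcc323'
    return 'slc4_ia32_gcc34'

print(lcg_architecture('5.14.00'))  # -> slc3_ia32_gcc323
print(lcg_architecture('5.18.00'))  # -> slc4_ia32_gcc34
```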
For backends that use a local installation of ROOT the location should be set correctly in the [Root] section of the configuration.
Using Python and Root:
The Root project provides bindings for Python, the language supported by the Ganga command line interface. These bindings are referred to as PyRoot. A job is run using PyRoot if the script has the ‘.py’ extension or the usepython flag is set to True.
There are many example PyRoot scripts available in the Root tutorials. A short example is given below:
gengaus.py:
if __name__ == '__main__':
    from ROOT import gRandom

    output = open('gaus.txt', 'w')
    try:
        for i in range(100):
            print(gRandom.Gaus(), file=output)
    finally:
        output.close()

The above script could be run in Ganga as follows:

r = Root()
r.version = '5.14.00'
r.script = '~/gengaus.py'
r.usepython = True  # set automatically for '.py' scripts

j = Job(application=r, backend=Local())
j.outputsandbox = ['gaus.txt']
j.submit()
When running locally, the python interpreter used for running PyRoot jobs will default to the one being used in the current Ganga session. The Root binaries selected must be binary compatible with this version.
The pythonhome variable in the [Root] section of .gangarc controls which interpreter will be used for PyRoot jobs.
When using PyRoot on a remote backend, e.g. LCG, the python version that is used will depend on that used to build the Root version requested.
Plugin category: applications
-
script
¶ A File object specifying the script to execute when Root starts {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
args
¶ List of arguments for the script. Accepted types are numerics and strings {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
version
¶ The version of Root to run {‘protected’: 0, ‘defvalue’: ‘6.04.02’, ‘changable_at_resubmit’: 0}
-
usepython
¶ Execute ‘script’ using Python. The PyRoot libraries are added to the PYTHONPATH. {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
is_prepared
¶ Location of shared resources. Presence of this attribute implies the application has been prepared. {‘protected’: 1, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
class
Notebook
¶ Notebook application – execute Jupyter notebooks.
All cells in the notebooks given as inputfiles will be evaluated and the results returned in the same notebooks.
A simple example is
app = Notebook()
infiles = [LocalFile('/abc/test.ipynb')]
outfiles = [LocalFile('test.ipynb')]
j = Job(application=app, inputfiles=infiles, outputfiles=outfiles, backend=Local())
j.submit()
The input can come from any GangaFile type supported and the same is the case for the output.
All inputfiles matching the given regular expressions (by default, all files ending in .ipynb) are executed. Other files will simply be unpacked and made available.
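The default pattern can be checked with plain Python (a quick illustration, not Ganga code); it is the defvalue of the ‘regexp’ attribute below.

```python
import re

# Default pattern from the 'regexp' attribute: any file ending in .ipynb
pattern = re.compile(r'.+.ipynb$')

files = ['analysis.ipynb', 'helper.py', 'notes.txt', 'sub/run.ipynb']
to_execute = [f for f in files if pattern.match(f)]
print(to_execute)  # -> ['analysis.ipynb', 'sub/run.ipynb']
```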
Plugin category: applications
-
version
¶ Version of the notebook. If None, it will be assumed that it is the latest one. {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
timeout
¶ Timeout in seconds for executing a notebook. If None, the default value will be taken. {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
kernel
¶ The kernel to use for the notebook execution. Depending on configuration, python3, Root and R might be available. {‘protected’: 0, ‘defvalue’: ‘python2’, ‘changable_at_resubmit’: 0}
-
regexp
¶ Regular expression for the inputfiles to match for executing. {‘protected’: 0, ‘defvalue’: [‘.+.ipynb$’], ‘changable_at_resubmit’: 0}
-
is_prepared
¶ Location of shared resources. Presence of this attribute implies the application has been prepared. {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
hash
¶ MD5 hash of the string representation of applications preparable attributes {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
-
class
JobInfo
¶ Additional job information. Partially implemented
Plugin category: jobinfos
-
submit_counter
¶ job submission/resubmission counter {‘protected’: 1, ‘defvalue’: 0, ‘changable_at_resubmit’: 0}
-
monitor
¶ job monitor instance {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
uuid
¶ globally unique job identifier {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
monitoring_links
¶ list of tuples of monitoring links {‘protected’: 1, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
-
class
Job
¶ Job is an interface for submitting, killing and querying jobs :-).
Basic configuration:
The “application” attribute defines what should be run. Applications may be generic arbitrary executable scripts or complex, predefined objects.
The “backend” attribute defines where and how to run. Backend object represents a resource or a batch system with various configuration parameters.
Available applications, backends and other job components may be listed using the plugins() function. See help on plugins() function.
The “status” attribute represents the state of the Ganga job object. It is automatically updated by the monitoring loop. Note that jobs typically have their own, more detailed status at the backend; this information is typically available via the “job.backend.status” attribute.
Bookkeeping and persistency:
Job objects contain basic book-keeping information: “id”, “status” and “name”. Job objects are automatically saved in a job repository which may be a special directory on a local filesystem or a remote database.
Input/output and file workspace:
There is an input/output directory called file workspace associated with each job (“inputdir” and “outputdir” properties). When a job is submitted, all input files are copied to the file workspace to keep consistency of the input while the job is running. Ganga then ships all files in the input workspace to the backend systems in a sandbox.
The list of input files is defined by the application (implicitly). Additional files may be explicitly specified in the “inputsandbox” attribute.
Job splitting:
The “splitter” attribute defines how a large job may be divided into smaller subjobs. The subjobs are automatically created when the main (master) job is submitted. The “subjobs” attribute gives access to individual subjobs. The “master” attribute of a subjob points back to the master job.
Postprocessors:
The “postprocessors” attribute is a list of actions to perform once the job has completed. This includes how the output of the subjobs may be merged, user defined checks which may fail the job, and an email notification.
Datasets: PENDING Datasets are highly application and virtual organisation specific.
Plugin category: jobs
-
inputsandbox
¶ list of File objects shipped to the worker node {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
outputsandbox
¶ list of filenames or patterns shipped from the worker node {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
info
¶ JobInfo {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
comment
¶ comment of the job {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 1}
-
time
¶ provides timestamps for status transitions {‘protected’: 1, ‘defvalue’: <GangaCore.GPIDev.Lib.Job.JobTime.JobTime object at 0x7f268a6c8710>, ‘changable_at_resubmit’: 0}
-
application
¶ specification of the application to be executed {‘protected’: 0, ‘defvalue’: <GangaCore.Lib.Executable.Executable.Executable object at 0x7f268a6c9890>, ‘changable_at_resubmit’: 0}
-
backend
¶ specification of the resources to be used (e.g. batch system) {‘protected’: 0, ‘defvalue’: <GangaCore.Lib.Localhost.Localhost.Localhost object at 0x7f268a6c98f0>, ‘changable_at_resubmit’: 0}
-
inputfiles
¶ list of file objects that will act as input files for a job {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
outputfiles
¶ list of file objects describing what has to be done with the output files after the job is completed {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
id
¶ unique Ganga job identifier generated automatically {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
status
¶ current state of the job, one of “new”, “submitted”, “running”, “completed”, “completed_frozen”, “failed_frozen”, “killed”, “unknown”, “incomplete” {‘protected’: 1, ‘defvalue’: ‘new’, ‘changable_at_resubmit’: 0}
-
name
¶ optional label which may be any combination of ASCII characters {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
inputdir
¶ location of input directory (file workspace) {‘protected’: 1, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
outputdir
¶ location of output directory (file workspace) {‘protected’: 1, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
inputdata
¶ dataset definition (typically this is specific either to an application, a site or the virtual organization) {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
outputdata
¶ dataset definition (typically this is specific either to an application, a site or the virtual organization) {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
splitter
¶ optional splitter {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
subjobs
¶ list of subjobs (if splitting) {‘protected’: 1, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
master
¶ master job {‘protected’: 1, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
postprocessors
¶ list of postprocessors to run after job has finished {‘protected’: 0, ‘defvalue’: <GangaCore.GPIDev.Adapters.IPostProcessor.MultiPostProcessor object at 0x7f268a265650>, ‘changable_at_resubmit’: 0}
-
virtualization
¶ optional virtualization to be used {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
do_auto_resubmit
¶ Automatically resubmit failed subjobs {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
metadata
¶ the metadata {‘protected’: 1, ‘defvalue’: <GangaCore.GPIDev.Lib.Job.MetadataDict.MetadataDict object at 0x7f268a265770>, ‘changable_at_resubmit’: 0}
-
fqid
¶ fully qualified job identifier {‘protected’: 1, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
parallel_submit
¶ Enable Submission of subjobs in parallel {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
-
class
JobTemplate
¶ A placeholder for Job configuration parameters.
JobTemplates are normal Job objects but they are never submitted. They have their own JobRegistry, so they do not get mixed up with normal jobs. They always have a “template” status.
Create a job from an existing job template t:

j = Job(t)

Save a job j as a template t:

t = JobTemplate(j)

You may save commonly used job parameters in a template and create new jobs more easily and quickly.
Plugin category: jobs
-
inputsandbox
¶ list of File objects shipped to the worker node {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
outputsandbox
¶ list of filenames or patterns shipped from the worker node {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
info
¶ JobInfo {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
comment
¶ comment of the job {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 1}
-
time
¶ provides timestamps for status transitions {‘protected’: 1, ‘defvalue’: <GangaCore.GPIDev.Lib.Job.JobTime.JobTime object at 0x7f268a26fad0>, ‘changable_at_resubmit’: 0}
-
application
¶ specification of the application to be executed {‘protected’: 0, ‘defvalue’: <GangaCore.Lib.Executable.Executable.Executable object at 0x7f268a26fb30>, ‘changable_at_resubmit’: 0}
-
backend
¶ specification of the resources to be used (e.g. batch system) {‘protected’: 0, ‘defvalue’: <GangaCore.Lib.Localhost.Localhost.Localhost object at 0x7f268a26fb90>, ‘changable_at_resubmit’: 0}
-
inputfiles
¶ list of file objects that will act as input files for a job {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
outputfiles
¶ list of file objects describing what has to be done with the output files after the job is completed {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
id
¶ unique Ganga job identifier generated automatically {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
status
¶ current state of the job, one of “new”, “submitted”, “running”, “completed”, “killed”, “unknown”, “incomplete” {‘protected’: 1, ‘defvalue’: ‘template’, ‘changable_at_resubmit’: 0}
-
name
¶ optional label which may be any combination of ASCII characters {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
inputdir
¶ location of input directory (file workspace) {‘protected’: 1, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
outputdir
¶ location of output directory (file workspace) {‘protected’: 1, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
inputdata
¶ dataset definition (typically this is specific either to an application, a site or the virtual organization) {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
outputdata
¶ dataset definition (typically this is specific either to an application, a site or the virtual organization) {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
splitter
¶ optional splitter {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
subjobs
¶ list of subjobs (if splitting) {‘protected’: 1, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
master
¶ master job {‘protected’: 1, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
postprocessors
¶ list of postprocessors to run after job has finished {‘protected’: 0, ‘defvalue’: <GangaCore.GPIDev.Adapters.IPostProcessor.MultiPostProcessor object at 0x7f268a26fcb0>, ‘changable_at_resubmit’: 0}
-
virtualization
¶ optional virtualization to be used {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
do_auto_resubmit
¶ Automatically resubmit failed subjobs {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
metadata
¶ the metadata {‘protected’: 1, ‘defvalue’: <GangaCore.GPIDev.Lib.Job.MetadataDict.MetadataDict object at 0x7f268a26fd10>, ‘changable_at_resubmit’: 0}
-
fqid
¶ fully qualified job identifier {‘protected’: 1, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
parallel_submit
¶ Enable Submission of subjobs in parallel {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
The shareref table (shared directory reference counter table) provides a mechanism for storing metadata associated with Shared Directories (see help(ShareDir)), which may be referenced by other Ganga objects, such as prepared applications. When a Shared Directory is associated with a persisted Ganga object (e.g. Job, Box) its reference counter is incremented by 1. Shared Directories with a reference counter of 0 will be removed (i.e. the directory deleted) the next time Ganga exits.
Plugin category: sharerefs
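The counting scheme above can be sketched in plain Python. This is an illustration of the idea only; the class and method names are made up and the real shareref interface differs.

```python
class ShareRefSketch:
    """Toy reference counter for shared directories (illustrative only)."""

    def __init__(self):
        self.counter = {}

    def increase(self, sharedir):
        # A persisted object (Job, Box, ...) now references this directory.
        self.counter[sharedir] = self.counter.get(sharedir, 0) + 1

    def decrease(self, sharedir):
        # A referencing object was removed.
        self.counter[sharedir] = max(0, self.counter.get(sharedir, 0) - 1)

    def cleanup_on_exit(self):
        # Directories with a zero count would be deleted at Ganga exit.
        removed = [d for d, n in self.counter.items() if n == 0]
        for d in removed:
            del self.counter[d]
        return removed

refs = ShareRefSketch()
refs.increase('conf-abc123')
refs.increase('conf-abc123')
refs.decrease('conf-abc123')
print(refs.cleanup_on_exit())  # -> [] (still referenced once)
refs.decrease('conf-abc123')
print(refs.cleanup_on_exit())  # -> ['conf-abc123']
```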
-
class
ITask
¶ This is the framework of a task without special properties
Plugin category: tasks
-
transforms
¶ list of transforms {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
id
¶ ID of the Task {‘protected’: 1, ‘defvalue’: -1, ‘changable_at_resubmit’: 0}
-
name
¶ Name of the Task {‘protected’: 0, ‘defvalue’: ‘NewTask’, ‘changable_at_resubmit’: 0}
-
comment
¶ comment of the task {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
status
¶ Status - new, running, pause or completed {‘protected’: 1, ‘defvalue’: ‘new’, ‘changable_at_resubmit’: 0}
-
float
¶ Number of Jobs run concurrently {‘protected’: 0, ‘defvalue’: 0, ‘changable_at_resubmit’: 0}
-
metadata
¶ the metadata {‘protected’: 1, ‘defvalue’: <GangaCore.GPIDev.Lib.Job.MetadataDict.MetadataDict object at 0x7f268a608a70>, ‘changable_at_resubmit’: 0}
-
creation_date
¶ Creation date of the task {‘protected’: 1, ‘defvalue’: ‘19700101’, ‘changable_at_resubmit’: 0}
-
check_all_trfs
¶ Check all Transforms during each monitoring loop cycle {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
-
class
CoreTask
¶ General non-experimentally specific Task
Plugin category: tasks
-
transforms
¶ list of transforms {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
id
¶ ID of the Task {‘protected’: 1, ‘defvalue’: -1, ‘changable_at_resubmit’: 0}
-
name
¶ Name of the Task {‘protected’: 0, ‘defvalue’: ‘NewTask’, ‘changable_at_resubmit’: 0}
-
comment
¶ comment of the task {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
status
¶ Status - new, running, pause or completed {‘protected’: 1, ‘defvalue’: ‘new’, ‘changable_at_resubmit’: 0}
-
float
¶ Number of Jobs run concurrently {‘protected’: 0, ‘defvalue’: 0, ‘changable_at_resubmit’: 0}
-
metadata
¶ the metadata {‘protected’: 1, ‘defvalue’: <GangaCore.GPIDev.Lib.Job.MetadataDict.MetadataDict object at 0x7f268a608a70>, ‘changable_at_resubmit’: 0}
-
creation_date
¶ Creation date of the task {‘protected’: 1, ‘defvalue’: ‘19700101’, ‘changable_at_resubmit’: 0}
-
check_all_trfs
¶ Check all Transforms during each monitoring loop cycle {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
-
class
CoreUnit
¶ Documentation missing.
Plugin category: units
-
status
¶ Status - running, pause or completed {‘protected’: 1, ‘defvalue’: ‘new’, ‘changable_at_resubmit’: 0}
-
name
¶ Name of the unit (cosmetic) {‘protected’: 0, ‘defvalue’: ‘Simple Unit’, ‘changable_at_resubmit’: 0}
-
application
¶ Application of the Transform. {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
inputdata
¶ Input dataset {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
outputdata
¶ Output dataset {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
copy_output
¶ The dataset to copy the output of this unit to, e.g. Grid dataset -> Local Dataset {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
merger
¶ Merger to be run after this unit completes. {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
splitter
¶ Splitter used on each unit of the Transform. {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
postprocessors
¶ list of postprocessors to run after job has finished {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
inputsandbox
¶ list of File objects shipped to the worker node {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
inputfiles
¶ list of file objects that will act as input files for a job {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
outputfiles
¶ list of OutputFile objects to be copied to all jobs {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
info
¶ Info showing status transitions and unit info {‘protected’: 1, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
id
¶ ID of the Unit {‘protected’: 1, ‘defvalue’: -1, ‘changable_at_resubmit’: 0}
-
-
class
ArgSplitter
¶ Split job by changing the args attribute of the application.
This splitter applies only to applications that have an args attribute (e.g. Executable, Root), or an extraArgs attribute (GaudiExec). If an application has both, args takes precedence. It is a special case of the GenericSplitter.
This splitter allows the creation of a series of subjobs where the only difference between the jobs is their arguments. Below is an example that executes a ROOT script ~/analysis.C
void analysis(const char* type, int events) {
    std::cout << type << " " << events << std::endl;
}
with 3 different sets of arguments.
s = ArgSplitter(args=[['AAA',1],['BBB',2],['CCC',3]])
r = Root(version='5.10.00', script='~/analysis.C')
j = Job(application=r, splitter=s)
Notice how each job takes a list of arguments (in this case a list with a string and an integer). The splitter thus takes a list of lists, in this case with 3 elements so there will be 3 subjobs.
Running the subjobs will produce the output:
subjob 1 : AAA 1
subjob 2 : BBB 2
subjob 3 : CCC 3
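The mapping from the args list of lists to subjob arguments can be illustrated with a minimal plain-Python sketch (not Ganga's actual implementation, no Ganga session needed): one subjob per inner list, one argument per element.

```python
# Sketch of ArgSplitter's list-of-lists semantics: each inner list
# becomes the argument list of one subjob.
args = [['AAA', 1], ['BBB', 2], ['CCC', 3]]

subjob_output = []
for i, subjob_args in enumerate(args, start=1):
    # Each subjob receives its inner list as the application arguments.
    subjob_output.append('subjob %d : %s' % (i, ' '.join(str(a) for a in subjob_args)))

print('\n'.join(subjob_output))
```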
Plugin category: splitters
-
args
¶ A list of lists of arguments to pass to script {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
class
GenericSplitter
¶ Split job by changing arbitrary job attribute.
This splitter allows the creation of a series of subjobs where the difference between the jobs is defined by the “attribute” and “values” of the splitter object.
For example, to split a job according to the given application arguments:
s = GenericSplitter()
s.attribute = 'application.args'
s.values = [["hello","1"],["hello","2"]]
...
j = Job(splitter=s)
j.submit()
To split a job into two LCG jobs running on two different CEs:
s = GenericSplitter()
s.attribute = 'backend.CE'
s.values = ["quanta.grid.sinica.edu.tw:2119/jobmanager-lcgpbs-atlas", "lcg00125.grid.sinica.edu.tw:2119/jobmanager-lcgpbs-atlas"]
...
j = Job(backend=LCG(), splitter=s)
j.submit()
To split over multiple attributes, use the multi_attrs option:
j = Job()
j.splitter = GenericSplitter()
j.splitter.multi_attrs = {"application.args": ["hello1", "hello2"],
                          "application.env": [{"MYENV": "test1"}, {"MYENV": "test2"}]}
This will result in two subjobs: one with args set to 'hello1' and MYENV set to 'test1', the other with args set to 'hello2' and MYENV set to 'test2'.
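The expansion of multi_attrs into per-subjob settings can be sketched in plain Python (an illustration of the semantics described above, not Ganga's actual code), assuming all value lists have the same length:

```python
def expand_multi_attrs(multi_attrs):
    """Sketch of multi_attrs expansion: subjob i gets value i of every
    attribute. All value lists must have the same length."""
    keys = sorted(multi_attrs)
    lengths = {len(multi_attrs[k]) for k in keys}
    if len(lengths) > 1:
        raise ValueError('all value lists must have the same length')
    n = lengths.pop() if lengths else 0
    return [{k: multi_attrs[k][i] for k in keys} for i in range(n)]

subjobs = expand_multi_attrs({
    'application.args': ['hello1', 'hello2'],
    'application.env': [{'MYENV': 'test1'}, {'MYENV': 'test2'}],
})
```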
Known issues of this generic splitter:
- it will not work if specifying different backends for the subjobs
Plugin category: splitters
-
attribute
¶ The attribute on which the job is split {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
values
¶ A list of the values corresponding to the attribute of the subjobs {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
multi_attrs
¶ Dictionary to specify multiple attributes to split over {‘protected’: 0, ‘defvalue’: {}, ‘changable_at_resubmit’: 0}
-
class
GangaDatasetSplitter
¶ Split job based on files given in GangaDataset inputdata field
Plugin category: splitters
-
files_per_subjob
¶ the number of files per subjob {‘protected’: 0, ‘defvalue’: 5, ‘changable_at_resubmit’: 0}
-
maxFiles
¶ Maximum number of files to use in a masterjob (None or -1 = all files) {‘protected’: 0, ‘defvalue’: -1, ‘changable_at_resubmit’: 0}
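The interplay of files_per_subjob and maxFiles can be illustrated with a plain-Python sketch (a hypothetical stand-in for the splitter's partitioning logic, not Ganga's actual code):

```python
def chunk_dataset(files, files_per_subjob=5, max_files=-1):
    """Sketch of GangaDatasetSplitter-style partitioning: truncate to
    max_files (None or -1 means use all files), then group the files
    into chunks of files_per_subjob, one chunk per subjob."""
    if max_files not in (None, -1):
        files = files[:max_files]
    return [files[i:i + files_per_subjob]
            for i in range(0, len(files), files_per_subjob)]

chunks = chunk_dataset(['f%d' % i for i in range(12)], files_per_subjob=5)
```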
-
-
class
CoreTransform
¶ Documentation missing.
Plugin category: transforms
-
status
¶ Status - new, running, pause or completed {‘protected’: 1, ‘defvalue’: ‘new’, ‘changable_at_resubmit’: 0}
-
name
¶ Name of the transform (cosmetic) {‘protected’: 0, ‘defvalue’: ‘Simple Transform’, ‘changable_at_resubmit’: 0}
-
application
¶ Application of the Transform. {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
inputsandbox
¶ list of File objects shipped to the worker node {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
outputsandbox
¶ list of filenames or patterns shipped from the worker node {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
backend
¶ Backend of the Transform. {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
splitter
¶ Splitter used on each unit of the Transform. {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
postprocessors
¶ list of postprocessors to run after job has finished {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
unit_merger
¶ Merger to be copied and run on each unit separately. {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
copy_output
¶ The dataset to copy all units output to, e.g. Grid dataset -> Local Dataset {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
unit_copy_output
¶ The dataset to copy each individual unit output to, e.g. Grid dataset -> Local Dataset {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
run_limit
¶ Number of times processing of a partition is attempted. {‘protected’: 1, ‘defvalue’: 8, ‘changable_at_resubmit’: 0}
-
minor_run_limit
¶ Number of times a unit can be resubmitted {‘protected’: 1, ‘defvalue’: 3, ‘changable_at_resubmit’: 0}
-
major_run_limit
¶ Number of times a unit can be rebrokered {‘protected’: 1, ‘defvalue’: 3, ‘changable_at_resubmit’: 0}
-
units
¶ list of units {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
inputdata
¶ Input datasets to run over {‘protected’: 1, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
outputdata
¶ Output dataset template {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
inputfiles
¶ list of file objects that will act as input files for a job {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
outputfiles
¶ list of OutputFile objects to be copied to all jobs {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
metadata
¶ the metadata {‘protected’: 1, ‘defvalue’: MetadataDict(), ‘changable_at_resubmit’: 0}
-
rebroker_on_job_fail
¶ Rebroker if too many minor resubs {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
abort_loop_on_submit
¶ Break out of the Task Loop after submissions {‘protected’: 0, ‘defvalue’: True, ‘changable_at_resubmit’: 0}
-
required_trfs
¶ IDs of transforms that must complete before this unit will start. NOTE DOESN’T COPY OUTPUT DATA TO INPUT DATA. Use TaskChainInput Dataset for that. {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
chain_delay
¶ Minutes delay between a required/chained unit completing and starting this one {‘protected’: 0, ‘defvalue’: 0, ‘changable_at_resubmit’: 0}
-
submit_with_threads
¶ Use Ganga Threads for submission {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
max_active_threads
¶ Maximum number of Ganga Threads to use. Note that the number of simultaneous threads is controlled by the queue system (default is 5) {‘protected’: 0, ‘defvalue’: 10, ‘changable_at_resubmit’: 0}
-
info
¶ Info showing status transitions and unit info {‘protected’: 1, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
id
¶ ID of the Transform {‘protected’: 1, ‘defvalue’: -1, ‘changable_at_resubmit’: 0}
-
unit_splitter
¶ Splitter to be used to create the units {‘protected’: 0, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
chaindata_as_inputfiles
¶ Treat the inputdata as inputfiles, i.e. copy the inputdata to the WN {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
files_per_unit
¶ Number of files per unit if possible. Set to -1 to just create a unit per input dataset {‘protected’: 0, ‘defvalue’: -1, ‘changable_at_resubmit’: 0}
-
fields_to_copy
¶ A list of fields that should be copied when creating units, e.g. application, inputfiles. Empty (default) implies all fields are copied unless the GenericSplitter is used {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
-
class
GridFileIndex
¶ Data object for indexing a file on the grid.
@author: Hurng-Chun Lee @contact: hurngchunlee@gmail.com
Plugin category: GridFileIndex
-
id
¶ the main identity of the file {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
name
¶ the name of the file {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
md5sum
¶ the md5sum of the file {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
attributes
¶ key:value pairs of file metadata {‘protected’: 0, ‘defvalue’: {}, ‘changable_at_resubmit’: 0}
-
-
class
GridftpFileIndex
¶ Data object containing Gridftp file index information.
- id: gsiftp URI
- name: basename of the file
- md5sum: md5 checksum
- attributes[‘fpath’]: path of the file on local machine
@author: Hurng-Chun Lee @contact: hurngchunlee@gmail.com
Plugin category: GridFileIndex
-
id
¶ the main identity of the file {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
name
¶ the name of the file {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
md5sum
¶ the md5sum of the file {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
attributes
¶ key:value pairs of file metadata {‘protected’: 0, ‘defvalue’: {}, ‘changable_at_resubmit’: 0}
-
class
LCGFileIndex
¶ Data object containing LCG file index information.
@author: Hurng-Chun Lee @contact: hurngchunlee@gmail.com
Plugin category: GridFileIndex
-
id
¶ the main identity of the file {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
name
¶ the name of the file {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
md5sum
¶ the md5sum of the file {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
attributes
¶ key:value pairs of file metadata {‘protected’: 0, ‘defvalue’: {}, ‘changable_at_resubmit’: 0}
-
lfc_host
¶ the LFC hostname {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
local_fpath
¶ the original file path on local machine {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
-
class
GridSandboxCache
¶ Helper class for uploading/downloading/deleting sandbox files on a grid cache.
@author: Hurng-Chun Lee @contact: hurngchunlee@gmail.com
Plugin category: GridSandboxCache
-
protocol
¶ file transfer protocol {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
max_try
¶ max. number of tries in case of failures {‘protected’: 0, ‘defvalue’: 1, ‘changable_at_resubmit’: 0}
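The max_try semantics can be sketched as a simple retry loop in plain Python (an illustration of the attribute's meaning, not Ganga's actual transfer code; flaky_upload is a hypothetical stand-in for a transfer operation):

```python
def with_retry(operation, max_try=1):
    """Sketch of max_try: re-run a transfer operation until it
    succeeds or the attempt budget is exhausted."""
    for attempt in range(1, max_try + 1):
        if operation():
            return True
    return False

# Hypothetical flaky transfer that succeeds on the third attempt.
calls = {'n': 0}
def flaky_upload():
    calls['n'] += 1
    return calls['n'] >= 3

ok = with_retry(flaky_upload, max_try=5)
```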
-
-
class
GridftpSandboxCache
¶ Helper class for uploading/downloading/deleting sandbox files using lcg-cp/lcg-del commands with the gsiftp protocol.
@author: Hurng-Chun Lee @contact: hurngchunlee@gmail.com
Plugin category: GridSandboxCache
-
protocol
¶ file transfer protocol {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
max_try
¶ max. number of tries in case of failures {‘protected’: 0, ‘defvalue’: 1, ‘changable_at_resubmit’: 0}
-
baseURI
¶ the base URI for storing cached files {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
copyCommand
¶ the command to be executed to copy files {‘protected’: 0, ‘defvalue’: ‘globus-copy-url’, ‘changable_at_resubmit’: 0}
-
-
class
LCGSandboxCache
¶ Helper class for uploading/downloading/deleting sandbox files using lcg-cr/lcg-cp/lcg-del commands.
@author: Hurng-Chun Lee @contact: hurngchunlee@gmail.com
Plugin category: GridSandboxCache
-
protocol
¶ file transfer protocol {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
max_try
¶ max. number of tries in case of failures {‘protected’: 0, ‘defvalue’: 1, ‘changable_at_resubmit’: 0}
-
se
¶ the LCG SE hostname {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
se_type
¶ the LCG SE type {‘protected’: 0, ‘defvalue’: ‘srmv2’, ‘changable_at_resubmit’: 0}
-
se_rpath
¶ the relative path to the VO directory on the SE {‘protected’: 0, ‘defvalue’: ‘generated’, ‘changable_at_resubmit’: 0}
-
lfc_host
¶ the LCG LFC hostname {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
srm_token
¶ the SRM space token, meaningful only when se_type is set to srmv2 {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
-
class
LCGRequirements
¶ Helper class to group LCG requirements.
See also: JDL Attributes Specification at http://cern.ch/glite/documentation
Plugin category: LCGRequirements
-
software
¶ Software Installations {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
nodenumber
¶ Number of Nodes for MPICH jobs {‘protected’: 0, ‘defvalue’: 1, ‘changable_at_resubmit’: 0}
-
memory
¶ Minimum available memory (MB) {‘protected’: 0, ‘defvalue’: 0, ‘changable_at_resubmit’: 0}
-
cputime
¶ Minimum available CPU time (min) {‘protected’: 0, ‘defvalue’: 0, ‘changable_at_resubmit’: 0}
-
walltime
¶ Minimum available total time (min) {‘protected’: 0, ‘defvalue’: 0, ‘changable_at_resubmit’: 0}
-
ipconnectivity
¶ External connectivity {‘protected’: 0, ‘defvalue’: False, ‘changable_at_resubmit’: 0}
-
allowedCEs
¶ allowed CEs in regular expression {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
excludedCEs
¶ excluded CEs in regular expression {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
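Assuming allowedCEs and excludedCEs are plain regular expressions matched against a CE name (an assumption for illustration; this is a sketch of the filtering semantics, not Ganga's matchmaking code), a CE must match the allowed pattern (if set) and must not match the excluded pattern (if set):

```python
import re

def ce_allowed(ce, allowed='', excluded=''):
    """Sketch of allowedCEs/excludedCEs filtering: empty patterns
    impose no constraint."""
    if allowed and not re.search(allowed, ce):
        return False
    if excluded and re.search(excluded, ce):
        return False
    return True
```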
-
datarequirements
¶ The DataRequirements entry for the JDL. A list of dictionaries, each with “InputData”, “DataCatalogType” and optionally “DataCatalog” entries {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
dataaccessprotocol
¶ A list of strings giving the available DataAccessProtocol protocols {‘protected’: 0, ‘defvalue’: [‘gsiftp’], ‘changable_at_resubmit’: 0}
-
other
¶ Other Requirements {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
-
class
CondorRequirements
¶ Helper class to group Condor requirements.
See also: http://www.cs.wisc.edu/condor/manual
Plugin category: condor_requirements
-
machine
¶ Requested execution hosts, given as a string of space-separated names: ‘machine1 machine2 machine3’; or as a list of names: [ ‘machine1’, ‘machine2’, ‘machine3’ ] {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
excluded_machine
¶ Excluded execution hosts, given as a string of space-separated names: ‘machine1 machine2 machine3’; or as a list of names: [ ‘machine1’, ‘machine2’, ‘machine3’ ] {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
opsys
¶ Operating system {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
arch
¶ System architecture {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
memory
¶ Minimum physical memory {‘protected’: 0, ‘defvalue’: 0, ‘changable_at_resubmit’: 0}
-
virtual_memory
¶ Minimum virtual memory {‘protected’: 0, ‘defvalue’: 0, ‘changable_at_resubmit’: 0}
-
other
¶ Other requirements, given as a list of strings, for example: [ ‘OSTYPE == “SLC4”’, ‘(POOL == “GENERAL” || POOL == “GEN_FARM”)’ ]; the final requirement is the AND of all elements in the list {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
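The "final requirement is the AND of all elements" rule for the other attribute can be sketched in plain Python (an illustration of the described semantics, not Ganga's actual JDL generation):

```python
def combine_requirements(other):
    """Sketch: parenthesise each requirement string and AND them
    together into one Condor requirements expression."""
    return ' && '.join('(%s)' % expr for expr in other)

requirements = combine_requirements(
    ['OSTYPE == "SLC4"', '(POOL == "GENERAL" || POOL == "GEN_FARM")'])
```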
-
-
class
Docker
¶ The job will be run inside a container, using Docker or UDocker as the virtualization method. Docker is tried first; if it is not installed or permissions do not allow it, UDocker is installed and used.
j = Job()
j.virtualization = Docker("fedora:latest")
The mode in which UDocker runs can be modified. The P1 mode works almost everywhere but might not give the best performance. See https://github.com/indigo-dc/udocker for more details about UDocker.
If the image is a private image, the username and password of the deploy token can be given as follows:
j.virtualization.tokenuser = 'gitlab+deploy-token-123'
j.virtualization.tokenpassword = 'gftrh84dgel-245^ghHH'
Note that images stored in a Docker repository hosted by GitHub do not at present work with UDocker, as UDocker has not been updated to the latest version of the API.
Directories can be mounted from the host into the container by passing key-value pairs to the mounts option.
j.virtualization.mounts = {'/cvmfs': '/cvmfs'}
Plugin category: virtualization
-
image
¶ Link to the container image {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
tokenuser
¶ Deploy token username {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
tokenpassword
¶ Deploy token password {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
mounts
¶ Mounts to attempt from the host system. The key is the directory name on the host, and the value inside the container. If the directory is not available on the host, it will just be silently dropped from the list of mount points. {‘protected’: 0, ‘defvalue’: {‘/cvmfs’: ‘/cvmfs’}, ‘changable_at_resubmit’: 0}
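The "silently dropped" behaviour described for mounts can be sketched in plain Python (an illustration of the documented semantics, not Ganga's actual code; the missing directory in the example is hypothetical):

```python
import os

def resolve_mounts(mounts):
    """Sketch of mounts resolution: keep only mount points whose host
    directory exists; missing ones are silently dropped."""
    return {host: container for host, container in mounts.items()
            if os.path.isdir(host)}

# '/' exists on any POSIX host; the second entry points at a
# hypothetical missing directory and is dropped.
resolved = resolve_mounts({'/': '/', '/no/such/dir_xyz': '/data'})
```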
-
options
¶ A list of options to pass onto the virtualization command. {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
mode
¶ Mode of container execution {‘protected’: 0, ‘defvalue’: ‘P1’, ‘changable_at_resubmit’: 0}
-
-
class
Singularity
¶ The Singularity class can be used for either Singularity or Docker images. It requires that singularity is installed on the worker node.
For Singularity images you provide the image name and tag from Singularity Hub like
j = Job()
j.application = Executable(exe=File('my/full/path/to/executable'))
j.virtualization = Singularity("shub://image:tag")
Notice how the executable is given as a File object. This ensures that it is copied to the working directory and thus will be accessible inside the container.
The container can also be provided as a Docker image from a repository. The default repository is Docker Hub.
j.virtualization = Singularity(“docker://gitlab-registry.cern.ch/lhcb-core/lbdocker/centos7-build:v3”)
j.virtualization = Singularity(“docker://fedora:latest”)
Another option is to provide a GangaFile object that points to a Singularity image file. In that case the image file will be copied to the worker node. The first example is with an image located on a shared disk; this will be effective for running on a local backend or a batch system with a shared disk system.
imagefile = SharedFile('myimage.sif', locations=['/my/full/path/myimage.sif'])
j.virtualization = Singularity(image=imagefile)
A second example is with an image located in the Dirac Storage Element. This will be effective when using the Dirac backend.
imagefile = DiracFile('myimage.sif', lfn=['/some/lfn/path'])
j.virtualization = Singularity(image=imagefile)
If the image is a private image, the username and password of the deploy token can be given as in the example below. Look in the GitLab settings for how to set this up. The token only needs access to the images and nothing else.
j.virtualization.tokenuser = 'gitlab+deploy-token-123'
j.virtualization.tokenpassword = 'gftrh84dgel-245^ghHH'
Directories can be mounted from the host into the container by passing key-value pairs to the mounts option. If the directory is not available on the host, a warning will be written to stderr of the job and no mount will be attempted.
j.virtualization.mounts = {'/cvmfs': '/cvmfs'}
By default the container is started in Singularity with the --nohome option. Extra options can be provided through the options attribute. See the Singularity documentation for what is possible.
If the singularity binary is not available in the PATH on the remote node, or has a different name, its location can be given like
j.virtualization.binary = '/cvmfs/oasis.opensciencegrid.org/mis/singularity/current/bin/singularity'
Plugin category: virtualization
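Putting the binary, options, and image attributes together, the shape of the resulting command line can be sketched as follows (an illustrative assumption about how these attributes combine, not Ganga's actual implementation; build_singularity_command and the option values are hypothetical):

```python
def build_singularity_command(image, payload, binary='singularity', options=None):
    """Sketch: the virtualization binary, the exec subcommand, the
    --nohome default mentioned above, any extra options, the image,
    then the payload command to run inside the container."""
    return [binary, 'exec', '--nohome'] + list(options or []) + [image] + list(payload)

cmd = build_singularity_command(
    'docker://fedora:latest', ['./my_exe'],
    options=['--bind', '/cvmfs:/cvmfs'])
```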
-
image
¶ Link to the container image. This can either be a singularity URL or a GangaFile object {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
tokenuser
¶ Deploy token username {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
tokenpassword
¶ Deploy token password {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
mounts
¶ Mounts to attempt from the host system. The key is the directory name on the host, and the value inside the container. If the directory is not available on the host, it will just be silently dropped from the list of mount points. {‘protected’: 0, ‘defvalue’: {‘/cvmfs’: ‘/cvmfs’}, ‘changable_at_resubmit’: 0}
-
options
¶ A list of options to pass onto the virtualization command. {‘protected’: 0, ‘defvalue’: [], ‘changable_at_resubmit’: 0}
-
binary
¶ The virtualization binary itself. Can be an absolute path if required. {‘protected’: 0, ‘defvalue’: ‘singularity’, ‘changable_at_resubmit’: 0}
-
-
class
LSF
¶ LSF backend - submit jobs to the Load Sharing Facility.
Plugin category: backends
-
queue
¶ queue name as defined in your local Batch installation {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
extraopts
¶ extra options for Batch. See help(Batch) for more details {‘protected’: 0, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 1}
-
id
¶ Batch id of the job {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
exitcode
¶ Process exit code {‘protected’: 1, ‘defvalue’: None, ‘changable_at_resubmit’: 0}
-
actualqueue
¶ queue name where the job was submitted. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
actualCE
¶ hostname where the job is/was running. {‘protected’: 1, ‘defvalue’: ‘’, ‘changable_at_resubmit’: 0}
-
-
class
GangaList
¶ Documentation missing.
Plugin category: internal