soprano.hpc.submitter.submit#
Definition of the Submitter class
Base class for all Submitters to inherit from.
Classes
Submitter(name, queue, submit_script, ...): Submitter object
- class soprano.hpc.submitter.submit.Submitter(name, queue, submit_script, max_jobs=4, check_time=10, max_time=3600, temp_folder=None, remote_workdir=None, remote_getfiles=['*.*'], ssh_timeout=1.0, continuation=False)[source]#
Bases: object
Submitter object
Template to derive all specialised Submitters. These are meant to generate, submit and post-process any number of jobs on a queueing system in the form of a background process running on a head node. It implements methods that should be mostly overridden by the child classes. The following methods define its core behaviour:
next_job is the function that outputs the specification for each new job to submit. The specification should be a dict with two members, ‘name’ (a string) and ‘args’ (ideally a dict). If no more jobs are available it should return None;
setup_job takes as arguments name, args and folder (a temporary one created independently) and is supposed to generate the input files for the job before submission. It returns a boolean, confirming that the setup went well; if False, the job will be skipped;
check_job takes as arguments job ID, name, args and folder and should return a bool confirmation of whether the job has finished or not. By default it simply checks whether the job is still listed in the queue, however other checks can be implemented in its place;
finish_job takes as arguments name, args and folder and takes care of the post processing once a job is complete. Here meaningful data should be extracted and useful files copied to permament locations, as the temporary folder will be deleted immediately afterwards. It returns nothing;
start_run takes no arguments, executes at the beginning of a run;
finish_run takes no arguments, executes at the end of a run.
save_state takes no arguments, returns a dict. It is executed when continuation=True is used and a run terminates. It will allow the user to add class-specific data to the dictionary that is stored in the pickle file (in addition to the default, namely the list and info on currently running jobs). This should be used for example to store state information that is necessary for job generation. It should be composed of serialisable objects.
load_state takes as arguments the loaded data in dictionary form. It should perform the reverse operation of save_state, grabbing the info and restoring the Submitter’s state to its previous condition.
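As an illustration, here is a minimal sketch of a subclass implementing the core methods above. The class name, the input/output file names, the _pending/_results attributes and the 'x' argument are all placeholders invented for this example, not part of Soprano.

```python
import os

from soprano.hpc.submitter.submit import Submitter


class ToySubmitter(Submitter):
    """Hypothetical Submitter running a fixed batch of ten toy calculations."""

    def start_run(self):
        # Build the list of jobs still to do (placeholder specifications)
        self._pending = [{'name': 'job_{0}'.format(i), 'args': {'x': i}}
                         for i in range(10)]
        self._results = []

    def next_job(self):
        # Return the next job specification, or None when there are no more
        if not self._pending:
            return None
        return self._pending.pop(0)

    def setup_job(self, name, args, folder):
        # Write whatever input files the job needs into its temporary folder
        with open(os.path.join(folder, 'input.txt'), 'w') as f:
            f.write('x = {0}\n'.format(args['x']))
        return True  # returning False would skip this job

    def finish_job(self, name, args, folder):
        # Harvest the output before the temporary folder is deleted
        out_path = os.path.join(folder, 'output.txt')
        if os.path.isfile(out_path):
            with open(out_path) as f:
                self._results.append((name, f.read().strip()))
```

check_job is not overridden here, so the default behaviour (checking whether the job is still listed in the queue) is used.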
In addition, the Submitter takes a template launching script which can be tagged with keywords: <name> for the job name, plus similar tags for any other arguments present in args. These will be replaced with the appropriate values when the script is submitted.
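For example, such a script could be defined as a Python string like the one below. The SLURM-style directives and 'my_program' are placeholders chosen purely for illustration; use whatever your scheduler expects.

```python
# A possible submit_script; <name> and <x> are tags replaced at submission
submit_script = """#!/bin/bash
#SBATCH --job-name=<name>
#SBATCH --time=00:30:00
#SBATCH --ntasks=1

# <name> is replaced by the job's name; a tag like <x> would be replaced
# by args['x'] if 'x' is present in the job's args dictionary
my_program input.txt > output.txt
"""
```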
Initialize the Submitter object
Args:
    name (str): name to be used for this Submitter (two Submitters with the same name can’t be launched in the same working directory)
    queue (QueueInterface): object describing the properties of the interface to the queue system in use
    submit_script (str): text of the script to use when submitting a job to the queue. All tags of the form <name> will be replaced with the job’s name, and all similar tags of the form <[arg]> will be replaced if the argument name is present in the job’s args dictionary
    max_jobs (Optional[int]): maximum number of jobs to submit at a given time. Default is 4
    check_time (Optional[float]): time in seconds between consecutive checks for the queue status and attempts to submit new jobs. Default is 10
    max_time (Optional[float]): time in seconds the Submitter will run for before shutting down. If set to zero the thread won’t stop until killed with Submitter.stop.
    temp_folder (Optional[str]): where to store the temporary folders for the calculations. By default it’s the current folder.
    remote_workdir (Optional[str]): if present, uses a directory on a remote machine by logging in via SSH. Must be in the format <host>:<path/to/directory>. Host must be defined in the user’s ~/.ssh/config file - check the docs for RemoteTarget for more information. It is possible to omit the colon and directory, which will use the home directory on the given host; that is HEAVILY DISCOURAGED though. Best practice is to create an empty directory on the remote machine and use that, to avoid accidental overwriting/deleting of important files.
    remote_getfiles (Optional[list(str)]): list of files to be downloaded from the remote copy of the job’s temporary directory. By default, all of them. Can be a list using specific names, wildcards etc. Filenames can also use the placeholder {name} to signify the job name, as well as any other element from the arguments.
    ssh_timeout (Optional[float]): connection timeout in seconds (default is 1 second)
    continuation (Optional[bool]): if True, when the Submitter is stopped it will not terminate the current jobs; rather, it will store the list in a pickle file. If the Submitter is run from the same folder it will “pick up from where it left off” and try recovering those jobs, then restart. If one wishes for additional values to be saved and restored, the save_state and load_state methods need to be defined.
- _main_loop()[source]#
Main loop run as separate thread. Should not be edited when inheriting from the class
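Putting the constructor arguments together, a hedged usage sketch might look like the following. It reuses the ToySubmitter and submit_script from the earlier examples; the QueueInterface import path and the LSF() helper constructor are assumptions that should be checked against the QueueInterface documentation.

```python
from soprano.hpc.submitter.queues import QueueInterface

# Assumption: QueueInterface offers predefined constructors such as LSF();
# otherwise build one by hand as described in its own documentation.
queue = QueueInterface.LSF()

sub = ToySubmitter('toy_run',            # unique name in this working directory
                   queue=queue,
                   submit_script=submit_script,
                   max_jobs=4,           # at most 4 jobs queued at once
                   check_time=10,        # poll the queue every 10 s
                   max_time=0,           # run until stopped explicitly
                   continuation=True)    # store running jobs on stop
```

How the Submitter is then launched as a background process, and stopped again via Submitter.stop, is handled through the soprano.hpc.submitter machinery; check the Soprano documentation for the recommended launch procedure.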
- add_signal(command, callback)[source]#
Add a signal listener to this submitter. Unix systems only allow for up to TWO user-defined signals to be specified.
Args:
    command (str): command that should be used to call this signal. This would be used as: python -m soprano.hpc.submitter <command> <file> and will trigger the callback’s execution
    callback (function<self> => None): method of the user-defined Submitter class to use as a callback when the given signal is sent. Should accept and return nothing.
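A hedged sketch of wiring up a custom signal follows. The 'report' command name and the callback body are invented for illustration, and whether the unbound class method (as suggested by the function<self> signature above) or the bound method is expected should be checked against the source.

```python
from soprano.hpc.submitter.submit import Submitter


class ToySubmitter(Submitter):
    # ... core methods as sketched earlier ...

    def print_report(self):
        # Callback: accepts nothing besides self and returns nothing
        print('{0} jobs completed so far'.format(len(self._results)))


# 'sub' is the ToySubmitter instance constructed earlier
sub.add_signal('report', ToySubmitter.print_report)
```

Per the Args description above, python -m soprano.hpc.submitter report <file> would then trigger the callback while the Submitter is running, and sub.remove_signal('report') would unregister it again.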
- finish_job(name, args, folder)[source]#
Performs completion operations on the job. At this point any relevant output files should be copied from ‘folder’ to their final destination, as the temporary folder itself will be deleted immediately afterwards.
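As an alternative to the in-memory harvesting shown earlier, an override could preserve a whole output file; the 'output.txt' name and the 'results' destination directory are placeholders.

```python
import os
import shutil

from soprano.hpc.submitter.submit import Submitter


class FileCopySubmitter(Submitter):
    # ... next_job, setup_job etc. omitted ...

    def finish_job(self, name, args, folder):
        # Copy the raw output to a permanent location before 'folder' is wiped
        os.makedirs('results', exist_ok=True)
        src = os.path.join(folder, 'output.txt')
        if os.path.isfile(src):
            shutil.copy(src, os.path.join('results', '{0}.out'.format(name)))
```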
- remove_signal(command)[source]#
Remove a previously defined custom signal by its assigned command.
Args:
    command (str): command assigned to the signal handler to remove.
- save_state()[source]#
Return a dictionary containing serialisable data to be saved from one run to the next
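Continuing the hypothetical ToySubmitter, a sketch of a matching save_state/load_state pair for continuation runs could look like this; the stored keys are arbitrary and everything returned must be serialisable.

```python
from soprano.hpc.submitter.submit import Submitter


class ToySubmitter(Submitter):
    # ... core methods as sketched earlier ...

    def save_state(self):
        # Everything returned here must be picklable; it is stored alongside
        # the default information about currently running jobs
        return {'pending': self._pending, 'results': self._results}

    def load_state(self, loaded):
        # Reverse of save_state: restore whatever was stored at the last stop
        self._pending = loaded.get('pending', [])
        self._results = loaded.get('results', [])
```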