soprano.hpc.submitter.submit#

Definition of the Submitter class

Base class for all Submitters to inherit from.

Classes

Submitter(name, queue, submit_script[, ...])

Submitter object

class soprano.hpc.submitter.submit.Submitter(name, queue, submit_script, max_jobs=4, check_time=10, max_time=3600, temp_folder=None, remote_workdir=None, remote_getfiles=['*.*'], ssh_timeout=1.0, continuation=False)[source]#

Bases: object

Submitter object

Template from which all specialised Submitters should be derived. These are meant to generate, submit and post-process any number of jobs on a queueing system, in the form of a background process running on a head node. It implements methods that are mostly meant to be overridden by child classes. The following methods define its core behaviour:

  1. next_job is the function that outputs the specification for each new job to submit. The specification should be a dict with two members, ‘name’ (a string) and ‘args’ (ideally a dict). If no more jobs are available it should return None;

  2. setup_job takes as arguments name, args and folder (a temporary one created independently) and is supposed to generate the input files for the job before submission. It returns a boolean, confirming that the setup went well; if False, the job will be skipped;

  3. check_job takes as arguments job ID, name, args and folder and should return a bool confirmation of whether the job has finished or not. By default it simply checks whether the job is still listed in the queue, however other checks can be implemented in its place;

  4. finish_job takes as arguments name, args and folder and takes care of the post-processing once a job is complete. Here meaningful data should be extracted and useful files copied to permanent locations, as the temporary folder will be deleted immediately afterwards. It returns nothing;

  5. start_run takes no arguments, executes at the beginning of a run;

  6. finish_run takes no arguments, executes at the end of a run;

  7. save_state takes no arguments, returns a dict. It is executed when continuation=True is used and a run terminates. It will allow the user to add class-specific data to the dictionary that is stored in the pickle file (in addition to the default, namely the list and info on currently running jobs). This should be used for example to store state information that is necessary for job generation. It should be composed of serialisable objects.

  8. load_state takes as arguments the loaded data in dictionary form. It should perform the reverse operation of save_state, grabbing the info and restoring the Submitter’s state to its previous condition.

In addition, the Submitter takes a template launching script which can be tagged with keywords, mainly <name> for the job name, as well as tags for any other arguments present in args. These will be replaced with the appropriate values when the script is submitted.
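
For instance, a minimal derived Submitter might look like the sketch below. The ExampleSubmitter name, the structures list of (label, data) pairs and the .in/.out file names are illustrative assumptions, not part of the base class:

    import os
    import shutil

    from soprano.hpc.submitter import Submitter

    class ExampleSubmitter(Submitter):
        """Illustrative Submitter that writes one input file per job."""

        def next_job(self):
            # self.structures is assumed to have been attached elsewhere
            # (e.g. through set_parameters, see below)
            if len(self.structures) == 0:
                return None  # No more jobs to submit
            label, data = self.structures.pop(0)
            return {'name': label, 'args': {'data': data}}

        def setup_job(self, name, args, folder):
            # Generate the input file that the submitted script will read
            with open(os.path.join(folder, name + '.in'), 'w') as f:
                f.write(args['data'])
            return True  # Returning False makes the Submitter skip the job

        def finish_job(self, name, args, folder):
            # Salvage the output before the temporary folder is deleted
            shutil.copy(os.path.join(folder, name + '.out'), os.getcwd())

A matching submit_script could then read, for example:

    #!/bin/bash
    my_program <name>.in > <name>.out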

Initialize the Submitter object

Args:
    name (str): name to be used for this Submitter (two Submitters with the same name can’t be launched in the same working directory)
    queue (QueueInterface): object describing the properties of the interface to the queue system in use
    submit_script (str): text of the script to use when submitting a job to the queue. All tags of the form <name> will be replaced with the job’s name, and all similar tags of the form <[arg]> will be replaced if the argument name is present in the job’s args dictionary
    max_jobs (Optional[int]): maximum number of jobs to submit at a given time. Default is 4
    check_time (Optional[float]): time in seconds between consecutive checks of the queue status and attempts to submit new jobs. Default is 10
    max_time (Optional[float]): time in seconds the Submitter will run for before shutting down. If set to zero, the thread won’t stop until killed with Submitter.stop
    temp_folder (Optional[str]): where to store the temporary folders for the calculations. By default it’s the current folder
    remote_workdir (Optional[str]): if present, use a directory on a remote machine, logging in via SSH. Must be in the format <host>:<path/to/directory>. The host must be defined in the user’s ~/.ssh/config file - check the docs for RemoteTarget for more information. It is possible to omit the colon and directory, in which case the home directory of the given host will be used; that is HEAVILY DISCOURAGED though. Best practice is to create an empty directory on the remote machine and use that, to avoid accidentally overwriting or deleting important files
    remote_getfiles (Optional[list(str)]): list of files to be downloaded from the remote copy of the job’s temporary directory. By default, all of them. Can be a list using specific names, wildcards etc. Filenames can also use the placeholder {name} to signify the job name, as well as any other element from the arguments
    ssh_timeout (Optional[float]): connection timeout in seconds (default is 1 second)
    continuation (Optional[bool]): if True, when the Submitter is stopped it will not terminate the current jobs; rather, it will store the list in a pickle file. If the Submitter is then run again from the same folder, it will “pick up from where it left off” and try recovering those jobs before restarting. If one wishes for additional values to be saved and restored, the save_state and load_state methods need to be defined
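
As a rough usage sketch (ExampleSubmitter is the illustrative subclass from above; GridEngine is assumed to be one of the preset QueueInterface constructors, and all names here are assumptions):

    from soprano.hpc.submitter import QueueInterface

    # Template submission script, tagged as described above
    example_script = 'my_program <name>.in > <name>.out\n'

    queue = QueueInterface.GridEngine()
    sub = ExampleSubmitter('example_sub', queue, example_script,
                           max_jobs=10,    # at most 10 jobs queued at once
                           check_time=30,  # poll the queue every 30 seconds
                           max_time=0)     # run until explicitly stopped

The daemon itself is then typically launched through the python -m soprano.hpc.submitter command line interface rather than interactively; see the Soprano documentation for the exact invocation.
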
_main_loop()[source]#

Main loop, run as a separate thread. Should not be overridden when inheriting from the class

_putjob_remote(njob)[source]#

Copy the files generated for a job to a remote work directory

add_signal(command, callback)[source]#

Add a signal listener to this submitter. Note that Unix systems only allow up to TWO user-defined signals to be specified.

Args:
    command (str): command that should be used to call this signal. This would be used as python -m soprano.hpc.submitter <command> <file> and will trigger the callback’s execution
    callback (function<self> => None): method of the user-defined Submitter class to use as a callback when the given signal is sent. Should accept and return nothing
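
For example, a custom ‘status’ command could be wired to a callback as in this sketch (the print_status method and its registration in start_run are illustrative assumptions):

    class ExampleSubmitter(Submitter):

        def print_status(self):
            # Callback: accepts nothing besides self and returns nothing
            print('Submitter is alive')

        def start_run(self):
            # Register the custom signal as the run begins
            self.add_signal('status', self.print_status)

The callback can then be triggered from the shell with python -m soprano.hpc.submitter status <file>.
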
check_job(job_id, name, args, folder)[source]#

Checks whether the given job is complete or not

finish_job(name, args, folder)[source]#

Performs completion operations on the job. At this point any relevant output files should be copied from ‘folder’ to their final destination, as the temporary folder itself will be deleted immediately afterwards

finish_run()[source]#

Operations to perform after the daemon thread stops running

load_state(loaded)[source]#

Replace attributes from loaded data in dictionary form

next_job()[source]#

Return a dictionary definition of the next job in line

remove_signal(command)[source]#

Remove a previously defined custom signal by its assigned command.

Args:
    command (str): command assigned to the signal handler to remove.

save_state()[source]#

Return a dictionary containing serialisable data to be saved from one run to the next
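
A matching pair of overrides might look like this sketch, storing the illustrative structures list used in the examples above (everything in the returned dict must be picklable):

    def save_state(self):
        # Saved alongside the default data when continuation=True
        return {'structures': self.structures}

    def load_state(self, loaded):
        # Reverse of save_state: restore the stored attributes
        self.structures = loaded['structures']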

set_parameters()[source]#

Set additional parameters. In this generic base class it takes no arguments, but specific implementations will use it to accept additional variables without overriding __init__.
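
For example, a subclass could receive its job list here instead of overriding __init__ (the structures argument is an illustrative assumption):

    def set_parameters(self, structures):
        # Attach class-specific data to the Submitter after construction
        # Usage: sub.set_parameters([('job1', 'INPUT 1'), ('job2', 'INPUT 2')])
        self.structures = list(structures)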

setup_job(name, args, folder)[source]#

Perform preparatory operations on the job

start_run()[source]#

Operations to perform when the daemon thread starts running

static stop(fname, subname)[source]#

Stop a Submitter process given its file name and Submitter name; returns False if this failed
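
For instance, assuming the Submitter was declared in a file named example_def.py under the name ‘example_sub’ (both names are illustrative):

    from soprano.hpc.submitter import Submitter

    # Returns False if no matching Submitter process could be stopped
    Submitter.stop('example_def.py', 'example_sub')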