Scheduling Policy

The scheduling policy on KIDS is designed to facilitate three primary use cases. During weekdays, a reservation for interactive and development jobs ensures that users who need to run short test jobs have quick access to compute nodes. Capacity jobs, which require more walltime but less user interaction, can use the whole machine (in aggregate) on nights and weekends, as well as a portion of the machine during weekdays. On Tuesdays, KIDS is taken down for preventative maintenance (PM) if necessary, after which capability (full-machine) jobs are run. If there is demand, capability jobs may be run on Tuesday even when there is no maintenance.

Use Case | Time Frame | Nodes available to this class | Max Job Time | Max Job Size (nodes)
Development | 9am–9pm ET M–F | 60 Nodes | 4 hours | Up to 60 nodes
Capability | | Exclusive access to compute nodes, 24 nodes minimum | | All available nodes
Preventative Maintenance (PM) | Tuesdays beginning at 8am ET | N/A | N/A | N/A

Interactive and Development Jobs

On Monday and Wednesday through Friday, from 9:00 am to 9:00 pm ET, a reservation of 48 nodes is set up to give users fast turnaround for small test jobs, such as those needed for development and debugging. To ensure availability, other jobs are not allowed to run on this reservation.

Jobs that request 0–12 nodes for 0–4 hours are considered "development" jobs and may run in the 48-node reservation. If the development reservation is already full or not in place (on the weekend, for example), a development job will run on the first available nodes.

Note: this reservation is intended for development use, in particular for jobs where user interaction is critical between each job or (small) set of jobs. Production jobs may be able to fit on this reservation, but running them there will be regarded as circumventing the scheduling policy.
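The development-job rule above can be sketched in a few lines of Python. This is only an illustration of the policy as written, not the scheduler's actual configuration: the function names, the free_reserved_nodes parameter, and the assumption that the caller passes the current time already in Eastern Time are all hypothetical.

```python
from datetime import datetime, time

# Limits taken from the policy text; all names here are hypothetical.
DEV_MAX_NODES = 12
DEV_MAX_HOURS = 4.0
# datetime.weekday(): Monday=0 ... Sunday=6. Tuesday (1) is excluded.
RESERVATION_DAYS = {0, 2, 3, 4}
RESERVATION_START = time(9, 0)   # 9:00 am ET
RESERVATION_END = time(21, 0)    # 9:00 pm ET


def is_development_job(nodes, walltime_hours):
    """True if the request fits the 0-12 node, 0-4 hour development limits."""
    return 0 <= nodes <= DEV_MAX_NODES and 0 <= walltime_hours <= DEV_MAX_HOURS


def reservation_active(now_et):
    """True if the development reservation is in place at time now_et (assumed ET)."""
    return (now_et.weekday() in RESERVATION_DAYS
            and RESERVATION_START <= now_et.time() < RESERVATION_END)


def placement(nodes, walltime_hours, now_et, free_reserved_nodes):
    """Describe where a job would run under the policy as described."""
    if (is_development_job(nodes, walltime_hours)
            and reservation_active(now_et)
            and nodes <= free_reserved_nodes):
        return "development reservation"
    # Reservation full, not in place, or the job is not a development job.
    return "first available nodes"
```

For example, an 8-node, 2-hour request made on a Monday morning with free reserved nodes would land in the reservation, while the same request on a Saturday would fall through to the first available nodes.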

Capability

Capability jobs are run weekly after the maintenance period, which starts at 9:00 am ET on Tuesday. If there is no need for maintenance, capability jobs may start immediately at 9:00 am. Scheduling capability jobs in this fixed weekly window means the system does not need to "drain the queue" to free up enough resources to start the large jobs.

Capability jobs may take up to 48 hours, but if you plan to run for more than 4 hours in a week, please queue the job and contact us at help@xsede.org with the job ID, how long you expect it to run, and what it is doing.

Fair Share

A single user or project could dominate the system by submitting a large number of jobs. To prevent this, a fair share strategy is used: the priority given to a job takes into account the jobs recently run by that user or project. Jobs from projects that have consumed a significant amount of processing time will have lower priority than jobs from projects that have not run many jobs recently. Thus, even if users from one project submit a large number of jobs, other users can still get access to a portion of the machine.
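The general idea of fair share can be illustrated with a small Python sketch. The policy does not state the formula the KIDS scheduler uses, so everything below is an assumption made for illustration: the function names, the per-window usage list, the decay factor, and the linear penalty are all hypothetical.

```python
def decayed_usage(usage_per_window, decay=0.5):
    """Sum recent usage, counting older windows less.

    usage_per_window[0] is the most recent window (e.g. node-hours used);
    each older window is multiplied by a further factor of `decay`.
    """
    return sum(u * decay ** age for age, u in enumerate(usage_per_window))


def fair_share_priority(base_priority, usage_per_window, weight=1.0, decay=0.5):
    """Lower a job's priority in proportion to its project's recent usage."""
    return base_priority - weight * decayed_usage(usage_per_window, decay)


# A project that ran a lot recently versus one that ran very little.
heavy = fair_share_priority(1000, [400, 200])
light = fair_share_priority(1000, [0, 20])
```

With these made-up numbers, the lightly used project's job ends up with the higher priority, which is the behaviour the policy describes: heavy recent usage pushes a project's new jobs down the queue without blocking them outright.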