
LSF - Load Sharing Facility

The Center for Computational Science's oneSIS Linux cluster uses Platform Computing's LSF (Load Sharing Facility) software. Please familiarize yourself with the oneSIS cluster before continuing here.

All jobs that require continuous CPU time must be run under LSF. Any such job that is run outside the LSF queueing system will be terminated. The following links and information explain how to use the LSF queueing system:

Queues
BSUB directives in scripts
LSF commands
Example scripts
Multiple CPU Jobs


Commands
These are the four commands that you need to know how to use.

bsub - submits request.
bkill - deletes request.
bjobs - displays status of request(s).
bqueues - displays the available queue(s).

When logged in on the oneSIS cluster, type man followed by the command name (for example, man bsub) for more information.


BSUB Directives
These directives are placed at the top of the script, each on a line beginning with #BSUB. They tell the LSF system where to put output files and what resources the job needs. For more information, read the man page for bsub.

#BSUB -e filename Where to place stderr messages from job.
#BSUB -o filename Where to place stdout messages from job.
#BSUB -R "mem=800" Amount of memory the job requires in MB.
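As a minimal sketch of how these directives sit at the top of a job script, the following writes a small script file and lists its #BSUB lines. The file names mem_job.sh, mem_job.out and mem_job.err are placeholders chosen for illustration, and ./a.out stands in for your own program.

```shell
#!/bin/sh
# Write a small job script whose header carries the three directives above.
# mem_job.sh, mem_job.out and mem_job.err are placeholder names.
cat > mem_job.sh <<'EOF'
#!/bin/sh
#BSUB -o mem_job.out
#BSUB -e mem_job.err
#BSUB -R "mem=800"

./a.out
EOF

# List the directive lines that bsub will read from the script.
grep '^#BSUB' mem_job.sh
```

On the cluster, a script like this would then be submitted with bsub < mem_job.sh.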


LSF Commands
These are the commands you will use to submit, check the status of, and delete jobs.

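bjobs -u all Displays the unfinished (pending, running and suspended) requests of all users.

bkill ## Deletes request number ##, where ## is the job ID reported by bsub and bjobs.

bsub -q queue-name < scriptname Submits scriptname to the queue queue-name of the LSF system.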
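bjobs and bkill both work with the job ID that bsub reports when a request is submitted. The sketch below shows one way to capture that ID in a script. It assumes bsub prints its usual confirmation line of the form "Job <ID> is submitted to queue <name>."; extract_job_id is a hypothetical helper, not an LSF command, and 1234 is a made-up ID.

```shell
#!/bin/sh
# extract_job_id is a hypothetical helper, not part of LSF.
# It reads bsub's confirmation line on stdin and prints only the job ID.
extract_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Sample line in the format bsub is assumed to print; 1234 is a made-up ID.
sample='Job <1234> is submitted to queue <ccs_short>.'
job_id=$(printf '%s\n' "$sample" | extract_job_id)

echo "bjobs $job_id   # check the status of this request"
echo "bkill $job_id   # delete this request"
```

On the cluster, the sample line would be replaced by a real submission, for example job_id=$(bsub -q ccs_short < my_script | extract_job_id).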


Scripts and LSF

Example 1

% bsub < my_script

Where the file my_script contains:
#!/bin/sh
#BSUB -J single_cpu
#BSUB -q ccs_short
#BSUB -W 00:15
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -n 1

echo Start Job
date
pwd
./a.out
echo End Job

bsub will accept parameters both from the command line and from the lines in the script file preceded by #BSUB. When a bsub option is found both in the command line and in a script, the command line specification takes precedence.
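The precedence rule can be illustrated with the job name. In this sketch the script file asks for the name single_cpu, while the command line asks for renamed_job, so the job would be submitted as renamed_job. The name renamed_job is invented for illustration, and the bsub call is guarded so that the sketch only prints the command on a machine without LSF.

```shell
#!/bin/sh
# A cut-down copy of my_script; its header names the job single_cpu.
cat > my_script <<'EOF'
#!/bin/sh
#BSUB -J single_cpu
#BSUB -q ccs_short
#BSUB -W 00:15

./a.out
EOF

# The -J on the command line takes precedence over "#BSUB -J" in the file.
# renamed_job is a made-up name used only for this illustration.
submit_cmd='bsub -J renamed_job < my_script'

if command -v bsub >/dev/null 2>&1; then
    sh -c "$submit_cmd"
else
    echo "LSF not found here; on the cluster you would run: $submit_cmd"
fi
```

The other #BSUB lines in the file (-q and -W here) still apply, because the command line does not mention them.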

The -q ccs_short option specifies that this job will be spooled to the ccs_short queue. The -J single_cpu option specifies a job name. The -o %J.out option writes the job's stdout to a file named after the job ID, which LSF substitutes for %J; the -e %J.err option does the same for stderr. The -W 00:15 option sets a run limit in hours:minutes, so the job will be terminated after running for 15 minutes. The -n 1 option specifies that the job needs one CPU.


Links

The examples above are based on information available from the following web sites:

Los Alamos National Laboratory

CERN

MIMAS

Tulane University
201 Lindy Boggs Center
Computational Science
6823 St. Charles Ave.
New Orleans, LA 70118
(504)862-8391 ccs@tulane.edu