LSF - Load Sharing Facility
The Center for Computational Science's oneSIS Linux cluster uses
Platform Computing's LSF (Load Sharing Facility) software. Please
familiarize yourself with the oneSIS cluster before continuing
here.
All jobs that require continuous CPU time must be run under LSF.
Any such job that is run without using the LSF queueing system will be
terminated.
Following are links and information on how to use the LSF queueing system:
Queues
BSUB directives in scripts
LSF commands
Example scripts
Multiple CPU Jobs
Commands
These are the four commands that you need to know how to use.
bsub - submits request.
bkill - deletes request.
bjobs - displays status of request(s).
bqueues - displays the available queue(s).
When logged in on the oneSIS cluster, type man followed by the command name (for example, man bsub) for more information.
BSUB Directives
These directives are placed at the top of the script and tell the LSF
system where to put output files and the amount of resources you
need. For more information, read the man page for bsub.
#BSUB -e filename      Where to place stderr messages from the job.
#BSUB -o filename      Where to place stdout messages from the job.
#BSUB -R "mem=800"     Amount of memory the job requires, in MB.
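As a rough illustration of where these directives go, the sketch below puts all three at the top of a small script. The file names myjob.err and myjob.out are hypothetical, and the memory value is simply the one shown above; adjust both to suit your job.

```sh
#!/bin/sh
# Hypothetical output file names; the memory value is the one from the list above.
#BSUB -e myjob.err
#BSUB -o myjob.out
#BSUB -R "mem=800"

# To the shell the #BSUB lines are plain comments; only bsub reads them.
# Everything below runs as an ordinary shell script on the execution host.
job_host=$(uname -n)
echo "Job running on $job_host"
```

Because the directives are comments, the same script can also be run directly with sh for a quick check before it is submitted.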
LSF Commands
These are the commands you will use to submit, check the status of, and
delete jobs.
bjobs -u all                      Displays the unfinished requests of all users.
bkill ##                          Deletes request ##, where ## is the job ID.
bsub -q queue-name < scriptname   Submits scriptname to the LSF system.
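The three commands above are typically used together. The following is a minimal sketch of that workflow, assuming bsub prints its usual confirmation line of the form "Job <1234> is submitted to queue <ccs_short>." The helper parse_job_id is a hypothetical name introduced here for illustration; it is not part of LSF.

```sh
#!/bin/sh
# Hypothetical helper: pull the numeric job ID out of a bsub confirmation
# line such as "Job <1234> is submitted to queue <ccs_short>."
parse_job_id() {
    printf '%s\n' "$1" | sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Only talk to LSF when bsub is on the PATH and the script exists.
if command -v bsub >/dev/null 2>&1 && [ -f my_script ]; then
    reply=$(bsub -q ccs_short < my_script)   # submit the request
    job_id=$(parse_job_id "$reply")
    if [ -n "$job_id" ]; then
        bjobs "$job_id"                      # display its status
        # bkill "$job_id"                    # uncomment to delete the request
    fi
fi
```

Keeping the job ID in a variable avoids retyping it for bjobs and bkill.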
Scripts and LSF
Example 1
% bsub < my_script
Where the file my_script contains:
#!/bin/sh
#BSUB -J single_cpu
#BSUB -q ccs_short
#BSUB -W 00:15
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -n 1
echo Start Job
date
pwd
./a.out
echo End Job
bsub will accept parameters both from the command line and from the
lines in the script file preceded by #BSUB. When a bsub
option is found both in the command line and in a script, the command line
specification takes precedence.
The -q ccs_short option specifies that this job will be spooled to the ccs_short queue. The -J single_cpu option
specifies a job name.
The -o %J.out option writes the job's stdout to a file named after the job ID
(LSF replaces %J with the job ID), and -e %J.err does the same for stderr.
The -W 00:15 option specifies that the job should terminate after running for 15 minutes.
The -n 1 option specifies that the job will need one CPU.
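To illustrate the precedence rule described above, the sketch below submits the same my_script but passes -W on the command line. Under that rule the job would get the command-line limit rather than the 00:15 written in the script. The 10-minute value is an arbitrary choice for this example.

```sh
#!/bin/sh
# Arbitrary example value: a 10-minute run limit (hours:minutes).
run_limit="00:10"

# my_script contains "#BSUB -W 00:15"; the -W given here takes precedence.
# All other #BSUB lines in the script (queue, job name, output files) still apply.
if command -v bsub >/dev/null 2>&1 && [ -f my_script ]; then
    bsub -W "$run_limit" < my_script
else
    echo "bsub or my_script not found; nothing submitted"
fi
```

This is handy for a one-off change, since the script file itself stays untouched.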
Links
The above examples were based on information available on the following
web sites:
Los Alamos National Laboratory
CERN
MIMAS