Introduction
This document describes how to run your simulations on the CCS oneSIS
cluster.
Model
The oneSIS cluster is organized as follows:
                                 bsub
                           /-----------------> compute node 1 --> cpu 1
               ssh        /                             |-------> cpu 2
your computer -----> ares -------------------> compute node 2 --> cpu 3
                      |  \                              |-------> cpu 4
                      |   \------------------> compute node 3 --> cpu 5
                      |    \                            |-------> cpu 6
                      |     \                  ...                ....
                      |      \---------------> compute node n --> cpu 2n-1
                      |                                 |-------> cpu 2n
                      |                                             |
                      |--------------|------------------------------|
                     nfs             V                             nfs
                                /scratch00
As the diagram above shows, you cannot run simulations directly on the
compute nodes. To use the cluster, ssh into the head node ares, then
submit your jobs using the LSF queuing system command bsub. Both ares
and the compute nodes share /scratch00 through NFS.
The program bsub allocates compute nodes and runs programs on the
compute CPUs. Each compute node can run two or four simulations,
depending on the number of CPU cores it has.
NOTE: The compute nodes can see ONLY /scratch00! Keep this in mind when writing your scripts.
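Because the compute nodes see only /scratch00, a script can check its paths before a job is submitted. The following is a minimal sketch, not part of the cluster's tooling: the function name on_scratch is hypothetical, and the test is purely textual, so it does not check that the file exists and does not resolve relative paths or ".." components.

```shell
# Hypothetical helper: succeed only if the given absolute path is
# /scratch00 itself or lies beneath it. This is a string comparison only.
on_scratch() {
    case "$1" in
        /scratch00|/scratch00/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: warn about an input path the compute nodes could not see.
input=/scratch00/john/run1/data.txt
if on_scratch "$input"; then
    echo "ok: $input is under /scratch00"
else
    echo "error: $input is not under /scratch00" >&2
fi
```

Calling on_scratch on every input and output path at the top of a job script catches a stray home-directory path on ares rather than on a compute node.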
Getting Started
The compute nodes have no shared libraries and no executables, so any
executable you run on them must be statically linked. Here is a sample
compile for the oneSIS cluster:
- Choose a compiler.
>module load intel
- Compile the code statically
>icc -static program.c -o a.out
- Now that you have your statically linked executable a.out,
copy the executable and any data you need to start the run
to the shared file system /scratch00.
>mkdir -p /scratch00/john/run1
>cp ./a.out /scratch00/john/run1/
>cp ./data.txt /scratch00/john/run1/
- Now that your executable and all your data are in
/scratch00, use the LSF queuing system
to submit your job.
For more details,
see the LSF page.
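The steps above can be tied together in a small script run on ares. This is only a sketch under stated assumptions: RUN_DIR and EXE reuse the example names from the steps above, the out.%J and err.%J file names are arbitrary, and the helper names is_static and submit_cmd are hypothetical. The cluster's LSF page may call for additional bsub options, such as a queue name, that are not shown here. The script prints the bsub command instead of running it, so you can review it first.

```shell
#!/bin/sh
# Sketch only: RUN_DIR, EXE, and the out/err file names are illustrative.
RUN_DIR=/scratch00/john/run1
EXE=a.out

# Succeed if file(1) describes the executable as statically linked.
is_static() {
    file -L "$1" 2>/dev/null | grep -q 'statically linked'
}

# Print a command that changes to the run directory and submits the job.
# LSF replaces %J in the -o and -e file names with the job ID.
submit_cmd() {
    printf 'cd %s && bsub -o out.%%J -e err.%%J ./%s\n' "$1" "$2"
}

if is_static "$RUN_DIR/$EXE"; then
    # Review the printed command, then run it on ares to submit the job.
    submit_cmd "$RUN_DIR" "$EXE"
else
    echo "warning: $RUN_DIR/$EXE is missing or not statically linked" >&2
fi
```

Submitting from inside the run directory keeps the job's working directory and its output files on /scratch00, where the compute nodes can reach them. After submission, the LSF command bjobs lists your pending and running jobs.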