
Introduction

This document describes how to run your simulations on the CCS oneSIS cluster.

Model

The oneSIS cluster is organized as follows:

                               bsub 
                            /-----------------> compute node 1 --> cpu 1
               ssh         /                             |-------> cpu 2
your computer ----->  ares -------------------> compute node 2 --> cpu 3
                       |  \                              |-------> cpu 4
                       |   \------------------> compute node 3 --> cpu 5
                       |    \                            |-------> cpu 6
                       |     \ ...                                 ....
                       |      \---------------> compute node n --> cpu 2n-1
                       |                                 |-------> cpu 2n
                       |                                             |
                       |--------------|------------------------------|
                           nfs        V                    nfs
                                   /scratch00

As the diagram shows, you cannot run simulations directly on the compute nodes. To use the cluster, ssh into the head node ares, then submit your jobs with the LSF queuing system command bsub. Both ares and the compute nodes share /scratch00 over NFS.

The bsub command allocates compute nodes and runs your programs on their CPUs. Each compute node can run two or four simulations, depending on the number of CPU cores it has.

NOTE: THE COMPUTE NODES CAN ONLY SEE /scratch00!! Keep this in mind when writing your scripts.
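One way to keep this restriction in mind is to check paths in your scripts before a job uses them. The following is a minimal sketch, not part of the cluster software: the function name is_on_scratch and the example paths are hypothetical, and the check is a plain prefix test that does not resolve symlinks or "..".

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given path lies under /scratch00.
# This is a string-prefix test; it does not resolve symlinks or "..".
is_on_scratch() {
    case "$1" in
        /scratch00/*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Example: flag an input file that the compute nodes will not be able to see.
# Both paths are illustrative only.
for f in /scratch00/john/run1/data.txt /home/john/data.txt; do
    if is_on_scratch "$f"; then
        echo "ok:      $f"
    else
        echo "warning: $f is not under /scratch00"
    fi
done
```

A check like this only catches paths your script names explicitly; files opened from inside the simulation itself still need to live under /scratch00.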

Getting Started

The compute nodes have no shared libraries and no executables, so any executable you run on them must be statically linked. Here is a sample workflow for compiling a program and preparing it to run on the oneSIS cluster:

  1. Choose a compiler.
    >module load intel
    
  2. Compile the code statically.
    >icc -static program.c -o a.out
    
  3. Now that you have your statically linked executable a.out, copy it and any data needed to start the run to the shared file system /scratch00.
    >mkdir -p /scratch00/john/run1
    >cp ./a.out /scratch00/john/run1/
    >cp ./data.txt /scratch00/john/run1/
    
  4. Now that your executable and all your data are in /scratch00, use the LSF queuing system to submit your job. For more details, see the LSF page.
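Steps 3 and 4 can be wrapped in a small script. The sketch below is only an illustration under stated assumptions: stage_run and SCRATCH_ROOT are hypothetical names introduced here, and the submission line uses only the standard bsub options -o and -e (with %J expanding to the job ID). Queue names and other site-specific options are not shown; check the LSF page for those.

```shell
#!/bin/sh
# Sketch of steps 3 and 4. SCRATCH_ROOT and stage_run are illustrative names,
# not part of the cluster software.
SCRATCH_ROOT="${SCRATCH_ROOT:-/scratch00}"

# stage_run USER RUN FILE...
# Copies the given files into $SCRATCH_ROOT/USER/RUN and prints that directory.
stage_run() {
    user=$1
    run=$2
    shift 2
    dir="$SCRATCH_ROOT/$user/$run"
    mkdir -p "$dir" || return 1
    cp "$@" "$dir"/ || return 1
    echo "$dir"
}

# Typical use on ares (left as comments so that loading this file has no side effects):
#   rundir=$(stage_run john run1 ./a.out ./data.txt)
#   cd "$rundir" && bsub -o out.%J -e err.%J ./a.out
```

Submitting from inside the run directory means the job's output and error files also land under /scratch00, where both ares and the compute nodes can see them.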
Tulane University
201 Lindy Boggs Center
Computational Science
6823 St. Charles Ave.
New Orleans, LA 70118
(504)862-8391 ccs@tulane.edu