IIHE local cluster
<center><span style="font-size: 300%;">'''Deprecated''', see [http://t2bwiki.iihe.ac.be T2B wiki]</span></center>


== Overview ==

The cluster is composed of 4 machine types&nbsp;:

*User Interfaces (UI)<br>
This is the cluster front-end&nbsp;: to use the cluster, you need to log into one of these machines<br>


Servers&nbsp;: ui01, ui02<br>


*Computing Element (CE)<br>  
This server is the core of the batch system&nbsp;: it runs submitted jobs on the worker nodes<br>


Servers&nbsp;: ce<br>


*Worker Nodes (WN)<br>  
These are the workhorses of the cluster&nbsp;: they run jobs and report their status back to the CE<br>


Servers&nbsp;: slave*<br>


*Storage Elements<br>  
These are the memory of the cluster&nbsp;: they store data, software, etc.<br>


Servers&nbsp;: datang (/data, /software), lxserv (/user), x4500 (/ice3)
 
 
 
 
 
[[Image:Batch.png|center]]


== How to connect ==


To connect to the cluster, use your IIHE credentials (the same as for the wifi)<br>
[[Image:Rrdns.png|center]]
<pre>ssh username@icecube.iihe.ac.be
</pre>
TIP&nbsp;: ''icecube.iihe.ac.be'' &amp; ''lxpub.iihe.ac.be'' point automatically to an available UI (ui01, ui02, ...)<br>
 


<br>


After a successful login, you'll see this message&nbsp;:<br>
<pre>==========================================
Welcome on the IIHE ULB-VUB cluster

Cluster status  http://ganglia.iihe.ac.be
Documentation   http://wiki.iihe.ac.be/index.php/Cluster
IT Help         support-iihe@ulb.ac.be
==========================================

username@uiXX:~$
</pre>
Your default working directory is your home folder (/user/username), which is shared across the whole cluster<br>


<br>
== Directory Structure ==

Here is a description of the most useful directories<br>

=== /user/{username} ===

Your home folder<br>

=== /data ===

Main data repository<br>

==== /data/user/{username} ====

User data folder

==== /data/ICxx ====

IceCube datasets

=== /software ===

The local software area<br>

=== /ice3 ===

The old software area. We strongly recommend building your tools in the /software directory instead.

=== /cvmfs ===

Centralised CVMFS software repository for IceCube and CERN (see [[IceCube_Software_Cluster#CVMFS|CVMFS]])


== Batch System ==

=== Queues ===

The cluster is divided into queues&nbsp;:


{| width="1064" cellspacing="1" cellpadding="5" border="1" align="center"
|-
! scope="col" | <br>
! scope="col" | any
! scope="col" | lowmem
! scope="col" | standard
! scope="col" | highmem
! scope="col" | express
! scope="col" | gpu
|-
! scope="row" | Description
| nowrap="nowrap" align="center" | default queue, all available nodes (except GPUs &amp; express)<br>
| nowrap="nowrap" align="center" | 2 GB RAM<br>
| nowrap="nowrap" align="center" | 3 GB RAM<br>
| nowrap="nowrap" align="center" | 4 GB RAM<br>
| nowrap="nowrap" align="center" | Limited walltime<br>
| nowrap="nowrap" align="center" | dedicated GPU queue<br>
|-
! scope="row" | CPUs (job slots)
| nowrap="nowrap" align="center" | 494<br>
| nowrap="nowrap" align="center" | 88<br>
| nowrap="nowrap" align="center" | 384<br>
| nowrap="nowrap" align="center" | 8<br>
| nowrap="nowrap" align="center" | 24<br>
| nowrap="nowrap" align="center" | 14<br>
|-
! scope="row" | Walltime default/limit
| nowrap="nowrap" align="center" colspan="4" | 144 hours (6 days) / 240 hours (10 days)
| nowrap="nowrap" align="center" | 15 hours / 15 hours<br>
| nowrap="nowrap" align="center" | 50 hours / 50 hours<br>
|-
! scope="row" | Memory default/limit
| nowrap="nowrap" align="center" | 2 GB<br>
| nowrap="nowrap" align="center" | 2 GB<br>
| nowrap="nowrap" align="center" | 3 GB<br>
| nowrap="nowrap" align="center" | 4 GB<br>
| nowrap="nowrap" align="center" | 3 GB<br>
| nowrap="nowrap" align="center" | 6 GB<br>
|}


=== Job submission ===


To submit a job, use the '''qsub''' command&nbsp;:
<pre>qsub myjob.sh
</pre>
''OPTIONS''
 
*-q queueName&nbsp;: choose the queue (default: any)

*-N jobName&nbsp;: name of the job

*-I&nbsp;: (capital i) run in interactive mode

*-M mailaddress&nbsp;: set the mail address (used in conjunction with -m)&nbsp;: MUST be @ulb.ac.be or @vub.ac.be

*-m [a|b|e]&nbsp;: send mail on job status change (a = aborted, b = begin, e = end)

*-l&nbsp;: resource requests (e.g. -l walltime=24:00:00)
 
<br>
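The same options can also be embedded in the job script as directives, so the script documents its own requirements. A minimal sketch, assuming a Torque/PBS-style batch system (the job name, queue, and walltime values are illustrative):

```shell
#!/bin/bash
#PBS -N my_analysis            # job name (illustrative)
#PBS -q highmem                # target queue from the table above
#PBS -l walltime=12:00:00      # resource request: 12 hours of walltime
#PBS -m ae                     # mail on abort (a) and end (e)

# The #PBS lines are shell comments, so the script also runs standalone.
cd "${PBS_O_WORKDIR:-$PWD}"    # batch jobs start in $HOME; return to the submit dir
NODE=$(hostname)
echo "Running on $NODE"
```

Submit it with ''qsub myjob.sh''; options given on the command line override the directives in the script.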
 
=== Job management ===
 
To see all jobs (running / queued), you can use the '''qstat''' command or go to the [http://ganglia.iihe.ac.be/addons/job_monarch/?c=nodes JobMonArch] page
<pre>qstat
</pre>
''OPTIONS''
 
* -u username&nbsp;: list only jobs submitted by username
 
* -n&nbsp;: show nodes where jobs are running
 
* -q&nbsp;: show the distribution of jobs across queues
 
<br>
 
== Backup ==
 
/user, /data and /ice3 use a snapshot backup system.
<br>
The snapshots are located in the following directories (the number is how many snapshots of each period are kept)&nbsp;:


*/backup_user
**daily : 5
**weekly : 4
**monthly : 3


*/backup_data
**daily : 6


*/backup_ice3
**daily : 5
**weekly : 4
**monthly : 3


Each directory contains a snapshot of a specific period.
To restore files, simply copy them from the relevant snapshot directory back to the desired location.
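Restoring is therefore an ordinary copy. The sketch below assumes a snapshot layout like ''/backup_user/daily.0'' for the most recent daily snapshot (the subdirectory names are assumptions; check /backup_user for the real ones), and demonstrates the copy on a mock tree so it can be run anywhere:

```shell
#!/bin/bash
# Mock stand-ins for /backup_user/daily.0 and /user/username, so the
# restore step can be demonstrated without the real cluster paths.
SNAP=$(mktemp -d)      # pretend this is /backup_user/daily.0
HOME_DIR=$(mktemp -d)  # pretend this is /user/username

mkdir -p "$SNAP/username/thesis"
echo "precious data" > "$SNAP/username/thesis/notes.txt"

# The actual restore: copy the file out of the snapshot.
cp -a "$SNAP/username/thesis/notes.txt" "$HOME_DIR/notes.txt"
RESTORED=$(cat "$HOME_DIR/notes.txt")
echo "$RESTORED"
```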


== Useful links ==
[http://ganglia.iihe.ac.be Ganglia Monitoring]&nbsp;: server status


[http://ganglia.iihe.ac.be/addons/job_monarch/?c=nodes JobMonArch]&nbsp;: jobs overview

Latest revision as of 15:03, 16 September 2016
