Cluster: Difference between revisions
No edit summary |
|||
(25 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
IIHE local cluster | IIHE local cluster | ||
<center><span style="font-size: 300%;">'''Deprecated''', see [http://t2bwiki.iihe.ac.be T2B wiki]</span></center> | |||
== Overview == | == Overview == | ||
Line 28: | Line 30: | ||
Servers : datang (/data, /software), lxserv (/user), x4500 (/ice3) | Servers : datang (/data, /software), lxserv (/user), x4500 (/ice3) | ||
[[Image:Batch.png|center]] | |||
== How to connect<br> == | == How to connect<br> == | ||
Line 36: | Line 44: | ||
<pre>ssh username@icecube.iihe.ac.be | <pre>ssh username@icecube.iihe.ac.be | ||
</pre> | </pre> | ||
TIP : ''icecube.iihe.ac.be'' &''lxpub.iihe.ac.be'' points automatically to available UI's (ui01, ui02, ...)<br> | TIP : ''icecube.iihe.ac.be'' & ''lxpub.iihe.ac.be'' points automatically to available UI's (ui01, ui02, ...)<br> | ||
<br> | <br> | ||
Line 59: | Line 67: | ||
Here is a description of most useful directories<br> | Here is a description of most useful directories<br> | ||
=== /user/username<br> === | === /user/{username}<br> === | ||
Your home folder<br> | Your home folder<br> | ||
Line 67: | Line 75: | ||
Main data repository<br> | Main data repository<br> | ||
=== / | ==== /data/user/{username} ==== | ||
Users data folder | |||
==== / | ==== /data/ICxx<br> ==== | ||
IceCube datasets | |||
=== /software<br> === | |||
The local software area<br> | |||
=== /ice3<br> === | |||
This folder | This folder is the old software area. We strongly recommend you to build your tools in the /software directory | ||
=== /cvmfs<br> === | |||
Centralised CVMFS software repository for IceCube and CERN (see [[IceCube_Software_Cluster#CVMFS|CMFS]]) | |||
== Batch System == | == Batch System == | ||
Line 144: | Line 101: | ||
The cluster is decomposed in queues | The cluster is decomposed in queues | ||
{| width="1064" cellspacing="1" cellpadding=" | {| width="1064" cellspacing="1" cellpadding="5" border="1" align="center" | ||
|- | |- | ||
! scope="col" | <br> | ! scope="col" | <br> | ||
Line 151: | Line 108: | ||
! scope="col" | standard | ! scope="col" | standard | ||
! scope="col" | highmem | ! scope="col" | highmem | ||
! scope="col" | express | |||
! scope="col" | gpu | ! scope="col" | gpu | ||
|- | |- | ||
! scope="row" | Description | ! scope="row" | Description | ||
| nowrap="nowrap" align="center" | default queue, all available nodes<br> | | nowrap="nowrap" align="center" | default queue, all available nodes (except GPUs & express)<br> | ||
| nowrap="nowrap" align="center" | <br> | | nowrap="nowrap" align="center" | 2 Gb RAM<br> | ||
| nowrap="nowrap" align="center" | <br> | | nowrap="nowrap" align="center" | 3 Gb RAM<br> | ||
| nowrap="nowrap" align="center" | <br> | | nowrap="nowrap" align="center" | 4 Gb RAM<br> | ||
| nowrap="nowrap" align="center" | Limited walltime<br> | |||
| nowrap="nowrap" align="center" | GPU's dedicated queue<br> | | nowrap="nowrap" align="center" | GPU's dedicated queue<br> | ||
|- | |- | ||
! scope="row" | CPU's | ! scope="row" | CPU's (Jobs) | ||
| nowrap="nowrap" align="center" | | | nowrap="nowrap" align="center" | 494<br> | ||
| nowrap="nowrap" align="center" | 88<br> | | nowrap="nowrap" align="center" | 88<br> | ||
| nowrap="nowrap" align="center" | 384<br> | | nowrap="nowrap" align="center" | 384<br> | ||
| nowrap="nowrap" align="center" | | | nowrap="nowrap" align="center" | 8<br> | ||
| nowrap="nowrap" align="center" | | | nowrap="nowrap" align="center" | 24<br> | ||
| nowrap="nowrap" align="center" | 14<br> | |||
|- | |- | ||
! scope="row" | Walltime default/limit | ! scope="row" | Walltime default/limit | ||
| nowrap="nowrap" | | nowrap="nowrap" align="center" colspan="4" | 144 hours (6 days) / 240 hours (10 days) | ||
| nowrap="nowrap" align="center" | 15 hours / 15 hours<br> | |||
| nowrap="nowrap" align="center" | 50 hours / 50 hours<br> | |||
|- | |- | ||
! scope="row" | Memory default/limit | ! scope="row" | Memory default/limit | ||
| nowrap="nowrap" align="center" | <br> | | nowrap="nowrap" align="center" | 2 Gb<br> | ||
| nowrap="nowrap" align="center" | <br> | | nowrap="nowrap" align="center" | 2 Gb<br> | ||
| nowrap="nowrap" align="center" | <br> | | nowrap="nowrap" align="center" | 3 Gb<br> | ||
| nowrap="nowrap" align="center" | <br> | | nowrap="nowrap" align="center" | 4 Gb<br> | ||
| nowrap="nowrap" align="center" | <br> | | nowrap="nowrap" align="center" | 3 Gb<br> | ||
| nowrap="nowrap" align="center" | 6 Gb<br> | |||
|} | |} | ||
Line 187: | Line 150: | ||
''OPTIONS'' | ''OPTIONS'' | ||
-q queueName : choose the queue (default: any) | *-q queueName : choose the queue (default: any) | ||
-N jobName : name of the job | *-N jobName : name of the job | ||
-I : pass in interactive mode | *-I : (capital i) pass in interactive mode | ||
-m : mail | *-m mailaddress : set mail address (use in conjonction with -m) : MUST be @ulb.ac.be or @vub.ac.be | ||
-l : resources options | *-m [a|b|e] : send mail on job status change (a = aborted , b = begin, e = end) | ||
*-l : resources options | |||
<br> | <br> | ||
Line 201: | Line 166: | ||
=== Job management === | === Job management === | ||
To see all jobs (running / queued), you can use the '''qstat''' command or go to the [http://ganglia.iihe.ac.be/addons/job_monarch/?c= | To see all jobs (running / queued), you can use the '''qstat''' command or go to the [http://ganglia.iihe.ac.be/addons/job_monarch/?c=nodes JobMonArch] page | ||
<pre>qstat | <pre>qstat | ||
</pre> | </pre> | ||
''OPTIONS'' | ''OPTIONS'' | ||
-u username : list only jobs submitted by username | * -u username : list only jobs submitted by username | ||
* -n : show nodes where jobs are running | |||
* -q : show the job repartition on queues | |||
<br> | |||
== Backup == | |||
/user, /data and /ice3 use snapshot backup system. | |||
<br> | <br> | ||
They're located in : | |||
*/backup_user | |||
**daily : 5 | |||
**weekly : 4 | |||
**monthly : 3 | |||
*/backup_data | |||
**daily : 6 | |||
*/backup_ice3 | |||
**daily : 5 | |||
**weekly : 4 | |||
**monthly : 3 | |||
Each directory contains a snapshot of a specific period. | |||
To restore, just copy files from those directory to the location you want. | |||
== Useful links == | == Useful links == | ||
Line 218: | Line 205: | ||
[http://ganglia.iihe.ac.be Ganglia Monitoring] : Servers status | [http://ganglia.iihe.ac.be Ganglia Monitoring] : Servers status | ||
[http://ganglia.iihe.ac.be/addons/job_monarch/?c= | [http://ganglia.iihe.ac.be/addons/job_monarch/?c=nodes JobMonArch] : Jobs overview |
Latest revision as of 15:03, 16 September 2016
IIHE local cluster
Overview
The cluster is composed by 4 machine types :
- User Interfaces (UI)
This is the cluster front-end, to use the cluster, you need to log into those machines
Servers : ui01, ui02
- Computing Element (CE)
This server is the core of the batch system : it run submitted jobs on worker nodes
Servers : ce
- Worker Nodes (WN)
This is the power of the cluster : they run jobs and send the status back to the CE
Servers : slave*
- Storage Elements
This is the memory of the cluster : they contains data, software, ...
Servers : datang (/data, /software), lxserv (/user), x4500 (/ice3)
How to connect
To connect to the cluster, you must use your IIHE credentials (same as for wifi)
ssh username@icecube.iihe.ac.be
TIP : icecube.iihe.ac.be & lxpub.iihe.ac.be points automatically to available UI's (ui01, ui02, ...)
After a successful login, you'll see this message :
========================================== Welcome on the IIHE ULB-VUB cluster Cluster status http://ganglia.iihe.ac.be Documentation http://wiki.iihe.ac.be/index.php/Cluster IT Help support-iihe@ulb.ac.be ========================================== username@uiXX:~$
Your default current working directory is your home folder.
Directory Structure
Here is a description of most useful directories
/user/{username}
Your home folder
/data
Main data repository
/data/user/{username}
Users data folder
/data/ICxx
IceCube datasets
/software
The local software area
/ice3
This folder is the old software area. We strongly recommend you to build your tools in the /software directory
/cvmfs
Centralised CVMFS software repository for IceCube and CERN (see CMFS)
Batch System
Queues
The cluster is decomposed in queues
any | lowmem | standard | highmem | express | gpu | |
---|---|---|---|---|---|---|
Description | default queue, all available nodes (except GPUs & express) |
2 Gb RAM |
3 Gb RAM |
4 Gb RAM |
Limited walltime |
GPU's dedicated queue |
CPU's (Jobs) | 494 |
88 |
384 |
8 |
24 |
14 |
Walltime default/limit | 144 hours (6 days) / 240 hours (10 days) | 15 hours / 15 hours |
50 hours / 50 hours | |||
Memory default/limit | 2 Gb |
2 Gb |
3 Gb |
4 Gb |
3 Gb |
6 Gb |
Job submission
To submit a job, you just have to use the qsub command :
qsub myjob.sh
OPTIONS
- -q queueName : choose the queue (default: any)
- -N jobName : name of the job
- -I : (capital i) pass in interactive mode
- -m mailaddress : set mail address (use in conjonction with -m) : MUST be @ulb.ac.be or @vub.ac.be
- -m [a|b|e] : send mail on job status change (a = aborted , b = begin, e = end)
- -l : resources options
Job management
To see all jobs (running / queued), you can use the qstat command or go to the JobMonArch page
qstat
OPTIONS
- -u username : list only jobs submitted by username
- -n : show nodes where jobs are running
- -q : show the job repartition on queues
Backup
/user, /data and /ice3 use snapshot backup system.
They're located in :
- /backup_user
- daily : 5
- weekly : 4
- monthly : 3
- /backup_data
- daily : 6
- /backup_ice3
- daily : 5
- weekly : 4
- monthly : 3
Each directory contains a snapshot of a specific period. To restore, just copy files from those directory to the location you want.
Useful links
Ganglia Monitoring : Servers status
JobMonArch : Jobs overview