Published on March 6, 2014
Research Computing at ILRI - Alan Orth - ICT Managers Meeting, ILRI, Kenya, 5 March 2014
Where we came from (2003) - 32 dual-core compute nodes - 32 * 2 != 64 - Writing MPI code is hard! - Data storage over NFS to “master” node - “Rocks” cluster distro - Revolutionary at the time!
Where we came from (2010) - Most of the original cluster removed - Replaced with a single Dell PowerEdge R910 - 64 cores, 8 TB storage, 128 GB RAM - Threading is easier* than MPI! - Data is local - Easier to manage!
To infinity and beyond (2013) - A little bit back to the “old” model - Mixture of “thin” and “thick” nodes - Networked storage - Pure CentOS - Supermicro boxen - Pretty exciting!
Primary characteristics - Computational capacity - Data storage
Platform - 152 compute cores - 32* TB storage - 700 GB RAM - 10 GbE interconnects - LTO-4 tape backups (LOL?)
Homogeneous computing environment - User IDs, applications, and data are available everywhere.
Scaling out storage with GlusterFS - Developed by Red Hat - Abstracts backend storage (file systems, technology, etc.) - Supports replicated, distributed, distributed-replicated, and geo-replicated (off site!) volumes - Scales “out”, not “up”
How we use GlusterFS

    [aorth@hpc: ~]$ df -h
    Filesystem     Size  Used Avail Use% Mounted on
    ...
    wingu1:/homes   31T  9.5T   21T  32% /home
    wingu0:/apps    31T  9.5T   21T  32% /export/apps
    wingu1:/data    31T  9.5T   21T  32% /export/data

- Persistent paths for homes, data, and applications across the cluster.
- These volumes are replicated, so essentially application-layer RAID1
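A replicated volume like the ones in the df output above could be set up roughly as follows. This is a minimal sketch, not ILRI's actual configuration: the brick paths under /bricks and the helper names are assumptions, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: create and mount a two-way replicated GlusterFS volume.
# Brick paths (/bricks/...) are hypothetical; only the host names and
# mount points come from the df output in the slide.

# Print commands by default; set DRY_RUN=0 to actually run them (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# make_replicated_volume NAME MOUNTPOINT
make_replicated_volume() {
    name=$1
    mountpoint=$2
    # "replica 2" keeps a full copy of every file on both bricks (RAID1-like).
    run gluster volume create "$name" replica 2 \
        "wingu0:/bricks/$name" "wingu1:/bricks/$name"
    run gluster volume start "$name"
    # Clients mount the volume through the native FUSE client.
    run mount -t glusterfs "wingu1:/$name" "$mountpoint"
}

make_replicated_volume homes /home
```

Printing the commands first makes it easy to review them before touching real storage.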
GlusterFS <3 10GbE
SLURM - Project from Lawrence Livermore National Laboratory (LLNL) - Manages resources - Users request CPU, memory, and node allocations - Queues / prioritizes jobs, logs usage, etc. - More like an accountant than a bouncer
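Under SLURM, a resource request of this kind is usually written as `#SBATCH` directives at the top of a job script. The sketch below writes such a script and syntax-checks it without submitting anything. The job name, file names, database name, and the CPU and memory figures are placeholders; the `blast/2.2.28+` module is the one that appears in the module listing in this deck.

```shell
#!/bin/sh
# Sketch: generate a SLURM batch script that asks for CPUs and memory.
# File names and the database name are placeholders, not real ILRI paths.
job_script=blast_job.sbatch

cat > "$job_script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=blastn-example
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --output=blastn-%j.out

# Load the application, then use exactly the CPUs SLURM allocated.
module load blast/2.2.28+
blastn -query input.fasta -db example_db \
       -num_threads "$SLURM_CPUS_PER_TASK" -out results.txt
EOF

# Check the syntax only; on the cluster you would submit it with:
#   sbatch blast_job.sbatch
sh -n "$job_script" && echo "wrote $job_script"
```

Tying the thread count to `$SLURM_CPUS_PER_TASK` keeps the program from using more cores than the scheduler accounted for.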
How we use SLURM - Can submit “batch” jobs (long-running jobs, invoke program many times with different variables, etc.) - Can run “interactively” (something that needs keyboard interaction)
Make it easy for users to do the “right thing”:

    [aorth@hpc: ~]$ interactive -c 10
    salloc: Granted job allocation 1080
    [aorth@compute0: ~]$
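The `interactive` command in the session above is a site-specific helper, and its source is not part of this deck. A wrapper along these lines could produce similar behaviour. Everything below is an assumption, including the function name, the default of one CPU, and the use of `salloc` with `srun --pty`.

```shell
#!/bin/sh
# Hypothetical sketch of an "interactive" helper; not ILRI's actual script.
# Usage: interactive_cmd [-c CPUS]
# Prints the salloc command line it would run, so it can be inspected first.
interactive_cmd() {
    cpus=1                      # assumed default
    OPTIND=1
    while getopts "c:" opt; do
        case $opt in
            c) cpus=$OPTARG ;;
            *) echo "usage: interactive_cmd [-c CPUS]" >&2; return 2 ;;
        esac
    done
    # salloc reserves the CPUs; srun --pty starts a shell on the granted node.
    echo "salloc --cpus-per-task=$cpus srun --pty bash -i"
}

interactive_cmd -c 10
```

Hiding the scheduler flags behind one short command is what makes the “right thing” the easy thing: users get a shell on a compute node instead of running work on the head node.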
Managing applications - Environment modules - http://modules.sourceforge.net - Dynamically load support for packages in a user’s environment - Makes it easy to support multiple versions, complicated packages with $PERL5LIB, package dependencies, etc.
Managing applications - Install once, use everywhere...

    [aorth@hpc: ~]$ module avail blast
    blast/2.2.25+ blast/2.2.26 blast/2.2.26+ blast/2.2.28+
    [aorth@hpc: ~]$ module load blast/2.2.28+
    [aorth@hpc: ~]$ which blastn
    /export/apps/blast/2.2.28+/bin/blastn

Works anywhere on the cluster!
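Loading a module mostly amounts to editing environment variables such as `PATH`. The function below imitates that effect for the `/export/apps/<name>/<version>` layout seen in the session above. It is an illustration only, not how Environment Modules is implemented, and `load_app` and `APPS_ROOT` are made-up names.

```shell
#!/bin/sh
# Illustration only: mimic the PATH change that "module load" makes.
# APPS_ROOT matches the layout in the session above; load_app is hypothetical.
APPS_ROOT=${APPS_ROOT:-/export/apps}

# load_app NAME VERSION: put that version's bin directory first in PATH.
load_app() {
    bindir="$APPS_ROOT/$1/$2/bin"
    case ":$PATH:" in
        *":$bindir:"*) ;;                 # already loaded, do nothing
        *) PATH="$bindir:$PATH"; export PATH ;;
    esac
}

load_app blast 2.2.28+
# Show the directory that is now searched first.
echo "$PATH" | cut -d: -f1
```

Because the applications live on a shared volume, the same path resolves on every node, which is why one installation serves the whole cluster.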
Users and Groups - Consistent UID/GIDs across systems - LDAP + SSSD (also from Red Hat) is a great match - 389 LDAP works great with CentOS - SSSD is simpler than pam_ldap and does caching
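One quick way to confirm that IDs really are consistent is to compare what each node reports for the same account. A small sketch, assuming passwordless SSH between nodes; the function names are made up here, and the node names are whatever the caller passes in.

```shell
#!/bin/sh
# Sketch: check that a user resolves to the same UID:GID on several nodes.
# Assumes passwordless SSH to each node; function names are hypothetical.

# ids_of USER: print "UID:GID" as resolved on the local machine.
ids_of() {
    printf '%s:%s\n' "$(id -u "$1")" "$(id -g "$1")"
}

# check_ids USER NODE...: compare each node's answer with the local one.
check_ids() {
    user=$1
    shift
    expected=$(ids_of "$user")
    status=0
    for node in "$@"; do
        # Run both id commands remotely and join the two lines with ":".
        actual=$(ssh "$node" "id -u $user && id -g $user" | paste -sd: -)
        if [ "$actual" != "$expected" ]; then
            echo "MISMATCH on $node: $actual (expected $expected)" >&2
            status=1
        fi
    done
    return $status
}

# Local lookup only; a cluster-wide check would be e.g.:
#   check_ids aorth compute0
ids_of root
```

With LDAP and SSSD in place the check should pass everywhere; a mismatch usually points at a stale local account shadowing the directory entry.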
More information and contact firstname.lastname@example.org http://hpc.ilri.cgiar.org/