Limulus Project Software Release Notes

Date: 01/12/14
Release Nickname: Morrison Hotel
Base OS: Scientific Linux 6.x

Contents:

 1. Assistance
 2. Root Password And System Overview
 3. User Accounts
 4. Documentation
 5. Powering Up/Down Nodes
 6. Executing Commands On Nodes
 7. Node Log Files
 8. Changing The System Host Name
 9. Fan Control and Thermal Management
10. SATA Ports Configuration
11. Optional RAID Configuration
12. Ganglia Configuration
13. Installed HPC Software

1. Assistance
=============

Paid support customers may contact Basement Supercomputing as follows:

  Submit questions to: http://basement-supercomputing.com/qa/
  Email: support@basement-supercomputing.com

2. Root Password And System Overview
====================================

Root password is: changeme
Machine name: limulus
Operating System: Scientific Linux 6.6

File system (sizes may vary and /home may be on a RAID array):

  /      50G
  /boot  485M
  /home  62G

The worker nodes do not automatically power up when the machine is booted (see below). Nodes operate using a RAM disk. Both /opt and /home are mounted on the nodes via NFS. Packages and libraries are added in /opt. Root and users can ssh to all nodes.

The IP addresses and node names are:

  10.0.0.10 n0
  10.0.0.11 n1
  10.0.0.12 n2

In the 10.0.0 subnet, the main node is called:

  10.0.0.1 limulus headnode

The second Ethernet port is configured to request a DHCP IP address.

Limulus 200 and 400 models use the Intel i7 processor. These processors have four physical cores and support Intel Hyper-Threading (HT). Intel HT will report 8 cores per processor. For some HPC applications these four extra virtual processors have been shown to slow down floating point operations. Therefore, all Limulus model 200 systems have Intel HT turned off by default (set in the BIOS). Indeed, there are only four physical FPUs per processor, and these can only be effectively utilized by four cores.

3. User Accounts
================

Users are created using the standard "useradd" command. New user names will be propagated to the nodes within 5 minutes. New user ssh keys are automatically generated on first login.

When logging into nodes for the first time, or when logging into newly booted nodes, the following message will be printed:

  /usr/bin/xauth: creating new authority file /root/.Xauthority

This message can be safely ignored.

4. Documentation
================

There is on-line documentation available. There are pointers to reference material in the documentation. Point the browser to:

  # firefox http://localhost/limulus

To see the documentation in text mode without starting a browser enter:

  # links /usr/share/doc/limulus-doc/index.html

The current Limulus reference manual is part of the Cluster Documentation Project:

  http://cdp.clustermonkey.net/index.php/Limulus_Manual

Note: to make the documentation viewable over a local network, edit /etc/httpd/conf.d/limulus.conf as root and add an "Allow" line in place of the commented example:

  # Allow from .example.com

For instance, to allow local access to the Limulus documentation for a local network with 192.168.0.0/255.255.255.0, add the following:

  Allow from 192.168.0.0/24

There may be multiple instances that need replacing. Reload the web server using:

  # service httpd reload

The same can be done for Ganglia using the /etc/httpd/conf.d/ganglia.conf file.
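For reference, after the edit the relevant stanza in /etc/httpd/conf.d/limulus.conf will typically look something like the sketch below. This is an illustration only, assuming the Apache 2.2 Order/Deny/Allow syntax shipped with Scientific Linux 6; the exact directives and directory path in your limulus.conf may differ.

  <Directory /usr/share/doc/limulus-doc>
      Order deny,allow
      Deny from all
      Allow from 127.0.0.1
      Allow from 192.168.0.0/24
  </Directory>

After saving the change, reload the web server (# service httpd reload) so the new access rule takes effect.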
5. Powering Up/Down Nodes
=========================

Low Level Power Control
-----------------------

At the lowest level, node power is applied by using the "relayset" command. The relayset command controls the four power relays. The power relays are mapped as follows:

  relay 1   empty
  relay 2   n0
  relay 3   n1
  relay 4   n2

The relays are initialized on boot by "relayset -init". NOTE: Manual re-initialization will result in a power-cycle for all worker nodes.

For example, after initial power-on of the base system, to turn on node n0 enter:

  # relayset 2 on

The remaining nodes can be powered up as follows (node n1, then node n2):

  # relayset 3 on
  # relayset 4 on

To power off node n0:

  # relayset 2 off

Keep in mind, turning the power off with relayset will not gracefully shut the node down. It is like pulling the plug on the nodes. If the node has an attached disk drive, the drive may not be shut down properly. It is best to use the high level scripts below to power the nodes ON and OFF.

Other options for "relayset" are described below. Note that relayset is designed to be "silent" so that it can be easily used in other scripts.

  To initialize (do first):  relayset init
  To turn a relay on/off:    relayset 1|2|3|4 on|off
  To get status:             relayset 1|2|3|4 status
                             (Returns 1 if on, 0 if off)
  To list the devices found: relayset list
  To print debug messages add "debug" to the command line

  Returns -1 on error, 0 or 1 if successful.

High Level Power Control (Recommended)
--------------------------------------

There are two higher level power control scripts, node-poweron and node-poweroff. These scripts are the preferred way to turn nodes ON or OFF.

node-poweron

  No node arguments turns all nodes ON. If a node is already on, nothing will happen. Node name(s) can be given as argument(s) in the range {n0,...,n6}. For example:

    # node-poweron n0 n2
    # node-poweron -s n1
    # node-poweron -s

  Invalid nodes will be ignored. Default Limulus nodes are {n0,n1,n2}. The script waits until all nodes are started or the process times out. The -s option runs in quiet mode; -h prints a help message.

node-poweroff

  No node arguments turns all nodes OFF. If a node is already off, nothing will happen. Node name(s) can be given as argument(s) in the range {n0,...,n6}. For example:

    # node-poweroff n0 n2
    # node-poweroff -s n1
    # node-poweroff -s

  Invalid nodes will be ignored. Default Limulus nodes are {n0,n1,n2}. A delay is included so nodes can properly shut down before power is removed. Any node-attached drives are placed in stand-by mode. The -s option runs in quiet mode; -h prints a help message.

System Power ON and OFF
-----------------------

The three worker nodes are controlled by the main node and DO NOT start when the system is booted. If you would like the nodes to start when the machine is booted, simply add the following at the end of the /etc/rc.local file:

  node-poweron

When the system is rebooted or halted, the nodes are turned OFF gracefully (i.e. the OS is shut down). If there are any attached disks, they are placed in standby mode (# hdparm -Y /dev/hda). This step is important because the drives remain powered when the nodes are off. On reboot, the drives will wake up and work as expected.

6. Executing Commands On Nodes
==============================

ssh may be used to execute commands on nodes or to login directly to the nodes. You may also use the "pdsh" utility to execute commands on all or some of the nodes. For instance, to run "uptime" on all the nodes:

  # pdsh uptime
  n1:  17:54:46 up 3 min,  0 users,  load average: 0.00, 0.00, 0.00
  n0:  17:54:46 up 3 min,  0 users,  load average: 0.00, 0.00, 0.00
  n2:  17:54:46 up 3 min,  0 users,  load average: 0.00, 0.01, 0.00

Node status is checked every 60 seconds. If a node is active, it will be used by the pdsh command. Recently started or rebooted nodes may not respond to pdsh right away. Recently shut down nodes may cause an ssh time-out. The whatsup package is used to maintain the node list file pointed to by the WCOLL environment variable.
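As a quick check before running pdsh (a sketch only; if WCOLL is not set in your shell it is most likely exported by a profile script, and the file name will depend on your configuration), you can inspect the node list that pdsh will use:

  # echo $WCOLL
  # cat $WCOLL

The file should list the currently responding nodes, one per line.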
Individual nodes or groups of nodes can be addressed with pdsh using the "-w" option:

  # pdsh -w n1 hostname
  n1: n1

  # pdsh -w n[0,2] hostname
  n2: n2
  n0: n0

  # pdsh -w n[0-2] hostname
  n1: n1
  n0: n0
  n2: n2

Please see the pdsh manpage for more information.

7. Node Log Files
=================

Rebooting nodes causes the local logs to be lost. To provide a record of node activity, the node log files are mirrored (using rsyslogd) on the headnode and placed in:

  /var/log/nodes/{n0,n1,n2}.log

These logs are written to the local disk on the headnode. They are also rotated every week and kept for four weeks. Node logs are also written locally on each node (to the RAM disk); these are purged daily and not rotated.

8. Changing The System Host Name
================================

As configured, each Limulus system assumes the hostname is "limulus". The LAN interface is configured for DHCP. If you want to change the hostname to a FQHN, the following steps are needed to ensure Grid Engine works properly. Note: the "headnode" alias is used to provide a consistent name for the login node (or "headnode").

A. Edit /etc/sysconfig/network and provide a new HOSTNAME. In this example we will use "waldo.basement-supercomputing.com".

B. Edit /etc/hosts on the head/main node and change the 10.0.0.1 line to reflect your new hostname; the example below uses the new name "waldo.basement-supercomputing.com". For example, change:

     10.0.0.1 limulus headnode

   to

     10.0.0.1 waldo.basement-supercomputing.com waldo headnode

   Or, if you have a static IP address and want to include it in your hosts file:

     192.168.0.42 waldo.basement-supercomputing.com waldo
     10.0.0.1     waldo headnode

C. Change the /opt/gridengine/default/common/host_aliases to look like:

     headnode waldo.basement-supercomputing.com waldo

D. In order for NFS v4 to work properly you need to provide a "Domain" in the /etc/idmapd.conf file. Edit this file and replace the line (or add after the line):

     #Domain = local.domain.edu

   with your local domain. For example:

     Domain = basement-supercomputing.com

   This file will also be sent to the nodes on boot-up. If this is not set when the host has a FQDN, all NFS-mounted files on the nodes will have the owner "nobody" and will not be usable by their respective owners.

E. Change the /etc/hosts file in the VNFS. This requires several steps. First edit /var/chroots/sl62_base/etc/hosts and change the line:

     10.0.0.1 limulus headnode

   to

     10.0.0.1 waldo.basement-supercomputing.com waldo headnode

   Next, rebuild the VNFS:

     # wwvnfs --chroot=/var/chroots/sl62_base/ --hybridpath=/vnfs

   When asked to "overwrite the Warewulf VNFS Image," enter "yes". The new VNFS with the updated /etc/hosts will be saved in the data store and used on the next reboot.

F. It is advisable to reboot the headnode and restart the worker nodes at this point so that the hostname change can take effect.

9. Fan Control and Thermal Management
=====================================

The front fans are controlled using the fancontrol daemon. If the worker processors become hot, the fan speed is increased. The fancontrol daemon is started on boot using the limfanctl rc script. Two daemons, "fancontrol" and "limulus-node-temp", are started and monitor the temperatures and fan speeds. The configuration file for the fancontrol daemon is in /etc/fancontrol. This file should not be changed.
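To spot-check the temperatures these daemons react to, the lm_sensors "sensors" utility can be used. This is a sketch only; it assumes "sensors" is available on the headnode and inside the node image, which may not be the case on every configuration, and the "Core" labels in the output vary by motherboard:

  # sensors | grep -i core
  # pdsh -w n[0-2] sensors | grep -i core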
On boot and shutdown the fans will run at high speed for a short time because the limfanctl service is not running. If for some reason you need to restart the limfanctl daemons enter:

  # service limfanctl restart

To check that the limfanctl daemons are running, enter:

  # service limfanctl status

Thermal Throttling
------------------

The Intel Haswell line of processors is known to run hot. This is partially due to the "turbo mode" that increases the clock speed while trying to keep the processor within the thermal specification. In addition, if a critical temperature is reached (often about 90C for the Haswell), the processor will lower the clock speed to reduce the temperature.

In order to keep each Limulus system as quiet as possible, the front intake fans are low noise with a high air-flow. This design may under some circumstances result in thermal throttling of the node processors. Each application has its own temperature profile and many can take advantage of Intel turbo mode without hitting the throttling limit. Interestingly, with many parallel applications the use of turbo mode adds little to the performance. This behavior is due to parallel applications running all the cores at full speed; under these conditions there is no extra frequency headroom to bump the clock speeds. The actual thermal profile is application dependent.

As shipped, Intel turbo mode is enabled on all cores. Should you notice that certain applications result in throttling (you can observe throttling events in /var/log/messages), you can disable turbo mode on the nodes and thus lower the temperatures without impacting performance. To turn off turbo mode on the nodes, run the "node-turbo-off" script. For example:

  # sh node-turbo-off
  Turning off turbo mode on nodes: n0 n1 n2
  Node n0 turbo mode is OFF
  Node n1 turbo mode is OFF
  Node n2 turbo mode is OFF

To turn turbo mode back on, use "node-turbo-on". The turbo mode status of the nodes can be checked using the "node-turbo-status" command. There is no need to turn turbo mode off on the main node. Also, it is a good idea to keep the default "powersave" governor setting for the node processors.

10. SATA Ports Configuration
============================

Model 100 and 200 (Limulus HPC):
--------------------------------

Each Limulus HPC has a total of ten SATA (6 Gb/s) ports available on the main node: six on-board SATA ports plus an add-in card with four additional ports. There are seven removable storage slots available on the case. These are as follows:

  1 - DVD slot
  2 - 2.5 inch slots (for SSD)
  4 - 3.5 inch slots (for spinning disk)

The DVD slot and the eSATA port on top of the case are connected to the 4-port SATA card. The two 2.5 inch (SSD) slots and the four 3.5 inch (spinning disk) slots are connected to the main motherboard. Depending on how your system is configured, some or all of these storage slots will have drives in them.

There are two available SATA ports on the add-in card. There are also two 2.5 inch internal slots on the bottom of the case. It is possible to use these extra ports with two additional 2.5 inch drives. These drive slots are not removable.

The three compute nodes operate diskless and have no disks connected. As a reference, each node does have six SATA (6 Gb/s) ports on the motherboard.
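To see which of these ports are currently populated with drives on the main node, a quick check such as the following can be used (informational only; device names and sizes depend on how your particular system was configured):

  # cat /proc/partitions
  # fdisk -l | grep "Disk /dev"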
Model 300 and 400 (Limulus Hadoop):
-----------------------------------

Each Limulus Hadoop has six SATA (6 Gb/s) ports on both the main motherboard and the worker node motherboards. There are a total of ten removable storage slots on the case. These are as follows:

  2 - 3.5 inch slots (for spinning disk)
  8 - 2.5 inch slots (for SSD)

The two 3.5 inch slots and two of the 2.5 inch slots are connected to the main motherboard. The worker nodes each have two 2.5 inch slots connected to their motherboard. The eSATA port on top of the case is connected to a SATA port on the main motherboard. This configuration leaves one open SATA port on the main motherboard that could be used for a 2.5 inch drive mounted on the inside bottom of the case.

The layout of the 2.5 inch SSD slots is as follows:

        top of case
  +---------------------+
  | headnode |    n0    |
  |  SATA 1  |  SATA 1  |
  |---------------------|
  |    n1    |    n2    |
  |  SATA 1  |  SATA 1  |
  |---------------------|
  | headnode |    n0    |
  |  SATA 2  |  SATA 2  |
  |---------------------|
  |    n1    |    n2    |
  |  SATA 2  |  SATA 2  |
  +---------------------+

11. Optional RAID Configuration
===============================

If the system has preconfigured RAID sets, be sure to add a notification email to /etc/mdadm.conf. Monitoring is started in /etc/rc.local by issuing an "mdadm --monitor --scan --daemonize" command.

12. Ganglia Configuration
=========================

The latest version of the Ganglia monitoring system allows much more user configuration than previous versions. In order to allow for customization, Ganglia is installed in "edit" mode, which allows any viewer to change the configuration of the Ganglia web page. The "edit" mode is set by disabling the authorization system. This setting has been made in /etc/ganglia/conf.php by adding (before the final "?>"):

  $conf['auth_system'] = 'disabled';

The full configuration is in /usr/share/ganglia/conf_default.php. Do not edit this file; instead, make changes in /etc/ganglia/conf.php. Once Ganglia is configured, the authorization can be reset by removing the setting in /etc/ganglia/conf.php. The system will then be "readonly."

13. Installed HPC Software
==========================

All Limulus software has been configured to work seamlessly after installation. Adding other packages may require additional configuration.

Environment Modules are used to integrate most HPC software into the cluster environment. For instance, to use the Sun Grid Engine batch scheduler you must enter:

  # module load sge6

before using any of its commands (e.g. qsub). You may view the batch queue and worker nodes using the "userstat" utility (load the sge module first). See the documentation for more information on the "modules" package.

Ganglia should start automatically on the nodes when booted. You can view the Ganglia interface by pointing the browser to:

  http://localhost/ganglia

See Section 4 above for making the Ganglia page viewable on the local LAN.

YUM Repositories:
-----------------

In addition to the Scientific Linux repositories, both the Limulus (limulus.repo) and the EPEL - Extra Packages for Enterprise Linux (epel.repo) repositories are enabled. Be careful when using other repositories as similar, but incompatible, versions of some software may be available.

NOTE: kernel updates are disabled in /etc/yum.conf. Limulus systems require 3.x kernels to support the features of the latest Intel processors. When necessary, these kernels will be placed in the Limulus update repository. Also, YUM will check for updates once a day; see /etc/cron.daily/yum-autoupdate.
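As a quick check that the expected repositories are active and that kernel packages are being held back, the following can be run on the headnode (informational only; the exact exclude entry in /etc/yum.conf may be worded differently on your system):

  # yum repolist
  # grep -i exclude /etc/yum.conf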
Installed HPC Software:
-----------------------

  * Warewulf Cluster Toolkit - Cluster provisioning and administration
  * PDSH - Parallel Distributed Shell for collective administration
  * Open Grid Scheduler - previously Sun Grid Engine Resource Scheduler
  * Ganglia - Cluster Monitoring System
  * GNU Compilers (gcc, g++, g77, gdb) - Standard GNU compiler suite
  * Modules - Manages User Environments
  * MPICH - MPI Library (message passing middleware)
  * OPEN-MPI - MPI Library (message passing middleware)
  * Open-MX - Myrinet Express over Ethernet
  * ATLAS - host tuned BLAS library
  * OpenBLAS - hand tuned BLAS library
  * FFTW - Optimized FFT library
  * FFTPACK - FFT library
  * LAPACK and BLAS - Reference Linear Algebra library
  * SCALAPACK - linear algebra routines for parallel distributed memory machines
  * PETSc - data structures and routines for parallel PDE solvers
  * GNU GSL - GNU Scientific Library (over 1000 functions)
  * PADB - Parallel Application Debugger Inspection Tool
  * Userstat - a "top" like job queue/node monitoring application
  * Beowulf Performance Suite - benchmark and testing suite
  * relayset - power relay control utility and scripts
  * ssmtp - mail forwarder for nodes
  * whatsup - node status using ping
  * Julia - Easy To Use High Performance Parallel Scientific Language
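To see what Environment Modules makes available for the software listed above, and to load a package into your environment, the usual pattern is shown below (sge6 is the module named earlier in these notes; use the names reported by "module avail" for other packages):

  # module avail
  # module load sge6
  # module list

"module list" shows what is currently loaded, and "module unload" removes a package from the environment.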