Before we make any general conclusions, keep in mind that the performance
of your application(s) will certainly vary from those tested above.
Indeed, virtually all performance issues stem from the specific nature
of the application run on the cluster. It is therefore important that
you test your application before committing to a specific hardware design.
The results are very interesting. For the GNU compiler suite we see
a range of 1.1 to 1.99 with an average speed-up of 1.52. The results
for the Intel compiler follow these trends and range from 1.05 to
2.00 with an average speed-up of 1.38. Neglecting the difference
in compilers, we see that the best average speed-up is barely over
1.5. In the case of CG there is no benefit to running two copies
of the program on a dual SMP node. In the case of EP, two copies run
virtually as fast as one copy, indicating perfect speed-up. Our first
general conclusion is:
- Memory contention on a dual SMP system is very application sensitive
and in many cases will reduce the efficiency of the second processor.
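As a rough illustration of how the speed-up figures above can be read, the following Python sketch computes a two-copy speed-up from a pair of wall times and converts it into the fraction of a second processor actually gained. The definition used here (twice the single-copy time divided by the time for two concurrent copies) is an assumption chosen to match the description of CG and EP; the function names are hypothetical and are not part of the test script.

```python
def dual_node_speedup(t_single: float, t_pair: float) -> float:
    """Throughput speed-up from running two copies at once on a dual node.

    t_single: wall time of one copy running alone on the node.
    t_pair:   wall time for two copies running concurrently on the node.
    Assumed definition: work completed per unit time, relative to one copy.
    """
    if t_single <= 0 or t_pair <= 0:
        raise ValueError("times must be positive")
    return 2.0 * t_single / t_pair


def second_cpu_efficiency(speedup: float) -> float:
    """Fraction of a full extra processor gained (0.0 = none, 1.0 = ideal)."""
    return speedup - 1.0


# Average speed-ups reported in the text for each compiler.
for compiler, avg in [("GNU", 1.52), ("Intel", 1.38)]:
    gained = second_cpu_efficiency(avg)
    print(f"{compiler}: second CPU contributes about {gained:.0%} of a processor")
```

Under this assumed definition, two copies that finish in the same time as one copy give a speed-up of 2.0 (the EP case), and two copies that take twice as long give 1.0 (the CG case).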
In general, the Intel compiler produced a lower speed-up for most tests.
We attribute this result to the Intel compiler making better use of the
CPU, which in turn increases the contention for memory access.
This result leads to our second general conclusion:
- Improving compiler efficiency may, in some cases, increase the memory
contention and therefore lower the efficiency of the second processor.
While the cost of adding a second CPU seems minimal,
it may lead to a false sense of efficiency for the cluster. Indeed,
the tests indicate that a program that normally takes 20 minutes on
a single CPU node may take as long as 40 minutes on an active dual
CPU node. This situation may be further compounded by the fact that
the batch scheduler may place a program on the first or second processor
of any node and thus provide the program with a very heterogeneous
memory contention environment (i.e., some nodes may have large memory
contention and others may have none).
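The 20 minute versus 40 minute example can be reproduced with a short calculation. This is only a sketch: it assumes the speed-up is the ratio of two-copy throughput to one-copy throughput, so that the run time on a busy dual node is twice the single-copy time divided by the speed-up. The function name is hypothetical, and the speed-up values are the extremes and averages quoted above.

```python
def busy_node_runtime(t_single_min: float, speedup: float) -> float:
    """Estimated run time (minutes) of one copy while the other CPU is busy.

    Assumes speedup = 2 * t_single / t_pair, so t_pair = 2 * t_single / speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 2.0 * t_single_min / speedup


T_SINGLE = 20.0  # minutes on an otherwise idle node, as in the text

cases = [
    ("no benefit (CG-like)", 1.0),
    ("Intel average", 1.38),
    ("GNU average", 1.52),
    ("perfect (EP-like)", 2.0),
]
for label, s in cases:
    minutes = busy_node_runtime(T_SINGLE, s)
    print(f"{label:22s} speed-up {s:4.2f}: {minutes:5.1f} min")
```

The spread between the best and worst cases is the range of run times a job could see depending on whether the scheduler places it on a node whose other processor is idle or busy.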
In these tests, we have not considered communication issues or the
mix of different programs on the same node. These issues will be addressed
in upcoming reports. In addition, we have run the tests on a single
hardware platform with two compilers.
Douglas Eadline
2003-03-24