Future Research
General Purpose CPU / System Platforms
I have a request in to Azul Systems, which also has a try-and-buy program for
their network-attached compute device with similar characteristics, albeit
marketed as a Java Compute Engine. I don't yet know whether this machine will
allow native execution of programs in other languages. (1/2/2007 - Nope. Java only.)
It would also be interesting to look at other systems, such as the Cell
Processor, the KiloCore CPU, and other commodity multi-core systems as they
become available.
Another wild, out-of-the-box approach would be to create an Application
Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA)
implementation, to see whether either could be used here, since an
implementation using a standard commercial CPU and operating system wastes so
much chip real estate and functionality.
Algorithm Research
The Aha! moment of moving the implementation from decimal powers to binary
powers for storage has lit a fire regarding extending to higher powers (from 8
bits to 16, possibly even 32). At the next boundary (16 bits), the current
naive approach of pre-computing tables will require 8GB per thread (or 256GB
across the 32 hardware threads of a T1000). This could possibly be addressed
through more or larger disk drives or network-attached storage (after all, the
T1000 does have four Gigabit Ethernet (GbE) connections).
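
To make the scaling concrete, here is a back-of-the-envelope sketch in C. It
assumes the table holds one 2-byte entry for every pair of k-bit digits; those
assumptions are mine (chosen because they reproduce the 8GB-per-thread figure
above at 16 bits), and the real table layout may well differ.

    /* Table sizing sketch. Assumption (mine, not from the actual
     * implementation): one 2-byte entry per pair of k-bit digits,
     * i.e. 2^(2k) entries. This reproduces 8GB/thread at k=16. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        const uint64_t entry_bytes = 2;   /* assumed entry size */
        const unsigned threads = 32;      /* T1000: 8 cores x 4 hardware threads */
        unsigned k;

        for (k = 8; k <= 16; k += 4) {
            uint64_t entries = (uint64_t)1 << (2 * k);
            uint64_t bytes = entries * entry_bytes;
            printf("k=%2u: 2^%u entries, %8.1f MB/thread, %8.3f GB for %u threads\n",
                   k, 2 * k, bytes / (1024.0 * 1024.0),
                   bytes * (double)threads / (1024.0 * 1024.0 * 1024.0), threads);
        }
        return 0;
    }

Under the same assumptions, a 32-bit digit would require 2^64 entries, so full
precomputation at that size is out of reach no matter how the storage is
attached.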
Increased data lengths will eventually favor 64-bit systems over 32-bit
systems, because they can natively perform arithmetic on larger values. How
large that improvement will be is currently unknown.
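
As a concrete illustration of the 64-bit advantage (a sketch of the general
point, not the actual evaluation code), here is multi-precision addition over
arrays of machine words: with 64-bit limbs the loop runs half as many times as
with 32-bit limbs for the same total width, and the same effect shows up in
multiplication inner loops.

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Add b into a, both n limbs long, 32-bit limbs; returns final carry. */
    static uint32_t add32(uint32_t *a, const uint32_t *b, size_t n)
    {
        uint32_t carry = 0;
        size_t i;
        for (i = 0; i < n; i++) {
            uint64_t t = (uint64_t)a[i] + b[i] + carry;  /* widen to capture carry */
            a[i] = (uint32_t)t;
            carry = (uint32_t)(t >> 32);
        }
        return carry;
    }

    /* Same total width with 64-bit limbs: half as many iterations. */
    static uint64_t add64(uint64_t *a, const uint64_t *b, size_t n)
    {
        uint64_t carry = 0;
        size_t i;
        for (i = 0; i < n; i++) {
            uint64_t t = a[i] + carry;
            uint64_t c1 = (t < carry);    /* overflow from the carry-in */
            a[i] = t + b[i];
            carry = c1 | (a[i] < t);      /* overflow from adding b[i] */
        }
        return carry;
    }

    int main(void)
    {
        uint32_t a32[4] = { 0xffffffffu, 0, 0, 0 }, b32[4] = { 1, 0, 0, 0 };
        uint64_t a64[2] = { 0xffffffffu, 0 },       b64[2] = { 1, 0 };
        printf("carry32=%u a32[1]=%u\n", add32(a32, b32, 4), a32[1]);
        printf("carry64=%llu a64[0]=%llu\n",
               (unsigned long long)add64(a64, b64, 2),
               (unsigned long long)a64[0]);
        return 0;
    }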
The current implementation also does not take advantage of the extensive
memory available on the T1000, and it is not clear whether there is a
convenient way to do so without overly complicating the code.
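
One convenient possibility (my own sketch, not something from the evaluation)
would be to memory-map the precomputed table file read-only and let every
thread share the single mapping, so the operating system keeps the hot pages
resident in the T1000's RAM. The file name below is hypothetical.

    /* Sketch: map the precomputed table file read-only and share the
     * mapping among all threads; the OS keeps hot pages resident in
     * RAM. The file name is hypothetical. */
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>

    static const void *map_table(const char *path, size_t *len)
    {
        struct stat st;
        void *p;
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return NULL;
        if (fstat(fd, &st) < 0) {
            close(fd);
            return NULL;
        }
        p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);                  /* the mapping survives the close */
        if (p == MAP_FAILED)
            return NULL;
        *len = (size_t)st.st_size;
        return p;
    }

    int main(void)
    {
        size_t len;
        const void *table = map_table("power_table.dat", &len);  /* hypothetical */
        if (!table) {
            perror("map_table");
            return 1;
        }
        printf("mapped %lu bytes\n", (unsigned long)len);
        return 0;
    }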
The current implementation is also very much non-distributed, and it does not
provide any reasonable mechanism for checkpointing or for managing the work
performed. I just ran across The Eight Fallacies of Distributed Computing,
which will certainly have to be taken into consideration when distributing the
workload.
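
For checkpointing, even something minimal would help. Here is a sketch of one
common approach (hypothetical; nothing like it exists in the current code):
record the next work unit by writing a temporary file and atomically renaming
it into place, so a crash never leaves a torn checkpoint.

    /* Checkpoint sketch (hypothetical). rename() is atomic on POSIX
     * filesystems, so the checkpoint file is always either the old
     * value or the new one, never a partial write. */
    #include <stdio.h>

    static int save_checkpoint(const char *path, unsigned long next_unit)
    {
        char tmp[256];
        FILE *f;
        snprintf(tmp, sizeof tmp, "%s.tmp", path);
        f = fopen(tmp, "w");
        if (!f)
            return -1;
        fprintf(f, "%lu\n", next_unit);
        if (fclose(f) != 0)         /* flushes; check the write landed */
            return -1;
        return rename(tmp, path);
    }

    static unsigned long load_checkpoint(const char *path)
    {
        unsigned long unit = 0;
        FILE *f = fopen(path, "r");
        if (f) {
            if (fscanf(f, "%lu", &unit) != 1)
                unit = 0;
            fclose(f);
        }
        return unit;                /* 0 means start from scratch */
    }

    int main(void)
    {
        const char *ckpt = "work.ckpt";            /* hypothetical path */
        unsigned long unit = load_checkpoint(ckpt);
        printf("resuming at unit %lu\n", unit);
        /* ... perform the unit, then record progress: */
        if (save_checkpoint(ckpt, unit + 1) != 0)
            perror("save_checkpoint");
        return 0;
    }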
It's not clear that the implementation really takes maximum advantage of the
multicore architecture. Industry articles such as Multicore faces a long road
and Making the Move to Multicore, along with my experience here, make it clear
that programming multicore systems for performance will take more than the
simplistic approach of running N copies of a homogeneous application on a
system to maximize utilization; one small alternative is sketched below. The
T1000 is clearly a nice platform for doing research and development on
multicore programming.
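
As one small step beyond running N homogeneous copies, here is a sketch
(illustrative only; do_unit is a placeholder for the real work) of a single
process whose worker threads pull work units from a shared queue, so the load
balances dynamically across the T1000's 32 hardware threads even when units
take unequal time.

    /* Shared work queue sketch for the T1000's 32 hardware threads.
     * Compile with -lpthread. do_unit is a placeholder. */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 32     /* 8 cores x 4 threads */
    #define NUNITS   1024

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static unsigned next_unit = 0;

    static void do_unit(unsigned u) { (void)u; /* real work goes here */ }

    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            unsigned u;
            pthread_mutex_lock(&lock);
            u = next_unit < NUNITS ? next_unit++ : NUNITS;
            pthread_mutex_unlock(&lock);
            if (u == NUNITS)
                break;              /* queue drained */
            do_unit(u);             /* units may take unequal time */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        int i;
        for (i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
        printf("all %d units done\n", NUNITS);
        return 0;
    }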