Friday, February 8, 2008

Memory Interleaving:

The demands of new higher-speed processors and parallel processing have made memory throughput (bandwidth) a bottleneck in modern computer systems. Memory interleaving is a technique for increasing the maximum amount of data a memory system can deliver per unit time. However, memory interleaving does not affect memory latency, as discussed in the cache section.

Memory interleaving is implemented by dividing the memory system into a number of independent banks that can answer read or write requests independently, in parallel. For example, the Intel Orion 450GX chip set (discontinued) for the Intel Pentium Pro processor used four-way interleaved memory by dividing its memory into four banks. The most extreme interleaved designs are used in current SMP vector supercomputers (Cray), which may have up to 256-way interleaved memory banks!
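To make the bank idea concrete, here is a minimal sketch of how an address might be mapped to a bank. It assumes low-order interleaving (consecutive lines go to consecutive banks) and a 32-byte line. Both are illustrative assumptions, not details of the Orion chip set or any Cray machine, and the name bank_of is hypothetical.

```python
LINE_SIZE = 32   # bytes per memory line (assumed for illustration)
NUM_BANKS = 4    # four-way interleaving


def bank_of(address, line_size=LINE_SIZE, num_banks=NUM_BANKS):
    """Return the bank holding the line that contains this byte address.

    Assumes low-order interleaving: line k lives in bank k mod num_banks.
    """
    return (address // line_size) % num_banks


# Eight consecutive lines rotate through the four banks in turn.
banks = [bank_of(line * LINE_SIZE) for line in range(8)]
print(banks)
```

Under this mapping, a sequential sweep through memory never asks the same bank for two lines in a row, which is what lets the banks work in parallel.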

In a typical four-way interleaved memory system the SIMMs are divided logically into four banks. When lines of data are written to memory, four lines may be written simultaneously because each line can be written to a separate bank, in parallel. In contrast, a non-interleaved system can write only one line in the same amount of time. Therefore, at its maximal rate, four-way interleaved memory can read and write data four times faster than non-interleaved memory. Imagine what a 256-way interleaved memory supercomputer can do! Note that this technique is analogous to disk striping in a RAID system (see our hard drive technical section) and increases throughput in the same manner.
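The four-times figure can be checked with a small idealized model. The sketch below assumes that each bank needs one fixed time unit per line, that lines are spread evenly across the banks, and that the bus itself is never the limit. Real systems will fall short of this, and the name transfer_time is hypothetical.

```python
import math


def transfer_time(num_lines, num_banks, bank_busy_time=1.0):
    """Idealized time to move num_lines through num_banks parallel banks.

    Each bank handles one line per bank_busy_time, and the lines are
    assumed to be spread evenly across the banks.
    """
    return math.ceil(num_lines / num_banks) * bank_busy_time


# Ratio of non-interleaved time to four-way interleaved time for 1024 lines.
speedup = transfer_time(1024, 1) / transfer_time(1024, 4)
print(speedup)
```

In this model the speedup equals the number of banks whenever the line count is a multiple of the bank count, which matches the "at its maximal rate" qualification above.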

To obtain the maximal throughput of interleaved memory, the data must be prefetched. Prefetch techniques are automatically used by pipelined and superscalar CPUs. Prefetching is particularly important for iterative loops like "for (i=1; i<n; i++) { ... a[i] ... }", where the CPU must prefetch elements of a[i] from memory before they are actually called for. Problems are also encountered in matrix and vector math when an operation accesses data with a stride that is some multiple of the interleaving. In such cases all the data needed resides in the same memory bank, so the accesses cannot proceed in parallel. However, modern compilers, especially Fortran compilers, will address such issues for the programmer.
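The stride problem is easy to see in a small sketch. It assumes the same low-order mapping as before (line k lives in bank k mod 4) and simply records which banks a strided access pattern touches. The name banks_touched is hypothetical.

```python
def banks_touched(stride, num_accesses, num_banks=4):
    """Return the set of banks hit by line indices 0, stride, 2*stride, ...

    Assumes low-order interleaving: line k lives in bank k mod num_banks.
    """
    return {(i * stride) % num_banks for i in range(num_accesses)}


sequential = banks_touched(1, 16)   # unit stride uses every bank
conflict = banks_touched(4, 16)     # stride equal to the bank count
padded = banks_touched(5, 16)       # stride padded by one line
print(sequential, conflict, padded)
```

With a stride of 4 every access lands in bank 0, so the four-way system behaves like a non-interleaved one. Padding the stride to 5 spreads the accesses over all four banks again. Padding is one common remedy, shown here only as an illustration and not as a description of what any particular compiler does.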

Most PC chip sets, like the Intel Triton-II (430HX and 430VX) and the Intel Natoma (440FX), do not support memory interleaving because Intel does not believe PC users saturate their memory bus. However, many number-crunching applications do saturate the memory bus. Multiprocessor PCs are increasingly being used for such applications, and we hope to see future Intel chip sets support memory interleaving. We hope you send your thoughts on this to Intel. (Hint).
