- Physical Memory
- DDR2 Controller
- User Memory Interface
- Frequently Asked Questions
- Revision History
The purpose of this document is to give an overview of the DDR2 memory system supported by the BEE2. In addition to a brief overview of the physical memory support on each BEE2 module, this document also describes the custom memory interface logic created for the BEE2 at the Berkeley Wireless Research Center.
Access to the raw DDR2 memory has been implemented through a set of interfaces which build upon one another. The following figure gives a high-level view of this layered approach.
At the lowest level is the actual DDR2 memory controller. Each DIMM has its own independent DDR2 controller, each of which is heavily pipelined and supports simple bank management. On top of the low-level DDR2 controller is the user memory interface. The user memory interface is a simple, common interface that user logic can use to access memory. The interface is implemented as a series of asynchronous command FIFOs which provide command buffering and decouple the user clock domain from the 200MHz DDR2 controller clock domain. Finally, on top of the user memory interface, any arbitrary user logic can be implemented.
To provide multi-port access to each DIMM, we have provided a simple arbiter for the user memory interface of each DIMM, referred to below as the multi-port switch. The switch is fully configurable in its number of ports and supports different modes of arbitration. The basic mode is priority round-robin, which balances the needs of a priority-based arbiter with starvation-free arbitration. In addition to the basic mode, the arbiter also supports bursting: a requester keeps its arbitrated status for as long as it holds down the request line, up to a configurable window of cycles. This helps applications that need sustained access to memory to maintain sufficient performance. Finally, the arbitration logic itself is self-contained with a consistent interface, allowing users to implement their own arbitration scheme to suit their particular needs.
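The arbitration behavior described above can be illustrated with a small behavioral model. This is only a sketch: the class name `RoundRobinArbiter`, the `burst_window` parameter, and the exact rotation rule (the port after the last granted port gets highest priority) are assumptions made for illustration and are not taken from the BEE2 arbiter source.

```python
class RoundRobinArbiter:
    """Behavioral sketch of a rotating-priority arbiter with optional bursting."""

    def __init__(self, num_ports, burst_window=1):
        self.num_ports = num_ports
        self.burst_window = burst_window  # max consecutive cycles one port may hold the grant
        self.owner = None                 # port granted in the previous cycle
        self.held = 0                     # consecutive cycles the owner has held the grant
        self.next_start = 0               # port with the highest priority for a new grant

    def step(self, requests):
        """Advance one cycle. `requests` is a list of booleans, one per port.
        Returns the granted port index, or None if nobody is requesting."""
        # Bursting: the current owner keeps the grant while it keeps requesting,
        # until the burst window is used up.
        if (self.owner is not None and requests[self.owner]
                and self.held < self.burst_window):
            self.held += 1
            return self.owner
        # Otherwise scan the ports in rotating order, starting after the last owner.
        for offset in range(self.num_ports):
            port = (self.next_start + offset) % self.num_ports
            if requests[port]:
                self.owner = port
                self.held = 1
                self.next_start = (port + 1) % self.num_ports
                return port
        self.owner = None
        self.held = 0
        return None


def run(arbiter, cycles, requests):
    """Apply the same request vector for a number of cycles and collect the grants."""
    return [arbiter.step(requests) for _ in range(cycles)]
```

With two ports that both request continuously, a window of one cycle alternates the grant between the ports, while a window of two cycles lets each port keep the grant for two cycles before it rotates. In both cases neither port is starved.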
The multi-port switch provides each port the same user memory interface as is provided by the asynchronous command FIFO block. This means that any logic designed to work directly with the user memory interface will work without modification with the multi-port switch. An example of a system which uses the multi-port switch can be seen below.
This example is similar to the setup of the reference Linux base system provided with the BSP. In this example the multi-port switch provides two ports. One is connected to a PLB attachment that allows the host processor to access the physical memory. The second port is attached to a frame buffer device (note that the OPB attachment for the frame buffer is not for data transfer, it is simply for control registers). In this example, bursting is turned on to meet the timing requirements of the framebuffer's DVI output.
Each BEE2 module has five Virtex-II Pro 70 FPGAs, each of which is connected to four fully independent DDR2 DIMMs. At present the maximum capacity of each DIMM is 1GB (although higher-density DIMMs could bring this to 2GB or above), giving a total memory capacity of 20GB per BEE2 module. Each FPGA is wired to accept a 72-bit data path, allowing the use of ECC storage bits as well.
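The capacity figure follows directly from the counts above. The short calculation below is just a worked check. The constant and function names are made up for illustration.

```python
FPGAS_PER_MODULE = 5   # Virtex-II Pro 70 FPGAs per BEE2 module
DIMMS_PER_FPGA = 4     # independent DDR2 DIMMs attached to each FPGA
DIMM_CAPACITY_GB = 1   # current maximum capacity per DIMM


def module_capacity_gb(dimm_gb=DIMM_CAPACITY_GB):
    """Total DDR2 capacity of one BEE2 module for a given DIMM size in GB."""
    return FPGAS_PER_MODULE * DIMMS_PER_FPGA * dimm_gb


print(module_capacity_gb())    # 1GB DIMMs
print(module_capacity_gb(2))   # 2GB DIMMs
```

With 1GB DIMMs this gives the 20GB quoted above. 2GB DIMMs would double it to 40GB.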
For more information on the details of the physical connections to memory, please see the BEE2 module documentation, which contains links to schematics and data sheets. Additionally, the Bee2Setup page has information about part numbers for DIMMs that have been verified to work with the BEE2.
The heart of the BEE2 memory system is the DDR2 controller. The controller is responsible for all of the low-level DRAM management and data transfer tasks. The controller was originally based on the data path generated from the Xilinx MIG007 tool and then heavily modified at BWRC. The tasks performed by the controller include:
Reset, initialization, and clock generation
Read and write command issuance
Automatic bank management (can be turned off via parameter to save resources)
As of now, the controller allows access to ECC storage bits, but does not actually implement ECC. For more detailed information on the controller implementation, please see the core documentation directly for the ddr2_controller_v2_00_a core.
Although the DDR2 controller has independent data paths for read and write data, the controller does not allow simultaneous issuance of both read and write commands. However, the controller is highly pipelined and can issue a command every other cycle. The interface provided by the controller is fairly simple, although it runs at 200MHz and therefore can introduce timing problems if interacted with directly. The preferred way to interact with this interface is through the asynchronous command FIFO block (described below), but the lowest latency option is to use this interface directly. The following diagram shows the signals that comprise the interface.
Please note that although there are separate user_read and user_write signals, they may not be asserted at the same time, and therefore only a single read or write can be issued at a time.
Issuing a command to the controller is controlled by the user_ready signal. When this signal is high, it means that in that cycle it is possible to issue a new command. Because of the single cycle turn around on this signal, it often shows up on the critical path for this interface. The transaction happens in two phases. First the address is accepted, and second the data is fetched (for a write) or returned (for a read).
Phase 1: Address
When user_ready is high the user can assert either user_read or user_write (but not both) along with a valid 32-bit address. Note that although you present a 32-bit address, the lower 4 bits of the address are unused because the controller transfers data aligned on 128-bit boundaries (144 bits if you count the extra ECC storage bits). The transaction is accepted on the rising edge that sees user_ready asserted together with user_read or user_write. The user_ready signal will always go low for at least one cycle after accepting an address. An additional signal, user_half_burst, is also sampled at issue time. If this signal is high, the controller will only accept or generate one 144-bit data value. If it is low, the controller will accept or generate two 144-bit data values.
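The address alignment and burst length rules above can be written down as a few helper functions. This is a minimal sketch for reasoning about addresses in software. The function names are hypothetical, and it assumes the controller simply disregards the lower 4 address bits.

```python
ALIGN_BYTES = 16  # 128 data bits per transfer, so the lower 4 address bits are unused


def aligned_address(addr):
    """Return the 32-bit address with the lower 4 bits cleared."""
    return addr & 0xFFFFFFF0


def data_beats(half_burst):
    """Number of 144-bit values transferred for one command."""
    return 1 if half_burst else 2


def bytes_per_command(half_burst):
    """Data bytes (excluding the ECC storage bits) moved by one command."""
    return data_beats(half_burst) * ALIGN_BYTES
```

For example, an address of 0x1234567F refers to the same 128-bit location as 0x12345670, and a full burst command moves 32 data bytes while a half burst command moves 16.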
Phase 2: Data
Read transactions: Some time after (from 0 to n cycles) the completion of phase one the controller will assert the signal user_data_valid. While this signal is asserted, the user_output_data bus carries valid data. If the transaction is a half-burst transaction this signal will be asserted for only one cycle. If it is not a half-burst transaction the signal will be asserted for two cycles, although not necessarily back to back.
Write transactions: Some time after (from 0 to n cycles) the completion of phase one the controller will assert the signal user_get_data. During the cycle this signal is asserted the user must have valid data and byte enables on the user_input_data and user_byte_enable buses. If the transaction is a half-burst transaction this signal will go high for only one cycle, and if not it will go high for two cycles, although not necessarily back to back.
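Because the data cycles of a full burst are not guaranteed to be back to back, user logic has to follow user_get_data and user_data_valid cycle by cycle instead of counting cycles. The sketch below models that on per-cycle traces. The function names and the list-based trace representation are assumptions for illustration only, and the model does not describe the controller's internal timing.

```python
def drive_write_data(get_data_trace, words):
    """Return the value driven on user_input_data in each cycle.

    get_data_trace: per-cycle values of user_get_data (True or False).
    words: the 144-bit values of the transaction, in order
           (one for a half burst, two otherwise).
    Cycles in which user_get_data is low are reported as None (don't care).
    """
    pending = list(words)
    driven = []
    for get_data in get_data_trace:
        if get_data:
            if not pending:
                raise ValueError("controller requested more data than the burst holds")
            driven.append(pending.pop(0))
        else:
            driven.append(None)
    if pending:
        raise ValueError("trace ended before all data was fetched")
    return driven


def collect_read_data(valid_trace, bus_trace, half_burst):
    """Capture user_output_data in the cycles where user_data_valid is high."""
    expected = 1 if half_burst else 2
    captured = [bus for valid, bus in zip(valid_trace, bus_trace) if valid]
    if len(captured) != expected:
        raise ValueError("unexpected number of valid cycles")
    return captured
```

For a full burst write whose two data cycles are separated by an idle cycle, `drive_write_data([False, True, False, True], [0xA, 0xB])` places 0xA in the second cycle and 0xB in the fourth.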
The following timing diagrams describe the behavior of the DDR2 controller interface.
If the user_half_burst signal is asserted when issuing the command, then the user_get_data signal will only be asserted for one cycle and the controller only expects one 144-bit data value.
If the user_half_burst signal is asserted when issuing the command, then the user_data_valid signal will only be asserted for one cycle and the controller only sends one 144-bit data value.
User Memory Interface
Although interacting with the DDR2 controller interface directly is the highest performance, lowest latency option to access memory, most applications will have logic operating in a different clock domain than the 200MHz controller. To address this issue, we have provided an easier to use, clock-decoupled interface called the user memory interface. The implementation of this interface is the async_ddr2_v2_00_a core, which contains block RAM based asynchronous FIFOs to buffer commands and provide clock domain crossing.
The interface has several different modes of operation which are set by two parameters:
C_WIDE_DATA - This parameter sets the width of the data path presented to the user. The two choices are a wide data path of 288-bits and a narrow data path of 144-bits. The wide data path takes advantage of the internal block RAM MUXs to create the 288-bits of data required or produced by the DDR2 controller per transaction.
C_HALF_BURST - This parameter sets whether the DDR2 controller signal user_half_burst will be asserted or not. This parameter only has relevance when operating in narrow data path mode. With a narrow data path and half bursts enabled, the user only needs to present or receive 144 bits per transaction, but this comes at the cost of losing half of the memory bandwidth.
Finally, the asynchronous command FIFO core also gives the user the option to utilize a command tag FIFO that will maintain a 32-bit tag that corresponds to each read transaction. This makes certain applications easier given the split phase operation of commands.
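The effect of the two parameters on the data path can be summarized in a small lookup function. This is a sketch derived only from the parameter descriptions above. The function name and the tuple layout of the result are hypothetical.

```python
def user_interface_mode(c_wide_data, c_half_burst):
    """Summarize one configuration of the user memory interface.

    Returns a tuple of:
      - user data path width in bits,
      - number of user-side data values per DDR2 controller transaction,
      - fraction of the memory bandwidth that remains usable.
    """
    if c_wide_data:
        # C_HALF_BURST only has relevance in narrow mode, so it is ignored here.
        return (288, 1, 1.0)
    if c_half_burst:
        # Narrow path, half bursts: one 144-bit value per transaction.
        return (144, 1, 0.5)
    # Narrow path, full bursts: two 144-bit values per transaction.
    return (144, 2, 1.0)
```

The narrow, full-burst configuration is the only one in which the user handles two data values per controller transaction.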
The following diagram shows all the signals that comprise the user memory interface.
The interface is comprised of roughly three components. The command (Cmd) signals are involved with issuing a read or write command. The write (Wr) signals carry data and byte enables for writes, and the read (Rd) signals carry data and tag information from reads.
Just like the DDR2 controller, the asynchronous command FIFO block allows only a single read or write command per cycle, and no reordering of commands is done within the block. This is an important point for consistency and coherence issues in multi-core systems, as it means that this core acts as a synchronization point for all commands. It is possible to augment this core to perform reordering and optimization, but at this time it is safe to assume the same order in and out of the core.
Phase 1: Command Issue
To issue a command the user will assert Mem_Cmd_Valid and will present a valid address on Mem_Cmd_Address. The cycle in which the command is accepted the core will assert Mem_Cmd_Ack (this may be asserted during the first cycle Mem_Cmd_Valid is asserted).
Write transactions: Set Mem_Cmd_RNW low and place valid data and byte enables on Mem_Wr_Din and Mem_Wr_BE. When the command has been acknowledged the write is accepted and no further action is required.
Read transactions: Set Mem_Cmd_RNW high and place an optional tag on the Mem_Cmd_Tag bus.
Phase 2: Read data
When the read data has returned from the controller, the core will assert Mem_Rd_Valid and the Mem_Rd_Dout and Mem_Rd_Tag buses will be valid. The user can acknowledge the data at any time by asserting Mem_Rd_Ack for a single cycle.
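The split-phase, in-order behavior described above can be mimicked with a toy software model, which may help when planning how user logic matches read data to requests. The class `UserMemoryPort` and its methods are hypothetical. The model ignores byte enables, latency, and clock domains, and accepts every command immediately.

```python
from collections import deque


class UserMemoryPort:
    """Toy model: commands complete strictly in issue order, reads return their tag."""

    def __init__(self):
        self.memory = {}          # address -> data, stands in for the DIMM
        self.read_fifo = deque()  # (data, tag) pairs waiting for Mem_Rd_Ack

    def command(self, address, rnw, data=None, tag=0):
        """Issue one command, as if Mem_Cmd_Valid were acknowledged immediately."""
        if rnw:
            # Read (Mem_Cmd_RNW high): the result is queued together with its tag.
            self.read_fifo.append((self.memory.get(address), tag))
        else:
            # Write (Mem_Cmd_RNW low): data is taken from Mem_Wr_Din.
            self.memory[address] = data

    def read_valid(self):
        """Mem_Rd_Valid: true while read data is waiting."""
        return bool(self.read_fifo)

    def read_ack(self):
        """Mem_Rd_Ack: consume the oldest result as (Mem_Rd_Dout, Mem_Rd_Tag)."""
        return self.read_fifo.popleft()
```

Issuing write, read, write, read to the same address returns the first written value with the first tag and the second written value with the second tag, reflecting that commands are not reordered.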
Special Case: Narrow data mode and full burst mode
The only case where the description above differs is when the core has been configured with C_WIDE_DATA=0 and C_HALF_BURST=0. This case corresponds to utilizing the full bandwidth of the DDR2 controller, but without using the internal block RAM muxes. In this case the user is required to issue two commands for a write and will need to acknowledge two read data values for a read.
Write transactions: Both commands are issued as above, however, the second command's address is ignored and is implicitly taken as the address of command #1 + 16 bytes, and therefore the supplied data must correspond to this address.
Read transactions: The second data value returned will correspond to the data found at the address of command #1 + 16 bytes. The tag returned will be the same as the tag sent with the command.
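The addressing rule for this special case can be sketched as follows. The function names are hypothetical, and the tuples show the effective address of each data value, not the address the user must drive, since the core ignores the address of the second command.

```python
SECOND_VALUE_OFFSET = 16  # bytes between the two 144-bit values of a full burst


def narrow_full_burst_write(address, first_word, second_word):
    """Return the (effective address, data) pairs for the two write commands."""
    return [
        (address, first_word),
        (address + SECOND_VALUE_OFFSET, second_word),
    ]


def narrow_full_burst_read(address, tag, first_word, second_word):
    """Label the two returned read values with their effective address and tag."""
    return [
        (address, first_word, tag),
        (address + SECOND_VALUE_OFFSET, second_word, tag),
    ]
```

For a write starting at address 0x100, the second data value supplied by the user therefore lands at 0x110. Both read values of one command carry the same tag.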
Frequently Asked Questions
None so far.
Click the Info link below to see the revision history.