Step Six: Memory IO
In order to store programs and variables, a CPU needs memory. Traditionally, main memory, or RAM (Random Access Memory), is external to the CPU die. The FPGA development kit I'm using, the miniSpartan6+, has a 256Mb (32MB) SDRAM (Synchronous Dynamic RAM; it uses capacitors to store values). Because this was my first project with an FPGA, I decided not to use the SDRAM: its interface is not simple to use, and I wanted to focus on building a CPU, not an SDRAM controller (my new project, r32, does have a functioning SDRAM controller).
There are two types of RAM on FPGAs: distributed RAM and block RAM. Distributed RAM, as the name suggests, spreads its storage elements across many logic blocks, so large memories consume a significant amount of logic resources. Block RAMs are small, dedicated SRAM modules. Both are relatively simple to implement, but at the time of this project I hadn't heard of block RAM, so I opted for distributed RAM (which was fine size-wise, since I only needed 8KB: 4K of RAM and 4K of ROM).
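For comparison, here is a minimal sketch of a RAM with a registered (synchronous) read. Synthesis tools typically infer block RAM from this pattern, because block RAM read ports are clocked, whereas an asynchronous read generally ends up in distributed RAM. The entity name `ram4kB_sync` and its port names are hypothetical and are not part of muCPU.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical synchronous-read RAM (not part of the muCPU sources).
entity ram4kB_sync is
    port(clk      : in  std_logic;
         we       : in  std_logic;
         addr     : in  std_logic_vector (11 downto 0);
         data_in  : in  std_logic_vector (7 downto 0);
         data_out : out std_logic_vector (7 downto 0));
end ram4kB_sync;

architecture behavioral of ram4kB_sync is
    type mem_type is array (0 to 4095) of std_logic_vector (7 downto 0);
    signal mem : mem_type := (others => (others => '0'));
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if we = '1' then
                mem(to_integer(unsigned(addr))) <= data_in;
            end if;
            -- Registered read: data_out lags addr by one clock cycle.
            data_out <= mem(to_integer(unsigned(addr)));
        end if;
    end process;
end behavioral;
```

The cost of this style is one cycle of read latency, which a CPU designed around asynchronous reads would have to account for.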
Anyway, the implementation for a RAM in VHDL is extremely similar to the register file we built earlier:
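library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all; -- needed for to_integer and unsigned
entity ram4kB is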
port(addr : in std_logic_vector (11 downto 0);
clk : in std_logic;
store : in std_logic;
sel : in std_logic;
data_in : in std_logic_vector (7 downto 0);
data_out : out std_logic_vector (7 downto 0);
out_ctl : out std_logic);
end ram4kB;
architecture behavioral of ram4kB is
type arry_type is array (0 to 4095) of std_logic_vector (7 downto 0);
signal mem_contents : arry_type := (others => "00000000");
begin
write_data: process(clk)
begin
if (rising_edge(clk)) then
if (store = '1' and sel = '1') then
mem_contents(to_integer(unsigned(addr))) <= data_in;
end if;
end if;
end process;
data_out <= mem_contents(to_integer(unsigned(addr)));
out_ctl <= '1' when (store /= '1' and sel = '1') else '0';
end behavioral;
out_ctl is high when data_out is valid, that is, when the RAM is selected and is not being written. The memory "controller" uses this signal to determine which memory's data_out (including I/O) should be fed to the execute unit.
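As a rough illustration of how the out_ctl signals could steer the data path, the sketch below selects one of the three data_out buses from a 3-bit out_ctl vector. The entity `mem_out_select` is hypothetical and is not taken from the muCPU sources; the bit order (I/O, ROM, RAM from bit 2 down to bit 0) is assumed to match the concatenation used in memory.vhd.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical selector (not part of the muCPU sources).
entity mem_out_select is
    port(ram_data_out : in  std_logic_vector (7 downto 0);
         rom_data_out : in  std_logic_vector (7 downto 0);
         io_data_out  : in  std_logic_vector (7 downto 0);
         out_ctl      : in  std_logic_vector (2 downto 0); -- io & rom & ram
         data_out     : out std_logic_vector (7 downto 0));
end mem_out_select;

architecture behavioral of mem_out_select is
begin
    -- At most one out_ctl bit is expected to be high at a time,
    -- since the address decoder selects a single device.
    with out_ctl select
        data_out <= ram_data_out    when "001",
                    rom_data_out    when "010",
                    io_data_out     when "100",
                    (others => '0') when others;
end behavioral;
```

Driving zeros when no device is selected is a choice made for this sketch only; the real execute unit may handle that case differently.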
Read-only memory (ROM) has a slightly different implementation. Because it is never written to, it is simple to implement asynchronously. The only trick is initializing the ROM contents, which is done by a process that runs once and sets each value of the main data array.
Here's the implementation:
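library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all; -- needed for to_integer and unsigned
entity rom4kB is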
port(addr : in std_logic_vector (11 downto 0);
sel : in std_logic;
data_out : out std_logic_vector (7 downto 0);
out_ctl : out std_logic
);
end rom4kB;
architecture behavioral of rom4kB is
type array_type is array (0 to 4095) of std_logic_vector (7 downto 0);
type prog_type is array (0 to 1023) of std_logic_vector (15 downto 0);
type data_type is array (0 to 2047) of std_logic_vector (7 downto 0);
constant prog_instr : prog_type := (
--fill with desired program (max 1024 instructions)
others => x"0000"
);
constant prog_data : data_type := (
--fill with data to be used in program
others => x"00"
);
signal program : array_type;
signal const0 : std_logic := '0';
begin
process(const0)
begin
for i in 0 to 1023 loop
program(i*2 + 2048) <= prog_instr(i)(15 downto 8);
program(i*2 + 1 + 2048) <= prog_instr(i)(7 downto 0);
end loop;
for i in 0 to 2047 loop
program(i) <= prog_data(i);
end loop;
end process;
data_out <= program(to_integer(unsigned(addr)));
out_ctl <= '1' when (sel = '1') else '0';
end behavioral;
Because the signal const0 (which is not driven by any input or logic) is the sole entry in the process sensitivity list, the process runs once. Half of the ROM is allocated for instructions (1024 words, 2KB, or technically 2KiB), and the other half for program constants. There is no "security" logic or protection against executing data as instructions: the program counter can point to any location in memory, even I/O (since muCPU uses memory-mapped I/O).
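An alternative way to get the same ROM layout, shown here only as a sketch and not as the approach muCPU uses, is to build the image with a function and store it in a constant, so no initialization process is needed. The entity name `rom4kB_const` and the function `build_image` are hypothetical; the layout (data in the lower 2K, instructions in the upper 2K, high byte first) mirrors the loops above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical constant-initialized ROM (not part of the muCPU sources).
entity rom4kB_const is
    port(addr     : in  std_logic_vector (11 downto 0);
         data_out : out std_logic_vector (7 downto 0));
end rom4kB_const;

architecture behavioral of rom4kB_const is
    type array_type is array (0 to 4095) of std_logic_vector (7 downto 0);
    type prog_type  is array (0 to 1023) of std_logic_vector (15 downto 0);
    type data_type  is array (0 to 2047) of std_logic_vector (7 downto 0);

    constant prog_instr : prog_type := (others => x"0000");
    constant prog_data  : data_type := (others => x"00");

    -- Packs data into bytes 0..2047 and instructions into bytes 2048..4095.
    function build_image(instr : prog_type; data : data_type)
        return array_type is
        variable image : array_type;
    begin
        for i in data'range loop
            image(i) := data(i);
        end loop;
        for i in instr'range loop
            image(2*i + 2048)     := instr(i)(15 downto 8); -- high byte
            image(2*i + 2048 + 1) := instr(i)(7 downto 0);  -- low byte
        end loop;
        return image;
    end function;

    constant program : array_type := build_image(prog_instr, prog_data);
begin
    data_out <= program(to_integer(unsigned(addr)));
end behavioral;
```

Because the image is a constant, the intent that the contents never change is explicit in the code rather than implied by an undriven signal.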
In versions prior to 2.1, only 128 bytes of RAM and 64 bytes of ROM were available (an 8-bit address space allows for 256 addressable locations). In version 2.1, r7 is used much like a base address: it is shifted left by 8 and concatenated with the 8-bit sum of the supplied base address rs and the 8-bit immediate. The result is a 16-bit address, allowing for 64KiB of addressable locations.
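The address formation described above can be sketched as follows. The entity `addr_gen` and its port names are hypothetical; the sketch assumes the immediate has already been extended to 8 bits and that any carry out of the 8-bit sum is discarded, which is my reading of "8-bit sum" rather than something confirmed by the muCPU sources.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical address generator (not part of the muCPU sources).
entity addr_gen is
    port(r7   : in  std_logic_vector (7 downto 0);  -- upper address byte
         rs   : in  std_logic_vector (7 downto 0);  -- base register
         imm  : in  std_logic_vector (7 downto 0);  -- immediate, assumed 8 bits
         addr : out std_logic_vector (15 downto 0));
end addr_gen;

architecture behavioral of addr_gen is
    signal offset : unsigned (7 downto 0);
begin
    -- 8-bit sum; the carry out of bit 7 is dropped (assumption).
    offset <= unsigned(rs) + unsigned(imm);
    -- Placing r7 in the upper byte is equivalent to shifting it left by 8.
    addr <= r7 & std_logic_vector(offset);
end behavioral;
```

For example, with r7 = x"10", rs = x"20" and imm = x"05", the resulting address is x"1025", which falls in the RAM region.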
The lower 4K of addresses (0x0000 to 0x0FFF) are allocated to ROM, with the program to be run starting at address 0. The next 4K section (0x1000 to 0x1FFF) is allocated to RAM. The large space between RAM and I/O is unused; I/O is located in the last 4 addresses (0xFFFC to 0xFFFF).
The only part left before the full memory component is complete is the I/O logic. The CPU core interface for iobank is essentially identical to that of the RAM component. The real difference in iobank is its write implementation. Because multiple devices can write to it, arbitration is necessary. For this application, I opted for a very simple approach: block CPU writes while an external device is writing to I/O.
Below is a single-port implementation of iobank (the actual implementation has 4 ports, each of which is implemented in exactly the same way in iobank.vhd):
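library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all; -- needed for to_integer and unsigned
entity iobank is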
port(cpu_data_in : in std_logic_vector (7 downto 0);
cpu_data_out : out std_logic_vector (7 downto 0);
cpu_addr : in std_logic_vector (1 downto 0);
clk : in std_logic;
cpu_we : in std_logic;
cpu_select : in std_logic;
cpu_out_ctl : out std_logic;
portA_data_in : in std_logic_vector (7 downto 0);
portA_data_out : out std_logic_vector (7 downto 0);
portA_we : in std_logic;
portA_out_ctl : out std_logic
);
end iobank;
architecture behavioral of iobank is
type array_type is array (3 downto 0) of std_logic_vector (7 downto 0);
signal port_data: array_type := (others => "00000000");
begin
write_data: process(clk)
begin
if (rising_edge(clk)) then
if (portA_we = '1') then -- give priority to external signal drivers
port_data(0) <= portA_data_in;
elsif (cpu_we = '1' and cpu_select = '1' and cpu_addr = "00") then
port_data(0) <= cpu_data_in;
end if;
end if;
end process;
cpu_data_out <= port_data(to_integer(unsigned(cpu_addr)));
cpu_out_ctl <= '1' when (cpu_select = '1' and cpu_we /= '1') else '0';
portA_data_out <= port_data(0);
portA_out_ctl <= '1' when (portA_we /= '1') else '0';
end behavioral;
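To see the arbitration in action, here is a small simulation sketch. It assumes the single-port iobank above has been compiled into library work together with the usual ieee library clauses; the testbench name `iobank_tb` and the test values are my own and are not part of muCPU.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench for the single-port iobank shown above.
entity iobank_tb is
end iobank_tb;

architecture sim of iobank_tb is
    signal clk            : std_logic := '0';
    signal cpu_data_in    : std_logic_vector (7 downto 0) := x"00";
    signal cpu_data_out   : std_logic_vector (7 downto 0);
    signal cpu_addr       : std_logic_vector (1 downto 0) := "00";
    signal cpu_we         : std_logic := '0';
    signal cpu_select     : std_logic := '0';
    signal cpu_out_ctl    : std_logic;
    signal portA_data_in  : std_logic_vector (7 downto 0) := x"00";
    signal portA_data_out : std_logic_vector (7 downto 0);
    signal portA_we       : std_logic := '0';
    signal portA_out_ctl  : std_logic;
begin
    dut : entity work.iobank
        port map(cpu_data_in    => cpu_data_in,
                 cpu_data_out   => cpu_data_out,
                 cpu_addr       => cpu_addr,
                 clk            => clk,
                 cpu_we         => cpu_we,
                 cpu_select     => cpu_select,
                 cpu_out_ctl    => cpu_out_ctl,
                 portA_data_in  => portA_data_in,
                 portA_data_out => portA_data_out,
                 portA_we       => portA_we,
                 portA_out_ctl  => portA_out_ctl);

    stimulus : process
    begin
        -- Both the CPU and the external device write on the same edge.
        cpu_data_in   <= x"AA"; cpu_we <= '1'; cpu_select <= '1';
        portA_data_in <= x"55"; portA_we <= '1';
        wait for 5 ns; clk <= '1';
        wait for 5 ns; clk <= '0';
        assert portA_data_out = x"55"
            report "external write should win" severity failure;

        -- With the external write released, the CPU write goes through.
        portA_we <= '0';
        wait for 5 ns; clk <= '1';
        wait for 5 ns; clk <= '0';
        assert portA_data_out = x"AA"
            report "CPU write should succeed" severity failure;
        wait;
    end process;
end sim;
```

Note that a blocked CPU write is simply dropped; nothing in this interface tells the CPU that its store did not take effect.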
Finally, each section of memory (RAM, ROM, I/O) needs to be connected up to a controller that allows the CPU core to access a single address space. Here's the address decoding logic for selecting a memory device.
with address(15 downto 12) select
ram_sel <= '1' when "0001",
'0' when others;
with address(15 downto 12) select
rom_sel <= '1' when "0000",
'0' when others;
with address(15 downto 2) select
io_sel <= '1' when "11111111111111",
'0' when others;
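A quick way to check the decode boundaries is a small simulation sketch like the one below, which copies the three selected assignments and probes addresses at the edges of each region. The testbench name `decode_tb` and the chosen probe addresses are my own, not part of muCPU.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench for the address decoding shown above.
entity decode_tb is
end decode_tb;

architecture sim of decode_tb is
    signal address : std_logic_vector (15 downto 0) := (others => '0');
    signal ram_sel, rom_sel, io_sel : std_logic;
begin
    with address(15 downto 12) select
        ram_sel <= '1' when "0001",
                   '0' when others;
    with address(15 downto 12) select
        rom_sel <= '1' when "0000",
                   '0' when others;
    with address(15 downto 2) select
        io_sel <= '1' when "11111111111111",
                  '0' when others;

    check : process
    begin
        address <= x"0FFF"; wait for 1 ns;  -- last ROM address
        assert rom_sel = '1' and ram_sel = '0' and io_sel = '0'
            report "0x0FFF should select ROM" severity failure;

        address <= x"1000"; wait for 1 ns;  -- first RAM address
        assert rom_sel = '0' and ram_sel = '1' and io_sel = '0'
            report "0x1000 should select RAM" severity failure;

        address <= x"2000"; wait for 1 ns;  -- unmapped region
        assert rom_sel = '0' and ram_sel = '0' and io_sel = '0'
            report "0x2000 should select nothing" severity failure;

        address <= x"FFFB"; wait for 1 ns;  -- just below I/O
        assert io_sel = '0'
            report "0xFFFB should not select I/O" severity failure;

        address <= x"FFFC"; wait for 1 ns;  -- first I/O address
        assert rom_sel = '0' and ram_sel = '0' and io_sel = '1'
            report "0xFFFC should select I/O" severity failure;
        wait;
    end process;
end sim;
```

Accesses to the unmapped region select no device, so every out_ctl bit stays low for those addresses.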
The full implementation of memory.vhd (with all 4 I/O ports):
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity memory is
port (--processor interface
data_in : in std_logic_vector(7 downto 0);
pc_in : in std_logic_vector(15 downto 0);
mem_addr : in std_logic_vector(15 downto 0);
mem_access : in std_logic;
mem_store : in std_logic;
clk : in std_logic;
--data out
ram_data_out : out std_logic_vector(7 downto 0);
rom_data_out : out std_logic_vector(7 downto 0);
io_data_out : out std_logic_vector(7 downto 0);
out_ctl : out std_logic_vector(2 downto 0);
--io interface
portA_data_in : in std_logic_vector (7 downto 0);
portA_data_out : out std_logic_vector (7 downto 0);
portA_we : in std_logic;
portA_out_ctl : out std_logic;
portB_data_in : in std_logic_vector (7 downto 0);
portB_data_out : out std_logic_vector (7 downto 0);
portB_we : in std_logic;
portB_out_ctl : out std_logic;
portC_data_in : in std_logic_vector (7 downto 0);
portC_data_out : out std_logic_vector (7 downto 0);
portC_we : in std_logic;
portC_out_ctl : out std_logic;
portD_data_in : in std_logic_vector (7 downto 0);
portD_data_out : out std_logic_vector (7 downto 0);
portD_we : in std_logic;
portD_out_ctl : out std_logic
);
end memory;
architecture structural of memory is
signal ram_sel, rom_sel, io_sel : std_logic := '0';
signal address : std_logic_vector (15 downto 0) := (others => '0');
signal ram_out_ctl, rom_out_ctl, io_out_ctl : std_logic;
component ram4kB is
port (addr : in std_logic_vector (11 downto 0);
clk : in std_logic;
store : in std_logic;
sel : in std_logic;
data_in : in std_logic_vector (7 downto 0);
data_out : out std_logic_vector (7 downto 0);
out_ctl : out std_logic
);
end component;
component rom4kB is
port (addr : in std_logic_vector (11 downto 0);
sel : in std_logic;
data_out : out std_logic_vector (7 downto 0);
out_ctl : out std_logic
);
end component;
component iobank is
port (portA_data_in : in std_logic_vector (7 downto 0);
portA_data_out : out std_logic_vector (7 downto 0);
portA_we : in std_logic;
portA_out_ctl : out std_logic;
portB_data_in : in std_logic_vector (7 downto 0);
portB_data_out : out std_logic_vector (7 downto 0);
portB_we : in std_logic;
portB_out_ctl : out std_logic;
portC_data_in : in std_logic_vector (7 downto 0);
portC_data_out : out std_logic_vector (7 downto 0);
portC_we : in std_logic;
portC_out_ctl : out std_logic;
portD_data_in : in std_logic_vector (7 downto 0);
portD_data_out : out std_logic_vector (7 downto 0);
portD_we : in std_logic;
portD_out_ctl : out std_logic;
cpu_data_in : in std_logic_vector (7 downto 0);
cpu_data_out : out std_logic_vector (7 downto 0);
cpu_addr : in std_logic_vector (1 downto 0);
cpu_we : in std_logic;
cpu_select : in std_logic;
cpu_out_ctl : out std_logic;
clk : in std_logic
);
end component;
component mux is
generic(dwidth : positive);
port (data_in_0 : in std_logic_vector (dwidth-1 downto 0);
data_in_1 : in std_logic_vector (dwidth-1 downto 0);
sel : in std_logic;
data_out : out std_logic_vector (dwidth-1 downto 0));
end component;
begin
ram : ram4kB
port map(addr => address(11 downto 0),
clk => clk,
store => mem_store,
sel => ram_sel,
data_in => data_in,
data_out => ram_data_out,
out_ctl => ram_out_ctl);
rom : rom4kB
port map(addr => address(11 downto 0),
sel => rom_sel,
data_out => rom_data_out,
out_ctl => rom_out_ctl);
io : iobank
port map(portA_data_in => portA_data_in,
portA_data_out => portA_data_out,
portA_we => portA_we,
portA_out_ctl => portA_out_ctl,
portB_data_in => portB_data_in,
portB_data_out => portB_data_out,
portB_we => portB_we,
portB_out_ctl => portB_out_ctl,
portC_data_in => portC_data_in,
portC_data_out => portC_data_out,
portC_we => portC_we,
portC_out_ctl => portC_out_ctl,
portD_data_in => portD_data_in,
portD_data_out => portD_data_out,
portD_we => portD_we,
portD_out_ctl => portD_out_ctl,
cpu_data_in => data_in,
cpu_data_out => io_data_out,
cpu_addr => address(1 downto 0),
cpu_we => mem_store,
cpu_select => io_sel,
cpu_out_ctl => io_out_ctl,
clk => clk);
addrmux : mux
generic map(dwidth => 16)
port map(data_in_0 => pc_in,
data_in_1 => mem_addr,
sel => mem_access,
data_out => address);
out_ctl <= io_out_ctl & rom_out_ctl & ram_out_ctl;
with address(15 downto 12) select
ram_sel <= '1' when "0001",
'0' when others;
with address(15 downto 12) select
rom_sel <= '1' when "0000",
'0' when others;
with address(15 downto 2) select
io_sel <= '1' when "11111111111111",
'0' when others;
end structural;