Step Six: Memory IO

reed_foster edited this page Aug 9, 2017 · 2 revisions

In order to store programs and variables, a CPU needs memory. Traditionally, main memory, or RAM (Random Access Memory), is external to the CPU die. The FPGA development kit I'm using, the miniSpartan6+, has a 256Mb (32MB) SDRAM (Synchronous Dynamic RAM, which uses capacitors to store values). Because this was my first project with an FPGA, I decided not to use the SDRAM: its interface is not simple to use, and I wanted to focus on building a CPU, not an SDRAM controller (my new project, r32, does have a functioning SDRAM controller).

There are two types of RAM on FPGAs: distributed RAM and block RAM. Distributed RAM, as the name suggests, spreads its storage elements across many logic blocks, so large memories consume a significant amount of logic resources. Block RAMs are small, dedicated SRAM modules. Both are relatively simple to implement, but at the time of this project I hadn't heard of block RAM, so I opted for distributed RAM (which was fine size-wise, since I only needed 8KB: 4KB of RAM and 4KB of ROM).

Anyway, the implementation for a RAM in VHDL is extremely similar to the register file we built earlier:

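library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all; -- provides unsigned and to_integer

entity ram4kB is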
    port(addr     : in  std_logic_vector (11 downto 0);
         clk      : in  std_logic;
         store    : in  std_logic;
         sel      : in  std_logic;
         data_in  : in  std_logic_vector (7 downto 0);
         data_out : out std_logic_vector (7 downto 0);
         out_ctl  : out std_logic);
end ram4kB;

architecture behavioral of ram4kB is
    type arry_type is array (0 to 4095) of std_logic_vector (7 downto 0);
    signal mem_contents : arry_type := (others => "00000000");
begin

    write_data: process(clk)
    begin
        if (rising_edge(clk)) then
            if (store = '1' and sel = '1') then
                mem_contents(to_integer(unsigned(addr))) <= data_in;
            end if;
        end if;
    end process;

    data_out <= mem_contents(to_integer(unsigned(addr)));
    out_ctl <= '1' when (store /= '1' and sel = '1') else '0';

end behavioral;

out_ctl is high when data_out is valid, that is, when the RAM is selected and is not being written. The memory "controller" uses this signal to determine which memory's data_out (including I/O) should be fed to the execute unit.
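For comparison with the distributed RAM above, here is a minimal sketch of a variant with a registered (synchronous) read, which is the style synthesis tools typically map onto block RAM. It is not part of muCPU: the entity name ram4kB_sync is hypothetical, and whether a given tool actually infers block RAM from it is an assumption that should be checked in the synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical variant of ram4kB; same ports, but data_out is registered.
entity ram4kB_sync is
    port(addr     : in  std_logic_vector (11 downto 0);
         clk      : in  std_logic;
         store    : in  std_logic;
         sel      : in  std_logic;
         data_in  : in  std_logic_vector (7 downto 0);
         data_out : out std_logic_vector (7 downto 0);
         out_ctl  : out std_logic);
end ram4kB_sync;

architecture behavioral of ram4kB_sync is
    type array_type is array (0 to 4095) of std_logic_vector (7 downto 0);
    signal mem_contents : array_type := (others => "00000000");
begin

    ram_proc: process(clk)
    begin
        if (rising_edge(clk)) then
            if (store = '1' and sel = '1') then
                mem_contents(to_integer(unsigned(addr))) <= data_in;
            end if;
            -- The read is clocked too, so data_out lags addr by one cycle.
            -- On a simultaneous write, the old contents are read.
            data_out <= mem_contents(to_integer(unsigned(addr)));
        end if;
    end process;

    -- Unchanged from ram4kB; note that it is NOT delayed to match data_out.
    out_ctl <= '1' when (store /= '1' and sel = '1') else '0';

end behavioral;
```

The extra cycle of read latency is the catch: the rest of the CPU would have to wait for data_out (and a delayed out_ctl) before using the value, which the asynchronous-read version above does not require.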

Read-only memory (ROM) has a slightly different implementation. Because it is never written to, it needs no clock and is simple to implement asynchronously. The only trick is initializing the ROM contents. This is achieved by a process which runs once and sets each value of the main data array.

Here's the implementation:

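library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all; -- provides unsigned and to_integer

entity rom4kB is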
    port(addr     : in  std_logic_vector (11 downto 0);
         sel      : in  std_logic;
         data_out : out std_logic_vector (7 downto 0);
         out_ctl  : out std_logic
    );
end rom4kB;

architecture behavioral of rom4kB is
    type array_type is array (0 to 4095) of std_logic_vector (7 downto 0);
    type prog_type is array (0 to 1023) of std_logic_vector (15 downto 0);
    type data_type is array (0 to 2047) of std_logic_vector (7 downto 0);
    constant prog_instr : prog_type := (
        --fill with desired program (max 1024 instructions)
        others => x"0000"
    );
    constant prog_data : data_type := (
        --fill with data to be used in program
        others	=> x"00"
    );
    signal program : array_type;
    signal const0 : std_logic := '0';
begin

    process(const0)
    begin
        for i in 0 to 1023 loop
            program(i*2 + 2048) <= prog_instr(i)(15 downto 8);
            program(i*2 + 1 + 2048) <= prog_instr(i)(7 downto 0);
        end loop;
        for i in 0 to 2047 loop
            program(i) <= prog_data(i);
        end loop;
    end process;
    data_out <= program(to_integer(unsigned(addr)));
    out_ctl <= '1' when (sel = '1') else '0';

end behavioral;

Signal const0 is not driven by any input or logic, so it never changes. Using it as the sole entry in the process sensitivity list means the process runs exactly once, at initialization. Half of ROM is allocated for instructions (1024 16-bit words, 2KB, or technically 2KiB), and the other half for program constants. In the listing above, the constants fill ROM offsets 0 to 2047 and the instructions fill offsets 2048 to 4095, with the high byte of each instruction stored first. There is no "security" logic or protection against executing data as instructions: the program counter can point to any location in memory, even I/O (since muCPU uses memory-mapped I/O).
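As an alternative to the run-once process, the same byte layout can be built by a function that initializes a constant. The following is only a sketch of that approach, not the muCPU implementation; the entity name rom4kB_const and the function name build_rom are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical variant of rom4kB that needs no init process or dummy signal.
entity rom4kB_const is
    port(addr     : in  std_logic_vector (11 downto 0);
         sel      : in  std_logic;
         data_out : out std_logic_vector (7 downto 0);
         out_ctl  : out std_logic);
end rom4kB_const;

architecture behavioral of rom4kB_const is
    type array_type is array (0 to 4095) of std_logic_vector (7 downto 0);
    type prog_type is array (0 to 1023) of std_logic_vector (15 downto 0);
    type data_type is array (0 to 2047) of std_logic_vector (7 downto 0);

    constant prog_instr : prog_type := (others => x"0000");
    constant prog_data  : data_type := (others => x"00");

    -- Packs constants into bytes 0..2047 and instructions (high byte
    -- first) into bytes 2048..4095, matching the loops in rom4kB.
    function build_rom(instr : prog_type; consts : data_type)
        return array_type is
        variable img : array_type := (others => x"00");
    begin
        for i in consts'range loop
            img(i) := consts(i);
        end loop;
        for i in instr'range loop
            img(i*2 + 2048)     := instr(i)(15 downto 8);
            img(i*2 + 1 + 2048) := instr(i)(7 downto 0);
        end loop;
        return img;
    end function;

    constant program : array_type := build_rom(prog_instr, prog_data);
begin

    data_out <= program(to_integer(unsigned(addr)));
    out_ctl  <= '1' when (sel = '1') else '0';

end behavioral;
```

Because program is a constant here, nothing can assign to it after elaboration, and the sensitivity-list trick with const0 is no longer needed.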

In versions prior to 2.1, only 128 bytes of RAM and 64 bytes of ROM were available (an 8-bit address space allows for only 256 addressable locations). In version 2.1, r7 is used similarly to a base address: its value supplies the upper 8 bits of the address, and the lower 8 bits are the 8-bit sum of the supplied base register rs and the 8-bit immediate. Together these form a 16-bit address, allowing for 64KiB of addressable locations.
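The version 2.1 address formation described above can be sketched as follows. This is an illustration only: the entity name addr_gen and the port names are hypothetical, and discarding the carry out of the 8-bit sum is an assumption based on the phrase "8-bit sum".

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical address generator: mem_addr = r7 & (rs + imm), 8-bit sum.
entity addr_gen is
    port(r7       : in  std_logic_vector (7 downto 0);
         rs       : in  std_logic_vector (7 downto 0);
         imm      : in  std_logic_vector (7 downto 0);
         mem_addr : out std_logic_vector (15 downto 0));
end addr_gen;

architecture behavioral of addr_gen is
    signal offset : std_logic_vector (7 downto 0);
begin

    -- numeric_std "+" on two 8-bit operands returns 8 bits, so any
    -- carry is dropped (assumed behaviour, see above).
    offset   <= std_logic_vector(unsigned(rs) + unsigned(imm));

    -- r7 supplies the upper byte, the sum supplies the lower byte.
    mem_addr <= r7 & offset;

end behavioral;
```

For example, with r7 = x"10", rs = x"20" and an immediate of x"05", the sum is x"25" and the resulting address is x"1025".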

The lower 4K addresses (0x0000 to 0x0FFF) are allocated to ROM, with the program to be run starting at address 0. The next 4K section (0x1000 to 0x1FFF) is allocated to RAM. There is a large blank space between RAM and I/O; the latter is located in the last 4 addresses (0xFFFC to 0xFFFF).

The only part left before the full memory component is complete is the I/O logic. The CPU core interface for iobank is essentially identical to that of the RAM component. The real difference is in its write implementation: because both the CPU and external devices can write to a port, arbitration is necessary. For this application, I opted for a very simple approach: CPU writes are blocked while an external device is writing to I/O.

Below is a single-port implementation of iobank (the actual implementation has 4 ports, each of which is implemented in exactly the same way in iobank.vhd):

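library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all; -- provides unsigned and to_integer

entity iobank is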
    port(cpu_data_in    : in    std_logic_vector (7 downto 0);
         cpu_data_out   : out   std_logic_vector (7 downto 0);
         cpu_addr       : in    std_logic_vector (1 downto 0);
         clk            : in    std_logic;
         cpu_we         : in    std_logic;
         cpu_select     : in    std_logic;
         cpu_out_ctl    : out   std_logic;

         portA_data_in  : in    std_logic_vector (7 downto 0);
         portA_data_out : out   std_logic_vector (7 downto 0);
         portA_we       : in    std_logic;
         portA_out_ctl  : out   std_logic
	);
end iobank;

architecture behavioral of iobank is
    type array_type is array (3 downto 0) of std_logic_vector (7 downto 0);
    signal port_data: array_type := (others => "00000000");
begin

    write_data: process(clk)
    begin
        if (rising_edge(clk)) then
            if (portA_we = '1') then -- give priority to external signal drivers
                port_data(0) <= portA_data_in;
            elsif (cpu_we = '1' and cpu_select = '1' and cpu_addr = "00") then
                port_data(0) <= cpu_data_in;
            end if;
        end if;
    end process;

    cpu_data_out <= port_data(to_integer(unsigned(cpu_addr)));
    cpu_out_ctl <= '1' when (cpu_select = '1' and cpu_we /= '1') else '0';

    portA_data_out <= port_data(0);
    portA_out_ctl <= '1' when (portA_we /= '1') else '0';

end behavioral;
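To see the write arbitration in action, here is a small self-checking testbench sketch. It assumes the single-port iobank listed above has been compiled into the work library; the testbench name iobank_tb and the test values are made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity iobank_tb is
end iobank_tb;

architecture sim of iobank_tb is
    signal clk            : std_logic := '0';
    signal cpu_data_in    : std_logic_vector (7 downto 0) := x"00";
    signal cpu_data_out   : std_logic_vector (7 downto 0);
    signal cpu_addr       : std_logic_vector (1 downto 0) := "00";
    signal cpu_we         : std_logic := '0';
    signal cpu_select     : std_logic := '0';
    signal cpu_out_ctl    : std_logic;
    signal portA_data_in  : std_logic_vector (7 downto 0) := x"00";
    signal portA_data_out : std_logic_vector (7 downto 0);
    signal portA_we       : std_logic := '0';
    signal portA_out_ctl  : std_logic;
begin

    dut: entity work.iobank
        port map(cpu_data_in    => cpu_data_in,
                 cpu_data_out   => cpu_data_out,
                 cpu_addr       => cpu_addr,
                 clk            => clk,
                 cpu_we         => cpu_we,
                 cpu_select     => cpu_select,
                 cpu_out_ctl    => cpu_out_ctl,
                 portA_data_in  => portA_data_in,
                 portA_data_out => portA_data_out,
                 portA_we       => portA_we,
                 portA_out_ctl  => portA_out_ctl);

    stimulus: process
    begin
        -- 1. Only the CPU writes: the value lands in port A.
        cpu_data_in <= x"AA"; cpu_we <= '1'; cpu_select <= '1';
        clk <= '0'; wait for 5 ns;
        clk <= '1'; wait for 5 ns;
        assert portA_data_out = x"AA"
            report "CPU write did not reach port A" severity error;

        -- 2. CPU and external device write in the same cycle:
        --    the external device wins and the CPU write is dropped.
        cpu_data_in <= x"55"; portA_data_in <= x"0F"; portA_we <= '1';
        clk <= '0'; wait for 5 ns;
        clk <= '1'; wait for 5 ns;
        assert portA_data_out = x"0F"
            report "external write did not take priority" severity error;

        -- 3. CPU reads the port back (no clock edge needed).
        cpu_we <= '0'; portA_we <= '0';
        wait for 5 ns;
        assert cpu_data_out = x"0F" and cpu_out_ctl = '1'
            report "CPU read-back failed" severity error;

        wait; -- end of test
    end process;

end sim;
```

Note that in step 2 the CPU's value x"55" is silently discarded; nothing in this interface tells the CPU that its write was blocked.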

Finally, each section of memory (RAM, ROM, I/O) needs to be connected up to a controller that allows the CPU core to access a single address space. Here's the address decoding logic for selecting a memory device.

with address(15 downto 12) select
    ram_sel <=	'1' when "0001",
                '0' when others;

with address(15 downto 12) select
    rom_sel <=	'1' when "0000",
                '0' when others;

with address(15 downto 2) select
    io_sel  <=	'1' when "11111111111111",
                '0' when others;
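As a quick sanity check of the decoding above, the following testbench sketch drives a few boundary addresses and asserts which select line goes high. The testbench name decode_tb and the choice of test addresses are assumptions for illustration; the three select statements are copied from the logic above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity decode_tb is
end decode_tb;

architecture sim of decode_tb is
    signal address : std_logic_vector (15 downto 0) := (others => '0');
    signal ram_sel, rom_sel, io_sel : std_logic;
begin

    with address(15 downto 12) select
        ram_sel <= '1' when "0001",
                   '0' when others;

    with address(15 downto 12) select
        rom_sel <= '1' when "0000",
                   '0' when others;

    with address(15 downto 2) select
        io_sel  <= '1' when "11111111111111",
                   '0' when others;

    check: process
    begin
        address <= x"0FFF"; wait for 10 ns; -- last ROM address
        assert rom_sel = '1' and ram_sel = '0' and io_sel = '0'
            report "0x0FFF should select ROM" severity error;

        address <= x"1000"; wait for 10 ns; -- first RAM address
        assert rom_sel = '0' and ram_sel = '1' and io_sel = '0'
            report "0x1000 should select RAM" severity error;

        address <= x"2000"; wait for 10 ns; -- unmapped gap
        assert rom_sel = '0' and ram_sel = '0' and io_sel = '0'
            report "0x2000 should select nothing" severity error;

        address <= x"FFFB"; wait for 10 ns; -- just below I/O
        assert rom_sel = '0' and ram_sel = '0' and io_sel = '0'
            report "0xFFFB should select nothing" severity error;

        address <= x"FFFC"; wait for 10 ns; -- first I/O address
        assert rom_sel = '0' and ram_sel = '0' and io_sel = '1'
            report "0xFFFC should select I/O" severity error;

        wait; -- end of test
    end process;

end sim;
```

Addresses in the gap between RAM and I/O select no device at all, so none of the out_ctl flags is raised for them.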

The full implementation of memory.vhd (with all 4 I/O ports):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity memory is
	port (--processor interface
			data_in		: in	std_logic_vector(7 downto 0);
			pc_in			: in	std_logic_vector(15 downto 0);
			mem_addr		: in	std_logic_vector(15 downto 0);
			mem_access	: in	std_logic;
			mem_store	: in	std_logic;
			clk			: in	std_logic;

			--data out
			ram_data_out	: out std_logic_vector(7 downto 0);
			rom_data_out	: out std_logic_vector(7 downto 0);
			io_data_out		: out std_logic_vector(7 downto 0);
			out_ctl			: out std_logic_vector(2 downto 0);

			--io interface
			portA_data_in	: in	std_logic_vector (7 downto 0);
			portA_data_out	: out	std_logic_vector (7 downto 0);
			portA_we			: in	std_logic;
			portA_out_ctl	: out	std_logic;
			portB_data_in	: in	std_logic_vector (7 downto 0);
			portB_data_out	: out	std_logic_vector (7 downto 0);
			portB_we			: in	std_logic;
			portB_out_ctl	: out	std_logic;
			portC_data_in	: in	std_logic_vector (7 downto 0);
			portC_data_out	: out	std_logic_vector (7 downto 0);
			portC_we			: in	std_logic;
			portC_out_ctl	: out	std_logic;
			portD_data_in	: in	std_logic_vector (7 downto 0);
			portD_data_out	: out	std_logic_vector (7 downto 0);
			portD_we			: in	std_logic;
			portD_out_ctl	: out	std_logic
			);
end memory;

architecture structural of memory is
	signal ram_sel, rom_sel, io_sel : std_logic := '0';
	signal address : std_logic_vector (15 downto 0) := (others => '0');
	signal ram_out_ctl, rom_out_ctl, io_out_ctl : std_logic;

	component ram4kB is
		port (addr     : in  std_logic_vector (11 downto 0);
				clk      : in  std_logic;
				store    : in  std_logic;
				sel      : in  std_logic;
				data_in  : in  std_logic_vector (7 downto 0);
				data_out : out std_logic_vector (7 downto 0);
				out_ctl  : out std_logic
		);
	end component;

	component rom4kB is
		port (addr     : in  std_logic_vector (11 downto 0);
				sel      : in  std_logic;
				data_out : out std_logic_vector (7 downto 0);
				out_ctl  : out std_logic
		);
	end component;

	component iobank is
		port (portA_data_in  : in  std_logic_vector (7 downto 0);
				portA_data_out : out	std_logic_vector (7 downto 0);
				portA_we       : in  std_logic;
				portA_out_ctl  : out	std_logic;
				portB_data_in  : in  std_logic_vector (7 downto 0);
				portB_data_out : out	std_logic_vector (7 downto 0);
				portB_we       : in  std_logic;
				portB_out_ctl  : out	std_logic;
				portC_data_in  : in  std_logic_vector (7 downto 0);
				portC_data_out : out	std_logic_vector (7 downto 0);
				portC_we       : in  std_logic;
				portC_out_ctl  : out	std_logic;
				portD_data_in  : in  std_logic_vector (7 downto 0);
				portD_data_out : out	std_logic_vector (7 downto 0);
				portD_we       : in  std_logic;
				portD_out_ctl  : out	std_logic;
				cpu_data_in    : in  std_logic_vector (7 downto 0);
				cpu_data_out   : out	std_logic_vector (7 downto 0);
				cpu_addr       : in  std_logic_vector (1 downto 0);
				cpu_we         : in  std_logic;
				cpu_select     : in  std_logic;
				cpu_out_ctl    : out std_logic;
				clk            : in  std_logic
		);
	end component;

	component mux is
		generic(dwidth : positive);
		port (data_in_0   : in  std_logic_vector (dwidth-1 downto 0);
				data_in_1   : in  std_logic_vector (dwidth-1 downto 0);
				sel         : in  std_logic;
				data_out    : out std_logic_vector (dwidth-1 downto 0));
	end component;

begin

   ram : ram4kB
      port map(addr        => address(11 downto 0),
               clk         => clk,
               store       => mem_store,
               sel         => ram_sel,
               data_in     => data_in,
               data_out    => ram_data_out,
               out_ctl     => ram_out_ctl);

   rom : rom4kB
      port map(addr        => address(11 downto 0),
               sel         => rom_sel,
               data_out    => rom_data_out,
               out_ctl     => rom_out_ctl);

   io  : iobank
      port map(portA_data_in	=> portA_data_in,
               portA_data_out	=> portA_data_out,
               portA_we			=> portA_we,
               portA_out_ctl	=> portA_out_ctl,
               portB_data_in	=> portB_data_in,
               portB_data_out	=> portB_data_out,
               portB_we			=> portB_we,
               portB_out_ctl	=> portB_out_ctl,
               portC_data_in	=> portC_data_in,
               portC_data_out	=> portC_data_out,
               portC_we			=> portC_we,
               portC_out_ctl	=> portC_out_ctl,
               portD_data_in	=> portD_data_in,
               portD_data_out	=> portD_data_out,
               portD_we			=> portD_we,
               portD_out_ctl	=> portD_out_ctl,
               cpu_data_in		=> data_in,
               cpu_data_out	=> io_data_out,
               cpu_addr			=> address(1 downto 0),
               cpu_we			=> mem_store,
               cpu_select		=> io_sel,
               cpu_out_ctl		=> io_out_ctl,
               clk				=> clk);

   addrmux	: mux
      generic map(dwidth		=> 16)
      port map(data_in_0	=> pc_in,
               data_in_1	=> mem_addr,
               sel			=> mem_access,
               data_out		=> address);

   out_ctl <= io_out_ctl & rom_out_ctl & ram_out_ctl;

   with address(15 downto 12) select
      ram_sel <=	'1' when "0001",
                  '0' when others;

   with address(15 downto 12) select
      rom_sel <=	'1' when "0000",
                  '0' when others;

   with address(15 downto 2) select
      io_sel  <=	'1' when "11111111111111",
                  '0' when others;

end structural;
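The out_ctl vector packs the three valid flags as io & rom & ram, so bit 2 belongs to I/O, bit 1 to ROM and bit 0 to RAM. Here is a sketch of how the consumer of these signals might pick the byte to forward. The entity name mem_out_select and the all-zero default are assumptions; the real selection logic in the muCPU core may differ.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical selector for the three data outputs of the memory component.
entity mem_out_select is
    port(ram_data_out : in  std_logic_vector (7 downto 0);
         rom_data_out : in  std_logic_vector (7 downto 0);
         io_data_out  : in  std_logic_vector (7 downto 0);
         out_ctl      : in  std_logic_vector (2 downto 0);
         data_out     : out std_logic_vector (7 downto 0));
end mem_out_select;

architecture behavioral of mem_out_select is
begin

    -- The address decoder asserts at most one select line, so at most
    -- one bit of out_ctl is high at a time.
    with out_ctl select
        data_out <= ram_data_out when "001",
                    rom_data_out when "010",
                    io_data_out  when "100",
                    x"00"        when others; -- nothing valid (assumed default)

end behavioral;
```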