Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add latency estimations to simulator #21

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

maaspa
Copy link

@maaspa maaspa commented Jul 10, 2024

Changes: Extended the simulator so that it can provide a latency estimation for a given kernel. The code computes the total latency, as well as per-instruction latency and the latency of the longest operation for each instruction.
Tests used to verify: The latency estimations were validated by testing the provided examples as well as my own personal test cases written previously (designed to identify and test challenging patterns). To further verify the accuracy of the estimations, the reviewer can try other kernels (or write new ones) and compare with the expected results simulated on HEEPsilon.

@JuanSapriza
Copy link
Collaborator

JuanSapriza commented Jul 10, 2024

@maaspa please add a detailed description including

  • What were the precise changes that you did
  • Which tests did you run to check that it works
  • What are the things the reviewer (me in this case) should test
  • If there are any other pending tasks (in which case the PR should be converted to a draft )

Copy link
Collaborator

@JuanSapriza JuanSapriza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job. Check the comments, which are mostly naming and placing. I think you approached really fast the kind of architecture i had in mind, well done!
I hope you are seeing the advantages in clarity and scalability of doing it this way 😄

src/cgra.py Outdated
@@ -21,6 +22,19 @@
dsts = ['SELF', 'RCL', 'RCR', 'RCT', 'RCB','R0', 'R1', 'R2', 'R3']
regs = dsts[-4:]

operation_latency_mapping = {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move the latency-related operations to a separate module latency.py?
It is unclear to me when this code is gonna execute (when loading the module probably). I would rather have a latency_load_characterization( filename ) function, so we could actually have more than one possible characterization (for example, for the case where you wanna test different scenarios.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I added a function that can eventually be reused for other characterizations

src/cgra.py Outdated
@@ -71,6 +85,8 @@ def __init__( self, kernel, memory, read_addrs, write_addrs):
self.memory = memory
self.instr2exec = 0
self.cycles = 0
self.instr_time = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add units to all variables that refer to a magnitude?
e.g. if instr_time is in seconds, rename it to instr_time_s

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if instead it is in clock cycles, instr_time_cc, in which case might be better to call it instr_latency_cc

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ✔️

src/cgra.py Outdated
@@ -97,23 +113,42 @@ def step( self, prs="ROUT" ):
for c in range(N_COLS):
self.cells[r][c].update()
instr2exec = self.instr2exec
self.max_instr = None
self.lw_count = [0] * N_COLS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.lw_count was never defined right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was part of my previous memory latency algo which I have now removed. It is no longer in my code

src/cgra.py Outdated
Comment on lines 125 to 128
if self.max_instr is None or self.cells[r][c].time > self.max_instr.time:
self.max_instr = self.cells[r][c]
if self.cells[r][c].op in ["LWD", "LWI", "SWD","SWI"]:
self.lw_count[c] += 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think this belongs here. This section of the code is controlling the operation of the instructions, and in the middle we are adding a section to count the number of memory accesses. Does not feel right, true?
Probably there is a better place to do it. For example, given that we are saving all the history of operations, computing the number of memory access can very easily be done at the very end by simply counting how many times the operations appear in the instructions matrix.

Regarding the max_instr i find it counter-intuitive. I understand that you are trying to check which cell of this instruction is the one that has the longest operation (therefore max_instr is to be read as maximum latency of this instruction). Had think quite a bit to see it this way. Additionally, given that several cells are doing LW, then why would one cell be the longest or not? It's weird to think and sounds kinda arbitrary. Will keep reading the code to see how it's used, but have a negative feeling about it.

Regarding the ["LWI", ...], please define the array as a global variable with a name as OPERATIONS_MEMORY_ACCESS.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ✔️ (moved the code to a more appropriate place)

src/cgra.py Outdated
if PRINT_OUTS: print("Instr = ", self.cycles, "(",instr2exec,")")
for r in range(N_ROWS):
for c in range(N_COLS):
op = self.instrs[instr2exec].ops[r][c]
b ,e = self.cells[r][c].exec( op )
if b != 0: self.instr2exec = b - 1 #To avoid more logic afterwards
if e != 0: self.exit = True
if self.max_instr is None or self.cells[r][c].time > self.max_instr.time:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is the time of a cell computed and assigned? Could not find that part. Also, is it time or latency (the word time usually refers to a moment in time and not a period of time). Also add units latency_cc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed units and added line that fetches latency ✔️

src/cgra.py Outdated
@@ -375,6 +411,17 @@ def blt( self, val1, val2, branch ):
ops_jump = { 'JUMP' : '' }
ops_exit = { 'EXIT' : '' }

def display_characterization(cgra):
total_time = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remember to also add units to all variables

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ✔️

src/cgra.py Outdated
@@ -375,6 +411,17 @@ def blt( self, val1, val2, branch ):
ops_jump = { 'JUMP' : '' }
ops_exit = { 'EXIT' : '' }

def display_characterization(cgra):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to another module. We can have one for latency and one for energy (in a year another for area, etc..)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ✔️

src/cgra.py Outdated
print("Cycle:", index + 1, "( ", item.instr2exec, " )")
print("Instruction:", item.instr)
print("Time:", item.time, "CC\n")
total_time += item.time
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say this information should already be available in the class instance e.g. in self.total_latency_cc

Maybe even better in a self.latency.instructions with a matrix and self.latency.total, self.latency.bottleneck, etc...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ✔️

src/cgra.py Outdated
total_time = 0
print("Longest instructions per cycle:\n")
for index, item in enumerate(cgra.instr_time):
print("Cycle:", index + 1, "( ", item.instr2exec, " )")
Copy link
Collaborator

@JuanSapriza JuanSapriza Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For printing information i always recommend a transposition of this. e.g.
Instead of having:

Name: Juan
Last Name: Sapriza
Age: 22 (?) 

Have

Name       Last Name    Age (years)
Juan       Sapriza          22
Maxime     Aspros           20 

I guess you see why 😉
It's also much comfortable to copy-paste to a spreadsheet, or export into a file to open in python later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ✔️

@@ -0,0 +1,27 @@
# operation_latency_mapping
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

units :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you wanna add an extra row with "non-operation-codes", for example, the overhead for memory accesses, and the additional cost of each memory access, the overhead for the first iteration of the kernel, of moving memory, etc...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ✔️

@maaspa maaspa requested a review from JuanSapriza July 12, 2024 10:34
mem_latency_cc += 1
self.max_latency_instr.latency_cc = max(self.max_latency_instr.latency_cc, mem_latency_cc)
if (self.exit):
if (self.max_latency_instr.latency_cc > 2):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this tho?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-The "max" function is used to only retain the largest operation (ie: between 3CC multiplication and only one 2CC memory operation, conserve the SMUL)
-An instruction takes an extra CC if it contains the EXIT operation (= last instruction)

Comment on lines 57 to 59
self.max_latency_instr.instr2exec = self.instr2exec
self.instr_latency_cc.append(copy.copy(self.max_latency_instr))
self.total_latency_cc += self.instr_latency_cc[-1].latency_cc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's all this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First line: we need the current CGRA-cycle number as well as that of the instruction being executed (instr2exec)
Second line: Self.instr_latency is an array containing the longest operation for each instruction
Third line: With each instruction, we also sum up the total latency (to avoid doing so in the display_characterization function)

self.total_latency_cc += self.instr_latency_cc[-1].latency_cc

def display_characterization(cgra):
print("Longest instructions per cycle:\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with "cycle" you mean "CGRA-cycle" (i.e. the execution of an instruction)?
If an execution has 1256655 CGRA-cycles, we are gonna print them all?
Why not make use of the system that is already implemented where the user can choose what to print? For example, I could call

run(kernel_name, version=version, pr=["ROUT","OPS",  "OP_MAX_LAT", "TOTAL_LAT"  ], load_addrs=load_addrs, store_addrs=store_addrs)

Instead of forcing the user to see all the time everything

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I'm going to do this

Comment on lines 44 to 52
if self.max_latency_instr is None or self.cells[r][c].latency_cc > self.max_latency_instr.latency_cc:
self.max_latency_instr = self.cells[r][c]
if self.cells[r][c].op in OPERATIONS_MEMORY_ACCESS:
mem_latency_cc += 1
# A memory access to a memory bank has a 2-cycle overhead,
# plus 1 additional cycle per PE trying to access it.
if mem_latency_cc >= 1:
mem_latency_cc += 1
self.max_latency_instr.latency_cc = max(self.max_latency_instr.latency_cc, mem_latency_cc)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a little unhappy that this logic has to be hardcoded. This is easy now, but when we speak about the INTERLEAVED BUS you will not want to change the code to switch from one to other. What about this other approach:

operation_characterization_N2M.csv

TYPE, OPERATION, LATENCY, OVERHEAD, ADD_SAME_ROW, ADD_SAME_COL, ADD_CGRA, ADD_ADDR_SAME_INTEGER, ADD_ADDR_SAME_MODULO
0, NOP,  1, 0, 0, 0, 0, 0, 0
1, SADD, 1, 0, 0, 0, 0, 0, 0
1, SSUB, 1, 0, 0, 0, 0, 0, 0
2, SMUL, 3, 0, 0, 0, 0, 0, 0
3, LWD,  0, 2, 0, 1, 0, 1, 0

operation_characterization_ONE2M.csv

TYPE, OPERATION, LATENCY, OVERHEAD, ADD_SAME_ROW, ADD_SAME_COL, ADD_CGRA, ADD_ADDR_SAME_INTEGER, ADD_ADDR_SAME_MODULO
0, NOP,  1, 0, 0, 0, 0, 0, 0
1, SADD, 1, 0, 0, 0, 0, 0, 0
1, SSUB, 1, 0, 0, 0, 0, 0, 0
2, SMUL, 3, 0, 0, 0, 0, 0, 0
3, LWD,  0, 2, 0, 1, 1, 0, 0

Here you kind of embed the logic in the CSV and simplify the logic in the code, so if you want to test different architectures you dont need to modify the code. The example from above is the different bus topologies you already encountered (dont take this as is, as I could be wrong)

The idea is that you encode in the different columns different latency patterns and how they are affected by operations of the same type. For example, ADD_SAME_ROW is the cc you add if you encounter another instruction of the same type in the same row, ADD_ADDR_SAME_INTEGER is, you take the target address, perform the integer division with the memory bank size and, if its the same, add 1 cc of penalty, same for the MODULO with the % modulo division (this will be useful for the interleaved memory banks.

I'm not saying DO IT LIKE THIS ... but this kind of spirit that you will not want to change the code every time. It's something one day we will also do for the instruction operations to make the logic independent of the simulator so we can test different architecture changes without modifying the code. BTW.. the passion for not modifying the code is because we would like to have e.g. 20 different architectures and test which is the best.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, given how you implemented the fetch of the values, in the same CSV we could have different implementations:

 #operation_latency_cc_mapping
. . . 

#operation_memory_interleaved_latency_cc_mapping
. . . 


OPERATIONS_MEMORY_ACCESS = ["LWD", "LWI", "SWD","SWI"]

def load_operation_characterization(operation_mapping, characterization_type):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why take operation mapping as a parameter?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed ✔️

src/cgra.py Show resolved Hide resolved
@maaspa
Copy link
Author

maaspa commented Aug 6, 2024

Changes:

  • The simulator now uses the bus type as a parameter to estimate the latency: 1-M, N-M, and interleaved bus types are supported. In the case of the 1-M bus, the conflicts between memory accesses and the CPU fetching code instructions impact latency. Currently, the simulator only estimates the latency due to the CPU polls regularly to check if the CGRA has finished executing.
  • Simulator also estimates the configuration time before execution, as well as the time between the end of configuration and the start of execution iteration.
  • User can specify the output information to be printed by using the pr parameter.
    Tests used to verify: The latency estimations were validated by testing the provided examples as well as my own personal test cases written previously (designed to identify and test challenging patterns). To further verify the accuracy of the estimations, the reviewer can try other kernels (or write new ones) and compare with the expected results simulated on HEEPsilon.

Copy link
Collaborator

@JuanSapriza JuanSapriza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really improved the first stages of getting the latency. We now need to continue on the same path for the rest of the computation. From ordering the DMA accesses onwards is still seemingly too complex.

I left some extra comment here and there you might also wanna check

src/cgra.py Outdated
@@ -20,7 +21,7 @@
srcs = ['ZERO', 'SELF', 'RCL', 'RCR', 'RCT', 'RCB', 'R0', 'R1', 'R2', 'R3', 'IMM']
dsts = ['SELF', 'RCL', 'RCR', 'RCT', 'RCB','R0', 'R1', 'R2', 'R3']
regs = dsts[-4:]

flag_poll_cnt = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds pretty dangerous. This variable is being used outside this module, but only used outside. Although it's technically possible, i would strongly dis-advice it. If it's a variable being used in the context of the CGRA, i suggest adding it as an internal variable of the class and you can initialize it to 0 in the init function.

Also, please leave an extra enter between these lines and the rest. It's nice not to cramp the code to improve readability. The same happens with indenting consequent lines that are doing similar things to the same level (i.e. you align all the = equal signs)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed ✔️

@@ -47,8 +48,8 @@ def print_out( prs, outs, insts, ops, reg ):
elif pr == "R1" : pnt = reg[1]
elif pr == "R2" : pnt = reg[2]
elif pr == "R3" : pnt = reg[3]

out_string += "["
if pnt != []:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if pnt == []? The for should still be executed? What was the idea behind it?

mem_latency_cc = adjust_latency_for_bus(self, mem_latency_cc, bus_type)
if mem_latency_cc > self.max_latency_instr.latency_cc:
self.max_latency_instr.latency_cc = mem_latency_cc
self.max_latency_instr.instr = f'MEM ({self.max_latency_instr.instr})'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why MEM?

def load_operation_characterization(characterization_type):
operation_mapping = {}
script_dir = os.path.dirname(os.path.abspath(__file__))
csv_file_path = os.path.join(script_dir, 'operation_characterization.csv')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we leave the characterization file as a parameter?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

continue
return operation_mapping

operation_latency_mapping = load_operation_characterization("latency_cc")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is this code executed? When the module is included? Shouldnt this be in a init_characterization or load_characterization or something? Then we should call these functions from the notebook. I haven't finished reading the code, but i find it weird that all this is working without generating any changes on the notebooks launching the examples right?

def compute_bank_index(self, r, c) :
if self.bus_type == "INTERLEAVED":
if self.cells[r][c].op == "SWD":
index_pos = int(((self.cells[r][c].addr - self.init_store[0]) / 4) % 8)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 4 and 8? What are those numbers?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot to change those to constants thanks for the reminder ✔️

if self.cells[k][c].op in OPERATIONS_MEMORY_ACCESS and (k, c) not in covered_accesses:
covered_accesses, concurrent_accesses = update_accesses(covered_accesses, concurrent_accesses, r, c, k, self.cells[k][c].bank_index)
break
if self.bus_type != "INTERLEAVED":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it matter if the memory is interleaved in the reordering of the DMA? The DMA does not know which memory architecture it will be targeting, so it seems weird that it affects something on how it orders operations.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to a more appropriate location ✔️

concurrent_accesses = [{} for _ in range(4)]
# reorder memory accesses to group into concurrent executions
# covered_accesses tracks the accesses that have already been visited
for r in range(N_ROWS):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... when you explained me this logic you were just grabbing the operations and "moving them up"... This seems like just creating an NxN matrix and copying them "flattened"... why is such a complex logic required to this? I think these 4 functions could be a single one with 3 or 4 lines

longest_alu_op_latency_cc = get_latency_alu_cc(self)
total_mem_latency_cc = get_latency_mem_cc(self)
self.max_latency_instr.latency_cc = max(longest_alu_op_latency_cc, total_mem_latency_cc)
if total_mem_latency_cc > longest_alu_op_latency_cc:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if not?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if condition only serves to add "MEM" to the longest operation's name when the mem ops take longer than the non-mem ones. If this is not the case, then the name is just that of the longest non-mem operation.

bus_type_active_row_coef = load_operation_characterization("active_row_coef")
bus_type_cpu_loop_instrs = load_operation_characterization("cpu_loop_instrs")

def get_latency_cc(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is self? the cgra? the instruction? the operation? the cell?

Note that self is a reserved parameter for the parent class instance. That is...

class Person:
   def __init__( self, height ):
       self.height = 0

   def get_height(self):
      return self.height


juan   = Person( 180 )
height = juan.get_height() 

self is a parameter YOU DO NOT PASS. It is passed automatically because its the instance of the containing method. Under no circumstances use self as a parameter name.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also because then you dont know what the parameter is... like was my case

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected self to "cgra" ✔️

Copy link
Collaborator

@JuanSapriza JuanSapriza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the progress! Well done.
I left some more comments. I know you are now fixing some things with the CPU that were not working.

Please, check this review comments and the previous ones, fix that problem with the CPU accesses and then I can proceed to doing tests to merge :)

cgra.cells[r][c].bank_index = compute_bank_index(cgra,r,c)

def compute_bank_index(cgra, r, c):
base_addr = cgra.init_store[0] if cgra.cells[r][c].op == "SWD" else sorted(cgra.memory)[0][0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont fully understand what this base address stands for, or why it's different for the SWD...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

base_addr corresponds to the address of the first memory element (cgra_input[0][0]). We need this value to find the position of the elements and ultimately return the corresponding bank index. (ie: 9th element accessed -> bank 0).
We need to handle SWD separately due to the starting address.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but why is the starting address related to the bank number?

if cgra.memory_manager.bus_type == "INTERLEAVED":
index_pos = int(((cgra.cells[r][c].addr - base_addr) / cgra.memory_manager.spacing) % cgra.memory_manager.n_banks)
else:
index_pos = cgra.cells[r][c].addr / cgra.memory_manager.bank_size
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldnt the one-to-M be another case where the index is always the same (to simulate that they always get a conflict)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

def compute_bank_index(cgra, r, c):
base_addr = cgra.init_store[0] if cgra.cells[r][c].op == "SWD" else sorted(cgra.memory)[0][0]
if cgra.memory_manager.bus_type == "INTERLEAVED":
index_pos = int(((cgra.cells[r][c].addr - base_addr) / cgra.memory_manager.spacing) % cgra.memory_manager.n_banks)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the spacing? if its how many bytes for each address, you may wanna call it alignment_B or word_size_B (my preferred)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add units to variables so that we know if they are in bits (b) , Bytes (B) or words of 4 bytes (w). My standard is to add it as a underscore and then the unit word_size_B, banks_n (with n standing for something that is just a count and therefore has no units)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

return pairs[1].index(pair)
return 0

def compute_latency_cc(cgra, dependencies):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe wanna give it a name that refers to memory? because its not latency in general, but the latency of a memory access, i understand

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

src/memory.py Outdated
WORD_SIZE = 4

class MEMORY:
def __init__( self,bus_type="ONE-TO-M", spacing=4, n_banks=8, bank_size=32000):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice. Just add the units

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

src/memory.py Outdated
self.bank_size = bank_size
self.flag_poll_cnt = 0

def kernel_clear_memory( name, version=""):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want, you can remove the work kernel from the beginning of these functions, as it does not make sense anymore. I like it more starting with memory as they are in the memory module.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants