-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add latency estimations to simulator #21
base: master
Are you sure you want to change the base?
Conversation
@maaspa please add a detailed description including
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job. Check the comments, which are mostly naming and placing. I think you approached really fast the kind of architecture i had in mind, well done!
I hope you are seeing the advantages in clarity and scalability of doing it this way 😄
src/cgra.py
Outdated
@@ -21,6 +22,19 @@ | |||
dsts = ['SELF', 'RCL', 'RCR', 'RCT', 'RCB','R0', 'R1', 'R2', 'R3'] | |||
regs = dsts[-4:] | |||
|
|||
operation_latency_mapping = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move the latency-related operations to a separate module latency.py
?
It is unclear to me when this code is gonna execute (when loading the module probably). I would rather have a latency_load_characterization( filename )
function, so we could actually have more than one possible characterization (for example, for the case where you wanna test different scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, I added a function that can eventually be reused for other characterizations
src/cgra.py
Outdated
@@ -71,6 +85,8 @@ def __init__( self, kernel, memory, read_addrs, write_addrs): | |||
self.memory = memory | |||
self.instr2exec = 0 | |||
self.cycles = 0 | |||
self.instr_time = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add units to all variables that refer to a magnitude?
e.g. if instr_time
is in seconds, rename it to instr_time_s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if instead it is in clock cycles, instr_time_cc
, in which case might be better to call it instr_latency_cc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ✔️
src/cgra.py
Outdated
@@ -97,23 +113,42 @@ def step( self, prs="ROUT" ): | |||
for c in range(N_COLS): | |||
self.cells[r][c].update() | |||
instr2exec = self.instr2exec | |||
self.max_instr = None | |||
self.lw_count = [0] * N_COLS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.lw_count
was never defined right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was part of my previous memory latency algo which I have now removed. It is no longer in my code
src/cgra.py
Outdated
if self.max_instr is None or self.cells[r][c].time > self.max_instr.time: | ||
self.max_instr = self.cells[r][c] | ||
if self.cells[r][c].op in ["LWD", "LWI", "SWD","SWI"]: | ||
self.lw_count[c] += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think this belongs here. This section of the code is controlling the operation of the instructions, and in the middle we are adding a section to count the number of memory accesses. Does not feel right, true?
Probably there is a better place to do it. For example, given that we are saving all the history of operations, computing the number of memory access can very easily be done at the very end by simply counting how many times the operations appear in the instructions matrix.
Regarding the max_instr
i find it counter-intuitive. I understand that you are trying to check which cell of this instruction is the one that has the longest operation (therefore max_instr
is to be read as maximum latency of this instruction
). Had think quite a bit to see it this way. Additionally, given that several cells are doing LW, then why would one cell be the longest or not? It's weird to think and sounds kinda arbitrary. Will keep reading the code to see how it's used, but have a negative feeling about it.
Regarding the ["LWI", ...]
, please define the array as a global variable with a name as OPERATIONS_MEMORY_ACCESS
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ✔️ (moved the code to a more appropriate place)
src/cgra.py
Outdated
if PRINT_OUTS: print("Instr = ", self.cycles, "(",instr2exec,")") | ||
for r in range(N_ROWS): | ||
for c in range(N_COLS): | ||
op = self.instrs[instr2exec].ops[r][c] | ||
b ,e = self.cells[r][c].exec( op ) | ||
if b != 0: self.instr2exec = b - 1 #To avoid more logic afterwards | ||
if e != 0: self.exit = True | ||
if self.max_instr is None or self.cells[r][c].time > self.max_instr.time: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is the time
of a cell computed and assigned? Could not find that part. Also, is it time
or latency
(the word time
usually refers to a moment in time and not a period of time). Also add units latency_cc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed units and added line that fetches latency ✔️
src/cgra.py
Outdated
@@ -375,6 +411,17 @@ def blt( self, val1, val2, branch ): | |||
ops_jump = { 'JUMP' : '' } | |||
ops_exit = { 'EXIT' : '' } | |||
|
|||
def display_characterization(cgra): | |||
total_time = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remember to also add units to all variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ✔️
src/cgra.py
Outdated
@@ -375,6 +411,17 @@ def blt( self, val1, val2, branch ): | |||
ops_jump = { 'JUMP' : '' } | |||
ops_exit = { 'EXIT' : '' } | |||
|
|||
def display_characterization(cgra): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to another module. We can have one for latency and one for energy (in a year another for area, etc..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ✔️
src/cgra.py
Outdated
print("Cycle:", index + 1, "( ", item.instr2exec, " )") | ||
print("Instruction:", item.instr) | ||
print("Time:", item.time, "CC\n") | ||
total_time += item.time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say this information should already be available in the class instance e.g. in self.total_latency_cc
Maybe even better in a self.latency.instructions
with a matrix and self.latency.total
, self.latency.bottleneck
, etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ✔️
src/cgra.py
Outdated
total_time = 0 | ||
print("Longest instructions per cycle:\n") | ||
for index, item in enumerate(cgra.instr_time): | ||
print("Cycle:", index + 1, "( ", item.instr2exec, " )") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For printing information i always recommend a transposition of this. e.g.
Instead of having:
Name: Juan
Last Name: Sapriza
Age: 22 (?)
Have
Name Last Name Age (years)
Juan Sapriza 22
Maxime Aspros 20
I guess you see why 😉
It's also much comfortable to copy-paste to a spreadsheet, or export into a file to open in python later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ✔️
src/operation_characterization.csv
Outdated
@@ -0,0 +1,27 @@ | |||
# operation_latency_mapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
units :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you wanna add an extra row with "non-operation-codes", for example, the overhead for memory accesses, and the additional cost of each memory access, the overhead for the first iteration of the kernel, of moving memory, etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ✔️
src/characterization.py
Outdated
mem_latency_cc += 1 | ||
self.max_latency_instr.latency_cc = max(self.max_latency_instr.latency_cc, mem_latency_cc) | ||
if (self.exit): | ||
if (self.max_latency_instr.latency_cc > 2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this tho?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-The "max" function is used to only retain the largest operation (ie: between 3CC multiplication and only one 2CC memory operation, conserve the SMUL)
-An instruction takes an extra CC if it contains the EXIT operation (= last instruction)
src/characterization.py
Outdated
self.max_latency_instr.instr2exec = self.instr2exec | ||
self.instr_latency_cc.append(copy.copy(self.max_latency_instr)) | ||
self.total_latency_cc += self.instr_latency_cc[-1].latency_cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's all this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First line: we need the current CGRA-cycle number as well as that of the instruction being executed (instr2exec)
Second line: Self.instr_latency is an array containing the longest operation for each instruction
Third line: With each instruction, we also sum up the total latency (to avoid doing so in the display_characterization function)
src/characterization.py
Outdated
self.total_latency_cc += self.instr_latency_cc[-1].latency_cc | ||
|
||
def display_characterization(cgra): | ||
print("Longest instructions per cycle:\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with "cycle" you mean "CGRA-cycle" (i.e. the execution of an instruction)?
If an execution has 1256655 CGRA-cycles, we are gonna print them all?
Why not make use of the system that is already implemented where the user can choose what to print? For example, I could call
run(kernel_name, version=version, pr=["ROUT","OPS", "OP_MAX_LAT", "TOTAL_LAT" ], load_addrs=load_addrs, store_addrs=store_addrs)
Instead of forcing the user to see all the time everything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I'm going to do this
src/characterization.py
Outdated
if self.max_latency_instr is None or self.cells[r][c].latency_cc > self.max_latency_instr.latency_cc: | ||
self.max_latency_instr = self.cells[r][c] | ||
if self.cells[r][c].op in OPERATIONS_MEMORY_ACCESS: | ||
mem_latency_cc += 1 | ||
# A memory access to a memory bank has a 2-cycle overhead, | ||
# plus 1 additional cycle per PE trying to access it. | ||
if mem_latency_cc >= 1: | ||
mem_latency_cc += 1 | ||
self.max_latency_instr.latency_cc = max(self.max_latency_instr.latency_cc, mem_latency_cc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it a little unhappy that this logic has to be hardcoded. This is easy now, but when we speak about the INTERLEAVED BUS you will not want to change the code to switch from one to other. What about this other approach:
operation_characterization_N2M.csv
TYPE, OPERATION, LATENCY, OVERHEAD, ADD_SAME_ROW, ADD_SAME_COL, ADD_CGRA, ADD_ADDR_SAME_INTEGER, ADD_ADDR_SAME_MODULO
0, NOP, 1, 0, 0, 0, 0, 0, 0
1, SADD, 1, 0, 0, 0, 0, 0, 0
1, SSUB, 1, 0, 0, 0, 0, 0, 0
2, SMUL, 3, 0, 0, 0, 0, 0, 0
3, LWD, 0, 2, 0, 1, 0, 1, 0
operation_characterization_ONE2M.csv
TYPE, OPERATION, LATENCY, OVERHEAD, ADD_SAME_ROW, ADD_SAME_COL, ADD_CGRA, ADD_ADDR_SAME_INTEGER, ADD_ADDR_SAME_MODULO
0, NOP, 1, 0, 0, 0, 0, 0, 0
1, SADD, 1, 0, 0, 0, 0, 0, 0
1, SSUB, 1, 0, 0, 0, 0, 0, 0
2, SMUL, 3, 0, 0, 0, 0, 0, 0
3, LWD, 0, 2, 0, 1, 1, 0, 0
Here you kind of embed the logic in the CSV and simplify the logic in the code, so if you want to test different architectures you dont need to modify the code. The example from above is the different bus topologies you already encountered (dont take this as is, as I could be wrong)
The idea is that you encode in the different columns different latency patterns and how they are affected by operations of the same type. For example, ADD_SAME_ROW
is the cc you add if you encounter another instruction of the same type in the same row, ADD_ADDR_SAME_INTEGER
is, you take the target address, perform the integer division with the memory bank size and, if its the same, add 1 cc of penalty, same for the MODULO
with the % modulo division (this will be useful for the interleaved memory banks.
I'm not saying DO IT LIKE THIS ... but this kind of spirit that you will not want to change the code every time. It's something one day we will also do for the instruction operations to make the logic independent of the simulator so we can test different architecture changes without modifying the code. BTW.. the passion for not modifying the code is because we would like to have e.g. 20 different architectures and test which is the best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, given how you implemented the fetch of the values, in the same CSV we could have different implementations:
#operation_latency_cc_mapping
. . .
#operation_memory_interleaved_latency_cc_mapping
. . .
src/characterization.py
Outdated
|
||
OPERATIONS_MEMORY_ACCESS = ["LWD", "LWI", "SWD","SWI"] | ||
|
||
def load_operation_characterization(operation_mapping, characterization_type): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why take operation mapping as a parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed ✔️
Changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really improved the first stages of getting the latency. We now need to continue on the same path for the rest of the computation. From ordering the DMA accesses onwards is still seemingly too complex.
I left some extra comment here and there you might also wanna check
src/cgra.py
Outdated
@@ -20,7 +21,7 @@ | |||
srcs = ['ZERO', 'SELF', 'RCL', 'RCR', 'RCT', 'RCB', 'R0', 'R1', 'R2', 'R3', 'IMM'] | |||
dsts = ['SELF', 'RCL', 'RCR', 'RCT', 'RCB','R0', 'R1', 'R2', 'R3'] | |||
regs = dsts[-4:] | |||
|
|||
flag_poll_cnt = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds pretty dangerous. This variable is being used outside this module, but only used outside. Although it's technically possible, i would strongly dis-advice it. If it's a variable being used in the context of the CGRA, i suggest adding it as an internal variable of the class and you can initialize it to 0 in the init function.
Also, please leave an extra enter between these lines and the rest. It's nice not to cramp the code to improve readability. The same happens with indenting consequent lines that are doing similar things to the same level (i.e. you align all the = equal signs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed ✔️
@@ -47,8 +48,8 @@ def print_out( prs, outs, insts, ops, reg ): | |||
elif pr == "R1" : pnt = reg[1] | |||
elif pr == "R2" : pnt = reg[2] | |||
elif pr == "R3" : pnt = reg[3] | |||
|
|||
out_string += "[" | |||
if pnt != []: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and if pnt == []? The for should still be executed? What was the idea behind it?
src/characterization.py
Outdated
mem_latency_cc = adjust_latency_for_bus(self, mem_latency_cc, bus_type) | ||
if mem_latency_cc > self.max_latency_instr.latency_cc: | ||
self.max_latency_instr.latency_cc = mem_latency_cc | ||
self.max_latency_instr.instr = f'MEM ({self.max_latency_instr.instr})' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why MEM?
src/characterization.py
Outdated
def load_operation_characterization(characterization_type): | ||
operation_mapping = {} | ||
script_dir = os.path.dirname(os.path.abspath(__file__)) | ||
csv_file_path = os.path.join(script_dir, 'operation_characterization.csv') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we leave the characterization file as a parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
continue | ||
return operation_mapping | ||
|
||
operation_latency_mapping = load_operation_characterization("latency_cc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is this code executed? When the module is included? Shouldnt this be in a init_characterization or load_characterization or something? Then we should call these functions from the notebook. I haven't finished reading the code, but i find it weird that all this is working without generating any changes on the notebooks launching the examples right?
src/characterization.py
Outdated
def compute_bank_index(self, r, c) : | ||
if self.bus_type == "INTERLEAVED": | ||
if self.cells[r][c].op == "SWD": | ||
index_pos = int(((self.cells[r][c].addr - self.init_store[0]) / 4) % 8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 4 and 8? What are those numbers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forgot to change those to constants thanks for the reminder ✔️
src/characterization.py
Outdated
if self.cells[k][c].op in OPERATIONS_MEMORY_ACCESS and (k, c) not in covered_accesses: | ||
covered_accesses, concurrent_accesses = update_accesses(covered_accesses, concurrent_accesses, r, c, k, self.cells[k][c].bank_index) | ||
break | ||
if self.bus_type != "INTERLEAVED": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it matter if the memory is interleaved in the reordering of the DMA? The DMA does not know which memory architecture it will be targeting, so it seems weird that it affects something on how it orders operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to a more appropriate location ✔️
src/characterization.py
Outdated
concurrent_accesses = [{} for _ in range(4)] | ||
# reorder memory accesses to group into concurrent executions | ||
# covered_accesses tracks the accesses that have already been visited | ||
for r in range(N_ROWS): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So... when you explained me this logic you were just grabbing the operations and "moving them up"... This seems like just creating an NxN matrix and copying them "flattened"... why is such a complex logic required to this? I think these 4 functions could be a single one with 3 or 4 lines
longest_alu_op_latency_cc = get_latency_alu_cc(self) | ||
total_mem_latency_cc = get_latency_mem_cc(self) | ||
self.max_latency_instr.latency_cc = max(longest_alu_op_latency_cc, total_mem_latency_cc) | ||
if total_mem_latency_cc > longest_alu_op_latency_cc: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and if not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if condition only serves to add "MEM" to the longest operation's name when the mem ops take longer than the non-mem ones. If this is not the case, then the name is just that of the longest non-mem operation.
src/characterization.py
Outdated
bus_type_active_row_coef = load_operation_characterization("active_row_coef") | ||
bus_type_cpu_loop_instrs = load_operation_characterization("cpu_loop_instrs") | ||
|
||
def get_latency_cc(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is self
? the cgra? the instruction? the operation? the cell?
Note that self
is a reserved parameter for the parent class instance. That is...
class Person:
def __init__( self, height ):
self.height = 0
def get_height(self):
return self.height
juan = Person( 180 )
height = juan.get_height()
self
is a parameter YOU DO NOT PASS. It is passed automatically because its the instance of the containing method. Under no circumstances use self
as a parameter name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also because then you dont know what the parameter is... like was my case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected self to "cgra" ✔️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the progress! Well done.
I left some more comments. I know you are now fixing some things with the CPU that were not working.
Please, check this review comments and the previous ones, fix that problem with the CPU accesses and then I can proceed to doing tests to merge :)
src/characterization.py
Outdated
cgra.cells[r][c].bank_index = compute_bank_index(cgra,r,c) | ||
|
||
def compute_bank_index(cgra, r, c): | ||
base_addr = cgra.init_store[0] if cgra.cells[r][c].op == "SWD" else sorted(cgra.memory)[0][0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont fully understand what this base address stands for, or why it's different for the SWD...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
base_addr
corresponds to the address of the first memory element (cgra_input[0][0]
). We need this value to find the position of the elements and ultimately return the corresponding bank index. (ie: 9th element accessed -> bank 0).
We need to handle SWD separately due to the starting address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but why is the starting address related to the bank number?
src/characterization.py
Outdated
if cgra.memory_manager.bus_type == "INTERLEAVED": | ||
index_pos = int(((cgra.cells[r][c].addr - base_addr) / cgra.memory_manager.spacing) % cgra.memory_manager.n_banks) | ||
else: | ||
index_pos = cgra.cells[r][c].addr / cgra.memory_manager.bank_size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldnt the one-to-M be another case where the index is always the same (to simulate that they always get a conflict)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
src/characterization.py
Outdated
def compute_bank_index(cgra, r, c): | ||
base_addr = cgra.init_store[0] if cgra.cells[r][c].op == "SWD" else sorted(cgra.memory)[0][0] | ||
if cgra.memory_manager.bus_type == "INTERLEAVED": | ||
index_pos = int(((cgra.cells[r][c].addr - base_addr) / cgra.memory_manager.spacing) % cgra.memory_manager.n_banks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the spacing
? if its how many bytes for each address, you may wanna call it alignment_B
or word_size_B
(my preferred)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add units to variables so that we know if they are in bits (b) , Bytes (B) or words of 4 bytes (w). My standard is to add it as a underscore and then the unit word_size_B
, banks_n
(with n
standing for something that is just a count and therefore has no units)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
src/characterization.py
Outdated
return pairs[1].index(pair) | ||
return 0 | ||
|
||
def compute_latency_cc(cgra, dependencies): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe wanna give it a name that refers to memory? because its not latency in general, but the latency of a memory access, i understand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
src/memory.py
Outdated
WORD_SIZE = 4 | ||
|
||
class MEMORY: | ||
def __init__( self,bus_type="ONE-TO-M", spacing=4, n_banks=8, bank_size=32000): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice. Just add the units
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
src/memory.py
Outdated
self.bank_size = bank_size | ||
self.flag_poll_cnt = 0 | ||
|
||
def kernel_clear_memory( name, version=""): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you want, you can remove the work kernel
from the beginning of these functions, as it does not make sense anymore. I like it more starting with memory as they are in the memory module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Changes: Extended the simulator so that it can provide a latency estimation for a given kernel. The code computes the total latency, as well as per-instruction latency and the latency of the longest operation for each instruction.
Tests used to verify: The latency estimations were validated by testing the provided examples as well as my own personal test cases written previously (designed to identify and test challenging patterns). To further verify the accuracy of the estimations, the reviewer can try other kernels (or write new ones) and compare with the expected results simulated on HEEPsilon.