Hw3 #425 (Open)
FroxieYe wants to merge 4 commits into harvard-cs205:HW3 from FroxieYe:HW3
@@ -0,0 +1,93 @@
Problem 3
3.1 GHz Intel Core i7
16 GB 1867 MHz DDR3
Intel Iris Graphics 6100 1536 MB

#################### Coalesced reads output ########################
coalesced reads, workgroups: 8, num_workers: 4, 0.1509724 seconds
coalesced reads, workgroups: 8, num_workers: 8, 0.0738424 seconds
coalesced reads, workgroups: 8, num_workers: 16, 0.04797616 seconds
coalesced reads, workgroups: 8, num_workers: 32, 0.02097256 seconds
coalesced reads, workgroups: 8, num_workers: 64, 0.01119976 seconds
coalesced reads, workgroups: 8, num_workers: 128, 0.00631752 seconds
coalesced reads, workgroups: 16, num_workers: 4, 0.07417096 seconds
coalesced reads, workgroups: 16, num_workers: 8, 0.0370732 seconds
coalesced reads, workgroups: 16, num_workers: 16, 0.0206708 seconds
coalesced reads, workgroups: 16, num_workers: 32, 0.01128632 seconds
coalesced reads, workgroups: 16, num_workers: 64, 0.00626792 seconds
coalesced reads, workgroups: 16, num_workers: 128, 0.0048784 seconds
coalesced reads, workgroups: 32, num_workers: 4, 0.0385776 seconds
coalesced reads, workgroups: 32, num_workers: 8, 0.01847856 seconds
coalesced reads, workgroups: 32, num_workers: 16, 0.01112848 seconds
coalesced reads, workgroups: 32, num_workers: 32, 0.00626464 seconds
coalesced reads, workgroups: 32, num_workers: 64, 0.00322904 seconds
coalesced reads, workgroups: 32, num_workers: 128, 0.00254848 seconds
coalesced reads, workgroups: 64, num_workers: 4, 0.01852248 seconds
coalesced reads, workgroups: 64, num_workers: 8, 0.01010552 seconds
coalesced reads, workgroups: 64, num_workers: 16, 0.00922304 seconds
coalesced reads, workgroups: 64, num_workers: 32, 0.00324184 seconds
coalesced reads, workgroups: 64, num_workers: 64, 0.00268032 seconds
coalesced reads, workgroups: 64, num_workers: 128, 0.00276096 seconds
coalesced reads, workgroups: 128, num_workers: 4, 0.02534048 seconds
coalesced reads, workgroups: 128, num_workers: 8, 0.011612 seconds
coalesced reads, workgroups: 128, num_workers: 16, 0.00621888 seconds
coalesced reads, workgroups: 128, num_workers: 32, 0.00337832 seconds
coalesced reads, workgroups: 128, num_workers: 64, 0.00275672 seconds
coalesced reads, workgroups: 128, num_workers: 128, 0.00271792 seconds
coalesced reads, workgroups: 256, num_workers: 4, 0.01766056 seconds
coalesced reads, workgroups: 256, num_workers: 8, 0.00872592 seconds
coalesced reads, workgroups: 256, num_workers: 16, 0.0054876 seconds
coalesced reads, workgroups: 256, num_workers: 32, 0.00271448 seconds
coalesced reads, workgroups: 256, num_workers: 64, 0.00279576 seconds
coalesced reads, workgroups: 256, num_workers: 128, 0.00255944 seconds
coalesced reads, workgroups: 512, num_workers: 4, 0.01784168 seconds
coalesced reads, workgroups: 512, num_workers: 8, 0.00853328 seconds
coalesced reads, workgroups: 512, num_workers: 16, 0.00477104 seconds
coalesced reads, workgroups: 512, num_workers: 32, 0.00276224 seconds
coalesced reads, workgroups: 512, num_workers: 64, 0.00294664 seconds
coalesced reads, workgroups: 512, num_workers: 128, 0.00282192 seconds

######################### Blocks output ##############################

blocked reads, workgroups: 8, num_workers: 4, 0.14206752 seconds
blocked reads, workgroups: 8, num_workers: 8, 0.07778488 seconds
blocked reads, workgroups: 8, num_workers: 16, 0.05246424 seconds
blocked reads, workgroups: 8, num_workers: 32, 0.02621472 seconds
blocked reads, workgroups: 8, num_workers: 64, 0.01133264 seconds
blocked reads, workgroups: 8, num_workers: 128, 0.00752464 seconds
blocked reads, workgroups: 16, num_workers: 4, 0.07089848 seconds
blocked reads, workgroups: 16, num_workers: 8, 0.03954336 seconds
blocked reads, workgroups: 16, num_workers: 16, 0.03290536 seconds
blocked reads, workgroups: 16, num_workers: 32, 0.01196296 seconds
blocked reads, workgroups: 16, num_workers: 64, 0.00750312 seconds
blocked reads, workgroups: 16, num_workers: 128, 0.00618032 seconds
blocked reads, workgroups: 32, num_workers: 4, 0.03575768 seconds
blocked reads, workgroups: 32, num_workers: 8, 0.02162808 seconds
blocked reads, workgroups: 32, num_workers: 16, 0.01131608 seconds
blocked reads, workgroups: 32, num_workers: 32, 0.00879392 seconds
blocked reads, workgroups: 32, num_workers: 64, 0.00660272 seconds
blocked reads, workgroups: 32, num_workers: 128, 0.00650592 seconds
blocked reads, workgroups: 64, num_workers: 4, 0.01826832 seconds
blocked reads, workgroups: 64, num_workers: 8, 0.01026856 seconds
blocked reads, workgroups: 64, num_workers: 16, 0.00737376 seconds
blocked reads, workgroups: 64, num_workers: 32, 0.0061656 seconds
blocked reads, workgroups: 64, num_workers: 64, 0.0065076 seconds
blocked reads, workgroups: 64, num_workers: 128, 0.00643744 seconds
blocked reads, workgroups: 128, num_workers: 4, 0.01825976 seconds
blocked reads, workgroups: 128, num_workers: 8, 0.01133552 seconds
blocked reads, workgroups: 128, num_workers: 16, 0.00850736 seconds
blocked reads, workgroups: 128, num_workers: 32, 0.0065948 seconds
blocked reads, workgroups: 128, num_workers: 64, 0.00705792 seconds
blocked reads, workgroups: 128, num_workers: 128, 0.00624128 seconds
blocked reads, workgroups: 256, num_workers: 4, 0.0137916 seconds
blocked reads, workgroups: 256, num_workers: 8, 0.00877536 seconds
blocked reads, workgroups: 256, num_workers: 16, 0.00589888 seconds
blocked reads, workgroups: 256, num_workers: 32, 0.0057244 seconds
blocked reads, workgroups: 256, num_workers: 64, 0.00661848 seconds
blocked reads, workgroups: 256, num_workers: 128, 0.00618208 seconds
blocked reads, workgroups: 512, num_workers: 4, 0.01365016 seconds
blocked reads, workgroups: 512, num_workers: 8, 0.0086716 seconds
blocked reads, workgroups: 512, num_workers: 16, 0.00666688 seconds
blocked reads, workgroups: 512, num_workers: 32, 0.00613208 seconds
blocked reads, workgroups: 512, num_workers: 64, 0.0063936 seconds
blocked reads, workgroups: 512, num_workers: 128, 0.00588088 seconds
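The kernels that produced these timings are not part of this diff. As a rough guide to what the two labels usually mean, here is a plain C sketch of the index pattern each worker follows, assuming k = workgroups × num_workers workers jointly sum an array of n values (that layout, and the names sum_coalesced and sum_blocked, are assumptions for illustration). Coalesced workers stride through the array by k, so neighbouring workers touch neighbouring addresses on every step; blocked workers each walk their own contiguous chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: partial sum computed by worker `gid` (0 <= gid < k)
 * when k workers (assumed: workgroups * num_workers) sum n values. */

/* Coalesced: on each step, consecutive workers read consecutive addresses. */
float sum_coalesced(const float *x, size_t n, size_t gid, size_t k) {
    assert(k > 0);
    float s = 0.0f;
    for (size_t i = gid; i < n; i += k)
        s += x[i];
    return s;
}

/* Blocked: each worker reads its own contiguous chunk of ceil(n / k) items. */
float sum_blocked(const float *x, size_t n, size_t gid, size_t k) {
    assert(k > 0);
    size_t chunk = (n + k - 1) / k;
    size_t start = gid * chunk;
    size_t end = (start + chunk < n) ? start + chunk : n;
    float s = 0.0f;
    for (size_t i = start; i < end; ++i)   /* empty if start >= n */
        s += x[i];
    return s;
}
```

With n = 10 and k = 4, for example, worker 1 reads indices 1, 5 and 9 in the coalesced scheme and indices 3 to 5 in the blocked one; both schemes cover every element exactly once.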
@@ -0,0 +1,28 @@

Part 1:
Maze 1: Finished after 915 iterations, 207.802 ms total, 0.227106010929 ms per iteration
Maze 2: Finished after 532 iterations, 120.78392 ms total, 0.227037443609 ms per iteration

Part 2:
Maze 1: Finished after 529 iterations, 116.80128 ms total, 0.22079637051 ms per iteration
Maze 2: Finished after 273 iterations, 61.12912 ms total, 0.223916190476 ms per iteration

Part 3:
Maze 1: Finished after 10 iterations, 3.0388 ms total, 0.30388 ms per iteration
Maze 2: Finished after 10 iterations, 2.88184 ms total, 0.288184 ms per iteration

Part 4:
Maze 1: Finished after 10 iterations, 7.55592 ms total, 0.755592 ms per iteration
Maze 2: Finished after 9 iterations, 6.84968 ms total, 0.761075555556 ms per iteration

Justification:
3.1 GHz Intel Core i7
Intel Iris Graphics 6100 1536 MB

On my computer, having a single thread fetch the grandparent labels for the whole work-group (Part 4) slows the program down: 0.756 vs. 0.304 ms per iteration for Maze 1, and 0.761 vs. 0.288 ms for Maze 2, compared with Part 3. For this problem on this hardware, letting every thread fetch its own grandparent label in parallel pays off more than the reduction in global memory reads. This may change on different hardware.
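To make the trade-off concrete, here is a host-side C emulation of the Part 4 loop as a hypothetical function, fetch_grandparents_cached. It is a sketch under stated assumptions: the cache is keyed on the label value, rows of the local buffer are buf_w entries apart, and labels at or above w*h mark background. It counts reads from labels, which live in global memory in the kernel.

```c
#include <assert.h>

/* Hypothetical emulation of Part 4: one work-item replaces each interior
 * buffer entry with its grandparent label, labels[buffer[i]], reusing the
 * previous fetch when consecutive entries share a label. Entries >= w*h are
 * background and left untouched. Returns the number of reads from `labels`. */
int fetch_grandparents_cached(int *buffer, int buf_w, int buf_h, int halo,
                              const int *labels, int w, int h) {
    assert(halo >= 0 && 2 * halo <= buf_w && 2 * halo <= buf_h);
    int reads = 0;
    int last_label = -1;  /* label whose grandparent is cached */
    int last_fetch = -1;  /* cached value of labels[last_label] */
    for (int y = halo; y < buf_h - halo; ++y) {
        for (int x = halo; x < buf_w - halo; ++x) {
            int idx = y * buf_w + x;       /* row stride is buf_w */
            int label = buffer[idx];
            if (label >= w * h) continue;  /* background pixel */
            if (label != last_label) {     /* cache miss: one global read */
                last_label = label;
                last_fetch = labels[label];
                ++reads;
            }
            buffer[idx] = last_fetch;
        }
    }
    return reads;
}
```

In a 2×2 interior where three pixels carry label 3 and one carries label 7, this loop performs 2 reads instead of the 4 that per-pixel fetching would need, but a single work-item issues them one after another while the rest of the work-group waits at the barrier.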

Part 5:
Maze 1: Finished after 10 iterations, 7.94048 ms total, 0.794048 ms per iteration
Maze 2: Finished after 9 iterations, 7.23608 ms total, 0.804008888889 ms per iteration

Compared with Part 4, using min instead of atomic_min leaves the number of iterations unchanged (10 for Maze 1, 9 for Maze 2) but increases the total time (7.94 vs. 7.56 ms for Maze 1, 7.24 vs. 6.85 ms for Maze 2), and therefore the time per iteration. This may vary across machines.
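For context on Part 5: OpenCL's atomic_min(p, val) reads the old value at p, stores min(old, val) and returns old as one indivisible operation, whereas a plain min is a separate load and store that another work-item can interleave with. The C11 sketch below shows one common way to get the same semantics in portable C, a compare-exchange retry loop built on atomic_compare_exchange_weak from <stdatomic.h>. The name atomic_min_int is hypothetical, and this is an analogue, not how OpenCL implements its built-in.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical C11 analogue of OpenCL atomic_min: atomically store
 * min(*p, val) and return the value that was there before. */
int atomic_min_int(atomic_int *p, int val) {
    assert(p != NULL);
    int old = atomic_load(p);
    /* If another thread changes *p between the load and the exchange, the
     * exchange fails, `old` is refreshed with the current value, and the
     * loop re-checks whether val is still smaller. */
    while (val < old && !atomic_compare_exchange_weak(p, &old, val)) {
    }
    return old;
}
```

Starting from 10, atomic_min_int(&v, 5) returns 10 and leaves 5; a following atomic_min_int(&v, 7) returns 5 and leaves the value unchanged.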
@@ -17,6 +17,20 @@ initialize_labels(__global __read_only int *image,
    }
}

// Integer min/max helpers (OpenCL C also provides built-in min and max).
int minn(int x, int y) {
    return (x > y) ? y : x;
}

int maxx(int x, int y) {
    return (x > y) ? x : y;
}

// Smallest of a pixel's label and its four neighbours' labels.
int
update_label(int old, int l, int r, int t, int b)
{
    return minn(minn(minn(minn(old, l), r), t), b);
}

int
get_clamped_value(__global __read_only int *labels,
                  int w, int h,
@@ -80,18 +94,50 @@ propagate_labels(__global __read_write int *labels,
    old_label = buffer[buf_y * buf_w + buf_x];

    // CODE FOR PARTS 2 and 4 HERE (part 4 will replace part 2)

    // Part 2: replace this pixel's buffered label with its grandparent,
    // for foreground pixels only (labels below w*h).
    /*
    if (old_label < w*h) {
        buffer[buf_y * buf_w + buf_x] = labels[old_label];
    }
    */

    // Part 4: a single work-item fetches the grandparents for the whole
    // interior of the buffer, reusing the previous fetch when consecutive
    // pixels share the same label.
    if (idx_1D == 0) {
        int last_label = -1;  // label whose grandparent is cached
        int last_fetch = -1;  // cached labels[last_label]
        for (int yiter = halo; yiter < buf_h - halo; ++yiter) {
            for (int xiter = halo; xiter < buf_w - halo; ++xiter) {
                int cur_idx = yiter * buf_w + xiter;  // row stride is buf_w
                if (buffer[cur_idx] >= w*h) continue;  // background
                if (buffer[cur_idx] != last_label) {
                    last_label = buffer[cur_idx];
                    last_fetch = labels[last_label];
                }
                buffer[cur_idx] = last_fetch;
            }
        }
    }

    barrier(CLK_LOCAL_MEM_FENCE);

    // stay in bounds
    if ((x < w) && (y < h)) {
        // CODE FOR PART 1 HERE
        // We set new_label to the value of old_label, but you will need
        // to adjust this for correctness.
        new_label = old_label;
        if (old_label >= w * h) new_label = old_label;
        else new_label = update_label(
            old_label,
            buffer[(buf_y-1)*buf_w + buf_x],
            buffer[(buf_y+1)*buf_w + buf_x],
            buffer[buf_y * buf_w + buf_x - 1],
            buffer[buf_y * buf_w + buf_x + 1]);

        if (new_label != old_label) {
            // CODE FOR PART 3 HERE
            // indicate there was a change this iteration.
            // multiple threads might write this.
            atomic_min(&labels[old_label], new_label);
            //labels[old_label] = min(labels[old_label], new_label);
            *(changed_flag) += 1;
            labels[y * w + x] = new_label;
        }

Review comment on "labels[y * w + x] = new_label;": This should also be done using atomic_min.
Review comment: After parts 2 and 4, you should use buffer[buf_w * buf_y + buf_x] instead of old_label.
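As a reference for what Parts 1 to 3 compute together, here is a single-threaded C sketch of one label-propagation pass over a small w × h image. It is a simplification under stated assumptions, not the kernel: there are no work-groups or local buffer, out-of-range neighbours are treated as background rather than clamped, foreground pixels start labelled with their own index and background pixels hold w*h, and the names propagate_once, neighbor and min2 are hypothetical. Because it runs sequentially, a plain min is safe where the kernel needs atomic_min.

```c
#include <assert.h>
#include <stdbool.h>

static int min2(int a, int b) { return a < b ? a : b; }

/* Label of pixel (x, y); out-of-range pixels count as background (w*h). */
static int neighbor(const int *labels, int w, int h, int x, int y) {
    if (x < 0 || y < 0 || x >= w || y >= h) return w * h;
    return labels[y * w + x];
}

/* One sequential pass; returns true if any label changed. */
bool propagate_once(int *labels, int w, int h) {
    assert(w > 0 && h > 0);
    bool changed = false;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int old_label = labels[y * w + x];
            if (old_label >= w * h) continue;  /* background never changes */
            /* Part 2 idea: also consider the grandparent labels[old_label]. */
            int new_label = min2(old_label, labels[old_label]);
            /* Part 1 idea: minimum over the 4-neighbourhood. */
            new_label = min2(new_label, neighbor(labels, w, h, x - 1, y));
            new_label = min2(new_label, neighbor(labels, w, h, x + 1, y));
            new_label = min2(new_label, neighbor(labels, w, h, x, y - 1));
            new_label = min2(new_label, neighbor(labels, w, h, x, y + 1));
            if (new_label != old_label) {
                /* Part 3 idea: lower the parent's label as well. */
                labels[old_label] = min2(labels[old_label], new_label);
                labels[y * w + x] = new_label;
                changed = true;
            }
        }
    }
    return changed;
}
```

Calling propagate_once until it returns false leaves every pixel of a connected component holding the smallest index in that component. On a 3×3 image whose foreground pixels are 0, 1, 4, 6 and 8, the component {0, 1, 4} collapses to label 0 after one changing pass, while the isolated pixels 6 and 8 keep their own labels.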