# LSA Assignment 1 - Matrix-matrix multiplication

The deadline for submitting this assignment is **Midnight Friday 30 August 2024**.

The easiest ways to create your submission PDF are:

- Write your code and answers in a Jupyter notebook, then select File -> Download as -> PDF via LaTeX (.pdf).
- Write your code and answers on Google Colab, then select File -> Print, and print it as a pdf.

Tasks you are required to carry out and questions you are required to answer are shown in bold below.

## The assignment

In this assignment, we will look at computing the product $AB$ of two matrices $A,B\in\mathbb{R}^{n\times n}$. The following snippet of code defines a function that computes the
product of two matrices. As an example, the product of two 10 by 10 matrices is printed. The final line prints `matrix1 @ matrix2` - the `@` symbol denotes matrix multiplication, and
Python will use Numpy to compute the product of the two matrices. By looking at the output, you can check that the two results are the same.
```python
import numpy as np


def slow_matrix_product(mat1, mat2):
    """Multiply two matrices."""
    assert mat1.shape[1] == mat2.shape[0]
    result = []
    # Build the product one column at a time
    for c in range(mat2.shape[1]):
        column = []
        for r in range(mat1.shape[0]):
            # Each entry is the dot product of a row of mat1 with a column of mat2
            value = 0
            for i in range(mat1.shape[1]):
                value += mat1[r, i] * mat2[i, c]
            column.append(value)
        result.append(column)
    # The columns were collected as rows, so transpose at the end
    return np.array(result).transpose()


matrix1 = np.random.rand(10, 10)
matrix2 = np.random.rand(10, 10)

print(slow_matrix_product(matrix1, matrix2))
print(matrix1 @ matrix2)
```

The function in this snippet isn't very good.

### Part 1: a better function
**Write your own function called `faster_matrix_product` that computes the product of two matrices more efficiently than `slow_matrix_product`.**
Your function may use functions from Numpy (eg `np.dot`) to complete part of its calculation, but your function should not use `np.dot` or `@` to compute
the full matrix-matrix product.
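
As a hint of the kind of approach that qualifies, here is a minimal sketch that replaces the two inner Python loops with a vector-matrix product via `np.dot`; this is one possible approach, not a model solution:

```python
import numpy as np


def faster_matrix_product(mat1, mat2):
    """Multiply two matrices, computing one row of the result at a time."""
    assert mat1.shape[1] == mat2.shape[0]
    result = np.empty((mat1.shape[0], mat2.shape[1]))
    for r in range(mat1.shape[0]):
        # A single vector-matrix product replaces two Python-level loops
        result[r, :] = np.dot(mat1[r, :], mat2)
    return result
```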

Before you look at the performance of your function, you should check that it is computing the correct results. **Write a Python script using an `assert`
statement that checks that your function gives the same result as using `@` for random 2 by 2, 3 by 3, 4 by 4, and 5 by 5 matrices.**
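
A minimal sketch of such a check (using `np.allclose` rather than `==`, since floating-point results can differ in the last few bits):

```python
import numpy as np

for n in [2, 3, 4, 5]:
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    # allclose compares with a tolerance, appropriate for floating-point arithmetic
    assert np.allclose(faster_matrix_product(a, b), a @ b)
```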

In a text box, **give two brief reasons (1-2 sentences for each) why your function is better than `slow_matrix_product`.** At least one of your
reasons should be related to the time you expect the two functions to take.

Next, we want to compare the speed of `slow_matrix_product` and `faster_matrix_product`. **Write a Python script that runs the two functions for matrices of a range of sizes,
and use `matplotlib` to create a plot showing the time taken for different sized matrices for both functions.** You should be able to run the functions for matrices
of size up to around 1000 by 1000 (but if you're using an older/slower computer, you may decide to decrease the maximums slightly). You do not need to run your functions for
every size between your minimum and maximum, but should choose a set of 10-15 values that will give you an informative plot.
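
A sketch of one possible timing script, assuming both functions are already defined; the sizes here are illustrative, and the pure-Python function will be very slow at the top end, so reduce them if needed:

```python
import time

import matplotlib.pyplot as plt
import numpy as np

sizes = [10, 25, 50, 100, 200, 300, 450, 600, 800, 1000]
slow_times, fast_times = [], []
for n in sizes:
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    start = time.perf_counter()
    slow_matrix_product(a, b)
    slow_times.append(time.perf_counter() - start)
    start = time.perf_counter()
    faster_matrix_product(a, b)
    fast_times.append(time.perf_counter() - start)

plt.plot(sizes, slow_times, label="slow_matrix_product")
plt.plot(sizes, fast_times, label="faster_matrix_product")
plt.xlabel("matrix size n")
plt.ylabel("time (s)")
plt.legend()
plt.show()
```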

### Part 2: speeding it up with Numba
In the second part of this assignment, you're going to use Numba to speed up your function.

**Create a copy of your function `faster_matrix_product` that is just-in-time (JIT) compiled using Numba.** To demonstrate the speed improvement achieved by using Numba,
**make a plot (similar to that you made in the first part) that shows the times taken to multiply matrices using `faster_matrix_product`, `faster_matrix_product` with
Numba JIT compilation, and Numpy (`@`).** Numpy's matrix-matrix multiplication is highly optimised, so you should not expect to be as fast as it.
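
For example, a JIT compiled version might look like the following sketch (the name `jit_matrix_product` is illustrative; explicit loops are often the most Numba-friendly formulation):

```python
import numpy as np
from numba import njit


@njit
def jit_matrix_product(mat1, mat2):
    """Matrix product, JIT compiled by Numba."""
    result = np.empty((mat1.shape[0], mat2.shape[1]))
    for r in range(mat1.shape[0]):
        for c in range(mat2.shape[1]):
            value = 0.0
            for i in range(mat1.shape[1]):
                value += mat1[r, i] * mat2[i, c]
            result[r, c] = value
    return result
```

Remember that the first call to a JIT compiled function includes the compilation time, so call the function once before you start timing it.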

You may be able to achieve a further speed-up of your function by adjusting the memory layout used. The function `np.asfortranarray` will make a copy of an array that uses
Fortran-style ordering, for example:

```python
import numpy as np

a = np.random.rand(10, 10)
fortran_a = np.asfortranarray(a)
```

**Make a plot that compares the times taken by your JIT compiled function when the inputs have different combinations of C-style and Fortran-style ordering**
(ie the plot should have lines for when both inputs are C-style, when the first is C-style and the second is Fortran-style, and so on). Focusing on the fact
that it is more efficient to access memory that is close to previous accesses, **comment (in 1-2 sentences) on why one of these orderings appears to be faster than the others**.
(Numba can do a lot of different things when compiling code, so depending on your function there may or may not be a large difference: if there is little change in speeds
for your function, you can comment on which ordering you might expect to be faster and why, but conclude that Numba is doing something more advanced.)
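
A sketch of how the four combinations could be timed for a single size, using the hypothetical `jit_matrix_product` from above (a full answer would loop over sizes and plot):

```python
import time

import numpy as np

n = 1000
a, b = np.random.rand(n, n), np.random.rand(n, n)
layouts = {
    "C, C": (a, b),
    "C, F": (a, np.asfortranarray(b)),
    "F, C": (np.asfortranarray(a), b),
    "F, F": (np.asfortranarray(a), np.asfortranarray(b)),
}
for name, (m1, m2) in layouts.items():
    # Warm-up call: each layout combination may trigger a fresh compilation
    jit_matrix_product(m1, m2)
    start = time.perf_counter()
    jit_matrix_product(m1, m2)
    print(name, time.perf_counter() - start)
```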

# LSA Assignment 2 - GPU Accelerated solution of Poisson problems

**Note: This is the assignment from the 2021-22 academic year.**

The deadline for submitting this assignment is **Midnight Friday 30 August 2024**.

The easiest ways to create your submission PDF are:

- Write your code and answers in a Jupyter notebook, then select File -> Download as -> PDF via LaTeX (.pdf).
- Write your code and answers on Google Colab, then select File -> Print, and print it as a pdf.

In this assignment we consider the solution of Poisson problems of the form

$$
-\Delta u(x, y) = f(x, y),
$$

with $\Delta u := u_{xx} + u_{yy}$, for $(x, y)\in\Omega\subset\mathbb{R}^2$ and boundary conditions $u(x, y) = g(x, y)$ on $\Gamma :=\partial\Omega$.

For all our experiments the domain $\Omega$ is the unit square $\Omega :=[0, 1]^2$.

To numerically solve this problem we define grid points $x_i := ih$ and $y_j := jh$ with $i, j=1, \dots, N$ and $h=1/(N+1)$. We can now approximate

$$
-\Delta u(x_i, y_j) \approx \frac{1}{h^2}(4 u(x_i, y_j) - u(x_{i-1}, y_j) - u(x_{i+1}, y_j) - u(x_{i}, y_{j-1}) - u(x_i, y_{j+1})).
$$

If a neighbouring point of $(x_i, y_j)$ is on the boundary, we simply use the corresponding value of the boundary data $g$ in the above approximation.

The above Poisson problem now becomes the system of $N^2$ equations given by

$$
\frac{1}{h^2}(4 u(x_i, y_j) - u(x_{i-1}, y_j) - u(x_{i+1}, y_j) - u(x_{i}, y_{j-1}) - u(x_i, y_{j+1})) = f(x_i, y_j)
$$

for $i, j=1,\dots, N$.

**Task 1** We first need to create a verified reference solution to this problem. Implement a function `discretise(f, g, N)` that takes a Python callable $f$, a Python callable $g$ and the parameter $N$ and returns a sparse CSR matrix $A$ and the corresponding right-hand side $b$ of the above discretised Poisson problem.
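
A minimal sketch of one way to assemble this system (building a LIL matrix entry by entry and converting to CSR at the end; a faster approach would construct the COO data directly):

```python
import numpy as np
from scipy.sparse import lil_matrix


def discretise(f, g, N):
    """Assemble the 5-point finite difference system for the Poisson problem."""
    h = 1.0 / (N + 1)
    A = lil_matrix((N * N, N * N))
    b = np.zeros(N * N)
    for j in range(1, N + 1):
        for i in range(1, N + 1):
            row = (j - 1) * N + (i - 1)
            A[row, row] = 4.0 / h**2
            b[row] = f(i * h, j * h)
            for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                ni, nj = i + di, j + dj
                if 1 <= ni <= N and 1 <= nj <= N:
                    A[row, (nj - 1) * N + (ni - 1)] = -1.0 / h**2
                else:
                    # Neighbour is a boundary point: move g's known value
                    # to the right-hand side
                    b[row] += g(ni * h, nj * h) / h**2
    return A.tocsr(), b
```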

To verify your code we use the method of manufactured solutions. Let the exact solution be $u_{exact}(x, y) = e^{(x-0.5)^2 + (y-0.5)^2}$. By taking $-\Delta u_{exact}$ you can compute the corresponding right-hand side $f$ so that $u_{exact}$ is the exact solution of the Poisson equation $-\Delta u(x, y) = f(x, y)$ with boundary conditions given by the boundary data of your known $u_{exact}$.
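
For reference, differentiating twice (a step worth re-deriving yourself) gives

$$
f(x, y) = -\Delta u_{exact}(x, y) = -4\left(1 + (x-0.5)^2 + (y-0.5)^2\right)e^{(x-0.5)^2 + (y-0.5)^2}.
$$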

For growing values of $N$, solve the linear system of equations using the `scipy.sparse.linalg.spsolve` command. Plot the maximum relative error of your computed grid values $u(x_i, y_j)$ against the exact solution $u_{exact}$ as $N$ increases. The relative error at a given point is

$$
e_{rel} = \frac{|u(x_i, y_j) - u_{exact}(x_i, y_j)|}{|u_{exact}(x_i, y_j)|}.
$$

For your plot you should use a double logarithmic plot (`loglog` in Matplotlib). As $N$ increases the error should go to zero. What can you conjecture about the rate of convergence?
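
A sketch of such a convergence study, assuming the `discretise` sketch above (and its row ordering):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.sparse.linalg import spsolve


def u_exact(x, y):
    return np.exp((x - 0.5)**2 + (y - 0.5)**2)


def f(x, y):
    return -4 * (1 + (x - 0.5)**2 + (y - 0.5)**2) * u_exact(x, y)


Ns = [4, 8, 16, 32, 64, 128]
errors = []
for N in Ns:
    A, b = discretise(f, u_exact, N)
    u = spsolve(A, b)
    grid = np.arange(1, N + 1) / (N + 1)
    xx, yy = np.meshgrid(grid, grid)  # matches the row ordering used in discretise
    exact = u_exact(xx, yy).ravel()
    errors.append(np.max(np.abs(u - exact) / np.abs(exact)))

plt.loglog(Ns, errors, "o-")
plt.xlabel("N")
plt.ylabel("maximum relative error")
plt.show()
```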

**Task 2** With your verified code we now have something to compare a GPU code against. On the GPU we want to implement a simple iterative scheme to solve the Poisson equation. The idea is to rewrite the above discrete linear system as

$$
u(x_i, y_j) = \frac{1}{4}\left(h^2f(x_i, y_j) + u(x_{i-1}, y_j) + u(x_{i+1}, y_j) + u(x_{i}, y_{j-1}) + u(x_i, y_{j+1})\right).
$$

You can notice that if $f$ is zero then $u(x_i, y_j)$ is just the average of the neighbouring grid points. This motivates a simple iterative scheme, namely

$$
u^{k+1}(x_i, y_j) = \frac{1}{4}\left(h^2f(x_i, y_j) + u^k(x_{i-1}, y_j) + u^k(x_{i+1}, y_j) + u^k(x_{i}, y_{j-1}) + u^k(x_i, y_{j+1})\right).
$$

In other words, the value of $u$ at iteration $k+1$ is just the average of the neighbouring values at iteration $k$ plus the contribution from the right-hand side.

Your task is to implement this iterative scheme in Numba Cuda (a sketch of a possible kernel follows the list of hints below). A few hints are in order:

* Make sure that, where possible, you only copy data from the GPU to the host at the end of your computation. To initialize the iteration you can for example take $u=0$. You do not want to copy data after each iteration step.
* You will need two global buffers, one for the current iteration $k$ and one for the next iteration.
* Your compute kernel will execute one iteration of the scheme, and you run multiple iterations by repeatedly calling the kernel from the host.
* To check for convergence you should investigate the relative change of your values from $u^k$ to $u^{k+1}$ and take the maximum relative change as a measure of how accurate your solution is. Decide how you implement this (in the same kernel or through a separate kernel). Also, decide how often you check for convergence. You may not want to check in each iteration as it is an expensive operation.
* Verify your GPU code by comparing against the exact discrete solution in Task 1. Generate a convergence plot of how the values in your iterative scheme converge against the exact discrete solution. For this use a few selected values of $N$. How does the convergence change as $N$ increases?
* Try to optimise memory accesses. You will notice that if you consider a grid value $u(i, j)$ it will be read multiple times from the global buffer. Try to optimise memory accesses by preloading a block of values into local shared memory and having a thread block read the data from there. When you do this, benchmark against an implementation where each thread just reads from global memory.
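
As promised above, here is a hedged sketch of what the compute kernel and its host-side driver loop might look like (convergence checking and the shared-memory optimisation are deliberately omitted, and all names are illustrative):

```python
import numpy as np
from numba import cuda


@cuda.jit
def iteration_kernel(u_old, u_new, f_vals, h2):
    """One iteration of the scheme on an (N+2) x (N+2) grid whose outer
    ring holds the boundary data g."""
    i, j = cuda.grid(2)
    if 1 <= i <= u_old.shape[0] - 2 and 1 <= j <= u_old.shape[1] - 2:
        u_new[i, j] = 0.25 * (h2 * f_vals[i, j]
                              + u_old[i - 1, j] + u_old[i + 1, j]
                              + u_old[i, j - 1] + u_old[i, j + 1])


# Host side: copy data to the device once, then launch the kernel repeatedly,
# swapping the two buffers between iterations.
N = 128
h = 1.0 / (N + 1)
u = np.zeros((N + 2, N + 2))  # also set the boundary ring to g here
d_u_old = cuda.to_device(u)
d_u_new = cuda.to_device(u)
d_f = cuda.to_device(np.zeros((N + 2, N + 2)))  # fill with f at the grid points

threads = (16, 16)
blocks = ((N + 2 + threads[0] - 1) // threads[0],
          (N + 2 + threads[1] - 1) // threads[1])
for _ in range(1000):
    iteration_kernel[blocks, threads](d_u_old, d_u_new, d_f, h * h)
    d_u_old, d_u_new = d_u_new, d_u_old  # swap references on the host; no data is copied

result = d_u_old.copy_to_host()  # copy back only once, at the end
```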

***Carefully describe your computations and observations. Explain what you are doing and try to be scientifically precise in your observations and conclusions. Carefully designing and interpreting your convergence and benchmark experiments is a significant component of this assignment.***

# LSA Assignment 3 - Sparse matrices

The deadline for submitting this assignment is **Midnight Friday 30 August 2024**.

The easiest ways to create your submission PDF are:

- Write your code and answers in a Jupyter notebook, then select File -> Download as -> PDF via LaTeX (.pdf).
- Write your code and answers on Google Colab, then select File -> Print, and print it as a pdf.

## The assignment

### Part 1: Implementing a CSR matrix
Scipy allows you to define your own objects that can be used with their sparse solvers. You can do this
by creating a subclass of `scipy.sparse.linalg.LinearOperator`. In the first part of this assignment, you are going to
implement your own CSR matrix format.

The following code snippet shows how you can define your own matrix-like operator.

```python
from scipy.sparse.linalg import LinearOperator


class CSRMatrix(LinearOperator):
    def __init__(self, coo_matrix):
        self.shape = coo_matrix.shape
        self.dtype = coo_matrix.dtype
        # You'll need to put more code here

    def __add__(self, other):
        """Add the CSR matrix other to this matrix."""
        pass

    def _matvec(self, vector):
        """Compute a matrix-vector product."""
        pass
```

Make a copy of this code snippet and **implement the methods `__init__`, `__add__` and `_matvec`.**
The method `__init__` takes a COO matrix as input and will initialise the CSR matrix: it currently includes one line
that will store the shape of the input matrix. You should add code here that extracts the important data from a Scipy COO matrix and computes and stores the appropriate data
for a CSR matrix. You may use any functionality of Python and various libraries in your code, but you should not use a library's implementation of a CSR matrix.
The method `__add__` will overload `+` and so allow you to add two of your CSR matrices together.
The `__add__` method should avoid converting any matrices to dense matrices. You could implement this in one of two ways: you could convert both matrices to COO matrices,
compute the sum, then pass this into `CSRMatrix()`; or you could compute the data, indices and indptr for the sum, and use these to create a SciPy CSR matrix.
The method `_matvec` will define a matrix-vector product: Scipy will use this when you tell it to use a sparse solver on your operator.
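
For orientation, here is a minimal (unvectorised) sketch of the logic `_matvec` needs, written as a hypothetical standalone helper and assuming `__init__` has stored `data`, `indices` and `indptr` in the usual CSR sense:

```python
import numpy as np


def csr_matvec(data, indices, indptr, vector):
    """Row-by-row CSR matrix-vector product."""
    nrows = len(indptr) - 1
    result = np.zeros(nrows)
    for row in range(nrows):
        # The nonzeros of this row live in data[indptr[row]:indptr[row + 1]]
        for k in range(indptr[row], indptr[row + 1]):
            result[row] += data[k] * vector[indices[k]]
    return result
```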

**Write tests to check that the `__add__` and `_matvec` methods that you have written are correct.** These tests should use appropriate `assert` statements.
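
A sketch of what such tests might look like, comparing against dense Numpy results on random COO matrices (this assumes your `__add__` returns another `CSRMatrix` or similar):

```python
import numpy as np
from scipy.sparse import random as sparse_random

m1 = sparse_random(50, 50, density=0.1, format="coo")
m2 = sparse_random(50, 50, density=0.1, format="coo")
v = np.random.rand(50)

a1, a2 = CSRMatrix(m1), CSRMatrix(m2)
# Compare against dense results computed by Numpy
assert np.allclose(a1.matvec(v), m1.toarray() @ v)
assert np.allclose((a1 + a2).matvec(v), (m1.toarray() + m2.toarray()) @ v)
```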

For a collection of sparse matrices of your choice and a random vector, **measure the time taken to perform a `matvec` product**. Convert the same matrices to dense matrices and **measure
the time taken to compute a dense matrix-vector product using Numpy**. **Create a plot showing the times of `matvec` and Numpy for a range of matrix sizes** and
**briefly (1-2 sentences) comment on what your plot shows**.

For a matrix of your choice and a random vector, **use Scipy's `gmres` and `cg` sparse solvers to solve a matrix problem using your CSR matrix**.
Check if the two solutions obtained are the same.
**Briefly comment (1-2 sentences) on why the solutions are or are not the same (or are nearly but not exactly the same).**
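
A sketch of how the solvers can be called on your operator (note that `cg` is only guaranteed to converge for symmetric positive definite matrices, which is worth bearing in mind when you choose your matrix and interpret the results):

```python
import numpy as np
from scipy.sparse.linalg import cg, gmres

rhs = np.random.rand(op.shape[0])  # op is an instance of your CSRMatrix
x_gmres, info_gmres = gmres(op, rhs)
x_cg, info_cg = cg(op, rhs)
print(np.allclose(x_gmres, x_cg))
```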

### Part 2: Implementing a custom matrix
Let $\mathrm{A}$ be a $2n$ by $2n$ matrix with the following structure:

- The top left $n$ by $n$ block of $\mathrm{A}$ is a diagonal matrix
- The top right $n$ by $n$ block of $\mathrm{A}$ is zero
- The bottom left $n$ by $n$ block of $\mathrm{A}$ is zero
- The bottom right $n$ by $n$ block of $\mathrm{A}$ is dense (but has a special structure defined below)

In other words, $\mathrm{A}$ looks like this, where $*$ represents a non-zero value:

$$
\mathrm{A}=\begin{pmatrix}
*&0&0&\cdots&0&\hspace{3mm}0&0&\cdots&0\\
0&*&0&\cdots&0&\hspace{3mm}0&0&\cdots&0\\
0&0&*&\cdots&0&\hspace{3mm}0&0&\cdots&0\\
\vdots&\vdots&\vdots&\ddots&\vdots&\hspace{3mm}\vdots&\vdots&\ddots&\vdots\\
0&0&0&\cdots&*&\hspace{3mm}0&0&\cdots&0\\[3mm]
0&0&0&\cdots&0&\hspace{3mm}*&*&\cdots&*\\
0&0&0&\cdots&0&\hspace{3mm}*&*&\cdots&*\\
\vdots&\vdots&\vdots&\ddots&\vdots&\hspace{3mm}\vdots&\vdots&\ddots&\vdots\\
0&0&0&\cdots&0&\hspace{3mm}*&*&\cdots&*
\end{pmatrix}
$$

Let $\tilde{\mathrm{A}}$ be the bottom right $n$ by $n$ block of $\mathrm{A}$.
Suppose that $\tilde{\mathrm{A}}$ is a matrix that can be written as

$$
\tilde{\mathrm{A}} = \mathrm{T}\mathrm{W},
$$

where $\mathrm{T}$ is an $n$ by 2 matrix (a tall matrix) and $\mathrm{W}$ is a 2 by $n$ matrix (a wide matrix).

**Implement a Scipy `LinearOperator` for matrices of this form**. Your implementation must include a matrix-vector product (`_matvec`) and the shape of the matrix (`self.shape`), but
does not need to include an `__add__` function. In your implementation of `_matvec`, you should be careful to ensure that the product does not have more computational complexity than necessary.
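
A hedged sketch of such an operator (the class name and constructor arguments are illustrative): exploiting the factorisation $\tilde{\mathrm{A}} = \mathrm{T}\mathrm{W}$ keeps the cost of `_matvec` at $O(n)$, rather than the $O(n^2)$ of applying a dense bottom-right block.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator


class BlockOperator(LinearOperator):
    """2n-by-2n operator with a diagonal top-left block and a bottom-right
    block stored in factorised form T @ W."""

    def __init__(self, diag, T, W):
        # diag has shape (n,), T is n-by-2, W is 2-by-n
        self.diag, self.T, self.W = diag, T, W
        n = diag.shape[0]
        super().__init__(dtype=diag.dtype, shape=(2 * n, 2 * n))

    def _matvec(self, x):
        n = self.diag.shape[0]
        x = x.reshape(-1)
        top = self.diag * x[:n]  # diagonal block: O(n)
        # Applying W then T costs O(n), unlike forming T @ W first
        bottom = self.T @ (self.W @ x[n:])
        return np.concatenate([top, bottom])
```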

For a range of values of $n$, **create matrices where the entries on the diagonal of the top-left block and in the matrices $\mathrm{T}$ and $\mathrm{W}$ are random numbers**.
For each of these matrices, **compute matrix-vector products using your implementation and measure the time taken to compute these**. Create an alternative version of each matrix,
stored using a Scipy or Numpy format of your choice,
and **measure the time taken to compute matrix-vector products using this format**. **Make a plot showing time taken against $n$**. **Comment (2-4 sentences) on what your plot shows, and why you think
one of these methods is faster than the other** (or why they take the same amount of time if this is the case).

# LSA Assignment 4 - Time-dependent problems

The deadline for submitting this assignment is **Midnight Friday 30 August 2024**.

The easiest ways to create your submission PDF are:

- Write your code and answers in a Jupyter notebook, then select File -> Download as -> PDF via LaTeX (.pdf).
- Write your code and answers on Google Colab, then select File -> Print, and print it as a pdf.

Consider a square plate with sides $[-1, 1] \times [-1, 1]$. At time $t = 0$ we are heating the plate up
such that the temperature is $u = 5$ on one side and $u = 0$ on the other sides. The temperature
evolves according to $u_t = \Delta u$. At what time $t^*$ does the plate reach $u = 1$ at the centre of the plate?
Implement a finite difference scheme and try both explicit and implicit time-stepping. Numerically investigate the stability of your schemes.
By increasing the number of discretisation points, demonstrate how many correct digits you can achieve. Also,
plot the convergence of your computed time $t^*$ against the actual time. To 12 digits, the wanted
solution is $t^* = 0.424011387033$.
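
As a starting point, here is a sketch of a single explicit (forward Euler) finite difference step on an $(N+2) \times (N+2)$ grid whose outer ring holds the boundary values; recall that such a scheme is only stable when $\Delta t \le h^2/4$, which is one thing your stability investigation should reveal:

```python
import numpy as np


def explicit_step(u, dt, h):
    """One forward Euler step of u_t = Laplacian(u); u includes the boundary ring."""
    u_new = u.copy()
    u_new[1:-1, 1:-1] = u[1:-1, 1:-1] + dt / h**2 * (
        u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
        - 4 * u[1:-1, 1:-1]
    )
    return u_new
```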

A GPU implementation of the explicit time-stepping scheme is not necessary, but would be expected for a very high mark beyond 80%.