Commit

Added missing files
tbetcke committed Jul 22, 2024
1 parent 5754ff6 commit 62a4164
Showing 5 changed files with 270 additions and 1 deletion.
84 changes: 84 additions & 0 deletions hpc_lecture_notes/2023-assignment_1-lsa.md
@@ -0,0 +1,84 @@
# LSA Assignment 1 - Matrix-matrix multiplication

The deadline for submitting this assignment is **Midnight Friday 30 August 2024**.

The easiest ways to create the pdf file that you will submit are:

- Write your code and answers in a Jupyter notebook, then select File -> Download as -> PDF via LaTeX (.pdf).
- Write your code and answers on Google Colab, then select File -> Print, and print it as a pdf.

Tasks you are required to carry out and questions you are required to answer are shown in bold below.

## The assignment

In this assignment, we will look at computing the product $AB$ of two matrices $A,B\in\mathbb{R}^{n\times n}$. The following snippet of code defines a function that computes the
product of two matrices, and prints the product of two 10 by 10 matrices as an example. The final line prints `matrix1 @ matrix2` - the `@` symbol denotes matrix multiplication, and
Python will use Numpy to compute the product. By comparing the two outputs, you can check that the results are the same.

```python
import numpy as np


def slow_matrix_product(mat1, mat2):
    """Multiply two matrices."""
    assert mat1.shape[1] == mat2.shape[0]
    result = []
    for c in range(mat2.shape[1]):
        column = []
        for r in range(mat1.shape[0]):
            value = 0
            for i in range(mat1.shape[1]):
                value += mat1[r, i] * mat2[i, c]
            column.append(value)
        result.append(column)
    return np.array(result).transpose()


matrix1 = np.random.rand(10, 10)
matrix2 = np.random.rand(10, 10)

print(slow_matrix_product(matrix1, matrix2))
print(matrix1 @ matrix2)
```

The function in this snippet works, but it is not very efficient.

### Part 1: a better function
**Write your own function called `faster_matrix_product` that computes the product of two matrices more efficiently than `slow_matrix_product`.**
Your function may use functions from Numpy (e.g. `np.dot`) to complete part of its calculation, but it should not use `np.dot` or `@` to compute
the full matrix-matrix product.
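
One possible direction (a sketch only, and not the only acceptable approach) is to replace the innermost Python loop with a vectorised inner product:

```python
import numpy as np


def faster_matrix_product(mat1, mat2):
    """Multiply two matrices, using np.dot only for row-column inner products."""
    assert mat1.shape[1] == mat2.shape[0]
    result = np.empty((mat1.shape[0], mat2.shape[1]))
    for r in range(mat1.shape[0]):
        for c in range(mat2.shape[1]):
            # One vectorised inner product replaces the innermost Python loop.
            result[r, c] = np.dot(mat1[r, :], mat2[:, c])
    return result
```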

Before you look at the performance of your function, you should check that it is computing the correct results. **Write a Python script using an `assert`
statement that checks that your function gives the same result as using `@` for random 2 by 2, 3 by 3, 4 by 4, and 5 by 5 matrices.**
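
A minimal version of such a check might look like this (assuming `faster_matrix_product` is defined as above; `np.allclose` is used because floating-point round-off means the two results need not agree exactly):

```python
import numpy as np

for n in (2, 3, 4, 5):
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    # Compare against Numpy's result with a tolerance rather than exact equality.
    assert np.allclose(faster_matrix_product(a, b), a @ b)
```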

In a text box, **give two brief reasons (1-2 sentences for each) why your function is better than `slow_matrix_product`.** At least one of your
reasons should be related to the time you expect the two functions to take.

Next, we want to compare the speed of `slow_matrix_product` and `faster_matrix_product`. **Write a Python script that runs the two functions for matrices of a range of sizes,
and use `matplotlib` to create a plot showing the time taken by both functions for each matrix size.** You should be able to run the functions for matrices
of size up to around 1000 by 1000 (but if you're using an older or slower computer, you may decide to decrease the maximum size slightly). You do not need to run your functions for
every size between your minimum and maximum, but should choose a set of 10-15 values that will give you an informative plot.
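
A skeleton for such a timing experiment (the sizes listed are only an example, and `slow_matrix_product` and `faster_matrix_product` are assumed to be defined as above) might be:

```python
import time

import matplotlib.pyplot as plt
import numpy as np

sizes = [50, 100, 200, 300, 400, 500, 700, 1000]
slow_times = []
fast_times = []
for n in sizes:
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    start = time.time()
    slow_matrix_product(a, b)
    slow_times.append(time.time() - start)
    start = time.time()
    faster_matrix_product(a, b)
    fast_times.append(time.time() - start)

plt.plot(sizes, slow_times, "o-", label="slow_matrix_product")
plt.plot(sizes, fast_times, "o-", label="faster_matrix_product")
plt.xlabel("matrix size n")
plt.ylabel("time (s)")
plt.legend()
plt.show()
```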

### Part 2: speeding it up with Numba
In the second part of this assignment, you're going to use Numba to speed up your function.

**Create a copy of your function `faster_matrix_product` that is just-in-time (JIT) compiled using Numba.** To demonstrate the speed improvement achieved by using Numba,
**make a plot (similar to the one you made in the first part) that shows the times taken to multiply matrices using `faster_matrix_product`, `faster_matrix_product` with
Numba JIT compilation, and Numpy (`@`).** Numpy's matrix-matrix multiplication is highly optimised, so you should not expect to be as fast as it.
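
For reference, one way to create the JIT compiled copy is to decorate a version of the function with `numba.njit`. The sketch below uses plain loops, which Numba compiles well; your own copy should mirror your `faster_matrix_product`. Note that the first call triggers compilation, so it should be excluded from (or at least noted in) your timings.

```python
import numpy as np
from numba import njit


@njit
def faster_matrix_product_jit(mat1, mat2):
    """A copy of the matrix product written with plain loops so Numba can compile it."""
    result = np.empty((mat1.shape[0], mat2.shape[1]))
    for r in range(mat1.shape[0]):
        for c in range(mat2.shape[1]):
            value = 0.0
            for i in range(mat1.shape[1]):
                value += mat1[r, i] * mat2[i, c]
            result[r, c] = value
    return result
```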

You may be able to achieve a further speed-up of your function by adjusting the memory layout used. The function `np.asfortranarray` will make a copy of an array that uses
Fortran-style ordering, for example:

```python
import numpy as np

a = np.random.rand(10, 10)
fortran_a = np.asfortranarray(a)
```

**Make a plot that compares the times taken by your JIT compiled function when the inputs have different combinations of C-style and Fortran-style ordering**
(i.e. the plot should have lines for when both inputs are C-style, when the first is C-style and the second is Fortran-style, and so on). Focusing on the fact
that it is more efficient to access memory that is close to previous accesses, **comment (in 1-2 sentences) on why one of these orderings appears to be faster than the others**.
(Numba can do a lot of different things when compiling code, so depending on your function there may or may not be a large difference: if there is little change in speeds
for your function, you can comment on which ordering you might expect to be faster and why, and conclude that Numba is doing something more advanced.)

70 changes: 70 additions & 0 deletions hpc_lecture_notes/2023-assignment_2-lsa.md
@@ -0,0 +1,70 @@
# LSA Assignment 2 - GPU Accelerated solution of Poisson problems

**Note: This is the assignment from the 2021-22 academic year.**

The deadline for submitting this assignment is **Midnight Friday 30 August 2024**.

The easiest ways to create the pdf file that you will submit are:

- Write your code and answers in a Jupyter notebook, then select File -> Download as -> PDF via LaTeX (.pdf).
- Write your code and answers on Google Colab, then select File -> Print, and print it as a pdf.

In this assignment we consider the solution of Poisson problems of the form

$$
-\Delta u(x, y) = f(x, y)
$$
with $\Delta u := u_{xx} + u_{yy}$,
for $(x, y)\in\Omega\subset\mathbb{R}^2$ and boundary conditions $u(x, y) = g(x, y)$ on $\Gamma :=\partial\Omega$.

For all our experiments the domain $\Omega$ is the unit square $\Omega :=[0, 1]^2$.

To numerically solve this problem we define grid points $x_i := ih$ and $y_j :=jh$ with $i, j=1, \dots, N$ and $h=1/(N+1)$. We can now approximate

$$
-\Delta u(x_i, y_j) \approx \frac{1}{h^2}(4 u(x_i, y_j) - u(x_{i-1}, y_j) - u(x_{i+1}, y_j) - u(x_{i}, y_{j-1}) - u(x_i, y_{j+1})).
$$
If a neighbouring point of $(x_i, y_j)$ lies on the boundary, we simply use the corresponding value of the boundary data $g$ in the above approximation.

The above Poisson problem now becomes the system of $N^2$ equations given by

$$
\frac{1}{h^2}(4 u(x_i, y_j) - u(x_{i-1}, y_j) - u(x_{i+1}, y_j) - u(x_{i}, y_{j-1}) - u(x_i, y_{j+1})) = f(x_i, y_j)
$$
for $i, j=1,\dots, N$.

**Task 1** We first need to create a verified reference solution to this problem. Implement a function ```discretise(f, g, N)``` that takes a Python callable $f$, a Python callable $g$ and the parameter $N$ and returns a sparse CSR matrix $A$ and the corresponding right-hand side $b$ of the above discretised Poisson problem.
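
A minimal sketch of one possible way to assemble such a system (using `scipy.sparse.lil_matrix` for convenient incremental construction before converting to CSR; the loop structure, grid ordering and variable names are illustrative rather than prescribed) is:

```python
import numpy as np
from scipy.sparse import lil_matrix


def discretise(f, g, N):
    """Assemble the 5-point finite difference system for the Poisson problem (sketch only)."""
    h = 1.0 / (N + 1)
    A = lil_matrix((N * N, N * N))
    b = np.empty(N * N)
    for j in range(N):  # index of y_j
        for i in range(N):  # index of x_i
            row = j * N + i
            x, y = (i + 1) * h, (j + 1) * h
            A[row, row] = 4 / h**2
            b[row] = f(x, y)
            # Interior neighbours contribute matrix entries; boundary neighbours
            # contribute the known boundary data g to the right-hand side.
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < N and 0 <= nj < N:
                    A[row, nj * N + ni] = -1 / h**2
                else:
                    b[row] += g((ni + 1) * h, (nj + 1) * h) / h**2
    return A.tocsr(), b
```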

To verify your code we use the method of manufactured solutions. Consider the function $u_{exact}(x, y) = e^{(x-0.5)^2 + (y-0.5)^2}$. By computing $-\Delta u_{exact}$ you can work out the right-hand side $f$ for which $u_{exact}$ is the exact solution of the Poisson problem $-\Delta u(x, y) = f(x, y)$ with boundary conditions given by the boundary data of $u_{exact}$.
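
For reference (and worth re-deriving yourself), differentiating $u_{exact}$ twice in each variable gives

$$
f(x, y) = -\Delta u_{exact}(x, y) = -\left(4 + 4(x-0.5)^2 + 4(y-0.5)^2\right)e^{(x-0.5)^2 + (y-0.5)^2}.
$$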

For increasing values of $N$, solve the linear system of equations using the `scipy.sparse.linalg.spsolve` command. Plot the maximum relative error of your computed grid values $u(x_i, y_j)$ against the exact solution $u_{exact}$ as $N$ increases. The relative error at a given point is

$$
e_{rel} = \frac{|u(x_i, y_j) - u_{exact}(x_i, y_j)|}{|u_{exact}(x_i, y_j)|}
$$

For your plot you should use a double logarithmic plot (```loglog``` in Matplotlib). As $N$ increases the error should go to zero. What can you conjecture about the rate of convergence?
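
A sketch of this convergence study (assuming the `discretise` sketch above and user-supplied callables `u_exact`, `f` and `g`; the explicit loops avoid any ambiguity about the ordering of the unknowns) might look like:

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.sparse.linalg import spsolve

ns = [8, 16, 32, 64, 128]
errors = []
for N in ns:
    A, b = discretise(f, g, N)
    u = spsolve(A, b).reshape(N, N)  # row j of this array corresponds to y_j in the sketch above
    h = 1 / (N + 1)
    err = 0.0
    for j in range(N):
        for i in range(N):
            exact = u_exact((i + 1) * h, (j + 1) * h)
            err = max(err, abs(u[j, i] - exact) / abs(exact))
    errors.append(err)

plt.loglog(ns, errors, "o-")
plt.xlabel("N")
plt.ylabel("maximum relative error")
plt.show()
```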

**Task 2** With your verified code we now have something to compare a GPU code against. On the GPU we want to implement a simple iterative scheme to solve the Poisson equation. The idea is to rewrite the above discrete linear system as

$$
u(x_i, y_j) = \frac{1}{4}\left(h^2f(x_i, y_j) + u(x_{i-1}, y_j) + u(x_{i+1}, y_j) + u(x_{i}, y_{j-1}) + u(x_i, y_{j+1})\right)
$$

You can notice that if $f$ is zero then $u(x_i, y_j)$ is just the average of the four neighbouring grid points. This motivates a simple iterative scheme, namely

$$
u^{k+1}(x_i, y_j) = \frac{1}{4}\left(h^2f(x_i, y_j) + u^k(x_{i-1}, y_j) + u^k(x_{i+1}, y_j) + u^k(x_{i}, y_{j-1}) + u^k(x_i, y_{j+1})\right).
$$

In other words, the value of $u$ at a grid point at iteration $k+1$ is just the average of the four neighbouring values at iteration $k$, plus the contribution from the right-hand side.

Your task is to implement this iterative scheme in Numba CUDA (a minimal kernel sketch follows the hints below). A few hints are in order:

* Make sure that when possible you only copy data from the GPU to the host at the end of your computation. To initialize the iteration you can for example take $u=0$. You do not want to copy data after each iteration step.
* You will need two global buffers, one for the current iteration $k$ and one for the next iteration.
* Your compute kernel will execute one iteration of the scheme and you run multiple iterations by repeatedly calling the kernel from the host.
* To check for convergence you should investigate the relative change of your values from $u^k$ to $u^{k+1}$ and take the maximum relative change as a measure of how accurate your solution is. Decide how you implement this (in the same kernel or through a separate kernel). Also, decide how often you check for convergence. You may not want to check in every iteration as it is an expensive operation.
* Verify your GPU code by comparing against the exact discrete solution in Task 1. Generate a convergence plot of how the values in your iterative scheme converge against the exact discrete solution. For this use a few selected values of $N$. How does the convergence change as $N$ increases?
* Try to optimise memory accesses. You will notice that if you consider a grid value $u(i, j)$ it will be read multiple times from the global buffer. Try to optimise memory accesses by preloading a block of values into local shared memory and having a thread block read the data from there. When you do this, benchmark against an implementation where each thread just reads from global memory.
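
As a starting point, a single sweep of the scheme written as a Numba CUDA kernel might look as follows. This is a sketch only: the array names, the use of a boundary ring, and the launch configuration are illustrative assumptions, not requirements.

```python
from numba import cuda


@cuda.jit
def jacobi_sweep(u_old, u_new, f_values, h):
    """Compute one iteration of the scheme: u_new is filled from u_old.

    u_old and u_new are (N + 2) x (N + 2) device arrays whose outer ring holds
    the boundary data g; f_values holds f at the N x N interior grid points.
    """
    i, j = cuda.grid(2)
    n = u_old.shape[0]
    if 1 <= i < n - 1 and 1 <= j < n - 1:
        u_new[i, j] = 0.25 * (
            h * h * f_values[i - 1, j - 1]
            + u_old[i - 1, j]
            + u_old[i + 1, j]
            + u_old[i, j - 1]
            + u_old[i, j + 1]
        )
```

The host would then launch this kernel repeatedly, swapping the roles of the two buffers between calls, and only copy the result back once the convergence check is satisfied.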

***Carefully describe your computations and observations. Explain what you are doing and try to be scientifically precise in your observations and conclusions. Carefully designing and interpreting your convergence and benchmark experiments is a significant component of this assignment.***
98 changes: 98 additions & 0 deletions hpc_lecture_notes/2023-assignment_3-lsa.md
@@ -0,0 +1,98 @@
# LSA Assignment 3 - Sparse matrices

The deadline for submitting this assignment is **Midnight Friday 30 August 2024**.

The easiest ways to create the pdf file that you will submit are:

- Write your code and answers in a Jupyter notebook, then select File -> Download as -> PDF via LaTeX (.pdf).
- Write your code and answers on Google Colab, then select File -> Print, and print it as a pdf.

## The assignment

### Part 1: Implementing a CSR matrix
Scipy allows you to define your own objects that can be used with its sparse solvers. You can do this
by creating a subclass of `scipy.sparse.linalg.LinearOperator`. In the first part of this assignment, you are going to
implement your own CSR matrix format.

The following code snippet shows how you can define your own matrix-like operator.

```python
from scipy.sparse.linalg import LinearOperator


class CSRMatrix(LinearOperator):
    def __init__(self, coo_matrix):
        self.shape = coo_matrix.shape
        self.dtype = coo_matrix.dtype
        # You'll need to put more code here

    def __add__(self, other):
        """Add the CSR matrix other to this matrix."""
        pass

    def _matvec(self, vector):
        """Compute a matrix-vector product."""
        pass
```

Make a copy of this code snippet and **implement the methods `__init__`, `__add__` and `_matvec`.**
The method `__init__` takes a COO matrix as input and will initialise the CSR matrix: it currently includes lines
that store the shape and dtype of the input matrix. You should add code here that extracts the important data from a Scipy COO matrix, then computes and stores the appropriate data
for a CSR matrix. You may use any functionality of Python and various libraries in your code, but you should not use a library's implementation of a CSR matrix.
The method `__add__` will overload `+` and so allow you to add two of your CSR matrices together.
The `__add__` method should avoid converting any matrices to dense matrices. You could implement this in one of two ways: you could convert both matrices to COO matrices,
compute the sum, then pass this into `CSRMatrix()`; or you could compute the data, indices and indptr for the sum, and use these to create a SciPy CSR matrix.
The method `_matvec` will define a matrix-vector product: Scipy will use this when you tell it to use a sparse solver on your operator.
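
For reference, here is the CSR representation of a small matrix, which you can use to check the `data`, `indices` and `indptr` arrays your `__init__` produces (SciPy's own conversion is used here purely for illustration):

```python
import numpy as np
from scipy.sparse import coo_matrix

m = coo_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 0.0, 3.0],
                         [4.0, 5.0, 0.0]]))
csr = m.tocsr()
print(csr.data)     # [1. 2. 3. 4. 5.]
print(csr.indices)  # [0 2 2 0 1]
print(csr.indptr)   # [0 2 3 5]
```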

**Write tests to check that the `__add__` and `matvec` methods that you have written are correct.** These tests should use appropriate `assert` statements.

For a collection of sparse matrices of your choice and a random vector, **measure the time taken to perform a `matvec` product**. Convert the same matrices to dense matrices and **measure
the time taken to compute a dense matrix-vector product using Numpy**. **Create a plot showing the times of `matvec` and Numpy for a range of matrix sizes** and
**briefly (1-2 sentence) comment on what your plot shows**.

For a matrix of your choice and a random vector, **use Scipy's `gmres` and `cg` sparse solvers to solve a matrix problem using your CSR matrix**.
Check if the two solutions obtained are the same.
**Briefly comment (1-2 sentences) on why the solutions are or are not the same (or are nearly but not exactly the same).**
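
A sketch of how the two solvers might be called on your operator (assuming the `CSRMatrix` class above has been implemented; the diagonal shift makes the test matrix symmetric positive definite so that `cg` is applicable):

```python
import numpy as np
import scipy.sparse
from scipy.sparse.linalg import cg, gmres

n = 200
m = scipy.sparse.random(n, n, density=0.05)
# Symmetrise and shift the diagonal so the matrix is symmetric positive definite.
spd = (m + m.T + 2 * n * scipy.sparse.identity(n)).tocoo()
a = CSRMatrix(spd)
b = np.random.rand(n)

x_gmres, info_gmres = gmres(a, b)
x_cg, info_cg = cg(a, b)
print(np.max(np.abs(x_gmres - x_cg)))
```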

### Part 2: Implementing a custom matrix
Let $\mathrm{A}$ be a $2n$ by $2n$ matrix with the following structure:

- The top left $n$ by $n$ block of $\mathrm{A}$ is a diagonal matrix
- The top right $n$ by $n$ block of $\mathrm{A}$ is zero
- The bottom left $n$ by $n$ block of $\mathrm{A}$ is zero
- The bottom right $n$ by $n$ block of $\mathrm{A}$ is dense (but has a special structure defined below)

In other words, $\mathrm{A}$ looks like this, where $*$ represents a non-zero value

$$
\mathrm{A}=\begin{pmatrix}
*&0&0&\cdots&0&\hspace{3mm}0&0&\cdots&0\\
0&*&0&\cdots&0&\hspace{3mm}0&0&\cdots&0\\
0&0&*&\cdots&0&\hspace{3mm}0&0&\cdots&0\\
\vdots&\vdots&\vdots&\ddots&0&\hspace{3mm}\vdots&\vdots&\ddots&\vdots\\
0&0&0&\cdots&*&\hspace{3mm}0&0&\cdots&0\\[3mm]
0&0&0&\cdots&0&\hspace{3mm}*&*&\cdots&*\\
0&0&0&\cdots&0&\hspace{3mm}*&*&\cdots&*\\
\vdots&\vdots&\vdots&\ddots&\vdots&\hspace{3mm}\vdots&\vdots&\ddots&\vdots\\
0&0&0&\cdots&0&\hspace{3mm}*&*&\cdots&*
\end{pmatrix}
$$

Let $\tilde{\mathrm{A}}$ be the bottom right $n$ by $n$ block of $\mathrm{A}$.
Suppose that $\tilde{\mathrm{A}}$ is a matrix that can be written as

$$
\tilde{\mathrm{A}} = \mathrm{T}\mathrm{W},
$$
where $\mathrm{T}$ is an $n$ by 2 matrix (a tall matrix),
and $\mathrm{W}$ is a 2 by $n$ matrix (a wide matrix).

**Implement a Scipy `LinearOperator` for matrices of this form**. Your implementation must include a matrix-vector product (`matvec`) and the shape of the matrix (`self.shape`), but
does not need to include an `__add__` function. In your implementation of `matvec`, you should be careful to ensure that the product does not have more computational complexity than necessary.
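
A sketch of such an operator (the class and attribute names are illustrative assumptions; `d` holds the diagonal of the top-left block):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator


class BlockLowRankOperator(LinearOperator):
    def __init__(self, d, t, w):
        # d: length-n diagonal of the top-left block; t: n by 2; w: 2 by n.
        self.d, self.t, self.w = d, t, w
        n = d.shape[0]
        super().__init__(dtype=d.dtype, shape=(2 * n, 2 * n))

    def _matvec(self, vector):
        n = self.d.shape[0]
        vector = vector.reshape(-1)
        top = self.d * vector[:n]
        # Applying T(Wx) costs O(n); forming the dense product TW first would cost O(n^2).
        bottom = self.t @ (self.w @ vector[n:])
        return np.concatenate([top, bottom])
```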

For a range of values of $n$, **create matrices where the entries on the diagonal of the top-left block and in the matrices $\mathrm{T}$ and $\mathrm{W}$ are random numbers**.
For each of these matrices, **compute matrix-vector products using your implementation and measure the time taken to compute these**. Create an alternative version of each matrix,
stored using a Scipy or Numpy format of your choice,
and **measure the time taken to compute matrix-vector products using this format**. **Make a plot showing time taken against $n$**. **Comment (2-4 sentences) on what your plot shows, and why you think
one of these methods is faster than the other** (or why they take the same amount of time if this is the case).
18 changes: 18 additions & 0 deletions hpc_lecture_notes/2023-assignment_4-lsa.md
@@ -0,0 +1,18 @@
# LSA Assignment 4 - Time-dependent problems

The deadline for submitting this assignment is **Midnight Friday 30 August 2024**.

The easiest ways to create the pdf file that you will submit are:

- Write your code and answers in a Jupyter notebook, then select File -> Download as -> PDF via LaTeX (.pdf).
- Write your code and answers on Google Colab, then select File -> Print, and print it as a pdf.

Consider a square plate occupying the region $[-1, 1] \times [-1, 1]$. At time $t = 0$ we heat the plate up
such that the temperature is $u = 5$ on one side and $u = 0$ on the other sides. The temperature
evolves according to $u_t = \Delta u$. At what time $t^*$ does the plate reach $u = 1$ at its centre?
Implement a finite difference scheme and try both explicit and implicit time-stepping. Numerically investigate the stability of your schemes.
By increasing the number of discretisation points, demonstrate how many correct digits you can achieve. Also,
plot the convergence of your computed time $t^*$ towards the true value. To 12 digits the wanted
solution is $t^* = 0.424011387033$.
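
As an illustration of the explicit option, one time step might look as follows. This is a sketch under the assumption that `u` is stored on an $(N+2) \times (N+2)$ grid that includes the boundary points, with grid spacing `h` and time step `dt`; recall the usual stability restriction $\Delta t \le h^2/4$ for this scheme.

```python
def explicit_step(u, h, dt):
    """Advance the heat equation u_t = Δu by one explicit Euler step (interior points only)."""
    u_new = u.copy()
    u_new[1:-1, 1:-1] = u[1:-1, 1:-1] + dt / h**2 * (
        u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2] - 4 * u[1:-1, 1:-1]
    )
    return u_new
```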

A GPU implementation of the explicit time-stepping scheme is not necessary but would be expected for a very high mark beyond 80%.
1 change: 0 additions & 1 deletion hpc_lecture_notes/_toc.yml
@@ -34,7 +34,6 @@ parts:
- caption: Conclusions
chapters:
- file: further_topics

- caption: Late Summer Assessments
chapters:
- file: 2023-assignment_1-lsa
