Showing 27 changed files with 237 additions and 2 deletions.
@@ -1,7 +1,7 @@
---
title: AudioMAE
tags:
- sapling
- paper-review
enableToc: false
---
## AudioMAE
@@ -0,0 +1,64 @@
---
title: CS231n Classification
tags:
- sapling
- short-notes
enableToc: false
---
### Introduction
Many other seemingly distinct Computer Vision tasks (such as object detection, segmentation) can be reduced to image classification.

Image classification is solved through a data-driven approach.

>[!question]
>Are there alternatives to the data-driven approach for image classification?

### Challenges
>[!info]
>The challenges below arise because Computer Vision algorithms take the raw representation of an image as a 3-D array of brightness values.

A non-exhaustive list of challenges:

- Viewpoint variation
- Scale variation
- Deformation
- Occlusion
- Illumination conditions
- Background clutter
- Intra-class variation

### Image Classification Pipeline
- **Input**: A set of N images, each labeled with one of K different classes. [training set]
- **Learning**: Use the training set to learn what every one of the classes looks like. [training a classifier] or [learning a model]
- **Evaluation**: Evaluate the quality of the classifier on a new set of images it has never seen before. Predictions should match up with the true answers (which we call the ground truth).

### Nearest Neighbour Classifier
- It is rarely used in practice.
- Given the training set and the test set, find the nearest neighbour of each test sample using the L1/L2 distance; a minimal sketch follows this list.
- [k-nearest neighbour classifier] Higher values of k have a smoothing effect that makes the classifier more resistant to outliers.
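
A minimal NumPy sketch of the nearest-neighbour classifier described above; the class name, the array shapes, and the choice of L1 distance with a majority vote are illustrative assumptions, not code from the course.

```python
# Hedged sketch of a (k-)nearest-neighbour classifier.
import numpy as np

class NearestNeighbour:
    def train(self, X, y):
        # "Training" only memorises the data: X is (N, D), y is (N,) integer labels.
        self.X_train = X
        self.y_train = y

    def predict(self, X, k=1):
        preds = np.empty(X.shape[0], dtype=self.y_train.dtype)
        for i, x in enumerate(X):
            # L1 (Manhattan) distance from the test sample to every training point.
            dists = np.sum(np.abs(self.X_train - x), axis=1)
            # Majority vote among the k closest training labels.
            nearest = self.y_train[np.argsort(dists)[:k]]
            preds[i] = np.bincount(nearest).argmax()
        return preds
```
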
### Validation sets for Hyperparameter tuning
- [Validation] Split the training set into a training set and a validation set. Use the validation set to tune all hyperparameters.
- At the end, run a single time on the test set and report the performance, ie, the generalisation.
- [Cross-validation] Split the training set into k folds, where one fold is used as the validation set in each turn, and the average over the performances on all validation folds is used for hyperparameter tuning. A sketch of this loop follows the figures below.
>[!question]
>Is the same model or a different model trained for each choice of validation fold from the k folds of the training set?
![[Pasted image 20240127171935.png]]
![[Pasted image 20240127172003.png]]
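
A hedged sketch of k-fold hyperparameter search for the k in k-NN; it assumes the NearestNeighbour class sketched earlier and integer class labels, and the fold count and candidate values are illustrative.

```python
# Average validation accuracy over folds for each candidate hyperparameter value.
import numpy as np

def cross_validate_k(X, y, candidate_ks, num_folds=5):
    folds_X = np.array_split(X, num_folds)
    folds_y = np.array_split(y, num_folds)
    results = {}
    for k in candidate_ks:
        accuracies = []
        for i in range(num_folds):
            X_val, y_val = folds_X[i], folds_y[i]
            X_tr = np.concatenate(folds_X[:i] + folds_X[i + 1:])
            y_tr = np.concatenate(folds_y[:i] + folds_y[i + 1:])
            clf = NearestNeighbour()
            clf.train(X_tr, y_tr)
            accuracies.append(np.mean(clf.predict(X_val, k=k) == y_val))
        results[k] = np.mean(accuracies)  # average over the folds
    return results
```
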

- Raw-pixel-based L1/L2 distance is counter-intuitive: images that are perceptually similar can be far apart in pixel space, and perceptually different images can be close.
- k-Nearest Neighbours is computationally expensive at test time, while training is trivial and cheap (it only stores the data).
#### Copy-Notes
- If there are many hyperparameters to estimate, you should err on the side of having a larger validation set to estimate them effectively.
- If you are concerned about the size of your validation data, it is best to split the training data into folds and perform cross-validation.
- If you can afford the computational budget, it is always safer to go with cross-validation (the more folds the better, but more expensive).
### Relevant Blogs
[[t-SNE.md|t-SNE]]
[[PCA.md|PCA]]
@@ -0,0 +1,26 @@
---
title: Lasso vs Ridge
tags:
- sapling
enableToc: false
---
### Introduction
Lasso and Ridge are regularisation methods used to find an optimally complex model, one that is as simple as possible while still performing well on the training data.

![[Pasted image 20240128033726.png]]
- An optimally complex model balances bias and variance.
### When Lasso When Ridge
- [Lasso]: To remove unnecessary features; a short sketch follows below.
![[Pasted image 20240128033627.png]]
- [Ridge]: To build a robust model
![[Pasted image 20240128033645.png]]
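
A small sketch contrasting the two penalties with scikit-learn; the synthetic data (only 3 of 10 features are informative) and the alpha values are illustrative assumptions.

```python
# Lasso tends to zero out irrelevant coefficients; Ridge only shrinks them.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.array([3.0, -2.0, 1.5] + [0.0] * 7)          # 7 irrelevant features
y = X @ true_w + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso coefficients:", np.round(lasso.coef_, 2))   # irrelevant ones driven to exactly 0
print("Ridge coefficients:", np.round(ridge.coef_, 2))   # shrunk, but rarely exactly 0
```
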
### Feature Selection using Lasso
- The orange contour represents the regularisation term and the blue contour represents the error term.
- The points where the error-term and regularisation-term contours are tangential to one another are the candidate optimal solutions at which the cost function is minimised.
- In Ridge, the two contours are very unlikely to be tangential to one another on the x- or y-axis, so it is difficult to obtain a sparse solution.
- Since the Lasso constraint region has faces, corners, and edges in high dimensions, there is a high chance that the two contours are tangential to one another on the x- or y-axis.

![[Pasted image 20240128033122.png]]

![[Pasted image 20240128033437.png]]
@@ -0,0 +1,77 @@
---
title: PCA
tags:
- sapling
enableToc: true
---
### Introduction

Data is often high-dimensional, so it can neither be stored directly nor ignored completely.
Dimensionality reduction techniques are:
- [Filtering]: Leave out most of the dimensions and concentrate only on certain dimensions.
- [PCA]: Project the high-dimensional data onto a lower-dimensional subspace using linear or non-linear transformations (or projections).
>[!info]
>The basic idea is that n (the number of data items) should be greater than the number of dimensions.
![[Pasted image 20240127174656.png]]
The above is an example of PCA, which is a linear projection method.
### Detailed Explanation

- **Problem**: Approximating 2-D data points using a lower-dimensional representation, ie, 1-D.
- Details:
    - Instead of storing 2 values for each data point, we store 1 value per data point plus a vector V that is common across all the data points.
    - For each data point you have to store only this scalar value s, which gives the distance along this vector V.
![[Pasted image 20240127180635.png]]
- Other Details
    - You should choose the V that minimises the residual variance, ie, the difference between your original data and your projections.
    - It allows you to reconstruct the original data with the least possible error.
    - The projection onto the vector V is orthogonal.
    - You should pick V in the direction of the biggest spread of your data.
    - This can be extended to multiple components.
    - You can repeat this process and find the second component, which captures the second-biggest variance of the data, ie, principal component 2.
![[Pasted image 20240128034433.png]]

### Understanding SVD

>[!question] Need for SVD
>- The steps to implement PCA are expensive when X is very large or very small.
>- The best way to compute principal components is by using SVD.
>- SVD is one of the best linear transformation methods.

**PCA Implementation** (a NumPy sketch follows the steps below)
1. Subtract the mean from the data.
2. Scale each dimension by its variance.
3. Compute the covariance matrix S. Here X is the data matrix.
$$S = \frac{1}{N} X^T X$$
4. Compute the K largest eigenvectors of S. These eigenvectors are the principal components of the data set.
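
A minimal NumPy sketch of the four steps above; the function name, the use of `np.linalg.eigh`, and scaling by each dimension's standard deviation are illustrative assumptions.

```python
# Hedged PCA sketch via eigendecomposition of the covariance matrix.
import numpy as np

def pca(X, K):
    X = X - X.mean(axis=0)                 # 1. subtract the mean
    X = X / X.std(axis=0)                  # 2. scale each dimension by its spread
    S = (X.T @ X) / X.shape[0]             # 3. covariance matrix S = (1/N) X^T X
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenpairs of the symmetric matrix S
    order = np.argsort(eigvals)[::-1]      # largest eigenvalues first
    components = eigvecs[:, order[:K]]     # 4. K largest eigenvectors = principal components
    return X @ components, components      # projected data and the components
```
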
#### What is SVD?

Any matrix X, whether square or rectangular, singular or not, can be decomposed into a product of three matrices: two orthogonal matrices U and V and a diagonal matrix D.

$$X = UDV^T$$

![[Pasted image 20240128141547.png]]

[PCA using SVD] SVD applied to S (the covariance matrix) is used to obtain its eigenvectors and eigenvalues.
- The columns of matrix U form the eigenvectors of S.
- The matrix D is diagonal, and its diagonal values are the eigenvalues in descending order.
- The eigenvectors have the same dimensionality as a single data point.
- **What does SVD have to do with Dimensionality Reduction?**
    - How does PCA help in dimensionality reduction?
    - If we reduce the number of dimensions from k to q (q < k),
    - the number of column vectors of U that we keep is reduced to q, ie, the data now lives in a q-dimensional hyper-plane inside a k-dimensional world. A sketch follows below.
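
A hedged sketch of PCA via SVD on the covariance matrix, mirroring the bullets above; the function name and array shapes are illustrative assumptions.

```python
# PCA via SVD: columns of U are the eigenvectors of S, D holds the eigenvalues.
import numpy as np

def pca_svd(X, q):
    X = X - X.mean(axis=0)              # centre the data
    S = (X.T @ X) / X.shape[0]          # covariance matrix
    U, D, Vt = np.linalg.svd(S)         # S = U D V^T; columns of U are eigenvectors of S
    components = U[:, :q]               # keep the q leading directions
    eigenvalues = D[:q]                 # eigenvalues, already in descending order
    return X @ components, components, eigenvalues
```
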

>[!notes] Intuition behind PCA using SVD Dimensionality Reduction
>When we reduce the dimensions from k to q (q < k), the points now lie in a q-dimensional hyper-plane inside a k-dimensional world. They can be stored as q-dimensional data points along with the eigenvectors, while the discarded eigenvalues indicate how much information we have lost.
>
>![[Pasted image 20240128142944.png]]

### Image recognition example

![[Pasted image 20240128143200.png]]

![[Pasted image 20240128143210.png]]

![[Pasted image 20240128143218.png]]
![[Pasted image 20240128143241.png]]
@@ -0,0 +1,34 @@
---
title: PolyGrad
tags:
- paper-review
enableToc: false
---
## PolyGrad

The paper ["World Models via Policy-Guided Trajectory Diffusion"](https://arxiv.org/abs/2312.08533) introduces a novel world-modelling approach, "Policy-Guided Trajectory Diffusion" (PolyGrad), that is not autoregressive and generates entire on-policy trajectories in a single pass through a diffusion model.

> [!info] Drawback of Autoregressive World Models
> Prediction error inevitably compounds as the trajectory length grows, because such models interleave predicting the next state with sampling the next action from the policy.

>[!question] Examples of On-policy and Off-Policy RL algorithms?
> SARSA and Q-Learning respectively.
> [[./on-policy-Vs-off-policy|On-Policy vs Off-Policy RL]]
### Model
- TBA
### Method
- TBA

### Techniques
- TBA
## Generalisability:

* TBA
## Limitations:

* TBA

## Extended Research Direction:

* TBA
@@ -0,0 +1,31 @@
---
title: On-policy vs Off-policy RL
tags:
- sapling
enableToc: false
---
### Introduction
- Reinforcement Learning is learning how to map situations to actions so as to maximise a numerical reward signal.
- A policy is a function that maps from states to actions.
- There are two types of policies:
    - **Target policy**: The policy being optimised for decision making.
    - **Behaviour policy**: The policy used to take actions in, ie, navigate, the environment.
- Off-policy RL algorithms have different behaviour and target policies; they can decouple data collection from training.
- On-policy RL algorithms have the same behaviour and target policy: the agent takes actions and learns using the same policy.

### Q-Learning is an Off-Policy RL Algorithm

- Say that the agent is randomly choosing actions to execute in the environment, ie, the behaviour policy is random.
- We get the Q value for (S, right) using the Bellman equation $$Q(S, \text{right}) = R + \gamma \max_a Q(S', a)$$
- Note that in the above equation we do not actually take the action a; it is selected based on our target policy but never executed.
- For most off-policy algorithms,
    - the target policy is greedy.
    - the behaviour policy can be random, $\epsilon$-greedy or greedy.
- This Q(S, right) target, observed on taking the action, will be used to update our estimate using the TD method.
- Data collection and learning of the target policy can thus be decoupled, so Q-Learning is an off-policy RL algorithm. The two update rules are sketched below.
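
To make the contrast concrete, here is a hedged sketch of the two tabular update rules; it assumes a discrete state/action space stored in a NumPy Q-table, with illustrative values for the step size and discount factor.

```python
# Q-learning (off-policy) vs SARSA (on-policy) tabular updates.
import numpy as np

alpha, gamma = 0.1, 0.99  # assumed step size and discount factor

def q_learning_update(Q, s, a, r, s_next):
    # Target uses the greedy (target-policy) action at s_next,
    # regardless of which action the behaviour policy actually executes: off-policy.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next):
    # Target uses a_next, the action actually taken by the behaviour policy: on-policy.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
```
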

>[!info] Q-Learning (off-policy TD control) for estimating $\pi \approx \pi_*$
>![[Pasted image 20240109185933.png]]

>[!info] Sarsa (on-policy TD control) for estimating $Q \approx q_*$
>![[Pasted image 20240109190020.png]]