
Commit

Quartz sync: Dec 26, 2023, 10:40 AM
ps4vs committed Dec 26, 2023
1 parent cb3575c commit 4381a54
Showing 7 changed files with 70 additions and 6 deletions.
Empty file added Untitled.md
Empty file.
37 changes: 37 additions & 0 deletions content/notes/RL Alpha.md
@@ -0,0 +1,37 @@
---
title: "Reinforcement Learning: Alpha"
tags:
- seed
enableToc: false
---
### Introduction
Reinforcement Learning (RL) is a computational approach uniquely focused on goal-directed agents that learn by interacting with an uncertain environment, i.e., a direction that points more towards AGI. It is **"learning from interaction"**, featuring "trial and error" and "delayed reward".

The four main subelements of RL are the Policy, Reward Signal, Value Function, and Model (a minimal sketch of how they fit together follows this list):
- **Policy**: The agent's strategy of action at a given time.
- **Reward Signal**: An indicator of immediate success.
- **Value Function**: A measure of long-term success.
- **Model**: The agent's internal representation of the environment, aiding in planning and predicting future outcomes.
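
A minimal sketch of the standard agent-environment loop with these four subelements made explicit; the `env` interface, the `TabularAgent` class, and the parameter names are illustrative assumptions, not taken from any particular library or from my notebook:

```python
import random
from collections import defaultdict

class TabularAgent:
    """Toy agent with the four RL subelements made explicit (illustrative only)."""

    def __init__(self, model, epsilon=0.1, alpha=0.1):
        self.model = model                 # model: predicts which state an action leads to
        self.values = defaultdict(float)   # value function: long-term success per state
        self.epsilon = epsilon             # exploration rate used by the policy
        self.alpha = alpha                 # step size for value updates

    def policy(self, state, actions):
        """Policy: mostly greedy w.r.t. predicted next-state values, occasionally random."""
        if random.random() < self.epsilon:
            return random.choice(actions)  # trial and error
        return max(actions, key=lambda a: self.values[self.model(state, a)])

    def td_update(self, state, next_state, reward=0.0):
        """Move V(state) toward the immediate reward plus the next state's value."""
        target = reward + self.values[next_state]
        self.values[state] += self.alpha * (target - self.values[state])

def run_episode(env, agent):
    """One episode of the interaction loop: act, observe the reward signal, update values."""
    state, done = env.reset(), False
    while not done:
        action = agent.policy(state, env.legal_actions(state))
        next_state, reward, done = env.step(action)  # reward signal: immediate success
        agent.td_update(state, next_state, reward)   # delayed reward folded into the values
        state = next_state
```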
### **Flip!!!**
According to [Reinforcement Learning, Fast and Slow](https://www.cell.com/action/showPdf?pii=S1364-6613%2819%2930061-0), Machine Learning methods are ***sample inefficient*** for the following reasons:
1. **Incremental parameter adjustment**
    - The adjustments made during learning must be small in order to maximize generalization and avoid overwriting the effects of earlier learning.
2. **Weak inductive bias**
    - According to learning theory, every learning procedure necessarily faces a bias-variance tradeoff.
    - Generic neural networks are extremely low-bias learning systems, i.e., they can master a wider range of patterns (higher variance) but are in general less sample-efficient.
Furthermore, RL's ***sample inefficiency*** is exacerbated as the agent must self-determine labels (i.e., actions leading to higher rewards) through environmental interaction, unlike in supervised learning where labeled data is readily available.

### Tic-Tac-Toe Case Study
In a practical application, I trained an RL agent to play Tic-Tac-Toe using the Temporal Difference (TD) method; the code is in [this notebook](https://github.com/ps4vs/Deep-RL/blob/main/Chapter-1/TicTacToe.ipynb).
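
As a rough illustration of the update rule: the 0.5 initialization, the win-probability interpretation of the values, and the step size below are assumptions in the style of Sutton and Barto's Chapter 1 example, not necessarily exactly what the notebook does:

```python
from collections import defaultdict

# V(s): estimated probability of winning from board state s.
# Non-terminal states start at 0.5; terminal states are assumed to be set
# directly (1.0 for a win, 0.0 for a loss or draw), so no explicit reward term is needed.
values = defaultdict(lambda: 0.5)
ALPHA = 0.1  # step size

def td_backup(state, next_state):
    """TD(0) backup after a greedy move: V(s) <- V(s) + alpha * (V(s') - V(s))."""
    values[state] += ALPHA * (values[next_state] - values[state])
```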

Key insights include:
1. **Adaptive Strategy**: The agent learns to counter repetitive strategies, such as blocking repeated moves.
2. **Learning and Unlearning**: The agent's learning is dynamic; what it has learned can be reinforced or undone depending on whether the temporal-difference updates are positive or negative.
3. **Challenge with Random Play**: Learning efficiency drops significantly against a randomly moving opponent due to limited informative feedback.
4. **Learning Through Self-Play**: When the RL agent plays Tic-Tac-Toe against itself, both players (first and second) are the same agent, yet each updates the value function over a different set of states depending on its position in the game (see the sketch below).
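
A rough sketch of that last point, assuming a shared value table where each role only backs up the afterstates created by its own moves; `env`, `agent.policy`, and `agent.td_update` are hypothetical names following the sketches above:

```python
def self_play_episode(env, agent):
    """One self-play game: both sides are the same agent, but each side records and
    backs up only the afterstates produced by its own moves, so the first and second
    player effectively learn values for largely disjoint sets of states."""
    state, done = env.reset(), False
    seen = {"X": [], "O": []}
    player = "X"
    while not done:
        action = agent.policy(state, env.legal_actions(state))
        state, _, done = env.step(action)
        seen[player].append(state)               # each role sees only its own afterstates
        player = "O" if player == "X" else "X"
    for states in seen.values():                 # TD backups stay within each role's states
        for s, s_next in zip(states, states[1:]):
            agent.td_update(s, s_next)
```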






16 changes: 16 additions & 0 deletions content/notes/Reinforcement Learning.md
@@ -0,0 +1,16 @@
---
title: Reinforcement Learning
tags:
- seed
enableToc: false
---
These are some notes/projects I made related to Reinforcement Learning.
## Blogs
- [[RL Alpha |Reinforcement Learning: Alpha]]

## Resources
- [Deep RL Hugging Face](https://huggingface.co/learn/deep-rl-course)
- [Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto](http://incompleteideas.net/book/RLbook2020.pdf)
- [Foundations of Deep RL Series, by Pieter Abbeel](https://youtu.be/Psrhxy88zww)


@@ -1,5 +1,5 @@
---
title: anySTOCKnow
title: any.STOCK.now
tags:
- seed
enableToc: false
17 changes: 14 additions & 3 deletions content/notes/epitomes.md
@@ -4,20 +4,31 @@ tags:
- seed
---
Some of the notes, blogs, and projects that I look up to.
### AI maniacs
- [neural-link project foundation](https://projectfoundation.notion.site/projectfoundation/Project-69-cd70c0c75c00430fb2441222f579eacc)
- https://ayushtues.github.io/
### Blogs
- [lilianweng](https://lilianweng.github.io/)

- https://ayushtues.medium.com/
### Notes
- [ishan.coffee](https://www.ishan.coffee/)
- [param.codes](https://notes.param.codes/)
- [divyanshgarg](https://divyanshgarg.com/)

Talks
### Talks
- [Transformers United](https://www.youtube.com/watch?v=ylEk1TE1uBo)

SE
### SE
- https://medium.com/@narenarya

### Websites
- https://zhuanzhi.ai/topic/2001320766352755
- https://www.zhihu.com/people/dilab-46
- https://note.com/npaka

### Repositories
- [OpenDILab - RL, MultiModality, AGI, Diffusion](https://github.com/opendilab)




2 changes: 1 addition & 1 deletion content/notes/mymains.md
@@ -4,7 +4,7 @@ tags:
- evergreen
---


[neurallink notes](https://projectfoundation.notion.site/projectfoundation/Project-69-cd70c0c75c00430fb2441222f579eacc)



2 changes: 1 addition & 1 deletion content/notes/web development.md
@@ -6,4 +6,4 @@ enableToc: false
---
These are some notes/projects I made while hacking around web development asynchronously and on an as-needed basis.

[[anySTOCKnow| Blog on node.js, express.js and express-handlebars.js ]]
[[any.STOCK.now| Blog on node.js, express.js and express-handlebars.js ]]
