KDTT. First Draft. #277

Closed
wants to merge 4 commits into from

Conversation

seanvenadas
Contributor

KDTT. First Draft.

@jeshraghian
Owner

The code is functional. Some feedback:

  • We'll need some brief, easy-to-follow explanations of most sections. Assume the reader does not know what KD is.
  • Explain the purpose of distinguishing a temp student model and a final student model. It looks to me as though the temp student is for a baseline, and the final student uses KD. Describe this using math in markdown.
  • The difference between "Teacher Sinusoid" and "Teacher Model Prediction" is unclear, and the same goes for the corresponding student cases.
  • The problem statement is unclear. State it in the form "Given X, do Y."
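
One possible way to write the baseline/KD distinction in markdown math is sketched here. It assumes a regression setup trained with mean squared error; the symbols $s_\theta$ (student), $t$ (frozen teacher), $y_i$ (ground-truth target), and $\alpha$ (mixing weight) are illustrative and not taken from the notebook.

$$
\mathcal{L}_{\text{baseline}}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\bigl(s_\theta(x_i) - y_i\bigr)^2
$$

$$
\mathcal{L}_{\text{KD}}(\theta) = \alpha\,\frac{1}{N}\sum_{i=1}^{N}\bigl(s_\theta(x_i) - y_i\bigr)^2 + (1-\alpha)\,\frac{1}{N}\sum_{i=1}^{N}\bigl(s_\theta(x_i) - t(x_i)\bigr)^2
$$

Under this formulation, the temp student would minimize $\mathcal{L}_{\text{baseline}}$ and the final student would minimize $\mathcal{L}_{\text{KD}}$. Setting $\alpha = 1$ recovers the baseline, which makes the comparison between the two students explicit.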

Performance issues:

  • It seems like the teacher performs worse than the student, which is causing the student + KD to fail. An alternative is to compress all your models to 2-3 layers (same depth), but give the student few neurons in the hidden layer (256?) and the teacher far more (1024?). I'm not sure what these numbers should be, but the large depth might be causing the teacher to underperform.
  • If KD doesn't work in the end, pivot this tutorial to prediction only. Define the problem clearly.
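
As a rough illustration of the same-depth, different-width suggestion, the sketch below counts the trainable parameters of two fully connected networks. The layer layout (one scalar input, two hidden layers, one scalar output) is an assumption, the widths 256 and 1024 are the tentative numbers from the comment above, and `mlp_param_count` is a hypothetical helper, not something from the notebook. Spiking neuron layers may add a few parameters of their own that are not counted here.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input and output included.
    """
    return sum(
        n_in * n_out + n_out  # weight matrix plus bias vector per layer
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Assumed layouts: same depth, different hidden width.
student_sizes = [1, 256, 256, 1]
teacher_sizes = [1, 1024, 1024, 1]

student_params = mlp_param_count(student_sizes)
teacher_params = mlp_param_count(teacher_sizes)

print(f"student: {student_params} parameters")
print(f"teacher: {teacher_params} parameters")
print(f"teacher/student ratio: {teacher_params / student_params:.1f}")
```

With these assumed sizes the teacher has roughly sixteen times the capacity of the student at identical depth, so any gap in accuracy can be attributed to width rather than to optimization difficulties from extra layers.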

The submission instructions are gone, and I cannot access the Canvas page for this class.

2 participants