mlgrp43log.txt
<ujdcodr> : Let us begin
<ujdcodr> : So does everyone remember that creepy looking sigmoid function? We're going to talk more about it
<ujdcodr> : 1/(1+e^-x)
<ujdcodr> : and the cost function is rewritten as y*(- log ( hyp )) + (1-y) * (- log ( 1 - hyp ))
<ujdcodr> : no doubts about this right?
<ujdcodr> : when y=1 we get the first half, when y=0 we get the second half of the cost function
<ujdcodr> : where hyp is the sigmoid function
<ujdcodr> : http://www.holehouse.org/mlclass/12_Support_Vector_Machines_files/Image%20[5].png
<ujdcodr> : take a look at the image
<ujdcodr> : it's just a plot when y=1 and hyp is substitued with the sigmoid function
<ujdcodr> : *substituted
<ujdcodr> : http://www.holehouse.org/mlclass/12_Support_Vector_Machines_files/Image%20[6].png
<ujdcodr> : and this is when y=0
<ujdcodr> : has everyone seen the image?
<Swastik> : yeah
<ujdcodr> : because we are going to define a new cost function
<ujdcodr> : a small modification
<ujdcodr> : http://www.holehouse.org/mlclass/12_Support_Vector_Machines_files/Image%20[7].png
<ujdcodr> : http://www.holehouse.org/mlclass/12_Support_Vector_Machines_files/Image%20[8].png
<ujdcodr> : notice the region of the curve that has been thresholded
<ujdcodr> : in the first image anything beyond 1 is directly set to 0
<ujdcodr> : and in the second image anything before -1 is set to 0
<ujdcodr> : and then we have a straight line
<ujdcodr> : how is all this helpful?
<ujdcodr> : it gives the SVM a computational advantage and an easier optimization problem
<ujdcodr> : since the values are clearer and more discrete, the algorithm will have an easier time computing the "right" result
<ujdcodr> : everyone clear?
<ujdcodr> : with the original and modified cost functions?
<Swastik> : yeah
<dharma> : yes
<VS> : Yes
<ujdcodr> : questions? anyone? This has a lot of math internally, but we're not going to go there
<ujdcodr> : just understand that this modification reduces computational complexity
<ujdcodr> : So what I defined here is an SVM cost function
<Swastik> : How is it better than the original one?
<ujdcodr> : it works like a threshold
<Swastik> : okay
<ujdcodr> : there won't be values confused between 0 and 1
<ujdcodr> : you will get either 0 or whatever lies on the slope of that straight line
<Swastik> : Got it
<ujdcodr> : which is much better than dealing with constantly varying values on a logarithmic curve right?
<Swastik> : yeah
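
A minimal sketch of the two cost shapes being compared above, assuming NumPy and the standard hinge form max(0, 1 - z) for the modified cost; the library and the exact slope of the straight-line piece are assumptions, not something shown in the session:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # original logistic cost for one example, y in {0, 1}, hyp = sigmoid(z)
    def logistic_cost(z, y):
        hyp = sigmoid(z)
        return y * (-np.log(hyp)) + (1 - y) * (-np.log(1 - hyp))

    # SVM-style replacements: a straight line that is thresholded to exactly 0
    def svm_cost_1(z):   # used when y = 1, zero for z >= 1
        return np.maximum(0.0, 1.0 - z)

    def svm_cost_0(z):   # used when y = 0, zero for z <= -1
        return np.maximum(0.0, 1.0 + z)

    z = np.linspace(-3.0, 3.0, 7)
    print(np.round(logistic_cost(z, 1), 3))  # smooth curve, never exactly 0
    print(svm_cost_1(z))                     # [4. 3. 2. 1. 0. 0. 0.]
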
<ujdcodr> : Moving on
<ujdcodr> : What exactly is an SVM?
<ujdcodr> : for that I'll have to tell you what a support vector is
<ujdcodr> : https://www.analyticsvidhya.com/wp-content/uploads/2015/10/SVM_1.png
<ujdcodr> : take a look
<ujdcodr> : Support Vectors are simply the co-ordinates of individual observations.
<ujdcodr> : they are just your "yes" and "no" examples that are graphed
<ujdcodr> : That line that divides the 2 clusters is the "Machine"
<ujdcodr> : "Support Vector Machine" (SVM) is a supervised machine learning algorithm which can be used for both classification and regression challenges. However, it is mostly used for classification problems.
<ujdcodr> : In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate
<ujdcodr> : Then, we perform classification by finding the hyper-plane that differentiates the two classes very well
<ujdcodr> : From a bird's eye view... that's all there is to SVMs
<ujdcodr> : A simple concept, but a lot of mathematical rigor
<ujdcodr> : We're not done yet
<ujdcodr> : I will give you a small idea of the inner workings
<ujdcodr> : But first let's get the "hyper plane" idea right
<ujdcodr> : https://www.analyticsvidhya.com/wp-content/uploads/2015/10/SVM_21.png
<ujdcodr> : which of A, B, C is the best hyperplane?
<Swastik> : B?
<ujdcodr> : Correct.... bottom line of SVMs: "Select the hyper-plane which segregates the two classes better"
<ujdcodr> : everyone clear with that?
<Swastik> : yes
<Ravali> : yeah
<Vikram_> : Yes
<dharma> : yes
<ujdcodr> : the next one is straightforward but has a small concept behind it
<ujdcodr> : https://www.analyticsvidhya.com/wp-content/uploads/2015/10/SVM_3.png
<ujdcodr> : what about this one?
<dharma> : all
<Swastik> : C?
<ujdcodr> : @dharma yes... but which is the best one?
<ujdcodr> : remember what SVM looks for
<dharma> : then C
<VS> : B
<ujdcodr> : an SVM is also known as a "Large Margin Classifier"
<ujdcodr> : Here, maximizing the distance between the nearest data point (of either class) and the hyper-plane helps us decide the right hyper-plane. This distance is called the Margin
<ujdcodr> : you want to choose the line that's the farthest from the nearest support vectors of both classes
<ujdcodr> : Hence you're trying to maximize the margin
<ujdcodr> : Another compelling reason for selecting the hyper-plane with the higher margin is robustness. If we select a hyper-plane having a low margin, then there is a high chance of misclassification.
<ujdcodr> : So if you chose A, a new data entry that might actually belong to the red circles might get classified as a blue star
<ujdcodr> : hence "misclassified"
<ujdcodr> : https://www.analyticsvidhya.com/wp-content/uploads/2015/10/SVM_5.png
<ujdcodr> : how about this one?
<Swastik> : B?
<ujdcodr> : anyone else?
<Vikram_> : B
<VS> : B
<dharma> : B
<ujdcodr> : So what was your logic?
<ujdcodr> : everyone.. just type a one-line explanation
<ujdcodr> : keep it simple
<dharma> : high margin
<Swastik> : ignoring the one misclassified value, the margin was high
<ujdcodr> : "ignoring the one misclassified value", I didn't ask you to ignore it. Why did you?
<ujdcodr> : XD
<ujdcodr> : here is the catch, SVM selects the hyper-plane which classifies the classes accurately before maximizing the margin
<Vikram_> : If we choose A, probability of misclassification of red will be high
<ujdcodr> : SVM isn't worried about that initially
<dharma> : okay
<ujdcodr> : every algorithm has steps right? If-else conditions
<ujdcodr> : maximizing the margin is not a priority at this point in time
<ujdcodr> : we need to classify
<ujdcodr> : and that's exactly what SVM does
<ujdcodr> : the fact that there is a blue star over there in the distance means that any other example around that region has a high chance of being a blue star
<ujdcodr> : makes sense everyone?
<ujdcodr> : 1. Classify
<ujdcodr> : 2. Maximize margin
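
A minimal sketch of that two-step idea on toy data, assuming scikit-learn's sklearn.svm.SVC with a linear kernel (the session itself used no code, and the points below are invented for illustration):

    import numpy as np
    from sklearn.svm import SVC

    # two small clusters: class 0 (red circles) and class 1 (blue stars)
    X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                  [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])
    y = np.array([0, 0, 0, 1, 1, 1])

    # a large C says: classify every training point correctly first,
    # then make the margin as wide as possible
    clf = SVC(kernel="linear", C=1000.0).fit(X, y)

    print(clf.support_vectors_)             # the few points that pin down the hyperplane
    print(clf.coef_, clf.intercept_)        # w and b of the separating line w.x + b = 0
    print(2.0 / np.linalg.norm(clf.coef_))  # the margin width the SVM maximized
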
<ujdcodr> : Therefore, the right hyper-plane is A
<ujdcodr> : clear?
<dharma> : yes
<Swastik> : yeah
<Vikram_> : Yes
<ujdcodr> : good
<Swastik> : Well, doesn't it tend to overfit?
<ujdcodr> : Better than underfitting, isn't it?
<ujdcodr> : At the end of the day in Machine Learning, everything is a probability right?
<Swastik> : yeah
<Swastik> : Would choosing B be underfitting the data?
<ujdcodr> : So if the algo gives the correct answer at least 95% of the time, might as well take it
<Swastik> : okay
<ujdcodr> : who knows, if you chose B as the hyper plane the next 10 unlabeled examples might happen to actually be blue stars
<ujdcodr> : but you classified them under red circles? It's just probability, can't help it
<Swastik> : Yeah agreed
<ujdcodr> : https://www.analyticsvidhya.com/wp-content/uploads/2015/10/SVM_61.png
<ujdcodr> : so what do we say about this one?
<ujdcodr> : Is there a linear hyperplane for this?
<Swastik> : No
<ujdcodr> : This happens to be the third case of SVM
<ujdcodr> : that blue star in the distance can be marked as an "outlier"
<ujdcodr> : SVM has a feature to ignore outliers and find the hyper-plane that has the maximum margin. Hence, we can say, SVM is robust to outliers
<ujdcodr> : So literally you can ignore these "random cases"
<ujdcodr> : think of the cancer patient example
<ujdcodr> : with the parameter as tumor size
<ujdcodr> : any cancer with a tumor that large has to be harmful (malignant)
<ujdcodr> : but there are stray cases (people call these miracles) where they are harmless (benign)
<ujdcodr> : so while the SVM skims through the data set, if there are a few points that prevent it from getting a linear hyperplane, then it reserves the right to "ignore" those points
<ujdcodr> : So the basic steps of SVM are as follows:
<ujdcodr> : 1. Try to find a line that divides the classes
<ujdcodr> : 2. Try to maximize the margin between the 2 classes
<ujdcodr> : 3. If a few examples prevent you from doing so, ignore them
<ujdcodr> : Is everyone clear with the SVM algorithm?
<Swastik> : yes
<Ravali> : yes
<VS> : Yes
<Vikram_> : Yes
<ujdcodr> : https://www.analyticsvidhya.com/wp-content/uploads/2015/10/SVM_71.png
<ujdcodr> : that's the hyperplane after ignoring
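
A minimal sketch of how that "ignore the stray example" step looks in practice, assuming scikit-learn's soft-margin SVC; the C parameter below is scikit-learn's regularization knob and was not named in the session (a small C favours a wide margin over classifying every single point):

    import numpy as np
    from sklearn.svm import SVC

    # two clean clusters plus one stray blue star sitting close to the red ones
    X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [1.2, 1.8],
                  [5.0, 5.0], [5.5, 6.0], [6.0, 5.5],
                  [3.0, 3.0]])                 # the stray star
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

    def margin_width(clf):
        return 2.0 / np.linalg.norm(clf.coef_)

    # large C: insist on classifying every point, so the margin gets squeezed
    hard = SVC(kernel="linear", C=1000.0).fit(X, y)
    # small C: prefer a wide margin and let the stray star violate it
    soft = SVC(kernel="linear", C=0.1).fit(X, y)

    print(margin_width(hard))  # narrow: the boundary is pinned against the stray star
    print(margin_width(soft))  # noticeably wider: the stray star is effectively ignored
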
<ujdcodr> : So, are you ready to have your minds blown away?
<ujdcodr> : find the hyperplane for this
<ujdcodr> : https://www.analyticsvidhya.com/wp-content/uploads/2015/10/SVM_8.png
<ujdcodr> : so what do you think?
<ujdcodr> : any ideas?
<Vikram_> : Two concentric circles
<Swastik> : maybe 4 planes forming a diamond shape enclosing the red ones?
<Vikram_> : Otherwise maybe the x axis
<ujdcodr> : why 2 concentric circles? we just want to separate the two classes
<VS> : X axis
<ujdcodr> : the x axis will lead to a lot of misclassification my friend
<ujdcodr> : let's stick to circles
<Vikram_> : Only one circle then, but it will be overfitting
<Swastik> : Just one circle enclosing the red ones?
<ujdcodr> : Yes @swastik, you are right!
<ujdcodr> : think about it
<ujdcodr> : all the red circles are in the centre
<ujdcodr> : all blue stars otside it
<ujdcodr> : *outside
<ujdcodr> : so if I draw just one circle around the red circles region, I can be assured that anything outside "will" be a blue star
<ujdcodr> : makes sense?
<Swastik> : yes
<Vikram_> : Yes
<Vikram_> : Overfitting not an issue?
<ujdcodr> : Hence the hyper plane has been found
<ujdcodr> : Give an example of where we might overfit?
<ujdcodr> : okay it's not a "perfect" circle
<Vikram_> : Margin will be less, the next data point may cross that circle
<ujdcodr> : it's oval.. but you get the idea
<ujdcodr> : SVM will find the largest margin
<ujdcodr> : don't worry about that
<Vikram_> : Okay
<ujdcodr> : even if that margin seems very small
<ujdcodr> : So problem solved?
<ujdcodr> : Ask yourself, remember the definition of what an SVM does
<ujdcodr> : and think
<ujdcodr> : did the SVM find the hyper plane
<ujdcodr> : or did you find it?
<ujdcodr> : "you" being a human who is able to see "non-linear" relationships in data
<ujdcodr> : So if you recollect, SVM only fits "linear hyperplanes" through data
<Swastik> : Yeah right!
<ujdcodr> : but "we" as humans see a circle
<ujdcodr> : so how do we make a dumb computer see this relation
<Swastik> : As an infinite number of planes?
<ujdcodr> : Everyone remembers that nightmare called "Engineering graphics"??
<Vikram_> : No
<ujdcodr> : Front view, side view?
<ujdcodr> : top view
<Swastik> : Yes
<VS> : Yes
<ujdcodr> : Vikram_ are you 1st year?
<ujdcodr> : XD
<Vikram_> : No
<Vikram_> : Worst nightmare
<ujdcodr> : haha
<ujdcodr> : anyways
<Vikram_> : So better to forget!
<ujdcodr> : we're gonna give the algorithm a new perspective of the dataset
<ujdcodr> : Here, we will add a new feature z=x^2+y^2
<ujdcodr> : on plotting z vs x we get
<ujdcodr> : https://www.analyticsvidhya.com/wp-content/uploads/2015/10/SVM_9.png
<ujdcodr> : now tell me, can SVM find a hyperplane here?
<Vikram_> : Yes
<Swastik> : Yes
<Ravali> : yes
<ujdcodr> : Well it's just a mathematical transform
<ujdcodr> : points closer to the centre appear lower on the z-axis
<ujdcodr> : i.e. red circles
<ujdcodr> : those farther away (blue stars) are higher on the z-axis
<ujdcodr> : beautiful isn't it
<ujdcodr> : All values of z will always be positive because z is the sum of the squares of x and y
<Vikram_> : Paraboloid equation
<ujdcodr> : so the SVM finds the linear hyper plane
<ujdcodr> : which will be of the form z + x = constant
<ujdcodr> : replacing z with x^2+y^2
<ujdcodr> : we get the perfect non-linear hyperplane
<ujdcodr> : via SVM
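
A minimal sketch of doing that z = x^2 + y^2 lift by hand, assuming NumPy and scikit-learn; the ring-shaped toy data is generated here and is not the data behind SVM_8.png:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)

    # red circles: points near the origin; blue stars: a ring around them
    theta = rng.uniform(0.0, 2 * np.pi, 100)
    r = np.concatenate([rng.uniform(0.0, 1.0, 50), rng.uniform(2.0, 3.0, 50)])
    X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    y = np.concatenate([np.zeros(50), np.ones(50)])

    # a linear hyperplane straight through (x, y) cannot separate a disc from a ring
    linear_xy = SVC(kernel="linear").fit(X, y)
    print(linear_xy.score(X, y))              # stuck well below 1.0

    # add the new feature z = x^2 + y^2 and the same linear SVM separates perfectly
    z = (X ** 2).sum(axis=1).reshape(-1, 1)
    X_lifted = np.hstack([X, z])
    linear_xyz = SVC(kernel="linear").fit(X_lifted, y)
    print(linear_xyz.score(X_lifted, y))      # 1.0: linearly separable in (x, y, z)
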
<ujdcodr> : But, another burning question which arises is, do we need to add this feature manually to get a hyper-plane?
<ujdcodr> : No, SVM has a technique called the kernel trick.
<ujdcodr> : These are functions which take a low dimensional input space and transform it into a higher dimensional space, i.e. they convert a non-separable problem into a separable problem. These functions are called kernels
<Vikram_> : It means that form a linear function z=fun(x,y) which fit data anf feed it to SVM?
<Vikram_> : and*
<ujdcodr> : Simply put, it does some extremely complex data transformations, then finds out how to separate the data based on the labels or outputs you've defined
<ujdcodr> : If a linear model can't be found, use a transform and then repeat the procedure in a different coordinate system
<ujdcodr> : And that my friends is how we use SVMs for classification problems and kernels to aid us in finding hyperplanes in non-linear cases
<ujdcodr> : Yes, but SVM does experience the overfitting problem
<ujdcodr> : sometimes
<ujdcodr> : when your data set gets too complicated
<ujdcodr> : https://qph.ec.quoracdn.net/main-qimg-671ee837d3eb7d69856aeaf417c12436?convert_to_webp=true
<ujdcodr> : but you can do some really great stuff
<Swastik> : Damn
<ujdcodr> : like this https://qph.ec.quoracdn.net/main-qimg-44a86cdf39604dbcf24f9130470e774f?convert_to_webp=true
<Swastik> : It will always tend to fit all the points?
<ujdcodr> : Which is why you can afford to ignore the stray examples
<ujdcodr> : Yep
<Vikram_> : What was the 1st one?
<Vikram_> : Such function exit?
<Vikram_> : Exist*
<Swastik> : It didn't ignore the stray ones in the first one right?
<ujdcodr> : Yes
<ujdcodr> : Any and every function you can think of exists and can be plotted
<ujdcodr> : SVM is very powerful
<ujdcodr> : I just gave the basic intuition, but if you're feeling adventurous, you know where to go
<ujdcodr> : And with that we conclude our session on SVMs
<ujdcodr> : Thank you all for coming
<Swastik> : Thanks a lot. It gave a great insight into SVMs :)
<VS> : Thanks
<Vikram_> : Thanks
<Ravali> : Thank you
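
A closing sketch of the kernel trick described above, assuming scikit-learn's SVC with kernel='rbf'; it handles the same disc-and-ring layout as the previous sketch without a hand-made z feature (the session itself used no code, so the library and parameters here are assumptions):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)

    # red circles near the origin, blue stars on a ring around them
    theta = rng.uniform(0.0, 2 * np.pi, 100)
    r = np.concatenate([rng.uniform(0.0, 1.0, 50), rng.uniform(2.0, 3.0, 50)])
    X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    y = np.concatenate([np.zeros(50), np.ones(50)])

    # the RBF kernel does the lift to a higher-dimensional space implicitly,
    # so no hand-made z = x^2 + y^2 column is needed
    rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
    print(rbf.score(X, y))   # expect a (near-)perfect score on this well-separated data
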