Skip to content

Latest commit

 

History

History
33 lines (24 loc) · 1.87 KB

README.md

File metadata and controls

33 lines (24 loc) · 1.87 KB

This code is an implementation of a Decision Tree classifier from scratch using Python. Here's what each part of the code does:

  1. Entropy Calculation (entropy function):

    • Calculates the entropy of a given set of labels (y) using the formula for entropy in information theory.
  2. Node Class:

    • Represents nodes in the decision tree.
    • Stores information about each node, including the feature to split on, threshold value, child nodes (left and right), and value (if the node is a leaf node).
  3. DecisionTree Class:

    • Initializes and builds the decision tree.
    • Contains methods to fit the tree to the training data (fit), predict labels for new data (predict), and grow the tree recursively (_grow_tree).
  4. fit Method:

    • Grows the tree from the root node using the _grow_tree method.
  5. _grow_tree Method:

    • Recursively builds the decision tree based on conditions like maximum depth, minimum samples to split a node, and number of unique labels.
    • Conducts a greedy search to find the best feature and threshold to split on, creating child nodes accordingly.
  6. Other Helper Methods:

    • _best_criteria: Determines the best feature and threshold to split on based on information gain.
    • _information_gain: Calculates the information gain using entropy.
    • _split: Splits the data based on a threshold value.
    • _most_common_label: Finds the most common label in a given set.
  7. predict Method:

    • Uses the trained decision tree to predict labels for new data by traversing the tree.
  8. _traverse_tree Method:

    • Traverses the tree recursively to predict the label for a single data point.

This implementation constructs a decision tree by recursively choosing splits that maximize information gain, using entropy as a measure of impurity. The tree is then used for prediction based on the learned structure.