Here's a list of commonly used machine learning models in scikit-learn, grouped by type, with the syntax and a short description for each. Mathematical intuition is included where applicable. The syntax snippets assume that X_train and y_train are already defined. Feel free to print these for reference!


1. Linear Models

These models assume a linear relationship between the features and the target variable.

1.1 Linear Regression

Syntax:

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)

Description: Predicts a continuous target variable by fitting a linear relationship between input features and the target.

Equation: [ y = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n ]
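
As a quick check of how the equation maps onto the fitted model, here is a minimal sketch on synthetic data. The data-generating values (1, 2, 3) and the variable names are assumptions made up for illustration; after fitting, intercept_ plays the role of w_0 and coef_ holds w_1 ... w_n.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free synthetic data generated from y = 1 + 2*x1 + 3*x2 (illustrative values).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

model = LinearRegression()
model.fit(X, y)

# intercept_ corresponds to w_0, coef_ to (w_1, ..., w_n).
print(model.intercept_, model.coef_)
```

Because the data has no noise, the fitted weights should match the generating values up to floating-point error.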


1.2 Logistic Regression

Syntax:

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)

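Description: Used for binary or multi-class classification. In the binary case it models the log-odds of the positive class as a linear function of the features.

Equation: [ \log \frac{p}{1 - p} = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n, \quad p = P(y = 1 \mid x) ]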
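
The log-odds view can be checked numerically. The sketch below assumes a binary problem with default settings and a synthetic dataset from make_classification (an illustrative choice): passing the output of decision_function through the sigmoid should reproduce predict_proba for the positive class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data, for illustration only.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

model = LogisticRegression()
model.fit(X, y)

# For a binary problem, decision_function returns the log-odds of the positive class.
log_odds = model.decision_function(X)

# The sigmoid turns log-odds into a probability.
proba_manual = 1.0 / (1.0 + np.exp(-log_odds))
proba_sklearn = model.predict_proba(X)[:, 1]

print(np.allclose(proba_manual, proba_sklearn))
```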


2. Decision Tree Models

2.1 Decision Tree Classifier

Syntax:

from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

Description: A tree-based algorithm where decisions are made by splitting the dataset into subsets based on feature values.

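Equation: No parametric equation; each split is chosen to minimize impurity, e.g. Gini [ G = 1 - \sum_{k} p_k^2 ] or entropy [ H = -\sum_{k} p_k \log p_k ], where p_k is the fraction of samples of class k in a node.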
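
To make "impurity" concrete, here is a small pure-Python sketch of the Gini criterion. The helper names gini_impurity and split_impurity are hypothetical and are not part of scikit-learn; the tree implementation in the library is considerably more optimized.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity, 1 - sum_k p_k^2, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted average impurity of the two children of a split."""
    n = len(left) + len(right)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n


print(gini_impurity([0, 0, 1, 1]))      # evenly mixed node
print(split_impurity([0, 0], [1, 1]))   # split that separates the classes
print(split_impurity([0, 1], [0, 1]))   # split that separates nothing
```

A split is good when the weighted impurity of the children is much lower than the impurity of the parent node.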


2.2 Decision Tree Regressor

Syntax:

from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor()
model.fit(X_train, y_train)

Description: Like the classifier, but predicts continuous target values. Splits are chosen to reduce the spread of the target within each node (squared error by default).


3. Ensemble Methods

3.1 Random Forest Classifier

Syntax:

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X_train, y_train)

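Description: Combines many decision trees to improve classification accuracy and reduce overfitting. Each tree is trained on a bootstrap sample of the data and considers a random subset of the features at each split; the forest then combines the predictions of the individual trees.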
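
The "many trees combined" idea can be inspected directly on a fitted forest. This is a minimal sketch on synthetic data; the dataset and n_estimators=25 are arbitrary illustrative choices. It checks that the forest's predicted probabilities are the average of the individual trees' probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

model = RandomForestClassifier(n_estimators=25, random_state=0)
model.fit(X, y)

# The fitted trees are stored in estimators_.
n_trees = len(model.estimators_)

# Average the per-tree class probabilities for the first five samples.
tree_probas = np.array([tree.predict_proba(X[:5]) for tree in model.estimators_])
averaged = tree_probas.mean(axis=0)
forest_proba = model.predict_proba(X[:5])

print(n_trees, np.allclose(averaged, forest_proba))
```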


3.2 Random Forest Regressor

Syntax:

from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor()
model.fit(X_train, y_train)

Description: Similar to Random Forest Classifier, but used for regression tasks.


3.3 Gradient Boosting Classifier

Syntax:

from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier()
model.fit(X_train, y_train)

Description: Builds trees sequentially, where each new tree is fit to the errors (the negative gradient of the loss) of the ensemble built so far.


3.4 Gradient Boosting Regressor

Syntax:

from sklearn.ensemble import GradientBoostingRegressor
model = GradientBoostingRegressor()
model.fit(X_train, y_train)

Description: Same as Gradient Boosting Classifier, but for regression tasks.
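
The sequential idea behind gradient boosting can be sketched by hand for squared-error regression. This is an illustration under simplifying assumptions (fixed learning rate, shallow trees, synthetic sine data), not the implementation used by GradientBoostingRegressor: each round fits a small tree to the current residuals and adds a scaled version of its prediction.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())      # start from a constant model
initial_mse = np.mean((y - prediction) ** 2)

trees = []
for _ in range(50):
    residuals = y - prediction               # negative gradient of 1/2 * squared error
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

final_mse = np.mean((y - prediction) ** 2)
print(initial_mse, final_mse)
```

The training error shrinks as trees are added, which is also why the number of trees and the learning rate are usually tuned together to avoid overfitting.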


4. Support Vector Machines (SVM)

4.1 SVM Classifier

Syntax:

from sklearn.svm import SVC
model = SVC()
model.fit(X_train, y_train)

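Description: Classifies data by finding the hyperplane that separates the classes with the largest margin. Can be extended to non-linear decision boundaries using kernels.

Equation: [ f(x) = \operatorname{sign}(w^\top x + b) ] for a linear kernel, where w is the weight vector and b the intercept, chosen to maximize the margin between the classes.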
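
The effect of kernels is easiest to see on data that no straight line can separate. The sketch below uses make_circles (an illustrative dataset choice) and compares a linear kernel with the RBF kernel on training accuracy only; a real comparison would use held-out data.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original feature space.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.3, random_state=0)

linear_model = SVC(kernel="linear").fit(X, y)
rbf_model = SVC(kernel="rbf").fit(X, y)

linear_acc = linear_model.score(X, y)
rbf_acc = rbf_model.score(X, y)

print(linear_acc, rbf_acc)
```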


4.2 SVM Regressor

Syntax:

from sklearn.svm import SVR
model = SVR()
model.fit(X_train, y_train)

Description: The regression counterpart of SVC. It fits a function that keeps as many training points as possible within a margin (epsilon) of the prediction, and it supports the same kernels.


5. Naive Bayes

5.1 Gaussian Naive Bayes

Syntax:

from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train, y_train)

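Description: Assumes features are conditionally independent and normally distributed within each class, and applies Bayes' theorem for classification.

Equation: [ P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y), \quad P(x_i \mid y) = \mathcal{N}(x_i; \mu_{y,i}, \sigma_{y,i}^2) ]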
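
What the model learns under the normal-distribution assumption can be inspected after fitting. In this minimal sketch (synthetic blobs, chosen for illustration), the per-class feature means stored in theta_ should equal the sample means computed by hand, and class_prior_ holds the estimated class probabilities.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB

# Two synthetic Gaussian blobs, for illustration only.
X, y = make_blobs(n_samples=150, centers=2, n_features=2, random_state=0)

model = GaussianNB()
model.fit(X, y)

# theta_[c] holds the per-feature mean estimated for class c.
class0_mean = X[y == 0].mean(axis=0)
class1_mean = X[y == 1].mean(axis=0)

print(model.theta_)
print(model.class_prior_)
```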


5.2 Multinomial Naive Bayes

Syntax:

from sklearn.naive_bayes import MultinomialNB
model = MultinomialNB()
model.fit(X_train, y_train)

Description: Used for discrete count features, such as word counts in text classification (e.g., spam filtering).
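
For text, the counts usually come from a vectorizer. Here is a minimal sketch that pairs MultinomialNB with CountVectorizer; the four-sentence corpus and its labels are made up for illustration and are far too small for a real spam filter.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus, for illustration only.
train_texts = [
    "win free money now",
    "free prize claim now",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Turn each text into a vector of word counts.
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(train_texts)

model = MultinomialNB()
model.fit(X_counts, train_labels)

# New text must be transformed with the same fitted vectorizer.
prediction = model.predict(vectorizer.transform(["free money prize"]))[0]
print(prediction)
```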


6. K-Nearest Neighbors (KNN)

6.1 KNN Classifier

Syntax:

from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

Description: Classifies data points based on the majority label of the closest data points (neighbors).

Equation: No parametric equation; classification based on distance metrics (e.g., Euclidean).
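
Since there is no equation to show, the rule itself can be written out. This is a pure-Python sketch of majority voting over the k nearest points; the function knn_predict and the toy points are hypothetical, and scikit-learn uses faster neighbor-search structures than this brute-force sort.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (0.3, 0.3)))
print(knn_predict(points, labels, (4.8, 5.0)))
```

Because the prediction depends entirely on distances, features on very different scales are usually standardized before using KNN.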


6.2 KNN Regressor

Syntax:

from sklearn.neighbors import KNeighborsRegressor
model = KNeighborsRegressor(n_neighbors=5)
model.fit(X_train, y_train)

Description: Similar to KNN Classifier, but predicts a continuous value by averaging the targets of the nearest neighbors.


7. Clustering Models

7.1 K-Means Clustering

Syntax:

from sklearn.cluster import KMeans
model = KMeans(n_clusters=3)
model.fit(X_train)

Description: Partitions data into k clusters, where each data point belongs to the cluster with the nearest mean.

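Equation: [ \min_{\mu_1, \dots, \mu_k} \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2 ], where C_j is the set of points assigned to cluster j and \mu_j is its mean.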
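
The "nearest mean" objective is exposed on the fitted model as inertia_. The sketch below, on synthetic blobs chosen for illustration, recomputes the sum of squared distances from each point to its assigned cluster center and compares it with the value reported by scikit-learn. Setting n_init and random_state explicitly is an assumption made here for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three synthetic blobs, for illustration only.
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(X)

# Look up the center each point was assigned to, then sum the squared distances.
assigned_centers = model.cluster_centers_[model.labels_]
manual_inertia = np.sum((X - assigned_centers) ** 2)

print(manual_inertia, model.inertia_)
```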


7.2 DBSCAN

Syntax:

from sklearn.cluster import DBSCAN
model = DBSCAN(eps=0.5, min_samples=5)
model.fit(X_train)

Description: Density-based clustering that groups together closely packed points and marks points in low-density regions as outliers.
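
Unlike K-Means, DBSCAN does not need the number of clusters up front, and it reports outliers with the label -1. A minimal sketch on hand-made points (the coordinates and min_samples=3 are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],   # first dense group
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],   # second dense group
    [10.0, 0.0],                                        # isolated point
])

model = DBSCAN(eps=0.5, min_samples=3)
model.fit(X)

labels = model.labels_
# Noise points are labeled -1 and do not count as a cluster.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)

print(labels, n_clusters)
```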


8. Neural Networks

8.1 MLP Classifier (Multi-Layer Perceptron)

Syntax:

from sklearn.neural_network import MLPClassifier
model = MLPClassifier(hidden_layer_sizes=(100,))
model.fit(X_train, y_train)

Description: Fully connected feedforward neural network for classification tasks.
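
"Fully connected" means there is one weight matrix between each pair of consecutive layers. The sketch below fits a small network on synthetic data (the dataset, the 8-unit hidden layer, and max_iter=500 are illustrative assumptions) and prints the shapes of those matrices.

```python
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPClassifier

# Synthetic data: 4 features, 3 classes, for illustration only.
X, y = make_blobs(n_samples=150, centers=3, n_features=4, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
model.fit(X, y)

# coefs_[i] is the weight matrix between layer i and layer i + 1.
layer_shapes = [w.shape for w in model.coefs_]
print(layer_shapes)
```

With 4 input features, one hidden layer of 8 units, and 3 classes, there are two weight matrices: input-to-hidden and hidden-to-output.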


8.2 MLP Regressor

Syntax:

from sklearn.neural_network import MLPRegressor
model = MLPRegressor(hidden_layer_sizes=(100,))
model.fit(X_train, y_train)

Description: Similar to MLP Classifier but used for regression tasks.


These are some of the most commonly used models in scikit-learn, organized by type, with corresponding syntax and brief descriptions. You can print them along with the equations and stick them on your wall for easy reference!