K fold cross validation python code from scratch github

Jose 79 Published: 11/03/2024

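Here is a K-fold cross-validation example in Python. It uses scikit-learn's KFold to generate the splits and scikit-learn's metric functions to score each fold; a version that does not use scikit-learn appears further down.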

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score, f1_score, recall_score, precision_score

# Load your dataset (e.g., a CSV file with a label column named 'target')
dataset = pd.read_csv('your_dataset.csv')

# Define the model you want to evaluate
model = YourModel()  # Replace with your actual model implementation

# Define the number of folds and fix the random state for reproducibility
num_folds = 5
random_state = 42

# Initialize the KFold object
kfold = KFold(n_splits=num_folds, shuffle=True, random_state=random_state)

# Initialize lists to store the per-fold evaluation metrics
accuracies = []
f1_scores = []
recalls = []
precisions = []

# Perform K-fold cross-validation
for train_index, val_index in kfold.split(dataset):
    # Split the dataset into training and validation rows
    train_df, val_df = dataset.iloc[train_index], dataset.iloc[val_index]
    X_train, y_train = train_df.drop('target', axis=1), train_df['target']
    X_val, y_val = val_df.drop('target', axis=1), val_df['target']

    # Train the model on the training set.
    # If your model keeps state between fit() calls, create a fresh instance here.
    model.fit(X_train, y_train)

    # Evaluate the model on the validation set
    y_pred = model.predict(X_val)
    acc = accuracy_score(y_val, y_pred)
    f1 = f1_score(y_val, y_pred, average='macro')
    rec = recall_score(y_val, y_pred, average='macro')
    prec = precision_score(y_val, y_pred, average='macro')

    # Store the evaluation metrics for this fold
    accuracies.append(acc)
    f1_scores.append(f1)
    recalls.append(rec)
    precisions.append(prec)

# Calculate the mean and standard deviation of each metric across folds
# (np.std defaults to the population standard deviation, ddof=0)
mean_accuracy = np.mean(accuracies)
std_accuracy = np.std(accuracies)
mean_f1_score = np.mean(f1_scores)
std_f1_score = np.std(f1_scores)
mean_recall = np.mean(recalls)
std_recall = np.std(recalls)
mean_precision = np.mean(precisions)
std_precision = np.std(precisions)

# Print the evaluation metrics
print(f'Mean Accuracy: {mean_accuracy:.4f} ± {std_accuracy:.4f}')
print(f'Mean F1 Score: {mean_f1_score:.4f} ± {std_f1_score:.4f}')
print(f'Mean Recall: {mean_recall:.4f} ± {std_recall:.4f}')
print(f'Mean Precision: {mean_precision:.4f} ± {std_precision:.4f}')

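# Visualize the evaluation metrics (optional)
metric_names = ['Accuracy', 'F1 Score', 'Recall', 'Precision']
plt.boxplot([accuracies, f1_scores, recalls, precisions], showmeans=True)
# Label the boxes through xticks; the labels= argument of boxplot is deprecated in newer Matplotlib releases
plt.xticks(range(1, len(metric_names) + 1), metric_names)
plt.title('Evaluation Metrics')
plt.show()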

Replace YourModel() with your actual model implementation, which needs fit and predict methods, and adjust the dataset loading code to match your file format.

This code also assumes the data is held in a Pandas DataFrame named dataset whose label column is called 'target'. Change the column name or the indexing if your dataset is organized differently.
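
For readers who want to see what the average='macro' arguments in the listing above compute, here is a minimal, dependency-free sketch. The function name macro_scores is hypothetical and is not part of scikit-learn. It computes precision, recall, and F1 separately for each class, then takes the unweighted mean over the classes, treating a zero denominator as a score of 0.

```python
def macro_scores(y_true, y_pred):
    """Return (precision, recall, f1), each macro-averaged over the classes.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    # Macro average: every class counts equally, regardless of its size
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0, 2]
prec, rec, f1 = macro_scores(y_true, y_pred)
print(f"precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

Because every class gets the same weight, a rare class that the model handles badly pulls the macro score down as much as a common one would.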

K fold cross validation Python code without sklearn

Here is an example of how you can implement k-fold cross-validation in Python without using the scikit-learn library:

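import numpy as np
from itertools import cycle, islice

def kfold_cv(X, y, num_folds):
    # Assign every sample a fold id by cycling through 0..num_folds-1.
    # cycle() is infinite, so islice() cuts it off after len(X) items.
    folds = list(islice(cycle(range(num_folds)), len(X)))
    for fold in range(num_folds):
        test_index = [i for i, x in enumerate(folds) if x == fold]
        train_index = [i for i, x in enumerate(folds) if x != fold]
        X_test = X[test_index]
        y_test = y[test_index]
        # Removing the test rows leaves exactly the rows listed in train_index
        X_train = np.delete(X, test_index, axis=0)
        y_train = np.delete(y, test_index, axis=0)
        yield X_train, X_test, y_train, y_test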

# Example usage:
X = np.random.rand(100, 20)  # Replace with your own feature matrix
y = np.random.randint(2, size=100)  # Replace with your own labels
num_folds = 5  # Change this to the number of folds you want

for fold, (train_X, test_X, train_y, test_y) in enumerate(kfold_cv(X, y, num_folds)):
    print("Fold {}: training on {} samples.".format(fold, len(train_y)))
    print("Fold {}: testing on {} samples.".format(fold, len(test_y)))

In the above code:

We first import the necessary libraries: NumPy for array handling and itertools for cycling through fold numbers.

The kfold_cv function takes the input data X, the target y, and the number of folds num_folds.

It builds a list, folds, that assigns each sample a fold number by cycling through 0 to num_folds-1. Note that itertools.cycle is infinite, so the sequence has to be cut off at len(X) (for example with itertools.islice) before it is turned into a list.

For each fold, it identifies the test samples (test_index) and the training samples (train_index).

It selects the test samples from X and y by indexing, and obtains the training samples with np.delete, which removes the test rows from the original arrays and so leaves the same rows that train_index lists.

The yield statement returns the train and test data for each fold, one fold at a time.
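
The fold assignment described above can be seen on a small case. The following is a minimal sketch; assign_folds is a hypothetical helper name introduced here for illustration. It deals 10 sample indices into 3 folds the way a dealer hands out cards, then shows which indices end up in the test and training sets of each fold.

```python
from itertools import cycle, islice


def assign_folds(n_samples, num_folds):
    """Give each sample index a fold id by cycling through 0..num_folds-1."""
    # islice stops the otherwise infinite cycle after n_samples items
    return list(islice(cycle(range(num_folds)), n_samples))


folds = assign_folds(10, 3)
print(folds)  # [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]

for fold in range(3):
    test_index = [i for i, f in enumerate(folds) if f == fold]
    train_index = [i for i, f in enumerate(folds) if f != fold]
    print("fold", fold, "test:", test_index, "train:", train_index)
```

With this scheme the test indices are interleaved (fold 0 tests on 0, 3, 6, 9) and the data is not shuffled, so if the rows follow a repeating pattern it may be worth shuffling them first.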

In the example usage:

We first create some random training data (X) and a random binary target variable (y). Replace these with your actual data.

We set the number of folds (5 in this case).

We loop over kfold_cv with the data and the desired fold count. On each iteration the function yields the train and test data for one fold, which you can use to train and evaluate your model.
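
To show what "train and evaluate your model" might look like inside the loop, here is a small self-contained sketch that uses only the standard library. Everything in it is an assumption made for illustration: the toy data, the helper names (fold_indices, fit_centroids, predict), and the nearest-centroid classifier standing in for a real model. It records one accuracy per fold and then summarizes them with a mean and a standard deviation.

```python
from statistics import mean, pstdev


def fold_indices(n_samples, num_folds):
    """Yield (train_idx, test_idx); sample i belongs to fold i % num_folds."""
    for fold in range(num_folds):
        test_idx = [i for i in range(n_samples) if i % num_folds == fold]
        train_idx = [i for i in range(n_samples) if i % num_folds != fold]
        yield train_idx, test_idx


def fit_centroids(X, y):
    """Toy model: the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, X):
    """Assign each row to the class whose centroid is nearest."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in X]


# Toy data: even rows form class 0 near the origin, odd rows form class 1
X = [[0.1 * i, 0.2 * i] if i % 2 == 0 else [5 + 0.1 * i, 5 - 0.1 * i]
     for i in range(20)]
y = [i % 2 for i in range(20)]

scores = []
for train_idx, test_idx in fold_indices(len(X), 5):
    # Fit on the training rows only, then score on the held-out rows
    model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
    y_pred = predict(model, [X[i] for i in test_idx])
    correct = sum(1 for i, p in zip(test_idx, y_pred) if y[i] == p)
    scores.append(correct / len(test_idx))

mean_acc = mean(scores)
std_acc = pstdev(scores)  # population standard deviation, like np.std's default
print(f"Accuracy per fold: {scores}")
print(f"Mean accuracy: {mean_acc:.4f} ± {std_acc:.4f}")
```

The model is refit from scratch in every iteration, so no information from a fold's test rows leaks into the model that is scored on them.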

Note that the above code is a simple example, and you may want to add more error checking and handling depending on your specific needs.
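
As one possible direction for that extra error checking, the sketch below validates the fold count and adds optional seeded shuffling. It is a variant, not the function above: the name kfold_indices and its parameters are hypothetical, it yields index lists rather than array slices, and it uses contiguous blocks of (optionally shuffled) indices instead of the interleaved assignment, with fold sizes that differ by at most one.

```python
import random


def kfold_indices(n_samples, num_folds, shuffle=False, seed=None):
    """Yield (train_idx, test_idx) index lists for each fold.

    Hypothetical helper for illustration. Because this is a generator,
    the checks below run when iteration starts, not when it is called.
    """
    if not isinstance(num_folds, int) or num_folds < 2:
        raise ValueError("num_folds must be an integer >= 2")
    if num_folds > n_samples:
        raise ValueError("num_folds cannot exceed the number of samples")

    indices = list(range(n_samples))
    if shuffle:
        # A dedicated Random instance keeps the shuffle reproducible per seed
        random.Random(seed).shuffle(indices)

    # Spread the remainder so that fold sizes differ by at most one
    base, extra = divmod(n_samples, num_folds)
    start = 0
    for fold in range(num_folds):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


sizes = [len(test) for _, test in kfold_indices(10, 3)]
print(sizes)  # [4, 3, 3]

for train_idx, test_idx in kfold_indices(10, 3, shuffle=True, seed=42):
    print("train:", sorted(train_idx), "test:", sorted(test_idx))
```

With NumPy arrays, the yielded lists can be used directly for indexing, for example X[train_idx] and y[test_idx].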