K fold cross validation Python code for image classification

Christopher 169 Published: 08/25/2024

Here is a simple implementation of k-fold cross-validation for image classification using the Keras deep learning library and TensorFlow as the backend.

# Import necessary libraries
import numpy as np
from sklearn.model_selection import KFold
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import Adam

# Set path for training and testing directories
train_dir = 'path_to_training_directory'
test_dir = 'path_to_testing_directory'

# Load train and test data (shuffle=False keeps the file order fixed)
train_datagen = ImageDataGenerator(rescale=1./255)
validation_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(train_dir, target_size=(224, 224), batch_size=32, class_mode='categorical', shuffle=False)
validation_generator = validation_datagen.flow_from_directory(test_dir, target_size=(224, 224), batch_size=32, class_mode='categorical', shuffle=False)

# Collect every training batch into in-memory arrays so KFold can index them
batches = [train_generator[i] for i in range(len(train_generator))]
X = np.concatenate([images for images, _ in batches])
y = np.concatenate([labels for _, labels in batches])

# Initialize the k-fold cross-validation
kfold = KFold(n_splits=5, shuffle=True)

# Initialize accuracy and loss for each fold
accuracy = []
loss = []

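for train_index, val_index in kfold.split(X):
    X_train, X_val = X[train_index], X[val_index]
    y_train, y_val = y[train_index], y[val_index]

    # Build a fresh CNN for each fold so no weights carry over between folds
    # (the layer sizes here are illustrative)
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(64, activation='relu'),
        Dense(y.shape[1], activation='softmax'),
    ])

    # Train the model on the training set
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=3, validation_data=(X_val, y_val), verbose=0)

    # Evaluate the model on the validation set
    score = model.evaluate(X_val, y_val, verbose=0)
    accuracy.append(score[1])
    loss.append(score[0])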

# Calculate the mean accuracy and loss across all folds

accuracy_mean = sum(accuracy) / len(accuracy)

loss_mean = sum(loss) / len(loss)

print("Mean Accuracy: ", accuracy_mean)

print("Mean Loss: ", loss_mean)

In this code:

The KFold class from scikit-learn is used to initialize the k-fold cross-validation. For each fold, we split the data into training and validation sets using the indices returned by the kfold.split method. A CNN model with convolutional layers, pooling layers, and dense layers is defined. The model is trained on the training set for 3 epochs and evaluated on the validation set using the evaluate method. The accuracy and loss are calculated for each fold and stored in lists. Finally, the mean accuracy and loss are calculated by summing up all values and dividing by the number of folds.

This code demonstrates how to perform k-fold cross-validation with a CNN model on image classification problems in Python using Keras and scikit-learn libraries.
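The summary step can also report how much the folds disagree, not only their average. Here is a minimal sketch, assuming the per-fold accuracies have already been collected in a list. The helper name `summarize` and the score values are made up for illustration and do not come from a real training run.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return the mean and sample standard deviation of per-fold scores."""
    return mean(scores), stdev(scores)


# Hypothetical per-fold accuracies, not results from a real run
fold_accuracy = [0.80, 0.85, 0.90, 0.85, 0.80]

acc_mean, acc_std = summarize(fold_accuracy)
print(f"Accuracy: {acc_mean:.3f} +/- {acc_std:.3f}")
```

A large standard deviation relative to the mean suggests the estimate depends heavily on which samples land in which fold.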

How to import KFold cross-validation in Python?

To perform k-fold cross-validation in Python, you can use the KFold class from the sklearn.model_selection module. Here's an example of how to do it:

from sklearn.model_selection import KFold

from sklearn.datasets import load_iris

from sklearn.svm import SVC

import numpy as np

# Load iris dataset

iris = load_iris()

X = iris.data[:, :2] # we only take the first two features.

y = iris.target

# Create a k-fold object with 5 folds

kfold = KFold(n_splits=5, shuffle=True)

# Loop over each fold in turn
for fold, (train_index, test_index) in enumerate(kfold.split(X), start=1):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Train a support vector machine (SVM) model on the current fold
    svc = SVC(kernel='linear')
    svc.fit(X_train, y_train)

    # Evaluate the SVM model's performance on the test fold
    score = svc.score(X_test, y_test)
    print(f"Fold {fold}/{kfold.get_n_splits()} - Accuracy: {score:.3f}")

In this example, we load the iris dataset and create a k-fold object with 5 folds. Then we loop over each fold in turn, training an SVM model on the current train set and evaluating its performance on the corresponding test set.
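To make the index bookkeeping behind the loop concrete, here is a rough sketch of what an unshuffled k-fold split does, written in plain Python. The function name `kfold_indices` is made up for this illustration and is not part of scikit-learn. It mirrors KFold's documented behavior of giving the first n_samples % n_splits folds one extra sample.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_indices, test_indices) for contiguous, unshuffled folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds take one additional sample
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


for train, test in kfold_indices(n_samples=7, n_splits=3):
    print("train:", train, "test:", test)
```

Every sample lands in exactly one test fold, so each one is used for evaluation once and for training in all other folds.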

Here's what you need to know:

KFold initialization: KFold accepts three parameters: n_splits, which specifies the number of folds (default is 5); shuffle, which determines whether to shuffle the data before splitting it into folds (default is False); and random_state, which sets the seed for the random number generator used in shuffling and only has an effect when shuffle is True.

Fold iteration: To iterate over each fold, use the split() method of the KFold object, which yields one tuple per fold containing the indices of the train set and the indices of the test set.

Data splitting: Inside the loop, index the data with train_index and test_index from the previous step to build the current train set (X_train, y_train) and test set (X_test, y_test).

Model training and evaluation: Train your model (e.g., SVM) on the train set, and evaluate its performance on the test set. In this example, we use the score() method, which for a classifier such as SVC returns the mean accuracy on the given data.
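The effect of shuffle and random_state is easiest to see on a tiny array. The sketch below assumes ten toy samples. The helper `fold_test_indices` is a name invented for this example.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # ten toy samples with one feature each


def fold_test_indices(shuffle, random_state=None):
    """Return the test indices of every fold as a list of lists."""
    kfold = KFold(n_splits=5, shuffle=shuffle, random_state=random_state)
    return [test.tolist() for _, test in kfold.split(X)]


# Without shuffling, folds are contiguous blocks in the original order
print(fold_test_indices(shuffle=False))

# Shuffled folds are repeatable when random_state is fixed
first = fold_test_indices(shuffle=True, random_state=0)
second = fold_test_indices(shuffle=True, random_state=0)
print(first == second)
```

Fixing random_state is useful when you want to compare two models on exactly the same folds.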

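Note that when you're using k-fold cross-validation with a model that keeps its state between calls to fit (such as a Keras neural network, which continues training from its current weights), you need to create a fresh model for each fold. Otherwise the model has already trained on samples that show up in later validation folds, and it is no longer evaluated only on unseen samples. Scikit-learn estimators such as SVC refit from scratch on every call to fit(), so creating a new one per fold is a matter of clarity rather than correctness.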
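One common way to make sure each fold is evaluated only on unseen samples is to build a new model at the start of every fold. The toy sketch below illustrates why. `MemorizingModel` is a hypothetical stand-in for any model whose fit() adds to existing state, and the fold indices are hand-written for the example.

```python
class MemorizingModel:
    """Hypothetical stand-in for a model whose fit() adds to existing state."""

    def __init__(self):
        self.seen = set()

    def fit(self, samples):
        # Like continued training: earlier knowledge is kept, not reset
        self.seen.update(samples)

    def leaked(self, samples):
        """Return the validation samples this model has already trained on."""
        return sorted(self.seen.intersection(samples))


# Three hand-written folds over six samples: (train indices, validation indices)
folds = [([2, 3, 4, 5], [0, 1]), ([0, 1, 4, 5], [2, 3]), ([0, 1, 2, 3], [4, 5])]

# Reusing one model: later folds validate on samples seen in earlier folds
shared = MemorizingModel()
reused_leaks = []
for train, val in folds:
    shared.fit(train)
    reused_leaks.append(shared.leaked(val))

# Fresh model per fold: validation samples are always unseen
fresh_leaks = []
for train, val in folds:
    model = MemorizingModel()
    model.fit(train)
    fresh_leaks.append(model.leaked(val))

print("reused model:", reused_leaks)
print("fresh models:", fresh_leaks)
```

With the reused model, every fold after the first is validated on samples it has already trained on, which inflates the reported score.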