Using PCA for digit recognition on MNIST in Python

Here is a simple method for handwritten digit recognition in Python that still reaches almost 97% accuracy on the MNIST test set.

We use python-mnist to simplify working with MNIST, PCA for dimensionality reduction, and KNeighborsClassifier from sklearn for classification.
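To make the three pieces concrete, here is a minimal sketch of the same PCA + nearest-neighbours idea chained in an sklearn pipeline. As an assumption for illustration only, it uses the small 8x8 digits dataset bundled with sklearn (load_digits) instead of MNIST, so it runs without downloading anything. The split size, random_state and the name model are arbitrary choices, and the accuracy it prints is not comparable to the MNIST figure.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Assumption: sklearn's bundled 8x8 digits stand in for MNIST here.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# PCA reduces the 64 pixel values to 20 components, then KNN classifies.
model = make_pipeline(PCA(n_components=20), KNeighborsClassifier())
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print("accuracy on held-out digits:", accuracy)
```

The pipeline fits PCA on the training split only and reuses that projection for the test split, which is the same order of operations as in the full MNIST code.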

Here are the 20 principal components of the MNIST training set obtained with this method:
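Each row of pca.components_ is a flattened 28x28 image, so the components can be viewed by reshaping them back. The helper below is a sketch of one way to tile them into a single grid; tile_components and its side and cols parameters are hypothetical names, and random vectors stand in for the real components so that the snippet runs on its own.

```python
import numpy as np

def tile_components(components, side=28, cols=5):
    """Arrange flattened component vectors into one 2-D image grid."""
    components = np.asarray(components)
    n = components.shape[0]
    rows = -(-n // cols)  # ceiling division
    grid = np.zeros((rows * side, cols * side))
    for k in range(n):
        r, c = divmod(k, cols)
        # Put component k into cell (r, c) of the grid.
        grid[r * side:(r + 1) * side, c * side:(c + 1) * side] = \
            components[k].reshape(side, side)
    return grid

# Stand-in data: 20 random vectors of length 784, not real components.
demo = np.random.RandomState(0).randn(20, 784)
grid = tile_components(demo)
print(grid.shape)  # (112, 140)
```

With matplotlib installed, something like plt.imshow(grid, cmap='gray') should display the grid; passing the array returned by recognizePCA instead of demo would show the actual components.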

And this is the recognition result:

tested 10000 digits
correct: 9689 wrong: 311 error rate: 3.11 %
got correctly 96.89 %
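These percentages are just the counts divided by the number of tested digits: 311 / 10000 = 3.11 % and 9689 / 10000 = 96.89 %. As a sketch, the same bookkeeping can be done without an explicit loop using numpy; summarize is a hypothetical helper name and the short lists are toy data, not MNIST predictions.

```python
import numpy as np

def summarize(y_pred, labels):
    """Return (correct, wrong, error rate in %, accuracy in %)."""
    y_pred = np.asarray(y_pred)
    labels = np.asarray(labels)
    correct = int(np.sum(y_pred == labels))  # element-wise comparison
    wrong = len(labels) - correct
    total = correct + wrong
    return correct, wrong, 100.0 * wrong / total, 100.0 * correct / total

# Toy check: four of five predictions match.
print(summarize([3, 1, 4, 1, 5], [3, 1, 4, 1, 9]))  # (4, 1, 20.0, 80.0)
```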

Here is the code:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def recognizePCA(train, trainlab, test, labels, num=None):

    # by default, evaluate on the whole test set
    if num is None:
        num = len(test)

    train4pca = np.array(train)
    test4pca = np.array(test)

    n_components = 20

    # fitting pca on the training images only
    # (RandomizedPCA is gone from newer sklearn releases;
    # PCA with svd_solver='randomized' replaces it)
    pca = PCA(n_components=n_components, svd_solver='randomized').fit(train4pca)

    # projecting both sets onto the principal components
    xtrain = pca.transform(train4pca)
    xtest = pca.transform(test4pca)

    clf = KNeighborsClassifier()

    # fitting knn
    clf = clf.fit(xtrain, np.array(trainlab))

    # predicting
    y_pred = clf.predict(xtest[:num])

    # checking the result and printing output
    r = 0
    w = 0
    for i in range(num):
        if y_pred[i] == labels[i]:
            r += 1
        else:
            w += 1
    print("tested", num, "digits")
    print("correct:", r, "wrong:", w, "error rate:", float(w)*100/(r+w), "%")
    print("got correctly", float(r)*100/(r+w), "%")

    # the principal components, one flattened image per row
    return pca.components_

Usage:

from mnist import MNIST

# path to the directory with the MNIST files
mndata = MNIST('path_to_mnist')
trainims, trainlabels = mndata.load_training()
ims, labels = mndata.load_testing()

recognizePCA(trainims, trainlabels, ims, labels)
Written on May 18, 2015