If you haven’t yet discovered the competitive machine learning site kaggle.com, please do so now. I’ll wait.
Great. So you checked it out, fell in love, and made it back. I recently downloaded the data for the "getting started" competition. It consists of 42,000 labelled images (28×28 pixels) of handwritten digits 0-9. The competition is a straightforward supervised learning problem in OCR (Optical Character Recognition). There are two sample R scripts on the site to get you started; they implement the k-nearest neighbours and Random Forest algorithms.
I wanted to get started by visualizing all of the training data at once, by rendering an average image of each digit. Visualizing the data is a great first step towards developing a model. Here’s how I did it:
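The original code is not included in this excerpt, so what follows is only a minimal sketch of the idea, not the author's script. It is written in Python using the standard library, and it assumes the training CSV has a header row, the label in the first column, and one pixel intensity in each of the remaining 784 columns. The function names `average_digits`, `read_training_rows` and `to_grid` are hypothetical, chosen for illustration.

```python
import csv
from collections import defaultdict


def average_digits(rows, n_pixels=784):
    """Return {label: mean pixel vector} for an iterable of (label, pixels) pairs."""
    sums = defaultdict(lambda: [0.0] * n_pixels)  # running pixel sums per label
    counts = defaultdict(int)                     # number of images seen per label
    for label, pixels in rows:
        acc = sums[label]
        for i, value in enumerate(pixels):
            acc[i] += value
        counts[label] += 1
    # Divide each running sum by the image count to get the per-pixel mean.
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def read_training_rows(path):
    """Yield (label, pixels) pairs from a CSV file.

    Assumed layout: a header row, then one image per row with the label
    first and the pixel intensities after it.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        for record in reader:
            yield int(record[0]), [int(v) for v in record[1:]]


def to_grid(vector, width=28):
    """Reshape a flat pixel vector into a list of rows, each `width` pixels long."""
    return [vector[i:i + width] for i in range(0, len(vector), width)]
```

Under those assumptions, `average_digits(read_training_rows("train.csv"))` would give one averaged 784-value vector per digit (the filename is a guess), and `to_grid` turns each vector into 28 rows that any plotting tool can render as an image.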
Which gives you:
Notice the wobbly-looking ‘1’. You can see that there is some variance in the angle of…
Posted on August 30, 2012, in Uncategorized.