We learn more from code, and especially from great code. The solutions worth studying are not always the first-ranked ones, because we also learn what separates a stellar solution from a merely good one. I will post the solutions I come across so we can all learn to become better!

I collected the following source code and interesting discussions from past Kaggle competitions for learning purposes. Not every competition is listed: I collect them manually, and for some competitions no one has shared a solution. I will add more as time goes by. Thank you.


Prudential Life Insurance Assessment [Mon 23 Nov 2015 – Mon 15 Feb 2016]
Rossmann Store Sales [Wed 30 Sep 2015 – Mon 14 Dec 2015]
Airbnb New User Bookings [Wed 25 Nov 2015 – Thu 11 Feb 2016]
Walmart Recruiting: Trip Type Classification [Mon 26 Oct 2015 – Sun 27 Dec 2015]


Algorithmic Trading Challenge

Allstate Purchase Prediction Challenge

Amazon.com – Employee Access Challenge

AMS 2013-2014 Solar Energy Prediction Contest

Belkin Energy Disaggregation Competition

Challenges in Representation Learning: Facial Expression Recognition Challenge

Challenges in Representation Learning: The Black Box Learning Challenge

Challenges in Representation Learning: Multi-modal Learning

Detecting Insults in Social Commentary

EMI Music Data Science Hackathon

Galaxy Zoo – The Galaxy Challenge

Global Energy Forecasting Competition 2012 – Wind Forecasting

KDD Cup 2013 – Author-Paper Identification Challenge (Track 1)

KDD Cup 2013 – Author Disambiguation Challenge (Track 2)

Large Scale Hierarchical Text Classification

Loan Default Prediction – Imperial College London

Merck Molecular Activity Challenge

MLSP 2013 Bird Classification Challenge

Observing the Dark World

PAKDD 2014 – ASUS Malfunctional Components Prediction

Personalize Expedia Hotel Searches – ICDM 2013

Predicting a Biological Response

Predicting Closed Questions on Stack Overflow

See Click Predict Fix

See Click Predict Fix – Hackathon

StumbleUpon Evergreen Classification Challenge

The Analytics Edge (15.071x)

The Marinexplore and Cornell University Whale Detection Challenge

Walmart Recruiting – Store Sales Forecasting



The post Kaggle Competition Past Winner Solutions appeared first on The Big Data Blog.

Source: Kaggle Competition Past Winner Solutions

February 17th, 2016

Kaggle Competition Past Winner Solutions