Kaggle Competition Past Winner Solutions

We learn more from code, and from great code. Not necessarily always the 1st ranking solution, because we also learn […]

Posted in GBM, installation, kaggle, Neural Network, Solutions, SVM, XGBoost

Installing Kafka on Mac OSX

Apache Kafka is a highly-scalable publish-subscribe messaging system that can serve as the data backbone in distributed applications. With Kafka’s […]

Posted in installation, kafka

Lucene In-Memory Search Example and Sample Code

More sample code: https://github.com/fnp/pylucene/tree/master/samples/LuceneInAction

Sample code:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.*;
[…]

Posted in In-Memory, installation, Lucene, machine learning, Natural language Processing, Text Processing


I provide basic indexing and retrieval code using the PyLucene 3.0 API. Lucene in Action (2nd Ed) covers Lucene 3.0, but […]

Posted in installation, Lucene, machine learning, Natural language Processing, Text Processing

NiFi: Thinking Differently About DataFlow

Recently a question was posed to the Apache NiFi (Incubating) Developer Mailing List about how best to use Apache NiFi […]

Posted in Apache Nifi, Data Flow, DataTrace, installation

Apache NiFi (aka HDF) data flow across data centers

Short Description: This article provides a step-by-step overview of how to set up cross-data-center data flow using […]

Posted in Apache Nifi, Data Center, Data Flow, Data Track, installation

Accurately Measuring Model Prediction Error

When assessing the quality of a model, being able to accurately measure its prediction error is of key importance. Often, […]

Posted in Error, installation, prediction


A time series is a sequence of data points, typically consisting of successive measurements made over a time interval. Forecasting […]

Posted in installation, kaggle, prediction, rossman, time-series

Getting Started with Markov Chains

There are a number of R packages devoted to sophisticated applications of Markov chains. These include msm and SemiMarkov for fitting […]

Posted in installation, Markov Chains, R

Hadoop filesystem at Twitter

Twitter runs multiple large Hadoop clusters that are among the biggest in the world. Hadoop is at the core of […]

Posted in big data, hadoop, hdfs, installation, Twitter