We learn more from code, and from great code. Not necessarily the top-ranked solution, because we also learn […]
Apache Kafka is a highly scalable publish-subscribe messaging system that can serve as the data backbone in distributed applications. With Kafka’s […]
More sample code: https://github.com/fnp/pylucene/tree/master/samples/LuceneInAction Sample code import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.document.Document; import org.apache.lucene.document.Field; import org.apache.lucene.index.IndexWriter; import org.apache.lucene.queryParser.ParseException; import org.apache.lucene.queryParser.QueryParser; import org.apache.lucene.search.*; […]
I provide basic indexing and retrieval code using the PyLucene 3.0 API. Lucene in Action (2nd Ed) covers Lucene 3.0, but […]
Recently a question was posed to the Apache NiFi (Incubating) Developer Mailing List about how best to use Apache NiFi […]
Short Description: This article provides a step-by-step overview of how to set up cross data center data flow using […]
When assessing the quality of a model, being able to accurately measure its prediction error is of key importance. Often, […]
A time series is a sequence of data points, typically consisting of successive measurements made over a time interval. Forecasting […]
There are a number of R packages devoted to sophisticated applications of Markov chains. These include msm and SemiMarkov for fitting […]
Twitter runs multiple large Hadoop clusters that are among the biggest in the world. Hadoop is at the core of […]