Building a Naive Bayes Classifier in the Browser using Map-Reduce
The last decade of JavaScript performance improvements in the browser provides exciting possibilities for distributed computing. Like SETI@home and Folding@home, client-side JavaScript could be used to...
Druid – A Column Oriented Database
I attended a talk at Philly ETE by Metamarkets, a company doing real-time analytics for advertising. Having worked on a couple of Oracle-based reporting projects, I entered with interest. Their system is...
Philly ETE – Database as a Value
This was the first time I’ve seen Rich Hickey’s talk on Datomic, which lent great clarity to the product. As implemented, Datomic functions as an immutable database for philosophical reasons, although...
Building a Terabyte-scale Math Platform
Cliff Click, 0xdata. Click represents 0xdata, which is building a system that can handle R-style analysis at high speed and large scale, aimed at companies that do advertising or credit card fraud detection,...
Generating Randomized Sample Data in Python
If you have access to a production data set, it is helpful to generate testing data which follows a similar format, in varying quantities. By introspecting a database, we can identify stated...
Talk Summary: What is Acunu?
I recently attended a talk by a sales engineer for Acunu (http://www.acunu.com/), an analytics platform for Cassandra. I came away with a couple of interesting notes: - The product aims to build data...
Discovering Senior Developers from Source Code History
In a software company that does consulting, it’s often valuable for engineers to look at a change and know if it was done for a particular client – for instance, if an API feature does not appear to be...
Testing ETL Processes
ETL (“extract, transform, load”) processes come in many shapes, sizes, and product types, and occur under many names – “data migration” projects, business intelligence software, analytics, reporting, scraping,...
Auditing Data Modifications in Postgres
Implementing Auditing: Storing every change to an application’s database allows for sophisticated forensic analysis: usage trends over time, as a long-range debugger or for implementing data correction...
Postgres: Time Travelling Debugger
Imagine you’re an engineer doing phone support for Netflix. The movies they show change regularly: There are various reasons for this – Netflix suddenly thinks you like period pieces, or they get into...