Channel: Gary Sieling » big data
Browsing all 12 articles

Installing CouchDB on a VirtualBox instance with Chef and Vagrant

This assumes you’ve already installed VirtualBox and configured a base VM. mkdir cookbooks; cd cookbooks; git clone https://github.com/opscode-cookbooks/apt; git clone...

Building a JSON webservice in R

R is a programming language for mathematics and statistics. There are several R libraries available to support web development, including rjson and RJSONIO (note case – R library names are case...

Building a Naive Bayes Classifier in the Browser using Map-Reduce

The last decade of JavaScript performance improvements in the browser provides exciting possibilities for distributed computing. Like SETI and Folding@Home, client-side JavaScript could be used to...

Druid – A Column Oriented Database

I attended a talk at Philly ETE by Metamarkets, a company doing real-time analytics for advertising. Having worked on a couple of Oracle-based reporting projects, I entered with interest. Their system is...

Philly ETE – Database as a Value

This was the first time I’ve seen Rich Hickey’s talk on Datomic, which lent great clarity to the product. As implemented, Datomic functions as an immutable database for philosophical reasons, although...

Building a Terabyte-scale Math Platform

Cliff Click, 0xdata. Click represents 0xdata, which is building a system that can handle R-style analysis at high speed and large scale, aimed at companies that do advertising or credit card fraud detection,...

Generating Randomized Sample Data in Python

If you have access to a production data set, it is helpful to generate testing data which follows a similar format, in varying quantities. By introspecting a database, we can identify stated...

Talk Summary: What is Acunu?

I recently attended a talk by a sales engineer for Acunu (http://www.acunu.com/), an analytics platform for Cassandra. I came away with a couple of interesting notes: - The product aims to build data...

Discovering Senior Developers from Source Code History

In a software company that does consulting, it’s often valuable for engineers to look at a change and know if it was done for a particular client – for instance, if an API feature does not appear to be...

Testing ETL Processes

ETL (“extract, transform, load”) processes come in many shapes, sizes, and product types, and occur under many names – “data migration” projects, business intelligence software, analytics, reporting, scraping,...

Auditing Data Modifications in Postgres

Implementing Auditing: Storing every change to an application’s database allows for sophisticated forensic analysis: studying usage trends over time, acting as a long-range debugger, or implementing data correction...

Postgres: Time Travelling Debugger

Imagine you’re an engineer doing phone support for Netflix. The movies they show change regularly: There are various reasons for this – Netflix suddenly thinks you like period pieces, or they get into...
