bd914 — Big Data Project 2014 - Foursquare project
Some additional repos to consider:
gandalf.api — Open-Source Decision Engine and Scoring Table for Big-Data.
Real-Big-Sample-Data — Mainly for performance research, I would like a really big dataset that covers all imaginable performance edge cases. Some of them are: * many categories (say 1k+) with more than 5 levels of depth * configurable products with several options (in the end, 1 configurable = 10k+ simples {20×10×4×6×3}) * multiple websites/stores/storeviews (should end up at 1k storeviews). It's OK (and suggested) if this ends up as a script that randomly writes the data into files readable by an importer, as this would speed up generation and leave it to the importer to optimize saving to Magento.
Little-Big-Data — Data describing topics ranging from Cars and Air Travel to Billionaires and Celebrities
PDS — Probabilistic Data Structures to efficiently analyze and mine big datasets
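The generator script that the Real-Big-Sample-Data description asks for could start from something like the sketch below. It is only an illustration under stated assumptions: the function names, the CSV column names, and the storeview code format are all hypothetical and do not come from that repository or from any Magento importer. The option counts are the {20×10×4×6×3} figures from the description, which multiply out to 14,400 simple products per configurable.

```python
import csv
import io
import itertools
import random

# Option counts from the {20x10x4x6x3} example in the description.
OPTION_SIZES = [20, 10, 4, 6, 3]


def simple_variants(option_sizes):
    """Yield one tuple of option indices per simple product (Cartesian product)."""
    return itertools.product(*(range(n) for n in option_sizes))


def category_tree(count, max_depth=8, seed=0):
    """Build a random category tree; returns (parent, depth) dicts keyed by id.

    Id 0 is the root. Ids 1..6 form a chain so that the tree is guaranteed
    to be more than 5 levels deep; the rest attach to random parents.
    """
    rng = random.Random(seed)
    parent, depth = {0: None}, {0: 0}
    for cid in range(1, count):
        if cid <= 6:
            p = cid - 1
        else:
            p = rng.choice([c for c in depth if depth[c] < max_depth])
        parent[cid] = p
        depth[cid] = depth[p] + 1
    return parent, depth


def storeview_codes(count):
    """Hypothetical storeview codes: sv_0000, sv_0001, ..."""
    return [f"sv_{i:04d}" for i in range(count)]


def write_simples_csv(stream, parent_sku, option_sizes):
    """Write one CSV row per simple product; column names are made up."""
    writer = csv.writer(stream)
    writer.writerow(["sku", "parent_sku"] + [f"option_{i}" for i in range(len(option_sizes))])
    rows = 0
    for combo in simple_variants(option_sizes):
        sku = parent_sku + "-" + "-".join(str(v) for v in combo)
        writer.writerow([sku, parent_sku, *combo])
        rows += 1
    return rows


if __name__ == "__main__":
    buffer = io.StringIO()  # swap for open("simples.csv", "w", newline="") to write a file
    print(write_simples_csv(buffer, "CONF-1", OPTION_SIZES), "simples written")
    parent, depth = category_tree(1000)
    print(len(parent), "categories, max depth", max(depth.values()))
    print(len(storeview_codes(1000)), "storeviews")
```

Writing plain CSV rows keeps generation fast and leaves the Magento-specific saving to whatever importer reads the files, which is the split the description suggests.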

