Welcome to STREAMS!

Introduction

Streams is a highly available, extremely fast, low-resource, real-time log collection framework for terabytes of data.

News

  • Version 0.2.3 released
  • This version uses ZooKeeper and adds many new features, such as heartbeat status, a jQuery UI, and on-disk-full actions.

Project Aims

The main aims of Streams are to:

  • Provide high availability for big data log import.
  • Maintain data correctness.
  • Scale to terabytes of data per day.
  • Integrate with Hadoop for importing data into HDFS.

Overview

Streams is inspired by Chukwa, an Apache Hadoop project that imports Hadoop log data for cluster monitoring. Streams aims to support the collection of application log data: not debug output, but application logs such as ad server logs and banking transaction logs.

These logs cannot tolerate data loss, data corruption, or row duplication. The files typically total terabytes and are spread across a cluster of servers. Streams imports this data to a smaller cluster of two or three collector machines, then imports the compressed collector data into HDFS.

Collected logs are partitioned by date, hour, and size, so administrators can specify the chunk size of the collected logs. For example, say we have log type A and want to use it on a Hadoop cluster with a block size of 128MB. Streams can import all logs of type A by date and hour, in chunks of roughly 128MB each. This makes the files easier to process in MapReduce and allows non-splittable compression formats to be used.
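The partitioning idea can be illustrated with a minimal Python sketch. This is not the Streams implementation: the `ChunkPlanner` class, its `assign` method, and the chunk naming scheme are all hypothetical, chosen only to show how records might be grouped by log type, date, and hour and rolled into a new chunk once a target size is reached.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Tuple

# Target chunk size in bytes; 128MB matches the example block size above.
CHUNK_SIZE = 128 * 1024 * 1024

PartitionKey = Tuple[str, str, str]  # (log type, date, hour)


@dataclass
class ChunkPlanner:
    """Hypothetical helper: maps each record to a chunk name per partition."""
    chunk_size: int = CHUNK_SIZE
    # Bytes assigned so far to the current chunk of each partition.
    _sizes: Dict[PartitionKey, int] = field(default_factory=dict)
    # Index of the current chunk of each partition.
    _index: Dict[PartitionKey, int] = field(default_factory=dict)

    def assign(self, log_type: str, ts: datetime, record: bytes) -> str:
        key = (log_type, ts.strftime("%Y-%m-%d"), ts.strftime("%H"))
        size = self._sizes.get(key, 0)
        index = self._index.get(key, 0)
        # Roll over when the record would push a non-empty chunk past the target.
        if size > 0 and size + len(record) > self.chunk_size:
            index += 1
            size = 0
        self._sizes[key] = size + len(record)
        self._index[key] = index
        # Illustrative naming scheme: type.date.hour.chunk-index
        return f"{key[0]}.{key[1]}.{key[2]}.{index:04d}"
```

In this sketch a chunk only exceeds the target when a single record is larger than the target by itself, which is one way to read "roughly 128MB". Because each chunk is sized to fit a block, a whole chunk can be handed to one task even when it is compressed with a non-splittable format.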

How to Contribute

The best way to start is to check out the source code or install the RPM builds and run them. Report bugs or request new features on the Issues page.

People are encouraged to contribute bug fixes or features as patches. These patches will get reviewed and committed by the project committers.

Mailing Lists and User Groups

Users

Homepage: http://groups.google.com/group/streams-user
Group email: streams-user@googlegroups.com
