Python and ZeroMQ based log processor

pylogprocessor is yet another effort to build a flexible and scalable logging platform. It is written in Python and uses ZeroMQ as the transport between components.

Easy to extend

- Get data from different sources (Readers).
- Use custom parsers (Parsers).
- Keep state for multiline logs using different backends (shelve, redis, mongodb, ...) (State Stores).
- Store records into files, Cassandra, Hadoop, ... (Writers).
- Program your own Readers, Parsers, State Stores and Writers.
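
As an illustration of what a custom Parser could look like, here is a minimal sketch. The function name `parse_line`, the log line layout (timestamp, level, free text) and the dictionary message format are all assumptions made for this example; they are not pylogprocessor's actual plugin interface.

```python
import re

# Hypothetical log layout: "<timestamp> <LEVEL> <free text>".
LINE_RE = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<text>.*)$")


def parse_line(line):
    """Extract tokens from one raw record and build a simple message.

    Returns a dict with the collected tokens, or None when the line
    does not match the expected layout.
    """
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    print(parse_line("2012-01-01T10:00:00 ERROR disk full"))
    print(parse_line("not a log line"))
```

A Parser built this way stays independent of the transport: the same function can be called on records coming from any Reader.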

Distributed

- Use ZeroMQ for messaging between components.
- Use 1-N Parsers, Readers, State Stores or Writers.
- Use tcp, ipc or any other transport provided by ZeroMQ.
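
The messaging pattern can be sketched with pyzmq as follows: one push socket (playing the Reader) and several pull sockets (playing Parsers). The endpoint name `inproc://records`, the function name and the record contents are assumptions for this example, and the inproc transport is used only to keep the sketch in a single process; a real deployment would use tcp or ipc endpoints instead.

```python
import zmq


def round_robin_demo(n_parsers=2, n_records=6):
    """Send records from one PUSH socket to several PULL sockets."""
    ctx = zmq.Context()
    endpoint = "inproc://records"  # hypothetical endpoint name

    # The "Reader" side: a push socket bound to the endpoint.
    reader = ctx.socket(zmq.PUSH)
    reader.bind(endpoint)

    # The "Parser" side: pull sockets connected to the same endpoint.
    parsers = []
    for _ in range(n_parsers):
        sock = ctx.socket(zmq.PULL)
        sock.connect(endpoint)
        parsers.append(sock)

    # ZeroMQ distributes outgoing messages among the connected peers.
    for i in range(n_records):
        reader.send_string("line %d" % i)

    # Drain each pull socket; stop once nothing arrives within 200 ms.
    received = {}
    for idx, sock in enumerate(parsers):
        received[idx] = []
        while sock.poll(timeout=200):
            received[idx].append(sock.recv_string())

    reader.close(linger=0)
    for sock in parsers:
        sock.close(linger=0)
    ctx.term()
    return received


if __name__ == "__main__":
    for idx, records in sorted(round_robin_demo().items()):
        print("parser %d got %r" % (idx, records))
```

Because the push socket decides which peer gets each message, consecutive lines of the same file generally end up on different Parsers. That is convenient for single-line logs and is the root of the multi-line difficulty discussed under Known limitations.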

Readers

Readers are used to read records (lines, binary chunks, ...). Every record is then forwarded to a Parser, untouched, using a push socket. This socket uses round-robin to deliver messages to the connected Parsers. The goal of Readers is to pick up data from files, 'tail -f files', rsyslog+zeromq servers, or any other source.

Parsers

Parsers extract tokens from the records sent by the Readers and build simple messages with the collected information. These messages are then forwarded through a push socket (round-robin) to the State Stores.

State Stores

The need for this component arises from multi-line log files. The goal is to gather all the information about a record (which may span many lines) and to compose "the full picture" of it. Only after a record is complete will the State Store send it to the Writer. State Stores should keep unfinished records in memory, in helper files, in databases or anywhere else until they are complete (or expire).

Writers

Writers get data from pull sockets (the "other end" of push sockets). They receive full records and store them in files, databases, NoSQL applications, Hadoop, ...

Known limitations

Many! Just to mention two:

- First, and most important, this is pre-alpha software. It's not finished (actually, almost "not started" yet).
- While aiming to be a fully distributed system, it is not one at this moment. I should elaborate on this further (TODO), but you can't guarantee full distribution with push/pull sockets and multi-line records unless you manage to send all the parsed information for a record (coming from N Parsers) to a single State Store, and unless this feed is ordered; and this is far from trivial (or rather, not programmed yet). This means that I should probably replace the simple push/pull sockets with a far more complex pattern.
Nevertheless, the current functionality allows you to use N Readers and Writers (depending on the backends), and this is often the real bottleneck of a log system.
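
To make the multi-line problem concrete, here is a minimal sketch of an in-memory State Store together with one possible way to route fragments. Everything in it is an assumption made for illustration: the message format (`key`, `text`, `last`), the class `MemoryStateStore` and the helper `pick_store` are hypothetical and not part of pylogprocessor. The sketch also leaves out expiry of unfinished records and does nothing about ordering, which still has to be guaranteed by the feed.

```python
import zlib


class MemoryStateStore:
    """Hypothetical State Store that buffers fragments in memory."""

    def __init__(self):
        self.pending = {}  # record key -> list of fragments seen so far

    def feed(self, message):
        """Add one parsed fragment; return the full record once complete."""
        fragments = self.pending.setdefault(message["key"], [])
        fragments.append(message["text"])
        if message["last"]:
            # The record is complete: forget it and hand it to a Writer.
            del self.pending[message["key"]]
            return "\n".join(fragments)
        return None


def pick_store(key, n_stores):
    """Map a record key to a State Store index.

    Hashing the key sends every fragment of a record to the same
    State Store, which plain round-robin delivery cannot guarantee.
    """
    return zlib.crc32(key.encode("utf-8")) % n_stores


if __name__ == "__main__":
    store = MemoryStateStore()
    feed = [
        {"key": "req-1", "text": "Traceback:", "last": False},
        {"key": "req-2", "text": "single line", "last": True},
        {"key": "req-1", "text": "  ValueError", "last": True},
    ]
    for message in feed:
        record = store.feed(message)
        if record is not None:
            print(repr(record))
```

Key-based routing of this kind would have to happen on the sending side, which is one reason simple push/pull sockets are not enough for multi-line records.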

Usage

The application uses a YAML file to launch Readers, Parsers, State Stores and Writers. These can be individual processes that communicate through ipc on a single machine, or you may launch several of them on many servers and use tcp; ZeroMQ is the key to this flexibility. Given the "full distribution" issue, I would recommend the following:

- Single-line log files: N Readers (one per file) + M Parsers + L State Stores + Z Writers (depending on the backend, of course: you may only use 1 Writer to write to a file, but you may use Z Writers for 1 MySQL backend).
- Multi-line log files: 1 Reader + 1 Parser + 1 State Store + N Writers (1 if writing to a regular file).