Here is the problem with our current system. We have stations distributed all around the country gathering information and sending it to our database. There are about 300 stations, and our database server is PostgreSQL 8.4 running on Debian 6 (two quad-core Xeons with a single 1 TB hard disk). Because this setup does not scale well, we have decided to change our architecture, but with so many relational and non-relational systems out there, I am having trouble choosing one for our needs. Here are our challenges:
- The current system generates 4 million records per day (roughly 46 rows per second on average), and we are planning to go beyond 1,000 stations. We need a product that can handle 2,000+ connections per second on our current server over the next 5 years.
- We do not need much validation of the incoming data (I think we need NoSQL here). The stations do not even check whether the data was delivered; they just send it and move on.
- Currently we use triggers to update our report tables based on the incoming data (a simplified sketch of such a trigger follows this list). The problem is that after a month the indexes become bloated and require a lot of maintenance. Is there any trigger-like mechanism that does not cause index bloat, or should we forget about real-time report generation and use an OLAP product?
- The DBMS should handle at most 5 TB of data in a single table. On our current system, because of the high load, we delete old data (older than two or three months), so we cannot generate reports on data from last year.
- We do not need many ACID features; we need simple inserts and selects. We do not use two-phase commit anywhere in our systems. What we need is extremely fast inserts (see the batching sketch after this list).
- Is table partitioning a good fit for our problem? We categorize our data based on date (see the partitioning sketch after this list).
- As I said, we need real-time reports (when I say real time, I mean that a report covering data up to yesterday is good enough). Are there any open-source OLAP products that can be fed directly from the incoming data?
- We do not need high availability to the point of being forced to run a cluster.
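
For context, here is a simplified sketch of the kind of trigger we use to maintain a report table. The table and column names (`measurements`, `daily_report`, `station_id`, `recorded_at`, `value`) are made up for illustration, not our real schema:

```sql
-- Simplified, hypothetical version of our report trigger: each insert into the
-- raw measurements table bumps a per-station, per-day aggregate row.
CREATE OR REPLACE FUNCTION update_daily_report() RETURNS trigger AS $$
BEGIN
    UPDATE daily_report
       SET sample_count = sample_count + 1,
           value_sum    = value_sum + NEW.value
     WHERE station_id = NEW.station_id
       AND report_day = NEW.recorded_at::date;

    IF NOT FOUND THEN
        INSERT INTO daily_report (station_id, report_day, sample_count, value_sum)
        VALUES (NEW.station_id, NEW.recorded_at::date, 1, NEW.value);
    END IF;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_report_trg
AFTER INSERT ON measurements
FOR EACH ROW EXECUTE PROCEDURE update_daily_report();
```

The constant UPDATEs on the report table are what leave behind the dead tuples and bloated indexes that we keep having to VACUUM and REINDEX.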
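
To give a sense of what I mean by "extremely fast inserts": today each station sends rows one at a time, and what we are after is sustained ingest, for example batched multi-row inserts or bulk loads with COPY. Again a sketch with the same hypothetical schema and made-up sample values:

```sql
-- One round trip per batch instead of one per row (hypothetical schema).
INSERT INTO measurements (station_id, recorded_at, value)
VALUES (17, '2013-05-01 10:00:00', 21.4),
       (17, '2013-05-01 10:00:10', 21.5),
       (17, '2013-05-01 10:00:20', 21.6);

-- Or bulk-load a file of accumulated rows in one statement:
COPY measurements (station_id, recorded_at, value)
FROM '/var/spool/station17/batch-0001.csv' WITH CSV;
```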
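
And this is roughly what I have in mind for the partitioning question. PostgreSQL 8.4 has no declarative partitioning, so it would be inheritance-based partitioning by date, sketched here with the same hypothetical table names:

```sql
-- Parent table holds no data; one child table per month, constrained by a
-- CHECK so that constraint_exclusion can skip irrelevant months in queries.
CREATE TABLE measurements (
    station_id  integer    NOT NULL,
    recorded_at timestamp  NOT NULL,
    value       numeric
);

CREATE TABLE measurements_2013_05 (
    CHECK (recorded_at >= DATE '2013-05-01' AND recorded_at < DATE '2013-06-01')
) INHERITS (measurements);

CREATE INDEX measurements_2013_05_recorded_at_idx
    ON measurements_2013_05 (recorded_at);

-- Old months could then be dropped instantly instead of DELETEd row by row:
-- DROP TABLE measurements_2013_02;
```

The appeal for us is that dropping a whole month's child table avoids the mass DELETEs (and the resulting bloat) we do today, but I am not sure whether the extra routing logic for inserts is worth it at our insert rates.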