Due to Data Loss Prevention (DLP) requirements, some companies are obliged to keep records of the traffic that enters and leaves their networks. The main issue nowadays is that it is almost impossible to store all this traffic. For example, let's assume a scenario with a 1 Gbit/s inbound/outbound link: over the course of a year this will generate something like 6,000 terabytes of traffic. Obviously, very expensive storage is needed.
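A quick back-of-the-envelope calculation shows where a figure of that magnitude comes from. This is only a sketch, assuming a fully saturated link counted in both directions; real utilization is lower, but the order of magnitude holds:

```python
# Rough estimate of raw capture volume on a 1 Gbit/s link
# (assumption: full saturation, inbound + outbound counted together).
LINK_GBPS = 1                     # nominal link speed, Gbit/s
DIRECTIONS = 2                    # inbound + outbound
SECONDS_PER_DAY = 60 * 60 * 24

bytes_per_second = LINK_GBPS * DIRECTIONS * 1e9 / 8   # Gbit/s -> bytes/s
tb_per_day = bytes_per_second * SECONDS_PER_DAY / 1e12
tb_per_year = tb_per_day * 365

print(f"~{tb_per_day:.1f} TB/day, ~{tb_per_year:.0f} TB/year")
# ~21.6 TB/day, ~7884 TB/year at full saturation; at more realistic
# utilization the result lands in the same ballpark as the 6,000 TB above.
```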
The point is:
Do we really need to store all this traffic? I don't think so.
The main point here is to extract only the valuable information from the traffic and then store it. Some technologies available today (all DPI-based) use complex data-filtering algorithms that, instead of storing all layers of information, extract the necessary content and associate it with the required metadata. This reduces storage and also reduces processing.
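To make the idea concrete, here is a minimal sketch of that content-plus-metadata approach. It does not represent any specific DPI product; the record fields and the filter policy are purely illustrative assumptions:

```python
# Sketch: instead of storing whole packets, keep only the extracted content
# plus the metadata needed to make it useful later. All names are illustrative.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class StoredRecord:
    src_ip: str        # metadata kept alongside the content
    dst_ip: str
    timestamp: datetime
    protocol: str
    content: bytes     # only the payload judged valuable, not the full capture

def reduce_packet(src_ip: str, dst_ip: str, protocol: str,
                  payload: bytes) -> Optional[StoredRecord]:
    """Drop uninteresting traffic entirely; keep content + metadata otherwise."""
    if protocol not in ("HTTP", "SMTP"):   # illustrative filter policy
        return None                        # nothing stored for this packet
    return StoredRecord(src_ip, dst_ip,
                        datetime.now(timezone.utc), protocol, payload)
```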
Let's take HTTP traffic as an example.
HTTP is used by many different applications (browsers, editors, generators, etc.), but what really matters is the content itself, not all the data that comes along with it. So, to reduce storage, it is necessary to extract this content, separating it from the application-level information that is of lesser interest, and keep it along with just the necessary metadata (source IP, destination IP, time, etc.).
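As a rough illustration of that separation, the sketch below (standard library only, assuming the HTTP message has already been reassembled from the TCP stream) splits the content from the application-level headers and keeps only the body plus a few metadata fields:

```python
# Sketch: separate the HTTP content from the application-level chatter and
# keep only what is worth storing. Field names and inputs are illustrative.
def extract_http(raw_message: bytes, src_ip: str, dst_ip: str,
                 timestamp: str) -> dict:
    head, _, body = raw_message.partition(b"\r\n\r\n")
    header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = dict(line.split(": ", 1) for line in header_lines[1:] if ": " in line)
    return {
        # the metadata we actually need later
        "src_ip": src_ip,
        "dst_ip": dst_ip,
        "time": timestamp,
        "content_type": headers.get("Content-Type", "unknown"),
        # the content itself; everything else (user agents, cookies,
        # caching headers, ...) is discarded
        "content": body,
    }

# Usage: a small HTTP response reduced to content + metadata
sample = (b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n"
          b"Server: example\r\n\r\nconfidential report ...")
record = extract_http(sample, "10.0.0.5", "192.0.2.10", "2011-06-01T12:00:00Z")
print(record["content_type"], len(record["content"]))
```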
Beyond financial and military institutions, every company that deals with highly confidential data or classified customer information will be required to comply with Data Loss Prevention requirements.
Conclusion
In a DLP project it is always very important to understand which type of traffic is being generated in the network. Understand exactly what needs to be stored (don't forget that you need to remain compliant with regulations) and for how long this data needs to be retained, and then finally start to specify the solution you need. This can save time, problems, and money.