Streaming SQL Approaches Insist in Ignoring Causality by PatternStorm


 

Tim Bass
09-05-2008 07:25 AM
The following excellent discussion is reposted from "Streaming SQL approaches insist in ignoring causality" by PatternStorm.

The recent paper “Towards a Streaming SQL Standard” by Oracle and StreamBase unifies and generalizes two different execution models of Streaming SQL: Oracle’s and StreamBase’s.

While it’s true that the generalization succeeds in overcoming the inability of both execution models to produce correct results for astonishingly simple queries (which is evidence of the actual limitations of these two Streaming SQL languages), it is also true that the generalization is closer to being overly complex than to being natural and intuitive.

The root cause behind the actual limitations of these two Streaming SQL languages is that their execution models “hardcode” the way events can be related to each other: in the Oracle case events are partially ordered by timestamp, in the StreamBase case events are totally ordered by time of arrival. These design decisions (natural in a stream-oriented language) have strong implications for which queries can be answered correctly, particularly when those queries involve joins of derived streams.
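To make the difference between the two orderings concrete, here is a minimal Python sketch. It is not Oracle's or StreamBase's implementation; the `Event` fields and both function names are hypothetical, chosen only for illustration. Under a timestamp-based partial order, events that share a timestamp form one unordered batch; under an arrival-based total order, every event has exactly one position in the sequence.

```python
from itertools import groupby
from typing import FrozenSet, List, NamedTuple


class Event(NamedTuple):
    arrival: int    # position in arrival order (hypothetical field)
    timestamp: int  # application timestamp (hypothetical field)
    name: str


def timestamp_batches(events: List[Event]) -> List[FrozenSet[str]]:
    """Partial order: events with the same timestamp are mutually unordered."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return [
        frozenset(e.name for e in group)
        for _, group in groupby(ordered, key=lambda e: e.timestamp)
    ]


def arrival_sequence(events: List[Event]) -> List[str]:
    """Total order: each event is processed alone, in order of arrival."""
    return [e.name for e in sorted(events, key=lambda e: e.arrival)]


events = [
    Event(0, 10, "count@10"),
    Event(1, 10, "avg@10"),
    Event(2, 11, "avg@11"),
    Event(3, 11, "count@11"),
]
print(timestamp_batches(events))
print(arrival_sequence(events))
```

In the first view a join sees both derived events of a time interval together; in the second it sees them one at a time, in whatever order they happened to arrive.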

The generalization, of course, mainly consists of providing a new operator, SPREAD, that allows the user to establish custom ordering relationships among the events. This is good news, but it takes us to the fundamental issue: event processing cannot be reduced to stream processing, that is, to the processing of events that are totally or partially ordered by a pre-defined relationship (as the current Oracle and StreamBase implementations do). On the contrary, no particular ordering can be assumed, because the user needs to be able to order the events in different ways in order to solve different problems. This is what event processing is about, and the paper provides evidence that Streaming SQL approaches have found the need to move in that direction and are having trouble along the way.

For instance, one of the queries used in the paper as an example of a query that StreamBase cannot solve (but Oracle can) is the following: correlate the stream that contains the total number of cars on the road for each time interval with the stream that contains the average speed of the cars on the road for each time interval, in order to detect the situation where the average speed is below 45 and the total number of cars is two or more.

This query can be solved very easily, and more robustly, if you order the events by causality rather than by time. That is, you have each position report update the average speed stream and the total number of cars stream, and you causally relate each position report to the new average speed event and the new total number of cars event that it generates. The query is then just a matter of detecting all position reports that are causally related both to an average speed event below 45 and to a total number of cars event of two or more. Notice that this approach is more robust than Oracle’s time-based one because it works without requiring the derived streams to be synchronized with the position report stream.
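The causal approach described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not code from the paper or from any product: the `Event` class, the `cause` field, and the functions `process` and `congested_reports` are hypothetical names, time intervals are ignored, and the average is taken over the latest reported speed of each car.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    event_id: int
    kind: str                    # "report", "avg_speed" or "car_count"
    value: float
    cause: Optional[int] = None  # event_id of the report that caused this event


def process(reports: List[Tuple[str, float]]) -> List[Event]:
    """Turn (car_id, speed) reports into events; derived events link to their cause."""
    events: List[Event] = []
    latest_speed: Dict[str, float] = {}
    next_id = 0
    for car_id, speed in reports:
        report = Event(next_id, "report", speed)
        latest_speed[car_id] = speed
        avg = sum(latest_speed.values()) / len(latest_speed)
        # Both derived events record the report that generated them.
        avg_event = Event(next_id + 1, "avg_speed", avg, cause=report.event_id)
        count_event = Event(next_id + 2, "car_count", len(latest_speed),
                            cause=report.event_id)
        events += [report, avg_event, count_event]
        next_id += 3
    return events


def congested_reports(events: List[Event]) -> List[Event]:
    """Reports causally related to an average speed < 45 and a car count >= 2."""
    caused: Dict[int, Dict[str, float]] = {}
    for e in events:
        if e.cause is not None:
            caused.setdefault(e.cause, {})[e.kind] = e.value
    return [
        e for e in events
        if e.kind == "report"
        and caused.get(e.event_id, {}).get("avg_speed", float("inf")) < 45
        and caused.get(e.event_id, {}).get("car_count", 0) >= 2
    ]


events = process([("a", 60.0), ("b", 20.0), ("a", 70.0)])
print(congested_reports(events))
```

Because the match follows the `cause` links rather than timestamps or arrival positions, shuffling the event list does not change which reports are detected.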

Conclusions:
  • Event Processing is a generalization of Stream Processing (as the paper shows).
  • Event Processing requires giving the user the ability to create custom relationships among events and then define patterns/queries using those custom relationships.
  • Causality is, more often than not, a more robust and easier criterion for ordering events than time or order of arrival.
  • Event Processing Languages should support causality.
Regards,
PatternStorm


