Louis Lovas
Mon, 05 Nov 2007 20:35:07 -0500
Bending the Nail
<span style='font-size:10.0pt;font-family:Verdana'>In my recent blogs (When all you have is a hammer everything looks like a nail and <a href="http://apama.typepad.com/my_weblog/2007/10/hitting-the-nai.html">Hitting the nail on the head</a>) one could conclude that I've been overly inflammatory toward SQL-based CEP products. I really have no intention to be seditious, just simply factual. I've been building and designing software far too long to have an emotional response to newly hyped products or platforms. Witnessing both my own handiwork and many compelling technologies fade from glory all too soon has steeled me against any fervent reactions. I've always thought </span><span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'>the software business is for the young. As with most professions, one gets hardened by years of experience, some of it good and some of it only painful lessons. Nonetheless, over time that skill, knowledge and experience condenses. The desire to share that (hard-won) wisdom is all too often futile. The incognizant young are too busy repeating the same mistakes to take notice. Funny thing is ... wisdom is an affliction that inevitably strikes us all. </span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'> </span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'>Well, enough of the philosophical meanderings; just the facts please ... </span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'> </span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'>In a recent blog, I explored the need for CEP-based applications to manage state. As a representative example, I used the algo-trading case of managing <i style='mso-bidi-font-style:normal'>Orders-in-Market</i>. The need both to process incoming market data and to take care of Orders placed is paramount to the goals of algorithmic trading. I'll delve a bit deeper into the state management requirement, but this time focusing on the management of complex market data, the input, if you will, to the algorithmic engine.</span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'> </span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'>Aggregation of market data is a trend emerging across all asset classes in Capital Markets. Simply put, Aggregation is the process of collecting and ordering quote data (bids and asks) from multiple sources of liquidity into a consolidated Order Book. In the end, this is a classic sort/merge problem. Incoming quotes are dissected and inserted into a cache organized by symbol, Exchange and/or market maker, and sorted by Bid and Ask prices. Aggregation of market data is applicable to many asset classes (e.g. Equities, Foreign Exchange and Fixed Income). The providers of liquidity in any asset class share a number of common constructs but an equal number of unique oddities. For the Aggregation engine, there are also common requirements (e.g. sorting/merging) and a few unique nuances. It's the role of the Aggregation engine to understand each provider's subtleties and normalize them for the consuming audience. For example, different Equities Exchanges (or Banks providing FX liquidity) can use slightly different symbol naming conventions. Likewise, transaction costs can (or should) have an influence on the quote prices. Many FX providers put a time-to-live (TTL) on their streaming quotes, which implies the Aggregation engine has to handle price expirations (and subsequently eject the expired prices from its cache). In the event of a network (or other) disconnection, the cache must be cleansed of that provider's (now stale) prices. The Aggregation engine must account for these (and a host of other needs) since its role is to provide a single unified view of an Order Book to trading applications.</span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'> </span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'>The trading applications can be on both sides of the equation. A typical Buy-side consumer is a Best Execution <span class=SpellE>algo</span>. Client orders or Prop desk orders are filled by sweeping the aggregate book from the top. For Market Makers, aggregation can be the basis for a Request <span class=GramE>For</span> Quote (RFQ) system.</span>
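To make the Aggregation duties described above concrete, here is a minimal Python sketch, not a production design and not any vendor's API. The names Quote, Aggregator, symbol_map, expire and on_disconnect are all hypothetical, and it assumes each provider has at most one live quote per symbol and side. It shows symbol normalization, TTL expiry, purging a disconnected provider, and a sorted consolidated view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    provider: str      # liquidity source, e.g. a bank or an exchange
    symbol: str        # the provider's own spelling of the symbol
    side: str          # "bid" or "ask"
    price: float
    quantity: int
    expires_at: float  # absolute time after which the quote is stale (TTL)


class Aggregator:
    """Hypothetical consolidated quote cache (illustrative only)."""

    def __init__(self, symbol_map=None):
        # (provider, provider symbol) -> normalized symbol; assumed mapping
        self.symbol_map = symbol_map or {}
        # (normalized symbol, side, provider) -> latest quote from that provider
        self.quotes = {}

    def on_quote(self, q):
        # Normalize the provider's symbol, then replace its previous quote.
        symbol = self.symbol_map.get((q.provider, q.symbol), q.symbol)
        self.quotes[(symbol, q.side, q.provider)] = q

    def expire(self, now):
        # Eject every quote whose time-to-live has run out.
        stale = [k for k, q in self.quotes.items() if q.expires_at <= now]
        for k in stale:
            del self.quotes[k]

    def on_disconnect(self, provider):
        # A disconnected provider's prices are stale by definition.
        for k in [k for k in self.quotes if k[2] == provider]:
            del self.quotes[k]

    def book(self, symbol, side):
        # Best bid is the highest price; best ask is the lowest.
        live = [q for (s, sd, _), q in self.quotes.items()
                if s == symbol and sd == side]
        return sorted(live, key=lambda q: q.price, reverse=(side == "bid"))
```

A Best Execution algo would sweep the list returned by book() from the front; transaction costs are left out here and would need to adjust the sort key.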
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'> </span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'>At first glance, one would expect that SQL-based CEP engines would be able to handle this use-case effectively. After all, sorting and merging (joining) is a common usage of SQL in the database world, and streaming SQL does provide Join and Gather type operators. However, the complexities of an Aggregation model quickly outstrip the use of SQL as an efficient means of implementation. The model requires managing/caching a complex multi-dimensional data structure. For each symbol, multiple arrays of a price structure are necessary, one for the bid side and another for the ask side. Each element in the price structure would include the total quantity available at this price and a list of Providers. Each Provider entry, in turn, ends up being a complex structure in itself since it would include any symbol mapping, transaction costing, expiration and connectivity information. At the top level of the aggregate book would be a summation of the total volume available (per symbol, of course). Algos more interested in complete order fulfillment (i.e. fill-or-kill) would want this summary view. </span>
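The nested structure described above can be sketched in a general-purpose language as follows. This is an illustrative Python sketch only; the class and field names (ProviderEntry, PriceLevel, AggregatedBook, cost_per_unit and so on) are hypothetical, and matching price levels by exact float equality is a simplification that a real book would replace with ticks or decimals.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderEntry:
    name: str
    native_symbol: str     # symbol mapping back to the provider's naming
    quantity: int
    cost_per_unit: float   # transaction costing (assumed representation)
    expires_at: float      # expiration
    connected: bool = True # connectivity information


@dataclass
class PriceLevel:
    price: float
    providers: list = field(default_factory=list)

    @property
    def total_quantity(self):
        # Total quantity available at this price across all providers.
        return sum(p.quantity for p in self.providers)


@dataclass
class AggregatedBook:
    symbol: str
    bids: list = field(default_factory=list)  # highest price first
    asks: list = field(default_factory=list)  # lowest price first

    def add(self, side, price, entry):
        levels = self.bids if side == "bid" else self.asks
        for level in levels:
            if level.price == price:
                level.providers.append(entry)
                return
        levels.append(PriceLevel(price, [entry]))
        levels.sort(key=lambda lvl: lvl.price, reverse=(side == "bid"))

    def total_volume(self, side):
        # Summary view for the whole side, e.g. for a fill-or-kill check.
        levels = self.bids if side == "bid" else self.asks
        return sum(lvl.total_quantity for lvl in levels)
```

Here each level of nesting is an ordinary object, so the per-price list of providers and the per-symbol summary sit in one structure rather than being spread across several tables.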
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'> </span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'>Using stream SQL to attempt to accomplish this would mean flattening this logical multi-dimensional object into the row/column format of a SQL table. SQL tables can contain only scalar values; multidimensionality can only be achieved by employing multiple tables. I don't mean to imply this is undesirable or illogical. Initially it seems like a natural fit. However, an Aggregated Book is more than just its structure; as I mentioned above, it also involves a wealth of processing logic. In the end one would be bending the SQL language to perform unnatural acts in any attempt to implement this complex use-case.</span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'> </span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'>To illustrate an unnatural act, here's a very simple streamSQL example. The purpose of this bit of code is to increment an integer counter (<i style='mso-bidi-font-style:normal'>TradeID = TradeID + 1</i>) on receipt of every tick (<span class=SpellE>TradesIn</span>) event and produce a new output stream of ticks (Trades_with_ID) that now includes that integer counter, a trade identifier of sorts.</span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'> </span>
-- Input stream of trade ticks
CREATE INPUT STREAM TradesIn (
    Symbol string(5),
    Volume int,
    Price double
);
-- Single-row memory table that holds the counter
CREATE MEMORY TABLE TradeIDTable (
    TradeID int,
    RowPointer int,
    PRIMARY KEY(RowPointer) USING btree
);
CREATE STREAM Trades_with_ID;
-- Every tick targets the same key (RowPointer = 1), so after the first insert each tick takes the duplicate-key path and increments TradeID
INSERT INTO TradeIDTable (RowPointer, TradeID)
SELECT 1 AS RowPointer, 0 AS TradeID
FROM TradesIn
ON DUPLICATE KEY UPDATE
    TradeID = TradeID + 1
RETURNING TradesIn.Symbol AS Symbol, TradesIn.Volume AS Volume, TradesIn.Price AS Price, TradeIDTable.TradeID AS TradeID
INTO Trades_with_ID;
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'> </span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'>The <i style='mso-bidi-font-style:normal'>state</i> to manage and the processing logic in this small stream SQL snippet amount to no more than incrementing an integer counter (i.e. <i style='mso-bidi-font-style:normal'>i = i + 1</i>). In order to accomplish this very simple task, a memory table (<span style='color:navy'>TradeIDTable</span>) is used to INSERT and then SELECT a single row (<span style='font-size:9.0pt;font-family:"Courier New";color:navy'>1 AS <span class=SpellE>RowPointer</span></span>) that contains that incrementing integer (<span style='font-size:9.0pt;font-family:"Courier New";color:navy'>ON DUPLICATE KEY UPDATE TradeID = TradeID + 1</span>) when a new TradesIn event is received. In a way, a rather creative use of SQL, don't you think? However, simply extrapolate the state requirements beyond <span style='color:navy'>TradeID int</span> and the processing logic beyond <span style='color:navy'>TradeID = TradeID + 1</span> and you quickly realize you would be bending the language to the point of breaking.</span>
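For contrast, here is roughly what the same counter looks like in a general-purpose imperative language. This is a hedged Python sketch rather than an EPL, the function name with_trade_ids is hypothetical, and starting the counter at 0 is an assumption based on the 0 AS TradeID in the initial insert above.

```python
def with_trade_ids(trades_in, start=0):
    """Yield each incoming tick with an incrementing TradeID attached.

    trades_in is any iterable of dicts with Symbol, Volume and Price keys.
    The counter is ordinary local state; no table is needed to hold it.
    """
    for trade_id, trade in enumerate(trades_in, start):
        yield {**trade, "TradeID": trade_id}


ticks = [
    {"Symbol": "IBM", "Volume": 100, "Price": 113.50},
    {"Symbol": "MSFT", "Volume": 200, "Price": 36.75},
]
trades_with_id = list(with_trade_ids(ticks))
```

The state here is a single integer held by the loop, which is the point of the comparison: the logic is i = i + 1 and the code says little more than that.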
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'> </span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'>In the commercial application world, relational databases are an entrenched and integral component. SQL is the language for applications to interact with those databases. As applications have grown in complexity, the data needs have also grown in complexity. One outgrowth of this is a new breed of application service known as Object-Relational (O/R) mapping. O/R mapping technologies have emerged to bridge the <a href="http://en.wikipedia.org/wiki/Object-Relational_impedance_mismatch">impedance mismatch</a> between an application's object view of data and SQL's flat two-dimensional view. A wealth of <a href="http://en.wikipedia.org/wiki/List_of_object-relational_mapping_software">O/R products</a> are available today, so the need for such technologies clearly exists. </span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'> </span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'>Why am I mentioning O/R technologies in a CEP blog? <span class=GramE>Simply to emphasize the point that the SQL language, as validated by the very existence of O/R technologies in the commercial space, is a poor choice for CEP applications.</span> As I've mentioned in previous <span class=SpellE>blogs</span>, programming languages that provide the vernacular to express both complex structures (objects) and complex semantics (programming logic) are as necessary for Aggregation as they are for Orders-in-Market or any CEP application. </span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'> </span>
<span style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'>So what sort of language is appropriate for CEP? Well, there is always the choice of Java or C++. </span>
Traditional languages such as Java and C++ clearly provide this expressiveness and can be used to build applications in any domain. However, trailing along behind that expressiveness is also risk. Using these languages means you start an application's implementation at the bottom rung of the ladder. The risk associated with this is evident in many a failed project. A step up is Domain-specific languages. For the domain of streaming data, Event Programming Languages (<span class=SpellE>EPLs</span>) are clear winners. Like C++ and Java, they contain syntax for defining complex objects (like an Aggregated Order Book) and for imperative execution, but they also include a number of purpose-built declarative constructs specifically designed to process streaming data efficiently. <a href="http://www.progress.com/apama/apama_esp/index.ssp">Apama's MonitorScript</a> is one such EPL.