Relational Grid

Over the years working in risk systems for banks, the structure of what we have to do has always been the same.

  1. Load the input data for your calculations: "Give me all of the details of all of the trades that belong to book 'ABC'"
  2. Based on that input data, generate what you want to do: "For each of the following scenarios ('FTSE20%UP', 'FTSE20%DOWN', ...), perturb the market data appropriately and value the book"
  3. Split the stuff that you want to do into lots of separate "tasks": (e.g. Value trade '101452' for scenario 'FTSE20%UP')
  4. Farm out those tasks onto a "grid" of computers (thousands of them) so that they can be executed in parallel
  5. Store the results in a reporting database
  6. Generate reports based on what you got from the reporting database

There's third party software readily available to do step 4 for the most part.

But what I presented for steps 1, 2 and 3 are just examples. Usually, they are much more complicated - and people are always coming up with new stuff - due to changing market conditions, new trades, new types of market data (and hence new ways of "perturbing" them). All this means that you usually require some programmer to write code (and they usually have to write a new report at the end too). Because all of this takes too long, the trader usually just doesn't bother - and has to make do with some ad-hoc measure instead. There are only so many hours a day to perform "ad hoc" reports, and "what if" analysis.

If only you could drive the whole thing backwards from step 6, and have it automatically figure out steps 1 through 5...

That would dramatically decrease the turnaround time (merely a few days to generate some new risk measure or new scenario report).

That's what I'm working on at the moment... and I honestly think it's the coolest idea I've come up with in a long time (who wants to hear false modesty anyway!).

So you just type in something like:

select tradeId, tradevalue.value, scenario.id
from trades, tradevalue, scenario, marketdata
where trades.bookId in ('ABC')
and tradevalue.trade_xml = trades.trade_xml
and scenarios.market = marketdata.market
and marketdata.trade_xml = trades.trade_xml
and marketdata.valuation_date = 'today'
and tradevalue.market = scenarios.perturbed_market
and scenario.id in ('FTSE20%UP', 'FTSE20%DOWN')

and it would automatically execute your query on the grid of computers for you.

The trick is, many of the above "tables" aren't really tables - they're grid tasks. The "tradevalue" and "scenarios" tables above are not real tables in a database - it just looks that way. In reality, for each new "trade_xml, market" input combination, a new grid task is created. One way of looking at the "grid tasks" is that they are infinitely big tables - they have an infinite number of rows. Each "row" in that table is an execution of a grid task. But we always use them by "joining" to other tasks, in effect specifying the inputs to those tasks.

Also, just like how relational databases allow you to define "views" for commonly used sub-queries, and as an abstraction layer, you can compose grid tasks out of subtasks in a similar way. For example, if you were using montecarlo techniques to value your trades, your "tradevalue" task might really be another "join" and "group" operation over hundreds of different "sample paths":

define view tradevalue as
select average(pv)
from tradevalue_path
where tradevalue_path.seedpath > 1 and tradevalue_path.seedpath < 1000

(the above "view" is still infinite because we have not specified trade_xml yet).

By providing a library of "primitive" grid tasks, users can then write their own custom components or "views" and compose them together to quickly generate the sorts of answers they need to see - without needing to go through a 2 or 3 month release cycle while the new reports and "grid code" are written.

Of course, you wouldn't actually have to use SQL to define all this - any system for specifying joins, projections, grouping etc. will do. SQL is a bit too high level actually. I've used an XML format instead. But writing some sort of "SQL parser" on top wouldn't be hard. The hard part is the algorithm to get from your "relational algebra" representation to individual tasks to be performed on your grid.

What do you think? Comments and suggestions welcome.

This page last modified on