Excel-DNA: Three Stories

Intro: Simulation

A couple of days ago I found a spreadsheet, potentially quite an interesting one. In theory, it should run a simple simulation (50000 paths by default) written in VBA - I’d say it’s several minutes of work in the worst case. However, it took slightly more. One working day, to be precise. I still have no idea what it tried to do, why it ate 97% of CPU and even what exactly it computed, because all that ended with a weird error and crashed Excel. The code is of unknown origin and protected, so no way to check it and I just gave up.

[Originally posted here.]

Now think about an average Excel user. He/she doesn’t care about how exactly it works, in which language written or how difficult it was for you to implement. What is important then?

  • functionality: everything what comes to mind can become a function;

  • simplicity: nobody wants to write VBA (and usually anything else too), but it’s always nice to have some useful UDFs at hand;

  • reliability: you have a well-tested library already, so why not to call it instead of rewriting in a poor language?

  • performance: why wait for a day when there’re highly optimized libraries and distributed computing is already invented?

For those who hasn’t tried it yet - check Excel-DNA. It’s an open-source project, which allows to integrate .NET into Excel. And yes, actually not only .NET ;) Documentation, samples, links to related posts can be found on the project page.

Spreadsheet

 

Keep Simple Things Simple

Let’s take log-linear interpolation as an example. Simple? Sure. We want to keep calculations clear in case if someone’d like to modify it - took 9 rows for me: functions like OFFSET and MATCH are not very friendly. And you still need to remember about sorting, extrapolation, duplicate values. What if there’re thousands of points?

Instead we call a UDF, which does all the work, it behaves in a specified way and can be reused in the future. I won’t describe how to create an addin - for more information check the project homepage. Briefly, in addition to a standard F# project we need a reference to ExcelDna.Integration.dll, interpolation.dna file with a reference to our library and a copy of ExcelDna.xll (x64 version is also available), renamed to interpolation.xll.

The future UDF has ExcelFunction attribute 1. For consistency with Excel function names it’s called LOGLINEAR. You can also add a description, category and so on. All parameters and output here are float arrays.

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16:   

17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
open ExcelDna.Integration

[<AutoOpen>]
module Interpolation =

    let private interpolate curve (point: float) =
        let len = Array.length curve
        let u =
            if point < fst curve.[0] then 1
            elif point > fst curve.[len-1] then len-1
            else Array.findIndex (fun (v, _) -> point < v) curve
        let (xu, yu), (xd, yd) = curve.[u], curve.[u-1]
        let lnp = log yd + (log yu - log yd) / (xu - xd) * (point - xd)
        exp lnp

    [<ExcelFunction(Name="LOGLINEAR", Description="Log-Linear Interpolation",
Category="Custom", IsThreadSafe = true)>]
    let loglinear (xs: _[]) (ys: _[]) points =
        let curve =
            Seq.zip xs ys
            |> Seq.distinctBy fst
            |> Seq.sort
            |> Seq.toArray

        if curve.Length < 2 then failwith "at least 2 points are required"
        Array.map (interpolate curve) points

After loading the add-in in Excel (double-click on interpolation.xll in output folder), you can just type LOGLINEAR and see the results!2.

Excel-Interpolation

 

Make Complex Things Simple

Well, it’s cool, but I still can do this interpolation by hands - and it’ll work, you don’t need to be super-smart to do a couple of subtractions and multiplications without mistakes. But things are getting more interesting when a bunch of standard functions is not enough.

How about machine learning with Excel? Of course, it can be handy only for experiments with adequately small amounts of data. But for sure, not something I’d want to write in VBA.

A lot of my fellow traders use random forests for feature selection (it is actually what I like about RF most) before feeding the data into actual models, neural nets or anything else. So let’s take a look at rtp addin, which can help to find important features and get rid of useless (and potentially harmful?) ones. Important point: this example is for demo purposes only! I took some Yahoo stock prices - it’s not that easy to get good data for free, so can’t call it realistic; the same works for features3. If you want a bit more for free, there’re also two old kaggle competitions: Benchmark Bond Trade Price Challenge  and Algorithmic Trading Challenge  (the winners used RF).

Anyway, we are lucky because all data is numeric. Even more than that - we have F# R Type Provider and can use R’s randomForest package!

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
open RProvider
open RProvider.``base``
open RProvider.randomForest

open ExcelDna.Integration

[<ExcelFunction(Name = "IMPORTANCE", 
    Description = "Variable importance measures as produced by R's randomForest ")>]
let importance (names: obj[]) (values: float[,]) (ys: float[]) =
    let cols = min names.Length (values.GetLength 1)
    let rows = min ys.Length (values.GetLength 0)

    let xs = 
        names
        |> Seq.take cols
        |> Seq.mapi (fun j name ->
            string name, Array.init rows (fun i -> values.[i, j]))
        |> namedParams
        |> R.data_frame            

    let rf = R.randomForest(xs, ys)
    (R.importance rf).Value

Look at the R-Importance tab: we call the function {=IMPORTANCE(Names,Data,Result)} and get a nice set of values - the greater the value, the greener and more important it is. For example, P5 and P6 (close prices for day-5 and day-6 respectively) seem to be not very useful.

Spreadsheet

Adding features is pretty straightforward too. Say, we want to calculate Simple Moving Average with different offsets:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
[<ExcelFunction(Name = "SMA", Description="Simple Moving Average")>]
let sma (values: float[]) (nobj: float) =
    let n = int nobj
    let len = values.Length
    if len < n || n < 2 then box ExcelError.ExcelErrorNA
    else 
        let res = Array2D.zeroCreate len 1

        Seq.windowed n values
        |> Seq.map Seq.average
        |> Seq.iteri (fun i v -> res.[i+n-1, 0] <- box v)
        resize res

The interesting thing here is resize function: it allows you not to select the whole output range when calling a function, but just type =SMA(Values,5) in the top cell and result array is automatically resized. The code of these examples is available on github.

Excel-DNA makes it possible to bring all .NET power to the spreadsheets. Just try it ^_^

 

 

  1. see interpolation project here.

  2. as a reminder, array formulae are entered with Ctrl+Shift+Enter.

  3. and also my experience has nothing to do with trading.

How to Name a Cat

A fun conversation about mountains, monads, types and heels reminded me about the first University years and one of our favourite math jo...… Continue reading

Keywords Mix

Published on March 15, 2016

Traditions vs Statistics: Sechseläuten

Published on April 25, 2015