# Trying out Deedle with Bones and Regression

## November 02, 2013

I usually don’t need to run regressions, but lately they keep chasing me. It started with the Asset Pricing class and several variations of returns regressions (I signed up to look at familiar things from a different point of view, and I definitely succeeded: have you ever thought about drawing the returns, prices and discount factors in space, all at once? [1]). But I ‘cheated’ and completed the assignments with R.

[Originally posted here.]

That was only the beginning, though: my cousin, an MD student, was measuring the deflection of bones and other samples under different loads. This time I decided to try out Deedle and help her explore the data.

Hint for Mono users: if XS doesn’t load the main Deedle project, you can manually update the fsproj file (delete the reference to FSharp.Core), reload it, and add references to the Math.NET and FSharp.Data libs. After that it’ll work nicely.

    #r "FSharp.Data.dll"
    #load "Deedle.fsx"
    open System
    open Deedle

    // load data from a csv file
    let ds = Frame.ReadCsv(__SOURCE_DIRECTORY__ + "/deflection.csv")

    val ds : Frame<int,string> =
         p    sample type length width height deflection
    0 -> 0.98 bone        47     7.3   4.4    0.06
    1 -> 1.96 bone        47     7.3   4.4    0.12

Then we checked out some properties of the different sample groups, say, the average sample deformation. For simplicity we’ll use only the bones group and the load (“p”) and deflection columns.

    // choose several columns and group the data by sample type
    let bySample =
        ds.Columns.[["sample type"; "p"; "deflection"]]
        |> Frame.groupRowsByString "sample type"

    val bySample : Frame<(string * int),string> =
                      sample type p    deflection
    bone      0 ->    bone        0.98 0.06
              1 ->    bone        1.96 0.12
    ...       ...     ...         ...
    duralumin 6 ->    duralumin   0.98 0.03
    ...       ...     ...         ...

    // average deflection by sample type
    bySample.Columns.[["sample type"; "deflection"]]
    |> Frame.meanLevel Pair.get1Of2

    val it : Frame<string,string> =
                 deflection
    bone      -> 0.228333333333333
    duralumin -> 0.116666666666667
    ...       -> ...

    // select the data for bones
    let bones = (Frame.nest bySample).["bone"]

    val bones : Frame<int,string> =
          sample type p    deflection
    0  -> bone        0.98 0.06
    1  -> bone        1.96 0.12
    ...-> ...         ...  ...
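If the grouping step feels a bit magical, here is roughly what "mean per group" boils down to, written with plain F# lists instead of Deedle. This is only a sketch: the `rows` values are made up for illustration and are not the lab data, and `meanBySample` is a hypothetical name.

```fsharp
// Hypothetical (sample type, deflection) pairs; not the real lab data.
let rows =
    [ "bone", 0.06
      "bone", 0.12
      "duralumin", 0.03
      "duralumin", 0.05 ]

// Group the rows by sample type, then average the deflections in each group.
let meanBySample =
    rows
    |> List.groupBy fst
    |> List.map (fun (sample, group) -> sample, List.averageBy snd group)

// Print one line per group.
meanBySample
|> List.iter (fun (sample, mean) -> printfn "%s -> %g" sample mean)
```

Deedle does the same kind of thing on the first level of the hierarchical row key, which is what `Pair.get1Of2` selects.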

You may notice that some of the values in the table are missing (the handwriting can be completely unparsable!). By default they are omitted, but we can always specify how we want this data to be filled, using Direction or a custom function.

    // note that there're missing values in this dataset
    let deflections = bones?deflection

    val deflections : Series<int,float> =
    0  -> 0.06
    1  -> 0.12
    ...-> ...
    12 -> <missing>

    Series.mean deflections
    val it : float = 0.2283333333

    // omit missing values
    deflections |> Series.dropMissing |> Series.mean
    val it : float = 0.2283333333

    // fill missing values by copying forward
    deflections |> Series.fillMissing Direction.Forward |> Series.mean
    val it : float = 0.2542857143
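To see why the two means differ, here is a small Deedle-free sketch of both strategies, with `None` standing in for a missing value. The `readings` list is invented for the example, and `carry` and `fillForward` are hypothetical helpers, not Deedle functions.

```fsharp
// Hypothetical readings; None marks a value that could not be read.
let readings = [ Some 0.06; Some 0.12; None; Some 0.30; None ]

// Strategy 1: drop missing values before averaging.
let meanDropped = readings |> List.choose id |> List.average

// Keep the current value if present, otherwise reuse the last known one.
let carry last current =
    match current with
    | Some _ -> current
    | None -> last

// Strategy 2: copy the last known value forward into each gap.
// List.scan also returns the initial state, so List.tail drops it.
let fillForward values = values |> List.scan carry None |> List.tail

let meanFilled = readings |> fillForward |> List.choose id |> List.average

printfn "dropped: %g, filled forward: %g" meanDropped meanFilled
```

Filling forward repeats some readings, so they get more weight in the average. With measurements that grow with the load, that tends to pull the mean up, which is the direction of the change in the Deedle output above.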

Now let’s check if there’s any relation between the deflection and load. In theory, it’s supposed to be linear and we’re going to test that with a linear regression.

    /// Find slope and intercept with linear regression
    let linearRegression xs ys = (...)

    // drop rows with missing values
    let bonesreg = Frame.dropSparseRows bones

    val bonesreg : Frame<int,string> =
           sample type p    deflection
    0   -> bone        0.98 0.06
    ... -> ...         ...  ...
    5   -> bone        5.88 0.41

    let load = Series.values bonesreg?p
    let defl = Series.values bonesreg?deflection
    let slope, intercept = linearRegression load defl

    val slope : float = 0.07142857143
    val intercept : float = -0.01666666667
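For reference, the `linearRegression` function is the textbook closed-form least-squares fit for a single predictor. Writing $x_i$ for the loads, $y_i$ for the deflections and $n$ for the number of rows (my notation, not the post's), the two values it returns are:

```latex
\text{slope} = \frac{n \sum_i x_i y_i - \sum_i x_i \sum_i y_i}
                    {n \sum_i x_i^2 - \left(\sum_i x_i\right)^2},
\qquad
\text{intercept} = \frac{\sum_i y_i - \text{slope} \cdot \sum_i x_i}{n}
```

The five sums (of $x$, $y$, $x^2$, $xy$ and the count) can all be collected in a single pass over the data.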

Does this line make a good fit? Here is a chart with a couple of samples from this dataset.

On the other hand, a classical metric like R^2 can help answer this question too, especially since it’s extremely simple to add a new column to the data frame and perform some operations on it.

The new library is tried out, the lab is completed - everyone is happy ^_^

    bonesreg?prediction <- intercept + slope * bonesreg?p
    bonesreg?residualsq <- bonesreg?prediction - bonesreg?deflection |> Series.mapValues (fun x -> x*x)
    bonesreg

    val it : Frame<int,string> =
          sample type p    deflection prediction         residualsq
    0  -> bone        0.98 0.06       0.0533333333333334 4.44444444444441E-05
    1  -> bone        1.96 0.12       0.123333333333333  1.11111111111111E-05
    ...-> bone        ...  ...        ...                ...

    let sdvs = Frame.sdv bonesreg

    val sdvs : Series<string,float> =
    sample type -> <missing>
    p           -> 1.83341211951923
    deflection  -> 0.131059782796503
    prediction  -> 0.130958008537088
    residualsq  -> 1.72132593164778E-05

    // compute the metrics:
    let rsquare =
        let x = sdvs.["prediction"] / sdvs.["deflection"] in x * x
    val rsquare : float = 0.9984475063

    let df = Frame.countRows bonesreg - 2 |> float
    val df : float = 4.0

    let tvalue = sqrt (rsquare / (1. - rsquare) * df)
    val tvalue : float = 50.71981861

    let se = (Series.sum bonesreg?residualsq) / df |> sqrt
    val se : float = 0.005773502692

Yes, it’s that simple.
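One thing that may look odd is computing R^2 as a squared ratio of standard deviations. For a least-squares line with an intercept, that is the explained variance over the total variance, and it should agree with the more familiar 1 - SS_res / SS_tot. Here is a Deedle-free sketch that checks this on made-up points (the `xs` and `ys` values and all helper names are assumptions for the example, not the lab data):

```fsharp
// Hypothetical (load, deflection) points, roughly linear; not the lab data.
let xs = [ 1.0; 2.0; 3.0; 4.0; 5.0 ]
let ys = [ 0.07; 0.13; 0.22; 0.27; 0.36 ]

// Least-squares fit from the closed-form sums.
let n = float (List.length xs)
let sx, sy = List.sum xs, List.sum ys
let sxx = xs |> List.sumBy (fun x -> x * x)
let sxy = List.map2 (fun x y -> x * y) xs ys |> List.sum
let slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
let intercept = (sy - slope * sx) / n
let predicted = xs |> List.map (fun x -> intercept + slope * x)

// Sum of squared deviations from the mean.
let sumSqDev values =
    let mean = List.average values
    values |> List.sumBy (fun v -> (v - mean) * (v - mean))

// R^2 as a squared ratio of standard deviations (the n - 1 factors cancel).
let rSquareFromSd = sumSqDev predicted / sumSqDev ys

// R^2 from the residuals: 1 - SS_res / SS_tot.
let ssRes = List.map2 (fun y p -> (y - p) * (y - p)) ys predicted |> List.sum
let rSquareFromResiduals = 1.0 - ssRes / sumSqDev ys

// The same follow-up metrics as in the listing above.
let df = n - 2.0
let tvalue = sqrt (rSquareFromSd / (1.0 - rSquareFromSd) * df)
let se = sqrt (ssRes / df)

printfn "R^2 (sd ratio) = %g, R^2 (residuals) = %g" rSquareFromSd rSquareFromResiduals
printfn "t = %g, se = %g" tvalue se
```

The two R^2 values should match up to floating-point noise. The shortcut relies on the fit being ordinary least squares with an intercept; for other kinds of fits the two definitions can differ.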

The body of `linearRegression`, shown collapsed as `(...)` in the regression listing above:

    /// Find slope and intercept with linear regression
    let linearRegression xs ys =
        let x, y, xx, xy, n =
            Seq.zip xs ys
            |> Seq.fold (fun (xsum, ysum, xxsum, xysum, n) (x, y) ->
                xsum+x, ysum+y, xxsum+x*x, xysum+x*y, n+1.) (0.,0.,0.,0.,0.)
        let slope = (n * xy - x * y) / (n * xx - x * x)
        let intercept = (y - slope * x) / n
        slope, intercept
1. To be honest, I still don’t get why anyone would need that; maybe that’s the point where being a PhD helps?
