I rarely need to run a regression, but lately they have been chasing me. It started with the Asset Pricing class and its several variations of returns regressions. (I signed up to look at familiar things from a different point of view, and I definitely succeeded: have you ever thought about drawing returns, prices and discount factors in space, all at once? [1]) I ‘cheated’, though, and completed the assignments with R.

[Originally posted here.]

That was only the beginning, though: my cousin, an MD student, was measuring the deflection of bones and other samples under different loads. This time I decided to try out Deedle and help her explore the data.

Hint for Mono users: if Xamarin Studio (XS) doesn’t load the main Deedle project, you can manually edit the fsproj file (delete the reference to FSharp.Core), reload it, and add references to the Math.NET and FSharp.Data libraries; after that it works nicely.

Let’s start with loading experimental data. No more string splits and manual parsing!

#r "FSharp.Data.dll"
#load "Deedle.fsx"

open System
open Deedle
// load data from a csv file
let ds = Frame.ReadCsv(__SOURCE_DIRECTORY__ + "/deflection.csv")
val ds : Frame<int,string> = 
        p         sample type length<mm> width<mm> height<mm> deflection<mm> 
  0  -> 0.98      bone        47         7.3       4.4        0.06 
  1  -> 1.96      bone        47         7.3       4.4        0.12 

Then we checked some properties of the different sample groups, such as the average deflection per sample type. For simplicity we’ll use only the bones group and the load (“p”) and deflection columns.

// choose several columns and group the data by sample type
let bySample = 
    ds.Columns.[["sample type"; "p"; "deflection<mm>"]] 
    |> Frame.groupRowsByString "sample type"
val bySample : Frame<(string * int),string> =
                  sample type p         deflection<mm> 
  bone      0  -> bone        0.98      0.06           
            1  -> bone        1.96      0.12           
  ...      ...    ...         ...                      
  duralumin 6  -> duralumin   0.98      0.03           
  ...      ...    ...         ...                      

// average deflection by sample type
bySample.Columns.[["sample type"; "deflection<mm>"]] |> Frame.meanLevel Pair.get1Of2
val it : Frame<string,string> =
               deflection<mm>    
  bone      -> 0.228333333333333 
  duralumin -> 0.116666666666667 
  ...       -> ...               

// select the data for bones
let bones = (Frame.nest bySample).["bone"]
val bones : Frame<int,string> =
       sample type p         deflection<mm> 
 0  -> bone        0.98      0.06           
 1  -> bone        1.96      0.12           
 ...-> ...         ...       ...            

You may notice that some of the values in the table are missing (handwriting can be completely unparsable!). By default they are omitted from calculations, but we can always specify how this data should be filled, using a Direction or a custom function.

// note that there're missing values in this dataset
let deflections = bones?``deflection<mm>``
val deflections : Series<int,float> =
 0  -> 0.06      
 1  -> 0.12      
 ...-> ...       
 12 -> <missing> 
Series.mean deflections
 val it : float = 0.2283333333 
// omit missing values
deflections |> Series.dropMissing |> Series.mean
 val it : float = 0.2283333333 
// fill missing values by copying forward
deflections |> Series.fillMissing Direction.Forward |> Series.mean
 val it : float = 0.2542857143 
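
The jump in the mean after forward filling is easy to check by hand. As a rough sketch, assume the six observed deflections average 0.2283 mm (so they sum to about 1.37 mm) and that the single missing entry is filled with the preceding 0.41 mm reading. Both assumptions are inferred from the printed output rather than checked against the data file. The filled series then has seven values:

```latex
\bar{d}_{\text{filled}} = \frac{6 \times 0.22833 + 0.41}{7} \approx \frac{1.37 + 0.41}{7} = \frac{1.78}{7} \approx 0.25429
```

This agrees with the value reported above. Filling forward does more than patch a hole here: it gives extra weight to the largest reading in the group.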

Now let’s check if there’s any relation between the deflection and load. In theory, it’s supposed to be linear and we’re going to test that with a linear regression.

/// Find slope and intercept with linear regression
let linearRegression xs ys = (...)
// drop rows with missing values
let bonesreg = Frame.dropSparseRows bones
val bonesreg : Frame<int,string> =
        sample type p    deflection<mm> 
   0 -> bone        0.98 0.06           
 ... -> ...         ...  ...            
   5 -> bone        5.88 0.41           

let load = Series.values bonesreg?p
let defl = Series.values bonesreg?``deflection<mm>``
let slope, intercept = linearRegression load defl   
val slope : float = 0.07142857143 
val intercept : float = -0.01666666667 
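
For reference, the closed-form ordinary least-squares estimates for a line y = a + b x through n points can be written as below. This is the standard textbook formula, given as a sketch of what a function like linearRegression is expected to return. The symbols a (intercept), b (slope) and n are notation introduced for this note, not names from the code.

```latex
b = \frac{n \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2},
\qquad
a = \frac{\sum_i y_i - b \sum_i x_i}{n}
```

As a quick sanity check with the fitted values, the first load gives 0.0714 × 0.98 − 0.0167 ≈ 0.0533 mm, against a measured 0.06 mm.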

[Chart]

Does this line make a good fit? Here is a chart with a couple of samples from this dataset.

A classical metric like R^2 can help answer this question too, especially since it is so simple to add a new column to the data frame and run some operations on it.

The new library has been tried out and the lab is completed: everyone is happy ^_^

bonesreg?prediction <- intercept + slope * bonesreg?p 
bonesreg?residualsq <- bonesreg?prediction - bonesreg?``deflection<mm>`` 
                       |> Series.mapValues (fun x -> x*x)
bonesreg
val it : Frame<int,string> =
       sample type p    deflection<mm> prediction         residualsq           
  0 -> bone        0.98 0.06           0.0533333333333334 4.44444444444441E-05 
  1 -> bone        1.96 0.12           0.123333333333333  1.11111111111111E-05 
 ...-> bone        ...  ...            ...                ...                  

let sdvs = Frame.sdv bonesreg
val sdvs : Series<string,float> =
  sample type    -> <missing>           
  p              -> 1.83341211951923    
  deflection<mm> -> 0.131059782796503   
  prediction     -> 0.130958008537088   
  residualsq     -> 1.72132593164778E-05

// compute the metrics:
let rsquare = let x = sdvs.["prediction"] / sdvs.["deflection<mm>"] in x * x
val rsquare : float = 0.9984475063 
let df = Frame.countRows bonesreg - 2 |> float
val df : float = 4.0 
let tvalue = sqrt (rsquare / (1. - rsquare) * df)
val tvalue : float = 50.71981861 
let se = (Series.sum bonesreg?residualsq) / df |> sqrt
val se : float = 0.005773502692 
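
Written out, the three metrics computed above look as follows. Here s is the sample standard deviation, the hatted y values are the predictions, the plain y values are the measured deflections, and n = 6 is the number of rows. The notation is introduced for illustration only. The identity used for R^2 holds for a least-squares line with an intercept. The last line is a cross-check via the residual sum of squares, assuming Frame.sdv uses the n − 1 denominator.

```latex
R^2 = \left(\frac{s_{\hat{y}}}{s_y}\right)^2,
\qquad
t = \sqrt{\frac{R^2}{1 - R^2}\,(n - 2)},
\qquad
se = \sqrt{\frac{\sum_i (\hat{y}_i - y_i)^2}{n - 2}}

% cross-check: R^2 = 1 - SS_res / SS_tot
R^2 = 1 - \frac{se^2\,(n - 2)}{s_y^2\,(n - 1)}
    \approx 1 - \frac{1.3333 \times 10^{-4}}{0.085883}
    \approx 0.99845
```

Both routes give the same R^2 to the printed precision, which suggests the fit and the added columns are consistent with each other.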

Yes, it’s that simple.





The body of linearRegression, shown as (...) in the listing above:

/// Find slope and intercept with linear regression
let linearRegression xs ys =
    let x, y, xx, xy, n =
        Seq.zip xs ys
        |> Seq.fold (fun (xsum, ysum, xxsum, xysum, n) (x, y) ->
            xsum+x, ysum+y, xxsum+x*x, xysum+x*y, n+1.) (0.,0.,0.,0.,0.)
    let slope = (n * xy - x * y) / (n * xx - x * x)
    let intercept = (y - slope * x) / n
    slope, intercept
val linearRegression : xs:seq<float> -> ys:seq<float> -> float * float

  1. To be honest, I still don’t get why anyone would need that; maybe that’s the point where being a PhD helps?
