How to Name a Cat

A fun conversation about mountains, monads, types and heels reminded me about the first University years and one of our favourite math jokes. Here’s an attempt to translate it (original):

- Btw, Basis is a good name for a cat.
- Yeah, imposing...
- "Basis, come here". 
  "Yes... Basis is full... Look at our complete Basis."
  "Basis, shoo from the bed! Where is your space?"
- "C'mon, convert to normal!"
- "Basis, we'll measure you!"
  "We want to know your dimension"
  And if we get another cat with the same name, we can make a change of Basis.
  And we also can rotate Basis.
- And shear ))
- And scale a bit! (in one dimension though)
  And when it stands, it's orthogonal!
  The paws are the same, so normalized too!
  i.e. we have EUCLIDEAN BASIS!!!
  Decided.

cat

Keywords Mix

I got recently caught up with the paper work and came accross some fun notes from years ago! Couldn’t wait to share this contextual keywords mix, the courtesy of @v2_matveev:

1
2
3
4
5
6
7
partial class where<partial> where partial: where<partial>
{
	dynamic dynamic<yield>(dynamic dynamic)
	{
		return dynamic<yield>(dynamic);
	}
}

Yes, it compiles :)

Traditions vs Statistics: Sechseläuten

What do you expect from this summer? Should you worry about 20:39 or not? Sunglasses or umbrella? Let’s find out! 1

Traditions

Each country has its own fun traditions and festivals, and Switzerland is not an exception. One of the most exciting events is the spring festival, Sechseläuten, which is celebrated in Zürich in April. And it’s also a day when you realize, that there’re some people in town except the tourists walking along the Bahnhofstrasse on Saturdays! The trees become green, the flags appear here and there, the city becomes lively and colourful. And this year the day was just perfectly sunny. Not the best one for the drivers though because of the parade of Guilds (everyone I showed the video especially likes the bear ~1:57 :).

Parade

The culmination of Sechseläuten is the burning of Böögg, a figure of snowman - the winter symbol. There’s a belief, that the faster the head explodes, the nicer summer is going to be: say, below 10 min is for sunny and warm weather, above 15 min - for rains and cold… This year it took 20 min 39 s! Doesn’t look very nice, huh?

Statistics

So is there any correlation between the Böögg’s forecast and actual weather? I couldn’t resist checking the facts and downloaded the daily data for the Zürich weather station from the European Climate Assessment & Dataset - it’s free and contains a lot of information for the 20th century, including the period we’re interested in, I was lucky to find it rather quickly! One can find the yearly Sechseläuten statistics, including the burning times, on the official Sechseläuten website (in German).

The criteria for what makes a nice summer are pretty subjective - I picked the mean, min and max temperature, hours of sunshine, cloud cover, humidity and precipitation. Let’s assume the cold summer means lower temperatures, or more clouds, or less sunshine, or all of these together.

Now when we know more or less what we’d like to look at, we need to extract that information from the data - a bunch of simple csv files.

The weather datasets uses -9999 as indication of missing data, and another relevant column contains quality codes. As there’s no requirement for each day’s data to be present, we’ll select only the valid records.

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
type QualityCode =
  | Valid = 0
  | Suspect = 1
  | Missing = 9
  
// read the dataset from a file, index by DATE and remove unnecessary columns
let readData (fileName: string) = 
    Frame.ReadCsv fileName  
    |> Frame.indexRowsInt "DATE"
    |> Frame.filterRowValues (fun r -> r?Q = float QualityCode.Valid)
    |> Frame.dropCol "SOUID"
    |> Frame.dropCol "Q"

let dataFrames = 
    List.map readData
        [
            "mean_temperature.txt" // TG, in 0.1 C
            "min_temperature.txt"  // TN, in 0.1 C
            "max_temperature.txt"  // TX, in 0.1 C
            "precipitation.txt"    // RR, in 0.1 mm
            "sunshine.txt"         // SS, in 0.1 hrs
            "humidity.txt"         // HU, in 1%
            "clouds.txt"           // CC, in oktas
        ]

Cloud cover is in oktas, where 0 means perfectly clear sky and 8 - completely cloudy.

Now we merge all these datasets, select only the summer months, and transform the temperatures to be in degrees C, sunshine - in hours (per day) and so on:

1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
9: 
let weatherData = 
    Frame.mergeAll dataFrames 
    |> Frame.filterRows (fun date _ -> let m = date / 100 % 100 in 6 <= m && m <= 8)
    |> Frame.mapCols (fun col vs -> 
        let series = vs.As() 
        match col with
        | "HU" -> series / 100.0
        | "CC" -> series
        | _ -> series / 10.0)  

Finally, we combine the burning times and weather averages for corresponding years:

1: 
2: 
3: 
4: 
5: 
6: 
7: 
let weatherDataByYear =
    weatherData
    |> Frame.groupRowsByIndex (fun date -> date / 10000)
    |> Frame.applyLevel fst Stats.mean

let fullStats = Frame.join JoinKind.Inner boeoeggStats weatherDataByYear
fullStats.SaveCsv("fullStats.csv", keyNames = ["year"])

Now that we have something to look at, let’s start with the mean temperatures. We will use R for creating the charts:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
meanTempPlot = qplot(duration, 
                     TG, 
                     data   = stats, 
                     geom   = c("point", "smooth"), 
                     method = "lm", 
                     color  = TG,
                     main   = "Mean Temperature ~ Duration", 
                     xlab   = "Duration, min", 
                     ylab   = "Temperature, C")

meanTempBoxplot = qplot(duration, 
                        TG, 
                        data = stats, 
                        geom = c("boxplot", "jitter"), 
                        outlier.colour = "red", 
                        fill = I("#3366FFAF"),
                        main = "", 
                        xlab = "Duration, min", 
                        ylab = "Temperature, C")

multiplot(meanTempPlot, meanTempBoxplot)

Mean Temperature

It’s hard not to notice that there’s no dependency between the mean temperature and duration. However, there’re some outliers (the red dots): for example, the point in the top-left with the highest temperature is too far from the clustered group. Ironically, that was the most “accurate” prediction, in the year 2003, when it took less than 6 min for the head to explode and the summer was very warm and sunny.

The next two “best” years are in the rows with 20+ min. Even though in 2013 the weather was not that bad despite of 35 min for burning. The data in the table below is sorted by mean temperatures:

1: 
statsByTG = stats[order(stats$TG),]  
year time avg T min T max T precipitation sunshine humidity cloud cover
  (min) (C) (C) (C) (mm) (hrs)   (oktas)
1956 4.00 15.07 11.12 21.13 5.70 5.64 0.74 5.66
1978 12.00 15.38 11.42 20.27 4.52 5.53 0.73 5.02
1980 17.00 15.63 12.10 20.24 3.91 4.69 0.75 5.30
1972 8.00 15.77 11.86 20.70 4.84 6.28 0.78 5.36
1965 20.00 15.84 11.77 22.21 5.17 6.15 0.77 5.32
1966 16.00 15.84 11.62 22.12 5.24 6.36 0.74 5.21
1968 5.00 16.03 11.72 22.48 3.91 6.94 0.71 5.20
               
2013 35.18 18.53 NA NA NA 8.14 0.70 4.21
1952 6.00 18.75 13.68 25.73 3.05 8.79 0.65 4.14
1950 50.00 18.76 13.57 25.87 3.43 8.72 0.65 4.08
1983 24.33 19.06 14.15 24.63 1.72 6.77 0.67 4.02
1994 21.92 19.15 14.36 24.77 2.77 7.09 0.72 4.34
2003 5.70 21.66 16.53 28.02 2.29 9.20 0.63 3.68

 

As we see, the weather tends to be average - very dry, or rainy, or hot summers are exceptions. If you try the random forests (regression) with this data, % Var explained is negative.

Weather

One can argue, that for a number of reasons (the quality of weather measures, different structure of the Böögg itself or anything else) the years 1950 and 2014 are not comparable. I tried to run the same experiment with the data for the last 15 years, removing outliers and adding the labels for bad/ok/nice weather to make it a classification problem instead of regression - but the main conclusion is the same. Take a look at the historical averages for each of the summer days (the dark-blue line): the normal range (steel-blue) is quite narrow, even though the lowest and highest points (light-blue) may vary a lot2:

Historical Data

Expectations

The results are pretty reassuring: we can be optimistic and hope that this summer will be great, or at least average - which is also quite nice. The weather is not an easy thing to predict months in advance!

Reality

Update: The summer 2015 happened to be extremely hot and beat some temperature records, not only in Europe. Let’s hope we’ll have more of a regular summer kind in the future.

And I’ve recently found out there’s also an official study (in German): Böögg Prognose.

 

 

  1. I realized that I forgot to publish this post on time (the date of event) only several days ago, then I decided to convert the data processing part and use Deedle… Maybe that’s because of sunny Milan, but it took some time to figure out which functions to use for relatively simple groupings and aggregations.

  2. To see how you can create similar kinds of plots check this post.

Reshaping Arrays in .NET

Reshape is one of these little functions, which look so simple and straightforward that you don’t think much about them. It’s quite useful and, I’m pretty sure, familiar to everyone who’s written something in a language like Matlab. However, in other languages it might become a bit tricky.

Question

I asked a bunch of people how they would write reshape in a .NET language. That’s not really a problem when you know the exact type and dimensionality, e.g. when you always convert 2D-array to 3D, but what if you don’t? Given the limitations of generic constraints, the only requirement to the output (in addition to correctness, of course) was an ability to get the full information about the underlying types. And that’s where the challenge starts…

Requirements

Let’s take a look at a somewhat simplified problem - converting a single-dimensional array into a multidimensional one. My usecase was reading the files in .mat binary format as a part of the type provider: read the bytes, figure out the type and if it’s supposed to be an array - make it an array. Type conversions are not relevant for reshaping itself, so we can skip that for now.

  • The input is a single-dimensional array of any type and another array with dimension sizes;
  • The output is another array, properly reshaped, where the type information is preserved (not necessary explicitly);
  • Use standard types if possible.

Meet System.Array

Unfortunately, that’s not the case for fancy type signatures, so we’ll go with a plain old System.Array.

The simplest example is 1D -> 1D “conversion”, it’s possible to get the type information or cast the output to a real array type, whether it’s int[], string[], float[][] or anything else:

1: 
2: 
3: 
4: 
5: 
let f (xs: System.Array) = xs

let xs = [| 1;2;3;4;5 |]
let ys = f xs // val ys : System.Array = [|1; 2; 3; 4; 5|]
ys.GetType().Name // "Int32[]"

Great! The next step is to add a dimension:

1: 
2: 
let f2 (xs: System.Array) = [| xs; xs |]
let ys = f2 xs // val ys : System.Array [] = [|[|1; 2; 3; 4; 5|]; [|1; 2; 3; 4; 5|]|]

This one doesn’t work for us because of its return type, so:

1: 
2: 
3: 
let f3 (xs: Array) = [| xs; xs|] :> Array;;
let ys = f3 xs // val ys : Array = [|[|1; 2; 3; 4; 5|]; [|1; 2; 3; 4; 5|]|]
ys.GetType().Name // "Array[]"

And the same output with Array.init methods:

1: 
2: 
3: 
let f4 (xs: Array) = Array.init 2 (fun _ -> xs) :> Array;;
let ys = f4 xs // val ys : System.Array = [|[|1; 2; 3; 4; 5|]; [|1; 2; 3; 4; 5|]|]
ys.GetType().Name // "Array[]"

You also can’t cast the result any more - ys :?> int[][] throws an exception:

System.InvalidCastException: Cannot cast from source type to destination type. at <StartupCode$FSI_0017>.$FSI_0017.main@ () [0x00000] in <filename unknown>:0 at (wrapper managed-to-native) System.Reflection.MonoMethod:InternalInvoke (System.Reflection.MonoMethod,object,object[],System.Exception&) at System.Reflection.MonoMethod.Invoke (System.Object obj, BindingFlags invokeAttr, System.Reflection.Binder binder, System.Object[] parameters, System.Globalization.CultureInfo culture) [0x00000] in <filename unknown>:0 Stopped due to error

And yes, of course, you can get the ints out of it:

1: 
2: 
3: 
let arr = ys.GetValue 0 // val arr : obj = [|1; 2; 3; 4; 5|]
arr.GetType().Name // val it : string = "Int32[]"
arr :?> int[] // val it : int [] = [|1; 2; 3; 4; 5|]

That’s somewhat disappointing, because we do need something castable, with more or less concrete type. But it’s quite intuitive: the argument is Array - the output is Array[], in the typed version 'T[] - 'T[][]:

1: 
2: 
3: 
4: 
let f5 (xs: int[]) = Array.init 2 (fun _ -> xs) :> Array
let ys = f5 xs // val ys : System.Array = [|[|1; 2; 3; 4; 5|]; [|1; 2; 3; 4; 5|]|]
ys.GetType().Name // val it : string = "Int32[][]"
ys :?> int[][] // val it : int [] [] = [|[|1; 2; 3; 4; 5|]; [|1; 2; 3; 4; 5|]|]

or something like that:

1: 
2: 
3: 
let f6 (xs: Array) = Array.init 2 (fun _ -> xs :?> int[]) :> Array
let ys = f6 xs // val ys : Array = [|[|1; 2; 3; 4; 5|]; [|1; 2; 3; 4; 5|]|]
ys.GetType().Name // val it : string = "Int32[][]"  

At this point we’re back to System.Array and its method CreateInstance:

1: 
2: 
3: 
4: 
5: 
6: 
7: 
let f7 (xs: Array) =                   
    let t = xs.GetType()                                                
    let ys = Array.CreateInstance(t, 2)
    for i in 0..1 do ys.SetValue(xs, i)
    ys 
let ys = f7 xs // val ys : Array = [|[|1; 2; 3; 4; 5|]; [|1; 2; 3; 4; 5|]|]
ys.GetType().Name // val it : string = "Int32[][]"  

Reshape

So this is the way to go. For an arbitrary number of dimensions we can use the same set of functions, the only thing left is filling the array with actual values.

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
// init array of specific type
let fill t (len: int) f =
    let xs = Array.CreateInstance(t, len)
    for i in 0..len - 1 do xs.SetValue(f i, i)
    xs

// prod of prev dimenstions: [|2; 3; 4|] -> [|1; 2; 6; 24|]
let inline prods dims = Array.scan ((*)) 1 dims 

// all array types from T[][]..[] to T[]
let types (t: Type) n =
    (t, 0)
    |> Seq.unfold (fun (t, i) -> if i = n then None else Some (t, (t.MakeArrayType(),i+1)))
    |> Seq.toArray
    |> Array.rev

For simplicity let’s assume that all inputs are already nice and valid (no nulls, the product of dimensions is equal to the length of array and so on), then reshape might look like:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
let reshape (arr: Array) (dims: int[]) =
    let t = arr.GetType().GetElementType()
    let ts = types t dims.Length

    let rec init dim k =
        if dim = dims.Length - 1 then 
            fill ts.[dim] dims.[dim] (fun i -> arr.GetValue (k * dims.[dim] + i))
        else			
            fill ts.[dim] dims.[dim] (fun i -> init (dim+1) (k * dims.[dim] + i))
    init 0 0

This function initializes the elements sequentially:

1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
9: 
reshape [|1..12|] [|2;3;2|]                                                   
// val it : Array =
//   [|[|[|1; 2|]; [|3; 4|]; [|5; 6|]|]; [|[|7; 8|]; [|9; 10|]; [|11; 12|]|]|]
reshape [|1..12|] [|2;6|];;  
// val it : Array = [|[|1; 2; 3; 4; 5; 6|]; [|7; 8; 9; 10; 11; 12|]|]
reshape [|1..12|] [|1;1;12|];;                                                   
// val it : Array = [|[|[|1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12|]|]|]
(reshape [|1..12|] [|2;3;2|]).GetValue 1 :?> int[][]
// val it : int [] [] = [|[|7; 8|]; [|9; 10|]; [|11; 12|]|]

However, that’s not how the original function works. For example (you can try it out in octave-online),

squeeze(reshape((1:12),[2 3 2])(2,:,:))
 ans =
 
    2    8
    4   10
    6   12

It might take a while to get a feeling how the arrays are reshaped, just run some examples and check where the numbers go. Meanwhile here’s a function which does what’s required:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
let reshape (arr: Array) (dims: int[]) = 
    let t = arr.GetType().GetElementType()

    let ps = prods dims
    let ts = types t dims.Length

    let rec init dim k =
        if dim = dims.Length - 1 then
            fill ts.[dim] dims.[dim] (fun i -> arr.GetValue (ps.[dim] * i + k))
        else 
            fill ts.[dim] dims.[dim] (fun i -> init (dim+1) (ps.[dim] * i + k))
    init 0 0  
	
(reshape [|1..12|] [|2;3;2|]).GetValue 1 :?> int[][];;                             
// val it : int [] [] = [|[|2; 8|]; [|4; 10|]; [|6; 12|]|]

In the end we do get a new array together with its type information, so the type provider can help you to avoid the casts from System.Array.

Interop Adventure: From .NET to Matlab and Back

A lot of exciting things start with the word interop, at least the chances to make several surprising discoveries are pretty high. So, we decided to call some Matlab functions from .NET, what could possibly go wrong?

This post as a short summary of our discoveries by trial and error method.

Why would anyone need that?

Easy - because something we needed was implemented in Matlab and that was something using a bunch of mathy packages, which we didn’t want to spend years rewriting. The calls performance wasn’t a big issue either. On the other hand, it worked quite well and produced expected results 1 and should have been a benchmark for the code which we actually moved to .NET.

What are the options?

There’s not that many options to choose from, so if you don’t want to deal with C or COM (and don’t mind buying an extra package) you go for the NE Builder. There’s also an F# Matlab Type Provider, which wasn’t an option for us though. The nice thing about the builder is that it generates “type-safe” APIs and the libraries can be deployed on the machines without Matlab installation, the only requirement is MCR (Matlab Compiler Runtime) - which is free.

Data conversion

Type safe APIs are great. When they are type safe. And they definitely were… to some extent. You start with writing an interface and compiling a library, carefully type-checking Matlab sources in your head 2 and writing down the output, then run the builder - and hopefully grab the compiled dlls. If something breaks at this stage - it’s pure luck, because the error’d be more or less comprehensible.

Attention now goes to:
- types (careful with number conversions and out parameters),
- names (function and its parameters),
- project definition (the functions should be mapped to the API in the build project).

The errors probably mean that a function is missing, or its parameter, or it was renamed… However, you still can successfully build the library which will happilly crash at runtime. And here the fun begins.

There’re some docs about types and data conversion. Unfortunately, I don’t have Matlab installed on my machine now, so can’t check that, but seems like some things changed for the better in the most recent version (2014b).

Structs and Classes

The first thing to remember is that the names of fields/properties should match exactly, including case. If you have a property ‘Name’ in C# and call ‘name’ in Matlab, you’ll get a runtime error. Just as if you call non-existent property. Obviously, all the checks for existing fields like isfield(Person, 'Name') become useless and need to be replaced.

It’s also a good idea to define the data structures with structs and not classes, otherwise dealing with function is quite painful.

Numerical Types

Just be very careful about the numbers. It’s easy when you have only doubles, but everything else requires extra attention:

int32(5) + [0.1 1 10].^2
Error using  + 
Integers can only be combined with integers of the same class, or scalar doubles.  

But these are ok:

5 + [0.1 1.0].^2  
int32(5) + 0.1.^2

Interesting, but all the examples work in octave.

Arrays

Note: Multidimensional arrays of above C# types are supported. Jagged arrays are not supported.

When a null is passed from C# to MATLAB, it will always be marshaled into [] in MATLAB as a zero by zero (0 x 0) double.

Depending on a use case multidimensional arrays might be not useful at all, but passing around jagged arrays actually somehow worked, though the dimensions were mixed up: you pass [A x B x C] and get [B x C x A] in Matlab, but nothing permute(matrix, [3 1 2]) couldn’t fix. And, of course, the opposite when passing the results back. Also, passing nulls didn’t work, so to avoid all the NRE we had to create empty arrays, like 1x1x0, that didn’t kill anyone.

What you don’t want to use is a Matlab function returning 3d+ array. At least if there’s a chance that the last dimensions are singleton, because Matlab ‘ignores’ them. Everything is fine when you have 1000x1000x1000 or 2x3x4, but 1x1x1 is the same as 1x1, so .NET wrapper (which expects 3d, i.e. 1x1x1) will fail.

When you finally got the data right

There’s a bunch of Matlab functions which you can’t use in a deployed application, e.g. addpath, you’ll have to remove them or check for isdeployed flag and use only when it’s false.

Setting up Matlab builds on a build server was another special kind of pain, when it turned out that if the installation path (which happened to be Program Files something) contained a space, Matlab couldn’t resolve the dependencies…

But - with some patience, of course - it worked out, happy end! Hope that your way will be more peaceful than this one ;)

 

 

  1. as you might guess, there were more results than ‘expected’, so we ended up fixing the Matlab part too. And the fact that the Optimization toolbox was changed between different Matlab versions was quite exciting too.

  2. pleease, please always add a comment describing function parameters in the code, at least the matrix dimensions! Unless you hate all the people.