Optimization
This section shows how to optimize hyperparameters of covariance functions in Ariadne. The package currently implements two methods:
- Metropolis-Hastings algorithm for sampling from the posterior distribution. This method requires specifying a prior distribution for each hyperparameter.
- Gradient descent. The implementation currently includes only a very basic form of the algorithm.
Metropolis-Hastings algorithm
The Metropolis-Hastings algorithm is a Markov chain Monte Carlo method for sampling from the posterior distribution. Let \(\theta\) represent the hyperparameters of a covariance function. The posterior distribution is then defined as
\[p(\theta \mid X, Y) \propto p(Y \mid X, \theta) p(\theta)\]
where \(X\) are the locations of observations and \(Y\) are the observed function values, and \(p(Y \mid X, \theta)\) is the Gaussian process likelihood of the observed data.
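For a zero-mean Gaussian process, this likelihood has a standard closed form,

\[\log p(Y \mid X, \theta) = -\frac{1}{2} Y^\top K_\theta^{-1} Y - \frac{1}{2} \log \left| K_\theta \right| - \frac{N}{2} \log 2\pi\]

where \(K_\theta\) is the \(N \times N\) covariance matrix obtained by evaluating the kernel (with the noise variance added on the diagonal) at all pairs of the \(N\) observation locations.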
To be able to sample from the posterior distribution, we first need to specify a prior distribution over the hyperparameters, \(p(\theta)\). For the squared exponential covariance function, the hyperparameters are the lengthscale \(l\), the signal variance \(\sigma^2\) and the noise variance \(\sigma^2_{\text{noise}}\).
\[\theta = \left( l, \sigma^2, \sigma^2_{\text{noise}} \right)\]
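As a reminder, these are the parameters of the standard squared exponential kernel,

\[k(x_1, x_2) = \sigma^2 \exp\left( -\frac{(x_1 - x_2)^2}{2 l^2} \right)\]

with the noise variance \(\sigma^2_{\text{noise}}\) entering as observation noise on the diagonal of the covariance matrix.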
Specifying the prior distribution
Because the hyperparameters must be positive, we will use a simple log-normal prior distribution for each hyperparameter.
    #r "Ariadne.dll"
    open Ariadne.GaussianProcess
    open Ariadne.Kernels

    #r "MathNet.Numerics.dll"
    open MathNet.Numerics

    let rnd = System.Random(3)
    let lengthscalePrior = Distributions.LogNormal.WithMeanVariance(2.0, 4.0, rnd)
    let variancePrior = Distributions.LogNormal.WithMeanVariance(1.0, 5.0, rnd)
    let noisePrior = Distributions.LogNormal.WithMeanVariance(0.5, 1.0, rnd)
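To get a feel for the ranges these priors cover, we can draw a few samples from each of them. The helper below is purely illustrative; it uses only standard MathNet.Numerics members and is not part of Ariadne's API.

    // Illustrative only: summarize a prior by sampling from it
    let summarize name (dist: Distributions.LogNormal) =
        let samples = Array.init 1000 (fun _ -> dist.Sample())
        printfn "%s prior: mean = %.2f, min = %.2f, max = %.2f"
            name (Array.average samples) (Array.min samples) (Array.max samples)

    summarize "lengthscale" lengthscalePrior
    summarize "signal variance" variancePrior
    summarize "noise variance" noisePrior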
Ariadne includes a simplified method for working with squared exponential kernels. We can define a prior distribution over the squared exponential hyperparameters and directly sample squared exponential kernels from it.
    let prior = SquaredExp.Prior(lengthscalePrior, variancePrior, noisePrior)
    let kernel = prior.Sample()
    let gp = kernel.GaussianProcess()
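Each call to Sample() draws a different set of hyperparameters from the prior. Printing the sampled kernel shows the drawn values, in the same format as the optimizer output later in this section:

    // Show the hyperparameters of the randomly sampled kernel
    printfn "%O" kernel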
Now that we have specified a prior distribution, we can use the Metropolis-Hastings algorithm to obtain samples from the posterior distribution of the hyperparameters given training data.
    // Optimization
    open Ariadne.Optimization

    let data =
        [{ Locations = (...)
           Observations = (...) }]
To sample from the posterior distribution with the squared exponential kernel, we can use the built-in function optimizeMetropolis. This function runs the Metropolis-Hastings algorithm with a symmetric proposal distribution, i.e. the basic Metropolis algorithm.
    // Metropolis sampling for squared exponential with default settings
    let newKernel1 =
        kernel
        |> SquaredExp.optimizeMetropolis data MetropolisHastings.defaultSettings prior

    // Specify custom parameters
    open Ariadne.Optimization.MetropolisHastings
    let settings =
        { Burnin = 500       // Number of burn-in iterations
          Lag = 5            // Thinning of posterior samples
          SampleSize = 100 } // Number of thinned posterior samples

    let newKernel2 =
        kernel
        |> SquaredExp.optimizeMetropolis data settings prior
The Metropolis sampler collects a specified total number of posterior samples (50 by default) and returns the posterior mean estimate of each hyperparameter.
Squared exponential, l = 0.52, σ² = 22.41, σ²_noise = 0.26
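With the custom settings above, the chain runs on the order of Burnin + Lag × SampleSize iterations in total. The following is a minimal sketch of this burn-in and thinning scheme, assuming a hypothetical step function that advances the chain by one Metropolis-Hastings iteration; it is not Ariadne's actual implementation.

    // Sketch only: posterior mean from a burned-in, thinned chain.
    // 'step' is a hypothetical function advancing the chain by one MH iteration.
    let thinnedMean (step: float[] -> float[]) (init: float[]) burnin lag sampleSize =
        // discard the burn-in iterations
        let start = List.fold (fun s _ -> step s) init [1 .. burnin]
        // keep every lag-th state, sampleSize times
        let samples, _ =
            List.mapFold (fun s _ ->
                let next = List.fold (fun s' _ -> step s') s [1 .. lag]
                next, next) start [1 .. sampleSize]
        // posterior mean estimate: elementwise average of the kept states
        Array.init init.Length (fun i -> samples |> List.averageBy (fun s -> s.[i]))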
We can compare the fit of the randomly sampled covariance parameters in kernel with the fit of the posterior mean estimates in newKernel2. You might need to re-run the sampler for more iterations, or with a different starting location, to arrive at a good posterior estimate of the hyperparameter values. Note that each iteration of the Metropolis-Hastings sampler has \(\mathcal{O}(N^3)\) time complexity, because evaluating the Gaussian process likelihood requires factorizing the \(N \times N\) covariance matrix.
    let gpInit = kernel.GaussianProcess()
    let gpOptim = newKernel2.GaussianProcess()
    let loglikInit = gpInit.LogLikelihood data
    let loglikOptim = gpOptim.LogLikelihood data

    printfn "Initial log likelihood: %f \nFinal log likelihood: %f" loglikInit loglikOptim
Initial log likelihood: -231.973587
Final log likelihood: -41.512115
    open FSharp.Charting

    kernel.GaussianProcess()
    |> plot data
    |> Chart.WithTitle("Randomly sampled kernel", InsideArea=false)
    |> Chart.WithMargin(0.0,10.0,0.0,0.0)
    newKernel2.GaussianProcess()
    |> plot data
    |> Chart.WithTitle("Optimized kernel (MH)", InsideArea=false)
    |> Chart.WithMargin(0.0,10.0,0.0,0.0)
Running general Metropolis-Hastings
It is also possible to run the general Metropolis-Hastings algorithm to obtain samples from the posterior distribution by calling the sampleMetropolisHastings function. The function operates over an array of parameter values. As arguments, it takes a function that computes the log likelihood, a function that computes the transition probability between states, Markov chain settings and a proposal sampler. See the source code for details.
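To make these ingredients concrete, here is a minimal sketch of a single Metropolis-Hastings step over an array of parameter values. It illustrates the algorithm itself, with hypothetical argument names; it does not reproduce the exact signature of sampleMetropolisHastings.

    // Sketch of one Metropolis-Hastings step; names and signature are illustrative.
    let mhRnd = System.Random(1)

    let mhStep
            (logDensity: float[] -> float)               // log of the target density
            (logTransition: float[] -> float[] -> float) // log q(to | from), args: from, to
            (propose: float[] -> float[])                // draw a proposal from q(. | current)
            (current: float[]) =
        let proposed = propose current
        // log acceptance ratio; the q terms cancel for a symmetric proposal
        let logRatio =
            logDensity proposed - logDensity current
            + logTransition proposed current - logTransition current proposed
        // accept the proposal with probability min(1, exp logRatio)
        if log (mhRnd.NextDouble()) < logRatio then proposed else current

For a symmetric proposal the two logTransition terms cancel, and the step reduces to the basic Metropolis rule used by optimizeMetropolis above.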
Gradient descent algorithm
It is also possible to fit hyperparameters of covariance functions using gradient-based optimization. Ariadne currently implements only basic gradient descent.
The gradientDescent function operates over an array of parameter values. It requires a function that computes the gradient of the log likelihood with respect to all hyperparameters; such a gradient is implemented for the squared exponential covariance kernel. Because the likelihood function has local maxima, gradient descent might require several restarts with different initial locations and step sizes.
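Conceptually, each iteration moves the parameter vector a small fixed step along the gradient of the log likelihood (equivalently, it descends the negative log likelihood). A minimal sketch of this update rule, not Ariadne's actual implementation:

    // Sketch only: repeated fixed-step updates along the log-likelihood gradient
    let sketchGradientAscent (gradient: float[] -> float[]) iterations stepSize (init: float[]) =
        let mutable theta = Array.copy init
        for _ in 1 .. iterations do
            // move every parameter a fixed step along its gradient component
            theta <- Array.map2 (fun t g -> t + stepSize * g) theta (gradient theta)
        theta

Ariadne's built-in gradientDescent is used as follows: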
    // Gradient descent settings
    let gdSettings =
        { GradientDescent.Iterations = 10000
          GradientDescent.StepSize = 0.01 }

    let gradientFunction parameters = SquaredExp.fullGradient data parameters

    // Run gradient descent
    let kernelGD =
        gradientDescent gradientFunction gdSettings (kernel.Parameters)
        |> SquaredExp.ofParameters

    let gpGD = kernelGD.GaussianProcess()
    printfn "Original Gaussian process likelihood: %f" (gpInit.LogLikelihood data)
    printfn "Optimized Gaussian process likelihood: %f" (gpGD.LogLikelihood data)
Original Gaussian process likelihood: -231.973587
Optimized Gaussian process likelihood: -37.614942
    gpGD
    |> plot data
    |> Chart.WithTitle("Optimized kernel (GD)", InsideArea=false)
    |> Chart.WithMargin(0.0,10.0,0.0,0.0)