Time Series Data Structures

Every project I’ve ever worked on has included some form of temporal data. Despite that constant, the format was different each time, and the seemingly arbitrary differences met with varying success.

For the sake of discussion, we need to lay out some definitions.

Term                  Definition
<<timestamp>>         a time value of any sort; for our discussion, consider it whatever date and/or time related value you’re concerned with
<<start>>             a time value for the OLDEST or MOST HISTORIC time frame of the sequence
<<end>>               a time value for the NEWEST or MOST FUTURISTIC time frame of the sequence
<<dt>>                the length of time a value is effective for
<<sequence 'x'>>      a unique title, name, or identifier for a given time sequence (in this case 'x'); the examples below also call this a series
<<resolution>>        the expected frequency of data for one or more sequences
<<max resolution>>    an upper bound on the data frequency, for sequences of non-constant frequency
<<value>>             a string, number, or complex structure holding the value you are interested in

Here are just a few example formats. Each has its pros and cons, and the list is in no way exhaustive.

Simple t/v sequence

Pretty straightforward, but you had better remember the context of your call. There are plenty of downsides here: we know each time and value, but we have no idea which sequence we are looking at, what time boundaries we requested, or what frequency the data should be in.

[
  { "t": <<timestamp>>, "v": <<value>> },
  { "t": <<timestamp>>, "v": <<value>> },
  { "t": <<timestamp>>, "v": <<value>> },
  ... and so on ...
]
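
As a rough illustration, here is how a consumer might read this format in Python. The epoch-second timestamps, the sample values, and the helper name `latest_value` are assumptions made for the example, not part of any particular API.

```python
import json

# Hypothetical payload: epoch-second timestamps and numeric values (assumed).
payload = """
[
  { "t": 1385164800, "v": 20.5 },
  { "t": 1385164860, "v": 21.0 },
  { "t": 1385164920, "v": 19.5 }
]
"""

def latest_value(points):
    """Return the value of the point with the newest timestamp, or None if empty."""
    if not points:
        return None
    return max(points, key=lambda p: p["t"])["v"]

points = json.loads(payload)
print(latest_value(points))
```

Nothing in `points` says which sequence this is or what range was requested, so the caller has to carry that information alongside the data.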

Sequence with context

We get some context for our data back, mainly stats about each series. We can also ask for multiple series at once and get back as many as we want. There is a trade-off between convenience of access and data redundancy in the structure itself.

[
  {
    id: "Series 'x'",
    start: <<start>>,
    end: <<end>>,
    stats: {
      min: <<value>>,
      max: <<value>>,
      avg: <<value>>,
      sample_count: <<value>>,
      ... and so on ...
    },
    values: [
      { t: <<timestamp>>, v: <<value>> },
      { t: <<timestamp>>, v: <<value>> },
      { t: <<timestamp>>, v: <<value>> },
      ... and so on ...
    ]
  },
  ... and some more sequences ...
]
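
A minimal sketch of how one such record could be assembled from raw t/v points, in Python. Several things here are assumptions: the function name `with_context`, the `values` key for the point list, numeric values, and the choice to derive `start` and `end` from the data rather than from the requested range.

```python
def with_context(series_id, points):
    """Wrap raw t/v points in a record carrying id, time bounds, and stats."""
    if not points:
        raise ValueError("need at least one point to compute bounds and stats")
    # Sort oldest first so start/end can be read from the two ends.
    ordered = sorted(points, key=lambda p: p["t"])
    values = [p["v"] for p in ordered]
    return {
        "id": series_id,
        "start": ordered[0]["t"],   # assumed: bounds come from the data itself
        "end": ordered[-1]["t"],
        "stats": {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "sample_count": len(values),
        },
        "values": ordered,          # assumed key name for the point list
    }

record = with_context("Series 'x'", [{"t": 3, "v": 2.0}, {"t": 1, "v": 4.0}])
```

The redundancy is visible here: every stat can be recomputed from the points, so the record trades payload size for convenience on the consuming side.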

Fixed resolution multi-series

A more complicated structure, but a compact one because it avoids redundancy. A downside of this format is that it expects every sequence to share one fixed resolution. If we have varying resolutions, this format gets a little ridiculous, since each sequence would have to be padded or resampled to fit the shared resolution.

{
  sequences: {
    <<series 'x'>>: {
      stats: ... stats from previous example ...,
      values: [
        <<value>>,
        <<value>>,
        <<value>>,
        ... and so on ...
      ]
    },
    <<series 'y'>>: ... same pattern ...,
    <<series 'z'>>: ... same pattern ...,
    ... and more named series ...
  },
  start: <<start>>,
  end: <<end>>,
  resolution: <<resolution>>
}
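
Because the timestamps are implied rather than stored, a consumer has to rebuild them. The sketch below shows one way to do that in Python, under some loud assumptions: timestamps are plain numbers, `resolution` is the interval between consecutive samples in the same unit, the first value sits exactly at `start`, and `None` marks a missing sample. The function name `expand` is hypothetical.

```python
def expand(payload):
    """Rebuild (timestamp, value) pairs per series from start + i * resolution."""
    start = payload["start"]
    step = payload["resolution"]  # assumed: interval between samples
    return {
        name: [(start + i * step, v) for i, v in enumerate(seq["values"])]
        for name, seq in payload["sequences"].items()
    }

# Hypothetical payload with two series sampled every 60 time units.
payload = {
    "sequences": {
        "x": {"values": [1.0, 2.0, 3.0]},
        "y": {"values": [10.0, None, 30.0]},  # None = missing sample (assumed)
    },
    "start": 0,
    "end": 120,
    "resolution": 60,
}

expanded = expand(payload)
```

The `None` in series `y` hints at the cost of the format: a series that does not line up with the shared resolution has to be padded with placeholders like this.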

Written by mackay on November 23, 2013. Categories: API