Storing and publishing sensor data
Now that I have started to monitor some magnitudes at home, like power consumption or the front door opening, I have to do something with this information. The sensor information can be useful for two purposes, mainly:
- analysing it to know more about your environment or your life-style patterns (like when and how you spend the money you spend on energy)
- taking real-time actions based on events from your sensors (open a light when you pass by a dark corridor at night, receive a notification when someone enters your house,…)
To analyse the information you will first have to store it in some way you could retrieve it later, graph it, summarize it, perform different time range roll ups or moving averages or detect time patterns,… whatever. Even if the goal of the sensor information is to trigger events (watch out: the door sensor is running out of batteries!) you will probably want to have it stored somewhere and save a log of actions as well.
So, where to store this information? You have different options available:
- Relational databases (RDBMS) like MySQL or PostgreSQL. If you have some programming background this is pretty easy, we are all used to store data on a MySQL or similar. But mind the tables will GROW very quickly. A sensor reporting every minute will create more than half a million entries a year.
- NO-SQL databases like MongoDB or Cassandra. They are popular and everyone wants to use them. Cassandra is often used to store time-series data like server load or log files and the Internet is full of information about how to store it, retrieve it, partition it,… MongoDB has also some nice features like capped collections.
- Round-robin databases. They have been around for a while. RRDtool version 1.0 was born in 1999 and MRTG is even older. Data is stored in circular buffers, you can define the size of the buffer and the step or how often a new value is inserted into the buffer. When the buffer is full old data gets overwritten and this happens over and over. So you have another buffer for aggregated data (like daily data) and another one for even larger scale data (like weekly data) and so on.
- Time series databases. And finally there are a number of databases that have been specifically designed to store time-series data: fast inserts, fast range lookups, roll ups,…
Now, some companies are providing time series storage services online. They provide a public API, graphing capabilities and some of them even roll ups or data aggregation. They have been born around the Internet of Things hype. Maybe the most popular is Cosm (formerly Pachube) but it's not the only one: TempoDB, Nimbits or ThingSpeak are very good options.
Cosm has been around for a while and it is still the most used by the IoT community. AFAIK, there is no restriction on the number of data points you can store for free if you keep your data “public”. Data is structured in “feeds” and “datastreams”. They have an API you can use to push data individually or in batches. You can also use it to backup the data you have in Cosm. Their charts are nice looking but they have predefined time ranges that automatically aggregate your data so you have little control on what and how you want to graph. They have a number of goodies like triggers, graph builder (to insert your Cosm charts in your site), tags, apps, localization based search,…
TempoDB is a relatively new player. It's free up to 5 million data points per database, that's roughly 9.5 years for a 1 samples/minute data. You can have several “series” of data per database. Their API is really good and they have a number of clients including a python one :^). Their charts might not be as “pretty” as those from Cosm but you have full control on the data you are visualizing: date range, interval and aggregation functions, and they are FAST (Cosm can be really sluggish sometimes). Your data is private, so there is no sense in having a search tool. They have tags and I've been told they are about to release a notification service.
Nimbits has a desktop-like interface base on ExtJS. It is basically a paid service (their free quota is 1000 API requests per day, which is OK for 1 sensor reporting at a rate of 1 samples every 2 minutes) and it costs $20/2 million requests (roughly $20 per year if you have 4 sensors reporting at a 1 sample/minute rate).
ThingSpeak is very similar to Cosm, data is presented in a “channel” where you can see multiple charts for different magnitudes (they call them “fields”), a map and even a video from Youtube. It's open source, so you can clone the code from Github and install it in you server. Their web service is limited to 1 API request every 15 seconds per channel, which is OK as long as you group all the data from one channel in a single request (that's it if you have more than one field per channel). For each field you have a lot of options to define your chart: time range, time scale, different aggregation functions, colors, scale,…
The information in this post is not very detailed but I hope it can be an introduction to anyone who is looking for ways to store or publish sensor data. Right now I'm using Cosm and TempoDB, as well as storing all my data in a local MySQL database (not the best option but it works for now). There are plenty of options I still have to explore. In my next post I will talk about the daemon I'm using to push data to Cosm and TempoDB, in the meantime you can check the code on Github.
"Storing and publishing sensor data" was first posted on 19 January 2013 by Xose Pérez on tinkerman.cat under Analysis, Learning and tagged cloud services, cosm, nimbits, storing sensor data, tempdb, thingspeak.