3. What is time series data?
• Time series data is a sequence of data points
collected from the same source over a time interval.
• If you plot time series data, one of
your axes will always be time.
11. Time series data is good for
• Internet of Things (e.g. sensor data)
• Alerting
• Monitoring
• Real Time Analytics
12. InfluxDB is the I in the TICK stack
• Telegraf - time series data collector
• InfluxDB - time series database
• Chronograf - time series data visualization
• Kapacitor - time series data processing and
alerting
13. InfluxDB features
• SQL-like query language
• Schemaless
• Case sensitive
• Data types: string, float64, int64, boolean
14. Measurement
• A measurement is roughly a table; a point is a
single record (row) in a measurement
• Each point has a time (as primary key), tags
(indexed columns) and fields (non-indexed
columns)
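To make the time / tags / fields split concrete, here is a minimal Python sketch that renders one point in InfluxDB's line protocol text format. The helper name `to_line_protocol` and the tag names `host` and `region` are hypothetical, and escaping of spaces and commas is left out.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Render one point as a line-protocol string (escaping omitted)."""
    def format_field(value):
        # bool must be checked before int, because bool is a subclass of int
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"      # integers carry an "i" suffix
        if isinstance(value, float):
            return repr(value)      # floats are written as-is
        return f'"{value}"'         # strings are double-quoted

    # Tags follow the measurement name, separated by commas
    tag_part = "".join(f",{key}={val}" for key, val in sorted(tags.items()))
    field_part = ",".join(f"{key}={format_field(val)}" for key, val in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


point = to_line_protocol(
    "cpu_temp",
    {"host": "server01", "region": "eu"},            # tags (indexed)
    {"cpu": 61.5, "cores": 4, "throttled": False},   # fields (not indexed)
    1467590400000000000,                             # 2016-07-04T00:00:00Z in nanoseconds
)
print(point)
# cpu_temp,host=server01,region=eu cpu=61.5,cores=4i,throttled=false 1467590400000000000
```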
21. Querying
• Show databases:
> SHOW DATABASES
• Select database:
> USE workshop
• Show measurements ("tables"):
> SHOW MEASUREMENTS
• Simple select all:
> SELECT * FROM measurement_name
22. Querying (2)
• Select with limit:
> SELECT * FROM measure LIMIT 10
• Select with offset (used together with LIMIT):
> SELECT * FROM measure LIMIT 10 OFFSET 10
• Select where clause:
> SELECT * FROM measure WHERE tag1 = 'value1'
• Select with order clause (ordering is by time only):
> SELECT * FROM measure ORDER BY time DESC
23. Querying (3)
• Operators:
= equal to
<>, != not equal to
> greater than
< less than
=~ matches against (REGEX)
!~ doesn’t match against (REGEX)
24. Aggregations - COUNT()
Returns the number of non-null values.
> SELECT count(<field>) FROM measure
> SELECT count(cpu) FROM cpu_temp
WHERE time > '2016-07-04'
AND time < '2016-07-05'
GROUP BY time(1h)
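What GROUP BY time(1h) does with COUNT() can be sketched in plain Python: each timestamp is truncated to the start of its hour and non-null values are counted per window. This is an illustration only, not InfluxDB's implementation; the helper names `count_per_hour` and `utc` and the sample values are made up. Unlike InfluxDB, the sketch returns only windows that contain data.

```python
from collections import defaultdict
from datetime import datetime, timezone


def count_per_hour(points):
    """Count non-null values per one-hour window, keyed by window start."""
    counts = defaultdict(int)
    for ts, value in points:
        if value is None:
            continue  # COUNT() ignores null values
        window_start = ts.replace(minute=0, second=0, microsecond=0)
        counts[window_start] += 1
    return dict(counts)


def utc(hour, minute):
    """Hypothetical helper: a UTC timestamp on 2016-07-04."""
    return datetime(2016, 7, 4, hour, minute, tzinfo=timezone.utc)


samples = [
    (utc(0, 5), 60.1),
    (utc(0, 40), None),   # null value, not counted
    (utc(0, 55), 61.0),
    (utc(1, 10), 59.4),
]
hourly = count_per_hour(samples)
for window_start, count in sorted(hourly.items()):
    print(window_start.isoformat(), count)
```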
25. Aggregations - MEAN()
Returns the mean (average) value of a single field
(calculates only for non-null values).
> SELECT mean(<field>) FROM measure
> SELECT mean(cpu) FROM cpu_temp
WHERE time > '2016-07-04'
AND time < '2016-07-05'
GROUP BY time(1h)
26. Aggregations - MEDIAN()
Returns the middle value from the sorted values in a
single field (similar to PERCENTILE(field, 50)).
> SELECT median(<field>) FROM measure
> SELECT median(cpu) FROM cpu_temp
WHERE time > '2016-07-04'
AND time < '2016-07-05'
GROUP BY time(1h)
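MEDIAN() and PERCENTILE(field, 50) are similar but not identical: with an even number of values the median averages the two middle values, while a percentile picks one of the existing values. A minimal Python sketch of the difference follows; the rounding rule in `nearest_rank_percentile` is an assumption for illustration and may not match InfluxDB exactly.

```python
import math
import statistics


def nearest_rank_percentile(values, n):
    """Pick an existing value at the Nth percentile (assumed rounding rule)."""
    ordered = sorted(values)
    index = max(math.floor(len(ordered) * n / 100 + 0.5) - 1, 0)
    return ordered[index]


even = [1.0, 2.0, 3.0, 4.0]
odd = [1.0, 2.0, 3.0]

print(statistics.median(even), nearest_rank_percentile(even, 50))  # 2.5 2.0
print(statistics.median(odd), nearest_rank_percentile(odd, 50))    # 2.0 2.0
```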
27. Aggregations - SPREAD()
Returns the difference between the minimum and
maximum values of a field.
> SELECT spread(<field>) FROM measure
> SELECT spread(cpu) FROM cpu_temp
WHERE time > '2016-07-04'
AND time < '2016-07-05'
GROUP BY time(1h)
28. Aggregations - SUM()
Returns the sum of all values in a single field.
> SELECT sum(<field>) FROM measure
> SELECT sum(cpu) FROM cpu_temp
WHERE time > '2016-07-04'
AND time < '2016-07-05'
GROUP BY time(1h)
29. Selectors - BOTTOM(N)
Returns the smallest N values in a single field.
> SELECT bottom(<field>, <N>) FROM measure
> SELECT bottom(cpu, 5) FROM cpu_temp
WHERE time > '2016-07-04'
AND time < '2016-07-05'
GROUP BY time(1h)
30. Selectors - FIRST()
Returns the oldest value (by timestamp) of a single field.
> SELECT first(<field>) FROM measure
> SELECT first(cpu) FROM cpu_temp
WHERE time > '2016-07-04'
AND time < '2016-07-05'
GROUP BY time(1h)
31. Selectors - LAST()
Returns the newest value (by timestamp) of a single field.
> SELECT last(<field>) FROM measure
> SELECT last(cpu) FROM cpu_temp
WHERE time > '2016-07-04'
AND time < '2016-07-05'
GROUP BY time(1h)
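FIRST() and LAST() select by timestamp, not by value, which is what separates them from MIN() and MAX(). A small Python sketch, assuming points are (timestamp, value) pairs; `first_and_last` is a hypothetical helper and expects a non-empty list.

```python
def first_and_last(points):
    """Return the values with the oldest and the newest timestamp."""
    ordered = sorted(points, key=lambda point: point[0])
    return ordered[0][1], ordered[-1][1]


# (timestamp, value) pairs, deliberately out of time order
samples = [(30, 58.0), (10, 72.5), (20, 61.0)]
oldest, newest = first_and_last(samples)
print(oldest, newest)  # 72.5 58.0
```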
32. Selectors - MAX()
Returns the highest value in a single field.
> SELECT max(<field>) FROM measure
> SELECT max(cpu) FROM cpu_temp
WHERE time > '2016-07-04'
AND time < '2016-07-05'
GROUP BY time(1h)
33. Selectors - MIN()
Returns the lowest value in a single field.
> SELECT min(<field>) FROM measure
> SELECT min(cpu) FROM cpu_temp
WHERE time > '2016-07-04'
AND time < '2016-07-05'
GROUP BY time(1h)
34. Selectors - PERCENTILE(N)
Returns the Nth percentile value from the sorted
values of a single field.
> SELECT percentile(<field>, <N>) FROM measure
> SELECT percentile(cpu, 95) FROM cpu_temp
WHERE time > '2016-07-04'
AND time < '2016-07-05'
GROUP BY time(1h)
35. Selectors - TOP(N)
Returns the largest N values in a single field.
> SELECT top(<field>, <N>) FROM measure
> SELECT top(cpu, 5) FROM cpu_temp
WHERE time > '2016-07-04'
AND time < '2016-07-05'
GROUP BY time(1h)
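TOP() and its counterpart BOTTOM() return whole points, so each selected value keeps its own timestamp. A rough Python equivalent using `heapq`; the helper names `top` and `bottom` and the sample data are made up, and results here are ordered by value, which may differ from the order InfluxDB returns.

```python
import heapq


def top(points, n):
    """Return the n points with the largest values."""
    return heapq.nlargest(n, points, key=lambda point: point[1])


def bottom(points, n):
    """Return the n points with the smallest values."""
    return heapq.nsmallest(n, points, key=lambda point: point[1])


# (timestamp, value) pairs
samples = [(1, 60.0), (2, 75.5), (3, 58.2), (4, 71.0), (5, 64.3)]
print(top(samples, 2))     # [(2, 75.5), (4, 71.0)]
print(bottom(samples, 2))  # [(3, 58.2), (1, 60.0)]
```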
36. GROUP BY clause
InfluxDB supports GROUP BY on tag values, on time
intervals, on both combined, and GROUP BY time()
with fill() for time windows that contain no data.
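The effect of fill() on empty time windows can be sketched in Python. The input is a list of per-window aggregates in time order, with None marking a window that had no data. `fill_windows` is a hypothetical helper covering only the null, none, previous and constant-number options.

```python
def fill_windows(window_values, mode="null"):
    """Apply a fill() option to per-window aggregates (None = empty window)."""
    filled = []
    previous = None
    for value in window_values:
        if value is not None:
            previous = value
            filled.append(value)
        elif mode == "none":
            continue                    # fill(none): drop the empty window
        elif mode == "previous":
            filled.append(previous)     # fill(previous): carry the last value forward
        elif mode == "null":
            filled.append(None)         # fill(null): report the window as null
        else:
            filled.append(float(mode))  # fill(<number>): substitute a constant
    return filled


hourly_means = [60.5, None, None, 62.0]
print(fill_windows(hourly_means, "null"))      # [60.5, None, None, 62.0]
print(fill_windows(hourly_means, "none"))      # [60.5, 62.0]
print(fill_windows(hourly_means, "previous"))  # [60.5, 60.5, 60.5, 62.0]
print(fill_windows(hourly_means, "0"))         # [60.5, 0.0, 0.0, 62.0]
```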
37. Downsampling
InfluxDB can handle hundreds of thousands of data
points per second. Working with that much data over
a long period of time can create storage concerns. A
natural solution is to downsample the data: keep the
high precision raw data for only a limited time, and
store the lower precision, summarized data for much
longer or forever.
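The idea behind downsampling can be sketched in Python: raw points are grouped into fixed windows and each window is replaced by a single summary value, here the mean. This is an illustration of the concept, not how InfluxDB does it internally; `downsample` and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_seconds):
    """Collapse (epoch_seconds, value) points into one mean per window."""
    buckets = defaultdict(list)
    for ts, value in points:
        window_start = ts - ts % window_seconds  # align to the window boundary
        buckets[window_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Five raw points spread over two hours
raw = [(0, 60.0), (600, 62.0), (1800, 64.0), (3600, 70.0), (5400, 72.0)]
hourly = downsample(raw, 3600)
print(hourly)  # [(0, 62.0), (3600, 71.0)]
```

Five raw points become two summarized points; the raw series can then be kept for a short time and the hourly series for much longer.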
38. Data retention
A retention policy (RP) is the part of InfluxDB’s data
structure that describes how long InfluxDB keeps
data and how many copies of that data are stored
in the cluster. A database can have several RPs, and
RPs are unique per database.
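The duration part of a retention policy can be pictured as a cutoff in time: anything older than now minus the duration is discarded. A simplified Python sketch follows; `enforce_retention` is a hypothetical helper, and InfluxDB itself expires data in larger groups rather than point by point.

```python
def enforce_retention(points, now, duration_seconds):
    """Keep only (epoch_seconds, value) points inside the retention window."""
    cutoff = now - duration_seconds
    return [(ts, value) for ts, value in points if ts >= cutoff]


points = [(100, 1.0), (500, 2.0), (900, 3.0)]
kept = enforce_retention(points, now=1000, duration_seconds=600)
print(kept)  # [(500, 2.0), (900, 3.0)]
```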