Install Linux collector

Telegraf includes multiple input plugins to monitor and measure Linux systems. This section describes the setup of the Telegraf agent for Linux.

Install Telegraf agent

See the section Telegraf Agent.

Telegraf config

The newbIT TSA is delivered with a ready-to-use Telegraf config file for Linux, with recommended metrics to monitor. The following plugins are enabled:

cpu plugin

The cpu plugin collects standard CPU metrics as defined in man proc. Not all architectures support all of these metrics.

CPU time is measured as the number of jiffies (1/100 of a second on x86 systems) that the system has spent in user mode, user mode with low priority (nice), system mode, the idle task, I/O wait, IRQ (hardirq), and softirq, respectively. The IRQ (hardirq) is the direct response to a hardware event. It does minimal work and queues the "heavy" work for the softirq to execute. The softirq runs at a lower priority than the IRQ and may therefore be interrupted more frequently. The total for all CPUs is given at the top, while each individual CPU is listed below it with its own statistics.

By default, only the percentages are collected, not the CPU time values. To collect the time values as well, update your Telegraf configuration file and set collect_cpu_time = true in the [[inputs.cpu]] section.

Plugin tags:

  • cpu (the CPU number, for example cpu0, or cpu-total for the aggregate)

| Telegraf field | Data source | Unit | Description |
| --- | --- | --- | --- |
| cpu_time_user | /proc/stat | CPU time | The CPU time spent in user mode. |
| cpu_time_nice | /proc/stat | CPU time | The CPU time spent in user mode with low priority (nice). |
| cpu_time_system | /proc/stat | CPU time | The CPU time spent in system mode. |
| cpu_time_idle | /proc/stat | CPU time | The CPU time spent in the idle task. |
| cpu_time_iowait | /proc/stat | CPU time | The CPU time spent waiting for I/O to complete. |
| cpu_time_irq | /proc/stat | CPU time | The CPU time spent servicing interrupts. |
| cpu_time_softirq | /proc/stat | CPU time | The CPU time spent servicing softirqs. |
| cpu_time_steal | /proc/stat | CPU time | Stolen CPU time, which is the time spent in other operating systems when running in a virtualized environment. |
| cpu_time_guest | /proc/stat | CPU time | The CPU time spent running a virtual CPU for guest operating systems under the control of the Linux kernel. |
| cpu_time_guest_nice | /proc/stat | CPU time | The CPU time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel). |
| cpu_usage_user | /proc/stat | percent | Percentage of CPU time spent in user mode. |
| cpu_usage_system | /proc/stat | percent | Percentage of CPU time spent in system mode. |
| cpu_usage_idle | /proc/stat | percent | Percentage of CPU time spent in the idle task. |
| cpu_usage_nice | /proc/stat | percent | Percentage of CPU time spent in user mode with low priority (nice). |
| cpu_usage_iowait | /proc/stat | percent | Percentage of CPU time spent waiting for I/O to complete. |
| cpu_usage_irq | /proc/stat | percent | Percentage of CPU time spent servicing interrupts. |
| cpu_usage_softirq | /proc/stat | percent | Percentage of CPU time spent servicing softirqs. |
| cpu_usage_steal | /proc/stat | percent | Percentage of stolen CPU time, which is the time spent in other operating systems when running in a virtualized environment. |
| cpu_usage_guest | /proc/stat | percent | Percentage of CPU time spent running a virtual CPU for guest operating systems under the control of the Linux kernel. |
| cpu_usage_guest_nice | /proc/stat | percent | Percentage of CPU time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel). |
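
The relationship between the cumulative jiffy counters and the percentage fields can be illustrated with a small sketch. It is not Telegraf's actual implementation: it assumes the percentages are derived from the difference between two samples of a "cpu" line in /proc/stat, and the helper names (parse_cpu_line, usage_percent) and the sample values are hypothetical.

```python
# Counter order on a "cpu" line of /proc/stat, as documented in man proc.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")

# guest and guest_nice are already counted inside user and nice by the kernel,
# so they are left out of the total to avoid counting that time twice.
TOTAL_FIELDS = FIELDS[:8]


def parse_cpu_line(line):
    """Parse one 'cpu ...' line from /proc/stat into a dict of jiffy counters."""
    values = line.split()[1:]
    return {name: int(value) for name, value in zip(FIELDS, values)}


def usage_percent(before, after):
    """Return the share of elapsed CPU time per mode between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.get(name, 0) for name in TOTAL_FIELDS)
    if total <= 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Two hypothetical samples of the aggregate "cpu" line, taken some time apart.
first = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0")
second = parse_cpu_line("cpu  160 0 70 920 50 0 0 0 0 0")
usage = usage_percent(first, second)
print(usage["user"], usage["system"], usage["idle"])  # 30.0 10.0 60.0
```

Because the counters only ever increase, a single sample says little on its own; it is the difference between two samples that shows how the CPU spent its time during the interval.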

disk plugin

The disk input plugin gathers metrics about disk usage.

Note that used_percent is calculated as used / (used + free), which is how the Unix df command does it, rather than as used / total. See [https://en.wikipedia.org/wiki/Df_(Unix)](https://en.wikipedia.org/wiki/Df_(Unix)) for more details.

To monitor the Docker engine host from within a container, mount the host's filesystem into the container and set the HOST_PROC environment variable to the location of the /proc filesystem. If desired, you can also set the HOST_MOUNT_PREFIX environment variable to the prefix containing the /proc directory. When present, this variable is stripped from the reported path tag.

docker run -v /:/hostfs:ro -e HOST_MOUNT_PREFIX=/hostfs -e HOST_PROC=/hostfs/proc telegraf

Plugin tags:

  • fstype
  • device
  • path
  • mode

| Telegraf field | Data source | Unit | Description |
| --- | --- | --- | --- |
| free | /bin/df | bytes | Filesystem space still available. |
| total | /bin/df | bytes | Filesystem total size. |
| used | /bin/df | bytes | Filesystem used space. |
| used_percent | /bin/df | percent | Filesystem used space as a percentage. |
| inodes_free | /bin/df | counter | Free inodes in the filesystem. |
| inodes_total | /bin/df | counter | Total inodes in the filesystem. |
| inodes_used | /bin/df | counter | Used inodes in the filesystem. |
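
The used / (used + free) rule described for used_percent can differ noticeably from used / total when a filesystem reserves blocks for root. The following is a minimal sketch of that calculation, not the plugin's code. The function names are hypothetical, and the mapping of free to the statvfs field f_bavail (space available to unprivileged users) is an assumption.

```python
import os


def used_percent(used, free):
    """Percentage used, computed as used / (used + free)."""
    denominator = used + free
    if denominator == 0:
        return 0.0
    return 100.0 * used / denominator


def filesystem_used_percent(path):
    """Apply the same formula to a mounted filesystem via statvfs (Unix only)."""
    st = os.statvfs(path)
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    free = st.f_bavail * st.f_frsize  # assumption: space available to unprivileged users
    return used_percent(used, free)


# Hypothetical filesystem: 100 units total, 60 used, 35 free, 5 reserved for root.
print(round(used_percent(60, 35), 2))  # 63.16, whereas used / total would give 60.0
```

On a Linux host, filesystem_used_percent("/") should land close to the Use% column that df prints for the root filesystem.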

diskio plugin

The diskio input plugin gathers metrics about disk traffic and timing.

To monitor the Docker engine host from within a container, mount the host's filesystem into the container and set the HOST_PROC environment variable to the location of the /proc filesystem. Additionally, privileged mode is required to provide access to /dev. If you are using the device_tags or name_templates options, you also need to bind mount /run/udev into the container.

docker run --privileged -v /:/hostfs:ro -v /run/udev:/run/udev:ro -e HOST_PROC=/hostfs/proc telegraf

Plugin tags:

  • name
  • serial

| Telegraf field | Data source | Unit | Description |
| --- | --- | --- | --- |
| reads | /sys/block//stat | counter | The total number of reads completed successfully. |
| writes | /sys/block//stat | counter | The total number of writes completed successfully. |
| read_bytes | /sys/block//stat | bytes | The total number of bytes read successfully. |
| write_bytes | /sys/block//stat | bytes | The total number of bytes written successfully. |
| read_time | /proc/diskstats | milliseconds | The total number of milliseconds spent by all reads. |
| write_time | /proc/diskstats | milliseconds | The total number of milliseconds spent by all writes. |
| io_time | /sys/block//stat | milliseconds | The total number of milliseconds spent doing I/Os. |
| weighted_io_time | /sys/block//stat | milliseconds | The weighted number of milliseconds spent doing I/Os. This field is incremented at each I/O start, I/O completion, I/O merge, or read of these stats by the number of I/Os in progress (field 9) times the number of milliseconds spent doing I/O since the last update of this field. This can provide an easy measure of both I/O completion time and the backlog that may be accumulating. |
| iops_in_progress | /sys/block//stat | counter | The number of I/Os currently in progress. Incremented as requests are given to the appropriate struct request_queue and decremented as they finish. |
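
To show where these fields come from, the sketch below parses the whitespace-separated columns of a block device stat file into the field names used in the table. The column order follows the kernel's block layer statistics documentation, where sector counts are in 512-byte units. The function name, the column-to-field mapping, and the sample line are illustrative assumptions, not the plugin's code.

```python
SECTOR_SIZE = 512  # the kernel reports sectors in 512-byte units in this file


def parse_block_stat(text):
    """Map the first eleven columns of a block device stat file to named fields."""
    f = [int(value) for value in text.split()]
    return {
        "reads": f[0],                       # read I/Os completed
        "read_bytes": f[2] * SECTOR_SIZE,    # sectors read, converted to bytes
        "read_time": f[3],                   # milliseconds spent reading
        "writes": f[4],                      # write I/Os completed
        "write_bytes": f[6] * SECTOR_SIZE,   # sectors written, converted to bytes
        "write_time": f[7],                  # milliseconds spent writing
        "iops_in_progress": f[8],            # I/Os currently in flight
        "io_time": f[9],                     # milliseconds spent doing I/O
        "weighted_io_time": f[10],           # weighted milliseconds spent doing I/O
    }


# Hypothetical content of a stat file with the first eleven columns.
sample = "1000 20 16000 450 500 10 8000 300 2 600 750"
stats = parse_block_stat(sample)
print(stats["read_bytes"], stats["write_bytes"])  # 8192000 4096000
```

Columns 1 and 5 (merged reads and writes) are skipped here because the table above has no matching field.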

kernel plugin

This plugin is only available on Linux. The kernel plugin gathers information about the kernel that doesn't fit into other plugins. In general, these are the statistics available in /proc/stat that are not covered by other plugins, as well as the value of /proc/sys/kernel/random/entropy_avail. The /proc/stat metrics are documented in man proc under the /proc/stat section, and entropy_avail is documented in man 4 random.

Plugin tags:

  • none

| Telegraf field | Data source | Unit | Description |
| --- | --- | --- | --- |
| boot_time | /proc/stat | seconds | The boot time, measured in the number of seconds since January 1, 1970, otherwise known as the epoch. |
| context_switches | /proc/stat | counter | The total number of context switches across all CPUs. |
| disk_pages_in | /proc/stat | counter | The number of pages the system paged in (from disk). |
| disk_pages_out | /proc/stat | counter | The number of pages the system paged out (to disk). |
| interrupts | /proc/stat | counter | The number of interrupts the system has experienced since boot time. |
| processes_forked | /proc/stat | counter | The number of processes and threads created, which includes (but is not limited to) those created by calls to the fork() and clone() system calls. |
| entropy_avail | /proc/sys/kernel/random/entropy_avail | counter | The amount of available entropy (pool of random numbers used for /dev/random). |
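
As an illustration of how several of these fields relate to /proc/stat, the sketch below extracts them from the btime, ctxt, intr and processes lines described in man proc and converts boot_time to a readable timestamp. The key-to-field mapping and the sample content are assumptions made for this example.

```python
from datetime import datetime, timezone

# /proc/stat line keys (man proc) and the table fields they are assumed to feed.
KEY_TO_FIELD = {
    "btime": "boot_time",
    "ctxt": "context_switches",
    "intr": "interrupts",          # the first number on the intr line is the total
    "processes": "processes_forked",
}


def parse_kernel_stats(text):
    """Pick the kernel-wide counters out of /proc/stat content."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in KEY_TO_FIELD:
            fields[KEY_TO_FIELD[parts[0]]] = int(parts[1])
    return fields


# Hypothetical excerpt of /proc/stat.
sample = """cpu  100 0 50 800 50 0 0 0 0 0
intr 123456 5 0 0
ctxt 987654
btime 1700000000
processes 4321
procs_running 1
"""

stats = parse_kernel_stats(sample)
print(stats)
# boot_time is in seconds since the epoch, so it converts directly to a datetime.
print(datetime.fromtimestamp(stats["boot_time"], tz=timezone.utc).isoformat())
```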

mem plugin

The mem plugin collects system memory metrics.

Plugin tags:

  • none

| Telegraf field | Data source | Unit | Description |
| --- | --- | --- | --- |
| active | /proc/meminfo | kilobytes | The amount of memory that has been used more recently and is usually not reclaimed unless absolutely necessary. |
| available | /proc/meminfo | kilobytes | An estimate of how much memory is available for starting new applications, without swapping. |
| buffered | /proc/meminfo | kilobytes | Memory used by kernel buffers (Buffers in /proc/meminfo). |
| cached | /proc/meminfo | kilobytes | Memory used by the page cache and slabs (Cached and SReclaimable in /proc/meminfo). |
| free | /proc/meminfo | kilobytes | Unused memory (MemFree and SwapFree in /proc/meminfo). |
| inactive | /proc/meminfo | kilobytes | Memory that has not been used recently and can be swapped out. |
| slab | /proc/meminfo | kilobytes | In-kernel data structures cache. |
| total | /proc/meminfo | kilobytes | Total installed memory (MemTotal and SwapTotal in /proc/meminfo). |
| used | /proc/meminfo | kilobytes | Used memory (calculated as MemTotal - MemFree - Buffers - Cached - Slab). |
| available_percent | /proc/meminfo | percent | Percentage of available memory. |
| used_percent | /proc/meminfo | percent | Percentage of used memory. |
| wired | /proc/meminfo | kilobytes | Memory in use by the kernel, which includes the networking stack. This memory cannot be swapped out. |
| commit_limit | /proc/meminfo | kilobytes | Based on the overcommit ratio (vm.overcommit_ratio), this is the total amount of memory currently available to be allocated on the system. This limit is only adhered to if strict overcommit accounting is enabled (mode 2 in vm.overcommit_memory). |
| committed_as | /proc/meminfo | kilobytes | The amount of memory presently allocated on the system. The committed memory is the sum of all memory that has been allocated by processes, even if it has not been "used" by them yet. |
| dirty | /proc/meminfo | kilobytes | Memory waiting to be written back to disk. |
| high_free | /proc/meminfo | kilobytes | Value specific to 32-bit x86 systems (RHEL 5 and earlier). |
| high_total | /proc/meminfo | kilobytes | Value specific to 32-bit x86 systems (RHEL 5 and earlier). |
| huge_page_size | /proc/meminfo | counter | The size of a hugepage (usually 2 MB on an Intel-based system). |
| huge_pages_free | /proc/meminfo | counter | The number of hugepages not allocated by a process. |
| huge_pages_total | /proc/meminfo | counter | The number of hugepages allocated by the kernel (defined with vm.nr_hugepages). |
| low_free | /proc/meminfo | kilobytes | Value specific to 32-bit x86 systems (RHEL 5 and earlier). |
| low_total | /proc/meminfo | kilobytes | Value specific to 32-bit x86 systems (RHEL 5 and earlier). |
| mapped | /proc/meminfo | kilobytes | Files which have been mmapped, such as libraries. |
| page_tables | /proc/meminfo | kilobytes | The amount of memory dedicated to the lowest level of page tables. This can increase to a high value if a lot of processes are attached to the same shared memory segment. |
| shared | /proc/meminfo | kilobytes | Total used shared memory. |
| swap_cached | /proc/meminfo | kilobytes | Memory that is present within main memory, but also in the swapfile. |
| swap_free | /proc/meminfo | kilobytes | The remaining swap space available. |
| swap_total | /proc/meminfo | kilobytes | Total swap space available. |
| vmalloc_chunk | /proc/meminfo | kilobytes | The largest contiguous block of vmalloc area which is free. |
| vmalloc_total | /proc/meminfo | kilobytes | The total size of the vmalloc memory area. |
| vmalloc_used | /proc/meminfo | kilobytes | The amount of vmalloc area which is used. |
| write_back | /proc/meminfo | kilobytes | Memory which is actively being written back to disk. |
| write_back_tmp | /proc/meminfo | kilobytes | Memory used by FUSE for temporary writeback buffers. |
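
The formula given for the used field (MemTotal - MemFree - Buffers - Cached - Slab) can be worked through on sample data. The sketch below parses /proc/meminfo-style text and applies that formula literally. Treating used_percent and available_percent as shares of MemTotal is an assumption, and the function names and sample values are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def derived_fields(info):
    """Compute used and the two percentage fields from raw meminfo values."""
    total = info["MemTotal"]
    used = total - info["MemFree"] - info["Buffers"] - info["Cached"] - info["Slab"]
    return {
        "used": used,
        "used_percent": 100.0 * used / total,                       # assumption: share of MemTotal
        "available_percent": 100.0 * info["MemAvailable"] / total,  # assumption: share of MemTotal
    }


# Hypothetical excerpt of /proc/meminfo.
sample = """MemTotal:       16000000 kB
MemFree:         4000000 kB
MemAvailable:   10000000 kB
Buffers:          500000 kB
Cached:          5000000 kB
Slab:             500000 kB
"""

print(derived_fields(parse_meminfo(sample)))
# {'used': 6000000, 'used_percent': 37.5, 'available_percent': 62.5}
```

In this sample, used and available do not add up to the total, because available also counts cache and buffer memory that the kernel can reclaim.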

processes plugin

This plugin gathers information about the total number of processes and groups them by status (zombie, sleeping, running, etc.). On Linux, this plugin requires access to procfs (/proc). On other operating systems, it requires permission to execute ps.

Another possible configuration is to define an alternative path for resolving the /proc location. When the HOST_PROC environment variable is set, the plugin retrieves process information from the specified location.

docker run -v /proc:/rootfs/proc:ro -e HOST_PROC=/rootfs/proc

Plugin tags:

  • none

| Telegraf field | Data source | Unit | Description |
| --- | --- | --- | --- |
| blocked | /bin/ps -eo state | counter | The number of processes in state "D". |
| running | /bin/ps -eo state | counter | The number of processes in state "R". |
| sleeping | /bin/ps -eo state | counter | The number of processes in state "S". |
| stopped | /bin/ps -eo state | counter | The number of processes in state "T" or "t". |
| total | /bin/ps -eo state | counter | The total number of processes. |
| zombie | /bin/ps -eo state | counter | The number of processes in state "Z". |
| dead | /bin/ps -eo state | counter | The number of processes in state "X". |
| wait | /bin/ps -eo state | counter | The number of processes in state "W". |
| idle | /bin/ps -eo state | counter | The number of processes in state "I". |
| paging | /bin/ps -eo state | counter | The number of processes in state "W". |
| total_threads | /bin/ps -eo nlwp | counter | The total number of threads. |
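
The grouping by state letter shown in the table can be sketched as a simple tally. This is an illustration rather than the plugin's code: the function name and the input list are hypothetical, and because the table lists state "W" for both wait and paging, the sketch arbitrarily counts it as paging.

```python
from collections import Counter

# State letters from the table above. "W" appears for both wait and paging;
# counting it as paging here is an arbitrary choice made for this example.
STATE_TO_FIELD = {
    "D": "blocked",
    "R": "running",
    "S": "sleeping",
    "T": "stopped",
    "t": "stopped",
    "Z": "zombie",
    "X": "dead",
    "W": "paging",
    "I": "idle",
}


def count_states(states):
    """Tally process state letters into the field names used by the plugin."""
    counts = Counter()
    for state in states:
        field = STATE_TO_FIELD.get(state[:1])
        if field is not None:
            counts[field] += 1
        counts["total"] += 1
    return dict(counts)


# Hypothetical list of state letters, one per process.
print(count_states(["S", "S", "R", "D", "Z", "t", "I"]))
# {'sleeping': 2, 'total': 7, 'running': 1, 'blocked': 1, 'zombie': 1, 'stopped': 1, 'idle': 1}
```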

swap plugin

The swap plugin collects system swap metrics. For more information on what swap memory is, read "All about Linux swap space".

Plugin tags:

  • none

| Telegraf field | Data source | Unit | Description |
| --- | --- | --- | --- |
| free | sysinfo() | bytes | Swap space still available. |
| in | /proc/vmstat | bytes | Number of bytes the system has swapped in from disk per second. |
| out | /proc/vmstat | bytes | Number of bytes the system has swapped out to disk per second. |
| total | sysinfo() | bytes | Total swap space size. |
| used | N/A | bytes | Swap space used. |
| used_percent | N/A | percent | Swap space used as a percentage. |

system plugin

The system plugin gathers general stats on system load, uptime, and the number of users logged in. It is similar to the Unix uptime command.

Plugin tags:

  • none

| Telegraf field | Data source | Unit | Description |
| --- | --- | --- | --- |
| load1 | /proc/loadavg | counter | CPU load measures CPU over- or under-utilization in a Linux system: the number of processes being executed by the CPU or in a waiting state (1-minute average). |
| load5 | /proc/loadavg | counter | CPU load measures CPU over- or under-utilization in a Linux system: the number of processes being executed by the CPU or in a waiting state (5-minute average). |
| load15 | /proc/loadavg | counter | CPU load measures CPU over- or under-utilization in a Linux system: the number of processes being executed by the CPU or in a waiting state (15-minute average). |
| n_users | last -f /var/run/utmp | counter | The total number of users. |
| n_cpus | Go runtime package | counter | The total number of CPUs. |
| uptime | /proc/stat | seconds | Uptime of the system. |
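
For reference, the three load fields correspond to the first three values in /proc/loadavg. The sketch below shows that mapping on a hypothetical sample line. The function name is an assumption, and the remaining columns of the file are ignored.

```python
def parse_loadavg(text):
    """Read the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    parts = text.split()
    return {
        "load1": float(parts[0]),
        "load5": float(parts[1]),
        "load15": float(parts[2]),
    }


# Hypothetical content of /proc/loadavg.
print(parse_loadavg("0.52 0.58 0.59 1/389 12345"))
# {'load1': 0.52, 'load5': 0.58, 'load15': 0.59}
```

When reading these values, it can help to compare them with n_cpus, since the same load means something different on a 2-CPU host than on a 32-CPU one.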

Import Grafana Linux dashboard

Import the newbIT Grafana dashboard for Linux.

Filename: linux_dashboard.json

Additional Information

Please read the following documentation for detailed information about the used Telegraf plugins:
