ameetmk / curated-system-tools

A list of tools and commands to know what's going on in the system

Home page: https://github.com/chtefi/curated-system-tools


A list of tools to help debug issues or simply check what's going on in the system.

Classic

Linux is assumed. On OS X, the options can be quite different.

  • top cw : is something eating all the CPU or memory?
  • htop : a colorful top, easy to play with
  • ps fauxww : all processes with their full command lines, shown as a tree
  • free -h : memory and swap usage
  • df -h : disk usage per mounted filesystem
  • iptables -L -v : firewall rules
  • dmesg -T: kernel messages with human-readable timestamps. Can be full of iptables "denied" messages :-) or other useful information to check when something goes wrong
  • env: list the environment variables
  • uptime: check the 1/5/15-minute load averages
  • strace: trace the system calls and signals of a program (open, read, stat, mmap, ...), e.g. strace -e trace=open,openat uptime 2>&1 (recent glibc opens files with openat rather than open)
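The load averages printed by uptime only mean something relative to the number of CPUs. A minimal sketch, assuming bash, awk and nproc (GNU coreutils); `load_per_cpu` is a hypothetical helper name, not an existing tool:

```bash
#!/usr/bin/env bash
# load_per_cpu (hypothetical helper): print the 1/5/15-minute load averages
# divided by the number of CPUs. Values around 1.00 or more mean tasks are
# queuing (on Linux the load also counts tasks blocked in uninterruptible I/O).
# Input uses the /proc/loadavg format: "1min 5min 15min running/total lastpid"
load_per_cpu() {
  local file=${1:-/proc/loadavg}   # file in /proc/loadavg format
  local ncpu=${2:-$(nproc)}        # number of CPUs to divide by
  awk -v n="$ncpu" '{ printf "%.2f %.2f %.2f\n", $1 / n, $2 / n, $3 / n }' "$file"
}

load_per_cpu
```

Taking the file and CPU count as optional arguments keeps the helper easy to try on a saved copy of /proc/loadavg.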

System resources

Tools to look after system performance (memory, CPU, disks, network, processes, files...):

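  • sysdig : capture and explore system activity, live or from saved snapshots (csysdig is its console UI); e.g. record the activity of java processes to a file: sudo sysdig 'proc.name=java' -w ~/sysdig.scap
  • iostat : disk I/O statistics, e.g. iostat -m -x -d 2 (extended device stats in MB/s, every 2 seconds)
  • vmstat : memory/swap/CPU, e.g. vmstat 1
  • mpstat : per-core statistics, useful to spot single-threaded apps (one core busy while the others idle), e.g. mpstat -P ALL 1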
  • ifstat : like iostat, vmstat, but for network interfaces
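  • netstat : network connections, listening sockets and routing table, e.g. netstat -putel (listening TCP/UDP sockets with their owning program), netstat -anr (routing table, numeric addresses)
  • ss : a bit like netstat (its iproute2 replacement); lists sockets (TCP/UDP) and their state, e.g. ss -ta (all TCP sockets)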
  • dstat : *stat all-in-one
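  • sar : collect and report activity counters (network, devices, CPU...), e.g. sar -n DEV 2. All its options in one nice picture: http://www.brendangregg.com/Perf/linux_observability_sar.png
  • iotop : top, for I/O!
  • iperf : measure the maximum bandwidth (TCP/UDP) between two hosts: run iperf -s on one side, then iperf -c server -f m -d on the other (results in Mbits/s, -d tests both directions at once)
  • ulimit: resource limits of the current shell and its children (memory, open files, misc sizes). The open-file limit often has to be raised on servers running busy apps: ulimit -n 2000000 (max open file descriptors; cannot exceed fs.nr_open, 1048576 by default)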
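Before raising ulimit -n, it helps to see how close a process actually is to its limit: count the entries in /proc/PID/fd and compare with the soft limit in /proc/PID/limits. A sketch with a hypothetical `fd_usage` helper (Linux only; inspecting another user's process needs root):

```bash
#!/usr/bin/env bash
# fd_usage (hypothetical helper): print "open/soft_limit" file descriptors
# for a process, read from /proc.
fd_usage() {
  local pid=${1:?usage: fd_usage PID}
  local used limit
  # Each entry in /proc/PID/fd is one open file descriptor
  used=$(find "/proc/$pid/fd" -mindepth 1 -maxdepth 1 | wc -l)
  # The line looks like: "Max open files   1024   1048576   files"
  # $4 is the soft limit, $5 the hard limit
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  echo "$used/$limit"
}

# Example: the current shell
fd_usage "$$"
```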

Another repo with great scripts using ftrace under the hood: https://github.com/brendangregg/perf-tools

Network

  • dig: query DNS servers, e.g. dig +short github.com or dig +nocmd github.com any +multiline +noall +answer
  • traceroute: show the route packets take to a host. This website is nice to test from multiple locations around the world: http://mtr.guru/
  • host: resolve DNS names and IPs, e.g. host -t ANY github.com
  • nmap: the famous tool to find out which ports are open, e.g. nmap -sT -vv -p 1-65535 [ip]
  • ngrep: a simple tcpdump with grep features! It can listen on a specific interface or all of them, filter on a port, and match patterns:
$ ngrep -d any "Value" port 2003
interface: any
filter: (ip or ip6) and ( port 2003 )
match: Value
####
T 172.17.0.1:54820 -> 172.17.0.2:2003 [AP]
  com.ctheu.test.Value 42 1486331086.
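When nmap is not installed, bash alone can check whether a single TCP port accepts connections, through its /dev/tcp/HOST/PORT redirection (a bash feature, not a real file, available in most bash builds). A minimal sketch with a hypothetical `port_open` helper, assuming coreutils timeout:

```bash
#!/usr/bin/env bash
# port_open (hypothetical helper): succeed if HOST:PORT accepts a TCP connection.
# The connection attempt runs in a child bash so that timeout(1) can kill it
# if it hangs (e.g. packets silently dropped by a firewall).
port_open() {
  local host=${1:?usage: port_open HOST PORT [TIMEOUT_SECONDS]}
  local port=${2:?usage: port_open HOST PORT [TIMEOUT_SECONDS]}
  local secs=${3:-2}
  # $1/$2 inside the single quotes are the child's arguments (host, port)
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example: for p in 22 80 443; do port_open example.com "$p" && echo "$p open"; done
```

Unlike nmap -sT, this checks one port at a time and cannot tell "closed" from "filtered" apart beyond the timeout.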

System devices

  • hdparm : get/set drive settings; hdparm -t /dev/sda8 measures buffered read speed
  • ethtool : check Ethernet card settings (speed, duplex, etc.) when in doubt: ethtool eth0

Topology

  • lstopo: a wonderful tool to draw the topology of the server (CPUs, their caches, the physical sockets, the memory) as a nice big picture, or as text: lstopo --output-format txt -v

Performance

Tons of good links and presentations here: http://www.brendangregg.com/linuxperf.html

Java specifics

  • jstat : like iostat/vmstat, for Java processes; monitor the GC with jstat -gc -t -h30 [vmid] 1s
  • jvisualvm : shipped with the JDK up to Java 8 (available separately as VisualVM since), ultra useful
  • jmc : Java Mission Control. A better jvisualvm
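jstat -gc reports capacities (the ...C columns) and usage (the ...U columns) in KB, so old generation occupancy is OU / OC. A sketch that turns that into a percentage; it finds the columns by header name because -t adds a Timestamp column and the column set varies between JDK versions. `old_gen_pct` is a hypothetical helper, and it assumes jstat prints numbers with a dot as decimal separator:

```bash
#!/usr/bin/env bash
# old_gen_pct (hypothetical helper): read `jstat -gc [-t]` output on stdin and
# print the old generation occupancy (OU / OC) in percent, one line per sample.
old_gen_pct() {
  awk '
    # Header lines (repeated every N rows with -hN) start with Timestamp or S0C:
    # remember the position of each column by name.
    $1 == "Timestamp" || $1 == "S0C" { for (i = 1; i <= NF; i++) col[$i] = i; next }
    # Data lines: skip until a header was seen, and avoid dividing by zero.
    ("OC" in col) && $(col["OC"]) > 0 {
      printf "%.1f\n", 100 * $(col["OU"]) / $(col["OC"])
    }
  '
}

# Example (12345 is a placeholder JVM pid): jstat -gc -t -h30 12345 1s | old_gen_pct
```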

System tuning

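  • /proc/sys/vm/vfs_cache_pressure: how eagerly the kernel reclaims dentry and inode caches (default 100)
  • /proc/sys/vm/swappiness: how eagerly the kernel swaps out application memory rather than dropping page cache (default 60)
  • /proc/sys/vm/zone_reclaim_mode: set to 0 so the kernel allocates from other NUMA nodes instead of first reclaiming memory on the local node (it does not disable NUMA itself)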
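Reading the current values before changing anything needs no root. A small sketch with a hypothetical `show_vm_tunables` helper, reading /proc/sys directly:

```bash
#!/usr/bin/env bash
# show_vm_tunables (hypothetical helper): print the current value of the
# VM tunables listed above, in sysctl "key = value" style.
show_vm_tunables() {
  local name value
  for name in vfs_cache_pressure swappiness zone_reclaim_mode; do
    # zone_reclaim_mode only exists on kernels built with NUMA support
    value=$(cat "/proc/sys/vm/$name" 2>/dev/null) || value='n/a'
    printf 'vm.%s = %s\n' "$name" "$value"
  done
}

show_vm_tunables
```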

Misc info

  • cat /proc/cpuinfo : list of the CPUs of the system with details (model, MHz, cache size...)
  • lscpu : a shorter summary
  • /proc/sys/fs/nr_open: maximum number of file descriptors a single process can have open (the ceiling for ulimit -n)
  • /proc/sys/fs/file-max: maximum number of file handles the kernel will allocate system-wide
  • /proc/sys/fs/file-nr: three values: allocated file handles, allocated but unused handles (always 0 on modern kernels), and the maximum (= file-max)
  • /proc/sys/vm/nr_hugepages: number of reserved huge pages (if running Java with a big heap, also pass -XX:+UseLargePages)
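file-nr is easier to read as a ratio of allocated handles to the maximum. A sketch with a hypothetical `file_handles` helper; it prints the raw numbers as strings because file-max can be as large as 2^63-1 on some systems:

```bash
#!/usr/bin/env bash
# file_handles (hypothetical helper): summarize a file in /proc/sys/fs/file-nr
# format ("allocated unused max") as "allocated/max (percent%)".
file_handles() {
  local file=${1:-/proc/sys/fs/file-nr}
  awk '{ printf "%s/%s (%.2f%%)\n", $1, $3, 100 * $1 / $3 }' "$file"
}

file_handles
```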

sysctl changes these values at runtime: sysctl -w fs.file-max=786046. To keep them across reboots, put them in /etc/sysctl.conf (or a file under /etc/sysctl.d/).
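A sketch for the persistent route: keep each setting in a drop-in file under /etc/sysctl.d/ (read at boot by systemd-sysctl, or by sysctl --system) and load it right away with sysctl -p FILE. The helper `sysctl_persist` and the file name 99-tuning.conf are hypothetical; the helper just avoids piling up duplicate lines when a value is changed several times:

```bash
#!/usr/bin/env bash
# sysctl_persist (hypothetical helper): set "KEY = VALUE" in FILE, replacing any
# previous assignment of the same key so the file never holds duplicates.
sysctl_persist() {
  local file=${1:?usage: sysctl_persist FILE KEY VALUE}
  local key=${2:?usage: sysctl_persist FILE KEY VALUE}
  local value=${3:?usage: sysctl_persist FILE KEY VALUE}
  touch "$file" || return
  # Keep every line whose key (text before "=", trimmed) differs from KEY
  awk -v k="$key" '{
    name = $0
    sub(/[ \t]*=.*/, "", name)
    sub(/^[ \t]+/, "", name)
    if (name != k) print
  }' "$file" > "$file.tmp" || return
  printf '%s = %s\n' "$key" "$value" >> "$file.tmp"
  mv "$file.tmp" "$file"
}

# Example (as root):
#   sysctl_persist /etc/sysctl.d/99-tuning.conf fs.file-max 786046
#   sysctl -p /etc/sysctl.d/99-tuning.conf
```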

Network tuning

Flags I grabbed here and there; not necessarily optimal, just worth knowing they exist.

  • net.ipv4.tcp_slow_start_after_idle = 0
  • net.core.netdev_max_backlog = 5000
  • net.ipv4.tcp_no_metrics_save = 1
  • net.ipv4.tcp_sack = 1
  • net.ipv4.tcp_timestamps = 1
  • net.ipv4.tcp_window_scaling = 1
  • net.core.wmem_max = 12582912
  • net.core.rmem_max = 12582912
  • net.ipv4.tcp_rmem = 10240 87380 12582912 (TCP receive buffer: min/default/max, in bytes)
  • net.ipv4.tcp_wmem = 10240 87380 12582912 (TCP send buffer: min/default/max, in bytes)
  • net.ipv4.tcp_mem = 10000000 10000000 10000000 (system-wide TCP memory thresholds low/pressure/high, in pages)

https://wwwx.cs.unc.edu/~sparkst/howto/network_tuning.php
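The max buffer size in the flags above (12582912 bytes = 12 MiB) makes more sense next to the bandwidth-delay product: a single TCP connection can only fill a link if its window, bounded by these buffers, holds at least bandwidth × RTT bytes. Reading 12 MiB as "roughly 1 Gbit/s at 100 ms RTT" is an interpretation, not something the flags themselves state, and in practice buffers get some headroom since part of each buffer goes to overhead. A sketch with a hypothetical `bdp_bytes` helper:

```bash
#!/usr/bin/env bash
# bdp_bytes (hypothetical helper): bandwidth-delay product in bytes.
#   bytes = (Mbit/s * 1e6 / 8) * (RTT_ms / 1000) = Mbit/s * RTT_ms * 125
bdp_bytes() {
  local mbits=${1:?usage: bdp_bytes MBIT_PER_S RTT_MS}
  local rtt_ms=${2:?usage: bdp_bytes MBIT_PER_S RTT_MS}
  echo $(( mbits * rtt_ms * 125 ))
}

bdp_bytes 1000 100   # 1 Gbit/s, 100 ms RTT -> 12500000 bytes, close to 12 MiB
```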

  • nf_conntrack can be very important too: when its table (net.netfilter.nf_conntrack_max entries) is full, new connections are dropped
sysctl -w fs.file-max="9999999"
sysctl -w fs.nr_open="9999999"
sysctl -w net.core.netdev_max_backlog="4096"
sysctl -w net.core.rmem_max="16777216"
sysctl -w net.core.somaxconn="65535"
sysctl -w net.core.wmem_max="16777216"
sysctl -w net.ipv4.ip_local_port_range="1025       65535"
sysctl -w net.ipv4.tcp_fin_timeout="30"
sysctl -w net.ipv4.tcp_keepalive_time="30"
sysctl -w net.ipv4.tcp_max_syn_backlog="20480"
sysctl -w net.ipv4.tcp_max_tw_buckets="400000"
sysctl -w net.ipv4.tcp_no_metrics_save="1"
sysctl -w net.ipv4.tcp_syn_retries="2"
sysctl -w net.ipv4.tcp_synack_retries="2"
# sysctl -w net.ipv4.tcp_tw_recycle="1"   (removed in Linux 4.12; it broke clients behind NAT)
sysctl -w net.ipv4.tcp_tw_reuse="1"
sysctl -w vm.min_free_kbytes="65536"
sysctl -w vm.overcommit_memory="1"
ulimit -n 9999999
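Before pasting a list like this one, it can help to see which values would actually change. A sketch with a hypothetical `sysctl_diff` helper that reads key=value lines on stdin (an optional "sysctl -w " prefix and surrounding double quotes are tolerated) and compares them with /proc/sys. Assumptions: keys map to paths by replacing dots with slashes, which breaks for interface names containing dots (e.g. VLANs), and runs of blanks inside values are squeezed to one space:

```bash
#!/usr/bin/env bash
# sysctl_diff (hypothetical helper): print the keys whose current value in
# /proc/sys differs from the wanted one (or that cannot be read).
sysctl_diff() {
  local key want have
  while IFS='=' read -r key want; do
    key=${key#"sysctl -w "}                    # accept lines copied as-is
    key=${key//[[:space:]]/}                   # drop spaces around the key
    [[ -z $key || $key == \#* ]] && continue   # skip blank lines and comments
    want=${want#\"}; want=${want%\"}           # strip surrounding quotes
    # Squeeze runs of blanks (e.g. "1025       65535") and trim the ends
    want=$(printf '%s' "$want" | tr -s '[:blank:]' ' ')
    want=${want# }; want=${want% }
    have=$(tr -s '[:blank:]' ' ' 2>/dev/null < "/proc/sys/${key//.//}") || have='(missing)'
    [[ $have == "$want" ]] || printf '%s: current=%s wanted=%s\n' "$key" "$have" "$want"
  done
}

# Example: printf '%s\n' net.core.somaxconn=65535 vm.swappiness=10 | sysctl_diff
```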

For testing, you can degrade the quality of network traffic (packet loss, delay) with tc and netem:

  • tc qdisc add dev wlan0 root netem loss 10%
  • tc qdisc add dev eth0 root netem delay 80ms 15ms distribution normal
