toni-moreno / oracle_collector

Tool for gather metrics from any oracle instance installed in any Server as execd plugin for telegraf

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Oracle Collector

Oracle Collector is an Open Source tool to get oracle metrics from any oracle instance running in any compatible ( Linux ) server, it runs as an execd plugin for telegraf

Install from precompiled packages

All releases here.

releases

Building and Run from master

If you want to build a package yourself, or contribute. Here is a guide for how to do that.

Dependencies

  • Go 1.19

Get Code

go get -d github.com/toni-moreno/oracle_collector/...

Building the backend

cd $GOPATH/src/github.com/toni-moreno/oracle_collector
go run build.go build           

Creating minimal package tar.gz

After building frontend and backend you will do

go run build.go pkg-min-tar

Creating and running docker image

make -f Makefile.docker
docker run tonimoreno/oracle_collector:latest -version
docker run  tonimoreno/oracle_collector:latest -h
docker run  -p 4090:4090 -v /mylocal/conf:/opt/oracle_collector/conf -v /mylocal/log:/opt/oracle_collector/log tonimoreno/oracle_collector:latest [options]

Recompile backend on source change (only for developers)

To rebuild on source change (requires that you executed godep restore)

go install github.com/unknwon/bra@latest
bra run  

Running first time ( outside telegraf )

You will need to set up oracle client environment variables LD_LIBRARY_PATH and ORACLE_HOME to run the collector.

export LD_LIBRARY_PATH=/opt/oracle/product/21c/dbhomeXE/lib/
export ORACLE_HOME=/opt/oracle/product/21c/dbhomeXE

Create a connection user.

You will need a monitoring user with proper grants to query all needed info.

Use this for Oracle >= 12.1

sqlplus "/ as sysdba" @./conf/recreate_user_C##MONIT.sql

Use this for Oracle < 12.1

sqlplus "/ as sysdba" @./conf/recreate_user_C##MONIT_legacy.sql

To execute without any configuration you need a minimal oracle_collector.toml file on the conf directory.

cp conf/sample.oracle_collector.toml conf/oracle_collector.toml
./bin/oracle_collector [options]

Running as Telegraf plugin.

Oracle collector will run as telegraf execd plugin you can use the sample in the conf dir. Telegraf will be executed as root user, so you will need to setup oracle client environment variables in the execd config file.

cp conf/telegraf-execd-example.conf /etc/telegraf.d/oracle_collector.conf
systemctl restart telegraf.service

Basic Usage

$ ./bin/oracle_collector -h
Usage of ./bin/oracle_collector:
   -config: config file
   -logdir: log directory where to create all log files
  -pidfile: path to pid file 
  -version: display the version

Gathered Info.

Oracle collector gathers 2 kind of different informations.

  1. Oracle Monitored/Discovered instances
  2. Internat stats.

Oracle Monitored Measurements/Metrics

The agent is capable to get metrics organized and sent as ILP ( InfluDB Line Protocol) measurements. It takes 2 non configurable measurements. And a lot of other configurable measurements in the metric section of the config file[[oracle-monitor.mgroup.metric]].

Non configurable measurements.

These measurements will be sent always even if any metric [[oracle-monitor.mgroup.metric]] and any metric group [[oracle-monitor.mgroup]] exiting in the config file.

oracle_status

Get info from v$instance and [v$database] and [v$pdbs]

  select
    INSTANCE_NUMBER,
    INSTANCE_NAME,
    HOST_NAME,
    -- for version <v18c (12.2.0.2)
    VERSION
    -- only for verion > v18c (12.2.0.2)
    VERSION_FULL AS VERSION,
    STARTUP_TIME,
    FLOOR((SYSDATE - STARTUP_TIME) * 60 * 60 * 24) as UPTIME,
    STATUS,
    DATABASE_STATUS,
    INSTANCE_ROLE,
    ACTIVE_STATE,
    BLOCKED,
    SHUTDOWN_PENDING,
    ARCHIVER
  from V$INSTANCE

And

  select
    DBID,
    NAME,
    CREATED,
    DB_UNIQUE_NAME,
    OPEN_MODE,
    DATABASE_ROLE,
    FORCE_LOGGING,
    LOG_MODE
  from v$database`
  • tags
    • all extra_labels from the [oracle-discovery] config section
    • all [global_tags] configured in the parent telegraf config
    • db: name o the DB
    • db_unique_name: de unique name for the DB
    • instance: name of the instance
    • instance_role: Indicates whether the instance is an active instance or an inactive secondary instance.

How much info it sends to the backend depens on the oracle_status_extended_info flag on the [oracle-discovery] section:

  • fields
    • From system process
      • proc_ok (boolean): True when process (proc_pid) is ok, false first time when detected is down.
      • proc_pid (integer): PID from the Discovered PMON process
    • From v$instance view:
      • inst_number (integer):
      • inst_status (string):
      • inst_uptime_sec (integer):
      • inst_version (string):
      • inst_active_state (string): (extended only)
      • inst_startup_time: (string): (extended only)
      • inst_bloqued (string): (extended only)
      • inst_db_status (status): (extended only)
      • inst_shutdown_pending (string): (extended only)
      • inst_archiver (string): (extended only)
    • from v$database view:
      • db_role (string):
      • db_open_mode (string):
      • db_created (string): (extended only)
      • db_log_mode (string): (extended only)
      • db_force_logging (string): (extended only)
    • from v$pdbs view:
      • pdbs_total (integer):
      • pdbs_active (integer):

oracle_pdb_status

Get info from v$pdbs for Oracle version > 12.1 each "discovery period" with the following query.

for Oracle < 12.1.0.2

select 
  CON_ID,NAME,OPEN_MODE,RESTRICTED,TOTAL_SIZE,
from v$pdbs

for Oracle >= 12.1.0.2

select 
  CON_ID,NAME,OPEN_MODE,RESTRICTED,RECOVERY_STATUS,TOTAL_SIZE,BLOCK_SIZE
from v$pdbs
  • tags

    • all extra_labels from the [oracle-discovery] config section
    • all [global_tags] configured in the parent telegraf config
    • db: name o the DB
    • db_unique_name: de unique name for the DB
    • instance: name of the instance
    • instance_role: Indicates whether the instance is an active instance or an inactive secondary instance.
    • pdb_name: the name of the pdb
  • fields

    • open_mode (string)
    • restricted (string)
    • recovery_status (string)
    • total_size (integer)
    • block_size (integer)

Configurable measurements.

On each [oracle-monitor.mgroup.metric] section you can define measurment name,tags,and fiends as follows.

[[oracle-monitor.mgroup.metric]]
# Resource
id = "resource_query_XXXX"
context = "resource"
labels = [ "resource_name" ]
metrics_desc = { current_utilization= "Generic counter metric from v$resource_limit view in Oracle (current value).", limit_value="Generic counter metric from v$resource_limit view in Oracle (UNLIMITED: -1)." }
metrics_type = { current_utilization='integer',limit_value='integer',used_pct='float'}
request='''
SELECT 
    resource_name,
    current_utilization,
    CASE WHEN TRIM(limit_value) LIKE 'UNLIMITED' THEN '-1' ELSE TRIM(limit_value) END as limit_value,
    CASE 
        WHEN TRIM(limit_value) LIKE 'UNLIMITED' THEN 0 
        WHEN TRIM(limit_value) LIKE '0' THEN 0
        ELSE ROUND(((current_utilization*100)/limit_value),3)
    END as USED_PCT
FROM v$resource_limit
'''
  • measurement_name: will be set with the value in the context struct field
  • tags: will be set with all common tags , extra_labels from the [oracle-discovery] config section and [global_tags] configured in the parent telegraf config appended by the labels struct list. ( each label should be a column in the response query)
  • fields: will be set and type trasnformed if needed by the definition in the in metrics_type struct field

Tags and fields will taken from the resulted query in field request

In the above example:

measurement: "resource"

  • tags
    • all extra_labels from the [oracle-discovery] config section
    • all [global_tags] configured in the parent telegraf config
    • resource_name
  • fields
    • current_utilization (integer):
    • limit_value (integer)
    • used_pct(float)

Internal Statistics.

Oracle collector gathers also some internal processes informattion. The name of its measurements depens on the measurement_prefix parameter from the [self-monitor] config section.

These are the collected measurements:

runtime_gvm_stats

Gathers information about the Go Virtual Machine runtime stats. ( Garbage collection, goroutines and memory) Refer to MemStats to get detailed info on memory stats.

  • tags
    • all extra_labels from the [self-monitor] config
    • all [global_tags] configured in the parent telegraf config
  • fields
    • runtime_goroutines: returns the number of goroutines that currently exist.
    • mem.mallocs: is the cumulative count of heap objects allocated.
    • mem.alloc: is bytes of allocated heap objects.
    • mem.frees: is the cumulative count of heap objects freed.
    • mem.sys: is the total bytes of memory obtained from the OS.
    • mem.heap_alloc_bytes: is bytes of allocated heap objects.
    • mem.heap_sys_bytes: HeapSys is bytes of heap memory obtained from the OS.
    • mem.heap_idle_bytes: HeapIdle is bytes in idle (unused) spans.
    • mem.heap_in_use_bytes: HeapInuse is bytes in in-use spans.
    • mem.heap_released_bytes: HeapReleased is bytes of physical memory returned to the OS.
    • mem.heap_objects: HeapObjects is the number of allocated heap objects.
    • mem.stack_in_use_bytes: is bytes in stack spans.
    • mem.m_span_in_use_bytes: MSpanInuse is bytes of allocated mspan structures.
    • mem.m_cache_in_use_bytes: MCacheInuse is bytes of allocated mcache structures.
    • gc.total_pause_ns: is the cumulative nanoseconds in GC stop-the-world pauses since the program started.
    • gc.pause_per_interval: pause on each gathering request_period interval.
    • gc.pause_per_second: pause acummulated per second.
    • gc.gc_per_inteval: number of gc's per request_periodinterval.
    • gc.gc_per_second: number if gc's per second.

discover_stats

Gathers information about the oracle instance discovery process.

  • tags
    • all extra_labels from the [self-monitor] config
    • all [global_tags] configured in the parent telegraf config
  • fields
    • discovered_all/all: (discovered_all) number of discovered processes with the oracle_discovery_sid_regex process pattern.
    • discovered_new/new: (discovered_new) number of new instances since last discovery process that is not connected
    • discovred_current/current: (discovered_current) number of currently discovered and also connected oracle instances
    • undiscovered: number of instances which has been undiscovered since the last discovery process
    • connect_errors: number of oracle instances with errors in the connecting process.
    • connect_errors_skipped: number of oracle instances with errors but skipped from the error list by the regexp rules in the oracle_discovery_skip_errors_regex parameter.
    • *undiscovered_sid_names: list of SID names which has beed undetected since the last discovery process. ( separeted by ":")
    • discovered_sid_names list of SID names detected and currently trying to connect (maybe with errors or not) (separated by ":).
    • errconnect_sid_names list of SID names with connetion errors (separated by ":")
    • errconnect_skipped_sid_names list of SID with connection errors but skipped from the error list by the regexp rules in the oracle_discovery_skip_errors_regex parameter

collect_stats

This measurement is needed to know how long collector takes on each query on each database and how many results (metrics) are we gathering in each query.

  • tags
    • all extra_labels from the [self-monitor] config
    • all [global_tags] configured in the parent telegraf config
    • db: the database where collecting metrics
    • db_unique_name: the unique name for the database
    • instance: the instance where connected
    • instance_role: the role of the instance.
    • metric_group: the name of the metric group where is configured
    • metric_context: the context of the metric group where is configured ( context parameter)
    • metric_id: the uniq id of the metric ( id parameter or contextif id not configured)
  • fields
    • num_metrics: num of collected metrics from this query metric.
    • duration_us: duration of the query in microseconds

sql_driver_stats

Gather information on each collector to each DB instance connection with these sql generic stats

  • tags
    • all extra_labels from the [self-monitor] config
    • all [global_tags] configured in the parent telegraf config
    • instance: the instance name
  • fields
    • idle_conn: The number of idle connections.
    • inuse_conn: The number of connections currently in use.
    • max_idle_closed: The total number of connections closed due to SetMaxIdleConns.
    • max_idle_time_closed: The total number of connections closed due to SetConnMaxIdleTime.
    • max_open_connections:
    • open_connections: The number of established connections both in use and idle.
    • wait_count: The total number of connections waited for.
    • wait_duration_ms: The total time blocked waiting for a new connection.

About

Tool for gather metrics from any oracle instance installed in any Server as execd plugin for telegraf

License:MIT License


Languages

Language:Go 86.7%Language:Shell 12.7%Language:Dockerfile 0.6%