turbot / steampipe-plugin-turbot

Use SQL to instantly query the Turbot CMDB. Open source CLI. No DB required.

Home Page:https://hub.steampipe.io/plugins/turbot/turbot

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Autogenerate tables based on graphQL queries in local files

Joeturbot opened this issue · comments

Is your feature request related to a problem? Please describe.
The existing turbot_* tables are insufficient to cover the breadth of the Turbot API. Further, queries like resource_control_policy.graphql are easy to do in raw GraphQL but painful to do in Steampipe.

There is also considerable difficulty for users new to Turbot and to Steampipe with grokking where the filtering happens, whether Turbot-side or Steampipe-side. This difficulty can lead to long running queries when a user specifies Steampipe-side filtering then unintentionally pulls over hundreds of thousands of rows from Turbot. I believe/hope that if the user specifies the GraphQL themselves, it will be clear to them where the filtering/joining is happening.

Describe the solution you'd like
Specify a path to a file or directory on the local file system that contains GraphQL files. On launch, Steampipe will parse these query files then autogenerate tables and columns to match these queries. The overal operation would be similar to how dynamic tables are created for the CSV, Terraform and GoogleSheets plugins.

Describe alternatives you've considered
There aren't any really, other than to write the GraphQL queries into a general purpose programming language.

Additional context
The Turbot GraphQL API is incredibly rich. The current approach to statically defined tables removes some of that richness and imposes additional development load on Turbot plugin developers to implement each new table.

I think this means we will need the queries themselves and the Turbot SDL (which is not yet publically available). The query itself prevents halting problems on the SDL. The SDL provides type inference which the query does not contain.

This is an example query

query ResourceControlPolicyQuery {
  resources(paging: "", filter: "resourceTypeId:'tmod:@turbot/aws#/resource/types/aws'") {
    items {
      data
      controls(filter: "state:alarm,error,ok") {
        items {
          reason
          state
          turbot {
            dependencies {
              policyValues {
                items {
                  state
                  value
                  type {
                    modUri
                    description
                    categoryUri
                    input
                  }
                }
              }
            }
          }
        }
      }
    }
    metadata {
      stats {
        total
      }
    }
  }
}

This is sample output from a query:

{
  "data": {
    "resources": {
      "items": [
        {
          "data": {
            "Id": "934217203932",
            "Arn": "arn:aws:organizations::676944191433:account/o-c3a5y4wd52/934217203932",
            "Name": "000 @ Turbot",
            "Tags": [
              {
                "Key": "Name",
                "Value": "Nathan Dev Env"
              }
            ],
            "Email": "aws+000@turbothq.com",
            "Status": "ACTIVE",
            "JoinedMethod": "INVITED",
            "JoinedTimestamp": "2014-10-29T16:10:54.000Z"
          },
          "controls": {
          ...
         }
   etc

The goal would be to configure Steampipe to send that query to Turbot and present the result as one (or more) tables?

Correct. My hope is that GraphQL Query + Turbot SDL = Steampipe table with proper type definitions on each column. Altering the queries and relaunching Steampipe would "recompile" the table definition to match results of the query.

Dunno what SDL looks like, or means in this context, but anyway I'm still trying to understand the client- vs server-side filtering issue.

Today in the course of testing AWS auth in Steampipe Cloud I used the CloudFormation method to provision a role for cross-account trust, and have been poking around in turbot.com and steampipe-plugin-turbot to see what's what.

Here's a GraphQL query that immediately finds the role I just created.

graphql query

query Resource {
  resource(id: "arn:aws:iam::899206412154:role/steampipe_cloud") {
    data
    metadata
    trunk {
      title
    }
    turbot {
      akas
      id
      tags
    }
  }
}

graphql output

{
  "data": {
    "resource": {
      "data": {
        "Arn": "arn:aws:iam::899206412154:role/steampipe_cloud",
        "Path": "/",
        "Tags": [
          {
            "Key": "vaec:CKID",
            "Value": "abc125"
          },
          {
            "Key": "vaec:VAECID",
            "Value": "xyz456"
          }
        ],
        "RoleId": "AROA5CXHA655BRJNCS2SR",
        "Policies": [],
        "RoleName": "steampipe_cloud",
        "CreateDate": "2021-12-28T18:14:59.000Z",
        "Description": "",
        "RoleLastUsed": {},
        "AttachedPolicies": [
          {
            "PolicyArn": "arn:aws:iam::aws:policy/ReadOnlyAccess",
            "PolicyName": "ReadOnlyAccess"
          }
        ],
        "MaxSessionDuration": 28800,
        "AssumeRolePolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Action": "sts:AssumeRole",
              "Effect": "Allow",
              "Condition": {
                "StringEquals": {
                  "sts:ExternalId": "u_c6fcejhe4mvehj43v7a0:vv8hdzr9"
                }
              },
              "Principal": {
                "AWS": "arn:aws:iam::316881668097:root"
              }
            }
          ]
        }
      },
      "metadata": {
        "aws": {
          "accountId": "899206412154",
          "partition": "aws"
        },
        "createTimestamp": "2021-12-28T18:14:59.000Z"
      },
      "trunk": {
        "title": "Turbot > Dunder Mifflin > AWS > 899206412154 > steampipe_cloud"
      },
      "turbot": {
        "akas": [
          "arn:aws:iam::899206412154:role/steampipe_cloud"
        ],
        "id": "244412933699404",
        "tags": {
          "vaec:CKID": "abc125",
          "vaec:VAECID": "xyz456"
        }
      }
    }
  }
}

steampipe query

Here's an analogous steampipe query that finds it very slowly.

select
  *
from
  turbot_resource
where
  data ->> 'Arn' = 'arn:aws:iam::899206412154:role/steampipe_cloud'

And here's one that finds it immediately.

select
  *
from
  turbot_resource
where
  resource_type_uri = 'tmod:@turbot/aws-iam#/resource/types/role'
  and data ->> 'Arn' = 'arn:aws:iam::899206412154:role/steampipe_cloud'

steampipe schema

> .inspect turbot_resource
+-------------------+--------------------------+--------------------------------------------------
| column            | type                     | description                                      
+-------------------+--------------------------+--------------------------------------------------
| akas              | jsonb                    | AKA (also known as) identifiers for the resource.
| create_timestamp  | timestamp with time zone | When the resource was first discovered by Turbot.
| data              | jsonb                    | Resource data.                                   
| filter            | text                     | Filter used for this resource list.              
| id                | bigint                   | Unique identifier of the resource.               
| metadata          | jsonb                    | Resource custom metadata.                        
| parent_id         | bigint                   | ID for the parent of this resource. For the Turbo
| path              | jsonb                    | Hierarchy path with all identifiers of ancestors 
| resource_type_id  | bigint                   | ID of the resource type for this resource.       
| resource_type_uri | text                     | URI of the resource type for this resource.      
| tags              | jsonb                    | Tags for the resource.                           
| timestamp         | timestamp with time zone | Timestamp when the resource was last modified (cr
| title             | text                     | Title of the resource.                           
| trunk_title       | text                     | Title with full path of the resource.            
| update_timestamp  | timestamp with time zone | When the resource was last updated in Turbot.    
| version_id        | bigint                   | Unique identifier for this version of the resourc
| workspace         | text                     | Specifies the workspace URL.                     
+-------------------+--------------------------+--------------------------------------------------

steampipe json output

> .output json


[
 {
  "akas": [
   "arn:aws:iam::899206412154:role/steampipe_cloud"        
  ],
  "create_timestamp": "2021-12-28T10:15:05-08:00",
  "data": {
   "Arn": "arn:aws:iam::899206412154:role/steampipe_cloud",
   "AssumeRolePolicyDocument": {
    "Statement": [
     {
      "Action": "sts:AssumeRole",
      "Condition": {
       "StringEquals": {
        "sts:ExternalId": "u_c6fcejhe4mvehj43v7a0:vv8hdzr9"
       }
      },
      "Effect": "Allow",
      "Principal": {
       "AWS": "arn:aws:iam::316881668097:root"
      }
     }
    ],
    "Version": "2012-10-17"
   },
   "AttachedPolicies": [
    {
     "PolicyArn": "arn:aws:iam::aws:policy/ReadOnlyAccess",
     "PolicyName": "ReadOnlyAccess"
    }
   ],
   "CreateDate": "2021-12-28T18:14:59.000Z",
   "Description": "",
   "MaxSessionDuration": 28800,
   "Path": "/",
   "Policies": [],
   "RoleId": "AROA5CXHA655BRJNCS2SR",
   "RoleLastUsed": {},
   "RoleName": "steampipe_cloud",
   "Tags": [
    {
     "Key": "vaec:CKID",
     "Value": "abc125"
    },
    {
     "Key": "vaec:VAECID",
     "Value": "xyz456"
    }
   ]
  },
  "filter": null,
  "id": 244412933699404,
  "metadata": {
   "aws": {
    "accountId": "899206412154",
    "partition": "aws"
   },
   "createTimestamp": "2021-12-28T18:14:59.000Z"
  },
  "parent_id": 208149752063137,
  "path": [
   162618828978177,
   197293160400056,
   197293169049322,
   208149752063137,
   244412933699404
  ],
  "resource_type_id": 162661932236396,
  "resource_type_uri": "tmod:@turbot/aws-iam#/resource/types/role",
  "tags": {
   "vaec:CKID": "abc125",
   "vaec:VAECID": "xyz456"
  },
  "timestamp": "2021-12-28T10:15:05-08:00",
  "title": "steampipe_cloud",
  "trunk_title": "Turbot > Dunder Mifflin > AWS > 899206412154 > steampipe_cloud",
  "update_timestamp": "2021-12-28T10:41:53-08:00",
  "version_id": 244414580090703,
  "workspace": "https://demo-turbot.cloud.turbot-dev.com"
 }
]

questions

Is this an example of a query that's a no-brainer in graphql but requires a bit more know-how to run efficiently using steampipe-plugin-turbot?

How/why would the schema for a table derived from the graphql output differ from the one defined by the plugin?

Is this an example of a query that's a no-brainer in graphql but requires a bit more know-how to run efficiently using steampipe-plugin-turbot?
I think it might be. Here's the underlying GraphQL query for your slow query:

Slow Query

select
  *
from
  turbot_resource
where
  data ->> 'Arn' = 'arn:aws:iam::899206412154:role/steampipe_cloud'

the underlying GraphQL query

query SteampipeSide {
  resources(filter: "") {
    metadata {
      stats {
        total
      }
    }
  }
}

In Demo, this returned 32878 results.

Fast Query

select
  *
from
  turbot_resource
where
  resource_type_uri = 'tmod:@turbot/aws-iam#/resource/types/role'
  and data ->> 'Arn' = 'arn:aws:iam::899206412154:role/steampipe_cloud'

Underlying GraphQL query:

query TurbotSide{
  resources(filter:"resourceTypeId:tmod:@turbot/aws-iam#/resource/types/role") {
    metadata{
      stats{
        total
      }
    }
  }
}

In Demo, this returns a mere 867 "rows". This is only 2% of the total resources. No surprise to me that the quick query is well, quick.

How/why would the schema for a table derived from the graphql output differ from the one defined by the plugin?

By "...the one defined by the plugin", I believe you mean the default turbot_* tables that are already implemented. I'm going to refer to these as the "static tables" hereafter. For simple queries that only query one of resources, policy_settings, or controls, I don't expect the difference between the dynamic and static tables to be that big. However, for a query that hits two or more, then the Steampipe table schema will be very different. The resource_controls_policies query is a great example.

query ResourceControlPolicyQuery {
  resources(paging: "", filter: "resourceTypeId:'tmod:@turbot/aws#/resource/types/aws'") {
    items {
      data
      controls(filter: "state:alarm,error,ok") {
        items {
          reason
          state
          turbot {
            dependencies {
              policyValues {
                items {
                  state
                  value
                  type {
                    modUri
                    description
                    categoryUri
                    input
                  }
                }
              }
            }
          }
        }
      }
    }
    metadata {
      stats {
        total
      }
    }
  }
}
}}}}

An incomplete and rough DDL for this query might look like

create table resources_controls_policies (
   resource_data json,
   control_reason string,
   control_state string,
   policyvalues_state string,
   policyvalues_value string,
   policyvalues_state string,
   policyvalues_type json,   
   metadata_stats json
)

A rough (and not executable) Steampipe query to get the same thing using only static tables would look like:

with resources as (
  select id as resource_id, data as resource_data 
  from turbot_resource
),
controls as (
  select id as control_id, reason as control_reason, state as control_state, resource_id 
  from turbot_control
),
policy_values as (
  --the policy_values table doesn't exist yet so these values are pure speculation
  select id as policy_id, state as policy_state, value as policy_value, control_id
  from turbot_policy_values
)
select
   r.resource_data,
   c.control_state,
   p.policy_value
from resources r 
left join controls c on r.resource_id = c.resource_id
left join policy_values as p on p.control_id = c.control_id
where filter = 'resourceTypeId:"tmod:@turbot/aws-iam#/resource/types/role"'

Improved steampipe query:

with resources as (
  select id as resource_id, data as resource_data 
  from turbot_resource
  where filter = 'resourceTypeId:"tmod:@turbot/aws-iam#/resource/types/role"'
),
controls as (
  select id as control_id, reason as control_reason, state as control_state, resource_id 
  from turbot_control
  where filter = 'resourceTypeId:"tmod:@turbot/aws-iam#/resource/types/role" state:alarm'
),
policy_values as (
  --the policy_values table doesn't exist yet so these values are pure speculation
  select id as policy_id, state as policy_state, value as policy_value, control_id
  from turbot_policy_values
)
select
   r.resource_data,
   c.control_state,
   p.policy_value
from resources r 
left join controls c on r.resource_id = c.resource_id
left join policy_values as p on p.control_id = c.control_id

This still suffers (some) from the "Transfer the World" problem as Steampipe has to get all the IAM roles and all the controls relevant to IAM roles then the policy values. Although, as I think about it, if someone filtered for all AWS resources, then you still have transfer the world. The only difference would be the relative performance of your local workstation vs the Turbot DB. The Turbot DB is almost always going to be far more powerful/faster than your local workstation at running queries.

Closing. This is a cool idea but outside the scope of development.