sad-spirit / pg-builder

Query builder for Postgres backed by SQL parser

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Extract relation names from parsed SQL string

mapcentia opened this issue · comments

I'm would like to extract all relation names, which are included in a query (both select, update, insert and delete). The statements are arbitrary. I can get the FromList, but I'm not sure from there to get the relation names. In this case I'm after bar and bup:

use sad_spirit\pg_builder\Lexer;
use sad_spirit\pg_builder\Parser;

$strings = "SELECT *, (select foo from bar limit 1) as bib FROM bup";

$lexerStandard = new Lexer(array(
    'standard_conforming_strings' => true
));

$parser = new Parser($lexerStandard);
$test = $parser->parseSelectStatement($strings);

var_dump($test->from);

This is actually covered in the docs: https://github.com/sad-spirit/pg-builder/wiki/walkers

You need to subclass BlankWalker and override walkRelationReference() method. When you do

$test->dispatch($walker);

that method will be called for every relation name appearing in the query.

Oops, I missed that part:

both select, update, insert and delete

You'll need to override walkInsertTarget() and walkUpdateOrDeleteTarget() methods as well.

I couldn't get walkInsertTarget and walkUpdateOrDeleteTarget to work so I did this:

class TableWalker extends BlankWalker
{
    /**
     * @var array<string>
     */
    private $relations;

    public function walkRelationReference(RelationReference $rangeItem): void
    {
        $this->relations[] = (string)$rangeItem->name;
    }

    public function walkInsertStatement(Insert $statement): void
    {
        $this->relations[] = (string)$statement->relation->relation;
    }

    public function walkUpdateStatement(Update $statement): void
    {
        $this->relations[] = (string)$statement->relation->relation;
    }

    public function walkDeleteStatement(Delete $statement): void
    {
        $this->relations[] = (string)$statement->relation->relation;
    }
    // DOESN'T WORK (method not called)
    public function walkUpdateOrDeleteTarget(UpdateOrDeleteTarget $statement): void
    {
        $this->relations[] = (string)$statement->relation->relation;
    }

    /**
     * @return array<string>
     */
    public function getRelations(): array
    {
        return $this->relations;
    }
}

But when I do something like this:

$factory = new StatementFactory();
$select = $factory->createFromString("WITH t AS (DELETE FROM foo) DELETE FROM bar");
$walker = new TableWalker();
$select->dispatch($walker);
print_r($walker->getRelations());

I only get bar:
Array([0] => bar)

Also some thing like this:
delete from foo where id in (select id from bar)
Only yields foo and not bar

Is where a way to write the class so I always get all relations used?

Sorry, you are right, there were indeed missing dispatch() calls in BlankWalker: cbf265b

The fix is included in the most recent release, so please upgrade if you can. I'll see whether I can backport it to 0.4 branch and make a new release from that.

As for your code, when you do

    public function walkDeleteStatement(Delete $statement): void
    {
        $this->relations[] = (string)$statement->relation->relation;
    }

you are not dispatching to any child nodes of Delete thus CTEs and subselects are not processed.

Compare the above with actual code from master branch:

    public function walkDeleteStatement(Delete $statement)
    {
        $statement->with->dispatch($this);
        $statement->relation->dispatch($this);
        $statement->using->dispatch($this);
        $statement->where->dispatch($this);
        $statement->returning->dispatch($this);
        return null;
    }

I'm using 1.0.0-beta. As I understand, the behaviour I'm getting is excepted. But how do I walk subselects in delete and update statements?

If you are using 1.0.0-beta, then extending BlankWalker and implementing (only!) walkRelationReference(), walkInsertTarget(), walkUpdateOrDeleteTarget() should work properly, you don't need to change other methods as this may break walking parts of the query.

The problem in previous versions was that dispatch() calls leading to walkInsertTarget(), walkUpdateOrDeleteTarget() were omitted, e.g.

        $statement->relation->dispatch($this);

in the above code. That's exactly the same problem you have with your code now: if you override walkDeleteStatement() then for subselects and CTEs to work you need to make sure you dispatch to Nodes representing these:

    public function walkDeleteStatement(Delete $statement): void
    {
        // this will dispatch to child nodes
        parent::walkDeleteStatement($statement);
        // your additional actions (not really needed as walkUpdateOrDeleteTarget() will be called in the above)
        $this->relations[] = (string)$statement->relation->relation;
    }

Thanks a lot! By implementing only walkRelationReference(), walkInsertTarget() and walkUpdateOrDeleteTarget() it now works