dolthub / go-mysql-server

A MySQL-compatible relational database with a storage agnostic query engine. Implemented in pure Go.

driver/: dsn database name appears to be ignored

paralin opened this issue:

Looking at OpenConnector in driver.go, a few things look odd:

  • The query string is parsed for options correctly (jsonAs)
  • The URI is then re-encoded without the jsonAs parameter.
  • The entire URI is passed to d.provider.Resolve()
    • Resolve() returns the list of databases (catalog) and the name.
  • The name returned by Resolve() is ignored (?) what is it for?
  • The dbs slice uses the entire dsn string as the key, creating a separate Engine for each unique dsn.
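
As a rough illustration of the parse-then-re-encode step described in the list above, here is a minimal sketch using only net/url. The helper name popOption, the example DSN, and the jsonAs value are assumptions made for illustration; this is not the driver's actual code.

```go
package main

import (
	"fmt"
	"net/url"
)

// popOption removes the named query parameter from a URI-style DSN and
// returns the re-encoded DSN together with the parameter's value.
// The helper name is hypothetical; it is not part of the driver package.
func popOption(dsn, name string) (stripped, value string, err error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", "", err
	}
	q := u.Query()
	value = q.Get(name) // empty string if the option is absent
	q.Del(name)
	u.RawQuery = q.Encode() // re-encode the query without the option
	return u.String(), value, nil
}

func main() {
	// Hypothetical DSN; the jsonAs value is only illustrative.
	stripped, jsonAs, err := popOption("mysql://localhost/mydb?jsonAs=object&x=1", "jsonAs")
	if err != nil {
		panic(err)
	}
	fmt.Println(stripped, jsonAs)
}
```

Note that the re-encoded string still contains the path and any remaining query parameters, so two DSNs that differ only in those parts would still be distinct strings.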

Issues:

  • The database name in the DSN is ignored
  • Is it really necessary to construct a separate sqle Engine for each unique dsn string?
  • Only the Hostname part of the URI should be used for the server name (imo).

Primarily, I'm wondering what the best way is to have the DSN database name parsed and used as the db for the Connector.

Possibly the way to fix this is:

  • Use the Hostname as the server name passed to provider Resolve()
  • Use the path (without the / prefix) as the database name from the DSN (if any)
  • Use the Hostname (the server name) as the key for s.dbs
  • Use the database name in conn.newContextWithQuery - either call (*sql.Context).SetCurrentDatabase or add a new option sql.WithDatabase(dbName)
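
The first two bullets of this proposal could be sketched roughly as follows, assuming a URI-style DSN. The dsnParts type, the splitDSN helper, and the example DSN are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// dsnParts holds the pieces of the DSN that the proposal cares about.
// The type and its field names are hypothetical.
type dsnParts struct {
	Server   string // hostname portion, proposed as the server name and map key
	Database string // path without the leading "/", empty if none was given
}

// splitDSN parses a URI-style DSN into a server name and a database name.
func splitDSN(dsn string) (dsnParts, error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return dsnParts{}, err
	}
	return dsnParts{
		Server:   u.Hostname(), // drops the port, if any
		Database: strings.TrimPrefix(u.Path, "/"),
	}, nil
}

func main() {
	// Hypothetical DSN used only for illustration.
	p, err := splitDSN("mysql://localhost:3306/mydb")
	if err != nil {
		panic(err)
	}
	fmt.Printf("server=%q database=%q\n", p.Server, p.Database)
}
```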

Am I thinking in the right direction here? Is there some reason the DSN is pretty much ignored in driver/ today? Thanks. cc @zachmu @firelizzard18

It's been well over a year since I looked at this code, so I'm doing my best to reconstruct why I did things the way I did.

The name returned by Resolve() is ignored (?) what is it for?

It is eventually passed to sql.NewBaseSessionWithClientServer. I think that's its entire purpose. If you look at the original MR, the value returned from Resolve is stored in the field server which is then passed to sql.NewSession and that is the only thing that's ever done with it. My second MR makes things more complicated, but I think that's still the only purpose of that value.

The session requires a server so the driver gives the provider the choice of what server value it should pass to the session.

The dbs slice uses the entire dsn string as the key, creating a separate Engine for each unique dsn.

The driver maintains a map of engine instances. The idea is, if multiple connectors are created for the same 'server', they can reuse the same engine. Ultimately, how that reuse happens is up to the provider, since the map is keyed on whatever value it returns.
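
To make the reuse idea concrete, here is a minimal sketch of a cache that hands out one engine per provider-defined key. The engine and engineCache types are hypothetical stand-ins, not the driver's real types, and the locking strategy is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// engine stands in for the real query engine; it is a hypothetical placeholder.
type engine struct{ server string }

// engineCache hands out one engine per provider-defined server key.
type engineCache struct {
	mu      sync.Mutex
	engines map[string]*engine
}

func newEngineCache() *engineCache {
	return &engineCache{engines: make(map[string]*engine)}
}

// get returns the engine for key, creating it on first use.
func (c *engineCache) get(key string) *engine {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.engines[key]; ok {
		return e
	}
	e := &engine{server: key}
	c.engines[key] = e
	return e
}

func main() {
	cache := newEngineCache()
	first := cache.get("server-a")
	again := cache.get("server-a")
	other := cache.get("server-b")
	// The same key reuses the engine; a different key gets its own.
	fmt.Println(first == again, first == other)
}
```

Because the key is whatever string the provider returns, a provider that returns the full DSN gets one engine per DSN, while one that returns only a hostname shares an engine across DSNs with that hostname.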

The database name in the DSN is ignored

The driver does not ignore the database name; it simply passes the entire DSN unchanged to the provider. Any further processing of the DSN is therefore the responsibility of the provider. I designed the driver to be provider-agnostic, so the provider can essentially do anything, as long as it's consistent.

Is it really necessary to construct a separate sqle Engine for each unique dsn string?

A database provider is passed to the engine when the engine is constructed. Two different DSNs may point to different database providers (e.g. different servers). Thus they need different engines (I think).

Only the Hostname part of the URI should be used for the server name (imo).

Again, all of this is up to the provider. The driver has no idea what the DSN means; it just blindly passes it to the provider, then blindly uses whatever 'server' value the provider returns to ensure that each unique 'server' (as defined by the provider) gets a unique engine.

I say 'server' because my intended use for this had nothing to do with actual servers. I am using go-mysql-server and this driver to power Query Pad (a VS Code extension I wrote). Among other things, QueryPad allows the user to load a CSV file and query it as if it were a MySQL database. In cases like that, there is no server.

Possibly the way to fix this is:

  • Use the Hostname as the server name passed to provider Resolve()
  • Use the Hostname (the server name) as the key for s.dbs

It would be appropriate to implement those changes in the provider you're using, but not in the driver. I consciously designed the driver so it would support scenarios where there is no server.

  • Use the path (without the / prefix) as the database name from the DSN (if any)
  • Use the database name in conn.newContextWithQuery - either call (*sql.Context).SetCurrentDatabase or add a new option sql.WithDatabase(dbName)

There is already an interface that allows the provider to control the context. The only thing that's missing is a way for the provider to retrieve the DSN from within the NewContext method. If we simply update the driver to pass the DSN to the session instead of the server name, you can easily achieve your goals via the provider. Here's the change to the driver:

diff --git a/driver/driver.go b/driver/driver.go
index 886fca7c..2f2218be 100644
--- a/driver/driver.go
+++ b/driver/driver.go
@@ -150,7 +150,7 @@ func (d *Driver) OpenConnector(dsn string) (driver.Connector, error) {
 	return &Connector{
 		driver:  d,
 		options: options,
-		server:  server,
+		dsn:     dsn,
 		dbConn:  db,
 	}, nil
 }
@@ -199,15 +199,15 @@ func (c *dbConn) close() error {
 type Connector struct {
 	driver  *Driver
 	options *Options
-	server  string
+	dsn     string
 	dbConn  *dbConn
 }
 
 // Driver returns the driver.
 func (c *Connector) Driver() driver.Driver { return c.driver }
 
-// Server returns the server name.
-func (c *Connector) Server() string { return c.server }
+// DSN returns the DSN.
+func (c *Connector) DSN() string { return c.dsn }
 
 // Connect returns a connection to the database.
 func (c *Connector) Connect(ctx context.Context) (driver.Conn, error) {

Now you can implement a provider that meets all of your criteria:

type provider struct{}

func (provider) Resolve(dsn string, options *driver.Options) (string, sql.DatabaseProvider, error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", nil, err
	}

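	// Resolve the database provider (elided here; a real implementation
	// would return its sql.DatabaseProvider in place of the nil below)

	// Use the hostname as the key for the driver
	return u.Hostname(), nil, nil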
}

func (provider) NewContext(ctx context.Context, conn *driver.Conn, opts ...sql.ContextOption) (*sql.Context, error) {
	c := sql.NewContext(ctx, opts...)

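	// Extract the database from the DSN and pass it to the context.
	// This relies on the driver change above, so that the session
	// address holds the full DSN rather than the server name.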
	u, err := url.Parse(conn.Session().Address())
	if err != nil {
		return nil, err
	}
	q := u.Query()
	if q.Has("database") {
		c.SetCurrentDatabase(q.Get("database"))
	}
	return c, nil
}

I submitted a PR to restore the driver's tests and example, since those were removed for some reason.

The name returned by Resolve() is ignored (?) what is it for?

It is eventually passed to sql.NewBaseSessionWithClientServer. I think that's its entire purpose. If you look at the original MR, the value returned from Resolve is stored in the field server which is then passed to sql.NewSession and that is the only thing that's ever done with it. My second MR makes things more complicated, but I think that's still the only purpose of that value.

There are no referrers to the BaseSession.addr field (other than the constructor), so I guess there is no purpose to the value.

It makes sense that BaseSession.addr should end up being the Addr portion of the DSN URI (if any).

Again, all of this is up to the provider. The driver has no idea what the DSN means; it just blindly passes it to the provider, then blindly uses whatever 'server' value the provider returns to ensure that each unique 'server' (as defined by the provider) gets a unique engine.

That makes sense. It's not clear from the code that the Provider is type-asserted to ContextBuilder. Maybe it would make sense to either rename ContextBuilder to ProviderWithContextBuilder and place the interface definition next to the Provider interface definition (so you see it immediately when you go-to-definition on Provider), or add another interface that extends both Provider and ContextBuilder. Either would make it clearer that a Provider can optionally implement it.
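
A rough sketch of the second option, using deliberately simplified interfaces: the method signatures below are hypothetical and much smaller than the real Provider and ContextBuilder, and buildContext only illustrates the optional type assertion, not the driver's actual logic.

```go
package main

import "fmt"

// Provider and ContextBuilder are simplified, hypothetical stand-ins for
// the driver's interfaces; the real method signatures differ.
type Provider interface {
	Resolve(dsn string) (server string, err error)
}

type ContextBuilder interface {
	NewContext(dsn string) string
}

// ProviderWithContextBuilder documents that a Provider may also build contexts.
type ProviderWithContextBuilder interface {
	Provider
	ContextBuilder
}

// plainProvider implements only Provider.
type plainProvider struct{}

func (plainProvider) Resolve(dsn string) (string, error) { return dsn, nil }

// richProvider implements both interfaces by embedding plainProvider.
type richProvider struct{ plainProvider }

func (richProvider) NewContext(dsn string) string { return "context for " + dsn }

// Compile-time check that richProvider satisfies the combined interface.
var _ ProviderWithContextBuilder = richProvider{}

// buildContext uses the provider's NewContext when it is implemented,
// and falls back to a default otherwise.
func buildContext(p Provider, dsn string) string {
	if cb, ok := p.(ContextBuilder); ok {
		return cb.NewContext(dsn)
	}
	return "default context"
}

func main() {
	fmt.Println(buildContext(plainProvider{}, "dsn"))
	fmt.Println(buildContext(richProvider{}, "dsn"))
}
```

The combined interface changes nothing at runtime; its value is that it shows up next to Provider and gives implementers a compile-time check to assert against.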

I say 'server' because my intended use for this had nothing to do with actual servers. I am using go-mysql-server and this driver to power Query Pad (a VS Code extension I wrote). Among other things, QueryPad allows the user to load a CSV file and query it as if it were a MySQL database. In cases like that, there is no server.

My use case is similar: there's no server, but it still makes sense to read the "hostname" portion of the DSN and use it to select which set of databases the database provider returns. (This is already possible today, so no problem here.)

It would be appropriate to implement those changes in the provider you're using, but not in the driver. I consciously designed the driver so it would support scenarios where there is no server.

There is already an interface that allows the provider to control the context. The only thing that's missing is a way for the provider to retrieve the DSN from within the NewContext method. If we simply update the driver to pass the DSN to the session instead of the server name, you can easily achieve your goals via the provider. Here's the change to the driver:

OK, this looks good to me. Can we also apply these changes to go-mysql-server?

Thanks very much for looking into this! I would not have realized the ContextBuilder thing on my own, at least not without spending more hours trying to grok this.

@firelizzard18 I implemented your suggested change, added some comments clarifying this, and added the extra interface indicating that a Provider can be extended with ContextBuilder: #1598

Thanks again for the help.

BaseSession.Address returns addr. That appears to be the only reference to that field. I did not check if anything uses that method.