mongodb / mongo-php-library

The Official MongoDB PHP library

Home Page: https://mongodb.com/docs/php-library/current/

$manager->selectServer($readPreference) fail to return a Server

hesselprodigentia opened this issue

Bug Report

Retrieving a Server instance is unreliable: sometimes the call succeeds and other times it returns nothing (not even the exception I would expect).
I found this: https://jira.mongodb.org/browse/PHPLIB-729.

Environment

PHP 8.0.10

Nginx 
CosmosDB connection
Laravel/lumen-framework 8.2.1

------------
mongodb
libbson bundled version => 1.18.0
libmongoc bundled version => 1.18.0
libmongoc SSL => enabled
libmongoc SSL library => OpenSSL
libmongoc crypto => enabled
libmongoc crypto library => libcrypto
libmongoc crypto system profile => disabled
libmongoc SASL => disabled
libmongoc ICU => enabled
libmongoc compression => enabled
libmongoc compression snappy => disabled
libmongoc compression zlib => enabled
libmongoc compression zstd => disabled
libmongocrypt bundled version => 1.2.1
libmongocrypt crypto => enabled
libmongocrypt crypto library => libcrypto
mongodb.debug => no value => no value


DB_MONGO_CONNECTION => "mongodb"
DB_COSMOS_DSN => "mongodb://xxxxxxxx/?ssl=true&replicaSet=globaldb&retrywrites=false&maxIdleTimeMS=120000&appName=xxxxxx"
DB_MONGO_DRIVER => "mongodb"
$_SERVER['DB_MONGO_CONNECTION'] => "mongodb"
$_SERVER['DB_COSMOS_DSN'] => "mongodb://xxxxxx/?ssl=true&replicaSet=globaldb&retrywrites=false&maxIdleTimeMS=120000&appName=xxxxxx"
$_SERVER['DB_MONGO_DRIVER'] => "mongodb"
$_ENV['DB_MONGO_CONNECTION'] => "mongodb"
$_ENV['DB_COSMOS_DSN'] => "mongodb://xxxxxxx/?ssl=true&replicaSet=globaldb&retrywrites=false&maxIdleTimeMS=120000&appName=xxxxxx"
$_ENV['DB_MONGO_DRIVER'] => "mongodb"
------------------
name     : mongodb/mongodb
descrip. : MongoDB driver library
keywords : database, driver, mongodb, persistence
versions : * 1.8.0
type     : library
license  : Apache License 2.0 (Apache-2.0) (OSI approved) https://spdx.org/licenses/Apache-2.0.html#licenseText
homepage : https://jira.mongodb.org/browse/PHPLIB
source   : [git] https://github.com/mongodb/mongo-php-library.git 953dbc19443aa9314c44b7217a16873347e6840d
dist     : [zip] https://api.github.com/repos/mongodb/mongo-php-library/zipball/953dbc19443aa9314c44b7217a16873347e6840d 953dbc19443aa9314c44b7217a16873347e6840d
path     : /var/www/code/vendor/mongodb/mongodb
names    : mongodb/mongodb

support
issues : https://github.com/mongodb/mongo-php-library/issues
source : https://github.com/mongodb/mongo-php-library/tree/1.8.0

autoload
psr-4
MongoDB\ => src/
files

requires
ext-hash *
ext-json *
ext-mongodb ^1.8.1
jean85/pretty-package-versions ^1.2
php ^7.0 || ^8.0
symfony/polyfill-php80 ^1.19

requires (dev)
squizlabs/php_codesniffer ^3.5, <3.5.5
symfony/phpunit-bridge 5.x-dev

Test Script

Expected and Actual Behavior

Expected to retrieve a MongoDB\Driver\Server
when the \MongoDB\select_server function is called and line 431 executes $manager->selectServer($readPreference)

Behavior
Empty is returned

Debug Log

string(0) ""
string(4002) "#0 /var/www/code/vendor/mongodb/mongodb/src/functions.php(432): MongoDB\Driver\Manager->selectServer(Object(MongoDB\Driver\ReadPreference))
#1 /var/www/code/vendor/mongodb/mongodb/src/Collection.php(651): MongoDB\select_server(Object(MongoDB\Driver\Manager), Array)

PHPLIB-729 is definitely unrelated to this issue. That is an internal ticket to track any PHP changes related to mongodb/specifications#1076, which is a clarification to the cross-driver Server Selection specification. That issue does not actually pertain to how the driver selects a server (what you're doing here). Instead, it's talking about how the driver needs to forward the application's read preference on to the server by way of the $readPreference global command argument. FWIW, I don't think PHP will actually require changes for that issue as it, by way of libmongoc, should already be doing the correct thing.

Moving on: MongoDB\select_server is an internal function and is not intended to be called by applications. We cannot enforce that in PHP, but it is documented as such. That said, the function is most likely just calling MongoDB\Driver\Manager::selectServer(). I'm not sure what you mean by "empty is returned", but based on both the documentation and implementation of selectServer, the function either returns a MongoDB\Driver\Server instance or throws an exception. It's not clear to me how your "Debug Log" was produced, but it does look like it includes part of a stack trace so I assume there is actually an exception being thrown. If you're able to capture the complete exception and its stack trace, I can try and help you make sense of it.
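For anyone wanting to capture the complete exception as requested above, here is a minimal sketch. The helper name describe_exception is hypothetical and not part of the library; it only uses standard Throwable methods and flags an empty message explicitly so that it is not mistaken for missing output.

```php
<?php
// Hypothetical helper (not part of mongodb/mongodb): renders the class, code,
// message, location and stack trace of any exception as one string.
function describe_exception(\Throwable $e): string
{
    // Make an empty message visible instead of printing nothing.
    $message = $e->getMessage() !== '' ? $e->getMessage() : '(empty message)';

    return sprintf(
        "%s (code %s): %s\nat %s:%d\n%s",
        get_class($e),
        (string) $e->getCode(),
        $message,
        $e->getFile(),
        $e->getLine(),
        $e->getTraceAsString()
    );
}

// Usage sketch (commented out because it needs ext-mongodb and a Manager):
// try {
//     $server = $manager->selectServer($readPreference);
// } catch (\MongoDB\Driver\Exception\Exception $e) {
//     error_log(describe_exception($e));
//     throw $e;
// }
```

Logging the class name alongside the message means a ConnectionTimeoutException with an empty message still shows up clearly in the log.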

Beyond that, you should note that the current stable versions of the PHP library and extension are 1.9.0 and 1.10.0, respectively.

maxIdleTimeMS is not a supported connection string option for PHP. The cross-driver URI Options specification notes that it only applies to drivers with connection pools, which does not include PHP. As such, you won't find it listed in the MongoDB\Driver\Manager::__construct() documentation. I don't think this is relevant to your issue, as the option will just be ignored, but it seemed reasonable to point this out.

Lastly, CosmosDB is not officially supported by the MongoDB drivers. I'm happy to try and help you diagnose the exception (assuming you can share that), but if this is ultimately an issue involving CosmosDB you'll have to follow up with the Microsoft Azure folks for further support.

Thank you for the entire comment <3

It was not easy to debug on production, but after some effort, we finally "found" the issue.

For anyone else who runs into this: the "empty" I mentioned referred to the $exception message. After capturing the entire $exception, we saw that its type was ConnectionTimeoutException.

Following the official documentation, we now set the connectTimeoutMS connection option explicitly.

My only concern is with the MongoDB\Driver\Manager->selectServer method. It returns the driver's response directly, and the library doesn't allow configuring a retry at that point (or I'm not aware of it; I'm sorry if it exists somehow, please let me know): no retry, fallback, "cache", or recursive check. And it's used by all methods. I understand that Cosmos isn't officially supported, but in case of intermittent connectivity with any provider, it could be quite beneficial.

object(MongoDB\Driver\Exception\ConnectionTimeoutException)#200 (8) {
  ["message":protected]=>
  string(0) ""
  ["string":"Exception":private]=>
  string(0) ""
  ["code":protected]=>
  int(13053)
  ["file":protected]=>
  string(54) "/var/www/code/vendor/mongodb/mongodb/src/functions.php"
  ["line":protected]=>
  int(432)
  ["trace":"Exception":private]=>
  array(28) {
    [0]=>
    array(6) {
      ["file"]=>
      string(54) "/var/www/code/vendor/mongodb/mongodb/src/functions.php"
      ["line"]=>
      int(432)
      ["function"]=>
      string(12) "selectServer"
      ["class"]=>
      string(22) "MongoDB\Driver\Manager"
      ["type"]=>
      string(2) "->"

Retrying server selection

My only concern is with the MongoDB\Driver\Manager->selectServer method. It returns the driver's response directly, and the library doesn't allow configuring a retry at that point (or I'm not aware of it; I'm sorry if it exists somehow, please let me know): no retry, fallback, "cache", or recursive check.

MongoDB drivers do support retrying read and write operations (I believe the latter would require a real MongoDB server), but neither of those features retry the initial server selection attempt (see: Selecting the initial server from the Retryable Reads spec).

The reason for this is that server selection typically operates in a loop and will spend up to serverSelectionTimeoutMS (default: 30 seconds) attempting to select a server. This comes in particularly handy when connecting to a MongoDB replica set, where you might experience a failover and need to wait for a new primary to be elected. It can also help with overcoming small network interruptions (during server selection, before executing a read/write operation).

Due to PHP's design (many worker processes, which share nothing and generally handle a single HTTP request), that loop behavior is disabled by default to allow workers to fail quickly and move on to the next request. You can change this behavior by specifying false for the serverSelectionTryOnce URI option. You'll find both of those options mentioned in the docs for Manager::__construct().

Depending on your application needs, you may also want to lower serverSelectionTimeoutMS if you decide to utilize a server selection loop (i.e. not try-once behavior), since 30 seconds may be far longer than you're willing to wait. In some cases, max_execution_time may itself be 30 seconds, in which case PHP wouldn't even allow the script to run for that long.
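As a rough illustration of the two options discussed above, the sketch below passes serverSelectionTryOnce and serverSelectionTimeoutMS through the uriOptions argument of Manager::__construct(). The helper names and the 5000 ms timeout are assumptions chosen for the example, not recommendations from the driver documentation.

```php
<?php
// Hypothetical helper (not part of the library): URI options that enable the
// server selection loop with a timeout shorter than the 30-second default.
function server_selection_options(int $timeoutMs = 5000): array
{
    return [
        'serverSelectionTryOnce' => false,        // keep retrying selection...
        'serverSelectionTimeoutMS' => $timeoutMs, // ...for at most this long
    ];
}

// Hypothetical helper: builds a Manager with those options.
// Calling it requires ext-mongodb to be loaded.
function create_manager(string $uri, int $timeoutMs = 5000): \MongoDB\Driver\Manager
{
    return new \MongoDB\Driver\Manager($uri, server_selection_options($timeoutMs));
}

// Usage sketch (commented out because it needs ext-mongodb and a reachable deployment):
// $manager = create_manager(getenv('DB_COSMOS_DSN'), 5000);
// $server = $manager->selectServer(new \MongoDB\Driver\ReadPreference('primary'));
```

The same options can also be written directly into the connection string; keeping them in the options array just makes the timeout easy to adjust per environment, and it should stay below max_execution_time.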

Empty exception message

I still find it odd that the exception message would be empty. ConnectionTimeoutException is only thrown for two libmongoc error codes: MONGOC_ERROR_SERVER_SELECTION_FAILURE and MONGOC_ERROR_STREAM_SOCKET (see: phongo_exception_from_mongoc_domain). The PHP driver uses phongo_exception_from_mongoc_domain to determine the appropriate exception class for a given libmongoc error. The PHP exception itself is thrown from phongo_throw_exception_from_bson_error_t_and_reply using the selected class and the original message/code from libmongoc. In this case, the message string appears to be empty and the code is 13053. I can't find any reference to that error code in either the PHP driver or libmongoc, but searching for "mongodb 13053" does turn up several old issues (some by users of the PHP driver, and some not).

If we look at all instances of MONGOC_ERROR_STREAM_SOCKET in libmongoc, we can see that the message strings all include some string literal before appending an underlying error message (e.g. from OpenSSL or a socket function).

Shifting our focus to MONGOC_ERROR_SERVER_SELECTION_FAILURE, there is precisely one place where an error could possibly be set with an empty message: _mongoc_server_selection_error has an else condition where "%s" is used to set the message. Since this is a static function, we need only look within the same source file to see where _mongoc_server_selection_error is called and there are only three occurrences where a string literal is not passed for the msg argument. In all of those cases, timeout_msg is being passed, which is a static string literal declared within mongoc_topology_select_server_id (one of the main server selection functions within libmongoc). I may be missing something subtle, but I don't think that would explain how we get an empty string in our exception message either.

In any event, I'll share this with the C driver team to see if they have any ideas as to how the empty exception message could have happened.

Unbelievable how well you express yourself, I loved your explanation and I learned a lot from it, really thank you for your patience and kindness in helping <3<3<3