statonlab / tripal_alchemist

entity converter for tripal 3

error running large job on dev

bradfordcondon opened this issue · comments

tripal_alchemist_run_conversion_job errored out converting a large chunk of entities.

I think it's likely that it runs out of memory fetching all the entities, since the form that counted them also ran out of memory (see #39).

tripal_alchemist_convert_all_entities is the function to inspect.

print("Idea one: its failing here\n");
//step 1 get all qualifying entities

    $query = db_select($chado_base_table, 'CBT');
    $query->fields('SET', ['entity_id', 'record_id']);
    $query->innerJoin($source_bundle_table, 'SET', 'SET.record_id = CBT.' . $source_bundle->data_table . '_id');
    $query->leftJoin($destination_table, 'DET', 'SET.record_id = DET.record_id');
    $query->condition('CBT.' . $type_column, $type_id);
    $query->condition('DET.record_id', NULL, 'IS');
    $results = $query->execute()->fetchAll();
print("you were wrong");

The above was added as logging. Drush output:

Calling: tripal_alchemist_run_conversion_job(Automatic, Array)
Idea one: its failing here
PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /home/www/sites/all/modules/custom/tripal_alchemist/includes/tripal_alchemist.api.inc on line 146

PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in phar:///usr/bin/drush/includes/preflight.inc on line 769

conclusion: "You were wrong" never printed. im not wrong, querying all entity/records run out of memory. we need to do a count and loop through chunks.

    $query = db_select($chado_base_table, 'CBT');
    $query->fields('SET', ['entity_id', 'record_id']);
    $query->innerJoin($source_bundle_table, 'SET', 'SET.record_id = CBT.' . $source_bundle->data_table . '_id');
    $query->leftJoin($destination_table, 'DET', 'SET.record_id = DET.record_id');
    $query->condition('CBT.' . $type_column, $type_id);
    $query->condition('DET.record_id', NULL, 'IS');

    $count_query = $query;

    $total_count = $count_query->countQuery()->execute()->fetchField();

    print("Converting " . $total_count . " records\n");

    $place = 0;
    $step = 1000;

    while ($place < $total_count) {

      // range(start, length): fetch the next $step rows.
      $results = $query->range($place, $step)->execute()->fetchAll();

      // ... existing per-entity conversion loop runs here ...

      $place += $step;
    }

Fixed like so. Before I merge, I need to check the manual and collection functions to do the same thing.

  • Automatic jobs for prop tables (i.e., analyses): this is UNLIKELY to be an issue. Furthermore, the SQL is all written out by hand, so I'd need to handle the limit/offset differently (see the sketch after this list).
  • Manual jobs: this is a non-issue, since you're selecting them from a table.
  • Collection jobs: this is an issue, but the code needs to be refactored to allow a limit/offset.
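
For the prop-table case, a minimal sketch of what chunking the hand-written SQL could look like, assuming it runs through chado_query(); the table, column, and variable names below are placeholders for illustration, not the module's actual query.

    // Sketch only: placeholder table/columns; intval() keeps LIMIT/OFFSET safe.
    $place = 0;
    $step = 1000;
    do {
      $sql = "SELECT prop.analysis_id AS record_id
              FROM {analysisprop} prop
              WHERE prop.type_id = :type_id
              ORDER BY prop.analysis_id
              LIMIT " . intval($step) . " OFFSET " . intval($place);
      $results = chado_query($sql, [':type_id' => $type_id])->fetchAll();
      foreach ($results as $record) {
        // ... convert the entity tied to $record->record_id here ...
      }
      $place += $step;
    } while (count($results) === $step);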

So if you ask me, only the collection case is the relevant one.

I've added unit tests for the refactor; however, let's first ensure that it's actually a problem.


use StatonLab\TripalTestSuite\Database\Seeder;

class FeatureSeeder extends Seeder
{
    /**
     * Seeds the database with 1,000,000 mRNA features.
     */
    public function up()
    {
      $mrna_term = chado_get_cvterm(['id' => 'SO:0000234']);

      factory('chado.feature', 1000000)->create(['type_id' => $mrna_term->cvterm_id]);
    }
}

Create a simple seeder to generate 1 million mRNAs and then publish them.
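
Roughly, exercising it could look like the sketch below. The direct up() call assumes Tripal Test Suite lets you invoke a seeder that way, and 'bio_data_1' is a placeholder bundle name, not necessarily the mRNA bundle on this site.

    // Seed 1,000,000 mRNA features (slow; dev box only).
    (new FeatureSeeder())->up();

    // Publish them as Tripal entities. Swap in the mRNA bundle's real machine name.
    chado_publish_records(['bundle_name' => 'bio_data_1']);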

Then create a collection, then convert to gene.

Because the query is different, I don't actually think this is a problem: the collection returns all entities with no way to chunk, but it's just an array of entity IDs which we go through one-by-one.
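
In other words, the collection path already behaves roughly like the sketch below, which is why memory should stay flat. $entity_ids stands in for whatever array of IDs the collection hands back; the variable name and the cache reset are illustrative, not the module's actual code.

    foreach ($entity_ids as $entity_id) {
      // Load one TripalEntity at a time rather than the whole set.
      $entities = entity_load('TripalEntity', [$entity_id]);
      $entity = reset($entities);

      // ... convert this single entity ...

      // Clear the static entity cache so memory does not grow with the loop.
      entity_get_controller('TripalEntity')->resetCache([$entity_id]);
    }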