error running large job on dev
bradfordcondon opened this issue · comments
tripal_alchemist_run_conversion_job errored out converting a large chunk of entities.
I think it's likely that it runs out of memory fetching all the entities, since the form that counted them also ran out of memory (see #39).
tripal_alchemist_convert_all_entities is the function to inspect.
print("Idea one: its failing here\n");
//step 1 get all qualifying entities
$query = db_select($chado_base_table, 'CBT');
$query->fields('SET', ['entity_id', 'record_id']);
$query->innerJoin($source_bundle_table, 'SET', 'SET.record_id = CBT.' . $source_bundle->data_table . '_id');
$query->leftJoin($destination_table, 'DET', 'SET.record_id = DET.record_id');
$query->condition('CBT.' . $type_column, $type_id);
$query->condition('DET.record_id', NULL, 'IS');
$results = $query->execute()->fetchAll();
print("you were wrong");
The above was added as logging. The output:
Calling: tripal_alchemist_run_conversion_job(Automatic, Array)
Idea one: its failing here
PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /home/www/sites/all/modules/custom/tripal_alchemist/includes/tripal_alchemist.api.inc on line 146
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /home/www/sites/all/modules/custom/tripal_alchemist/includes/tripal_alchemist.api.inc on line 146
PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in phar:///usr/bin/drush/includes/preflight.inc on line 769
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in phar:///usr/bin/drush/includes/preflight.inc on line 769
Conclusion: "you were wrong" never printed, so I'm not wrong: querying all entity records at once runs out of memory. We need to do a count and loop through chunks.
$query = db_select($chado_base_table, 'CBT');
$query->fields('SET', ['entity_id', 'record_id']);
$query->innerJoin($source_bundle_table, 'SET', 'SET.record_id = CBT.' . $source_bundle->data_table . '_id');
$query->leftJoin($destination_table, 'DET', 'SET.record_id = DET.record_id');
$query->condition('CBT.' . $type_column, $type_id);
$query->condition('DET.record_id', NULL, 'IS');

// countQuery() returns a separate count query, so the original query is untouched.
$total_count = $query->countQuery()->execute()->fetchField();
print("Converting " . $total_count . " records\n");

$place = 0;
$step = 1000;
while ($place < $total_count) {
  // range() takes (offset, length), not (start, end).
  $results = $query->range($place, $step)->execute()->fetchAll();
  // ... convert this batch of entities ...
  $place += $step;
}
Fixed like so. Before I merge, I need to check that the manual and collection functions do the same thing.
- Automatic jobs for prop tables (i.e., analyses): this is UNLIKELY to be an issue. Furthermore, the SQL is all written out by hand, so I need to handle the limit/offset differently.
- Manual jobs: this is a non-issue, since you're selecting them from a table.
- Collection jobs: this is an issue, but the code needs to be refactored to allow limit/offset.
So if you ask me, only the collection case is the relevant one.
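For the automatic prop-table case, where the SQL is written out by hand rather than built with db_select(), Drupal 7's db_query_range() can append the limit/offset for us. A rough sketch only; the table name and placeholder here are illustrative, not the actual tripal_alchemist SQL:

```php
// Illustrative SQL only; the real prop-table query in the module differs.
$sql = "SELECT entity_id, record_id FROM {some_prop_table} WHERE type_id = :type_id";
$place = 0;
$step = 1000;
do {
  // db_query_range() appends the driver-appropriate LIMIT/OFFSET clause.
  $results = db_query_range($sql, $place, $step, [':type_id' => $type_id])->fetchAll();
  // ... convert this batch ...
  $place += $step;
} while (count($results) === $step);
```

The do/while stops as soon as a batch comes back short, so no separate count query is needed for the hand-written SQL path.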
I've added unit testing to refactor against; however, let's first ensure that it's actually a problem.
class FeatureSeeder extends Seeder
{
    /**
     * Seeds the database with one million mRNA features.
     */
    public function up()
    {
        $mrna_term = chado_get_cvterm(['id' => 'SO:0000234']);
        factory('chado.feature', 1000000)->create(['type_id' => $mrna_term->cvterm_id]);
    }
}
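One caveat: asking the factory for 1,000,000 rows in a single call may itself exhaust memory before the test ever runs. A sketch of seeding in smaller batches instead, assuming the same Tripal Test Suite factory() used above:

```php
public function up()
{
    $mrna_term = chado_get_cvterm(['id' => 'SO:0000234']);
    // Create 1M features in batches of 10k so the factory's in-memory
    // objects can be garbage-collected between batches.
    for ($i = 0; $i < 100; $i++) {
        factory('chado.feature', 10000)->create(['type_id' => $mrna_term->cvterm_id]);
    }
}
```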
Create a simple seeder to generate 1 million mRNAs and then publish them. Then create a collection, then convert the entities to genes.
Because the query is different, I don't actually think this is a problem: the collection returns all entities with no way to chunk, but it's just an array of entity IDs, which we go through one by one.
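If the one-by-one loop does turn out to leak, the ID array itself is tiny; the likely culprit would be Drupal's static entity cache growing as each entity is loaded. A hedged sketch of bounding that, assuming the collection hands back a flat array of entity IDs; the resetCache() call on the TripalEntity controller is the assumption here:

```php
// $entity_ids: flat array of entity IDs returned by the collection.
foreach (array_chunk($entity_ids, 500) as $chunk) {
  foreach ($chunk as $entity_id) {
    // ... load and convert this entity ...
  }
  // Drop the statically cached entities so memory stays bounded.
  entity_get_controller('TripalEntity')->resetCache();
}
```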