zdennis / activerecord-import

A library for bulk insertion of data into your database using ActiveRecord.

Home Page:http://www.continuousthinking.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Use with Apartment gem leads to wrong sequence being used on postgresql

ChrisHampel opened this issue · comments

Our software has 1 PostgreSQL schema per company using our software using the Apartments gem most of the time it works but when there is a lot of activity we will get SQL where the apartment gem is selecting the right schema to insert data into, but this gem is saying to use the wrong schema's sequence for the primary key.

For example, we will be doing an import into schema 3 and it will look up the sequence for schema 1 which was used recently by another company

This either results in:

  • a very large id in a schema where barely any records exist
  • "PG::UniqueViolation: ERROR: duplicate key value violates unique constraint"
    neither of which is preferable since the first will lead to the second eventually unless we intervene and the other just stops saving of the new record all together.

It appears that the sequence being passed in here

connection_memo.next_value_for_sequence(sequence_name)
can be for the wrong schema.

For background information Apartments sets the current schema using the search_path and setting it to:

  • #{schema_name}, public

@ChrisHampel can you provide some code to demonstrate your import process and maybe some details about your environment? My guess is that you have multiple threads and one is modifying the connection context out from beneath another one.

we are running 2 puma servers with 16 max threads each, however this still happens with 1 server set to 1 max threads this also happens in rake tasks when we are specifically setting the schema.

this is our code, but for security reasons only included the applicable code.

vars.each do |var|
    instance_klass_attributes=collect_attributes(...stuff here...)
    instances.push instance_klass.new(instance_klass_attributes)
end
temp=instance_klass.import instances
instances=instance_klass.where(id: temp.ids)
unless instance_associations.nil?
    instance_associations.each do |association|
        asocs=[]
        instances.each do |instance|
            process_asoc(self, association).each do |hash|
            asoc=association[:klass].new(hash)
            asoc.send "#{association[:link].to_s}=", instance
            asocs.push asoc
        end
    end
    association[:klass].import asocs
end
unless instance_class_callbacks.nil?
  instance_class_callbacks.each do |callback|
    if instance_klass.respond_to?(:delay)
      instance_klass.delay.send(callback,instances.map(&:id))
    else
      instance_klass.send(callback,instances.map(&:id))
    end
  end
end
instances

the equivelent code we are currently running is due to this bug:

instances=[]
asocs={}
unless instance_associations.nil?
  instance_associations.each do |association|
    asocs["#{association[:key].to_s}_attributes".to_sym]=process_asoc(self, association)
  end
end
var.each do |var|
    instance_klass_attributes=collect_attributes(...stuff here...)
    asocs.each do |key,value|
      instance_klass_attributes[key]=value
    end
    instance=instance_klass.new(instance_klass_attributes)
    instance.save()
    instances.push instance
end

this is obviously orders of magnitude slower.

this all happens in a get request due to the dynamic nature of the data generation, infinite possible instance generated on demand.

we first noticed this when we were running a large scale data migration to the new system we built. in that we migrated each schema one at a time:

schemas.all.each do |schema|
    Apartment::Tenant.switch! schema.name
    ActiveRecord::Base.transaction do
        migration using your library in here(see code above)
    end
end

I hope this information is helpful.

We're experiencing the same issue with the wrong schema being passed to nextval when using ActiveRecord. I think I can help shed some light on what's going on here, and suggest a few paths forward.

A little background

If you're using Apartment the way we are (the most common way, with Postgres schemas), search_path is set to ensure that the table from the current schema is selected, like @ChrisHampel mentioned. Queries generated by ActiveRecord and elsewhere work best when no schema qualifier is included with table names which allows search_path to find the correct table. For example, queries on the posts table would use posts, rather than schema.posts. Specifically referencing a schema in the table name can return tables in different schemas, rather than what is set in search_path, if a different schema name is specified. That's what's going on here.

The issue

The issue isn't really with activerecord-import, but rather ActiveRecord's behavior with sequence names (I looked at 4.2.7, which is what we're on). The reason some other schema is appearing in activerecord-import's insert statements is because of this caching within sequence_name. When you call Post.sequence_name, it returns a value that includes the current schema, which is because it's included in what Postgres gives back here. Because of the aforementioned caching, switching to another schema and calling Post.sequence_name again returns the cached value, with the previous schema.

Fix/workaround

I don't think that sequence_name should include any schema in front (even the current one), so that might be a bug to file on ActiveRecord. However, that actually might not be what most people think, taking a look a this issue.

ActiveRecord allows explicitly setting a sequence name, too, which is how I plan on working around this, since we're only using it on one table. I set the sequence_name to the value without a schema in front of it, and it works:

class Post < ActiveRecord::Base
  self.sequence_name = "posts_seq"
end

@ChrisHampel, since you're looping dynamically through lots of records, you could set these dynamically using this fallback behavior for default_sequence_name as a reference. I saw that setting sequence_name on associations actually sets it on the association's model, too.

@mtalcott great find! Do you think calling #reset_sequence_name on the model is an option?

schemas.all.each do |schema|
    Apartment::Tenant.switch! schema.name
    ActiveRecord::Base.transaction do
        # I think this should set the sequence name with correct schema
        Post.reset_sequence_name

        Post.import posts
    end
end

If I get a chance I will test this, but we ended up not using your library, at this time, due to this weirdness

Yep, I just verified that #reset_sequence_name also does the trick!

See also this apartment issue. As this cache is set by Rails it is foremost a Rails problem. Next best place to solve this is the apartment gem. I wrote a hack while that issue is getting solved there (see the linked issue).