A collection describes how to enumerate the source records of a repository, yielding batches rather than individual records. Batching is crucial for bulk indexing performance.
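As a plain-Ruby sketch (no Esse required), batching is just slicing an enumerable source into fixed-size chunks so each chunk can be handed to a bulk API in one call:

```ruby
# Hypothetical stand-in for a repository's source records.
records = (1..5).to_a

batches = []
records.each_slice(2) { |batch| batches << batch }

batches # => [[1, 2], [3, 4], [5]]
```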
Block form
The simplest form is a block declared inside a repository:
```ruby
repository :user do
  collection do |**context, &block|
    User.where(context).find_in_batches(batch_size: 1_000) do |batch|
      block.call(batch, context)
    end
  end
end
```

The block receives the keyword context (passed through from import calls) and a block that you call with `(batch_array, context)`.
Class form
Inherit from `Esse::Collection` for more structure:
```ruby
class MyCollection < Esse::Collection
  def each
    raw_records.each_slice(@params[:batch_size] || 1_000) do |batch|
      yield(batch, @params)
    end
  end

  # Optional: yield only IDs in batches. Used by async indexing
  # and lazy attribute refresh.
  def each_batch_ids
    raw_records.each_slice(@params[:batch_size] || 1_000) do |batch|
      yield(batch.map(&:id))
    end
  end

  private

  def raw_records
    # ...
  end
end
```

```ruby
repository :user do
  collection MyCollection
end
```

When the collection is instantiated, it receives the context hash as `@params`:

```ruby
MyCollection.new(batch_size: 500, active: true)
```

Contract
| Method | Required | Description |
|---|---|---|
| `each { \|batch, context\| }` | Yes | Yields each batch of raw records together with the context |
| `each_batch_ids { \|ids\| }` | Optional | Yields batches of record IDs only; needed for async indexing |
| `count` / `size` | Optional | Total record count |
`Esse::Collection` includes `Enumerable`, so you get `map`, `select`, `first`, etc. for free once `each` is defined.
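A minimal plain-Ruby illustration of that contract, outside Esse (the class and data below are hypothetical; note it yields only the batch, not the `(batch, context)` pair Esse expects):

```ruby
# Any class that defines a batch-yielding `each` and includes
# Enumerable picks up map, select, first, count, etc. for free.
class TinyCollection
  include Enumerable

  def initialize(records, batch_size: 2)
    @records = records
    @batch_size = batch_size
  end

  def each
    @records.each_slice(@batch_size) { |batch| yield(batch) }
  end
end

coll = TinyCollection.new(%w[a b c d e])
coll.first       # => ["a", "b"]
coll.map(&:size) # => [2, 2, 1]
```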
Why `each_batch_ids` matters
Extensions like `esse-async_indexing` rely on this method to enqueue ID-only jobs that don’t hold raw record payloads in memory. If your repository only defines `each`, async indexing won’t be able to kick off import jobs from the CLI.
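In plain Ruby, the memory argument looks like this: an ID-only job payload carries a few integers per job instead of full record objects. The `enqueue` lambda below is a hypothetical stand-in for whatever job backend the extension uses:

```ruby
require 'json'

# Hypothetical records with ids and heavy payloads.
Record  = Struct.new(:id, :body)
records = (1..5).map { |i| Record.new(i, 'x' * 1_000) }

jobs = []
enqueue = ->(ids) { jobs << JSON.generate(ids) } # stand-in job backend

# ID-only batching: the shape each_batch_ids hands to async indexing.
records.each_slice(2) { |batch| enqueue.call(batch.map(&:id)) }

jobs # => ["[1,2]", "[3,4]", "[5]"]
```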
If you use `esse-active_record` or `esse-sequel`, the plugin provides both methods for you:
```ruby
collection ::User # ActiveRecord: both each and each_batch_ids available
```

Context passing
Whatever keyword arguments are passed as `context:` during the import flow are forwarded to the collection:
```ruby
UsersIndex.import(context: { active: true, region: 'us' })
```

…arrives in the collection as:

```ruby
collection do |**context, &block|
  # context => { active: true, region: 'us' }
  User.where(active: context[:active], region: context[:region])
      .find_in_batches { |b| block.call(b, context) }
end
```

The context is then forwarded to the document block unchanged.
Custom batching metadata
You can yield additional metadata alongside the batch for the document block to consume:
```ruby
collection do |**ctx, &block|
  Order.find_in_batches do |orders|
    # Bulk-fetch related data once per batch
    customers = Customer.where(id: orders.map(&:customer_id)).index_by(&:id)
    block.call(orders, ctx.merge(customers: customers))
  end
end
```

```ruby
document do |order, customers: {}, **|
  customer = customers[order.customer_id]
  { _id: order.id, customer_name: customer&.name }
end
```

This “batch context” pattern is the recommended way to avoid N+1 lookups inside serialization.
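To see why this matters, here is a framework-free sketch of the same idea: one lookup table built per batch replaces one query per record. The structs below are hypothetical in-memory stand-ins for the ORM models:

```ruby
Order    = Struct.new(:id, :customer_id)
Customer = Struct.new(:id, :name)

orders    = [Order.new(1, 10), Order.new(2, 20), Order.new(3, 10)]
customers = [Customer.new(10, 'Ada'), Customer.new(20, 'Grace')]

# One "query" per batch: build the lookup table once
# (this mirrors ActiveSupport's index_by).
by_id = customers.to_h { |c| [c.id, c] }

# Each document then serializes with a constant-time hash read,
# instead of one customer lookup per order (the N+1 shape).
docs = orders.map do |order|
  { _id: order.id, customer_name: by_id[order.customer_id]&.name }
end

docs.map { |d| d[:customer_name] } # => ["Ada", "Grace", "Ada"]
```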
ORM integrations
The ORM extensions turn common patterns into DSL:
- `esse-active_record` adds `collection Model` with `scope` / `batch_context` / `connect_with`.
- `esse-sequel` provides an identical DSL for Sequel.

```ruby
collection ::User, batch_size: 500 do
  scope :active, -> { where(active: true) }

  batch_context :orders do |users, **|
    Order.where(user_id: users.map(&:id)).group_by(&:user_id)
  end
end
```