bakame-php / csv-doctrine-collections-bridge

A Doctrine Collection Bridge to work with League CSV

Multi-Step CSV filtering

franz-josef-kaiser opened this issue

First off: Thanks for this package… and the other projects you maintain!

Issue summary

I am using Reader::createFromPath() to retrieve a file stream whose records I want to filter/match in a two-step process.

The reason is simple: I need the filtered result set, but for pagination I also need additional data: the total number of results alongside the current "page" list (limit, offset, etc.).

Summed up, it boils down to:

# $filters and $pagination are `Criteria` objects containing `Comparison` objects.
$collection = new RecordCollection( $stream );
$result = $collection->matching( $filters );
$result = $collection->matching( $pagination );

The roadblock I am facing resides inside the RecordCollection class. When we initialize a new instance, we provide the Reader as an argument. But once we filter it, the lazy loading kicks in, resets the internal Collection and unsets the Reader. So I cannot filter in two steps, as the internal object is simply gone.

To gain the capability to work with an existing collection in multiple steps, I wrote a slightly changed implementation of the Collection, based upon the RecordCollection:

namespace App\Collection;

use Doctrine\Common\Collections\{
    AbstractLazyCollection,
    ArrayCollection,
    Criteria,
    Selectable,
};
use League\Csv\TabularDataReader;

final class MultifilterRecordCollection extends AbstractLazyCollection implements Selectable
{
    private $tabularDataReader;

    public function __construct(TabularDataReader $tabularDataReader)
    {
        $this->tabularDataReader = $tabularDataReader;
    }

    protected function doInitialize(): void
    {
+        if ( $this->collection ) {
+            return;
+        }
        $this->collection = new ArrayCollection();
        foreach ($this->tabularDataReader as $offset => $record) {
            $this->collection[$offset] = $record;
        }
-        unset($this->tabularDataReader);
    }

    public function matching(Criteria $criteria): ArrayCollection
    {
        $this->initialize();

        /** @var ArrayCollection $collection */
        $collection = $this->collection;

        /** @var ArrayCollection $newCollection */
        $newCollection = $collection->matching($criteria);
+        $this->collection = $newCollection;

        return $newCollection;
    }
}

@franz-josef-kaiser thanks for the kind words. Looking at your issue, I fail to understand it. The initialize method iterates over the full CSV document before unset is called, so during your second filtering step all the data should already be present in the collection and you no longer need the Reader instance, unless I am missing something?
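
The behaviour described in this reply can be tried on a plain ArrayCollection: matching() returns a new collection, so a second Criteria can be applied to the first result while its count serves as the total. This is a minimal sketch, assuming doctrine/collections is installed via Composer; the sample records and the `status` field are invented for illustration and are not taken from the issue.

```php
<?php

require 'vendor/autoload.php';

use Doctrine\Common\Collections\ArrayCollection;
use Doctrine\Common\Collections\Criteria;

// Hypothetical records standing in for the rows loaded from the CSV.
$records = new ArrayCollection([
    ['name' => 'Ada',    'status' => 'active'],
    ['name' => 'Bob',    'status' => 'inactive'],
    ['name' => 'Cleo',   'status' => 'active'],
    ['name' => 'Dmitri', 'status' => 'active'],
]);

// First step: the filter. Second step: the pagination window.
$filters    = Criteria::create()->where(Criteria::expr()->eq('status', 'active'));
$pagination = Criteria::create()->setFirstResult(0)->setMaxResults(2);

$filtered = $records->matching($filters);    // new collection, $records is untouched
$total    = count($filtered);                // total used to compute the page count
$page     = $filtered->matching($pagination); // second step runs on the first result

printf("%d matching records, %d on this page\n", $total, count($page));
```

The key difference from the snippet in the issue summary is that the second matching() call is made on the result of the first one rather than on the original collection.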

Actually there are three steps:

  1. Fetch all data (as stream) > This gives me the total amount
  2. Apply clean up filtering > This gives me the "real" *) total amount of data
  3. Apply filters to Collection > This gives me the final reduced amount of records.

The "real" number of records is what I need as the "total" to calculate the number of pages (using limits, offsets, etc.) or to display in a summary. The final filters produce the reduced set of records shown on a single page.

Hope that helps to clear things up?

*) My use case involves a non-standard CSV that contains multiple header rows, which I have to remove by pre-filtering the stream. This reduces the actual number of records to the usable number of records. E.g. 141 records - 1 header row - n custom header rows = total number of usable rows.
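
The arithmetic behind the "real" total and the page window can be sketched in plain PHP. The value chosen for n, the page size and the requested page are all invented for illustration; only the 141 records and the single regular header row come from the footnote.

```php
<?php

$rawRows          = 141; // records in the file, from the footnote
$headerRows       = 1;   // the regular header row
$customHeaderRows = 2;   // "n" in the footnote; 2 is a hypothetical value

// The "real" total: only the usable records count towards pagination.
$total = $rawRows - $headerRows - $customHeaderRows;

$perPage = 25; // hypothetical limit
$page    = 3;  // hypothetical 1-based page number

$totalPages = (int) ceil($total / $perPage); // last page may be partially filled
$offset     = ($page - 1) * $perPage;        // records to skip before this page

printf("total=%d pages=%d offset=%d limit=%d\n", $total, $totalPages, $offset, $perPage);
```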

As far as I understand it, I would create my RecordCollection out of step 2; then step 3 would not need anything.

Again, the code depends heavily on your underlying business logic, but I still believe you do not need to add your proposed solution to resolve the issue. The current way to initialise the RecordCollection is in line with how Doctrine collections are created.

Nothing is added, nothing is removed.

You mean filtering the stream before creating the actual Collection? That's something I haven't thought of…
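
Filtering the stream before creating the collection could look roughly like the following. This is a sketch under assumptions, not the maintainer's code: it assumes league/csv, doctrine/collections and this bridge are installed via Composer, that the bridge class lives in the `Bakame\Csv\Extension` namespace, and that the extra header rows are verbatim repeats of the first header line. The CSV content and the `name`/`status` columns are invented.

```php
<?php

require 'vendor/autoload.php';

use Bakame\Csv\Extension\RecordCollection; // namespace is an assumption
use Doctrine\Common\Collections\Criteria;
use League\Csv\Reader;
use League\Csv\Statement;

// Hypothetical CSV in which the header line is repeated mid-file.
$csv = <<<CSV
name,status
Ada,active
name,status
Bob,inactive
Cleo,active
CSV;

$reader = Reader::createFromString($csv);
$reader->setHeaderOffset(0);

// Step 2: drop the repeated header rows before any collection exists.
$cleaned = Statement::create()
    ->where(fn (array $record): bool => $record['name'] !== 'name')
    ->process($reader);

$total = count($cleaned); // the "real" total used for pagination

// Step 3: build the collection from the cleaned records and take one page.
$collection = new RecordCollection($cleaned);
$page = $collection->matching(
    Criteria::create()->setFirstResult(0)->setMaxResults(2)
);

printf("%d usable records, %d on this page\n", $total, count($page));
```

With this ordering the Reader is only needed while the Statement runs, so it no longer matters that RecordCollection discards its reader after initialization.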