
I have what I believe is a memory leak, which I haven't been able to solve despite numerous attempts over the last 24 hours.

Background: I have an articles table (~37M rows) with article_id and journal_id columns, and a journal_specialty table (~29K rows) with journal_id and specialty_id columns. I want to populate an article_specialty table with article_id and specialty_id pairs using the data from the other two tables - i.e. record which specialty(ies) each article belongs to, based on the specialties of the journal it was published in.
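(For reference, assuming the table and column names above, the same population could in principle be done set-based in a single statement rather than batched in PHP - a sketch only, MySQL syntax assumed, and on tables this size you may still want to chunk it by id or date range:)

```php
use Illuminate\Support\Facades\DB;

// Set-based equivalent of the batching loop: join articles to
// journal_specialty and insert the resulting pairs directly.
// INSERT IGNORE mirrors insertOrIgnore() for duplicate keys.
DB::statement("
    INSERT IGNORE INTO article_specialty (article_id, specialty_id)
    SELECT a.article_id, js.specialty_id
    FROM articles a
    JOIN journal_specialty js ON js.journal_id = a.journal_id
");
```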

My code:

namespace App\Console\Commands\Articles;

use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;

class xa02_ArticleSpecialty extends Command
{
  protected $signature = 'articles:xa02-article-specialty';
  protected $description = 'Populate article_specialty table using journal_specialty relationships';

  private const BATCH_SIZE = 50000;
  private const INSERT_CHUNK_SIZE = 25000;

  public function handle()
  {
    $this->info('Starting article_specialty population...');
    $startTime = microtime(true);

    $journalSpecialties = DB::table('journal_specialty')
      ->select('journal_id', 'specialty_id')
      ->get()
      ->groupBy('journal_id')
      ->map(fn($group) => $group->pluck('specialty_id')->toArray())
      ->toArray();

    $totalRecords = DB::select("SHOW TABLE STATUS LIKE 'articles'")[0]->Rows;

    $this->info("Total articles to process: $totalRecords");

    $bar = $this->output->createProgressBar($totalRecords);
    $bar->start();

    $offset = 0;

    // Debugging: Show initial memory usage
    $this->info('Initial memory usage: ' . memory_get_usage() . ' bytes');

    while ($offset < $totalRecords) {
      $articles = DB::table('articles')
        ->orderBy('date_e')
        ->limit(self::BATCH_SIZE)
        ->offset($offset)
        ->get()
        ->toArray();

      // Debugging: Show memory usage after fetching articles
      $this->info('Memory usage after fetching articles: ' . memory_get_usage() . ' bytes');

      if (empty($articles)) {
        break; // ✅ No more data to process
      }

      $insertData = [];

      foreach ($articles as $article) {
        if (isset($journalSpecialties[$article->journal_id])) {
          foreach ($journalSpecialties[$article->journal_id] as $specialty_id) {
            $insertData[] = [
              'article_id' => $article->article_id,
              'specialty_id' => $specialty_id
            ];
          }
        }

        if (count($insertData) >= self::INSERT_CHUNK_SIZE) {
          DB::table('article_specialty')->insertOrIgnore($insertData);
          $insertData = []; // ✅ Free memory immediately
        }
      }

      // Debugging: Show memory usage after insert
      $this->info('Memory usage after inserting: ' . memory_get_usage() . ' bytes');

      if (!empty($insertData)) {
        DB::table('article_specialty')->insertOrIgnore($insertData);
      }

      $bar->advance(count($articles));

      // Trying to free memory after processing
      $articles = null;
      $insertData = null;
      gc_collect_cycles();
      clearstatcache();
      DB::table('articles')->newQuery();
      DB::flushQueryLog();

      $offset += self::BATCH_SIZE;

      // Debugging: Show memory usage after processing batch
      $this->newLine();
      $this->info('Memory usage after processing batch: ' . memory_get_usage() . ' bytes');
    }

    $bar->finish();
    $this->newLine();

    $totalTime = microtime(true) - $startTime;
    $this->info('✅ Completed! Processed articles in ' . gmdate("H:i:s", $totalTime) . '.');

    // Debugging: Show final memory usage
    $this->info('Final memory usage: ' . memory_get_usage() . ' bytes');
  }
}

Outcome:

Starting article_specialty population...
Total articles to process: 37765760
        0/37765760 [>---------------------------]   0%
Initial memory usage: 35593704 bytes
Memory usage after fetching articles: 141872128 bytes
Memory usage after inserting: 147389216 bytes
   100000/37765760 [>---------------------------]   0%
Memory usage after processing batch: 41217656 bytes
Memory usage after fetching articles: 145440808 bytes
Memory usage after inserting: 155017720 bytes
   200000/37765760 [>---------------------------]   0%
Memory usage after processing batch: 46857472 bytes
Memory usage after fetching articles: 151319400 bytes
Memory usage after inserting: 161351936 bytes
   300000/37765760 [>---------------------------]   0%
Memory usage after processing batch: 52506008 bytes
Memory usage after fetching articles: 156635200 bytes
Memory usage after inserting: 166937624 bytes
...

Memory keeps accumulating from batch to batch until the process exhausts its limit and dies:

PHP Fatal error:  Allowed memory size of 536870912 bytes exhausted (tried to allocate 1052672 bytes)

What am I missing???

1 Answer


Finally! I got it!!! The issue was caused by Laravel’s event dispatcher, which was accumulating event listeners during batch inserts, leading to memory exhaustion. The fix was simply to disable it:

DB::connection()->unsetEventDispatcher();
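(For context, a minimal sketch of where the call fits in the command above; the disableQueryLog() line is an additional, commonly suggested safeguard for long-running loops, not part of the original fix:)

```php
use Illuminate\Support\Facades\DB;

public function handle()
{
    // Stop Laravel from dispatching a QueryExecuted event for every
    // one of the millions of queries the loop below will run.
    DB::connection()->unsetEventDispatcher();

    // Optional extra safeguard: ensure executed queries are not
    // being accumulated in the connection's query log either.
    DB::connection()->disableQueryLog();

    // ... the batching loop from the question runs unchanged ...
}
```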