Sometimes you need to delete X number of records out of mysql table where the table may contain millions of records and/or gigs of data.

You could just write some code like this:

DELETE FROM screen_sizes WHERE url LIKE '%system_monitor.php%';

and call it a day.

But what happens when your query has been running for 30 seconds, shows no signs of stopping, and users page-loads are getting hung up because your MySQL server is backed up? What happens when it’s 10 minutes later and everything is still waiting on that query?

(Side note: If your table is MyISAM, a delete query blocks the table from further inserts and updates until the query is done. If you table is InnoDB, the table is not blocked but it does slow down to the point where you might have a cascade of inserts that never finish until the delete is done anyways.)

So how do you avoid this situation? Delete the records individually. This will make each query run very quickly and will put minimal load on the server. Write a little script that can be run every minute or so for as long as is needed. I’m also going to introduce a few functions in this code that will be explained in later posts, but you should still understand the concepts as-is for now.

<?php

include '../php_config/common.inc.php';

$query = "SELECT id FROM screen_sizes
			WHERE url LIKE '%system_monitor.php%'
			LIMIT 100";
$ids = getMySQLValues($query);
if($ids) {
	foreach($ids AS $id) {
		$screenObj = new screen_size($id);
		$screenObj->delete();
	}
}

?>

Putting a limit on the query as shown above helps the results come back quickly and limits the number of records you’re dealing with at any given time.

If you have 10,000 records you want to delete and you’ve setup the above script to run once a minute via cron job, it’ll take 1 hour and 40 minutes to complete the deletion procedure. Is that a lot longer than just running a delete query? Sure. Does it keep your system happy and keep users from ever knowing anything was going on? Yep.

In this example, we’re working with a very small set of records in a simple table but the same idea applies and is more relevant when trying to delete millions of records from a complex table with billions of records and dozens of indexed fields. In those cases, there’s no other way to keep a system responsive and still perform the operations you want in real-time without taking the whole system down for “maintenance”.

August 12th, 2013

Posted In: Systems

Leave a Reply

Your email address will not be published. Required fields are marked *

%d bloggers like this: