Jump to content

Looking for a more elegent solution to a MySQL mess


Rumblefish

Recommended Posts

		//Determine who receives bracelets and update bracelet recipients table		//This is only applicable to recipients of CHARMS		$sql = "SELECT DISTINCT ACCOUNT_NUM FROM sc_master WHERE REFERENCE = '$reference' AND AWARD NOT LIKE 'Scentsational Start Award%' AND GIFT_GROUP = 'Charm' AND PROCESSED <> 'X' LIMIT 1000";				$qry = mysql_query($sql) or die(mysql_error());				while($r = mysql_fetch_array($qry))			{			$acct = $r['ACCOUNT_NUM'];			$fname = $r['FNAME'];  //no data due to DISTINCT clause			$lname = $r['LNAME'];  //no data due to DISTINCT clause			$bsql = "SELECT ID FROM sc_bracelet_recipients WHERE CONSULTANT_ID = '$acct'";			$bqry = mysql_query($bsql) or die(mysql_query());			$count = mysql_num_rows($bqry);						if($count < 1)				{				//get count of how many bracelet records were marked				$bracelets = $bracelets + 1;				//update sc_master with an "X" for each bracelet needed				mysql_query("UPDATE sc_master SET BRACELET = 'X' WHERE ACCOUNT_NUM = '$acct' AND REFERENCE = '$reference' AND AWARD NOT LIKE 'Scentsational Start Award%' AND GIFT_GROUP = 'Charm' LIMIT 1") or die(mysql_error());				//update sc_bracelet_recipients to track these records				mysql_query("INSERT INTO sc_bracelet_recipients (CONSULTANT_ID, FIRST_NAME, LAST_NAME, BATCH_MONTH) VALUES('$acct', '$fname', '$lname', '$reference')");				}			//mark the record as processed to avoid reprocessing it on the next pass			mysql_query("UPDATE sc_master SET PROCESSED = 'X'") or die(mysql_error());			}

I have a script that we run once a month on a large 'awards' list provided by a customer. A recipient can be on the list more than once for different achievements, and for certain awards they may receive a charm for a bracelet. If they are receiving a charm for the first time we must send them a charm bracelet, however if they have already received a bracelet all we send is the charm.

 

Here's a summary of steps:

  • Is the recipient award associated with a charm?
    • Yes - check the 'bracelets' table to see if they have received a bracelet already
    • If they haven't received a bracelet:

    1. update the bracelets table
    2. indicate that the record receives a bracelet
    3. if this recipient receives multiple awards, only mark 1 record for a bracelet

    [*]Mark all records that have been processed as processed

The 'batch' of data that we process is anywhere from 5,000 to 15,000 records per month. The logic noted above runs (loops) for each record in the 'batch' and the bracelet table contains a couple hundred thousand records. This just kills the server... so, you'll see that I limit the primary query to 1,000 records, this is due to how slow this script is (the tables contain hundreds of thousands of records).

 

I'm guessing there's someone far smarter than I that can probably handle all of this in one beautiful and much faster query, preferably fast enough to process the entire list in one run vs. breaking it into groups of 1,000 per run.

 

Thanks in advance for your help!

 

Rumble

Link to comment
Share on other sites

It would be better to have 1 query get the list of all of the current bracelet recipients and then just look them up inside your loop instead of sending another select query for each user. If you build that list as an array using the account ID as the array index then you can just use a random-access isset to check if they already have a bracelet instead of needing to use in_array or looping through it. e.g.:

$res = mysql_query('SELECT CONSULTANT_ID FROM sc_bracelet_recipients');$bracelets = [];while ($row = mysql_fetch_assoc($res)) {  $bracelets[$row['CONSULTANT_ID']] = '';}
Now you can use isset to quickly check if any consultant ID is in that list.Also, you're eventually going to want to transition away from the mysql extension and towards something like PDO, the mysql extension is removed in the next major version of PHP. With PDO you should also use prepared statements rather than putting data directly in the queries. Using prepared statements in that code would speed things up also because you could prepare those insert and update queries once and then just execute them multiple times, once for each account. Prepared statements help with SQL injection attacks but the real point of them is in queries that get run over and over with different data. You can prepare the query on the server once to generate the execution plan, and then keep executing the same query with different data.
Link to comment
Share on other sites

Thanks jsg. I had considered storing that data in an array but wasn't sure how well an array would handle over 200k records, even if it was just an account number. Never tested that though. Regarding the upcoming changes with php, that really worries me as I have many thousands of lines of code written pretty much the way you see here, copied across a dozen or so platforms - that update will be brutal.

 

Has the date been set for this next PHP release?

 

Thanks again for your note. I'll try the array and follow up with the results.

 

Best,

 

R

Link to comment
Share on other sites

PHP 7 is scheduled to be released on November 12th. There's not a PHP 6, it would have led to some confusion about which features were a part of it. The current version is 5.6. Most servers will probably remain on 5 for a while before moving to 7 because of the prevalence of the mysql extension and other non-compatible changes. The mysql extension was officially deprecated in 2013, although the PHP developers released mysqli all the way back in 2004 and have been telling people to use mysqli instead of mysql since then. Even so, people still want to keep publishing tutorials showing mysql instead of mysqli or PDO. Although a quick glance at the manual for any mysql function shows the warning box at the top:http://php.net/manual/en/function.mysql-query.phpAbout the array, you can use memory_get_usage or memory_get_max_usage if you want to benchmark how much memory your script is using. There's a maximum memory setting per script which you can change in php.ini.http://php.net/manual/en/function.memory-get-usage.php

Link to comment
Share on other sites

JSG,

 

Array question. The number of records captured in the array doesn't match the number of records in the table. Any ideas as to what may cause this? It's no bueno.

Array Count = 97,317

Table Rows = 97,492

 

Still timing out after about 5-1/2 minutes, even using the array. I'm thinking I would see significant gains if I could get access to the record ID from the DISTINCT query. I've always been irritated that you can't get any other data except for the DISTINCT field.

SELECT DISTINCT ACCOUNT_NUM FROM sc_master WHERE REFERENCE = '$reference' AND AWARD NOT LIKE 'Scentsational Start Award%' AND GIFT_GROUP = 'Charm'

 

I think the real culprit is the update query, especially because of the LIKE clause: UPDATE sc_master SET BRACELET = 'X' WHERE ACCOUNT_NUM = '$acct' AND REFERENCE = '$reference' AND AWARD NOT LIKE 'Scentsational Start Award%' AND GIFT_GROUP = 'Charm'

 

If I could simply collect the ID from the SELECT query and use the ID to update the record instead of having to find a corresponding record to post to, I'm guessing this will run very quickly. I ran a test script that identifies the records but doesn't run update or insert queries ant the script ran in a few seconds.

 

Is there any way possible to collect the ID from the record returned via the DISTINCT query? Or, a different way to identify that record without such a cumbersome update query?

 

Thanks again,

 

Rumble

Link to comment
Share on other sites

If the select query doesn't have a where clause then it will get all rows. Since it stores records by key, maybe the database has duplicates. The array won't. Also, InnoDB does not give an accurate row count and you need to actually use a count query to get the size of the table.If you want to check for duplicates in the database you can run something like this:

SELECT CONSULTANT_ID, COUNT(*) FROM sc_bracelet_recipients GROUP BY CONSULTANT_ID HAVING COUNT(*) > 1

I'm thinking I would see significant gains if I could get access to the record ID from the DISTINCT query.

You can select multiple distinct columns.
SELECT DISTINCT ACCOUNT_NUM, ID FROM ...
It will select every distinct combination.

I think the real culprit is the update query

It is, if you run the main select query in phpMyAdmin I'm sure you'll find it takes a fraction of a second. If it takes much longer than make sure you put indexes on the columns you're using in the WHERE clause. Using LIKE may cause it to not use an index though. You can also tell it to explain the query so that you can find out if it's using any indexes:
EXPLAIN EXTENDED UPDATE sc_master SET BRACELET = 'X' WHERE ACCOUNT_NUM = '$acct' AND REFERENCE = '$reference' AND AWARD NOT LIKE 'Scentsational Start Award%' AND GIFT_GROUP = 'Charm' LIMIT 1
You'll have to fill in the values, but it will tell you if it could be using any keys and which keys it's actually using. The table should be structured so that it always uses keys. If you have too many keys though, then inserts and updates will take longer. There's a balance there, only the right keys.
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...