What's the real problem?


niche


I think I'm hitting a limit that looks like a time limit, but it's probably something else. What, exactly, I don't know.

Here's my script:

UPDATE du_raw2, cris4
SET du_raw2.sn2 = cris4.sn
WHERE du_raw2.key1 LIKE CONCAT('% ',cris4.sn,' %')

My localhost is a WampServer install with MySQL 5.6.17, PHP 5.5.12, and Apache 2.4.9.

 

My code parses mailing addresses. It targets the key1 column in the table du_raw2. The key1 column contains addresses (all the address components on a single line).

 

cris4 contains all the street name possibilities for a specific city. There are 2000 of them.

 

The script scans each address in the target table (when it has fewer than 2000 rows) and returns the street name if one of the 2000 possibilities is found in it.
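
For illustration, here's how the match works with a made-up address and street name (not from my real data):

SELECT '12 MAIN ST SPRINGFIELD IL 62701' LIKE CONCAT('% ','MAIN ST',' %');
-- returns 1: the pattern becomes '% MAIN ST %', which finds ' MAIN ST ' inside the address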

 

It successfully updates 1000 target rows in a second. The query log says the connections are opened and closed.

It successfully updates 2000 target rows in three seconds. The query log says the connections are opened and closed.

It times out somewhere between 2000 and 3000 target rows. The query log says the connections are opened, but never closed.

 

When I reduce the number of street name possibilities from 2000 to 50, my script successfully processes 3000 target rows in 26 seconds. The query log says the connections are opened and closed. I expected it to take less than 26 seconds, which makes me think it's the way I designed the code, but then I'd need to know what concept I'm violating.

 

My script isn't in a loop (yet). I have 100,000 rows to process. I'm just peeling it back to figure out where it chokes. It's not the data. It's the number of rows and/or the way I'm applying the code.

 

I'm running it from PHP, but I doubt PHP is the problem; I've already played with set_time_limit().

 

I suppose I could work around the problem with a while loop that processes the target rows in batches. It would only run 50 times.
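
For example, a single batch might look like this, assuming du_raw2 has an auto-increment primary key named id (mine might be named differently):

-- one hypothetical batch of 2000 target rows; the PHP while loop
-- would advance the id range by 2000 on each pass
UPDATE du_raw2, cris4
SET du_raw2.sn2 = cris4.sn
WHERE du_raw2.id BETWEEN 1 AND 2000
AND du_raw2.key1 LIKE CONCAT('% ',cris4.sn,' %');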

 

I'd rather know what the true problem is. What do you think? Where else can I look?

 


That query is going to take quadratic time because of the join. If you have 50 rows in table A and 200 rows in table B, then it has to join and process 50 * 200 = 10,000 rows. If you double that, and have 100 rows in table A and 400 rows in table B, then it processes 100 * 400 = 40,000 rows. Doubling the number of rows caused the work to increase by 4 times (2²).

So, if table A has 2000 rows, notice how fast the total number of processed rows increases as the size of table B grows:

A       B         Total
2000    1000      2,000,000
2000    2000      4,000,000
2000    3000      6,000,000
2000    4000      8,000,000
2000    5000      10,000,000
...
2000    100000    200,000,000
At 100,000 rows in table B, you're asking it to concatenate and test 200 million values.
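
You can check this yourself: MySQL 5.6.3 and later support EXPLAIN on UPDATE statements, so running

EXPLAIN UPDATE du_raw2, cris4
SET du_raw2.sn2 = cris4.sn
WHERE du_raw2.key1 LIKE CONCAT('% ',cris4.sn,' %');

should show type = ALL (a full scan) on both tables, with the row estimates multiplying. And because the LIKE pattern starts with '%' and is built from the joined row, no index on key1 can help; every cris4 value has to be tested against every du_raw2 address.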

Understood.

 

I haven't encountered many limitations with MySQL. Obviously this is one of them.

 

Thanks for the reinforcement, jsg.

