View previous topic :: View next topic |
Author |
Message |
planet-admin Apprentice
Joined: 27 Mar 2004 Posts: 213 Location: Boise, ID
|
Posted: Sun Dec 31, 2006 6:42 pm Post subject: VERY strange MySQL problem, definitely worth a look |
|
|
I have a mysql 4.1 server running on a quad opteron machine running redhat E/S 4u4.
This mysql runs at about 30-40% cpu usage. It has tables that are no larger than 2 gb. They are all MyISAM tables.
The cpus in this box are 2.2ghz opterons, and there are 20gb of RAM.
This box also serves as a web server, so we wanted my migrate the mysql off of the box, onto a 16core opteron 32gb RAM box, (running gentoo, with a vanilla 2.6.19, and it runs very well) with fiber channel attached raid array. Alreay on this machine we have a MySQL 4.1 with several databases, and a MySQL 5.0.26 with tables upwards of 6gb (a vbulletin site). (those databases have NO problem at ALL)
With both of these running (and that vbulletin sees a LOT of traffic), the load avg is 1-2, and the cpu usage is no more than 10%. (It is 16 cores after all).
I created a new mysql instance/config file/init script, and moved our binary data files over (created using mysqlhotcopy).
I started the server (used even the same config file, mysql variable-wise), and the cpu usage slams the server, upwards of 1100%. The load avg gets as high as 90.
I have tried:
clearing the linux disk cache, and cat'ing the myi/myd files to /dev/null, just to get them back in the cache.
MySQL 5
switching the tables to innodb
fine-tuning the mysql variables
several other things
Now, quite obviously, the 16 core machine is a much beefier server, and the nic wasn't being overutilized, yet the mysql process was taking at the very least 400-500% cpu, with actual numbers around 900%. It was causing the load average to spike, making the other mysql instances....slow.
I moved the EXACT SAME data files back (so, that rules out something strictly relating to them) to the other quad opteron box, and it runs perfectly.
I don't even know where to BEGIN to try to figure out why it would perform so differently. Anyone have any ideas?
Thanks,
Michael _________________ Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
Web: http://www.GlobalSystemsConsulting.com |
|
Back to top |
|
|
Dan Veteran
Joined: 25 Oct 2005 Posts: 1302
|
Posted: Sun Dec 31, 2006 8:52 pm Post subject: |
|
|
nothing fishy in /var/log/mysql ? |
|
Back to top |
|
|
planet-admin Apprentice
Joined: 27 Mar 2004 Posts: 213 Location: Boise, ID
|
Posted: Sun Dec 31, 2006 9:04 pm Post subject: |
|
|
Nada, zip. I can't figure it out for the life of me. It's completely incomprehensible.
Thanks,
Michael _________________ Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
Web: http://www.GlobalSystemsConsulting.com |
|
Back to top |
|
|
BitJam Advocate
Joined: 12 Aug 2003 Posts: 2508 Location: Silver City, NM
|
Posted: Sun Dec 31, 2006 9:05 pm Post subject: Re: VERY strange MySQL problem, definitely worth a look |
|
|
planet-admin wrote: | It was causing the load average to spike, making the other mysql instances....slow.
|
Perhaps the performance problem has to do with using multiple instances. I would also double check and make sure your indices are actually being used. |
|
Back to top |
|
|
m.wales n00b
Joined: 06 Sep 2006 Posts: 18
|
Posted: Sun Dec 31, 2006 10:19 pm Post subject: |
|
|
So the only thing that stood out to me was that you were moving from MySQL 4 to 5. Perhaps the problem lies somewhere in that switch? Maybe the configuration you used for 4 doesn't work too well with 5? Also are all the tables in the new 5 instance indexed and analyzed properly as I've certainly had migration problems due to that in the past. _________________ - Mark W - |
|
Back to top |
|
|
planet-admin Apprentice
Joined: 27 Mar 2004 Posts: 213 Location: Boise, ID
|
Posted: Mon Jan 01, 2007 3:43 am Post subject: |
|
|
I'm already using multiple instances, and they run without problem. In regards to the above message, that's not the case either, as I tried it as MySQL 4.1, and then moved it to 5 to see if that would solve the problem. I currently have 2 MySQL instances running on that machine, with cpu usage of 10%, and a load average of about 1. If I add that other database, it freaks out, whether I make it 4.1, ol 5.0.
Thanks,
Michael _________________ Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
Web: http://www.GlobalSystemsConsulting.com |
|
Back to top |
|
|
planet-admin Apprentice
Joined: 27 Mar 2004 Posts: 213 Location: Boise, ID
|
Posted: Mon Jan 01, 2007 3:43 am Post subject: Re: VERY strange MySQL problem, definitely worth a look |
|
|
BitJam wrote: | planet-admin wrote: | It was causing the load average to spike, making the other mysql instances....slow.
|
Perhaps the performance problem has to do with using multiple instances. I would also double check and make sure your indices are actually being used. |
They are, I did "explain" on several of the queries.
Thanks,
Michael _________________ Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
Web: http://www.GlobalSystemsConsulting.com |
|
Back to top |
|
|
BitJam Advocate
Joined: 12 Aug 2003 Posts: 2508 Location: Silver City, NM
|
Posted: Mon Jan 01, 2007 5:12 am Post subject: |
|
|
I can think of three possible reasons for the slowdown:- The new database is running unoptimized
- Misconfiguration
- Resource conflict
1. To test the indices, I would suggest actually timing select statements under a no-load condition. Compare timing on the new machine and on the old machine. I wouldn't trust explain. It would also be interesting to see how the new database works on the new machine when the vB database is off but I realize this might not be possible.
2. Make sure each instance has it's own:- Socket
- Port
- PID file
- Log file
- Data directory
Consider using mysql_multi.
3. Since the other instances slow down after you start the new one, (if there is no misconfiguration) it is clear that there is resource contention. What is not clear is if this is a symptom of some other problem or if it is the actual cause of the problem. I realize that you don't think this could be the cause because your current two instances play nice together, but resource contention among multiple instances is only an issue if the databases have a high load. Your new database could be the straw that breaks the camel's back.
Ideally, it would be nice to be able to see if the problem exists without the vB database running (or with no load). Alternatively you could add the new database but don't give it any load and see what happens. This might help isolate a misconfiguration. See if simple queries affect resource use. Then write a Perl (or some other language) script to start hitting the new database with multiple queries. See if this starts to cause a problem.
If such tests would be difficult or painful to run then as an alternative, you might want to closely monitor resource use when the contention starts. Perhaps you could set up GKrellm or some other monitoring tool to monitor memory use, processes, CPU, network traffic, and disk IO. Be prepared to take some screen shots.
If the new database works well by itself (tested either with timing selects or actually running it with the other dbs off or throttled back) then my next guess would be a misconfiguration causing two instances to vie for the same resource. If there is no configuration problem then you might have to start tracking down resource use. For example, depending on how your raid is configured it could be extremely unoptimized for multiple instances and you are getting creamed by seeks back and forth between two different data directories. Raids can often slow things down due to the seek times. In the past, I've tended to try to design my disk layouts to minimize seeks by playing tricks such as putting log file directories on separate drives. Generally, raids are often not well optimized for database performance. They are good for reliability but the can really suck performance-wise when compared with having multiple spindles available to cut down on seek times.
Finally, you might want to take a day off away from this and come back to it with a fresh set of eyes. Happy New Year! |
|
Back to top |
|
|
planet-admin Apprentice
Joined: 27 Mar 2004 Posts: 213 Location: Boise, ID
|
Posted: Mon Jan 01, 2007 6:14 am Post subject: |
|
|
I used mysqlhotcopy to get the datafiles initially, so they're the same ones. Also, when I moved them back, I moved back the exact same files, where they are now running fine. This probably rules out some "not using indexes" problem.
I used the same config file (all the various buffer_size, etc variables from the "lesser" machine), and that didn't work. I tuned it specifically for my hardware, that didn't work. I used one of the config files from the existing mysql instances, that didn't work.
Each instance is bound to a different ip address/nic. Each has it's own pid, socket, log file, and data directory, and port.
I have turned on "log queries not using indexes", and it IS using the indexes.
I have a VERY high amount of context switching, upwards of 250,000. (using vmstat)
The database does NOT do this when there are no open connections to it, but once I start the web server (hosted on another machine), it gets SLAMMED. Again, the vbulletin, and the other database instances are far far far more utilized, but they run at a nice load average of.....1.
My RAID is actually a 16-disk 12TB raid 6 array, connected via 4gb fiberchannel, with a 1gb RAM buffer on the raid controller. Plus, I flushed the linux disk buffer cache, and then cat'ed the files to dev/null to get them loaded back into the linux disk buffer cache. This raid array is using sata2 ncq drives. It's so blazing fast it's not even funny.
The key_read vs key_read_request ratio is .002.
The tmp_tables ratio is also very low.
The query cache isn't full.
NOTHING that I can think of is causing this.
Thanks,
Michael
BitJam wrote: | I can think of three possible reasons for the slowdown:- The new database is running unoptimized
- Misconfiguration
- Resource conflict
1. To test the indices, I would suggest actually timing select statements under a no-load condition. Compare timing on the new machine and on the old machine. I wouldn't trust explain. It would also be interesting to see how the new database works on the new machine when the vB database is off but I realize this might not be possible.
2. Make sure each instance has it's own:- Socket
- Port
- PID file
- Log file
- Data directory
Consider using mysql_multi.
3. Since the other instances slow down after you start the new one, (if there is no misconfiguration) it is clear that there is resource contention. What is not clear is if this is a symptom of some other problem or if it is the actual cause of the problem. I realize that you don't think this could be the cause because your current two instances play nice together, but resource contention among multiple instances is only an issue if the databases have a high load. Your new database could be the straw that breaks the camel's back.
Ideally, it would be nice to be able to see if the problem exists without the vB database running (or with no load). Alternatively you could add the new database but don't give it any load and see what happens. This might help isolate a misconfiguration. See if simple queries affect resource use. Then write a Perl (or some other language) script to start hitting the new database with multiple queries. See if this starts to cause a problem.
If such tests would be difficult or painful to run then as an alternative, you might want to closely monitor resource use when the contention starts. Perhaps you could set up GKrellm or some other monitoring tool to monitor memory use, processes, CPU, network traffic, and disk IO. Be prepared to take some screen shots.
If the new database works well by itself (tested either with timing selects or actually running it with the other dbs off or throttled back) then my next guess would be a misconfiguration causing two instances to vie for the same resource. If there is no configuration problem then you might have to start tracking down resource use. For example, depending on how your raid is configured it could be extremely unoptimized for multiple instances and you are getting creamed by seeks back and forth between two different data directories. Raids can often slow things down due to the seek times. In the past, I've tended to try to design my disk layouts to minimize seeks by playing tricks such as putting log file directories on separate drives. Generally, raids are often not well optimized for database performance. They are good for reliability but the can really suck performance-wise when compared with having multiple spindles available to cut down on seek times.
Finally, you might want to take a day off away from this and come back to it with a fresh set of eyes. Happy New Year! |
_________________ Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
Web: http://www.GlobalSystemsConsulting.com |
|
Back to top |
|
|
jonnevers Veteran
Joined: 02 Jan 2003 Posts: 1594 Location: Gentoo64 land
|
Posted: Mon Jan 01, 2007 4:37 pm Post subject: |
|
|
m.wales wrote: | So the only thing that stood out to me was that you were moving from MySQL 4 to 5. Perhaps the problem lies somewhere in that switch? Maybe the configuration you used for 4 doesn't work too well with 5? Also are all the tables in the new 5 instance indexed and analyzed properly as I've certainly had migration problems due to that in the past. |
if i remember correctly, to properly go from mysql4 to mysql5 you should dump the mysql4 database and import that dump back into mysql5. allowing the mysql5 databse to create all the data structures itself. I've had mysql4 datafiles completely wigg out a mysql5 database instance.Yes. it is a lot of work to dump a large database but it does have the side effect of providing you a proper backup of the database also.
This has to do with the way mysql5 structures its data.
planet-admin wrote: | I have a mysql 4.1 server running on a quad opteron machine running redhat E/S 4u4.
The cpus in this box are 2.2ghz opterons, and there are 20gb of RAM. My RAID is actually a 16-disk 12TB raid 6 array, connected via 4gb fiberchannel, with a 1gb RAM buffer on the raid controller.... (It is 16 cores after all).....
I moved the EXACT SAME data files back (so, that rules out something strictly relating to them) to the other quad opteron box, and it runs perfectly.
|
sounds a little like you are approaching Oracle territory w/ hardware like that. |
|
Back to top |
|
|
planet-admin Apprentice
Joined: 27 Mar 2004 Posts: 213 Location: Boise, ID
|
Posted: Tue Jan 02, 2007 12:53 am Post subject: |
|
|
Well, I first tried it on MySQL 4.1. I'm aware of the migration path, I just wanted to "quickly" try a solution, when 4.1 didn't work, I started the 5 binary I had installed. Sadly, it didn't work either.
And, yes, this hardware is a serious piece of hardware. It boots off of the fiberchannel. It'll compile the COMPLETE 2.6.19 kernel, with a make -j in about 19 seconds. 19 SECONDS!
(Sadly, I also have the "joy" of working on an sgi altix, itanium 2, supercomputer pain, but I digress)
Thanks,
Michael _________________ Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
Web: http://www.GlobalSystemsConsulting.com |
|
Back to top |
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
|