Trying to Solve a Slow Tomcat Server
This is really frustrating. This morning, out of nowhere, one of my Tomcat webservers was taking far longer to return data than normal. By 'far longer', I mean 5 to 10 seconds when yesterday it was sub-second. Really. It's a factor of 10x to 30x in response time. Horrible.
So I'm trying to figure this out. It's not a resource issue - got plenty of memory and CPU... it's not the code - it's running fine in London and NYC... it's not a simple data error - restarts and even different JVM GC parameters make no difference at all.
It could be hardware. It could be networking. I seem to be getting no real help on either of these. It's pretty frustrating. My co-worker, Steve, who is taking over for me on this project, wants to add another box to the mix and see if the problem clears up. If so, then it's the box. Or is it?
If they can't find the problem under load, are they going to fix it when it's isolated and not doing anything? Maybe. But the vendor isn't going to give us a replacement box "just because". It's got to be something. I'm just stumped as to what.
What's frustrating is that it appears to be a networking problem. I get the connection, but getting the contents from the Tomcat instance is exceptionally slow. So I suppose it really looks more like a software problem. Everyone says their stuff is working just fine, but still we have this delay in getting the data.
If it's not the code - and I haven't changed the code in weeks - then it might be the configuration. Nope, I checked that... it's configured exactly right. If it's not the code, and the data loads haven't changed that much, then how likely is it that the problem is really "within" the box? Not very, in my book. But what else could it be?
Not fun.
UPDATE: a reboot of the box fixed it. Something went very wrong with the box. Oh well... can't do much about that.
[5/28] UPDATE: turns out, there's a problem with the stated defaults in Tomcat 6.0.18. In the conf/server.xml file, the maxThreads attribute on a connector sets the maximum number of request-processing threads for the incoming socket connections. It appears that the stated default of 200 was not, in fact, what the default was. Using jConsole, we were able to see that the actual value of maxThreads for the port 8080 connector was 40. We set it explicitly in the conf/server.xml:
<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" maxPostSize="0" maxThreads="400" />
and with this (and the automatic change in jConsole) the number of request-processing threads popped up from 40 to 166 and the speed was back. Amazing that the default wasn't what they said. Had the real default been 200, we would have been fine.
Good lesson to know.
To activate the monitoring for Tomcat, assuming it's not already active, you need to add a few command-line options to the CATALINA_OPTS environment variable in either the catalina.sh or the startup.sh script:
export CATALINA_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=7999 \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.authenticate=false"
and then restart the Tomcat server. The monitoring is probably not a huge load, and you have to have it in order to connect with jConsole, so it's worth turning on unless you know it's a bad idea in your environment (with SSL and authentication both off, as above, anyone who can reach port 7999 can connect).
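If you'd rather script the check than click around in jConsole, here's a rough Java sketch of the same lookup over JMX. It assumes the JMX port 7999 from the options above, and that Tomcat exposes its connector thread pools as MBeans matching Catalina:type=ThreadPool,* with attributes named maxThreads, currentThreadCount and currentThreadsBusy. That's what I'd expect from a Tomcat 6 install, but the exact names can differ between versions, so treat them as assumptions. The class name ConnectorThreadCheck and the readAttribute helper are just made up for this example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ConnectorThreadCheck {

    /** Reads one attribute from every MBean matching the pattern, keyed by MBean name. */
    static Map<String, Object> readAttribute(MBeanServerConnection conn,
                                             String pattern,
                                             String attribute) throws Exception {
        Map<String, Object> values = new TreeMap<>();
        Set<ObjectName> names = conn.queryNames(new ObjectName(pattern), null);
        for (ObjectName name : names) {
            values.put(name.getCanonicalName(), conn.getAttribute(name, attribute));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Host and port are assumptions: 7999 matches the CATALINA_OPTS shown above.
        String host = args.length > 0 ? args[0] : "localhost";
        String port = args.length > 1 ? args[1] : "7999";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");

        // No credentials are passed because authentication is switched off above.
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // MBean pattern and attribute names are assumed, not guaranteed for every version.
            String[] attributes = {"maxThreads", "currentThreadCount", "currentThreadsBusy"};
            for (String attribute : attributes) {
                System.out.println(attribute + " = "
                        + readAttribute(conn, "Catalina:type=ThreadPool,*", attribute));
            }
        }
    }
}
```

Run it with the host and port as arguments (or nothing for localhost:7999) and compare the maxThreads it prints against what you think you configured in conf/server.xml. If the two don't match, you've got the same kind of surprise I had.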