Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.

Messages - emilec

Any update on fixing this bug?

With a recent upgrade to QueueMetrics 12.10.1 I've picked up this error at a few sites.

Code: [Select]
Nov 1, 2012 11:26:58 PM org.apache.jasper.compiler.Compiler generateClass
SEVERE: Error compiling file: /srv/www/tomcat5/base/work/Catalina/localhost/QueueMetrics//org/apache/jsp/     [javac] Compiling 1 source file

/srv/www/tomcat5/base/work/Catalina/localhost/QueueMetrics/org/apache/jsp/ generics are not supported in -source 1.3
(use -source 5 or higher to enable generics)
      TreeSet<Integer> validLabels = new TreeSet<Integer>();
1 error
Nov 1, 2012 11:26:58 PM org.apache.jasper.compiler.Compiler generateClass
SEVERE: Javac exception
Compile failed; see the compiler error output for details.

This problem seems specific to Tomcat 5.0. The solution is to edit TOMCAT_HOME/conf/web.xml to set the compiler to 1.5
Code: [Select]

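As a sketch of what that web.xml change looks like on Tomcat 5.x: the `compilerSourceVM`/`compilerTargetVM` init-params below are the usual Jasper options for this, but verify the exact parameter names against your Tomcat version's JspServlet documentation.

```xml
<!-- In TOMCAT_HOME/conf/web.xml, inside the existing "jsp" servlet definition -->
<servlet>
    <servlet-name>jsp</servlet-name>
    <servlet-class>org.apache.jasper.servlet.JspServlet</servlet-class>
    <init-param>
        <!-- Compile JSPs as Java 5 source so generics work -->
        <param-name>compilerSourceVM</param-name>
        <param-value>1.5</param-value>
    </init-param>
    <init-param>
        <param-name>compilerTargetVM</param-name>
        <param-value>1.5</param-value>
    </init-param>
    <load-on-startup>3</load-on-startup>
</servlet>
```

After editing, clear the Jasper work directory and restart Tomcat so the JSPs are recompiled with the new settings.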
More info:

Sure! I enjoy the free Karma  8)

I've spent a good few months working with the Loway team trying to track down a performance problem in QueueMetrics, and it looks like we have finally made a breakthrough. I'm currently testing a "beta" version which is looking very promising. I thought I would post some of the history and some of the useful information I've gathered over time. While I believe the improvements Loway have made are the biggest contribution to the overall solution, your Java performance settings play a key role as well.

For simplicity I will be referring to QM as the application. Obviously it is served by Tomcat, which runs on Java. Between Tomcat and Java is where most of the troubleshooting and setting changes need to happen, but it is the act of running QM that causes Tomcat and Java to become unstable.

Typical symptoms I was experiencing were either all or a combination of the following:
1) QueueMetrics GUI becomes terribly slow or inaccessible
2) High CPU usage caused by Java
3) Out of memory errors in catalina.out
4) High run time values recorded in catalina.out
5) XMLRPC queries time out

For a number of clients, simply setting up a cron job to restart Tomcat once a day was generally enough to prevent slowdowns from occurring (they might still happen once or twice a month). This unfortunately did not work for the larger sites with 400+ agents, where I'd often have to restart Tomcat multiple times during office hours.
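As a sketch, the daily restart can be a one-line cron entry. The init script path, file name and 03:30 schedule below are assumptions; adjust them for your distribution (e.g. tomcat5/tomcat6):

```
# /etc/cron.d/tomcat-restart (hypothetical file name)
# Restart Tomcat every day at 03:30, before office hours
30 3 * * * root /etc/init.d/tomcat restart
```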

Java Visual VM
So where does one start? The first thing you want to do is get your Java Visual VM monitoring working. This is detailed in the QM Advanced Manual:
The 3 things you want to look at on the Monitor page are:
1) CPU
2) (Memory) Heap
3) (Memory) PermGen

Memory Settings - Heap
After discussion with Loway, I learned QM needs about 5-6 MB of RAM in the Heap per agent accessing the GUI. On top of that you need to allow overhead for Java as well as your reporting. At one client site I had about 400 agents, so 400 x 6 = 2400 MB. I'm not sure how much to allocate for reports, so I played it safe and rounded up to 4096 MB, as they do pull large reports. You then use this value to set your Xms and Xmx values. You can read how to set them in the QM Manual: (I think this section of the manual may need a revisit in terms of memory allocation advice). Loway suggested that I set the Xms and Xmx values the same, thus I used: -Xms4096M -Xmx4096M. You also want to make sure to add -server, as this switches Java to the server VM and its optimising compiler. Read more here:

Note: Be sure that your memory settings are within the limits of your physical RAM (bearing in mind that your OS and other applications like MySQL also need resources). I have 12GB of RAM in my 400 Agent server of which 8GB is in use (mostly Tomcat and MySQL).
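One common place to put these options (an assumption; check how your particular Tomcat package passes JVM options) is a setenv.sh file next to catalina.sh:

```
# $CATALINA_HOME/bin/setenv.sh - a minimal sketch; the path and the use of
# JAVA_OPTS are assumptions, some init scripts use CATALINA_OPTS instead
JAVA_OPTS="$JAVA_OPTS -server -Xms4096M -Xmx4096M"
export JAVA_OPTS
```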

Memory Settings - PermGen
Next thing to look at is PermGen. Often the OutOfMemory events are in fact not from Heap, but PermGen. You might see this in the catalina.out log:
Code: [Select]
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: PermGen space

I hadn't realised that, just like the Heap, you can also set the PermGen size. By default this seems to be about 80 MB. I experimented with 256 MB and eventually settled on 512 MB. So add these settings to your config for Tomcat: -XX:PermSize=512M -XX:MaxPermSize=512M. This change made a significant difference to the stability of QM. You can read more about PermGen here:

Garbage Collection
Next up is Garbage Collection. When you start reading about Garbage Collection there is a lot of information, and a lot of it differs between Java versions, so make sure your reading matches your Java version. The default collector in Java 6 is selected based on your hardware and OS, but you can force a particular collector by adjusting your Tomcat settings. For single-CPU setups use the serial collector: -XX:+UseSerialGC. For multi-CPU servers use the parallel (aka throughput) collector: -XX:+UseParallelGC. Before I discovered my PermGen size problem I also tried the concurrent collector: -XX:+UseConcMarkSweepGC. This seems to perform better where PermGen size is limited. Once I increased my PermGen size I went back to UseParallelGC, as Loway recommended it. My server has 2 x quad-core CPUs with HT, so the parallel collector makes sense.

While we are talking about GC, let's also look at some additional logging you can turn on for GC. You can add the following to your Tomcat settings: -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails. This adds additional logging to your catalina.out file. Often when QM was in a hung state I would only see GC log events in catalina.out; this generally coincided with Heap being maxed out. Later, when I paid more attention to PermGen and CPU, I would see the same effects when they were maxed. You can also add settings to alert you when Java runs out of memory. Add the following to your Tomcat settings: -XX:OnError=/bin/ -XX:OnOutOfMemoryError=/bin/ The scripts can contain anything you like (you could, for instance, trigger a restart of Tomcat). In my case I just used them to send me email, e.g.
Code: [Select]

echo `date` | mail -s "SITENAME Java Error: General"
Code: [Select]

echo `date` | mail -s "SITENAME Java Error: OutOfMemory"
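For completeness, here is a minimal sketch of what such an alert script might contain. The script path in the settings above was elided, and the log path and recipient here are assumptions; adjust both to taste.

```shell
#!/bin/sh
# Hypothetical OutOfMemory alert script: record the event locally,
# then email it if a mailer is installed.
LOG=/tmp/java-alerts.log
MSG="SITENAME Java Error: OutOfMemory at $(date)"
echo "$MSG" >> "$LOG"
if command -v mail >/dev/null 2>&1; then
    echo "$MSG" | mail -s "SITENAME Java Error: OutOfMemory" root
fi
```

Logging locally as well as mailing means you still have a record of the event if mail delivery fails while the box is struggling.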

You can read more about GC here: and here

Once you have these things in place you can start monitoring Java Visual VM and the Tomcat logs, and capture details for feedback to Loway. Capturing jstack and jmap output is detailed in the same document as the Java Visual VM setup, but I will list some changes to these commands which I found worked better.
Code: [Select]
jstack -F -l 21472
-F Forces the thread dump. I often found that in a hung state I was unable to get a thread dump without this.
-l Prints a long listing with more info
21472 Is the Java (Tomcat) PID
Code: [Select]
jmap -F -dump:live,format=b,file=heap.bin 21472
-F Forces the heap dump.
-dump Dumps into a binary format file called heap.bin. Make sure you have disk space available as this file can get very large. It does compress reasonably well using bz2 if you need to upload it somewhere for Loway.
21472 Is the Java (Tomcat) PID

Note: I have found that both these commands will pause Tomcat while the information is extracted, so running this on a working system will cause it to stop while it executes. Obviously if the system is already hung, it doesn't matter  :P

Once I had a larger PermGen set I did see an improvement, in the sense that QM would no longer simply hang, but it would still slow down. This was evident in JVVM, where you could see that as PermGen usage climbed, so did the CPU. In the past, when PermGen was maxed out, it would eventually cause QM to become completely unresponsive. With more overhead in PermGen it can actually recover. So: better, but not quite fixed.

Throughout this process I tested a number of different combinations of settings and QM versions from Loway, each time sending them back jstack and jmap dumps so they could locate what was slowing things down and make improvements to their code. I'll leave the detailed fix(es) up to Loway to explain (it's Greek to me), but essentially it came down to the handling of unique strings, which were slowing down when using the intern() function, so they replaced it with ChmInterner (hope I got that right).

Final Settings
For a quick copy and paste here are my final settings for a 400+ Agent server with 2 x Quad CPU and 12GB RAM running Tomcat, MySQL & Apache.

Code: [Select]
-Xms4096M -Xmx4096M -server -XX:+UseParallelGC -XX:PermSize=512M -XX:MaxPermSize=512M
With extra logging, JVVM and Java alerts:
Code: [Select]
-Xms4096M -Xmx4096M -server -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParallelGC -XX:PermSize=512M -XX:MaxPermSize=512M -XX:OnError=/bin/ -XX:OnOutOfMemoryError=/bin/

Running QueueMetrics / Re: [SOLVED] Zero unanswered calls
« on: July 26, 2012, 11:13:30 »
The problem here is that there are calls with an abandoned state that were not answered anywhere, so I would expect that even with multi-stint mode enabled I would still see the abandoned calls in the reports.

Is the user appearing on an outbound or inbound queue?

I've sometimes seen calls arrive at agents due to bad routing. Even if they are not members of the queue, the call info will be logged to queue_log, which gives the illusion that the agent logged onto the queue when in fact all they did was receive a call.

Your raw queue_log file would be a good place to start to see what events you have for that agent.

Yes, the box is firewalled.

Can it be made proxy aware? i.e. use http_proxy env settings in the same way wget does.

I have a bit of a strange problem where, after starting Tomcat, I type the QM URL into my browser and it waits a few minutes before the page actually appears. In the logs I can see it gets to Servlet A01 - Step X3. When the page finally loads it then logs Servlet A01 - Step X5! - 189483 in the catalina.out file. The error gives very little detail. Is there a way I can start QM with more verbose logging to figure out what is timing out?

Code: [Select]
INFO: Server startup in 432 ms
Servlet startup - $Id:,v 1.70 2010/11/25 18:34:59 lenz-mobile Exp $
 *** LOWAY TPF licence: to 'XXXXXX' up to 'Mon Jun 20 00:00:00 SAST 2016'
Data Scad [2016-06-20]: Mon Jun 20 00:00:00 SAST 2016
Data scad:Mon Jun 20 00:00:00 SAST 2016 - Scaduto: 0
Servlet A01 - Step X3


Servlet A01 - Step X5! - 189483 Connection timed out
Absolute path for verbs:  /usr/share/tomcat6/webapps/QueueMetrics/WEB-INF/LVerbs
SMTP: Host[localhost] Auth[false] User[xxxx] Pass[xxxxx]
Start transaction: qm_start
Encoding: UTF-8
*** DBVER:34
*** DBVER:34
*** Esce 3
*** Esce 3
[B0E86FA158B4553EFDF8F999F9E89144] 201206271029 Total run time for verb 'qm_start': 51 ms
[B0E86FA158B4553EFDF8F999F9E89144] 201206271029 Total run time for verb 'qm_start': 51 ms

Improving QueueMetrics / qloaderd init script for SUSE systems
« on: June 11, 2012, 15:16:32 »
I'd like to suggest an additional qloaderd init script to be stored in WEB-INF/mysql-utils/qloader/Other-initscripts called qloaderd.suse (or whatever convention you like) for SUSE systems.

In order to link an init script to the correct startup sequence position SUSE requires some additional information in the headers of the init script. Something like this:
Code: [Select]
#!/bin/sh
# QLoader startup script for SUSE systems
# Please edit the following options in order to use the correct paths.
# $Id: qloaderd,v 1.1 2006/11/22 10:50:16 lenz Exp $
### BEGIN INIT INFO
# Provides:       qloaderd
# Required-Start: $network $remote_fs
# Required-Stop:
# Default-Start:  2 3 5
# Default-Stop:
# Description:    Start the Qloader daemon
### END INIT INFO



I have found that setting size=+1024k means you don't have to modify logrotate.conf (first post updated).

UPDATE: I have subsequently added size=+1024k to my suggestion as this seems to get past the logrotate.conf issues of only rotating logs weekly (first post updated).
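For reference, a sketch of the kind of logrotate entry I mean, using the standard size directive (logrotate also accepts the size=1024k form). The log path is an assumption; point it at wherever your qloaderd log actually lives:

```
# /etc/logrotate.d/qloaderd (hypothetical)
/var/log/qloaderd.log {
    size 1024k
    rotate 5
    copytruncate
    compress
    missingok
    notifempty
}
```

A size-based trigger fires on every logrotate run (normally daily via cron), so it gets past the weekly-only rotation that a time-based `weekly` directive would give you.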

Any suggestions yet?

You should be able to reproduce this quite easily by copying an existing call to a new name with the same Call ID in it.

That's what I see in the call details pop up.
