Zend server taking 100% CPU

chris_hird
Posts: 171
Joined: Fri Apr 10, 2009 12:41 am
Location: Toronto

Post by chris_hird » Mon Mar 29, 2010 3:18 pm

I am running Version 7.1 of IBM i, and Zend Server is taking 100% of the CPU.
                      Work with Active Jobs                      SHIELD2
                                                       03/29/10  10:50:38
CPU %:   97.3     Elapsed time:   00:00:00     Active jobs:   218

Type options, press Enter.
  2=Change   3=Hold   4=End   5=Work with   6=Release   7=Display message
  8=Work with spooled files   13=Disconnect ...
                                  Current
Opt  Subsystem/Job  User      Type  CPU %  Function        Status
     ZENDSVR        QTMHHTTP  BCI    50.0  PGM-php-cgi.bi  RUN
     WEBSERVER      QTMHHTTP  BCI    44.9  PGM-php-cgi.bi  RUN
     CHRIS#HPA1     CHRISH    INT     1.2  CMD-WRKACTJOB   RUN
     WEBSERVER      QTMHHTTP  BCI      .6  PGM-php-cgi.bi  RUN
     ZSMONMNG       QTMHHTTP  BCI      .1  PGM-MonitorNod  SELW
     WEBSERVER      QTMHHTTP  BCI      .1  PGM-zfcgi       SELW
     ZSMONMNG       QTMHHTTP  BCI      .0  PGM-watchdog    THDW
     ZSJOBQMNG      QTMHHTTP  BCI      .0  PGM-jqd         SELW
     ZSJOBQMNG      QTMHHTTP  BCI      .0  PGM-watchdog    THDW
                                                                  More...
Parameters or command
===>
F3=Exit   F5=Refresh   F7=Find   F10=Restart statistics
F11=Display elapsed data   F12=Cancel   F23=More options   F24=More keys

The WEBSERVER job is the IBM HTTP Server instance that is linked to the PHP server.

Any ideas on what is causing this? I have seen this before, but thought it had been cleared up with an IBM PTF.

Chris...
Shield Advanced Solutions Ltd
Home of JobQGenie and the Receiver Apply Program
http://www.shield.on.ca/Blog

Post by chris_hird » Mon Mar 29, 2010 5:01 pm

I have logged the problem with Zend Support and IBM Support; hopefully someone can find the cause.

Post by chris_hird » Mon Mar 29, 2010 7:58 pm

Job log

5770SS1 V7R1M0 100423 Job Log SHIELD2 03/29/10 10:37:22 Page 1
Job name . . . . . . . . . . : ZENDSVR User . . . . . . : QTMHHTTP Number . . . . . . . . . . . : 465590
Job description . . . . . . : QZHBHTTP Library . . . . . : QHTTPSVR
MSGID TYPE SEV DATE TIME FROM PGM LIBRARY INST TO PGM LIBRARY INST
MCH6801 Escape 40 03/29/10 10:37:18.836254 < 000000 QP2USER2 QSYS *STMT
From Program . . . . . . . : tia_fault
To module . . . . . . . . . : QP2API
To procedure . . . . . . . : runpase_common__FiPvT2
Statement . . . . . . . . . : 5
Message . . . . : Object domain or storage protection error for offset
X'00000000100001D8' in object ZENDSVR QTMHHTTP 465590.
Cause . . . . . : A program tried to use a blocked instruction, access a
system domain object, or make invalid use of a protected page. The violation
type is 4. The violation type indicates the type of error: 1-Object domain
violation. 2-Test Pointer Target Addressability (TESTPTA) violation. 3-Read
protection error. 4-Write protection error. 5-Execute protection error. The
space class is X'08'. The space class designates the type of space for a
storage protection error or TESTPTA violation for a space pointer:
00-primary associated space (includes space objects). 01-secondary
associated space. 02-implicit process space for automatic storage.
03-implicit process space for static storage. 04-implicit process space for
activation group-based heap storage. 05-constant space. 06-space for
handle-based heap storage. 07-teraspace offset X'00000000100001D8'.
08-teraspace for i5/OS PASE memory address X'00000000100001D8'.
X'800000000000000000008000600001D8' is a pointer to the storage for a
protection error or TESTPTA violation for a space pointer. Some violations
may be suppressed at low system security levels.
CPFB9C6 Escape 40 03/29/10 10:37:22.061922 QP2FORK QSYS *STMT QP0ZPCPN QSYS *STMT
From module . . . . . . . . : QP2FORK
From procedure . . . . . . : send_escape__FPcPvUi
Statement . . . . . . . . . : 11
To module . . . . . . . . . : QP0ZPCPN
To procedure . . . . . . . : InvokeTargetPgm__FP11qp0z_pcp_cb
Statement . . . . . . . . . : 103
Message . . . . : PASE for i ended for signal 11, error code 1.
Cause . . . . . : The PASE for i program ended because of PASE for i signal
11. Error code 1 indicates a core file was written in the current directory.
The signal may have been produced for an exception message that appears in
the job log. Recovery . . . : Correct any error and then try the request
again. Technical description . . . . . . . . : If a core file was written,
examine it with the PASE for i 'dbx' command. PASE for i commands can be
entered on the command line displayed by calling program QP2TERM in an
interactive job.
CPF24A3 Escape 40 03/29/10 10:37:22.062431 QMHSNDPM QSYS 0C5F QLEAWI QSYS *STMT
To module . . . . . . . . . : QLEDEH
To procedure . . . . . . . : Q LE leDefaultEh2
Statement . . . . . . . . . : 172
Message . . . . : Value for call stack counter parameter not valid.
Cause . . . . . : The value 4, specified for call stack counter parameter,
is not valid. The value was specified in parameter number 7 on the API.
Recovery . . . : Correct the value for call stack counter parameter and
try the request again. This value must be greater than or equal to 0 but
cannot be larger than the number of entries on the call stack.
CEE9901 Diagnostic 30 03/29/10 10:37:22.062495 QLEAWI QSYS *STMT QP0ZPCP2 QSYS *STMT
From module . . . . . . . . : QLETOOL
From procedure . . . . . . : Q LE CPF24A3_handler
Statement . . . . . . . . . : 9
To module . . . . . . . . . : QP0ZPCP2
To procedure . . . . . . . : _CXX_PEP__Fv
Statement . . . . . . . . . : *N
Message . . . . : Application error. CPFB9C6 unmonitored by QP0ZPCPN at
statement 0000000103, instruction X'0000'.
Cause . . . . . : The application ended abnormally because an exception
occurred and was not handled. The name of the program to which the
unhandled exception is sent is QP0ZPCPN QP0ZPCPN
InvokeTargetPgm__FP11qp0z_pcp_cb. The program was stopped at the high-level
language statement number(s) 0000000103 at the time the message was sent.
If more than one statement number is shown, the program is an optimized ILE
program. Optimization does not allow a single statement number to be
determined. If *N is shown as a value, it means the real value was not
available. Recovery . . . : See the low level messages previously listed
to locate the cause of the exception. Correct any errors, and then try the
request again.
CPC1219 Completion 50 03/29/10 10:37:22.062665 QWTPITP2 QSYS 0635 *EXT *N
Message . . . . : This job ended abnormally.
Cause . . . . . : An error occurred that caused this job to end abnormally.
Recovery . . . : See the previously listed messages in the job log for
this job. Correct the errors and try the request again.
CPF1164 Completion 00 03/29/10 10:37:22.141320 QWTMCEOJ QSYS 014A *EXT *N
Message . . . . : Job 465590/QTMHHTTP/ZENDSVR ended on 03/29/10 at 10:37:22;
.374 seconds used; end code 30 .
Cause . . . . . : Job 465590/QTMHHTTP/ZENDSVR completed on 03/29/10 at
10:37:22 after it used .374 seconds processing unit time. The job had
ending code 30. The job ended after 1 routing steps with a secondary ending
code of 0. The job ending codes and their meanings are as follows: 0 - The
job completed normally. 10 - The job completed normally during controlled
ending or controlled subsystem ending. 20 - The job exceeded end severity
(ENDSEV job attribute). 30 - The job ended abnormally. 40 - The job ended
before becoming active. 50 - The job ended while the job was active. 60 -
The subsystem ended abnormally while the job was active. 70 - The system
ended abnormally while the job was active. 80 - The job ended (ENDJOBABN
command). 90 - The job was forced to end after the time limit ended
(ENDJOBABN command). Recovery . . . : For more information, see the Work
management topic collection in the Systems management category in the IBM i
Information Center, http://www.ibm.com/systems/i/infocenter/.
CPD4090 Diagnostic 10 03/29/10 10:37:22.184902 QDMCOPEN QSYS 17B3 QMHJLOG QSYS 0094
Message . . . . : Printer device PRT01 not found. Output queue changed to
QPRINT in library QGPL.
Cause . . . . . : The printer device PRT01 not found. The output queue was
changed for the spooled printer file QPJOBLOG in library QSYS. Recovery . .
. : Do one of the following before you run the program again: -- Change or
override the printer device name for the spooled printer file QPJOBLOG in
library QSYS using either the Change Printer File (CHGPRTF) command or the
Override Printer File (OVRPRTF) command. -- Add or create the configuration
for the printer (CRTDEVPRT command).

Post by chris_hird » Tue Mar 30, 2010 12:11 am

I was told this is where I need to post Zend Server issues, as the product is in beta. Is anyone from Zend actually monitoring the forums?

Chris...

Post by chris_hird » Tue Mar 30, 2010 5:12 pm

Gents, I am stuck between a rock and a hard place. Zend Server is not GA, so I cannot place a support call. I am also running V7R1 on ESP, which IBM supports, but IBM does not provide support for Zend. My system generated over 1,000 job logs in 30 minutes and took 100% of the CPU. This is a major flaw in Zend Server. I know V7R1 is not GA yet, but it will be in a matter of weeks. I think it's critical that someone at least takes a look at this; after all, if lots of people implement 7.1 in two weeks' time and find the same errors I have, you could be swamped with calls!

I have already spent a couple of hours on this today; a response from Zend is not too much to ask, is it? IBM suggested I send you the core dump, so I have attached it to this post. They said your programs are trying to write to protected memory and that you have some debug module at the top of the stack.

I had a memory problem running one of my C programs on V7R1; it looks like they have tightened up the memory management in this release, especially where malloc is used.

Please let me know what you suggest. I will leave the system in this state for one more day before I have to rip out the OS and rebuild to the previous level.

Chris...

massimilianoc
Posts: 699
Joined: Thu Mar 12, 2009 11:58 am

Post by massimilianoc » Wed Mar 31, 2010 9:57 am

Thanks for the valuable input.

Is there a way for us at Zend to get in contact with you privately and look into the issue?

If there is, please email me (massi@zend.com) and our System i expert (shlomo.v@zend.com), referencing this post in the email body.

Thanks for the cooperation.
Best regards,
Massi.

Post by chris_hird » Thu Apr 08, 2010 1:18 am

Sorry, I had to remove 7.1, as we had done all the testing we needed for our products. IBM said it's a problem in your code and they could not help. The worst thing is that this only happens occasionally, and it has to do with the way memory is being mapped. I had a similar problem where I was using malloc to allocate some memory and then using memset to set the data at a pointer to blanks. On Version 7.1 the program failed every time, but on V6R1 and V5R4 it worked every time! Turning debug on under V7.1 made the problem go away, which made it very hard to track down. Eventually I found that the pointer was not set to an address before memset was called; somehow the compiler decided to use the malloc-allocated pointer and set the memory there instead! IBM says this kind of failure is usually caused by a tightening of the memory management in the new release!
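
For anyone hitting the same kind of failure, here is a minimal C sketch of the defensive pattern implied above: keep the pointer initialised, assign it from malloc, check it, and only then call memset. This is not the original program; the function name alloc_blank_buffer is hypothetical and chosen only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original program): allocate `len`
 * bytes and fill them with blanks. Returns NULL if len is 0 or if the
 * allocation fails. The caller owns the buffer and must free() it. */
char *alloc_blank_buffer(size_t len)
{
    /* Start from NULL rather than leaving the pointer uninitialised, so a
     * forgotten assignment is caught by the check below instead of
     * writing through whatever value happened to be on the stack. */
    char *buf = NULL;

    if (len == 0) {
        return NULL;
    }

    buf = malloc(len);
    if (buf == NULL) {
        return NULL;            /* allocation failed; nothing to blank */
    }

    /* Only now is it safe to write: buf points at len usable bytes. */
    memset(buf, ' ', len);
    return buf;
}
```

A caller would use it as `char *field = alloc_blank_buffer(80);`, check the result for NULL, and call `free(field)` when finished. An uninitialised pointer may happen to hold a usable address on one release and a protected one on another, which would be consistent with code that worked on V5R4 and V6R1 failing on V7R1.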

I did spend about 2 hours on the phone with IBM going over the memory dump, and the support guy said you are setting memory at 0100001d8 in a control space of 256 MB. He said it was to do with the debug_??? process, which was at the top of the stack. The dumps I provided should give you the information you need.

Sorry, that's the best I can offer.

Chris...

Post by massimilianoc » Thu Apr 08, 2010 3:30 pm

Actually, we also experienced the same problems in our test lab.

We know for sure that the root cause lies on the OS side, and IBM promised us the problems should be fixed in the latest PTF they released.

Next time you need to test Zend Server on 7.1, please make sure the latest OS-related PTFs are installed.
Best regards,
Massi.

Post by chris_hird » Thu Apr 08, 2010 4:28 pm

Massi

Not sure why IBM would say this to you and not to me. We had the very latest PTFs shipped by IBM on the system, so we cannot see how this is going to be fixed by IBM before GA. I will, however, resend a note to the IBM support representative with your comments and ask whether this is true. I will post any information IBM cares to share with me.

Chris...

Post by chris_hird » Fri Apr 09, 2010 3:43 pm

Massi

Just got off the phone with IBM; they do not know of any PTF that would cause the issue we saw. They have asked me to get the PMR number and PTF from you so they can tie my PMR to yours. I have the IBM contact information and will pass the information you give me on to him. There is one PTF that could conceivably be the cause, but the developers say it's highly unlikely.

Thanks
Chris...