Sounds like a weird situation, right? That's what we thought too, but nobody was laughing, since it happened on our A/R close night and kept happening after the DBAs bounced the Concurrent Manager. A few days later, this user's jobs are still showing as Running and nothing is completing, yet I don't see any locks in the database, and the OS_PROCESS_ID values from FND_CONCURRENT_REQUESTS don't correspond to live processes on the server. So what's going on? Well, the log files tell the story.
Upon closer inspection, the reports are actually completing all of their work before handing the output off for printing. All of the log files end with the same last lines, because the user is trying to PRINT TO THE SAME PRINTER every time. Alright. This is a Solaris machine that isn't running CUPS, so lpstat -lp <printer name> should give us some details about the printer. Yet when I issue the command, nothing happens. No results. No command prompt returned. Nothing. So it appears that the concurrent manager hands the report off for printing, but if the printer is defined and can't complete the request (in this case the print command just hangs), the concurrent request never moves on to Completed.
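Since a hung lpstat never gives the prompt back, it can help to put a time limit on the check. Below is a minimal sketch of one way to do that in plain sh. The function name run_with_timeout and the 10-second limit are our own inventions for illustration, not something from the original session, and we are assuming the hung command can be killed with kill -9.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original troubleshooting session):
# run a command, but kill it if it has not finished after LIMIT seconds.
# Usage: run_with_timeout LIMIT COMMAND [ARGS...]
run_with_timeout() {
    limit=$1
    shift

    # Start the real command in the background and remember its PID.
    "$@" &
    cmd_pid=$!

    # Watchdog: after LIMIT seconds, kill the command if it is still there.
    # Its output is discarded so it never holds a caller's pipe open.
    (
        sleep "$limit"
        kill -9 "$cmd_pid"
    ) >/dev/null 2>&1 &
    watchdog_pid=$!

    # Collect the command's exit status.
    # A status above 128 means it was killed by a signal.
    status=0
    wait "$cmd_pid" || status=$?

    # The command is done, so the watchdog is no longer needed.
    kill "$watchdog_pid" 2>/dev/null || true
    wait "$watchdog_pid" 2>/dev/null || true

    return "$status"
}

# Example (PRINTERNAME is a placeholder, as elsewhere in this post):
#   run_with_timeout 10 lpstat -lp PRINTERNAME || echo "lpstat failed or hung"
```

A status above 128 here most likely means the watchdog had to step in, which is the same symptom as the prompt never coming back, only with a bounded wait.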
After we got the printer issue resolved, I tried to cancel the reports in Running status, but the application locked up on me. Looking at the locks in the system, I could see something was blocking me. It didn't look like the concurrent request's session, though, but the concurrent manager itself, which I can't just kill. So I took the PROCESS number 14517 from GV$SESSION and went to the server:
[me@server ~]$ ps -eaf | grep 14517
applmgr 9895 14517 0 Aug 06 ? 0:00 /bin/sh -c lp -c -dPRINTERNAME -n1 -t"USERNAME.REQUESTID" /u01/app/applmgr/11i/i
applmgr 14517 3854 0 Aug 06 ? 0:07 FNDLIBR FND Concurrent_Processor
You can see that this process number corresponds to a request running under the USERNAME (FND_USER.USER_NAME), with the REQUESTID (FND_CONCURRENT_REQUESTS.REQUEST_ID) and the PRINTERNAME (FND_PRINTER.PRINTER_NAME), so we can look for the other OS processes using that information:
[me@server ~]$ ps -eaf | grep USERNAME
applmgr 9895 14517 0 Aug 06 ? 0:00 /bin/sh -c lp -c -dPRINTERNAME -n1 -t"USERNAME.REQUESTID" /u01/app/applmgr/11i/i
applmgr 9896 9895 0 Aug 06 ? 0:00 lp -c -dPRINTERNAME -n1 -tUSERNAME.REQUESTID /u01/app/applmgr/11i/i
applmgr 7434 3303 0 Aug 02 ? 0:00 /bin/sh -c lp -c -dPRINTERNAME -n1 -t"USERNAME.REQUESTID2" /u01/app/applmgr/11i/i
applmgr 7435 7434 0 Aug 02 ? 0:00 lp -c -dPRINTERNAME -n1 -tUSERNAME.REQUESTID2 /u01/app/applmgr/11i/i
What I find interesting is that in the first two lines process ID 9895 is listed twice: once as the PID of the /bin/sh wrapper and once as the parent PID of the lp command. The same goes for the last two lines and process ID 7434. So we've learned something about Unix here: the printing processes are children of the OS process running the concurrent request (9895 itself is a child of the FNDLIBR process 14517). Next, as applmgr, I issued kill -9 <OS process ID> for 9896 and 7435. As soon as I killed the first one, process 9895 disappeared and the application was no longer locked up trying to cancel the report. Why? Killing the print job that was hanging at the OS level let the concurrent OS process finish, believing it was done printing. It then reported back to the application that it was done, which allowed my session to "complete". Finally, I killed all of the other lp processes for our user, and when I went back to the application to search for their Running reports, there were none left.
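For next time, the parent/child lookup can be done without eyeballing grep output. Here is a rough sketch of two filters that read ps -ef style text on standard input. The function names children_of and lp_pids_for_printer are made up for this example, and the matching assumes the lp command line looks like the ones shown above (lp -c -dPRINTERNAME ...). On older Solaris releases the default awk may not accept -v, in which case nawk or /usr/xpg4/bin/awk should be a drop-in replacement.

```shell
#!/bin/sh
# Hypothetical helpers; both read `ps -ef` output on stdin.
# They rely only on columns 2 (PID) and 3 (PPID), because the STIME
# column can be one field ("10:15:22") or two ("Aug 06").

# Print the PID of every process whose parent PID is $1.
children_of() {
    awk -v parent="$1" '$3 == parent { print $2 }'
}

# Print the PID of every lp process sending a job to printer $1,
# skipping the "/bin/sh -c" wrapper that launched it.
lp_pids_for_printer() {
    awk -v needle="lp -c -d$1 " '
        index($0, needle) && !index($0, "/bin/sh -c") { print $2 }
    '
}

# Example usage (review the list before killing anything):
#   ps -ef | children_of 14517
#   ps -ef | lp_pids_for_printer PRINTERNAME
```

Against the output shown above, the first call would list 9895 and the second would list 9896 and 7435, which are the PIDs that were killed by hand. Printing the PIDs rather than piping them straight into kill is deliberate, so that someone looks at the list first.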