So this one baffled me for several days, until I focused 100% on the problem.
This is what I saw when I ran a script to start a WAS 8.5.5.2 Deployment Manager: -
0403-030 The fork function failed. Too many processes already exist
on two of my AIX LPARs.
I'd checked the "obvious", including: -
$ lsattr -El sys0 | grep maxuproc
maxuproc 4096 Maximum number of PROCESSES allowed per user True
$ ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) unlimited
memory(kbytes) unlimited
coredump(blocks) 2097151
nofiles(descriptors) unlimited
threads(per process) unlimited
processes(per user) unlimited
When I monitored the number of processes for my user - wasadmin : -
$ ps -ef | grep -i wasadmin
I noticed that the number quickly ramped up to the ~4,000 mark, before the exception popped up.
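As an aside, ps piped through grep lists the processes rather than counting them, and it also matches the grep itself. Here's a rough sketch of how the count could be watched instead. The count_user_procs helper is my own made-up name, and I'm assuming that ps accepts the -u and -o pid= options, which should be the case on AIX and on most other Unix flavours: -

```shell
#!/bin/sh
# Sketch only: count_user_procs is a hypothetical helper, not part of WAS or AIX.
# Assumes ps supports -u <user> and -o pid= (the trailing "=" suppresses the header).
count_user_procs() {
    # One PID per line, so the line count is the process count;
    # tr strips the padding that some wc implementations add.
    ps -u "$1" -o pid= | wc -l | tr -d ' '
}
```

Calling count_user_procs wasadmin every few seconds from a second session would show the number climbing towards the maxuproc value.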
Guess what the problem was ?
I'd created a script - startManager.sh - in the wasadmin home directory, as I usually do.
However, for some strange reason (!), I'd created a symbolic link from: -
/opt/ibm/WebSphereProfiles/Dmgr01/bin/startManager.sh
to: -
/home/wasadmin/startManager.sh
but, in doing so, I'd somehow overwritten the original script.
In other words, I had a "shortcut" script that called itself :-)
Therefore, when I ran startManager.sh, it spawned a shell to run a copy of itself, which spawned another, and so on, until I hit the maxuproc limit of 4,096 processes, at which point the forking error appeared.
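With hindsight, a small guard in the wrapper script would have caught this before the first fork. The following is only a sketch: same_file is a hypothetical helper, and it relies on the -ef operator of test (same device and inode, after following symbolic links), which ksh, bash and dash all provide but which is worth verifying on your own shell: -

```shell
#!/bin/sh
# Sketch only: same_file is a hypothetical helper.
# -ef is true when both paths resolve to the same file (symbolic links followed).
same_file() {
    [ "$1" -ef "$2" ]
}

# Hypothetical use at the top of the wrapper in the wasadmin home directory:
#
#   TARGET=/opt/ibm/WebSphereProfiles/Dmgr01/bin/startManager.sh
#   if same_file "$0" "$TARGET"; then
#       echo "$TARGET resolves back to $0; refusing to call myself" >&2
#       exit 1
#   fi
#   exec "$TARGET" "$@"
```

Using exec for the final call also means that the wrapper replaces itself with the real script rather than leaving an extra shell behind.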
Simples :-)
Do I feel embarrassed ? Yes, I do, but one learns through failure :-)