Apple Network Server 500 or 700: Seems Slow

Customer says that their Apple Network Server 500 or 700 running AIX
seems slow.

Please keep in mind that performance tuning is an art, and at present we
do not have any numbers for what constitutes healthy performance on the
Network Servers. We will have better information after we and our
customers spend considerable time exercising these boxes.

Also, consider that any good UNIX tuning guide is usually between 200 and 400 pages long, and that IBM offers a three- to four-day class in this area. The Info Explorer List of Books also contains "AIX Versions 4.1 Problem Solving Guide and Reference." This FAQ is only a brief introduction.

Four Areas To Check
===================


If an AIX system appears to be slow, there are four general areas that need to be examined over time before making any suggestions to improve performance: CPU usage, memory usage, disk and local peripheral I/O performance, and network performance.




CPU Usage

"ps aux" shows the CPU and memory usage of the processes currently running.

# ps aux
USER   PID  %CPU %MEM   SZ  RSS  TTY STAT  STIME     TIME COMMAND
root   516  78.1  0.0    0    4    -    A  Apr 08 4617:22 kproc
root  3254  19.7  7.0 1756 1756 rcm0    A  Apr 08 1165:17 /usr/lpp/X11/bin



The columns of interest are SZ and RSS. Processes in UNIX consist of text (code), data, and stack segments. SZ measures the virtual memory allocated for the data and stack segments of a running process, plus the text segment if it is not shared code. RSS measures the real memory allocated to a process. Processes that use a large percentage of the available memory are candidates either for program optimization or for running when system use is low, by way of the cron or batch facilities. Also, nice or renice could be used to lower these processes' priorities.
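As a rough illustration of the check described above, the following Python sketch ranks processes by %MEM from captured "ps aux" output. The function name top_memory_users is hypothetical, and the sketch assumes the first six columns (USER, PID, %CPU, %MEM, SZ, RSS) never contain spaces, which holds for the sample shown earlier but has not been verified against every AIX release.

```python
def top_memory_users(ps_output, count=5):
    """Return (pid, user, pct_mem, sz, rss) tuples, highest %MEM first."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        # Only the first six columns are used; later columns such as
        # STIME ("Apr 08") may contain spaces and are ignored here.
        user, pid, _pct_cpu, pct_mem, sz, rss = fields[:6]
        rows.append((int(pid), user, float(pct_mem), int(sz), int(rss)))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:count]


SAMPLE = """\
USER   PID  %CPU %MEM   SZ  RSS  TTY STAT  STIME     TIME COMMAND
root   516  78.1  0.0    0    4    -    A  Apr 08 4617:22 kproc
root  3254  19.7  7.0 1756 1756 rcm0    A  Apr 08 1165:17 /usr/lpp/X11/bin
"""

for pid, user, pct_mem, sz, rss in top_memory_users(SAMPLE):
    print(f"{pid:>6} {user:<8} {pct_mem:5.1f}%  SZ={sz}  RSS={rss}")
```

Processes near the top of this list are the ones to consider for optimization, rescheduling, or renice.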


"iostat" can tell you in general whether CPU usage is high. If it is, "sar -q" will show you the run queue size under the heading runq-sz.



# sar -q 1 2

AIX einstein 1 4 002AC5884C00 04/12/96

20:25:40 runq-sz %runocc swpq-sz %swpocc
20:25:41     3.0     100     5.0     100
20:25:42     1.0     100     6.0     100

Average      2.0      99     5.5      99
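When sar -q is collected over a longer period, it can help to pull out just the runq-sz column. The sketch below is one possible way to do that in Python; run_queue_sizes is a hypothetical helper, and it assumes the column layout shown in the sample above.

```python
def run_queue_sizes(sar_output):
    """Return (samples, average) for the runq-sz column of sar -q output."""
    samples, average = [], None
    column = None
    for line in sar_output.strip().splitlines():
        fields = line.split()
        if "runq-sz" in fields:
            # Header row: remember which column holds the run queue size.
            column = fields.index("runq-sz")
            continue
        if column is None or len(fields) <= column:
            continue  # banner line, blank line, or short line
        value = float(fields[column])
        if fields[0] == "Average":
            average = value
        else:
            samples.append(value)
    return samples, average


SAR_SAMPLE = """\
AIX einstein 1 4 002AC5884C00 04/12/96

20:25:40 runq-sz %runocc swpq-sz %swpocc
20:25:41     3.0     100     5.0     100
20:25:42     1.0     100     6.0     100

Average      2.0      99     5.5      99
"""

samples, average = run_queue_sizes(SAR_SAMPLE)
print("samples:", samples, "average:", average)
```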



w reports the load average, which reflects the size of the run queue and can indicate whether the CPU can handle the number of processes that are attempting to run at any one time. If the load average is too high, some jobs may be good candidates to run at times when system use is low, by way of the cron or batch facilities. Also, nice or renice could be used to lower their priority.



# w
08:26PM up 4 days, 2:39, 4 users, load average: 1.16, 0.63, 0.42
User     tty      login@    idle   JCPU   PCPU what
root     lft0     Thu07PM  4days      0      0 /usr/sbin/getty
root     pts/0    Thu07PM   1day      0      0 /bin/ksh
root     pts/1    Thu07PM   2:15     17      0 /bin/ksh
root     pts/2    08:19PM      0     44      0 w
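The load average figures can be extracted from the first line of w output for logging or comparison over time. The following Python sketch shows one way to do it; load_averages is a hypothetical name, and the threshold of 1.0 is only an illustrative assumption, not a recommended value for the Network Server.

```python
import re


def load_averages(w_header):
    """Return the three load average figures from the first line of w."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", w_header
    )
    if match is None:
        raise ValueError("no load average found in input")
    return tuple(float(value) for value in match.groups())


HEADER = "08:26PM up 4 days, 2:39, 4 users, load average: 1.16, 0.63, 0.42"

BUSY_THRESHOLD = 1.0  # assumption for illustration only

recent, middle, oldest = load_averages(HEADER)
if recent > BUSY_THRESHOLD and recent > oldest:
    print("load is above the threshold and rising:", recent, middle, oldest)
```

A figure that stays high across all three values is a stronger hint than a single spike in the most recent one.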



Another way to control how processes use system resources is the /etc/security/limits file. Please see the man page on limits for more details.





Memory Usage


vmstat shows memory usage. Remember to discard the first line of output, since it reports total activity since the system was booted.



# vmstat 2
kthr     memory             page              faults        cpu
 r  b   avm   fre  re  pi  po  fr  sr  cy  in   sy  cs us sy id wa
 0  0  6623  1613   0   0   0   0   0   0   2  659 252 20  2 78  0
 0  0  6623  1613   0   0   0   0   0   0   2  814 314 25  4 71  0

vmstat can indicate whether a high paging rate is slowing down the system. The pi and po fields under the page heading are of particular importance. pi by itself may be meaningless, since processes page in when they start. A large po count, on the other hand, could indicate a paging problem, and may mean that more memory is needed if all the present processes must run at the same time.
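To watch the po column over a longer run, the output of vmstat can be captured and reduced to just that field. The Python sketch below is one way to do so; page_out_rates is a hypothetical helper, and the sample values are invented to show a system that is paging out, not taken from a real Network Server.

```python
def page_out_rates(vmstat_output):
    """Return the po values from vmstat output, skipping the first sample."""
    rows = [line.split() for line in vmstat_output.strip().splitlines()]
    # Locate the column-name row (the one that contains "po").
    header_index = next(i for i, fields in enumerate(rows) if "po" in fields)
    po_column = rows[header_index].index("po")
    samples = [
        int(fields[po_column])
        for fields in rows[header_index + 1:]
        if len(fields) > po_column and fields[po_column].isdigit()
    ]
    # The first sample covers activity since boot, so it is discarded.
    return samples[1:]


# Invented sample data for illustration.
VMSTAT_SAMPLE = """\
kthr     memory             page              faults        cpu
 r  b   avm   fre  re  pi  po  fr  sr  cy  in   sy  cs us sy id wa
 0  0  6623  1613   0   0   0   0   0   0   2  659 252 20  2 78  0
 1  0  9800   120   0   4  35  60 110   0  40  814 314 25  4 51 20
 1  1  9850   118   0   6  42  71 130   0  44  820 320 22  5 48 25
"""

rates = page_out_rates(VMSTAT_SAMPLE)
print("po samples:", rates, "mean:", sum(rates) / len(rates))
```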

Possible solutions are to run some jobs at later times using the cron or batch facilities. If code is written in house, it might help to make sure that code optimization techniques, such as shared libraries, are used.

Make sure that there is sufficient paging space across the disks on the system. As a general rule, paging space should be spread across the first four or five disks on a system to minimize paging problems.





Disk and Local Peripheral I/O Performance


iostat can be used to determine disk usage. Remember to discard the first report, since it covers total activity since the system was booted.

# iostat

tty:      tin    tout   avg-cpu:  % user  % sys  % idle  %iowait
          0.0     0.9               20.2    1.8    77.9      0.1

Disks:    % tm_act   Kbps   tps   Kb_read   Kb_wrtn
hdisk0         0.2    0.8   0.2    152435    141661
cd0            0.0    0.0   0.0      1050         0



iostat can indicate whether disk usage is well balanced. It may be possible to increase performance by moving heavily used logical volumes from a busy disk to a less busy one.
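The balance check can be sketched in Python by comparing the % tm_act figure for each disk. The helper name disk_activity is hypothetical, the sample values are invented to show an unbalanced system, and the sketch assumes every row in the Disks section has a name followed by five numeric fields.

```python
def disk_activity(iostat_output):
    """Map each device in the Disks section to its % tm_act value."""
    activity = {}
    in_disks = False
    for line in iostat_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Disks:":
            in_disks = True  # data rows follow this header
            continue
        if in_disks and len(fields) == 6:
            activity[fields[0]] = float(fields[1])
    return activity


# Invented sample data for illustration.
IOSTAT_SAMPLE = """\
Disks:    % tm_act   Kbps   tps   Kb_read   Kb_wrtn
hdisk0        85.0  410.0  52.0    152435    141661
hdisk1         3.0   12.0   1.5      8000      2100
cd0            0.0    0.0   0.0      1050         0
"""

activity = disk_activity(IOSTAT_SAMPLE)
hard_disks = {name: pct for name, pct in activity.items() if name.startswith("hdisk")}
busiest = max(hard_disks, key=hard_disks.get)
quietest = min(hard_disks, key=hard_disks.get)
print(f"busiest: {busiest} ({hard_disks[busiest]}%), quietest: {quietest} ({hard_disks[quietest]}%)")
```

A large gap between the busiest and quietest hard disks suggests that a logical volume is worth moving.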

If disk usage is well balanced, iostat can also indicate possible SCSI or disk hardware problems.





Network Performance

netstat can indicate excessive network errors. The Ierrs and Oerrs columns from "netstat -i" are of particular interest here. Ierrs and Oerrs should not be greater than 1% of Ipkts or Opkts, respectively. With Ethernet, the Coll (collision) column should generally not be more than 5 to 10 percent of the network bandwidth. (There is some question as to whether AIX keeps track of collisions, which we need to review.) Higher values may indicate faulty network components or network congestion.



# netstat -i
Name  Mtu    Network    Address          Ipkts   Ierrs  Opkts  Oerrs  Coll
lo0   16896  <Link>                      12470   0      12491  0      0
lo0   16896  127        loopback         12470   0      12491  0      0
en0   1500   <Link>     0.5.2.54.a3.11   146177  0      1520   0      0
en0   1500   17.104.96  einstein         146177  0      1520   0      0
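The 1% guideline for Ierrs and Oerrs can be checked mechanically. The Python sketch below computes both ratios per interface from captured "netstat -i" output; interface_error_rates is a hypothetical name, and the en0 counters in the sample are invented so that the check has something to report. Because the Address column can be empty, the sketch reads the five counters from the end of each line.

```python
def interface_error_rates(netstat_output):
    """Map interface name to (input error %, output error %)."""
    rates = {}
    for line in netstat_output.strip().splitlines()[1:]:  # skip the header
        fields = line.split()
        if len(fields) < 7:
            continue
        name = fields[0]
        if name in rates:
            continue  # each interface is listed twice with the same counters
        ipkts, ierrs, opkts, oerrs, _coll = (int(v) for v in fields[-5:])
        in_pct = 100.0 * ierrs / ipkts if ipkts else 0.0
        out_pct = 100.0 * oerrs / opkts if opkts else 0.0
        rates[name] = (in_pct, out_pct)
    return rates


# The en0 counters below are invented for illustration.
NETSTAT_SAMPLE = """\
Name  Mtu    Network    Address          Ipkts   Ierrs  Opkts  Oerrs  Coll
lo0   16896  <Link>                      12470   0      12491  0      0
lo0   16896  127        loopback         12470   0      12491  0      0
en0   1500   <Link>     0.5.2.54.a3.11   100000  2500   1520   0      0
en0   1500   17.104.96  einstein         100000  2500   1520   0      0
"""

for name, (in_pct, out_pct) in interface_error_rates(NETSTAT_SAMPLE).items():
    if in_pct > 1.0 or out_pct > 1.0:
        print(f"{name}: input errors {in_pct:.1f}%, output errors {out_pct:.1f}%")
```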



nfsstat can also indicate excessive network errors. These can be caused by overloaded NFS servers, or by possible network congestion or hardware problems.



# nfsstat

Server rpc:
calls      badcalls   nullrecv   badlen     xdrcall
0          0          0          0          0

Server nfs:
calls      badcalls
0          0
null       getattr    setattr    root       lookup     readlink   read
0  0%      0  0%      0  0%      0  0%      0  0%      0  0%      0  0%
wrcache    write      create     remove     rename     link       symlink
0  0%      0  0%      0  0%      0  0%      0  0%      0  0%      0  0%
mkdir      rmdir      readdir    fsstat
0  0%      0  0%      0  0%      0  0%

Client rpc:
calls      badcalls   retrans    badxid     timeout    wait       newcred
0          0          0          0          0          0          0

Client nfs:
calls      badcalls   nclget     nclsleep
0          0          0          0
null       getattr    setattr    root       lookup     readlink   read
0  0%      0  0%      0  0%      0  0%      0  0%      0  0%      0  0%
wrcache    write      create     remove     rename     link       symlink
0  0%      0  0%      0  0%      0  0%      0  0%      0  0%      0  0%
mkdir      rmdir      readdir    fsstat
0  0%      0  0%      0  0%      0  0%




Which Processes to Kill

What processes can I safely kill on my system to perhaps make its performance a little faster?


  • If your system is not a router and you do not have more than one router on your network segment, there is no need to run routed.

  • Unless other people need to use the rwho or ruptime commands against your system, there is no need to run rwhod.

  • If you are not an NFS server, there is no need to be running nfsd, rpc.mountd, rpc.statd, or rpc.lockd.

  • On client machines that will never mount NFS file systems, there is no need to run biod, rpc.statd, or rpc.lockd.

  • If you are running your Apple Network Server strictly as a server, you may want to kill the CDE interface.

  • We have found that graphical screen savers from the CDE interface use a considerable amount of processing power. Setting the Screen Saver in the Style Manager's Screen section to "Blank Screen" is probably best here.
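To see which of the daemons named in the list above are actually running, captured "ps -e" output can be scanned for their names. The Python sketch below is one way to do this; running_candidates is a hypothetical helper, the sample process list is invented, and the sketch assumes the command name is the last field on each line. It only reports what is running. Whether a daemon can be stopped still depends on the role of the machine, as described above.

```python
# Daemon names taken from the list above.
CANDIDATES = frozenset(
    ["routed", "rwhod", "nfsd", "rpc.mountd", "rpc.statd", "rpc.lockd", "biod"]
)


def running_candidates(ps_output, candidates=CANDIDATES):
    """Return the sorted names of candidate daemons found in ps -e output."""
    found = set()
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields:
            continue
        # Strip any leading directory from the command name.
        name = fields[-1].rsplit("/", 1)[-1]
        if name in candidates:
            found.add(name)
    return sorted(found)


# Invented sample data for illustration.
PS_SAMPLE = """\
   PID    TTY  TIME CMD
     1      -  0:12 init
  4130      -  0:00 rwhod
  5210      -  0:03 nfsd
  5480      -  0:00 biod
  6002  pts/0  0:00 ksh
"""

print("candidates running:", ", ".join(running_candidates(PS_SAMPLE)))
```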




  • Published Date: Feb 18, 2012