opscode-erchef (beam.smp) process is consuming too much CPU on a production Chef 11.2.3 server

The opscode-erchef (beam.smp) process is consuming too much CPU on our production Chef 11.2.3 server.

Below is some data on the issue. Rebooting the server did not resolve it.

packages:

private-chef-11.2.3-1.el6.x86_64
opscode-manage-1.17.0-1.el6.x86_64
opscode-reporting-1.5.5-1.el6.x86_64
opscode-push-jobs-server-1.1.3-1.el6.x86_64

System info:

Total Node Counts: 3600
MemTotal:          24GB
CPU(s):            4

# free -m 
                 total       used       free     shared    buffers     cached
    Mem:         24026      23606        419       6132        369      17754
    -/+ buffers/cache:       5482      18544
    Swap:        12287         81      12206

beam.smp process

The beam.smp process is saturating a full CPU core:

Chef11Server # pidstat -p 10816 1
 
Linux 2.6.32-754.12.1.el6.x86_64         06/03/2019      _x86_64_  (4 CPU)
02:10:33 PM       PID    %usr %system  %guest    %CPU   CPU  Command
02:10:34 PM     10816  100.00   11.00    0.00  100.00     2  beam.smp
02:10:35 PM     10816  100.00   11.00    0.00  100.00     2  beam.smp
02:10:36 PM     10816  100.00   11.00    0.00  100.00     2  beam.smp
02:10:37 PM     10816  100.00    8.00    0.00  100.00     2  beam.smp
02:10:38 PM     10816  100.00    9.00    0.00  100.00     2  beam.smp

Network connection per second

Roughly 100 to 135 requests per second are recorded in the log file below:

ChefServer # cat /var/log/opscode/opscode-erchef/requests.log.4 | cut -f1  -d' ' | sort |uniq -c
111 2019-06-02T09:28:41Z
128 2019-06-02T09:28:42Z
135 2019-06-02T09:28:43Z
133 2019-06-02T09:28:44Z
125 2019-06-02T09:28:45Z
107 2019-06-02T09:28:46Z
117 2019-06-02T09:28:47Z
109 2019-06-02T09:28:48Z
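
To summarize the per-second counts above instead of reading them line by line, a small sketch like the following could report the minimum, average, and maximum requests per second. It assumes, as the `cut` command above does, that the first whitespace-separated field of each log line is a timestamp with one-second resolution. The function name `rps_summary` is made up for illustration.

```shell
#!/usr/bin/env bash
# rps_summary (hypothetical helper): read log lines on stdin and
# summarize how many lines share each first-field timestamp.
rps_summary() {
  awk '
    NF > 0 { count[$1]++ }                # tally lines per timestamp, skip blank lines
    END {
      n = 0; total = 0
      for (ts in count) {
        c = count[ts]
        if (n == 0 || c < min) min = c
        if (n == 0 || c > max) max = c
        total += c; n++
      }
      if (n > 0)
        printf "seconds=%d min=%d avg=%.1f max=%d\n", n, min, total / n, max
    }'
}
```

It would be run as `rps_summary < /var/log/opscode/opscode-erchef/requests.log.4`.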

System uptime

14:18:14 up 6 days, 1:30, 10 users, load average: 5.19, 5.58, 6.27

TOP Command

The top command output is shown below:

ChefServer # top
top - 14:18:49 up 6 days,  1:31, 10 users,  load average: 5.40, 5.59, 6.25
Tasks: 425 total,   2 running, 423 sleeping,   0 stopped,   0 zombie
Cpu(s): 73.7%us,  7.9%sy,  0.0%ni, 17.2%id,  0.2%wa,  0.0%hi,  1.1%si,  0.0%st
Mem:  24603160k total, 24267960k used,   335200k free,   379692k buffers
Swap: 12582908k total,    83792k used, 12499116k free, 18266808k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                
10816 opscode   20   0 1025m 308m 3508 S 115.0  1.3  11355:58 beam.smp                                               
 3728 opscode   20   0 1325m 320m 3140 S 73.2  1.3   1964:26 beam.smp                                                
11415 opscode   20   0  249m 113m 2140 S 38.4  0.5   1002:35 ruby                                                    
10717 opscode   20   0 98268 9988 2712 R 20.2  0.0 983:06.91 nginx                                                   
10719 opscode   20   0 98120 9884 2720 S 10.3  0.0 984:38.63 nginx                                                   
11407 opscode   20   0  243m 107m 2140 S  9.9  0.4 970:02.54 ruby                                                    
10920 opscode   20   0 3318m 1.1g 5440 S  8.9  4.7 588:25.92 java                                                    
 3772 opscode-  20   0 6208m  51m  47m S  7.0  0.2  45:31.69 postgres                                                
10917 opscode   20   0 1114m  86m 3164 S  6.0  0.4 420:12.62 beam.smp                                                
10965 opscode   20   0 2264m 181m 2364 S  6.0  0.8 900:24.29 beam.smp         

Network Connections

Both the opscode-erchef process and the oc_bifrost process hold a large number of open UDP sockets. For opscode-erchef (pid 10816):

# netstat -tulpn |grep 10816 | wc -l
352
#  netstat -tulpn |grep 10816 |head
    tcp        0      0 127.0.0.1:8000              0.0.0.0:*                   LISTEN      10816/beam.smp      
    tcp        0      0 0.0.0.0:36392               0.0.0.0:*                   LISTEN      10816/beam.smp      
    udp        0      0 0.0.0.0:51639               0.0.0.0:*                               10816/beam.smp      
    udp        0      0 0.0.0.0:49975               0.0.0.0:*                               10816/beam.smp      
    udp        0      0 0.0.0.0:49847               0.0.0.0:*                               10816/beam.smp      
    udp        0      0 0.0.0.0:41911               0.0.0.0:*                               10816/beam.smp      
    udp        0      0 0.0.0.0:41783               0.0.0.0:*                               10816/beam.smp      
    udp        0      0 0.0.0.0:46391               0.0.0.0:*                               10816/beam.smp      
    udp        0      0 0.0.0.0:37176               0.0.0.0:*                               10816/beam.smp      
    udp        0      0 0.0.0.0:38840               0.0.0.0:*                               10816/beam.smp     

The oc_bifrost process (pid 3728) shows the same socket count (352) as opscode-erchef:

# netstat -tulpn |grep  3728|wc -l
352

# private-chef-ctl status |  grep 3728
run: oc_bifrost: (pid 3728) 128119s; run: log: (pid 1298) 524402s

# private-chef-ctl status |  grep 10816
run: opscode-erchef: (pid 10816) 515140s; run: log: (pid 1290) 524530s
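
One caveat with the counts above: `grep 3728` also matches any line where those digits appear in a port number (for example port 37280), and the total mixes TCP and UDP entries. A minimal sketch that matches the PID exactly and splits the count by protocol is shown below. It assumes the `netstat -tulpn` layout shown earlier, where the last field is `PID/program`. The function name `sockets_by_proto` is hypothetical.

```shell
#!/usr/bin/env bash
# sockets_by_proto PID (hypothetical helper): read `netstat -tulpn` output
# on stdin and count sockets owned by exactly that PID, grouped by protocol.
sockets_by_proto() {
  awk -v pid="$1" '
    {
      split($NF, owner, "/")              # last field looks like "10816/beam.smp"
      if (owner[1] == pid) n[$1]++        # $1 is the protocol: tcp, udp, ...
    }
    END { for (p in n) print p, n[p] }' | sort
}
```

It would be run as `netstat -tulpn | sockets_by_proto 10816`, and again with 3728 for oc_bifrost.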

Syscall statistics

The number of getsockopt and setsockopt calls looks excessive:

# ./syscount -c -p 10816
SYSCALL               COUNT
write                    2
newstat                  3
access                   6
connect                357
munmap                1227
getpeername           2662
bind                  3008
socket                3008
read                  3850
accept                4720
sendto                5303
close                 5610
getsockname           5670
epoll_ctl             9925
fcntl                11340
futex                13962
epoll_wait           22926
setsockopt           29957
getsockopt           38310
recvfrom             46708
writev               62396
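
To put these counts in proportion, a sketch like the following could print each syscall's share of the total, largest first. It assumes the two-column name/count layout shown above and skips any row whose second column is not a number, such as the header. The function name `syscall_share` is made up for illustration.

```shell
#!/usr/bin/env bash
# syscall_share (hypothetical helper): read "NAME COUNT" lines on stdin and
# print "NAME COUNT PERCENT" for each syscall, sorted by count descending.
syscall_share() {
  awk '
    $2 ~ /^[0-9]+$/ { name[NR] = $1; cnt[NR] = $2; total += $2 }  # numeric rows only
    END {
      if (total == 0) exit
      for (i in cnt)
        printf "%s %d %.1f\n", name[i], cnt[i], 100 * cnt[i] / total
    }' | sort -k2,2nr
}
```

It would be run as `./syscount -c -p 10816 | syscall_share`, or against a saved copy of the output.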

I would like to reduce CPU utilization.

Any help/advice would be appreciated.