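Background: a front-end request to one endpoint returned HTTP 500 with the message Cannot allocate new IntPointer (1): totalBytes = 0, physicalBytes = 7775M, while several other endpoints kept working normally.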

# Check the server logs

The server log contains a large number of WARNING (警告) and SEVERE (严重) entries about memory leaks.

[root@localhost ~]# docker logs -f 1d75d7d2abcb | grep Exception
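23-Mar-2023 13:12:50.636 WARNING [Catalina-utility-1] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [xxx] appears to have started a thread named [[_client_][generic][T#3]] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:[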
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:737)
 java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
 java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1269)
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 java.lang.Thread.run(Thread.java:748)]
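23-Mar-2023 13:12:50.640 WARNING [Catalina-utility-1] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [xxx] appears to have started a thread named [[_client_][generic][T#4]] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:[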
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:737)
 java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
 java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1269)
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 java.lang.Thread.run(Thread.java:748)]
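23-Mar-2023 13:12:50.650 SEVERE [Catalina-utility-1] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [xxx] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@36b9c3a3]) and a value of type [io.netty.util.internal.InternalThreadLocalMap] (value [io.netty.util.internal.InternalThreadLocalMap@…]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
23-Mar-2023 13:12:50.650 SEVERE [Catalina-utility-1] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [xxx] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@36b9c3a3]) and a value of type [io.netty.util.internal.InternalThreadLocalMap] (value [io.netty.util.internal.InternalThreadLocalMap@…03]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
23-Mar-2023 13:12:50.651 SEVERE [Catalina-utility-1] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [xxx] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@36b9c3a3]) and a value of type [io.netty.util.internal.InternalThreadLocalMap] (value [io.netty.util.internal.InternalThreadLocalMap@…]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.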
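
With this many entries it helps to see how many messages each leak check produced. The sketch below is not part of the original investigation: `summarize_leaks` is a hypothetical helper that reads log text on stdin and counts entries per reporting method. Tomcat normally writes these messages to stderr, so when piping from `docker logs` a `2>&1` is probably needed before the pipe.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count Tomcat leak-detection messages per reporting method.
# Reads log text on stdin; prints "<method> <count>", sorted by method name.
summarize_leaks() {
  grep -oE 'WebappClassLoaderBase\.(clearReferences[A-Za-z]*|checkThreadLocalMapForLeaks)' \
    | sed 's/^WebappClassLoaderBase\.//' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}

# Example with two abbreviated lines in the same shape as the log above.
printf '%s\n' \
  '23-Mar-2023 13:12:50.636 [Catalina-utility-1] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads ...' \
  '23-Mar-2023 13:12:50.650 [Catalina-utility-1] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks ...' \
  | summarize_leaks
```

Against the real container this would look like `docker logs 1d75d7d2abcb 2>&1 | summarize_leaks` (without `-f`, so the pipeline can finish).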

# Check memory usage on the Linux server

# Check the memory
[root@localhost ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:            31G         15G        7.4G         35M        8.0G         14G
Swap:          2.0G          0B        2.0G
# Use top to watch real-time memory usage: total = 32778416 KiB with roughly 7.8 GB free, so the host VM has enough memory. The COMMAND column also identifies the java process, PID 30164
[root@localhost ~]# top
top - 09:15:10 up 21 days, 22:59,  1 user,  load average: 0.16, 0.23, 0.21
Tasks: 303 total,   1 running, 302 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.5 us,  3.4 sy,  0.0 ni, 96.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32778416 total,  7825416 free, 16587464 used,  8365536 buff/cache
KiB Swap:  2097148 total,  2097148 free,        0 used. 15711900 avail Mem 
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                      
 8984 root      20   0  162232   2340   1528 R  22.6  0.0   0:00.23 top                                                                                                          
  991 root      20   0 1044128  12672   7592 S   3.2  0.0   7:57.36 NetworkManager                                                                                               
30164 root      20   0   15.4g   7.6g  18000 S   3.2 24.3 540:05.34 java                                                                                                         
    1 root      20   0  194156   7272   4172 S   0.0  0.0   3:44.03 systemd                                                                                                      
    2 root      20   0       0      0      0 S   0.0  0.0   0:03.97 kthreadd
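# Inspect the threads of that process. The java (JVM) process has 363 threads. The 24.3 in %MEM is the share of the whole process, repeated on every row because threads share one address space; it is not per-thread usage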
[root@localhost ~]# top -Hp 30164
top - 09:14:36 up 21 days, 22:58,  1 user,  load average: 0.16, 0.24, 0.21
Threads: 363 total,   0 running, 363 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.4 sy,  0.0 ni, 99.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32778416 total,  7824408 free, 16588480 used,  8365528 buff/cache
KiB Swap:  2097148 total,  2097148 free,        0 used. 15710884 avail Mem 
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                       
30276 root      20   0   15.4g   7.6g  18000 S  0.3 24.3  17:44.52 java                                                                                                          
30387 root      20   0   15.4g   7.6g  18000 S  0.3 24.3   1:52.16 java                                                                                                          
30389 root      20   0   15.4g   7.6g  18000 S  0.3 24.3   1:51.18 java                                                                                                          
  527 root      20   0   15.4g   7.6g  18000 S  0.3 24.3   2:00.88 java                                                                                                          
 3481 root      20   0   15.4g   7.6g  18000 S  0.3 24.3   0:54.77 java                                                                                                          
 3485 root      20   0   15.4g   7.6g  18000 S  0.3 24.3   2:01.18 java                                                                                                          
 3498 root      20   0   15.4g   7.6g  18000 S  0.3 24.3   0:48.24 java                                                                                                          
 4706 root      20   0   15.4g   7.6g  18000 S  0.3 24.3   5:01.12 java
# The thread count of the process can also be read directly
[root@localhost ~]# cat /proc/30164/status | grep Threads
Threads:        363
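
Because all threads of a process share one address space, the RES and %MEM columns in `top -H` are process totals repeated on every row. As a rough cross-check, the sketch below reads the resident size and thread count from a `/proc/<pid>/status`-style file; `proc_mem_summary` is a hypothetical helper name, not something used in the original session.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print resident memory (MiB) and thread count for one
# process, read from a /proc/<pid>/status-style file given as $1.
proc_mem_summary() {
  awk '
    $1 == "VmRSS:"   { rss_kb = $2 }     # resident set size, reported in kB
    $1 == "Threads:" { threads = $2 }    # number of threads in the process
    END { printf "rss_mib=%d threads=%d\n", rss_kb / 1024, threads }
  ' "$1"
}

# On a Linux host this inspects the current shell; substitute the Java PID
# (30164 in the session above) to inspect the Tomcat process instead.
if [ -r "/proc/$$/status" ]; then
  proc_mem_summary "/proc/$$/status"
fi
```
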
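# List the running containers and pick the container ID by its ports and name. If that is not enough to identify it, use docker inspect to print the container's PID on the host; if it is 30164, it is the right one. The format string is shown as xxx because it conflicts with markdown syntax; it is easy to find with a web search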
[root@localhost ~]# docker ps
# Reference: https://blog.csdn.net/m0_45406092/article/details/103671832
[root@localhost ~]# docker inspect -f 'xxx' 1d75d7d2abcb
# Inspect the container's mount points. Some container directories are usually mounted so that they can be viewed directly from the host
[root@localhost ~]# docker inspect 1d75d7d2abcb|grep "Mount" -A 30
"Destination": "/opt/tomcat/tomcat8/webapps"
# Enter the container
[root@localhost ~]# docker exec -it 1d75d7d2abcb /bin/bash
# Check the Java installation inside the container. Luckily it is a full JDK, so jps, jstat, jmap and similar tools are available
root@1d75d7d2abcb:/# java -version
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
# Run jps to list the Java processes; inside the container the PID is 1
root@1d75d7d2abcb:/# jps
1 Bootstrap
7959 Jps
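# Use jstat to look at the heap of that process in detail (values in KB). The young generation has already grown to its maximum capacity (NGC = NGCMX)
# NGCMN: minimum young generation capacity
# NGCMX: maximum young generation capacity  1397760
# NGC: current young generation capacity    1397760
# S0C: first survivor space capacity    25088
# S1C: second survivor space capacity   23552
# EC: Eden space capacity      1349120
# OGCMN: minimum old generation capacity
# OGCMX: maximum old generation capacity   2796544
# OGC: current old generation capacity     2375680
# OC: current old space capacity
# MCMN: minimum metaspace capacity
# MCMX: maximum metaspace capacity
# MC: current metaspace capacity
# CCSMN: minimum compressed class space capacity
# CCSMX: maximum compressed class space capacity
# CCSC: current compressed class space capacity
# YGC: number of young generation GC events   1624
# FGC: number of full GC events   820
# An open question: the young generation looks 100% used while the old generation still has room, so why are young objects not promoted? In principle they should move to the old generation. Note, though, that -gccapacity reports capacity rather than occupancy, so these figures only show that the young generation has been expanded to its maximum; jstat -gcutil would show how full each space really is
# The maximum sizes of the young and old generations follow the 1:2 ratio, so the JVM generation-sizing parameters are at their defaults
# -XX:NewRatio=2   : ratio between the young and old generations. A value of 2 means the young generation is 1/2 the size of the old generation, or 1/3 of the whole heap
# -XX:SurvivorRatio=8: ratio between Eden and a survivor space. A value of 8 means one survivor space is 1/8 the size of Eden, or 1/10 of the young generation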
root@1d75d7d2abcb:/# jstat -gccapacity 1
 NGCMN    NGCMX     NGC     S0C   S1C       EC      OGCMN      OGCMX       OGC         OC       MCMN     MCMX      MC     CCSMN    CCSMX     CCSC    YGC    FGC 
174592.0 1397760.0 1397760.0 25088.0 23552.0 1349120.0   349696.0  2796544.0  2375680.0  2375680.0      0.0 1751040.0 812632.0      0.0 1048576.0 110208.0   1624   820
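
The 1:2 ratio and the total heap size can be checked from these figures directly. The sketch below is only a worked calculation over the jstat output shown above; `heap_from_gccapacity` is a hypothetical helper that looks columns up by header name, so it does not depend on column positions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from `jstat -gccapacity` output on stdin (header line
# followed by one data line), print the max heap in MiB and OGCMX / NGCMX.
heap_from_gccapacity() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # column name -> index
    NR == 2 {
      ngcmx = $(col["NGCMX"]); ogcmx = $(col["OGCMX"])        # both reported in KB
      printf "max_heap_mib=%d old_to_young=%.2f\n", (ngcmx + ogcmx) / 1024, ogcmx / ngcmx
    }
  '
}

# The header and data line from the session above.
heap_from_gccapacity <<'EOF'
 NGCMN    NGCMX     NGC     S0C   S1C       EC      OGCMN      OGCMX       OGC         OC       MCMN     MCMX      MC     CCSMN    CCSMX     CCSC    YGC    FGC
174592.0 1397760.0 1397760.0 25088.0 23552.0 1349120.0   349696.0  2796544.0  2375680.0  2375680.0      0.0 1751040.0 812632.0      0.0 1048576.0 110208.0   1624   820
EOF
# prints: max_heap_mib=4096 old_to_young=2.00
```

The 4096 MiB total matches a maximum heap of 4 GB, and the ratio of 2 matches NewRatio=2.
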
# With the PID known, confirm the process name. It is clearly the Tomcat process in the container, and the queries below all point to Tomcat
root@1d75d7d2abcb:~# top
root@1d75d7d2abcb:~# ps -elf|grep java
root@1d75d7d2abcb:~# ps -elf|grep tomcat
root@1d75d7d2abcb:~# ps -ef|grep 1
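# Use jmap to dump a heap snapshot. Its size depends on how long the service has been running; ours had been up for a few days and produced a 1.6 GB dump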
root@1d75d7d2abcb:~# jmap -dump:file=dump.hprof 1
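# Move the dump file to the mount point and copy it out, or use docker cp, to get it onto a local machine
# If the dump is not large, this site can be tried for analysis: https://gceasy.io/ft-index.jsp
# It is better to download and install JProfiler and import the dump file there. JProfiler can also be integrated into IntelliJ IDEA to inspect memory snapshots of a running program
# Download JProfiler: https://pan.baidu.com/s/1EJxkS2U3cmF8JHQlJJYILA
# Open JProfiler, go to the heap walker, and open the dumped snapshot
# The snapshot shows that the ES client used by the Java program triggers garbage collection frequently and creates many large objects along the way; the code we wrote ourselves is not the cause
# We are on ES 5.x, which is very old, and the ES client cannot be upgraded in the short term, so we chose to give the Tomcat container more memory (which does not feel like a great fix)
# Check Tomcat's memory setting: the maximum heap is currently 4 GB. Adjust it and restart the docker container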
root@1d75d7d2abcb:/usr/local/tomcat# cat bin/catalina.sh |grep "JAVA_OPT"
JAVA_OPTS="-Xms512m -Xmx4096m -XX:MaxPermSize=256m"

# Other guesses

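A user hit an out-of-memory error on one endpoint while other endpoints kept working, which suggests that this endpoint has a problem of its own. To test that, we shrank the heap locally to 100 MB with -Xmx100m and -Xms100m (a young generation of roughly 33 MB, with S0 and S1 at about 3.3 MB each) and called the endpoint. It failed with an OOM error. The endpoint returns little data, but while running it calls native methods (we load a dynamic DLL that does heavy computation underneath). Those native methods allocate memory outside the Java heap, and as the message below shows, JavaCPP rejects the allocation once the process's physical memory (physicalBytes) exceeds its maxPhysicalBytes limit. That limit appears to scale with the maximum heap size (188M here against a 100 MB heap), so a small heap leaves little headroom for native allocations. This is a second contributing cause.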

java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes (196M) > maxPhysicalBytes (188M)
	at org.bytedeco.javacpp.Pointer.deallocator (Pointer.java:700)
	at org.bytedeco.javacpp.Pointer.init (Pointer.java:126)
	at org.bytedeco.javacpp.IntPointer.allocateArray (Native Method)
	at org.bytedeco.javacpp.IntPointer.<init>(IntPointer.java:90)
	(application stack frames omitted)
	at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke (Method.java:498)
	at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke (InvocableHandlerMethod.java:190)
	at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest (InvocableHandlerMethod.java:138)
	at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle (ServletInvocableHandlerMethod.java:106)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod (RequestMappingHandlerAdapter.java:888)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal (RequestMappingHandlerAdapter.java:793)
	at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle (AbstractHandlerMethodAdapter.java:87)
	at org.springframework.web.servlet.DispatcherServlet.doDispatch (DispatcherServlet.java:1040)
	at org.springframework.web.servlet.DispatcherServlet.doService (DispatcherServlet.java:943)
	at org.springframework.web.servlet.FrameworkServlet.processRequest (FrameworkServlet.java:1006)
	at org.springframework.web.servlet.FrameworkServlet.doPost (FrameworkServlet.java:909)
	at javax.servlet.http.HttpServlet.service (HttpServlet.java:660)
	at org.springframework.web.servlet.FrameworkServlet.service (FrameworkServlet.java:883)
	at javax.servlet.http.HttpServlet.service (HttpServlet.java:741)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter (ApplicationFilterChain.java:231)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter (ApplicationFilterChain.java:166)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter (WsFilter.java:53)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter (ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter (ApplicationFilterChain.java:166)
	at com.github.xiaoymin.swaggerbootstrapui.filter.SecurityBasicAuthFilter.doFilter (SecurityBasicAuthFilter.java:84)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter (ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter (ApplicationFilterChain.java:166)
	at com.github.xiaoymin.swaggerbootstrapui.filter.ProductionSecurityFilter.doFilter (ProductionSecurityFilter.java:53)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter (ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter (ApplicationFilterChain.java:166)
	at org.springframework.web.filter.RequestContextFilter.doFilterInternal (RequestContextFilter.java:100)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter (OncePerRequestFilter.java:119)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter (ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter (ApplicationFilterChain.java:166)
	at org.springframework.web.filter.FormContentFilter.doFilterInternal (FormContentFilter.java:93)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter (OncePerRequestFilter.java:119)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter (ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter (ApplicationFilterChain.java:166)
	at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal (CharacterEncodingFilter.java:201)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter (OncePerRequestFilter.java:119)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter (ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter (ApplicationFilterChain.java:166)
	at org.apache.catalina.core.StandardWrapperValve.invoke (StandardWrapperValve.java:202)
	at org.apache.catalina.core.StandardContextValve.invoke (StandardContextValve.java:96)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke (AuthenticatorBase.java:526)
	at org.apache.catalina.core.StandardHostValve.invoke (StandardHostValve.java:139)
	at org.apache.catalina.valves.ErrorReportValve.invoke (ErrorReportValve.java:92)
	at org.apache.catalina.core.StandardEngineValve.invoke (StandardEngineValve.java:74)
	at org.apache.catalina.connector.CoyoteAdapter.service (CoyoteAdapter.java:343)
	at org.apache.coyote.http11.Http11Processor.service (Http11Processor.java:408)
	at org.apache.coyote.AbstractProcessorLight.process (AbstractProcessorLight.java:66)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process (AbstractProtocol.java:861)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun (NioEndpoint.java:1579)
	at org.apache.tomcat.util.net.SocketProcessorBase.run (SocketProcessorBase.java:49)
	at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run (TaskThread.java:61)
	at java.lang.Thread.run (Thread.java:750)
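
The generation sizes quoted for the 100 MB test heap follow from the default ratios. As a rough check, the sketch below applies NewRatio=2 and SurvivorRatio=8 to a given heap size. `heap_layout_mib` is a hypothetical helper, and the figures are nominal, since a real JVM aligns sizes and may resize the spaces adaptively.

```shell
#!/usr/bin/env bash
# Hypothetical helper: nominal generation sizes (MiB) for a heap of $1 MiB,
# with NewRatio ($2, default 2) and SurvivorRatio ($3, default 8).
heap_layout_mib() {
  awk -v heap="$1" -v nr="${2:-2}" -v sr="${3:-8}" 'BEGIN {
    young    = heap / (nr + 1)        # young : old = 1 : NewRatio
    survivor = young / (sr + 2)       # Eden : S0 : S1 = SurvivorRatio : 1 : 1
    eden     = young - 2 * survivor
    old      = heap - young
    printf "young=%.1f eden=%.1f survivor=%.1f old=%.1f\n", young, eden, survivor, old
  }'
}

heap_layout_mib 100
# prints: young=33.3 eden=26.7 survivor=3.3 old=66.7
```
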

# Solution

Increase the memory available to Tomcat, that is, raise the maximum heap configured in JAVA_OPTS.
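
As an illustration only, the setting could be moved into a `bin/setenv.sh` file, which `catalina.sh` sources at startup when it exists. Every size below is an assumed placeholder, not a value from the original fix. The two JavaCPP system properties are optional and are included on the assumption that the limit on native memory should be set explicitly rather than derived from the heap size. `-XX:MaxPermSize` is left out because Java 8 ignores it.

```shell
# $CATALINA_BASE/bin/setenv.sh (sourced by catalina.sh at startup if present)
# All sizes here are illustrative assumptions; tune them to the host and container limits.
JAVA_OPTS="-Xms512m -Xmx6144m"

# Optional (assumption): set JavaCPP's native-memory limits explicitly
# instead of relying on the defaults derived from the maximum heap.
JAVA_OPTS="$JAVA_OPTS -Dorg.bytedeco.javacpp.maxbytes=4G -Dorg.bytedeco.javacpp.maxphysicalbytes=10G"

export JAVA_OPTS
```

After changing the file, the docker container has to be restarted for the new options to take effect.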