PHP-FPM reload hangs

Hello,

For a while now, we (and mostly our users) have been experiencing issues with our PHP-FPM setup that we can't really pinpoint. We can see what happens and we can resolve it by restarting PHP-FPM, but the issue recurs every few days on multiple servers.

Case:
Once a server reaches 150 to 180 domains, a reload of PHP-FPM hangs every now and then. It is always one (random) child process of the master that causes it, and the PHP version (5.5, 7.0) doesn't matter. For example: 150 sites are using PHP-FPM 7.0. When the reload happens, all child processes are shut down. Most of the time this goes well, and within a second or two the reload is complete and everything is back to normal. Sometimes, however, all but one of the child processes are shut down. The one child process that is still active causes the reload to wait/hang. Incoming requests for the other domains get a 503 response from Apache (since Apache can't reach the PHP-FPM pool for that specific domain). The pool that is still active does respond to new requests, so it seems to be fully functional although it should have stopped. We see this behavior on multiple servers, and it's always a different pool that doesn't respond to the shutdown signal.
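To make it easier to see which worker is left behind during a hang, something like the sketch below could be used. It assumes the usual PHP-FPM process titles ("php-fpm: master process (...)" and "php-fpm: pool <name>") and output in the shape of `ps -eo pid,ppid,etimes,args`. The PIDs, ages, and pool name in the sample are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample of `ps -eo pid,ppid,etimes,args` output taken during
# a hung reload; PIDs, ages, and the pool name are invented for this sketch.
SAMPLE_PS = """\
  PID  PPID ELAPSED COMMAND
 2001     1  900000 php-fpm: master process (/opt/plesk/php/7.0/etc/php-fpm.conf)
 2002  2001    7200 php-fpm: pool example-b.tld
"""

# Matches only pool workers; master lines and the header are skipped.
WORKER_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+php-fpm: pool (\S+)\s*$")


def workers_by_master(ps_output):
    """Map master PID -> list of (worker PID, age in seconds, pool name)."""
    result = defaultdict(list)
    for line in ps_output.splitlines():
        match = WORKER_RE.match(line)
        if match:
            pid, ppid, age, pool = match.groups()
            result[int(ppid)].append((int(pid), int(age), pool))
    return dict(result)


if __name__ == "__main__":
    for master, workers in workers_by_master(SAMPLE_PS).items():
        for pid, age, pool in workers:
            print(f"master {master}: worker {pid} (pool {pool}) alive for {age}s")
```

On a live server the sample text would be replaced by the real `ps` output, captured while the reload is hanging. A master with a single remaining worker would then be the one to attach strace to.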

Setup:
All servers with this issue are running Plesk Onyx 17.5 on CentOS 7.4. The issue happens on our shared platform and on our 'managed' server (simply put: one customer has a reseller account in Plesk, so he/she can add/remove/manage his/her own customers and domains - a managed customer can't edit configuration other than through Plesk). We are a few updates behind on Plesk. When the issue occurs, there is no problem with CPU, memory, or disk space: the load is not high and there is enough free disk space. Settings such as the semaphore limits are still at their defaults.

Logging:
We do see some errors about the PHP-FPM pm.max_children setting. Although this setting can cause problems, it shouldn't cause the hanging reload (if I'm correct). During the reload, we see some successful config tests:
Code:
[12-Jun-2018 14:59:46] NOTICE: configuration file /opt/plesk/php/7.0/etc/php-fpm.conf test is successful
[12-Jun-2018 14:59:52] NOTICE: Reloading in progress ...
[12-Jun-2018 15:00:14] NOTICE: configuration file /opt/plesk/php/7.0/etc/php-fpm.conf test is successful
[12-Jun-2018 15:00:16] NOTICE: Reloading in progress ...
[12-Jun-2018 15:00:20] NOTICE: configuration file /opt/plesk/php/7.0/etc/php-fpm.conf test is successful
[12-Jun-2018 15:00:21] NOTICE: Reloading in progress ...
An strace on the remaining child process during the issue shows little to nothing. When I open the site whose pool is still active, I see a lot of activity caused by my request. After my request has finished processing, it becomes quiet again.
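To find out afterwards how often a reload never completed, the log could be scanned with a rough sketch like the one below. It assumes that PHP-FPM writes a "ready to handle connections" notice once a reload has finished; that message text is an assumption to check against your own logs. The excerpt above may also simply be filtered, so a missing line there proves nothing by itself.

```python
import re
from datetime import datetime

# The log excerpt from this post, used as sample input.
SAMPLE_LOG = """\
[12-Jun-2018 14:59:46] NOTICE: configuration file /opt/plesk/php/7.0/etc/php-fpm.conf test is successful
[12-Jun-2018 14:59:52] NOTICE: Reloading in progress ...
[12-Jun-2018 15:00:14] NOTICE: configuration file /opt/plesk/php/7.0/etc/php-fpm.conf test is successful
[12-Jun-2018 15:00:16] NOTICE: Reloading in progress ...
[12-Jun-2018 15:00:20] NOTICE: configuration file /opt/plesk/php/7.0/etc/php-fpm.conf test is successful
[12-Jun-2018 15:00:21] NOTICE: Reloading in progress ...
"""

LINE_RE = re.compile(r"^\[(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2})\] NOTICE: (.*)$")

# Assumed completion marker; verify the exact text in your own log.
READY_MARKER = "ready to handle connections"


def unfinished_reloads(log_text):
    """Return timestamps of reloads not followed by the ready marker
    before the next reload starts (or before the log ends)."""
    pending = None
    stuck = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%d-%b-%Y %H:%M:%S")
        message = match.group(2)
        if message.startswith("Reloading in progress"):
            if pending is not None:
                stuck.append(pending)  # previous reload never reported ready
            pending = stamp
        elif message.startswith(READY_MARKER):
            pending = None
    if pending is not None:
        stuck.append(pending)
    return stuck


if __name__ == "__main__":
    for stamp in unfinished_reloads(SAMPLE_LOG):
        print("reload without completion notice:", stamp.isoformat(sep=" "))
```

Comparing these timestamps with the moments Apache started returning 503s might show whether every hang lines up with a reload that never reported back.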

Support & Testing:
We've contacted Plesk support about this issue, but they don't seem to know anything about it. The new Plesk Onyx 17.8 release contains some updates regarding PHP-FPM, but none of them are related to this. They would like to debug the issue further, but for that they would need access to the server, which we don't really want to give (unless we really can't figure it out on our own, or with some help from this forum). We've set up a test environment so that Plesk (and we ourselves) could debug there, but we weren't able to reproduce the issue on it. We've used some stress-test tools (like ab) to put some load on the test server and then performed multiple reloads, but they all completed without a problem. So it seems to be an issue that only occurs when there are many active domains on a server. Like I said: the issue starts occurring once roughly 150 to 180 domains have been added in Plesk/on the server. Most of these domains are active, but they are not high-traffic sites (a visitor every now and then).

Our company has been active in the web hosting world for about 15 years now. For most of those years we were running Windows servers with IIS. For a few years now (after some new techies with Linux experience joined the company) we've been actively using Linux (CentOS) with Plesk Onyx. This works like a charm, except for the PHP-FPM issue. As a workaround we're using PHP FastCGI, which works fine, but we would like to use FPM.

Does anyone here recognize this issue, or can anyone point us in the right direction?

Thanks!