and we're facing some scaling issues. The macro symptom is a sudden increase in I/O wait and load (10x or more). We're unsure what's causing this. During these overloads we see a lot of LWLock waits, and therefore CPU wait, on some queries, but we're unable to pinpoint the cause. Scaling vertically mitigated the issue, but it's still there.
My question is: how would you advise pinpointing the root cause of these sudden slowdowns? Can you recommend a tool that would help introspect what's going on and pinpoint the issue?
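One common way to gather this kind of insight is wait-event sampling: poll `pg_stat_activity` every second or so while a spike is happening and count which wait events dominate. Below is a minimal sketch, assuming PostgreSQL 10+ (where `wait_event_type` reports values such as `LWLock` and `IO`) and that you fetch the rows yourself with any driver. `SAMPLE_QUERY` and `summarize_waits` are hypothetical helpers for illustration, not part of any existing tool.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical sampling query: run it repeatedly during a spike with any
# PostgreSQL driver and feed the returned rows to summarize_waits().
# Assumes PostgreSQL 10+ column semantics for wait_event_type / wait_event.
SAMPLE_QUERY = """
SELECT wait_event_type, wait_event
FROM pg_stat_activity
WHERE state = 'active' AND pid <> pg_backend_pid();
"""

Sample = Tuple[Optional[str], Optional[str]]


def summarize_waits(samples: Iterable[Sample], top: int = 5) -> List[Tuple[str, int]]:
    """Count how often each wait event shows up across many snapshots."""
    counts = Counter()
    for wait_type, wait_event in samples:
        if wait_type is None:
            # An active backend with no wait event is most likely running on CPU.
            key = "CPU (no wait event)"
        else:
            key = f"{wait_type}:{wait_event}"
        counts[key] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    # Illustrative rows only, not data from the thread.
    rows = [
        ("LWLock", "BufferMapping"),
        ("LWLock", "BufferMapping"),
        ("IO", "DataFileRead"),
        (None, None),
    ]
    print(summarize_waits(rows))
```

Comparing the histogram from a quiet period with one taken during an overload usually narrows the search: a jump in `IO:*` events points at storage, while a jump in a specific `LWLock:*` event points at an internal contention point worth looking up in the PostgreSQL docs.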
Many reasons are possible in this case. Is there any connection pooling solution in your cluster?
Thank you for answering. Yes, we're using PgBouncer: 12 instances with a default pool size of 40, for a total of 480 connections (the server can accept 800).
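The 480 figure holds only if each PgBouncer instance serves a single user/database pair: `default_pool_size` applies per (user, database) pool, and `reserve_pool_size` can add extra connections per pool. A small sketch of the worst-case arithmetic, assuming all 12 instances share the same settings; `max_server_connections` is a hypothetical helper.

```python
def max_server_connections(
    instances: int,
    default_pool_size: int,
    user_db_pairs: int = 1,
    reserve_pool_size: int = 0,
) -> int:
    """Worst-case number of server connections a fleet of PgBouncer instances can open.

    Assumes every instance uses identical settings and every (user, database)
    pair gets its own pool of default_pool_size connections, plus
    reserve_pool_size extra connections per pool when that setting is non-zero.
    """
    per_pool = default_pool_size + reserve_pool_size
    return instances * user_db_pairs * per_pool


if __name__ == "__main__":
    max_connections = 800  # server-side limit quoted in the thread
    for pairs in (1, 2):
        total = max_server_connections(12, 40, user_db_pairs=pairs)
        verdict = "fits" if total <= max_connections else "exceeds max_connections"
        print(pairs, total, verdict)
```

With one pair the fleet stays at 480, but a second user or database would already allow 960 server connections. Settings such as `max_db_connections` or `max_user_connections` can cap this, so it is worth checking the actual PgBouncer configuration rather than relying on the pool size alone.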
Lots of things: misconfiguration, a change in the application, hardware (disk, memory) problems, network latency due to synchronous replication. I could go on; as stated, the question is almost as vague as "an error occurred".
Thanks. The hardware is the same, there's no replication, … My question is about how to troubleshoot such a situation: where to look to gather insights.
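Another place to look is `pg_stat_statements`, if the extension is installed: take one snapshot before a spike and one during it, then rank queries by how many shared blocks they read in between (`shared_blks_read` counts blocks fetched from outside `shared_buffers`, i.e. from the OS cache or disk). Below is a minimal sketch, assuming you run `SNAPSHOT_QUERY` yourself and build a `{queryid: shared_blks_read}` dict from each result. Both names and `top_io_deltas` are hypothetical, and keying by `queryid` alone is a simplification, since the same `queryid` can appear under different users or databases.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot query; requires the pg_stat_statements extension.
SNAPSHOT_QUERY = """
SELECT queryid, shared_blks_read
FROM pg_stat_statements;
"""

Snapshot = Dict[int, int]  # queryid -> cumulative shared_blks_read


def top_io_deltas(before: Snapshot, after: Snapshot, top: int = 5) -> List[Tuple[int, int]]:
    """Rank queries by shared blocks read between two snapshots."""
    deltas = []
    for queryid, blocks_after in after.items():
        delta = blocks_after - before.get(queryid, 0)
        # Skip non-positive deltas, e.g. after pg_stat_statements_reset().
        if delta > 0:
            deltas.append((queryid, delta))
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:top]


if __name__ == "__main__":
    # Illustrative numbers only.
    quiet = {101: 1_000, 202: 50}
    spike = {101: 1_200, 202: 90_050, 303: 400}
    print(top_io_deltas(quiet, spike))
```

The queries at the top of the list are good candidates to run through `EXPLAIN (ANALYZE, BUFFERS)` to see why they suddenly read so much.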
Read this blog post to understand WAL and checkpoints: https://www.cybertec-postgresql.com/en/postgresql-what-is-a-checkpoint/
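Checkpoints are a frequent cause of sudden I/O bursts, so one quick check is whether most checkpoints are triggered by WAL volume rather than by `checkpoint_timeout`. Below is a minimal sketch, assuming you read the counters yourself: PostgreSQL 16 and earlier expose them as `checkpoints_timed` / `checkpoints_req` in `pg_stat_bgwriter`, and PostgreSQL 17 moved them to `num_timed` / `num_requested` in `pg_stat_checkpointer`. The query constants and `requested_checkpoint_ratio` are hypothetical helpers.

```python
# Hypothetical queries; pick the one matching your server version.
CHECKPOINT_QUERY_PG16 = "SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;"
CHECKPOINT_QUERY_PG17 = "SELECT num_timed, num_requested FROM pg_stat_checkpointer;"


def requested_checkpoint_ratio(timed: int, requested: int) -> float:
    """Share of checkpoints that were requested rather than timed.

    Requested checkpoints are typically forced by reaching max_wal_size
    (or by a manual CHECKPOINT); a high share suggests WAL volume, not the
    timer, is driving checkpoint I/O.
    """
    total = timed + requested
    if total == 0:
        return 0.0
    return requested / total


if __name__ == "__main__":
    # Illustrative counter values only.
    print(requested_checkpoint_ratio(timed=120, requested=380))
```

If the ratio is high, a common next step is to enable `log_checkpoints` and check whether the logged checkpoints line up with the load spikes before considering a larger `max_wal_size`.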