Is there a way to estimate the # of rows in a result set for an OLAP query before actually running the query on a huge table?
Motivation: return an error early if the estimated result set is too big for the client to process.
What I’m thinking now is to parse the output of EXPLAIN, tune the statistics targets for the relevant columns, and run ANALYZE more often. Is this going to work?
Or are there better solutions? 🙂
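One common way to do exactly this (a variant appears on the PostgreSQL wiki's "Count estimate" page) is to wrap EXPLAIN (FORMAT JSON) in a small function and read the planner's top-level row estimate out of the plan, instead of parsing the text output. A minimal sketch; `count_estimate` is just a name chosen here:

```sql
CREATE OR REPLACE FUNCTION count_estimate(query text)
RETURNS bigint
LANGUAGE plpgsql AS $$
DECLARE
    plan json;
BEGIN
    -- EXPLAIN plans the query without executing it; the top-level
    -- "Plan Rows" field is the planner's estimate of the result size.
    EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;
    RETURN (plan -> 0 -> 'Plan' ->> 'Plan Rows')::bigint;
END;
$$;
```

Usage would look like `SELECT count_estimate('SELECT * FROM big_table WHERE ...');`. Note that the function concatenates its argument straight into EXECUTE, so it must never be fed untrusted input.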
There are much better solutions, depending on the complexity of the query. If it's just one table, there are articles (Cybertec comes to mind) that explain which details are used to estimate the number of rows, since what the optimizer and EXPLAIN do is open source and pretty well documented. I've done something similar to get the estimated row count from a single table instead of running an expensive SELECT count(*), when the exact count was not really important.
For a table you can use this query: `SELECT reltuples::bigint AS estimate FROM pg_class WHERE relname = 'tablename';`
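One caveat: reltuples is only refreshed by VACUUM, ANALYZE, and a few DDL commands, so it drifts on a busy table. A hedged refinement (the approach described by Cybertec and the PostgreSQL wiki) is to scale the stored tuple density by the table's current size; this sketch assumes the default 8 kB block size and uses 'tablename' as a placeholder:

```sql
SELECT (CASE
          WHEN c.reltuples < 0 THEN NULL   -- -1 means never vacuumed/analyzed (PG 14+)
          WHEN c.relpages = 0 THEN 0       -- empty table
          ELSE c.reltuples / c.relpages    -- average tuples per page
        END
        * (pg_relation_size(c.oid) / 8192))::bigint AS estimate
FROM pg_class c
WHERE c.relname = 'tablename';
```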
Hey, thanks! It’s not the total # of rows I need, but the # of rows returned by a specific query (WHERE + GROUP BY) 🙂
Nope. The query I posted returns the estimated number of records for the whole table. For a complex query there is no way to get a reliable estimate.
What is a complex query?
Anything containing JOIN, WHERE, and/or GROUP BY. Estimating such things is like rolling a die with thousands of sides.
I don’t have joins, but I do have WHERE and GROUP BY. I assumed that with the different kinds of statistics (https://www.postgresql.org/docs/current/planner-stats.html) the planner could do that.
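The planner does produce a row estimate for such a query (the rows= figure on the aggregate node in EXPLAIN output), and extended statistics can make that figure considerably better for multi-column WHERE and GROUP BY. A sketch using a hypothetical events(device_id, event_type, created_at) table; the statistics object name is made up:

```sql
-- ndistinct improves the group-count estimate for multi-column GROUP BY;
-- dependencies improves selectivity estimates for correlated equality filters.
CREATE STATISTICS events_stats (ndistinct, dependencies)
    ON device_id, event_type FROM events;

ANALYZE events;  -- extended statistics are only populated by ANALYZE

-- The planner's estimate appears as rows=... on the aggregate node:
EXPLAIN SELECT device_id, event_type, count(*)
FROM events
WHERE created_at >= now() - interval '1 day'
GROUP BY device_id, event_type;
```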
Well, read the source code of the planner. It's a very complex thing, and its row estimates are nowhere near an exact count.