
So I have a table with about 30 Billion Rows.

I needed to insert 20 billion more.
I have 2 indexes. One is on a HUGE field (80 characters), and inserts into it are random.
The other is on a simple BIGINT tied to a sequence...

The question becomes: "Should I drop my indexes?" So I played with this.
It's educational. These are NOT SSDs; they are iSCSI drives over a Gigabit Network.
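
To make that concrete, here's a minimal sketch of the shape of the table (hypothetical names and a guessed layout, not my actual schema):

-- Hypothetical stand-in for the real table.
CREATE TABLE big_table (
    id       bigint GENERATED BY DEFAULT AS IDENTITY,  -- the BIGINT tied to a sequence
    long_key varchar(80) NOT NULL                       -- the HUGE 80-character field
);

-- Sequence index: values arrive in order, so inserts append at the right edge.
CREATE INDEX big_table_id_idx ON big_table (id);
-- Random-key index: every insert touches a random page, which is what thrashes the disks.
CREATE INDEX big_table_long_key_idx ON big_table (long_key);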

Inserting WITH both indexes: 1 million rows per HOUR.
Without the indexes: 20 million rows per MINUTE (1,200 times faster).
With ONLY the SEQUENCE indexed: ~5 million rows per MINUTE (and it took 18 hrs to rebuild that index after dropping it, LOL).
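
What the comparison boils down to is this drop-load-rebuild cycle (same hypothetical names as above):

-- Drop both indexes before the bulk load.
DROP INDEX big_table_long_key_idx;
DROP INDEX big_table_id_idx;

-- ... bulk load the 20 billion rows here ...

-- Rebuild afterwards: each CREATE INDEX is one big sort instead of
-- billions of random B-tree insertions.
CREATE INDEX big_table_id_idx ON big_table (id);
CREATE INDEX big_table_long_key_idx ON big_table (long_key);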

Considering how long it takes to rebuild the SEQUENCE index, I should still have dropped it for the inserts.
Effectively I have to do 20 hrs' worth of inserts (about 1 hr per billion rows), then 18 hrs to rebuild the SEQUENCE index plus ??? hrs to rebuild the other one.

That is 38 hrs of work, plus however long the second index build adds to it. My guess is about 10 hrs more (so ~28 hrs to create that index), and I will run both builds at the same time. I will write an update when I find out.

But the loads with one index vary between 3.1 and 3.6 hours per file (some of this could be network demand).
So, loading 20 files is at least ~62 hrs, and then the time to build the last index on top of that.
Versus dropping both: 20 hrs of loading, then ~28 hrs building the indexes again (~48 hrs total).

Luckily this is not something that is in use, and downtime is readily available.
The table is NOT partitioned. The problem with partitioning it is that the pseudo-random key is the most queried, so it makes the most sense as the partition key. Maybe with the 50 billion rows split across 4 partitions, it won't matter that much.
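
If I ever did partition it, hash partitioning on the queried key would be the natural fit for a pseudo-random value; a sketch with the same hypothetical names:

-- Partitioned variant of the hypothetical table above.
CREATE TABLE big_table_part (
    id       bigint NOT NULL,
    long_key varchar(80) NOT NULL
) PARTITION BY HASH (long_key);

-- 4 partitions, ~12.5 billion rows each at 50 billion total.
CREATE TABLE big_table_part_0 PARTITION OF big_table_part FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE big_table_part_1 PARTITION OF big_table_part FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE big_table_part_2 PARTITION OF big_table_part FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE big_table_part_3 PARTITION OF big_table_part FOR VALUES WITH (MODULUS 4, REMAINDER 3);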

Anyway, it's always faster to load data without indexes (and certainly without triggers).
Without dropping the indexes, this could have taken 20,000 hours (833 days)... and it would have been 99% just thrashing the disk drives! Instead it will take a week or so, between holidays!
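
On the trigger point: user triggers can be switched off for the duration of a load, e.g.:

-- Skips user-defined triggers entirely during the load (use with care;
-- a superuser can use DISABLE TRIGGER ALL to skip FK triggers too).
ALTER TABLE big_table DISABLE TRIGGER USER;
-- ... bulk load ...
ALTER TABLE big_table ENABLE TRIGGER USER;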

BTW... How do you learn?
You DO things. Specifically HARD things...

4 replies


Very interesting. Thanks for sharing @CaptainKirkFan

Thanx.. :)

Did you try "COPY"?

Kirk (question author)
Mateus Freitas
Did you try "COPY"?

Of course. COPY (or \copy via psql) is simply the fastest way I know to load data. It is the default format of pg_dump/pg_restore, and it requires very little parsing, unlike parsing SQL for every insert. I can't fathom how long 1 BILLION insert statements would take... just on the parsing time ALONE. COPY should be the first consideration for bulk loading data.
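
For anyone following along, the two flavors look like this (path and column names are hypothetical):

-- Server-side COPY: the file must be readable by the database server.
COPY big_table (id, long_key) FROM '/data/load/part01.tsv';

-- Client-side \copy in psql: streams the same file from the client machine.
-- \copy big_table (id, long_key) from 'part01.tsv'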
