Discussion:
[BUGS] BUG #13750: Autovacuum slows down with large numbers of tables. More workers make it slower.
d***@sonic.net
2015-10-30 13:32:52 UTC
The following bug has been logged on the website:

Bug reference: 13750
Logged by: David Gould
Email address: ***@sonic.net
PostgreSQL version: 9.4.5
Operating system: Linux
Description:

With more than a few tens of thousands of tables in one database,
autovacuuming slows down radically and becomes ineffective. Increasing the
number of autovacuum workers makes the slowdown worse.

A client has an application that loads data from thousands of external
feeds many times a day. They create a new table for each batch of new data.
After some months, old tables are dropped. Typically the database has about
200,000 fairly small tables, each of which has a few indexes and a TOAST
table. There are also a lot of temp tables that come and go. pg_class has
over half a million rows. The hosts have 80 hardware threads, 1TB of
memory, and Fusion-io storage.

They started seeing long-running autovacuum workers doing anti-wraparound
vacuums. pg_stat_activity showed workers had been vacuuming a single small
table (e.g., 10k rows) for several hours; however, the query log showed the
actual vacuum took less than a second.

When they updated to 9.4.4 they also increased the number of autovacuum
workers in anticipation of the multixact fix causing extra vacuuming.
Several days later they observed massive catalog bloat; e.g., pg_attribute
was over 200GB of mostly empty pages. This caused new connections to get
stuck in startup, as the catalogs no longer fit in the buffer cache.

I then tried experimenting with different numbers of autovacuum workers
and found:

/Autovacuum Actions per Hour/

Workers    Actions    Actions per Worker
      1     2110.1                2110.1
      2     1760.8                 880.4
      4      647.3                 161.8
      8      386.2                  48.3
     72       62.0                   0.9

I have analyzed this and created reproduction scripts. I'll send those later
today.
Alvaro Herrera
2015-10-30 13:46:46 UTC
Post by d***@sonic.net
With more than a few tens of thousands of tables in one database,
autovacuuming slows down radically and becomes ineffective. Increasing the
number of autovacuum workers makes the slowdown worse.
Yeah, you need to decrease autovacuum_vacuum_cost_delay if you want to
make them go faster. (As more workers are started, the existing ones
slow down. The intent is that the I/O bandwidth allocation is kept
constant regardless of how many workers there are.)
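A minimal sketch of that balancing (in Python, assuming a simple even split
of the shared budget purely for illustration; the real logic,
autovac_balance_check() in autovacuum.c, also weights each worker's own cost
settings):

    # Toy model: the shared I/O cost budget is divided among active workers,
    # so each added worker slows the existing ones down.
    def effective_cost_limit(shared_limit, active_workers):
        return max(1, shared_limit // active_workers)

    for n in (1, 2, 4, 8):
        # 200 is the default vacuum_cost_limit
        print(n, "workers ->", effective_cost_limit(200, n), "per worker")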
Post by d***@sonic.net
When they updated to 9.4.4 they also increased the number of autovacuum
workers in anticipation of the multixact fix causing extra vacuuming.
Several days later they observed massive catalog bloat; e.g., pg_attribute
was over 200GB of mostly empty pages. This caused new connections to get
stuck in startup, as the catalogs no longer fit in the buffer cache.
Oh crap.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
David Gould
2015-10-30 14:27:04 UTC
On Fri, 30 Oct 2015 10:46:46 -0300
Post by Alvaro Herrera
Post by d***@sonic.net
With more than a few tens of thousands of tables in one database,
autovacuuming slows down radically and becomes ineffective. Increasing the
number of autovacuum workers makes the slowdown worse.
Yeah, you need to decrease autovacuum_vacuum_cost_delay if you want to
make them go faster. (As more workers are started, the existing ones
slow down. The intent is that the I/O bandwidth allocation is kept
constant regardless of how many workers there are.)
The cost delays are all 0. We care about bloat, not bandwidth.

Anyway, they are not actually vacuuming. They are waiting on the
VacuumScheduleLock and requesting fresh snapshots from the
stats collector.

Basically there is a loop in do_autovacuum() that looks like:

    ... build list of all tables to vacuum ...
    for tab in tables_to_vacuum:
        skip = false
        lock(VacuumScheduleLock)
        for worker in autovacuum_workers:
            if worker.working_on == tab:
                skip = true

        if skip or very_expensive_check_to_see_if_already_vacuumed(tab):
            unlock(VacuumScheduleLock)
            continue
        unlock(VacuumScheduleLock)
        actually_vacuum(tab)

Since all the workers are working on the same list, they all compete to
vacuum the next item on the list.

-dg
--
David Gould ***@sonic.net
If simplicity worked, the world would be overrun with insects.
Alvaro Herrera
2015-10-30 15:17:11 UTC
Post by David Gould
The cost delays are all 0. We care about bloat, not bandwidth.
Anyway, they are not actually vacuuming. They are waiting on the
VacuumScheduleLock and requesting fresh snapshots from the
stats collector.
Oh, I see. Interesting. Proposals welcome. I especially dislike the
("very_expensive") pgstat check.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tom Lane
2015-10-30 15:40:27 UTC
Post by Alvaro Herrera
Post by David Gould
Anyway, they are not actually vacuuming. They are waiting on the
VacuumScheduleLock and requesting fresh snapshots from the
stats collector.
Oh, I see. Interesting. Proposals welcome. I especially dislike the
("very_expensive") pgstat check.
Couldn't we simply move that out of the locked stanza? That is, if no
other worker is working on the table, claim it, and release the lock
immediately. Then do the "very expensive" check. If that fails, we
have to re-take the lock to un-claim the table, but that sounds OK.
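A runnable Python sketch of that reordering (all names are hypothetical;
already_vacuumed() stands in for the "very expensive" pgstat recheck):

    import threading

    schedule_lock = threading.Lock()   # stands in for VacuumScheduleLock
    working_on = {}                    # worker id -> table currently claimed

    def already_vacuumed(tab):         # stub for the expensive pgstat recheck
        return False

    def actually_vacuum(tab):          # stub for the real vacuum/analyze
        print("vacuuming", tab)

    def claim_and_vacuum(my_id, tab):
        with schedule_lock:            # lock held only for the cheap claim
            if tab in working_on.values():
                return                 # another worker already has it
            working_on[my_id] = tab    # claim before the expensive check
        if already_vacuumed(tab):      # the expensive check runs unlocked
            with schedule_lock:        # rare case: re-take only to un-claim
                del working_on[my_id]
            return
        actually_vacuum(tab)
        with schedule_lock:
            del working_on[my_id]

    claim_and_vacuum(1, 42)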

regards, tom lane
Alvaro Herrera
2015-10-30 15:49:37 UTC
Post by Tom Lane
Post by Alvaro Herrera
Post by David Gould
Anyway, they are not actually vacuuming. They are waiting on the
VacuumScheduleLock. And requesting freshs snapshots from the
stats_collector.
Oh, I see. Interesting. Proposals welcome. I especially dislike the
("very_expensive") pgstat check.
Couldn't we simply move that out of the locked stanza? That is, if no
other worker is working on the table, claim it, and release the lock
immediately. Then do the "very expensive" check. If that fails, we
have to re-take the lock to un-claim the table, but that sounds OK.
Hmm, yeah, that would work.

Of course, if we could avoid the pgstat check completely that would be
even better. The amount of pgstat traffic that causes is ridiculous.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tom Lane
2015-10-30 15:53:57 UTC
Post by Alvaro Herrera
Of course, if we could avoid the pgstat check completely that would be
even better. The amount of pgstat traffic that causes is ridiculous.
Good point ... shouldn't we have already checked the stats before ever
deciding to try to claim the table?

regards, tom lane
Alvaro Herrera
2015-10-30 16:11:50 UTC
Post by Tom Lane
Post by Alvaro Herrera
Of course, if we could avoid the pgstat check completely that would be
even better. The amount of pgstat traffic that causes is ridiculous.
Good point ... shouldn't we have already checked the stats before ever
deciding to try to claim the table?
The second check is there to allow for some other worker (or manual
vacuum) that vacuumed it after we first checked, but finished
before we checked the array of current jobs.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tom Lane
2015-10-30 16:51:43 UTC
Post by Alvaro Herrera
Post by Tom Lane
Good point ... shouldn't we have already checked the stats before ever
deciding to try to claim the table?
The second check is there to allow for some other worker (or manual
vacuum) that vacuumed it after we first checked, but finished
before we checked the array of current jobs.
I wonder whether that check costs more than it saves.

regards, tom lane
Jeff Janes
2015-10-31 05:16:04 UTC
Post by Tom Lane
Post by Alvaro Herrera
Post by Tom Lane
Good point ... shouldn't we have already checked the stats before ever
deciding to try to claim the table?
The second check is there to allow for some other worker (or manual
vacuum) that vacuumed it after we first checked, but finished
before we checked the array of current jobs.
I wonder whether that check costs more than it saves.
A single autovacuum worker can run for hours or days. I don't think
we should start vacuuming a TB-size table because it needed vacuuming
days ago, when the initial to-do list was built, but no longer
does. So some kind of recheck is needed.

I thought of making the recheck first use whichever snapshot we
currently have hanging around, and then only if it still needs
vacuuming, force a fresh snapshot and re-re-check. The problem with
that is that any previous snapshot is aggressively destroyed at the
end of each vacuum or analyze by the EOXact code. So we don't
actually have a snapshot hanging around to use until we go to the
trouble of re-parsing the database stats file. So you have to take
special steps to exempt autovacuum workers from the EOXact code
clearing out the stats, and I don't know what the repercussions of
that might be.

We could also relax the freshness requirements of even the final
re-check, perhaps dynamically. There is no point in re-parsing a 40MB
stats file to avoid unnecessarily vacuuming a 16KB table, but parsing
an 8KB stats file to avoid unnecessarily vacuuming a 1TB table is well
worth it. That runs into the same problem as above, though: once you
have destroyed your previous stats snapshot, you no longer have the
ability to accept stale stats.
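A tiny Python sketch of that dynamic rule (the threshold is invented purely
for illustration):

    # Pay for a fresh stats snapshot only when a wasted vacuum would cost
    # more than re-reading the stats file itself.
    def worth_fresh_stats(table_bytes, stats_file_bytes):
        return table_bytes > stats_file_bytes

    KB, MB, TB = 1024, 1024**2, 1024**4
    print(worth_fresh_stats(16 * KB, 40 * MB))  # tiny table, big file: False
    print(worth_fresh_stats(1 * TB, 8 * KB))    # huge table, small file: True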

Cheers,

Jeff
David Gould
2015-10-31 07:01:08 UTC
On Fri, 30 Oct 2015 22:16:04 -0700
Post by Jeff Janes
Post by Tom Lane
Post by Alvaro Herrera
Post by Tom Lane
Good point ... shouldn't we have already checked the stats before ever
deciding to try to claim the table?
The second check is there to allow for some other worker (or manual
vacuum) that vacuumed it after we first checked, but finished
before we checked the array of current jobs.
I wonder whether that check costs more than it saves.
A single autovacuum worker can run for hours or days. I don't think
we should start vacuuming a TB-size table because it needed vacuuming
days ago, when the initial to-do list was built, but no longer
does. So some kind of recheck is needed.
I thought of making the recheck first use whichever snapshot we
currently have hanging around, and then only if it still needs
vacuuming, force a fresh snapshot and re-re-check. The problem with
that is that any previous snapshot is aggressively destroyed at the
end of each vacuum or analyze by the EOXact code. So we don't
actually have a snapshot hanging around to use until we go to the
trouble of re-parsing the database stats file. So you have to take
special steps to exempt autovacuum workers from the EOXact code
clearing out the stats, and I don't know what the repercussions of
that might be.
We could also relax the freshness requirements of even the final
re-check, perhaps dynamically. There is no point in re-parsing a 40MB
stats file to avoid unnecessarily vacuuming a 16KB table, but parsing
an 8KB stats file to avoid unnecessarily vacuuming a 1TB table is well
worth it. That runs into the same problem as above, though: once you
have destroyed your previous stats snapshot, you no longer have the
ability to accept stale stats.
I've been thinking about this and frankly the whole scheme is a bit
suspect. There are a couple of problems:

1. In an 8TB database with 200,000 tables it is not reasonable to build a
list in advance of all the tables to vacuum. Inevitably some workers
will be working off very stale lists.

2. Autovacuum workers do not need a statistics snapshot. They only need the
current statistics for the table they are considering vacuuming.

I think the following might be too big to back-patch, or maybe even for
9.5, but a better overall scheme would look something like:

- The stats collector accepts a new request message, "give me the stats
for this: (dboid, table oid)" and handles it by replying to the sender
(autovacuum worker) with the current statistics entry for that table.

- Instead of the worker building a big list of work up front, it just keeps
track of a shared high-water mark of tables worked on and does an index
scan for the next higher oid in pg_class. This is similar to my patch in
that vacuuming is ordered so that workers never contend, but instead of
working from a list they look up the next table in pg_class each time
using the oid index. For each table that might need vacuuming it would
send the stats collector a request for that table's statistics and use
the reply to determine what action to take.

This avoids the stale work list and the need for rechecking. Also, it
eliminates all the flogging of the stats files (I suspect SSD owners
and laptop users will rejoice!).
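A runnable Python sketch of both pieces together (all names hypothetical;
request_table_stats() stands in for the proposed single-table message to the
stats collector, and next_table_after() for the oid index scan on pg_class):

    import threading

    class Shared:
        def __init__(self):
            self.lock = threading.Lock()
            self.highwater = 0              # highest oid claimed by any worker

    pg_class_oids = [16401, 16405, 16410]   # stand-in for pg_class contents

    def next_table_after(oid):
        return min((o for o in pg_class_oids if o > oid), default=None)

    def request_table_stats(oid):
        # stub for the proposed (dboid, table oid) request/reply message
        return {"needs_vacuum": True}

    def worker(shared):
        while True:
            with shared.lock:
                oid = next_table_after(shared.highwater)
                if oid is None:
                    return
                shared.highwater = oid      # claiming == advancing the mark
            if request_table_stats(oid)["needs_vacuum"]:
                print("vacuum", oid)

    worker(Shared())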

-dg
--
David Gould 510 282 0869 ***@sonic.net
If simplicity worked, the world would be overrun with insects.
David Gould
2015-10-31 06:41:40 UTC
On Fri, 30 Oct 2015 12:51:43 -0400
Post by Tom Lane
Post by Alvaro Herrera
Post by Tom Lane
Good point ... shouldn't we have already checked the stats before ever
deciding to try to claim the table?
The second check is there to allow for some other worker (or manual
vacuum) that vacuumed it after we first checked, but finished
before we checked the array of current jobs.
I wonder whether that check costs more than it saves.
It does indeed. It drives the stats collector wild. And of course, if there
are lots of tables and indexes, the stats temp file gets very large, so it
can take a long time (seconds) to rewrite. This happens for each
worker, for each table that is a candidate for vacuuming.

Since it would not be convenient to provide a copy of the client's 8TB
database, I have made a standalone reproduction. The attached files:

build_test_instance.sh - create a test instance
datagen.py - used by above to populate it with lots of tables
logbyv.awk - count auto analyze actions in postgres log
trace.sh - strace the stats collector and autovacuum workers
tracereport.sh - list top 50 calls in strace output

The test process is to run the build_test_instance script to create an
instance with one database containing a large number of tiny tables. During
the setup, autovacuuming is off. Then make a tarball of the instance for
reuse. For each test case, untar the instance, set the number of workers,
and start it. After a short time autovacuum will start workers to analyze
the new tables. Expect to see the stats collector doing lots of writing.

You may want to use tmpfs or a ramdisk for the data dir when building the
test instance. The configuration is sized for a reasonable desktop: 8 to
16GB of memory and an SSD.
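For a rough idea of the shape of the data generation (the real datagen.py is
attached; this stand-in just emits DDL for many tiny tables to pipe into
psql):

    # Emit CREATE TABLE / INSERT statements for tens of thousands of tiny
    # tables so autovacuum has a large to-do list.
    NTABLES = 40000   # the later tests use 40,000 tables
    for i in range(NTABLES):
        print(f"CREATE TABLE t{i:06d} (id int PRIMARY KEY, v text);")
        print(f"INSERT INTO t{i:06d} "
              f"SELECT g, 'x' FROM generate_series(1, 10) g;")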

-dg
--
David Gould 510 282 0869 ***@sonic.net
If simplicity worked, the world would be overrun with insects.
Jeff Janes
2015-10-30 16:16:52 UTC
Post by Tom Lane
Post by Alvaro Herrera
Of course, if we could avoid the pgstat check completely that would be
even better. The amount of pgstat traffic that causes is ridiculous.
Good point ... shouldn't we have already checked the stats before ever
deciding to try to claim the table?
It forces another pgstat check just before it claims the table, in case the
table had just finished being vacuumed.

Cheers,

Jeff
Jeff Janes
2015-10-31 04:49:00 UTC
Post by Tom Lane
Post by Alvaro Herrera
Post by David Gould
Anyway, they are not actually vacuuming. They are waiting on the
VacuumScheduleLock and requesting fresh snapshots from the
stats collector.
Oh, I see. Interesting. Proposals welcome. I especially dislike the
("very_expensive") pgstat check.
Couldn't we simply move that out of the locked stanza? That is, if no
other worker is working on the table, claim it, and release the lock
immediately. Then do the "very expensive" check. If that fails, we
have to re-take the lock to un-claim the table, but that sounds OK.
The attached patch does that. On a system with 4 CPUs, 100,000 tables
(a big chunk of them in need of vacuuming), and 30 worker processes,
this increased the throughput by a factor of 40. Presumably it will do
even better with more CPUs.

It is still horribly inefficient, but 40 times less so.

Cheers,

Jeff
David Gould
2015-10-31 06:19:52 UTC
On Fri, 30 Oct 2015 21:49:00 -0700
Post by Jeff Janes
Post by Tom Lane
Post by Alvaro Herrera
Post by David Gould
Anyway, they are not actually vacuuming. They are waiting on the
VacuumScheduleLock and requesting fresh snapshots from the
stats collector.
Oh, I see. Interesting. Proposals welcome. I especially dislike the
("very_expensive") pgstat check.
Couldn't we simply move that out of the locked stanza? That is, if no
other worker is working on the table, claim it, and release the lock
immediately. Then do the "very expensive" check. If that fails, we
have to re-take the lock to un-claim the table, but that sounds OK.
The attached patch does that. On a system with 4 CPUs, 100,000 tables
(a big chunk of them in need of vacuuming), and 30 worker processes,
this increased the throughput by a factor of 40. Presumably it will do
even better with more CPUs.
It is still horribly inefficient, but 40 times less so.
That is a good result for such a small change.

The attached patch against REL9_5_STABLE goes a little further. It
claims the table under the lock, but also addresses the problem of all
the workers racing to redo the same table by enforcing an ordering on
all the workers: no worker can claim a table with an oid smaller than
the highest oid claimed by any worker. That is, instead of racing to
the same table, workers leapfrog over each other.
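The claim rule itself, as a one-function Python sketch (hypothetical, not
the patch code):

    # A worker may claim a table only if its oid is above the highest oid
    # any worker has claimed, so workers leapfrog instead of racing.
    def can_claim(tab_oid, claimed_oids):
        return tab_oid > max(claimed_oids, default=0)

    print(can_claim(16500, [16400, 16450]))  # True: jumps past the leader
    print(can_claim(16420, [16400, 16450]))  # False: behind the high-water mark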

In theory the recheck of the stats could be eliminated, although this
patch does not do that. It does eliminate the special handling of stats
snapshots for autovacuum workers, which cuts back on the excess
rewriting of the stats file somewhat.

I'll send numbers shortly, but as I recall it is over 100 times better than
the original.

-dg
--
David Gould 510 282 0869 ***@sonic.net
If simplicity worked, the world would be overrun with insects.
David Gould
2015-10-31 08:37:14 UTC
On Fri, 30 Oct 2015 23:19:52 -0700
Post by David Gould
On Fri, 30 Oct 2015 21:49:00 -0700
Post by Jeff Janes
The attached patch does that. On a system with 4 CPUs, 100,000 tables
(a big chunk of them in need of vacuuming), and 30 worker processes,
this increased the throughput by a factor of 40. Presumably it will do
even better with more CPUs.
It is still horribly inefficient, but 40 times less so.
That is a good result for such a small change.
The attached patch against REL9_5_STABLE goes a little further. It
claims the table under the lock, but also addresses the problem of all
the workers racing to redo the same table by enforcing an ordering on
all the workers: no worker can claim a table with an oid smaller than
the highest oid claimed by any worker. That is, instead of racing to
the same table, workers leapfrog over each other.
In theory the recheck of the stats could be eliminated, although this
patch does not do that. It does eliminate the special handling of stats
snapshots for autovacuum workers, which cuts back on the excess
rewriting of the stats file somewhat.
I'll send numbers shortly, but as I recall it is over 100 times better than
the original.
As promised, here are numbers. The setup is a 2-core Haswell i3 with a
single SSD. The system is fanless, so it slows down after a few minutes of
load. The database has 40,000 freshly created tiny tables. Autovacuum will
try to analyze them, but that is not much work per table, so the number of
tables analyzed per minute is a pretty good measure of the recheck
overhead and contention among the workers.

Unpatched PostgreSQL 9.5beta1 (I let it run for over an hour, but it did
not get very far):

seconds elapsed actions chunk sec/av av/min
430.1 430.1 1000 1000 0.430 139.5
1181.2 751.1 2000 1000 0.751 79.9
1954.0 772.7 3000 1000 0.773 77.6
2618.5 664.5 4000 1000 0.664 90.3
3305.7 687.2 5000 1000 0.687 87.3
4010.1 704.4 6000 1000 0.704 85.2


A ps sample from partway through the run. Most of the CPU used is by
the stats collector:
$ ps xww | awk '/collector|autovacuum worker/ && !/awk/'
30212 ? Ss 0:00 postgres: autovacuum launcher process
30213 ? Ds 0:55 postgres: stats collector process
30221 ? Ss 0:23 postgres: autovacuum worker process avac
30231 ? Ss 0:12 postgres: autovacuum worker process avac
30243 ? Ss 0:11 postgres: autovacuum worker process avac
30257 ? Ss 0:10 postgres: autovacuum worker process avac



PostgreSQL 9.5beta1 plus my ordered oids / high-water-mark autovacuum patch:

seconds elapsed actions chunk sec/av av/min
13.4 13.4 1000 1000 0.013 4471.9
22.9 9.5 2000 1000 0.010 6299.9
31.9 8.9 3000 1000 0.009 6718.9
40.2 8.3 4000 1000 0.008 7220.2
52.2 12.1 5000 1000 0.012 4973.1
59.5 7.2 6000 1000 0.007 8318.3
69.4 10.0 7000 1000 0.010 6024.7
78.9 9.5 8000 1000 0.010 6311.8
93.5 14.6 9000 1000 0.015 4105.1
104.3 10.7 10000 1000 0.011 5601.7
114.4 10.2 11000 1000 0.010 5887.0
127.5 13.1 12000 1000 0.013 4580.9
140.1 12.6 13000 1000 0.013 4763.0
153.8 13.7 14000 1000 0.014 4388.9
166.7 12.9 15000 1000 0.013 4638.6
181.6 14.8 16000 1000 0.015 4043.9
200.9 19.3 17000 1000 0.019 3113.5
217.5 16.7 18000 1000 0.017 3600.8
231.5 14.0 19000 1000 0.014 4285.7
245.5 14.0 20000 1000 0.014 4286.3
259.0 13.5 21000 1000 0.013 4449.7
274.5 15.5 22000 1000 0.015 3874.2
292.5 18.0 23000 1000 0.018 3332.4
311.3 18.8 24000 1000 0.019 3190.3
326.1 14.8 25000 1000 0.015 4047.8
345.1 19.0 26000 1000 0.019 3158.1
363.5 18.3 27000 1000 0.018 3270.6
382.4 18.9 28000 1000 0.019 3167.6
403.4 21.0 29000 1000 0.021 2855.0
419.6 16.2 30000 1000 0.016 3701.6

A ps sample from partway through the run. Most of the CPU used is by
the workers, not the collector.
$ ps xww | awk '/collector|autovacuum worker/ && !/awk/'
872 ? Ds 0:49 postgres: stats collector process
882 ? Ds 3:42 postgres: autovacuum worker process avac
953 ? Ds 3:21 postgres: autovacuum worker process avac
1062 ? Ds 2:56 postgres: autovacuum worker process avac
1090 ? Ds 2:34 postgres: autovacuum worker process avac

It seems to slow down a bit after a few minutes. I think this may be
because the OS page cache is filling with dirty pages, as it is fully I/O
bound for most of the test duration. Or possibly CPU throttling. I'll see
about retesting on better hardware.

-dg
--
David Gould 510 282 0869 ***@sonic.net
If simplicity worked, the world would be overrun with insects.
David Gould
2015-10-31 10:20:51 UTC
On Sat, 31 Oct 2015 01:37:14 -0700
David Gould <***@sonic.net> wrote:

I've added numbers from the same test setup with Jeff Janes' patch.

PostgreSQL 9.5beta1 with Jeff Janes' vac_move_lock.patch:

seconds elapsed actions chunk sec/av av/min
116.1 116.1 1000 1000 0.116 516.8
224.0 107.9 2000 1000 0.108 556.1
356.4 132.4 3000 1000 0.132 453.2
489.8 133.4 4000 1000 0.133 449.8
602.5 112.7 5000 1000 0.113 532.3
706.6 104.1 6000 1000 0.104 576.3
801.9 95.3 7000 1000 0.095 629.5
886.3 84.4 8000 1000 0.084 710.9
983.0 96.7 9000 1000 0.097 620.5
1074.5 91.5 10000 1000 0.091 656.1
1168.2 93.7 11000 1000 0.094 640.1
1259.5 91.3 12000 1000 0.091 656.9
1350.3 90.8 13000 1000 0.091 661.1
1434.2 83.9 14000 1000 0.084 715.4
1516.0 81.9 15000 1000 0.082 733.0

A ps sample during the run. Note that the stats collector is very busy.
3006 ? Ds 2:43 postgres: stats collector process
3014 ? Ss 1:42 postgres: autovacuum worker process avac
3044 ? Ss 1:29 postgres: autovacuum worker process avac
3053 ? Ds 1:27 postgres: autovacuum worker process avac
3060 ? Ss 1:25 postgres: autovacuum worker process avac
Post by David Gould
Unpatched PostgreSQL 9.5beta1 (I let it run for over an hour, but it did
not get very far):
seconds elapsed actions chunk sec/av av/min
430.1 430.1 1000 1000 0.430 139.5
1181.2 751.1 2000 1000 0.751 79.9
1954.0 772.7 3000 1000 0.773 77.6
2618.5 664.5 4000 1000 0.664 90.3
3305.7 687.2 5000 1000 0.687 87.3
4010.1 704.4 6000 1000 0.704 85.2
A ps sample from partway through the run. Most of the cpu used is by
$ ps xww | awk '/collector|autovacuum worker/ && !/awk/'
30212 ? Ss 0:00 postgres: autovacuum launcher process
30213 ? Ds 0:55 postgres: stats collector process
30221 ? Ss 0:23 postgres: autovacuum worker process avac
30231 ? Ss 0:12 postgres: autovacuum worker process avac
30243 ? Ss 0:11 postgres: autovacuum worker process avac
30257 ? Ss 0:10 postgres: autovacuum worker process avac
seconds elapsed actions chunk sec/av av/min
13.4 13.4 1000 1000 0.013 4471.9
22.9 9.5 2000 1000 0.010 6299.9
31.9 8.9 3000 1000 0.009 6718.9
40.2 8.3 4000 1000 0.008 7220.2
52.2 12.1 5000 1000 0.012 4973.1
59.5 7.2 6000 1000 0.007 8318.3
69.4 10.0 7000 1000 0.010 6024.7
78.9 9.5 8000 1000 0.010 6311.8
93.5 14.6 9000 1000 0.015 4105.1
104.3 10.7 10000 1000 0.011 5601.7
114.4 10.2 11000 1000 0.010 5887.0
127.5 13.1 12000 1000 0.013 4580.9
140.1 12.6 13000 1000 0.013 4763.0
153.8 13.7 14000 1000 0.014 4388.9
166.7 12.9 15000 1000 0.013 4638.6
...
Post by David Gould
A ps sample from partway through the run. Most of the CPU used is by
the workers, not the collector.
$ ps xww | awk '/collector|autovacuum worker/ && !/awk/'
872 ? Ds 0:49 postgres: stats collector process
882 ? Ds 3:42 postgres: autovacuum worker process avac
953 ? Ds 3:21 postgres: autovacuum worker process avac
1062 ? Ds 2:56 postgres: autovacuum worker process avac
1090 ? Ds 2:34 postgres: autovacuum worker process avac
-dg
--
David Gould 510 282 0869 ***@sonic.net
If simplicity worked, the world would be overrun with insects.
David Gould
2015-10-31 12:18:53 UTC
I've collected strace output for a one-minute interval with 4 autovacuum
workers on my test setup, for stock 9.5beta1, Jeff Janes' vac_mov_lock patch,
and my ordered oids / high-water-mark patch. These results are normalized to
times and number of calls per table analyzed by autovacuum.

To refresh, on this setup:

Patch version    TPM (tables/min)
none                           85
vac_mov_lock                  600
ordered oids                 5225

Strace summary:

PostgreSQL 9.5beta1 cost per table @ 85 tables per minute.
-----------------------------------------------------------

calls msec system call [ 4 autovacuum workers ]
------ ------ -------------------
19.46 196.09 select(0, <<< Waiting for stats snapshot
3.26 1040.46 semop(43188238, <<< Waiting for AutovacuumScheduleLock
2.05 0.83 sendto(8, <<< Asking for stats snapshot

calls msec system call [ stats collector ]
------ ------ -------------------
3.02 0.05 recvfrom(8, <<< Request for snapshot refresh
1.55 248.64 rename("pg_stat_tmp/db_16385.tmp", <<< Snapshot refresh



9.5beta1 + Jeff Janes vac_mov_lock.patch cost per table @ 600 tpm
-----------------------------------------------------------------

calls msec system call [ 4 autovacuum workers ]
------ ------ -------------------
24.90 260.16 select(0, <<< Waiting for stats snapshot
1.41 0.02 sendto(8, <<< Asking for stats snapshot

calls msec system call [ stats collector ]
------ ------ -------------------
1.29 0.02 recvfrom(8, <<< Request for snapshot refresh
0.59 62.40 rename("pg_stat_tmp/db_16385.tmp", <<< Snapshot refresh



9.5beta1 + David Gould autovacuum ordering patch cost per table @ 5225 tpm
--------------------------------------------------------------------------

calls msec system call [ 4 autovacuum workers ]
------ ------ -------------------
0.63 6.34 select(0, <<< Waiting for stats snapshot
0.21 0.01 sendto(8, <<< Asking for stats snapshot
0.07 0.00 semop(43712518, <<< Waiting for AutovacuumLock

calls msec system call [ stats collector ]
------ ------ -------------------
0.40 0.01 recvfrom(8, <<< Request for snapshot refresh
0.04 6.75 rename("pg_stat_tmp/db_16385.tmp", <<< Snapshot refresh

-dg
--
David Gould ***@sonic.net
If simplicity worked, the world would be overrun with insects.
David Gould
2015-11-03 22:10:06 UTC
On Fri, 30 Oct 2015 23:19:52 -0700
Post by David Gould
The attached patch against REL9_5_STABLE goes a little further. It
claims the table under the lock, but also addresses the problem of all
the workers racing to redo the same table by enforcing an ordering on
all the workers: no worker can claim a table with an oid smaller than
the highest oid claimed by any worker. That is, instead of racing to
the same table, workers leapfrog over each other.
In theory the recheck of the stats could be eliminated, although this
patch does not do that. It does eliminate the special handling of stats
snapshots for autovacuum workers, which cuts back on the excess
rewriting of the stats file somewhat.
I'll send numbers shortly, but as I recall it is over 100 times better
than the original.
I sent performance test data and a setup for reproducing it elsewhere in
the thread. I also ran tests on a larger system (128GB memory, many cores,
2x SSD with battery-backed RAID).

This is for an idle system with 100,000 new small tables to analyze. I ran
each test for an hour or 5000 tables processed, whichever came first. "jj"
refers to the patch from Jeff Janes, "dg" to the attached patch (same as
previous).

/autovacuum actions per minute/
workers 9.5b1 jj dg
------- ----- ---- -----
1 183 171 285
4 45 212 1158
8 23 462 1225


Could someone please take a look at the patch and comment? Thanks.

-dg
--
David Gould 510 282 0869 ***@sonic.net
If simplicity worked, the world would be overrun with insects.
Alvaro Herrera
2015-11-03 22:24:25 UTC
Post by David Gould
This is for an idle system with 100,000 new small tables to analyze. I ran
each test for an hour or 5000 tables processed, whichever came first. "jj"
refers to the patch from Jeff Janes, "dg" to the attached patch (same as
previous).
/autovacuum actions per minute/
workers 9.5b1 jj dg
------- ----- ---- -----
1 183 171 285
4 45 212 1158
8 23 462 1225
Nice numbers.
Post by David Gould
Could someone please take a look at the patch and comment? Thanks.
1. What's with all the FIXMEs?

2. I think you need more of an explanation of what your patch actually
does.

3. Do we want to backpatch? Changes in behavior aren't acceptable on
existing branches, because they might destabilize autovacuum behavior
that's been carefully tuned in existing systems. So if we want
something to backpatch, ideally it shouldn't change the ordering in
which tables are vacuumed, but should instead arrive at the same results
faster. (I don't care about this restriction myself, but others do, and
strongly so.)

4. In the master branch, behavior changes are acceptable, so we can do
something more invasive.

5. Is it easier to use a binary heap rather than the OID list thing you
have? (See src/include/lib/binaryheap.h.) I don't think so, but it's
worth asking. Note that older branches don't have this, so a
backpatchable fix should not rely on it.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
David Gould
2015-10-31 04:23:17 UTC
On Fri, 30 Oct 2015 12:17:11 -0300
Post by Alvaro Herrera
Post by David Gould
The cost delays are all 0. We care about bloat, not bandwidth.
Anyway, they are not actually vacuuming. They are waiting on the
VacuumScheduleLock and requesting fresh snapshots from the
stats collector.
Oh, I see. Interesting. Proposals welcome. I especially dislike the
("very_expensive") pgstat check.
I have an effective patch for this issue that I think is small enough to be
backpatched. I'm cleaning it up a bit and will send it this weekend.

-dg
--
David Gould 510 282 0869 ***@sonic.net
If simplicity worked, the world would be overrun with insects.