docker-library / postgres

Docker Official Image packaging for Postgres

Home Page: http://www.postgresql.org


Upgrading between major versions?

roosmaa opened this issue · comments

There doesn't seem to be a good way to upgrade between major versions of postgres. When sharing the volume with a new container running a newer version of postgres, it won't start because the data directory hasn't been upgraded. pg_upgrade, on the other hand, appears to require the old installation's binaries, so upgrading the data files from the new server container is also difficult.

It would be nice if there was some suggested way of doing this in the readme. Maybe even some meta container which does the upgrading from version to version?

The "Usage" section of this document describes what pg_upgrade does: http://www.postgresql.org/docs/9.4/static/pgupgrade.html

I guess my concern didn't come across well in the first comment... Let's try again.

pg_upgrade functionality is perfectly clear: you need to have both the [old] and [new] postgres versions installed at the same time to upgrade the data structures on disk. But what is not clear is the best way to go about the upgrade when using containers, since every container only has a single version of postgres available/installed.

  • Should one create a custom upgrade-from-X-to-Y container with both X and Y versions of postgres installed?
  • Or should one just apt-get install new postgres version into the existing container, upgrade the data and then replace the container with one of the "vanilla" containers?
  • Or some even more clever way of doing this?

It would be fairly useful to have a section in the Readme regarding how to approach this. (And what to avoid?)

I've been wondering about/looking out for a good way to handle this myself. If someone's got a good flow that works well for them, I'm very interested in getting something documented (and "transition images" added if that's what's necessary here -- this is a common enough thing IMO that it doesn't really make sense to have everyone create their own for their one-off needs).

Perhaps for each "supported" version, we could add a variant that also installs the latest major release, and then we document how to create your own one-off transition container if you need a transition more esoteric than that?

@roosmaa: my comment wasn't so much directed at you as intended as background.

+1 for major version upgrade instructions.

Yet another alternative, for smaller databases, would be for the old container to run pg_dumpall and save the dump somewhere in the volume. Then one stops the container, starts a new one with the new major version, and imports the dump.
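For what it's worth, a rough sketch of that flow with plain docker commands might look like this (all names, versions, and paths here are placeholders, assuming the old container's data lives on a named volume pgdata-old):

# dump inside the old container; the dump lands in the old data volume
docker exec postgres-old bash -c 'pg_dumpall -U postgres > /var/lib/postgresql/data/upgrade_dump.sql'
docker stop postgres-old

# start the new major version on a fresh volume, with the old volume mounted read-only
docker run -d --name postgres-new -e POSTGRES_PASSWORD=mysecret \
  -v pgdata-new:/var/lib/postgresql/data \
  -v pgdata-old:/mnt/old:ro \
  postgres:9.5

# once the new server has finished initializing, import the dump
docker exec postgres-new psql -U postgres -f /mnt/old/upgrade_dump.sql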

Yeah this is a major pain ... at the moment. Would be great to get some official way to do it.

Yeah, just nuked my postgres install by accident with an upgrade. Are there any plans for this?

I am not a docker guru, but I could imagine that it is possible to add some script to the container that runs when the container starts. That script should check the database files, and when it detects the correct version, simply start the postgresql engine. If not, it should decide to run pg_upgrade or dump/reload. Shouldn't be rocket science?

+1.

I'm just going to nuke what I have since it's just a personal server I don't really use, but this seems like a major pain maintenance wise.

I've had success upgrading by launching another postgres instance with the new version and then using pg_dumpall and psql to pipe all the data across:

docker exec postgres-old pg_dumpall -U postgres | docker exec -i postgres-new psql -U postgres

It's rather simple :)

pg_upgrade is supposed to be faster though, I'm interested if someone has an easy way of using it.

I was attempting to hack up a bash script to manage the upgrade process between two containers of different versions by using pg_upgrade, but I hit a roadblock:

#!/bin/bash

#
# Script to migrate PostgreSQL data from one version to another
#

set -e

OLD_CONTAINER=$1
OLD_VOLUME=$2

NEW_CONTAINER=$3
NEW_VOLUME=$4

if [ -z "$OLD_CONTAINER" ] || [ -z "$OLD_VOLUME" ] || [ -z "$NEW_CONTAINER" ] || [ -z "$NEW_VOLUME" ]; then
  echo -e
  echo -e "Usage: ./pg_upgrade_docker.sh [old container name] [old volume name] [new container name] [new volume name]"
  echo -e "Example: ./pg_upgrade_docker.sh postgres94 postgres94_data postgres95 postgres95_data"
  echo -e
  exit 1;
fi

# Get the major version and the pg binaries out of the old postgres container
(docker start "$OLD_CONTAINER" || true) > /dev/null
OLD_MAJOR="$(docker exec "$OLD_CONTAINER" bash -c 'echo "$PG_MAJOR"')"

OLD_BIN_DIR="/tmp/pg_upgrade/bin/$OLD_MAJOR"
mkdir -p "$OLD_BIN_DIR"
rm -rf "$OLD_BIN_DIR"/*
docker cp "$OLD_CONTAINER":"/usr/lib/postgresql/$OLD_MAJOR/bin/." "$OLD_BIN_DIR"

(docker stop "$OLD_CONTAINER" || true) > /dev/null

# Get the major version out of the new postgres container
(docker start "$NEW_CONTAINER" || true) > /dev/null
NEW_MAJOR="$(docker exec "$NEW_CONTAINER" bash -c 'echo "$PG_MAJOR"')"
(docker stop "$NEW_CONTAINER" || true) > /dev/null

# Create a temp container running the new postgres version which we'll just use to migrate data from one volume to another.
# This container will use the old binaries we just extracted from the old container.
# We can't reuse the existing "new" container because we have to bind extra volumes for the update to work.
NEW_IMAGE="$(docker ps -a --filter "name=$NEW_CONTAINER" --format "{{.Image}}")"
docker run -v "$OLD_BIN_DIR":/tmp/old-pg-bin -v "$OLD_VOLUME":/tmp/old-pg-data -v "$NEW_VOLUME":/tmp/new-pg-data \
  --name temp_postgres_util "$NEW_IMAGE" su - postgres -c "cd /tmp && /usr/lib/postgresql/$NEW_MAJOR/bin/pg_upgrade \
    -b /tmp/old-pg-bin -B /usr/lib/postgresql/$NEW_MAJOR/bin \
    -d /tmp/old-pg-data/ -D /tmp/new-pg-data/ \
    -o \"-c config_file=/tmp/old-pg-data/postgresql.conf\" -O \"-c config_file=/tmp/new-pg-data/postgresql.conf\""

# Remove temp container
(docker stop temp_postgres_util) > /dev/null
(docker rm temp_postgres_util) > /dev/null

rm -rf "$OLD_BIN_DIR"

echo -e "Data migration from $OLD_MAJOR to $NEW_MAJOR is complete!"

The idea is a bit convoluted, because I'm extracting the binaries from the old version and mounting them into a new temporary container created from the same image as the container with the new version, along with data volumes from existing containers (the old and the new ones).
My idea was then to use that container to run pg_upgrade, then throw it away (the data would have been migrated through the two volumes).
When running the script, though, I get the following error:

Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok

*failure*
Consult the last few lines of "pg_upgrade_server.log" for
the probable cause of the failure.

connection to database failed: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/tmp/.s.PGSQL.50432"?


could not connect to old postmaster started with the command:
"/tmp/old-pg-bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/tmp/old-pg-data/" -o "-p 50432 -b -c config_file=/tmp/old-pg-data/postgresql.conf -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/tmp'" start
Failure, exiting

Any ideas?

I'm using @Dirbaio's trick to dump all data from the old container and restore it into the new container. Of course, this needs a separate data volume and only runs fast with small data sets. I hope pg_upgrade can become more 'intelligent'; even better if Postgres itself could do the upgrade. I mean, when installing the new version, the app should also know what the old version's format looks like. Maybe also include the necessary old-version binary to do the data 'upgrade', then delete that old-version binary after the upgrade finishes.

Ok so why can't the container be changed to simply have both the latest and the second-to-latest postgresql version installed? (e.g. just put a chroot into the docker container and install the older package from the distribution there, or something. Once a script for that has been made that just needs to be passed the name of the respective older apt package, it shouldn't really be a lengthy process.) At least that would cover the majority of users and it would allow the container to convert the database for them automatically.

I think we should have a complete solution or none at all, as I doubt that most people always use the latest version.

Why is none at all any better? I would assume most people running a production server with some basic security consciousness will probably be running something from the latest release cycle - such major upgrades aren't happening that often, right? (I'm thinking of the usual docker environment here, optimized for flexibility and repeatability, where upgrading and reverting if it goes wrong is usually less painful than for a regular old-school server administrator.) And if doing it for all versions is not feasible, at least doing it for the respective second-to-newest one should be doable.

If for the other cases where the upgrade can't be done there is a descriptive error message like there is now, I don't see how this wouldn't be at least a considerable improvement - although I agree that being able to upgrade from any arbitrary previous version automatically is of course better if it can be done.

It binds resources better invested in developing a general solution, and it could possibly stand in the way of coming up with that general solution.

-Markus


Yes, resignation is the right word. We just do not see how it could be solved in a good way.

If you have time you can do whatever you like -- this is the nature of open source. Whether or not the project leads will accept your PR is a different question and not up to me to decide.

I spent a little time working on a generic way to run pg_upgrade for somewhat arbitrary version combinations (only increasing, i.e. 9.0 to 9.5, but not the reverse), and here's the result:
tianon/postgres-upgrade
(might be easier to read directly over at https://github.com/tianon/docker-postgres-upgrade/blob/master/README.md)

I haven't tested this yet but, since it's built on top of Debian, maybe an option would be to add a script like this to /docker-entrypoint-initdb.d during the upgrade process (ideally making a snapshot of the original data first, when using BTRFS for example):

Considering the current DB is on 9.4 and you want to migrate it to 9.5

apt-get update
apt-get install -y postgresql-9.5 postgresql-contrib-9.5
pg_dropcluster 9.5 main
pg_upgradecluster 9.4 main -m upgrade -k

This is the basic idea. However, this will add some unnecessary downtime. We could save some time by creating a temporary image:

FROM postgres:9.4
RUN apt-get update && apt-get install -y postgresql-9.5 postgresql-contrib-9.5
RUN echo "pg_dropcluster 9.5 main; pg_upgradecluster 9.4 main -m upgrade -k" > /docker-entrypoint-initdb.d/zzz-upgrade.sh

This is just to simplify the explanation; I'd probably use COPY rather than echo ... > ....

The container running from this new image would then only execute the remaining steps and would be run only once during the upgrade after stopping the previous 9.4 container and before starting the new 9.5 container. pg_upgradecluster will reuse the previous configuration when creating the new cluster.

Maybe to reduce the impact on users, one might want to run the 9.4 cluster in a read-only transaction during the upgrade, so that it would mostly keep working, and handle the write errors in the application to tell users the database is being upgraded and that it's currently not possible to write new data to it... If upgrading the cluster would take a while even with the --link/-k option, a read-only mode might give users a better experience than presenting them a static maintenance page for a long while... Maybe it could switch the header to let users know that it's in read-only mode during the upgrade in the meanwhile... It's more effort, but a better user experience for some sorts of applications.
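In case it helps anyone, one crude way to get that read-only effect (assuming superuser access and a container named pg94; ALTER SYSTEM needs 9.4+) is:

docker exec pg94 psql -U postgres -c "ALTER SYSTEM SET default_transaction_read_only = on"
docker exec pg94 psql -U postgres -c "SELECT pg_reload_conf()"

New sessions will then default to read-only transactions; it's a soft guard rather than a hard freeze, since existing sessions and explicit read-write transactions aren't blocked.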

@Jonast, the only complete way is to have images for each set of supported versions, to ensure users can upgrade between them. @tianon has a set of them in tianon/docker-postgres-upgrade. The other major problem is that it requires two folders (volumes) in specific locations to upgrade your database, since pg_upgrade requires both servers to be running and cannot just upgrade in place. Tianon's repo has more information on the requirements of this, including how to do it with one volume (properly set up), so that pg_upgrade can take advantage of it being a single drive and "use hard links instead of copying files to the new cluster".

The reason mariadb can "upgrade automatically" is that the mariadb team has to put in extra work to make newer versions able to read older data directories, whereas postgres can make backwards-incompatible changes in newer releases and not have to worry about file formats from older versions. This is why postgres data is usually stored under a directory named for its version, but that would overcomplicate volume mounts if every version of postgres had a different volume.
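For reference, the single-volume layout those images expect looks roughly like this (versions are just an example; tianon's README is the authoritative source):

# on the host, both clusters live under one directory:
#   /path/on/host/9.6/data  <- existing old cluster
#   /path/on/host/10/data   <- will be created for the new cluster
docker run --rm \
  -v /path/on/host:/var/lib/postgresql \
  tianon/postgres-upgrade:9.6-to-10 \
  --link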

@rosenfeld, unfortunately pg_dropcluster and pg_upgradecluster are tooling specific to the Debian-provided packages and do not work with the Postgres-provided apt/deb packages. Even after writing a new entrypoint to run the upgrade I just get failures, since it assumes specific directory structures and probably an init system:

$ # script:
#!/bin/bash

gosu postgres pg_ctl -D "$PGDATA" \
    -o "-c listen_addresses='localhost'" \
    -w start

gosu postgres pg_dropcluster 9.5 main
gosu postgres pg_upgradecluster 9.4 main -m upgrade -k

gosu postgres pg_ctl -D "$PGDATA" -m fast -w stop
exec gosu postgres "$@"

$ # shortened output (after initializing the db folder with a regular postgres image)
...
waiting for server to start....LOG:  database system was shut down at 2016-10-03 18:22:33 UTC
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
 done
server started
Error: specified cluster does not exist
Error: specified cluster does not exist

Hi @yosifkit. I don't understand why you think pg_upgradecluster being specific to Debian is a problem, since the official PG images are based on Debian. As a matter of fact, this is how I upgraded my PG servers from 9.5.4 to 9.6:

docker pull postgres:9.6 # so we can easily start it after the upgrade is finished
docker run -it --rm --name pg-upgrade -v /mnt/pg/data:/var/lib/postgresql \
  -v /mnt/pg/config:/etc/postgresql --entrypoint bash postgres:9.5.4

# in the bash session in the container I installed 9.6:
apt-get update && apt-get install -y postgresql-9.6 postgresql-contrib-9.6
# Then, in another session, I stopped the server running current PG in order to
# start the upgrade (it's always a good idea to perform a back-up of the data
# before upgrading anyway).
pg_upgradecluster -m upgrade --link 9.5 main
# optionally: pg_dropcluster 9.5 main

After the upgrade is finished just start the new 9.6 container with the same arguments.

Here's how I run the container:

docker run --name pg -v /mnt/pg/scripts:/pg-scripts \
  -v /mnt/pg/data:/var/lib/postgresql \
  -v /mnt/pg/config:/etc/postgresql -p 5432:5432 \
  postgres:9.6 /pg-scripts/start-pg 9.6 10.0.1.0/24

I stop it with docker exec pg pg_ctlcluster -f 9.6 main stop.

Here's what start-pg looks like:

#!/bin/bash
version=$1
net=$2
setup_db(){
  pg_createcluster $version main -o listen_addresses='*' -o wal_level=hot_standby \
    -o max_wal_senders=3 -o hot_standby=on -- -A trust
  pghba=/etc/postgresql/$version/main/pg_hba.conf
  echo -e "host\tall\tpguser\t$net\ttrust" >> $pghba
  echo -e "host\treplication\tpguser\t$net\ttrust" >> $pghba
  pg_ctlcluster $version main start
  psql -U postgres -c '\du' postgres|grep -q pguser || \
    createuser -U postgres -l -s pguser
  pg_ctlcluster $version main stop
}
[ -d /var/lib/postgresql/$version/main ] || setup_db
exec pg_ctlcluster --foreground $version main start

The server is actually managed by some systemd unit files, but this is basically how it works behind the scene.

What about a solution where the new container uses an image with the older postgresql version to launch the old server, and then uses the regular flow to perform the upgrade?
(This is basically nesting the older postgresql container inside the new postgresql container.)

It might be tricky to get it right, but at least the new postgresql image does not need to contain any previous postgresql version.

I love the simplicity of @Dirbaio's approach. You wouldn't have to know what two versions you're migrating between for it to work. If you're worried about the speed of that approach, it sounds like the Upgrade via Replication approach described in the docs would be the best option, over pg_upgrade, in terms of actual downtime.

@dschilling yes, if you can afford the time to set up Slony (I started to read its documentation once but found it very complicated), or if you can't afford any downtime window, that's probably the best plan. Since I'm not comfortable with setting up Slony I preferred to use the method I explained above, because the downtime is much smaller when you use pg_upgrade with --link. Since it can only upgrade a full cluster rather than a single database, I decided to put our production database in a dedicated cluster so that the downtime would be as small as possible while using the pg_upgrade approach.

Instead of Slony, you could use pglogical:
https://2ndquadrant.com/en/resources/pglogical/

I plan to try this approach, using HAProxy to switch from the old to the new version. Pglogical should be easier to use than Slony, but I still need to validate that.

pglogical doesn't work with the vanilla official packages, that's why I'm not comfortable with using it. Also, in that case, you wouldn't be using this Docker image.

It is an extension, not a hack. You can put it in a layer on top of this image.
I agree on the main topic, that it would be nice to be able to do a pg_upgrade with this image.

I gave some thought to this problem. Here is what I came up with:

Given that you want to upgrade from 9.3 to 9.5, you can create two containers to migrate the old data to a new volume. I assume the data resides in local directories on the docker host that are bind-mounted. To perform the upgrade, start one container for each version and share the volumes holding the 9.3 binaries and the data directory. You can then execute pg_upgrade and migrate to a new data directory. Afterwards you could mount the new data directory in your production environment with a postgres 9.5 image.

I wrote this down in a compose file:

version: '2'
services:
  old:
    image: postgres:9.3.14
    command: echo "done"
    volumes:
      - /usr/lib/postgresql/9.3
      - /usr/share/postgresql/9.3
  new:
    image: postgres:9.5.5
    command: >
      bash -c "gosu postgres initdb  && cd /tmp \
      && gosu postgres pg_upgrade \
         -b /usr/lib/postgresql/9.3/bin/ \
         -B /usr/lib/postgresql/9.5/bin/ \
         -d /old/ \
         -D /var/lib/postgresql/data/"
    volumes:
      - ./old:/old
      - ./new:/var/lib/postgresql/data
    volumes_from:
      - old

The first container is just started to fill the volumes with the required binaries. Before the upgrade can be performed, a database has to be initialized inside the 9.5 container. The upgrade has to be invoked by the postgres user (therefore gosu postgres).

Actually the idea comes close to the suggestion that @asprega had.

What are your thoughts on this?

@dastrasmue thanks for the docker compose file. Here is part of the script I used.

old_datadir=somewhere
rm -rf new_data && mkdir new_data

cat > docker-compose.yml << EOF
version: '2'
services:
  old:
    image: postgres:9.5
    command: echo "done"
    volumes:
      - /usr/lib/postgresql/9.5
      - /usr/share/postgresql/9.5
  new:
    image: postgres:9.6
    volumes:
      - ./new_data:/var/lib/postgresql/data
  upgrade:
    image: postgres:9.6
    command: >
      bash -c "
        cd /tmp && \
        gosu postgres pg_upgrade \
           -b /usr/lib/postgresql/9.5/bin/ \
           -B /usr/lib/postgresql/9.6/bin/ \
           -d /old/ \
           -D /var/lib/postgresql/data/"
    volumes:
      - $old_datadir:/old
      - ./new_data:/var/lib/postgresql/data
    volumes_from:
      - old:ro
EOF

docker-compose up -d old new
sleep 5
docker-compose stop new
docker-compose up upgrade

Some possible improvements:

  • mount the old data read-only
  • use the docker entrypoint (e.g. create user, fix permissions)

@asprega I hit a similar problem when I used plain docker commands, because the binaries try to load their libs from the same path - so copying them doesn't work.


I just added #250 to address a problem with upgrading alpine. The postgres binaries are not in the usual MAJOR.MINOR bin folder, so the multi-container approaches above do not work.

I was able to successfully upgrade using the script from @chulkilee. However, I had to chown the contents of the alpine data folder to 999 (alpine uses 70), upgrade, and then chown it back.
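For anyone else hitting that ownership mismatch: the Debian-based images run postgres as uid 999 while the Alpine ones use uid 70, so the chown dance looks roughly like this (the volume name is a placeholder):

# before running the (Debian-based) upgrade image
docker run --rm -v pgdata:/var/lib/postgresql/data alpine chown -R 999:999 /var/lib/postgresql/data

# after the upgrade, hand ownership back to the alpine postgres user
docker run --rm -v pgdata:/var/lib/postgresql/data alpine chown -R 70:70 /var/lib/postgresql/data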


Here is what I did, loosely, to upgrade 9.4 to 9.6:
(You probably want to discard the PGDATA environment variable)

docker run --name postgres94 -d -v /tmp/pg/mount:/var/lib/postgresql/data -e 'PGDATA=/var/lib/postgresql/data/pgdata' postgres:9.4
docker exec -it postgres94 /bin/bash

# Inside the container
cd var/lib/postgresql/data
touch db.sql
chown postgres db.sql
chmod 777 db.sql
su postgres
pg_dumpall > db.sql
exit

docker stop postgres94 && docker rm postgres94
docker run --name postgres96 -d -v /tmp/pg/mount:/var/lib/postgresql/data2 -v /tmp/pg/9.6:/var/lib/postgresql/data -e 'PGDATA=/var/lib/postgresql/data/pgdata' postgres:9.6
docker exec -it postgres96 /bin/bash

# Inside the container
cd var/lib/postgresql/data2
su postgres
psql -f db.sql

/tmp/pg/9.6/data/pgdata now contains the data folder for 9.6. The postgres 9.6 running image is usable.

@Tapuzi Yes, that works, but dumping and re-importing takes a long time for big databases, causing a lot of downtime/read-only time. pg_upgrade is relatively fast, but not straightforward to use with the default docker images.
I think this thread is about upgrading Postgresql faster, or with less downtime, than the approach you just described.

I created a script which uses tianon/docker-postgres-upgrade to upgrade the data of a postgres container in a volume. I hope it helps someone.

OLD_PG=9.5
NEW_PG=9.6
PG_DATA_VOLUME=data

docker volume create pg_data_new

# Upgrade db files to pg_data_new volume
docker run --rm -v $PG_DATA_VOLUME:/var/lib/postgresql/$OLD_PG/data -v pg_data_new:/var/lib/postgresql/$NEW_PG/data tianon/postgres-upgrade:$OLD_PG-to-$NEW_PG

# Move files to old volume
docker run --rm -v $PG_DATA_VOLUME:/old -v pg_data_new:/new postgres:$NEW_PG /bin/bash -c 'mv /old/pg_hba.conf /tmp && rm -rf /old/* && cp -a /new/. /old/ && mv /tmp/pg_hba.conf /old/'

docker volume rm pg_data_new

I got a better solution:

This is AFTER you changed the docker-compose.yml to the new version; then run these commands one by one. (PS: this is only for the alpine version; you will have to change the apk add packages if you use the standard version, and you will also have to change the version of postgres you're using (obviously :P).)

docker-compose run --no-dep --rm postgres bash; docker-compose down
  # install the old server 
  apk add --update alpine-sdk ca-certificates openssl tar bison coreutils dpkg-dev dpkg flex gcc libc-dev libedit-dev libxml2-dev libxslt-dev make openssl-dev perl perl-ipc-run util-linux-dev zlib-dev -y; curl -s https://ftp.postgresql.org/pub/source/v9.5.4/postgresql-9.5.4.tar.bz2 | tar xvj -C /var/lib/postgresql/; cd /var/lib/postgresql/postgresql-9.5.4/; ./configure --prefix=/pg9.5; make; make install
  # update the data 
  su postgres -c "initdb --username=postgres /var/lib/postgresql/data2; chmod 700 /var/lib/postgresql/data; pg_upgrade -b /pg9.5/bin/ -B /usr/local/bin/ -d /var/lib/postgresql/data/ -D /var/lib/postgresql/data2; cp /var/lib/postgresql/data/pg_hba.conf /var/lib/postgresql/data2/pg_hba.conf; rm -rf /var/lib/postgresql/data/*; mv /var/lib/postgresql/data2/* /var/lib/postgresql/data/"

Just for reference:

The OpenShift guys implemented an "automagical" POSTGRES_UPGRADE environment variable and they included both postgres-server versions in the image (9.5 and 9.6 ... upgrading from 9.4 is not possible):

https://hub.docker.com/r/centos/postgresql-96-centos7/

The image is around 111 MB in size... I think people would love to download a few extra bytes for convenient updates.

It seems to me what makes this "hard" is the need to mount two data directories from the host (as demonstrated by @tianon's helper images, except under ideal circumstances) and then manage the binding of the final image to the right folder. We use deployment automation so we can manage directories like this automatically, but it seems inefficient to force everyone to manage the same complexity outside of these images.

Possibly heretical, but food for thought:

  • What if data was instead stored in $PGDATA/$PG_MAJOR (or even $PGDATA/$PG_MAJOR/data)
  • A startup script searches (sequentially):
    • $PGDATA/$PG_MAJOR
    • Previous $PGDATA/$PG_MAJOR
    • $PGDATA (for backwards compatibility)
  • If no database is found, it creates $PGDATA/$PG_MAJOR
  • If it finds a database, it uses the PG_VERSION file to decide if an upgrade is necessary
  • If an upgrade is necessary, a separate (to make it easier to customize) upgrade script is called (a rough sketch of this flow follows below). By default, the script:
    • Installs the old binaries (i.e. the old PG_VERSION)
    • Upgrades using pg_upgrade
    • Uninstalls the old binaries

I appreciate that changing the interpretation of $PGDATA could be viewed as "significant", but it has benefits:

  • You need only mount a single folder from the host (with backwards compatible scripts)
  • The strategy is "generic" since it detects the current version to make upgrade decisions
  • The strategy can use pg_upgrade so it'll be fast
  • There's no need to overwrite/replace existing data; eventually, you delete the old $PG_MAJOR folder
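For concreteness, a minimal sketch of that startup search/detect flow; every path and decision here is hypothetical, nothing the image currently does:

#!/bin/bash
# hypothetical startup fragment, following the search order described above
set -e

if [ -s "$PGDATA/$PG_MAJOR/PG_VERSION" ]; then
  datadir="$PGDATA/$PG_MAJOR"        # current layout, nothing to do
elif [ -s "$PGDATA/PG_VERSION" ]; then
  datadir="$PGDATA"                  # legacy flat layout
else
  # newest previous $PGDATA/$PG_MAJOR folder, if any
  datadir=$(ls -d "$PGDATA"/*/ 2>/dev/null | sort -V | tail -n 1)
fi

if [ -z "$datadir" ]; then
  echo "no existing cluster found; initdb into $PGDATA/$PG_MAJOR"
elif [ "$(cat "$datadir/PG_VERSION")" != "$PG_MAJOR" ]; then
  echo "cluster at $datadir is version $(cat "$datadir/PG_VERSION"); run the upgrade script to reach $PG_MAJOR"
fi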

Imho export & reimport via SQL would be the easiest way.
One could just put the old SQL dump in some volume and it could be imported automatically on start.
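The official image does in fact run any *.sql files it finds in /docker-entrypoint-initdb.d on first start (only when the data directory is empty), so a rough version of that idea could look like this (names, paths, and versions are placeholders):

docker exec postgres-old pg_dumpall -U postgres > /srv/pg-init/dump.sql
docker stop postgres-old

docker run -d --name postgres-new -e POSTGRES_PASSWORD=mysecret \
  -v pgdata-new:/var/lib/postgresql/data \
  -v /srv/pg-init:/docker-entrypoint-initdb.d:ro \
  postgres:10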

It seems to me what makes this "hard" is the need to mount two data directories from the host

Not exactly -- for me, the thing that makes this "hard" (and unacceptable by default for the official images) is that it requires having both the target old and target new versions of PostgreSQL installed in the image. To support arbitrary version bumps from an older version to a newer version, we'd have to install all versions (because the "export and reimport" method described is essentially how pg_upgrade works, in its most basic form).

To support arbitrary version bumps from an older version to a newer version, we'd have to install all versions

Why can't the upgrade script install the appropriate version (as I described), run the upgrade, then uninstall?

@Jonast I figure people with additional constraints like that are sophisticated enough to understand how to adjust their Docker workflow. Best to design a default for the "plug and play" users who aren't going to be able to troubleshoot something harder.

If you want to make it "even easier", offer a 10.0-upgrade image that pre-installs everything. A no-internet user can use that image for the upgrade. If they're concerned about space, they can switch back to 10.0 when done. Since the versions match, the upgrade logic will not run and the lack of internet will not matter.

Philosophically, I think the ideal strategy allows most people to change 9.4 to 10.0 without changing mount points or running any special commands. In my proposal, internet-enabled users can. Internet-disabled users have to use the -upgrade image (either always or transiently).

@Jonast whether or not you bake in the second-to-last is orthogonal to the substance of my proposal. For that matter, why not include all of the binaries and... after the upgrade logic runs (or is skipped)... uninstall all of them? Won't that simultaneously support all upgrades (including your "best practice") and still minimize the final footprint?

My (orthogonal) question is how do you "allow... people to change 9.4 to 10.0 without changing mount points or running any special commands".

  • I don't think there's any debate that the right script can handle the "no special commands" part.
  • But I don't see how it works with the current binding/storage/$PGDATA strategy.

The more user-friendly versions of @tianon's scripts address the binding issue by binding "up" a level (or two) so the versions are found in $PG_MAJOR folders. I think it makes sense to generalize @tianon's solution for the actual postgres containers (with some backwards compatibility).

@claytondaley

For that matter, why not include all of the binaries

Because the image size will blow up if you have them installed there and only remove them as soon as the container launches. I wouldn't mind it much, but others in this thread have expressed concern.

Currently there is no official statement on how to upgrade, is there?

The "official statement" is that you upgrade in Docker just like you would any other instance of PostgreSQL: via pg_upgrade.

The only thing that makes doing so tricky in the case of these images is the need to have both the new and old versions installed concurrently. See https://github.com/tianon/docker-postgres-upgrade (which is linked a couple times in the previous comments) for where I'm maintaining a set of images which accomplish this in the bare minimum way possible (which you can either use directly or refer to in building your own solution).

Is there no way to do a SQL dump with the old version & reimport it into the new version to do an upgrade?

Here's my two cents:

  • dump & restore may work, but you should prefer pg_upgrade for many reasons (search for them)
  • upgrading PostgreSQL data to the version of the currently running docker image automatically should not happen; it's always safer to require explicit instruction to perform backward-incompatible changes

PostgreSQL has an official way to upgrade, and you can do it. The question is whether this "official" postgres docker image should support it out of the box. It would be good to have, but technically it needs binaries for both versions (the current version and yours), which is impractical to include in a docker image.

I believe the consensus for an "official docker image" is "minimum wrapper around the standard package". It is not a full installer with an upgrade feature. This upgrade problem is nothing specific to docker.

I think the documentation of the postgres docker image just needs to mention pg_upgrade - and probably link to this issue (or even https://github.com/tianon/docker-postgres-upgrade).

Dump and reload is an OK solution (although definitely sub-optimal, as described above) -- just use pg_dump like you normally would, then import it into the new version and you're good to go. As noted above, that's basically how pg_upgrade works in many cases (and why it needs both versions installed).

Sarusso, it's complicated. In general, upgrades to a new minor release within the same major release won't change the on-disk data format. So, upgrading from ~~9.5 to 9.6 or~~ 10.3 to 10.4 should work just fine keeping the same data files. Upgrading a major release, say from a 9.x.y release to a 10.x, is more complex. The data file structure can change, sometimes significantly, between major releases. I'm not a developer for the project, but I can guess that it would be a lot of work to maintain a way to read the old format and then re-write the files in the new format. I can't even imagine the bugs and the possible data loss concerns of something like that. Luckily, they have very long support policies for releases, giving lots of time to figure out a migration strategy.

There actually is support for migrating data files in PostgreSQL, it's called pg_upgrade. The Docker container is special in that it's difficult to use this support because it needs both the older and the newer versions installed on the same system. The Docker container isn't set up to work this way. Alternately, you can also look into replication if you have large data sets or absolutely can't have any downtime. Options are actually discussed on this official documentation page.

I'd suggest that if you don't need any new feature in 10, move to the latest 9.x release you can (9.3.23 is latest in the 9.3 series right now) and then read the information about ways to upgrade and plan something out. You have a ~~while to do so, until 2021~~ bit of time, until September 2018 if you care about active support. Of course, if it's working fine, then you have even longer if you don't mind no active support.

Edit: Corrected myself a bit as I was confused on the pre-10 version scheme.

@srvrguy sorry, I deleted my comment just a few minutes after posting as on second thought it seemed kind of useless for the thread, but your (very well structured) answer actually says otherwise.

Here is my original comment for future reference:

"Why Postgres (not the containers!) does not include support for migrating its own data files from the previous major version completely autonomously? Too complicated?

I just upgraded my Postgres from 9.3 to 10 and I was expecting everything to work by just renaming the data directory. I might have been a bit too optimistic hoping to be able to jump from 9.3 to 10, but Postgres 10 being able to accept data files from 9.6 was the expected behaviour. And so on backwards, i.e. 9.3 to 9.4, 9.4 to 9.5, and 9.5 to 9.6.

Dumping the database is not an option, just too much data..."

@srvrguy thank you for your reply.

My understanding is that even upgrading from 9.3 to 9.4 is not possible without running pg_dump or pg_upgrade as they are major, not minor. So the last minor where I can update without too much hassle is 9.3.23 (supported until September).

By the way they changed major/minor numbering from version 10 (questionable choice) :)

We have no replication (nor interest in diving into it) since we run a single Postgres on our staging platform (where we would like to upgrade) and just use RDS in production.

I will give it a proper try with pg_upgrade; for my specific use case it would not be a problem at all to have two Postgres versions in the same container, but for the official containers the problem remains.

Sorry, I did misread. It looks like you are correct on the 9.x releases. You do have until September before 9.3 is officially "unsupported", at least.

If you're using the Docker images and can't do a dump and restore, you could try the images at https://github.com/tianon/docker-postgres-upgrade which are designed to allow pg_upgrade to work. Of course, I'd try with a copy of the data first to make sure it works and you have the steps worked out.

I upgraded PostgreSQL 10 to 11 in a docker-compose environment.

I went the dump-and-import path and that was frustration-free. My DB's service name is db and the old volume was also db.

  1. I duplicated the old service definition, changed the name and chose a new volume.
  db11:
    image: postgres:11
    ...
    volumes:
      - db11:/var/lib/postgresql/data
  2. Then I dumped the old DB and imported it into the new one
docker-compose up --no-deps -d db db11

docker-compose exec db pg_dumpall -U USERNAME > pgdump10
docker-compose exec -T db11 psql -U USERNAME < pgdump10
# if you are fancy, you can pipe them
# docker-compose exec db pg_dumpall -U USERNAME | \
#   docker-compose exec -T db11 psql -U USERNAME
  3. After adapting the docker-compose.yml to use the new DB under the old name
  db:
    image: postgres:11
    ...
    volumes:
      - db11:/var/lib/postgresql/data
  4. I deleted the old volume after checking the new one is error-free
docker volume rm ....._db

I feel very safe with this approach because on error you can always go back to the old data and PG version.

I am wondering about automation possibilities. I guess we could have a service that watches Postgresql containers for the "incompatible versions" error via the Docker API and starts a fitting Postgresql version for the dump and import (or pg_upgrade).
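Purely as a thought experiment, such a watcher could start out as something as simple as polling container logs for the incompatibility error (the filter and the handling here are placeholders; a real version would presumably trigger the dump/import or pg_upgrade run):

for c in $(docker ps -a --filter ancestor=postgres --format '{{.Names}}'); do
  if docker logs --tail 50 "$c" 2>&1 | grep -q 'database files are incompatible with server'; then
    echo "$c needs a major-version upgrade (pg_upgrade or dump/restore)"
  fi
done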

@jonaseberle Great idea, to have two containers in parallel: pg10 and pg11. I think I'll do that too. Thanks for the dump-import code snippet :-)

@jonaseberle Thank you very much! Worked great! Just had to add the -T flag to the restore command (because I got "the input device is not a TTY", see docker/compose#5696) like this:

docker-compose exec -T db11 psql -U USERNAME < pgdump10

@LouisMT thank you for pointing that out. I had to add -T too but forgot to write it. I edited my comment.

Upgrading Docker image: postgres:10 -> postgres:latest

I just tried following the upgrade documentation (https://www.postgresql.org/docs/10/pgupgrade.html)

However, when stopping the database with pg_ctl it'll also signal the container to stop (thus breaking the upgrade process before you can get to the pg_upgrade step).

The postgres docker image needs an environment variable to explicitly upgrade the database to the latest version if the postgres server is started with an incompatible database version.

Or there needs to be a docker image to do the upgrade, like postgres:db10to11 or something.

Or there needs to be a docker image to do the upgrade, like postgres:db10to11 or something.

This PoC seems to work: https://github.com/tianon/docker-postgres-upgrade

I have made a Proof of Concept of a new entrypoint.sh script, to perform both major PG version upgrades (with pg_upgrade) and schema updates. It is far from perfect, see the todo 🙂.

The benefit is that it just upgrades with:

$ docker-compose up -d

https://github.com/bwbroersma/docker-postgres-upgrade

This is also an issue with image tags. I used 12 not thinking that beta releases would ever be paired with release tags. I'm now stuck on 12-beta3 having to deal with updating to 12/12.0. Such a pain.

It's 2021 now, is there any progress on this issue? 🙏🏾

It's 2021 now, is there any progress on this issue? 🙏🏾

As far as I know, none of the upstream PostgreSQL limitations on upgrading have changed.


I'd highly recommend trying tianon/docker-postgres-upgrade for an upgrade scenario. It's clearly meant to be a PoC, but in my case (upgrading a GitLab instance from 9.5 to 9.6) it worked out perfectly.

Looking at tianon's code and procedure, I'd say it's built in a very flexible and comprehensible way.

I've done the following to upgrade from 10 to 12:

  $ docker run -d --rm --name postgres -e POSTGRES_PASSWORD=mysecret -v $PWD/postgres-data:/var/lib/postgresql/data postgres:10
  $ docker exec -it postgres pg_dumpall -U postgres -f /var/lib/postgresql/data/dump
  $ docker stop postgres
  $ mv postgres-data postgres-data-old
  $ docker run -d --rm --name postgres -e POSTGRES_PASSWORD=mysecret -v $PWD/postgres-data:/var/lib/postgresql/data postgres:12
  $ cp postgres-data-old/dump postgres-data/dump
  $ docker exec -it postgres pg_restore -U postgres -f /var/lib/postgresql/data/dump

Once it all works: rm -rf postgres-data-old postgres-data/dump.

docker exec -it postgres pg_restore -U postgres -f /var/lib/postgresql/data/dump

That line hung when I tried it. Using psql did work:

docker exec -it postgres bash
psql -U postgres -d postgres < /var/lib/postgresql/data/dump

According to the pg_restore docs, -f expects an output file name.

(As I have a 400GB production database, running pg_dumpall is just not an option.)

I don't use the base images but the alpine ones, so @tianon's images do not work for me, probably because of differences in "stuff".
It saved me time though, and I used it as a basis for a very simple Dockerfile and its associated docker-upgrade entrypoint:

FROM postgres:13-alpine AS old

FROM postgres:14-alpine

COPY --from=old /usr/local /usr/local-old

ENV PGBINOLD /usr/local-old/bin
ENV PGBINNEW /usr/local/bin

ENV PGDATAOLD /var/lib/postgresql/13/data
ENV PGDATANEW /var/lib/postgresql/14/data

RUN mkdir -p "$PGDATAOLD" "$PGDATANEW" \
	&& chown -R postgres:postgres /var/lib/postgresql

WORKDIR /var/lib/postgresql

COPY docker-upgrade /usr/local/bin/

ENTRYPOINT ["docker-upgrade"]

# recommended: --link
CMD ["pg_upgrade"]

The only change to the entrypoint is s/gosu/su/:

#!/bin/bash
set -e

if [ "$#" -eq 0 -o "${1:0:1}" = '-' ]; then
	set -- pg_upgrade "$@"
fi

if [ "$1" = 'pg_upgrade' -a "$(id -u)" = '0' ]; then
	mkdir -p "$PGDATAOLD" "$PGDATANEW"
	chmod 700 "$PGDATAOLD" "$PGDATANEW"
	chown postgres .
	chown -R postgres "$PGDATAOLD" "$PGDATANEW"
	exec su postgres "$BASH_SOURCE" "$@"
fi

if [ "$1" = 'pg_upgrade' ]; then
	if [ ! -s "$PGDATANEW/PG_VERSION" ]; then
		PGDATA="$PGDATANEW" eval "initdb $POSTGRES_INITDB_ARGS"
	fi
fi

exec "$@"

With this, I was able to very simply run (the locale was only because my postgres 13 was init'ed with it; you most probably won't need it):

docker run --rm \
  -v /data/docker-vol/postgres/data13:/var/lib/postgresql/13/data \
  -v /data/docker-vol/postgres/data14:/var/lib/postgresql/14/data \
  -e POSTGRES_INITDB_ARGS=--locale=fr_FR.UTF-8 \
  pg-upgrade:13-to-14

I have a 700GB TimeScaleDB running in Docker on Ubuntu and I was able to use this approach with the following modifications. Thanks to @tianon and @mat813.

I upgraded from timescaledb:latest-pg12 to timescaledb:latest-pg14.

I adjusted the Dockerfile from @mat813 so it pulls the appropriate timescaledb images:

FROM timescale/timescaledb:latest-pg12 AS old

FROM timescale/timescaledb:latest-pg14

COPY --from=old /usr/local /usr/local-old

ENV PGBINOLD /usr/local-old/bin
ENV PGBINNEW /usr/local/bin

ENV PGDATAOLD /var/lib/postgresql/12/data
ENV PGDATANEW /var/lib/postgresql/14/data

RUN mkdir -p "$PGDATAOLD" "$PGDATANEW" \
	&& chown -R postgres:postgres /var/lib/postgresql

WORKDIR /var/lib/postgresql

COPY docker-upgrade /usr/local/bin/

ENTRYPOINT ["docker-upgrade"]

# recommended: --link
CMD ["pg_upgrade"]

The docker-upgrade script was adjusted slightly to make the exec su postgres work with the --link (or any other) argument. I could not get it to work on one line and my bash skills are very limited, but preparing the command in a separate variable first ('cmd') and adding -c works:

#!/bin/bash
set -e

if [ "$#" -eq 0 -o "${1:0:1}" = '-' ]; then
        set -- pg_upgrade "$@"
        echo "Argument(s) found, setting command to $@"
fi

if [ "$1" = 'pg_upgrade' -a "$(id -u)" = '0' ]; then
        echo "Setting permissions..."
        mkdir -p "$PGDATAOLD" "$PGDATANEW"
        chmod 700 "$PGDATAOLD" "$PGDATANEW"
        chown postgres .
        chown -R postgres "$PGDATAOLD" "$PGDATANEW"
        echo "Calling $BASH_SOURCE again but now as user postgres..."
        cmd="$BASH_SOURCE $@"
        exec su postgres -c "$cmd"
fi

if [ "$1" = 'pg_upgrade' ]; then
        if [ ! -s "$PGDATANEW/PG_VERSION" ]; then
                echo "No database found for new versions, calling initdb"
                PGDATA="$PGDATANEW" eval "initdb $POSTGRES_INITDB_ARGS"
        fi
fi

echo "executing $@"
exec "$@"

I also added some echo statements to have a sense of what is happening.

Build the container to see if everything is OK:

docker build -t timescaledb-upgrade -f Dockerfile .

And run with the --check argument to check the upgrade without actually doing it:

docker run --rm -v /mnt/volume_dbdata1/postgresql:/var/lib/postgresql timescaledb-upgrade:latest --check

Note that my db data is at /mnt/volume_dbdata1/postgresql, following the structure as described by @tianon (in subdirectory /12/data, and /14/data will be created).

If all looks good, go ahead with the actual upgrade:

docker run --rm -v /mnt/volume_dbdata1/postgresql:/var/lib/postgresql timescaledb-upgrade:latest --link

or, if required, without the --link argument.

Note that the line host all all all md5 was missing from pg_hba.conf in the /14/data directory, and I added it manually. Seems to be a known issue: tianon/docker-postgres-upgrade#16

I would recommend testing the whole thing on dummy databases, like is done in the script at the end of the readme in @tianon's repo:

https://github.com/tianon/docker-postgres-upgrade

  1. Is there a reason why PG can't upgrade the tables itself? I mean, all major DBs just handle this without problems; I've been running mariadb for years with the latest tag without problems. PG is just a pain in this regard. Instead of running up-to-date PG instances I run them until the application complains. This is just a bad way to handle updates.
  2. Confusion on this topic is huge; everyone has a different way of handling upgrades, sharing wild scripts, tips and tricks. Pain too.
  3. I'm not sure if hacking something into the container to fix an "issue" that should be fixed in the main application is the right way. It's better than nothing, but anyway.
  4. I don't use it, but if I'm right, orchestration like kubernetes must be the most painful with this.

Hope that there will be some progress on this topic; it seems like it's been ignored for years now.

@wuast94 The problems arise from the way the PostgreSQL binaries handle upgrades. PostgreSQL existed long before containerization. When PostgreSQL runs on a machine (real or VM) and a newer version is installed, the previous version still exists on the device. The upgrade script can run because both the new and old binaries are accessible on the machine. Then comes containerization, where each image only contains the binaries for one specific version. For new data, this is OK. But upgrading data from the old version doesn't work, because the old version's binaries don't exist in the image.

commented

Kinda just sounds like lazy engineering. It's been 20 years..

@naisanzaa Docker has existed for less than 10 years...

@naisanzaa maybe propose or implement a solution if it's important to you

Not really a "turnkey solution" to this upgrade problem, but there is a maintained Docker image provided by Zalando which includes several versions of PostgreSQL and the TimescaleDB extension: https://github.com/zalando/spilo.
For example:

$ docker run -ti registry.opensource.zalan.do/acid/spilo-15:3.0-p1 bash
# ls /usr/lib/postgresql/*/bin/pg_upgrade
/usr/lib/postgresql/10/bin/pg_upgrade  /usr/lib/postgresql/12/bin/pg_upgrade  /usr/lib/postgresql/14/bin/pg_upgrade
/usr/lib/postgresql/11/bin/pg_upgrade  /usr/lib/postgresql/13/bin/pg_upgrade  /usr/lib/postgresql/15/bin/pg_upgrade

You can run pg_restore on a dump (from pg_dump/pg_dumpall) made with a previous version without having older binaries installed. You can have a method where you dump the data before exiting your container on the old version and start from the dump when upgrading (beware of preventing modifications while dumping the database).

This is kind of what this comment is about (though there is a mistake there, as you need to remove -f, since for pg_restore that is the output-file option). But it can be automated in a better way, e.g. in an init/sidecar container.

I don't think this should be built into the docker images, since it depends on your workflow.

pg_upgrade is useful if you want shorter downtime, but this would work best if the container were able to install newer PostgreSQL versions by itself, without restarting/replacing the container (probably with a system package upgrade). This is also out of the scope of this Docker image.


When PostgreSQL runs in the machine (real or VM), when the newer version of Postgres is installed, the previous version still exists in the device. The upgrade script could run because both new and old binary is accessible in the machine.

@donnykurnia Not exactly a docker-only problem. For a very long time Postgres installed on macOS via Homebrew had the same issue - the new version would be installed, the old one removed, and during startup you would get the same error about having to run the upgrade (possibly it's still an issue; I have migrated to the docker flavour for ease of maintenance; though, in the Homebrew version it should be possible to run the Postgres upgrade while upgrading the installation itself).

Nevertheless, reliance on the older version is simply highly inconvenient nowadays...

@woj-tek true. Perhaps the upgrade script could download the older binary as needed, and perform the upgrade. Is this feasible?


Probably possible, though from where? https://www.postgresql.org/download/? And it would hammer the Postgresql servers each time someone upgrades. So in addition to downloading the new version there would be another download for the older version?

What people actually expect in the containerized world is that containers are short-lived and auto-updating, hence the PostgreSQL image should automatically upgrade the cluster on each container start. Everything else feels alien, not like a good Docker citizen. Actually I do not understand what the technical problem is. Why can't PostgreSQL simply come with an upgrade tool that knows how to handle ancient file formats?

If anyone's interested in trying out an (alpine based) PostgreSQL docker image that automatically upgrades the PG data files (versions 9.5 through 14) to PG 15, before running the PG 15 server, then I've been putting together a working version here:

https://github.com/justinclift/docker-pgautoupgrade

Haven't yet published it to Docker Hub, but will do so likely over the next few days.

It doesn't yet use --link for the pg_upgrade script, so don't use it on large data sets at the moment. But I have a clear idea (in theory) on how to get that working, so it should be implemented in a few days. Should work fine on larger sized databases now, as long as everything is in the one data directory.

IMPORTANTLY - DO NOT use this on a production database without having a complete backup you can restore quickly and easily! This docker image isn't widely enough tested for that yet, and the edge case where it blows up might be yours. 😄


Some other data points:

  • It's only been tested with x86_64 architecture so far, and works ok there (for me)
  • The image size (as created by build.sh in the repo) is 143.76MB.
    • Might be able to knock that down a bit with some further work, but that's a task for some other time

Initial build of that auto-upgrading PG docker container is on Docker Hub:

https://hub.docker.com/r/pgautoupgrade/pgautoupgrade

Use the 15-alpine3.8 tag (to upgrade to latest in PG 15.x series), or :dev if you're feeling adventurous. 😄

Please try it out and let me know how it goes, either here or in the GitHub repo.

Note that it uses --link for the pg_upgrade step. So it should work fairly quickly (~1 minute maybe) even for fairly large databases, as long as they're not using multiple data directories (which isn't supported).

What people actually expect in the containerized world is that containers are short-lived and auto-updating, hence the PostgreSQL image should automatically upgrade the cluster on each container start.

I think you are very wrong here. Short-lived? For an on-demand application, sure. For your database, not as much. Auto-updating? Never. You expect a container to be stable. No "auto" changing anything. Stability is the primary point. If you want a new version of a container, you pull a new version. Otherwise, the container should not "auto" do anything.

Now, if you're talking about auto migrating a database data store from one version of the DB to another -- with an appropriate flag -- when starting a newer container version, that's another story.


I can't go with that; containers should be updated regularly because of security patches.

I can't go with that; containers should be updated regularly because of security patches.

Of course they should be updated for security patches, but if a security update changes my database (or auto upgrades it), there is a bigger problem.

The last thing a package like this should do is something surprising.


The last thing a package like this should do is something surprising.

On the other hand, if you give everyone the option to use a specific image tailored to their needs, there is no surprise, if auto-upgrades are expected with that particular image. 😃

Though, yes, the default expectation is that the container runs in a stable & foreseeable way.
With that said, containers are still essentially very ephemeral entities. Even a database container should be restartable anytime, without any "damage" except the downtime due to the container recreation/restart.

The last thing a package like this should do is something surprising.

The ideal solution would be to have a flag (e.g. env variable or command line option) that enables the auto-upgrade.

In general I think that a simple option for the upgrade is really needed (this issue is one of the major reasons why I still run databases without containers).

With that said, containers are still essentially very ephemeral entities.

Containers - certainly. No disagreement here.

The data volumes, not so much. Those aren't ephemeral at all, so you shouldn't do anything "automatic" to them.

For containers (specifically like PostgreSQL), I think the default should be to do the least surprising thing for the user.

But really, this is just a point to say - hide the upgrade magic behind an environmental variable flag. By all means, yes -- please add this functionality. But if the user tries to start a new container with a data volume that contains an older DB, print an error message and quit.

The user may have pulled ":latest" and not known there was a major version change. They might have multiple containers trying to read this data (don't ask me why).

But if that was what the user intended, adding an env flag to force the upgrade is entirely appropriate.
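To illustrate the kind of gate being discussed (POSTGRES_AUTO_UPGRADE and the upgrade helper are hypothetical, nothing this image currently ships):

# hypothetical fragment for docker-entrypoint.sh
if [ -s "$PGDATA/PG_VERSION" ] && [ "$(cat "$PGDATA/PG_VERSION")" != "$PG_MAJOR" ]; then
  if [ "$POSTGRES_AUTO_UPGRADE" = "1" ]; then
    run_major_upgrade   # hypothetical helper: would still need the old binaries available somehow
  else
    echo >&2 "data directory was initialized by PostgreSQL $(cat "$PGDATA/PG_VERSION"), but this image is $PG_MAJOR"
    echo >&2 "refusing to start; set POSTGRES_AUTO_UPGRADE=1 (or upgrade manually) to continue"
    exit 1
  fi
fi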

But really, this is just a point to say - hide the upgrade magic behind an environmental variable flag. By all means, yes -- please add this functionality. But if the user tries to start a new container with a data volume that contains an older DB, print an error message and quit.

For reference: This is what mariadb does, see MARIADB_AUTO_UPGRADE https://hub.docker.com/_/mariadb


The ideal solution would be to have a flag (e.g. env variable or command line option) that enables the auto-upgrade.

But really, this is just a point to say - hide the upgrade magic behind an environmental variable flag. By all means, yes -- please add this functionality. But if the user tries to start a new container with a data volume that contains an older DB, print an error message and quit.

Why would you want to hide the upgrade behind a flag?

Usually the deployment is based on the version tag - if you want stability you use (a) the most detailed version (postgres:x.y-flavour), you may be interested only in (b) the major version (postgres:x-flavour), or (c) you don't care about that and use latest, which would involve major version upgrades (which is the most problematic thing and the gist here). In cases (a) and (b) there is not much of a problem. But in case (c) you already specify that you want to use the most recent image version, which would involve upgrading from, let's say, 14 to 15, and this upgrade should happen automatically, as by specifying latest you explicitly agree to the upgrade. If you want to have more control and avoid problems then just use (a) or (b). I don't see much sense in overcomplicating it...

As a data point, after reading through a bunch of feedback yesterday the automatic upgrade container I've been working on now has both a latest tag (for auto updating to whatever the latest PG version is), and also has pinned major PG versions (eg 15-alpine3.8).

So, for people who want to upgrade to PG 15 (from any of PG 9.5 -> 14.x), they can use that pinned version tag.

Seems like a widely suitable approach yeah?
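In practice that looks roughly like this (the image name below is only a placeholder; the tag shape comes from the comment above and the env/volume handling is assumed to mirror the official image, so treat it as a sketch):

```console
# pinned upgrade-to-15 variant; a plain "latest" tag would track the newest PG instead
docker run -d \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -v pgdata:/var/lib/postgresql/data \
  example/pgautoupgrade:15-alpine3.8
```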

@justinclift So in your solution it will auto-upgrade in any case (both with latest and with any other tag)?

And if someone wants to keep a specific version of PG they can use a specific tag of the image (e.g. 15) instead of latest?

If that is the proposed solution, I definitely vote for it.

Yep, that's the approach. 😄

commented

The problem with this approach is that it does not conform to what 99% of experienced Docker users expect. Either you have to make a big deal out of it, write an RFC or something, so it's "official", or you have to choose a non-invasive way.

The easiest & probably most suitable solution for everyone involved is the following.

Everything that has been said about the tag versions stays the same, except that the normal tags, like latest and <number> (e.g. 15), behave exactly the same as before. So, if I'm getting a 15 tag, then I expect it to be a postgres of version 15, rather than an auto-upgrade bundle which potentially damages my database in case I accidentally pulled the wrong version. This example isn't even uncommon. Happens all the time.
In my case, I have tons of different postgres databases, ranging roughly from version 11 to version 15.
It's very easy to accidentally set too high a version, without the intention of upgrading.
You don't want to upgrade if upstream provides version-specific .sql scripts or something similar.

Coming from this experience & learning from it, I propose that all tags which are an auto-upgrade bundle must have an auto-upgrade, upgrade, or up suffix attached to the tag. All other, normal versions should work as the vast majority of heavy users expect them to.
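As a quick illustration of what I mean (these suffixed tags are purely hypothetical, they don't exist anywhere today):

```console
docker pull postgres:15-bullseye            # plain image: never touches the on-disk format
docker pull postgres:15-bullseye-upgrade    # hypothetical auto-upgrade bundle: runs pg_upgrade when needed
```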

It's not uncommon to not read the entire documentation for such very basic containers, like postgres, when you are dealing with such things every day & handle tons of different images & resulting containers. This is, for example, my experience, too.

My biggest point is: you cannot make something potentially breaking & unexpected the default. You need to make the default the "dumb" way, which is safe, in the sense that it won't silently change the database to a different format or version.
No matter which software you choose, there is always a risk of a major version upgrade breaking your current working installation. Sometimes, software with bad manners even breaks on minor or, rarely, even on patch versions!

You also have to weigh the pros & cons regarding the versions.
For example, with something purely in the backend, you usually don't care too much about the major version. Security fixes arrive in almost all cases through patch versions. A major Postgres version won't change anything or bring business value, unless you are specifically waiting for a new feature, which rarely happens with a mature piece of software like PostgreSQL, or unless your current major version is already fully deprecated & unsupported. But this happens so rarely that you might as well do manual upgrades via https://hub.docker.com/r/tianon/postgres-upgrade/. I have been there too & already have experience with upgrading tons of PostgreSQL deployments on various Kubernetes clusters via this image's containers.
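For anyone who hasn't used that image: invocation is roughly the following (from memory of its README, so the exact mount paths and the available OLD-to-NEW tags should be double-checked there):

```console
docker run --rm \
  -v "$PGDATAOLD":/var/lib/postgresql/14/data \
  -v "$PGDATANEW":/var/lib/postgresql/15/data \
  tianon/postgres-upgrade:14-to-15
```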

I think we all agree that in the backend software world we never expect automatic upgrades to be enabled by default, just like that. Especially when we are talking about the actual database, holding the most important data of all, rather than some frontend or functional logic, which might as well be auto-updated every single day.

commented

^^
The thing is, from my experience, the postgres docker image is an outlier here and all the other containers I use are simply hassle-free, i.e. they upgrade automatically to keep operation continuous.

And if you don't need automatic upgrades then just stick with a specific version tag and nothing will be upgraded; you will upgrade manually when you need to. I'd argue that for your quite specific case (having a bunch of versions that you seem to manage) there could be an explicit "warn on automatic upgrade" flag that would help you avoid "damaging" upgrades.

But - if you run a specific version (I assume docker compose/k8s/etc) then you have that version set explicitly, so I can't fathom a use-case where you could "confusingly" and by mistake end up in a situation where your deployment has upgraded to a newer version against your intention - you would have to explicitly update the tag that you use...

commented

I'd argue that for your quite specific case (having a bunch of versions that you seem to manage) there could be an explicit "warn on automatic upgrade" flag that would help you avoid "damaging" upgrades.

Quite specific? We'd need statistics on that, because I'm pretty sure this is the default case for most users. If you run a bunch of different apps, then it's normal to have a bunch of different database versions to start with.

But - if you run a specific version (I assume docker compose/k8s/etc) then you have that version set explicitly, so I can't fathom a use-case where you could "confusingly" and by mistake end up in a situation where your deployment has upgraded to a newer version against your intention - you would have to explicitly update the tag that you use...

If I choose version 15 and my database is of version 13, then the auto-upgrade damages my database, because I did not intend to upgrade.

I think you expect a different default, when in fact most people expect no upgrade whatsoever. They just expect different versions, but no auto-upgrade. Which is why it shouldn't be the default. The default should be "dumb" & safe.

Why would you want to hide the upgrade behind a flag?

Most users of a Postgres container are not experts in Postgres. It isn't at all clear to those users that the DB data files need to change between major versions of the program. It isn't uncommon for other storage engines to keep the same on-disk format between program versions.

By requiring a flag to do the upgrade, you are explicitly not hiding the fact that the data in the volume container will change. You are making them aware that this volume shouldn't (can't?) be used with an older version.

Not having a flag and automatically updating would be hiding behavior.

Yep, that's the approach. 😄

But that still hides the behavior.

As I argued in that HN post of yours -- you can't do the upgrade silently without notice. It is totally unexpected to the many users who don't know that the data volume will change formats.

Add an env flag and then it is clear to novices, and your auto-upgrading containers can still work their magic for those who already know.

commented

If I choose version 15 and my database is of version 13, then the auto-upgrade damages my database, because I did not intend to upgrade.

Yes, so you use postgres:13-bullseye and it stays that way. Then you change the configuration (compose/k8s/etc) and switch to postgres:15-bullseye, which indicates your full and explicit intention to upgrade (because if not, then why even make that change?!). If you change the specification it means you DO WANT to upgrade, and it would be nice to actually get it upgraded instead of getting stuck with an error (and the upgrade process is quite convoluted in the containerised world…).

If you do need some protection because you could "by mistake" change from postgres:13-bullseye to postgres:15-bullseye, then as a protective measure there could be a flag saying "don't upgrade automatically even if I explicitly use the wrong version!"

Not having a flag and automatically updating would be hiding behavior.

In most software (at least software using semver) it's expected that a major version is "breaking" (changes in API, data migration), so if someone explicitly changes from one major to another then it should be assumed that the user also expects some sort of migration... If you upgrade your OS or any app, do you expect it to NOT perform data migration?