Postgres replication 93 to 82

2014-10-19

Requirements:

Recently at work we upgraded our production cluster from Ubuntu 10.04 to 14.04, and with this upgrade our version of postgres moved from 8.2 to 9.3.

As the new production cluster was going in a week before the new dr cluster we had a requirement of being able to replicate from new production to existing/old dr.

We don't use postgres realtime replication and instead rely on overnight dump and restores.


First problem:

The problem we quickly encountered was that postgres' custom dump format is forward compatible only, meaning we can restore from 8.2 to 9.3 but not the otherway around.

Enter postgres' plain dumping format, also known as 'plain sql dump', This format exports a database as a sequence of sql statements and so is more likely to be backwards compatible.

Using this format is as simple as setting the pg_dump option -f to p (for plain)

pg_dump -h foo -U bar -d baz -fp > plain_dump.sql

Of course this wouldn't have made a blog post if this was all it took, there were of course issues even using plain dumps.


Lowest hanging fruit

The first issue I fixed was caused by postgres 9.0 introducing a new format for storing the bytea data type, this new format is 'Hex' and is both new and the default in postgres 9.0 [1].

The legacy compatible format for bytea is called escape.

You can ask postgres to always use the legacy format escape by running a sql statement on the 9.3 server (before exporting):

dbname=> alter database "dbname" set bytea_output = 'escape';


Something a bit more involved:

The remaining 2 errors are both due to postgres 9.3 using different/new syntax.

Unsupported parameter:

psql:vagrant-db-2014-10-09.sql.dump:6: ERROR:  unrecognized configuration parameter "lock_timeout"

this is caused by the following code, which is not needed:

SET lock_timeout = 0;

Unsupported syntax for enabling a language:

psql:vagrant-db-2014-10-09.sql.dump:25: ERROR:  syntax error at or near "PROCEDURAL"
LINE 1: CREATE OR REPLACE PROCEDURAL LANGUAGE plpgsql;

this is caused by the following new syntax:

CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;

the backwards compatible syntax is:

CREATE PROCEDURAL LANGUAGE plpgsql;

Notice that these are not completely equivalent.

This second problem also caused a raft of other errors:

CREATE FUNCTION
ALTER FUNCTION
psql:vagrant-db-2014-10-09.sql.dump:81: ERROR:  language "plpgsql" does not exist
HINT:  Use CREATE LANGUAGE to load the language into the database.
...
psql:vagrant-db-2014-10-09.sql.dump:48027061: ERROR:  function updated() does not exist

they are all solved by fixing enabling the language correctly.


A solution:

For both of the above 2 issues I was unable to find a way of disabling them from within postgrest, the next option was to modify them in the dump itself.

Our dump files are ~15GB each.

At first I tried to use ed but my version insisted on creating a 15GB file in /tmp and so failed.

I then turned to sed inplace editing, but due to the size of the dumps this took far too long, while waiting for sed to complete I started writing a small c program to perform the edit.

I was able to finish the c program and run it before the sed program had finished executing, after over an hour of execution I killed sed.

I have uploaded the full source of this program to my github [2].

The crux of the c program is this (error checking and minutia omitted):

// seek to location of first line
fseek(f, 62, SEEK_SET)
// check line's contents are as expected
fread(read, 1, 21, f)
strncmp(exp1, read, 21)
// seek back
fseek(f, 62, SEEK_SET)
// comment out line
fwrite("--", 1, 2, f)

// seek to location of second line
fseek(f, 406, SEEK_SET)
// check line's contents are as expected
fread(read, 1, 62, f)
strncmp(exp2, read, 62)
// seek back
fseek(f, 406, SEEK_SET)
// replace line with newer syntax
// match original number of bytes while avoiding use of whitespace
char * line = "CREATE PROCEDURAL LANGUAGE plpgsql;---------------------------";
fwrite(line, 1, 62, f)

This completes in well under a second, and modifies the file in place.

Of course this is brittle, even adding a single byte at the beginning of the file will throw this off, and editing the c file by hand is made a bit tedious by all the noise (error checking).


A more general solution

Not being able to use either ed or sed left me a little dissapointed, The c program worked but in it's current form it is very brittle and isn't helped by all the noise of manual error checking, this all requires far too much effort to debug and maintain.

I want a more general solution.

Ideally I want a dumbed down ed-like that allows editing of the file in place, without having to read or write any of the lines I don't care about, taking care of all of the noise for me, a domain specific language for editing a file in place.

Hopefully details and some code will come in a later post.

EDIT: information about this work is now up on [3]



[1] Postgres 9.3 binary data formats

[2] https://github.com/mkfifo/munge_pg_93_to_82

[3] Dodo the stupid editor