Microservices and Distributed Transactions: Beyond the "Happy Path"
In a previous article, the LIXA project and the XTA programming model were presented to show how two-phase commit distributed transactions can be implemented in polyglot distributed transactional systems. This second part answers the question: "How do distributed transactional systems behave in the event of failure?" More specifically, four different failure scenarios are explained with practical examples.
Introduction
A critical aspect of the two-phase commit protocol is the concept of a "prepared transaction", which is sometimes synonymous with a "recovery pending transaction."
A distributed transaction managed by the XA protocol can be in one of the following three states:
- "Non-prepared": A temporary intermediate state; any change can be lost.
- "Prepared": The change has been persisted in a durable way, but it has not been confirmed and data is not visible to other transactions.
- "Confirmed": The change has been persisted and data is visible to other transactions.
From a consistency point of view, in the event of a failure, there are two really interesting conditions:
- The failure happens during the "non-prepared" state.
- The failure happens during the "prepared" state.
Failures that happen when a distributed transaction has already been confirmed are not interesting, because the durability of the data (the "D" in ACID) is natively guaranteed by the Resource Manager, exactly as it is for local transactions.
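The three states, and the consequence of a crash in each of them, can be summarized with a minimal Python sketch. The names BranchState and outcome_after_crash are illustrative only and are not part of LIXA or XTA:

```python
from enum import Enum


class BranchState(Enum):
    NON_PREPARED = "non-prepared"
    PREPARED = "prepared"
    CONFIRMED = "confirmed"


def outcome_after_crash(state: BranchState) -> str:
    """Describe what is left in the Resource Manager after a crash."""
    if state is BranchState.NON_PREPARED:
        # Nothing durable has been written yet: the change is lost.
        return "change lost, nothing to recover"
    if state is BranchState.PREPARED:
        # Durable but undecided: it waits for a commit or a rollback.
        return "change kept but in doubt, recovery required"
    # Confirmed: durability is guaranteed by the Resource Manager itself.
    return "change durable, nothing to recover"


for state in BranchState:
    print(f"{state.value}: {outcome_after_crash(state)}")
```

Only the first two states require attention after a failure, which is why the scenarios in this article focus on them.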
It must be noted that the sources of truth are the Resource Managers (MySQL and PostgreSQL in our example): one consequence of the CAP theorem is that, in a distributed system, there is no guarantee that two components have the same understanding of the transaction state. The following example explains a typical "split-brain" condition:
- The Transaction Manager sends a "prepare" message to a Resource Manager.
- The Resource Manager prepares the transaction and replies to the Transaction Manager.
- The network between the Transaction Manager and the Resource Manager disconnects (the "P", partition, of the CAP theorem) and the reply message is lost.
- The transaction is aborted due to the network partitioning.
In such situations, the Resource Manager and the Transaction Manager have two different "points of view" about the real state of the transaction.
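The four steps above can be reproduced with a toy simulation. This is only a sketch under simplifying assumptions: ResourceManager, LossyNetwork and TransactionManager are hypothetical classes written for this example, and the "network" is an in-process call that always drops the reply:

```python
class ResourceManager:
    """Toy Resource Manager that remembers the state of each branch."""

    def __init__(self):
        self.state = {}

    def prepare(self, xid):
        self.state[xid] = "prepared"
        return "OK"


class LossyNetwork:
    """Delivers the request but drops the reply, as a partition would."""

    def call(self, func, *args):
        func(*args)  # the request reaches the Resource Manager
        raise TimeoutError("reply lost")


class TransactionManager:
    """Toy Transaction Manager that keeps its own view of each branch."""

    def __init__(self, network, rm):
        self.network = network
        self.rm = rm
        self.view = {}

    def prepare(self, xid):
        try:
            self.network.call(self.rm.prepare, xid)
            self.view[xid] = "prepared"
        except TimeoutError:
            # No reply arrived: the Transaction Manager aborts.
            self.view[xid] = "aborted"


rm = ResourceManager()
tm = TransactionManager(LossyNetwork(), rm)
tm.prepare("xid-1")
print("Transaction Manager view:", tm.view["xid-1"])
print("Resource Manager view:", rm.state["xid-1"])
```

The two views disagree, so any recovery logic has to ask the Resource Manager what it really holds instead of trusting the Transaction Manager's memory.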
This article explains how LIXA and XTA manage these types of issues.
Architecture
The same architecture presented in Part I of this series will be used.
The full design is composed of five containers:
- The LIXA state server (lixad)
- A REST client developed in Python 3 and a MySQL instance as its data store
- A REST server developed in Java 8 and a PostgreSQL instance as its data store.
Environment Set Up
The same tools presented in Part I of this series will be used: Git, Docker, and five terminal sessions to interact with the containers and to capture the output.
Activate a LIXA State Server
Start the lixad container:
$ docker run --rm -p 2345:2345 --name=lixad -d lixa/lixad
e21ab3f6093081ac4a84b1edaedac907de0b22f1c088253fd0e0096ea8018f0d
Catch its logs:
$ docker logs -f lixad
2019-05-11 15:44:54.583836 [1/140578614647744] NOTICE: LXD000N this process is starting a new LIXA server (lixa package version is 1.7.8)
2019-05-11 15:44:54.584071 [1/140578614647744] INFO: LXD027I parameter 'pid_file' set to value '/opt/lixa/var/run.pid'
2019-05-11 15:44:54.584083 [1/140578614647744] INFO: LXD026I parameter 'min_elapsed_sync_time' set to value 0
2019-05-11 15:44:54.584086 [1/140578614647744] INFO: LXD026I parameter 'max_elapsed_sync_time' set to value 0
Note that the syslog messages are sent to a Docker log.
Activate a MySQL Instance
Start the mysql container and connect to it:
$ docker run --rm -e MYSQL_ROOT_PASSWORD=mysecretpw -p 3306:3306 --name=mysql -d lixa/mysql
85978ec9602e2d5c384aa84012ee506175de964f68dc0f07b4a8b72ddd540093
$ docker exec -ti mysql bash
root@85978ec9602e:/#
Wait a few seconds to allow the MySQL instance to initialize, then connect to the database:
root@85978ec9602e:/# mysql -u lixa -h 192.168.123.35 -D lixa --password=passw0rd
mysql: [Warning] Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.26 MySQL Community Server (GPL)
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
Activate a PostgreSQL Instance
Start the postgres container and connect to it:
$ docker run --rm -e POSTGRES_PASSWORD=lixa -p 5432:5432 --name=postgres -d lixa/postgres -c 'max_prepared_transactions=10'
61a84ea7d7a9642bc007758d5b8fd72cca1c50434065da8270f0c12f1a0f53a5
$ docker exec -ti postgres bash
root@61a84ea7d7a9:/#
Wait a few seconds to allow the PostgreSQL instance to initialize, then connect to the database:
root@61a84ea7d7a9:/# psql -U lixa
psql (11.2 (Debian 11.2-1.pgdg90+1))
Type "help" for help.
lixa=>
Build the REST Server
The REST server uses the lixa/xta-jdk-maven base image, but the standard one that's available at DockerHub can't be used because the "--enable-crash" config option is necessary to produce system crashes. Building a local customized version is quite easy: just clone the Git repository and enter the xta-java directory:
$ git clone https://github.com/tiian/lixa-docker.git
$ cd lixa-docker/xta-java
Then rebuild the xta-jdk-maven image with the --enable-crash option:
$ docker build -f Dockerfile-jdk-maven --build-arg CONFIG_OPTIONS="--enable-crash" -t lixa/xta-jdk-maven .
Finally, build the REST server:
$ cd ../examples/PythonJavaREST/
$ docker build -f Dockerfile-server -t rest-server .
Build the REST Client
The REST client uses the lixa/xta-python3 base image, but the standard one available on DockerHub can't be used because the "--enable-crash" config option is necessary to produce system crashes. Building a local customized version is quite easy. Supposing you have already cloned the Git repository in the previous step, enter the xta-python directory and rebuild the xta-python3 image with the --enable-crash option:
$ cd ../../xta-python
$ docker build -f Dockerfile-python3 --build-arg CONFIG_OPTIONS="--enable-crash" -t lixa/xta-python3 .
Then build the REST client:
$ cd ../examples/PythonJavaREST
$ docker build -f Dockerfile-client -t rest-client .
Check Your Environment
If all the above steps worked for you, you should have three running containers:
$ docker ps | grep lixa
e21ab3f60930 lixa/lixad "/home/lixa/lixad-en…" 30 minutes ago Up 30 minutes 0.0.0.0:2345->2345/tcp lixad
61a84ea7d7a9 lixa/postgres "docker-entrypoint.s…" 44 minutes ago Up 44 minutes 0.0.0.0:5432->5432/tcp postgres
85978ec9602e lixa/mysql "docker-entrypoint.s…" 44 minutes ago Up 44 minutes 0.0.0.0:3306->3306/tcp, 33060/tcp mysql
And a couple of available images:
$ docker images | grep rest
rest-server latest 363b2d904621 19 hours ago 725MB
rest-client latest e3dbab970d5c 19 hours ago 628MB
Failure Scenarios
This article describes four possible failure scenarios: the first two are related to a crash of the REST server and the last two are related to a crash of the REST client.
Server Crashes in "Non-Prepared" State
In this scenario, the transaction branch managed by the REST server will crash before the transaction branch has been prepared. To obtain a crash immediately before prepare, the environment variable LIXA_CRASH_POINT must be set to a value of 12.
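As a rough illustration of how an environment-driven crash point can work, the sketch below terminates the process when a numbered point is reached. It is not LIXA's implementation: crash_point, first_phase and the terminate parameter are hypothetical names, and only the values 12 (before prepare) and 14 (after prepare) come from this article:

```python
import os

BEFORE_PREPARE = 12  # crash point values used in this article
AFTER_PREPARE = 14


def crash_point(point, env=None, terminate=os.abort):
    """Terminate the process if LIXA_CRASH_POINT selects this point."""
    env = os.environ if env is None else env
    if env.get("LIXA_CRASH_POINT") == str(point):
        terminate()


def first_phase(prepare, env=None, terminate=os.abort):
    """Run the prepare step with a crash point on each side of it."""
    crash_point(BEFORE_PREPARE, env, terminate)
    prepare()
    crash_point(AFTER_PREPARE, env, terminate)


class Crashed(Exception):
    """Raised by the demo instead of really aborting the process."""


def fake_terminate():
    raise Crashed()


log = []
try:
    first_phase(lambda: log.append("prepared"),
                env={"LIXA_CRASH_POINT": "12"},
                terminate=fake_terminate)
except Crashed:
    log.append("crashed")
print(log)
```

With the value 12, the demo stops before the prepare step runs; with 14, the prepare step would be executed first and the crash would follow it.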
Check Data Tables
The "authors" table must be empty both in MySQL:
mysql> select * from authors;
Empty set (0.00 sec)
mysql>
And in PostgreSQL:
lixa=> select * from authors;
id | last_name | first_name
----+-----------+------------
(0 rows)
lixa=>
Start the REST Server
Start the REST server with the environment variable LIXA_CRASH_POINT set to 12:
$ docker run -ti --rm -e MAVEN_OPTS="-Djava.library.path=/opt/lixa/lib" -e LIXA_STATE_SERVERS="tcp://192.168.123.35:2345/default" -e PQSERVER="192.168.123.35" -e LIXA_CRASH_POINT=12 -p 18080:8080 rest-server
[...]
May 11, 2019 4:24:49 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:8080]
May 11, 2019 4:24:49 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
Jersey app started with WADL available at http://0.0.0.0:8080/xta/application.wadl
Hit enter to stop it...
Start the REST Client
Start the REST client by specifying the "--no-delete" option (the table is empty and the transaction will crash during an SQL INSERT statement):
$ docker run -ti --rm -e SERVER="192.168.123.35" -e LIXA_STATE_SERVERS="tcp://192.168.123.35:2345/default" rest-client --no-delete
2019-05-11 16:37:45.361956 [1/140715375716096] INFO: LXC000I this process is starting a new LIXA transaction manager (lixa package version is 1.7.8)
***** REST client *****
2019-05-11 16:37:45.363595 [1/140715375716096] INFO: LXC036I state server noticed there is at least one recovery pending transaction
2019-05-11 16:37:45.364112 [1/140715375716096] NOTICE: LXC037N recovery pending transaction with xid='1279875137.b129ed360e94455bbd45d63ecd67e2cd.e0130ac8303b5a9fc8383c6b92286a09'
2019-05-11 16:37:45.364128 [1/140715375716096] NOTICE: LXC038N transaction with xid='1279875137.b129ed360e94455bbd45d63ecd67e2cd.e0130ac8303b5a9fc8383c6b92286a09' must be rolled back
2019-05-11 16:37:45.364139 [1/140715375716096] INFO: LXC039I transaction with xid='1279875137.b129ed360e94455bbd45d63ecd67e2cd.e0130ac8303b5a9fc8383c6b92286a09' has been recovered (rolled back)
MySQL: executing SQL statement >INSERT INTO authors VALUES(1840, 'Zola', 'Emile')<
Calling REST service passing: xid='1279875137.874e60ce41784155b0cf557cf0a6c693.e0130ac8303b5a9fc8383c6b92286a09', oper='insert'
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
six.raise_from(e, None)
[...]
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 490, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
The REST client gets an exception during the call because the server has crashed, as the server console shows:
***** REST service called: xid='1279875137.53a0328502bd4767b026c11ddd76ff3d.e0130ac8303b5a9fc8383c6b92286a09', oper='insert' *****
2019-05-11 16:43:27.652090 [1/140719437825792] INFO: LXC000I this process is starting a new LIXA transaction manager (lixa package version is 1.7.8)
Created a subordinate branch with XID '1279875137.53a0328502bd4767b026c11ddd76ff3d.e0130ac8303b5a9f480eedfdd665454a'
PostgreSQL: executing SQL statement >INSERT INTO authors VALUES(1804, 'Hawthorne', 'Nathaniel')<
Executing first phase of commit (prepare)
2019-05-11 16:43:27.710330 [1/140719437825792] CRIT: LXG000C crash point 12 will immediately terminate the process
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007ffbfc6ca529, pid=1, tid=59
#
# JRE version: OpenJDK Runtime Environment (11.0.3+1) (build 11.0.3+1-Debian-1bpo91)
# Java VM: OpenJDK 64-Bit Server VM (11.0.3+1-Debian-1bpo91, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C [libc.so.6+0x34529] abort+0x269
Looking at rows 5 and 6, we can see that the REST server started the first phase of the commit, but the process crashed as expected.
Neither the client nor the server has prepared its transaction branch, and this can be verified by inspecting what the Resource Managers (MySQL and PostgreSQL) report:
MySQL:
mysql> select * from authors;
Empty set (0.00 sec)
mysql> xa recover;
Empty set (0.00 sec)
PostgreSQL:
lixa=> select * from authors;
id | last_name | first_name
----+-----------+------------
(0 rows)
lixa=> select * from pg_prepared_xacts;
transaction | gid | prepared | owner | database
-------------+-----+----------+-------+----------
(0 rows)
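The same check can be scripted. The sketch below assumes a Python DB-API 2.0 connection (for example, one opened with a MySQL or PostgreSQL driver of your choice) and runs the statements used above; list_prepared and FakeConnection are illustrative names, and the fake connection only stands in for a real one so that the example is self-contained:

```python
QUERIES = {
    "mysql": "XA RECOVER",
    "postgresql": "SELECT gid FROM pg_prepared_xacts",
}


def list_prepared(connection, dialect):
    """Return the rows describing prepared (in-doubt) transactions."""
    cursor = connection.cursor()
    try:
        cursor.execute(QUERIES[dialect])
        return cursor.fetchall()
    finally:
        cursor.close()


class FakeConnection:
    """Stand-in for a real DB-API connection, used only for the demo."""

    def __init__(self, rows):
        self.rows = rows
        self.executed = []

    def cursor(self):
        return self

    def execute(self, statement):
        self.executed.append(statement)

    def fetchall(self):
        return list(self.rows)

    def close(self):
        pass


conn = FakeConnection(rows=[])
print(list_prepared(conn, "postgresql"))
```

An empty result, as in this scenario, means that the Resource Manager holds no prepared branch waiting for a decision.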
It's time to restart the REST server without specifying the LIXA_CRASH_POINT environment variable:
$ docker run -ti --rm -e MAVEN_OPTS="-Djava.library.path=/opt/lixa/lib" -e LIXA_STATE_SERVERS="tcp://192.168.123.35:2345/default" -e PQSERVER="192.168.123.35" -p 18080:8080 rest-server
[...]
May 11, 2019 4:53:31 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:8080]
May 11, 2019 4:53:31 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
Jersey app started with WADL available at http://0.0.0.0:8080/xta/application.wadl
Hit enter to stop it...
Start the client to execute an SQL INSERT statement:
$ docker run -ti --rm -e SERVER="192.168.123.35" -e LIXA_STATE_SERVERS="tcp://192.168.123.35:2345/default" rest-client --no-delete
2019-05-11 16:54:06.786644 [1/139978787604224] INFO: LXC000I this process is starting a new LIXA transaction manager (lixa package version is 1.7.8)
***** REST client *****
2019-05-11 16:54:06.788080 [1/139978787604224] INFO: LXC036I state server noticed there is at least one recovery pending transaction
2019-05-11 16:54:06.788533 [1/139978787604224] NOTICE: LXC037N recovery pending transaction with xid='1279875137.53a0328502bd4767b026c11ddd76ff3d.e0130ac8303b5a9fc8383c6b92286a09'
2019-05-11 16:54:06.788563 [1/139978787604224] NOTICE: LXC038N transaction with xid='1279875137.53a0328502bd4767b026c11ddd76ff3d.e0130ac8303b5a9fc8383c6b92286a09' must be rolled back
2019-05-11 16:54:06.788581 [1/139978787604224] INFO: LXC039I transaction with xid='1279875137.53a0328502bd4767b026c11ddd76ff3d.e0130ac8303b5a9fc8383c6b92286a09' has been recovered (rolled back)
MySQL: executing SQL statement >INSERT INTO authors VALUES(1840, 'Zola', 'Emile')<
Calling REST service passing: xid='1279875137.314745bd7a8345fe9a916c345866bc33.e0130ac8303b5a9fc8383c6b92286a09', oper='insert'
Server replied >PREPARED<
Executing transaction commit
You can see that:
- In rows 4-5 (messages LXC036I, LXC037N), the LIXA state server reports that a transaction is recovery pending.
- In row 6 (message LXC038N), the LIXA Transaction Manager embedded in the REST client establishes that a rollback must be performed.
- In row 7 (message LXC039I), the LIXA Transaction Manager declares its transaction branch recovered.
Back on the server console, we can see the following messages:
***** REST service called: xid='1279875137.314745bd7a8345fe9a916c345866bc33.e0130ac8303b5a9fc8383c6b92286a09', oper='insert' *****
2019-05-11 16:54:06.982736 [1/140065675020032] INFO: LXC000I this process is starting a new LIXA transaction manager (lixa package version is 1.7.8)
2019-05-11 16:54:06.984426 [1/140065675020032] INFO: LXC036I state server noticed there is at least one recovery pending transaction
2019-05-11 16:54:06.984934 [1/140065675020032] NOTICE: LXC037N recovery pending transaction with xid='1279875137.53a0328502bd4767b026c11ddd76ff3d.e0130ac8303b5a9f480eedfdd665454a'
2019-05-11 16:54:06.985086 [1/140065675020032] NOTICE: LXC038N transaction with xid='1279875137.53a0328502bd4767b026c11ddd76ff3d.e0130ac8303b5a9f480eedfdd665454a' must be rolled back
2019-05-11 16:54:06.991819 [1/140065675020032] INFO: LXC018I Resource Manager 'PostgreSQL' returned XAER_NOTA during recovery rollback: the Resource Manager has already rolled back the transaction with xid '1279875137.53a0328502bd4767b026c11ddd76ff3d.e0130ac8303b5a9f480eedfdd665454a'
2019-05-11 16:54:06.992025 [1/140065675020032] INFO: LXC039I transaction with xid='1279875137.53a0328502bd4767b026c11ddd76ff3d.e0130ac8303b5a9f480eedfdd665454a' has been recovered (rolled back)
Created a subordinate branch with XID '1279875137.314745bd7a8345fe9a916c345866bc33.e0130ac8303b5a9f43bace840bdd4cd5'
PostgreSQL: executing SQL statement >INSERT INTO authors VALUES(1804, 'Hawthorne', 'Nathaniel')<
Executing first phase of commit (prepare)
Returning 'PREPARED' to the client
Executing second phase of commit
Note that:
- In rows 3-4 (messages LXC036I, LXC037N), the LIXA state server reports that a transaction is recovery pending.
- In row 5 (message LXC038N), the LIXA Transaction Manager embedded in the REST server establishes that a rollback must be performed.
- In row 6 (message LXC018I), the Resource Manager (PostgreSQL) returns XAER_NOTA (the XID is not known to it), which means the branch has already been rolled back.
- In row 7 (message LXC039I), the LIXA Transaction Manager declares its transaction branch recovered.
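The way a recovery rollback can treat XAER_NOTA as a harmless answer is sketched below. This is not LIXA's code: recover_branch and make_rollback are hypothetical helpers, the Resource Manager is simulated with a plain set of prepared XIDs, and the numeric values are the ones defined by the XA specification:

```python
XA_OK = 0
XAER_NOTA = -4  # "XID not known" return code from the XA specification


def make_rollback(prepared_xids):
    """Build a stand-in for a Resource Manager's rollback entry point."""
    def rollback(xid):
        if xid in prepared_xids:
            prepared_xids.remove(xid)  # a prepared branch is really undone
            return XA_OK
        return XAER_NOTA  # nothing to undo for this XID
    return rollback


def recover_branch(xid, rollback):
    """Roll back a recovery pending branch and report how it ended."""
    rc = rollback(xid)
    if rc == XA_OK:
        return "rolled back during recovery"
    if rc == XAER_NOTA:
        # The branch never reached "prepared" and is already gone.
        return "already rolled back by the Resource Manager"
    raise RuntimeError(f"unexpected XA return code {rc} for {xid}")


print(recover_branch("xid-a", make_rollback(set())))
print(recover_branch("xid-b", make_rollback({"xid-b"})))
```

The first call mirrors this scenario, where the crash happened before prepare; the second shows what would happen if the Resource Manager still held a prepared branch.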
Checking the content of MySQL and its XA status we can see that everything is fine:
mysql> select * from authors;
+------+-----------+------------+
| id | last_name | first_name |
+------+-----------+------------+
| 1840 | Zola | Emile |
+------+-----------+------------+
1 row in set (0.00 sec)
mysql> xa recover;
Empty set (0.00 sec)
The row has been inserted in the table and there are no recovery pending transactions.
Checking the content of PostgreSQL and its XA status we can see that everything is fine:
lixa=> select * from authors;
id | last_name | first_name
------+-----------+------------
1804 | Hawthorne | Nathaniel
(1 row)
lixa=> select * from pg_prepared_xacts;
transaction | gid | prepared | owner | database
-------------+-----+----------+-------+----------
(0 rows)
Server Crashes in "Prepared" State
In this scenario, the transaction branch managed by the REST server will crash after the transaction branch has been prepared. To obtain a crash immediately after prepare, the environment variable LIXA_CRASH_POINT must be set to a value of 14.
Start the REST Server
Start the REST server with the environment variable LIXA_CRASH_POINT set to 14:
$ docker run -ti --rm -e MAVEN_OPTS="-Djava.library.path=/opt/lixa/lib" -e LIXA_STATE_SERVERS="tcp://192.168.123.35:2345/default" -e PQSERVER="192.168.123.35" -e LIXA_CRASH_POINT=14 -p 18080:8080 rest-server
[...]
May 11, 2019 8:52:26 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:8080]
May 11, 2019 8:52:26 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
Jersey app started with WADL available at http://0.0.0.0:8080/xta/application.wadl
Hit enter to stop it...
Start the REST Client
Start the REST client by specifying the "--no-insert" option (the table is not empty and the transaction will crash during an SQL DELETE statement):
$ docker run -ti --rm -e SERVER="192.168.123.35" -e LIXA_STATE_SERVERS="tcp://192.168.123.35:2345/default" rest-client --no-insert
2019-05-11 20:56:53.017341 [1/140060180203264] INFO: LXC000I this process is starting a new LIXA transaction manager (lixa package version is 1.7.8)
***** REST client *****
MySQL: executing SQL statement >DELETE FROM authors WHERE id=1840<
Calling REST service passing: xid='1279875137.3a89df4fd6154470b96f392f70d0a13e.e0130ac8303b5a9fc8383c6b92286a09', oper='delete'
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
six.raise_from(e, None)
[...]
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 490, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
The client gets an exception during the call because the server has crashed, as the server console shows:
***** REST service called: xid='1279875137.3a89df4fd6154470b96f392f70d0a13e.e0130ac8303b5a9fc8383c6b92286a09', oper='delete' *****
2019-05-11 20:56:53.189414 [1/140464616359680] INFO: LXC000I this process is starting a new LIXA transaction manager (lixa package version is 1.7.8)
Created a subordinate branch with XID '1279875137.3a89df4fd6154470b96f392f70d0a13e.e0130ac8303b5a9fb3c472b63de54a94'
PostgreSQL: executing SQL statement >DELETE FROM authors WHERE id=1804<
Executing first phase of commit (prepare)
2019-05-11 20:56:53.262442 [1/140464616359680] CRIT: LXG000C crash point 14 will immediately terminate the process
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fc0d7eb9529, pid=1, tid=59
#
# JRE version: OpenJDK Runtime Environment (11.0.3+1) (build 11.0.3+1-Debian-1bpo91)
# Java VM: OpenJDK 64-Bit Server VM (11.0.3+1-Debian-1bpo91, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C [libc.so.6+0x34529] abort+0x269
Looking at rows 5 and 6, we can see that the REST server started the first phase of the commit, but the process crashed as expected.
This time PostgreSQL has prepared its branch, while MySQL has not; the rows are still in the tables:
mysql> select * from authors;
+------+-----------+------------+
| id | last_name | first_name |
+------+-----------+------------+
| 1840 | Zola | Emile |
+------+-----------+------------+
1 row in set (0.00 sec)
mysql> xa recover;
Empty set (0.00 sec)
lixa=> select * from authors;
id | last_name | first_name
------+-----------+------------
1804 | Hawthorne | Nathaniel
(1 row)
lixa=> select * from pg_prepared_xacts;
transaction | gid | prepared | owner | database
-------------+--------------------------------------------------------------+-------------------------------+-------+----------
576 | 1279875137_OonfT9YVRHC5bzkvcNChPg==_4BMKyDA7Wp+zxHK2PeVKlA== | 2019-05-11 20:56:53.249484+00 | lixa | lixa
(1 row)
It is expected that the transaction will be recovered and rolled back; to verify this behavior, we can restart the server without specifying the LIXA_CRASH_POINT environment variable:
$ docker run -ti --rm -e MAVEN_OPTS="-Djava.library.path=/opt/lixa/lib" -e LIXA_STATE_SERVERS="tcp://192.168.123.35:2345/default" -e PQSERVER="192.168.123.35" -p 18080:8080 rest-server
[...]
May 11, 2019 9:07:59 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:8080]
May 11, 2019 9:07:59 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
Jersey app started with WADL available at http://0.0.0.0:8080/xta/application.wadl
Hit enter to stop it...
And execute the client with an SQL DELETE statement:
$ docker run -ti --rm -e SERVER="192.168.123.35" -e LIXA_STATE_SERVERS="tcp://192.168.123.35:2345/default" rest-client --no-insert
2019-05-11 21:11:13.562803 [1/139853280184064] INFO: LXC000I this process is starting a new LIXA transaction manager (lixa package version is 1.7.8)
***** REST client *****
2019-05-11 21:11:13.563988 [1/139853280184064] INFO: LXC036I state server noticed there is at least one recovery pending transaction
2019-05-11 21:11:13.564377 [1/139853280184064] NOTICE: LXC037N recovery pending transaction with xid='1279875137.d6d6279e80e2475fb62289994ce10466.e0130ac8303b5a9fc8383c6b92286a09'
2019-05-11 21:11:13.564392 [1/139853280184064] NOTICE: LXC038N transaction with xid='1279875137.d6d6279e80e2475fb62289994ce10466.e0130ac8303b5a9fc8383c6b92286a09' must be rolled back
2019-05-11 21:11:13.564405 [1/139853280184064] INFO: LXC039I transaction with xid='1279875137.d6d6279e80e2475fb62289994ce10466.e0130ac8303b5a9fc8383c6b92286a09' has been recovered (rolled back)
MySQL: executing SQL statement >DELETE FROM authors WHERE id=1840<
Calling REST service passing: xid='1279875137.fe3b74c46b1a41bb8630dbc5b6562bcb.e0130ac8303b5a9fc8383c6b92286a09', oper='delete'
Server replied >PREPARED<
Executing transaction commit
We can see that:
- In rows 4-5 (messages LXC036I, LXC037N), the LIXA state server reports that a transaction is recovery pending.
- In row 6 (message LXC038N), the LIXA Transaction Manager embedded in the client establishes that a rollback must be performed.
- In row 7 (message LXC039I), the LIXA Transaction Manager declares its branch recovered.
Back on the server console, we can see something like the following messages:
***** REST service called: xid='1279875137.fe3b74c46b1a41bb8630dbc5b6562bcb.e0130ac8303b5a9fc8383c6b92286a09', oper='delete' *****
2019-05-11 21:11:13.752470 [1/140375670560512] INFO: LXC000I this process is starting a new LIXA transaction manager (lixa package version is 1.7.8)
2019-05-11 21:11:13.754308 [1/140375670560512] INFO: LXC036I state server noticed there is at least one recovery pending transaction
2019-05-11 21:11:13.754738 [1/140375670560512] NOTICE: LXC037N recovery pending transaction with xid='1279875137.3a89df4fd6154470b96f392f70d0a13e.e0130ac8303b5a9fb3c472b63de54a94'
2019-05-11 21:11:13.754751 [1/140375670560512] NOTICE: LXC038N transaction with xid='1279875137.3a89df4fd6154470b96f392f70d0a13e.e0130ac8303b5a9fb3c472b63de54a94' must be rolled back
2019-05-11 21:11:13.772826 [1/140375670560512] INFO: LXC039I transaction with xid='1279875137.3a89df4fd6154470b96f392f70d0a13e.e0130ac8303b5a9fb3c472b63de54a94' has been recovered (rolled back)
Created a subordinate branch with XID '1279875137.fe3b74c46b1a41bb8630dbc5b6562bcb.e0130ac8303b5a9f81d06ef1e3ac45d8'
PostgreSQL: executing SQL statement >DELETE FROM authors WHERE id=1804<
Executing first phase of commit (prepare)
Returning 'PREPARED' to the client
Executing second phase of commit
We can see that:
- In rows 3-4 (messages LXC036I, LXC037N), the LIXA state server reports that a transaction is recovery pending.
- In row 5 (message LXC038N), the LIXA Transaction Manager embedded in the server establishes that a rollback must be performed.
- In row 6 (message LXC039I), the LIXA Transaction Manager declares its transaction branch recovered.
- In this scenario, PostgreSQL does not return the XAER_NOTA error because it had a prepared transaction to recover.
Checking the content of MySQL and its XA status we can see that everything is fine:
mysql> select * from authors;
Empty set (0.00 sec)
mysql> xa recover;
Empty set (0.00 sec)
The row has been deleted from the table and there are no recovery pending transactions.
Checking the content of PostgreSQL and its XA status we can see that everything is fine:
lixa=> select * from authors;
id | last_name | first_name
----+-----------+------------
(0 rows)
lixa=> select * from pg_prepared_xacts;
transaction | gid | prepared | owner | database
-------------+-----+----------+-------+----------
(0 rows)
The row has been deleted from the table and there are no more prepared transactions.
Client Crashes in "Non-Prepared" State
In this scenario, the transaction branch managed by the REST client will crash before the transaction branch has been prepared. To obtain a crash immediately before prepare, the environment variable LIXA_CRASH_POINT must be set to a value of 12.
Start the REST Server
Start the REST server without setting the environment variable LIXA_CRASH_POINT:
$ docker run -ti --rm -e MAVEN_OPTS="-Djava.library.path=/opt/lixa/lib" -e LIXA_STATE_SERVERS="tcp://192.168.123.35:2345/default" -e PQSERVER="192.168.123.35" -p 18080:8080 rest-server
[...]
May 12, 2019 8:34:31 AM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:8080]
May 12, 2019 8:34:31 AM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
Jersey app started with WADL available at http://0.0.0.0:8080/xta/application.wadl
Hit enter to stop it...
Start the REST Client
Make sure the environment variable LIXA_CRASH_POINT is set to 12 and specify the "--no-delete" option (the table is empty and the transaction will crash during an SQL INSERT statement):
$ docker run -ti --rm -e SERVER="192.168.123.35" -e LIXA_STATE_SERVERS="tcp://192.168.123.35:2345/default" -e LIXA_CRASH_POINT=12 rest-client --no-delete
2019-05-12 08:41:17.620743 [1/139970505893632] INFO: LXC000I this process is starting a new LIXA transaction manager (lixa package version is 1.7.8)
***** REST client *****
MySQL: executing SQL statement >INSERT INTO authors VALUES(1840, 'Zola', 'Emile')<
Calling REST service passing: xid='1279875137.643298b71af44f5c99774a46c5efc274.e0130ac8303b5a9fc8383c6b92286a09', oper='insert'
Server replied >PREPARED<
Executing transaction commit
2019-05-12 08:41:17.694272 [1/139970505893632] CRIT: LXG000C crash point 12 will immediately terminate the process
The client crashed during the transaction commit as expected. Row 6 shows that the client received a "PREPARED" answer from the server. Below is the output displayed by the server in its console:
***** REST service called: xid='1279875137.643298b71af44f5c99774a46c5efc274.e0130ac8303b5a9fc8383c6b92286a09', oper='insert' *****
Created a subordinate branch with XID '1279875137.643298b71af44f5c99774a46c5efc274.e0130ac8303b5a9f4cb983944b89443f'
PostgreSQL: executing SQL statement >INSERT INTO authors VALUES(1804, 'Hawthorne', 'Nathaniel')<
Executing first phase of commit (prepare)
Returning 'PREPARED' to the client
Executing second phase of commit
2019-05-12 08:41:22.691844 [1/139661518812928] ERR: LXC034E a message has not arrived before timeout expiration (5000 ms) and its socket has been closed
XtaException: LIXA ReturnCode=-32 ('ERROR: a message has not been received before timeout expiration and the TCP socket has been closed')
org.tiian.lixa.xta.XtaException: ERROR: a message has not been received before timeout expiration and the TCP socket has been closed
at org.tiian.lixa.xta.Transaction.commit(Native Method)
at org.tiian.lixa.xta.examples.MyResource$1.run(MyResource.java:126)
at java.base/java.lang.Thread.run(Thread.java:834)
The server prepared its transaction branch, but it was not able to confirm it, as shown by the message LXC034E and by the XtaException. If we check the status inside MySQL, the database used by the client, we can see a clean situation:
mysql> select * from authors;
Empty set (0.00 sec)
mysql> xa recover;
Empty set (0.00 sec)
This is because the crash happened before the prepare phase. On the other hand, inside PostgreSQL we can see a prepared transaction:
lixa=> select * from authors;
id | last_name | first_name
----+-----------+------------
(0 rows)
lixa=> select * from pg_prepared_xacts;
transaction | gid | prepared | owner | database
-------------+--------------------------------------------------------------+-------------------------------+-------+----------
578 | 1279875137_ZDKYtxr0T1yZd0pGxe/CdA==_4BMKyDA7Wp9MuYOUS4lEPw== | 2019-05-12 08:41:17.653142+00 | lixa | lixa
(1 row)
The JDBC driver of PostgreSQL uses a base64 internal representation for the XID (XA transaction id):
- ZDKYtxr0T1yZd0pGxe/CdA== is the base64 equivalent of 643298b71af44f5c99774a46c5efc274
- 4BMKyDA7Wp9MuYOUS4lEPw== is the base64 equivalent of e0130ac8303b5a9f4cb983944b89443f
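The conversion can be checked with a few lines of Python. The sketch assumes that the gid has the layout visible in the query output above (format id, global transaction id, and branch qualifier separated by underscores, the last two base64-encoded); gid_to_xid is an illustrative helper, not part of LIXA or of the JDBC driver:

```python
import base64


def gid_to_xid(gid: str) -> str:
    """Convert a PostgreSQL gid into the dotted hexadecimal XID form."""
    # The standard base64 alphabet has no underscore, so the split is safe.
    format_id, gtrid_b64, bqual_b64 = gid.split("_")
    gtrid = base64.b64decode(gtrid_b64).hex()
    bqual = base64.b64decode(bqual_b64).hex()
    return f"{format_id}.{gtrid}.{bqual}"


gid = "1279875137_ZDKYtxr0T1yZd0pGxe/CdA==_4BMKyDA7Wp9MuYOUS4lEPw=="
print(gid_to_xid(gid))
# prints 1279875137.643298b71af44f5c99774a46c5efc274.e0130ac8303b5a9f4cb983944b89443f
```

The result is the XID of the subordinate branch that the server printed in its console, which confirms that the prepared transaction found in PostgreSQL belongs to the crashed run.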
It's time to execute the client without specifying the LIXA_CRASH_POINT environment variable:
$ docker run -ti --rm -e SERVER="192.168.123.35" -e LIXA_STATE_SERVERS="tcp://192.168.123.35:2345/default" rest-client --no-delete
2019-05-12 08:50:27.940106 [1/139723327334144] INFO: LXC000I this process is starting a new LIXA transaction manager (lixa package version is 1.7.8)
***** REST client *****
2019-05-12 08:50:27.941063 [1/139723327334144] INFO: LXC036I state server noticed there is at least one recovery pending transaction
2019-05-12 08:50:27.941458 [1/139723327334144] NOTICE: LXC037N recovery pending transaction with xid='1279875137.643298b71af44f5c99774a46c5efc274.e0130ac8303b5a9fc8383c6b92286a09'
2019-05-12 08:50:27.941478 [1/139723327334144] NOTICE: LXC038N transaction with xid='1279875137.643298b71af44f5c99774a46c5efc274.e0130ac8303b5a9fc8383c6b92286a09' must be rolled back
2019-05-12 08:50:27.941811 [1/139723327334144] INFO: LXC018I Resource Manager 'MySQL' returned XAER_NOTA during recovery rollback: the Resource Manager has already rolled back the transaction with xid '1279875137.643298b71af44f5c99774a46c5efc274.e0130ac8303b5a9fc8383c6b92286a09'
2019-05-12 08:50:27.941821 [1/139723327334144] INFO: LXC039I transaction with xid='1279875137.643298b71af44f5c99774a46c5efc274.e0130ac8303b5a9fc8383c6b92286a09' has been recovered (rolled back)
MySQL: executing SQL statement >INSERT INTO authors VALUES(1840, 'Zola', 'Emile')<
Calling REST service passing: xid='1279875137.836d2bad098046edaa09bee4e9ac8a2d.e0130ac8303b5a9fc8383c6b92286a09', oper='insert'
Server replied >PREPARED<
Executing transaction commit
As in the previous scenarios, we can see that the messages LXC036I, LXC037N, ... LXC039I are related to the automatic recovery on the client side. Looking inside the console of the server we get something like this:
***** REST service called: xid='1279875137.836d2bad098046edaa09bee4e9ac8a2d.e0130ac8303b5a9fc8383c6b92286a09', oper='insert' *****
2019-05-12 08:50:27.954042 [1/139661588543232] INFO: LXC036I state server noticed there is at least one recovery pending transaction
2019-05-12 08:50:27.954752 [1/139661588543232] NOTICE: LXC037N recovery pending transaction with xid='1279875137.643298b71af44f5c99774a46c5efc274.e0130ac8303b5a9f4cb983944b89443f'
2019-05-12 08:50:27.954957 [1/139661588543232] NOTICE: LXC038N transaction with xid='1279875137.643298b71af44f5c99774a46c5efc274.e0130ac8303b5a9f4cb983944b89443f' must be rolled back
2019-05-12 08:50:27.995375 [1/139661588543232] INFO: LXC039I transaction with xid='1279875137.643298b71af44f5c99774a46c5efc274.e0130ac8303b5a9f4cb983944b89443f' has been recovered (rolled back)
Created a subordinate branch with XID '1279875137.836d2bad098046edaa09bee4e9ac8a2d.e0130ac8303b5a9f3f83b37334a24183'
PostgreSQL: executing SQL statement >INSERT INTO authors VALUES(1804, 'Hawthorne', 'Nathaniel')<
Executing first phase of commit (prepare)
Returning 'PREPARED' to the client
Executing second phase of commit
Even on the server side, automatic recovery has been performed.
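When the console output is captured to a file, the recovery outcome can be extracted automatically. The following is a minimal Python sketch, assuming the LXC039I message layout shown in the listings above. The function name recovered_transactions is hypothetical:

```python
import re

# Pattern modeled on the LXC039I lines shown in the console output
RECOVERED = re.compile(
    r"LXC039I transaction with xid='(?P<xid>[^']+)' "
    r"has been recovered \((?P<outcome>committed|rolled back)\)"
)


def recovered_transactions(log_text: str) -> list[tuple[str, str]]:
    """Return (xid, outcome) pairs for every LXC039I message in the log."""
    return [(m["xid"], m["outcome"]) for m in RECOVERED.finditer(log_text)]


sample = (
    "2019-05-12 08:50:27.995375 [1/139661588543232] INFO: LXC039I transaction "
    "with xid='1279875137.643298b71af44f5c99774a46c5efc274."
    "e0130ac8303b5a9f4cb983944b89443f' has been recovered (rolled back)"
)
for xid, outcome in recovered_transactions(sample):
    print(outcome, xid)
```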
Let's check the status inside MySQL again:
mysql> select * from authors;
+------+-----------+------------+
| id | last_name | first_name |
+------+-----------+------------+
| 1840 | Zola | Emile |
+------+-----------+------------+
1 row in set (0.00 sec)
mysql> xa recover;
Empty set (0.00 sec)
And inside PostgreSQL:
lixa=> select * from authors;
id | last_name | first_name
------+-----------+------------
1804 | Hawthorne | Nathaniel
(1 row)
lixa=> select * from pg_prepared_xacts;
transaction | gid | prepared | owner | database
-------------+-----+----------+-------+----------
(0 rows)
Client Crashes in "Prepared" State
In this scenario, the REST client will crash after its transaction branch has been prepared. To obtain a crash immediately after prepare, the environment variable LIXA_CRASH_POINT must be set to value 14.
Start the REST Server
Start the REST server without setting the environment variable LIXA_CRASH_POINT:
$ docker run -ti --rm -e MAVEN_OPTS="-Djava.library.path=/opt/lixa/lib" -e LIXA_STATE_SERVERS="tcp://192.168.123.35:2345/default" -e PQSERVER="192.168.123.35" -p 18080:8080 rest-server
[...]
May 12, 2019 12:27:32 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:8080]
May 12, 2019 12:27:32 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
Jersey app started with WADL available at http://0.0.0.0:8080/xta/application.wadl
Hit enter to stop it...
Start the REST Client
Set the environment variable LIXA_CRASH_POINT=14 and specify the "--no-insert" option (the table contains one row, and the transaction will crash during an SQL DELETE statement):
$ docker run -ti --rm -e SERVER="192.168.123.35" -e LIXA_STATE_SERVERS="tcp://192.168.123.35:2345/default" -e LIXA_CRASH_POINT=14 rest-client --no-insert
2019-05-12 12:30:31.934668 [1/140414610745088] INFO: LXC000I this process is starting a new LIXA transaction manager (lixa package version is 1.7.8)
***** REST client *****
MySQL: executing SQL statement >DELETE FROM authors WHERE id=1840<
Calling REST service passing: xid='1279875137.4bbeee362190416580342d1e0fbc2c1f.e0130ac8303b5a9fc8383c6b92286a09', oper='delete'
Server replied >PREPARED<
Executing transaction commit
2019-05-12 12:30:32.054258 [1/140414610745088] CRIT: LXG000C crash point 14 will immediately terminate the process
The client crashed during the transaction commit as expected, but this time the client had already prepared its transaction branch.
Row 6 shows that the client received a "PREPARED" answer from the server. Below is the output displayed by the server in its console:
***** REST service called: xid='1279875137.4bbeee362190416580342d1e0fbc2c1f.e0130ac8303b5a9fc8383c6b92286a09', oper='delete' *****
Created a subordinate branch with XID '1279875137.4bbeee362190416580342d1e0fbc2c1f.e0130ac8303b5a9fcfb462b57a45455f'
PostgreSQL: executing SQL statement >DELETE FROM authors WHERE id=1804<
Executing first phase of commit (prepare)
Returning 'PREPARED' to the client
Executing second phase of commit
There are no complaints on the REST server side.
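Note that the XID received from the client and the XID of the subordinate branch created by the server share the first two fields (format ID and global transaction ID) and differ only in the last one (branch qualifier). The following Python sketch checks this relationship, assuming the dot-separated layout printed in the logs. The helper names are hypothetical:

```python
from typing import NamedTuple


class Xid(NamedTuple):
    format_id: str
    gtrid: str  # global transaction ID, shared by all the branches
    bqual: str  # branch qualifier, specific to each branch


def parse_xid(text: str) -> Xid:
    """Split 'formatID.gtrid.bqual' as printed in the LIXA messages."""
    format_id, gtrid, bqual = text.split(".")
    return Xid(format_id, gtrid, bqual)


def same_global_transaction(a: str, b: str) -> bool:
    """Two branches belong to the same global transaction if
    the format ID and the global transaction ID are equal."""
    xa, xb = parse_xid(a), parse_xid(b)
    return (xa.format_id, xa.gtrid) == (xb.format_id, xb.gtrid)


client = "1279875137.4bbeee362190416580342d1e0fbc2c1f.e0130ac8303b5a9fc8383c6b92286a09"
server = "1279875137.4bbeee362190416580342d1e0fbc2c1f.e0130ac8303b5a9fcfb462b57a45455f"
print(same_global_transaction(client, server))  # prints True
```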
If we check the status inside MySQL, the database used by the client, we can see the row is still in the table and a prepared transaction is waiting to be recovered:
mysql> select * from authors;
+------+-----------+------------+
| id | last_name | first_name |
+------+-----------+------------+
| 1840 | Zola | Emile |
+------+-----------+------------+
1 row in set (0.00 sec)
mysql> xa recover;
+------------+--------------+--------------+------------------------------------------------------------------+
| formatID | gtrid_length | bqual_length | data |
+------------+--------------+--------------+------------------------------------------------------------------+
| 1279875137 | 32 | 32 | 4bbeee362190416580342d1e0fbc2c1fe0130ac8303b5a9fc8383c6b92286a09 |
+------------+--------------+--------------+------------------------------------------------------------------+
1 row in set (0.00 sec)
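The row returned by xa recover can be mapped back to the XID printed in the LIXA messages. The following is a minimal Python sketch, assuming that the data column contains the hexadecimal text of the global transaction ID followed by the branch qualifier, as the two length columns in the listing above suggest. The function name is hypothetical:

```python
def xa_recover_row_to_xid(format_id: int, gtrid_length: int,
                          bqual_length: int, data: str) -> str:
    """Rebuild 'formatID.gtrid.bqual' from one row of XA RECOVER."""
    if len(data) != gtrid_length + bqual_length:
        raise ValueError("data length does not match gtrid_length + bqual_length")
    gtrid = data[:gtrid_length]   # first part: global transaction ID
    bqual = data[gtrid_length:]   # remaining part: branch qualifier
    return f"{format_id}.{gtrid}.{bqual}"


row = (1279875137, 32, 32,
       "4bbeee362190416580342d1e0fbc2c1fe0130ac8303b5a9fc8383c6b92286a09")
print(xa_recover_row_to_xid(*row))
# prints 1279875137.4bbeee362190416580342d1e0fbc2c1f.e0130ac8303b5a9fc8383c6b92286a09
```

The result matches the XID passed by the client to the REST service in the crashed execution.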
Inside PostgreSQL, the situation is quite different:
lixa=> select * from authors;
id | last_name | first_name
----+-----------+------------
(0 rows)
lixa=> select * from pg_prepared_xacts;
transaction | gid | prepared | owner | database
-------------+-----+----------+-------+----------
(0 rows)
This is fine because the REST client completed the prepare phase and the REST server committed its transaction branch.
Once again, a new client invocation will activate the automatic recovery feature of the embedded LIXA Transaction Manager:
$ docker run -ti --rm -e SERVER="192.168.123.35" -e LIXA_STATE_SERVERS="tcp://192.168.123.35:2345/default" rest-client --no-insert
2019-05-12 12:38:47.404872 [1/140404202567424] INFO: LXC000I this process is starting a new LIXA transaction manager (lixa package version is 1.7.8)
***** REST client *****
2019-05-12 12:38:47.406843 [1/140404202567424] INFO: LXC036I state server noticed there is at least one recovery pending transaction
2019-05-12 12:38:47.407487 [1/140404202567424] NOTICE: LXC037N recovery pending transaction with xid='1279875137.4bbeee362190416580342d1e0fbc2c1f.e0130ac8303b5a9fc8383c6b92286a09'
2019-05-12 12:38:47.407678 [1/140404202567424] NOTICE: LXC038N transaction with xid='1279875137.4bbeee362190416580342d1e0fbc2c1f.e0130ac8303b5a9fc8383c6b92286a09' must be committed
2019-05-12 12:38:47.436083 [1/140404202567424] INFO: LXC039I transaction with xid='1279875137.4bbeee362190416580342d1e0fbc2c1f.e0130ac8303b5a9fc8383c6b92286a09' has been recovered (committed)
MySQL: executing SQL statement >DELETE FROM authors WHERE id=1840<
Calling REST service passing: xid='1279875137.ae47a9f8a8f04d03b6e8f87a904b6ab7.e0130ac8303b5a9fc8383c6b92286a09', oper='delete'
Server replied >PREPARED<
Executing transaction commit
Important fact: Rows 6 and 7 show that the recovery pending transaction must be committed. This is consistent with the status of the global transaction because the server already committed its own branch.
Looking inside the terminal of the server:
***** REST service called: xid='1279875137.ae47a9f8a8f04d03b6e8f87a904b6ab7.e0130ac8303b5a9fc8383c6b92286a09', oper='delete' *****
Created a subordinate branch with XID '1279875137.ae47a9f8a8f04d03b6e8f87a904b6ab7.e0130ac8303b5a9f558b39ba87cf4e4a'
PostgreSQL: executing SQL statement >DELETE FROM authors WHERE id=1804<
Executing first phase of commit (prepare)
Returning 'PREPARED' to the client
Executing second phase of commit
The server reports no recovery pending transactions. This is consistent with the previous execution because, as stated above, the transaction branch had already been committed on the server side.
Finally, we have to check the state in the databases. Looking at MySQL we get:
mysql> select * from authors;
Empty set (0.00 sec)
mysql> xa recover;
Empty set (0.00 sec)
And looking at PostgreSQL we obtain:
lixa=> select * from authors;
id | last_name | first_name
----+-----------+------------
(0 rows)
lixa=> select * from pg_prepared_xacts;
transaction | gid | prepared | owner | database
-------------+-----+----------+-------+----------
(0 rows)
The data is consistent, and no prepared transactions are left to recover!
Summary
In the previous paragraphs, four failure scenarios have been analyzed. Here is a brief summary.
Server Side Crashes Before Prepare
Neither the server nor the client prepares its transaction branch, so no prepared transaction needs to be recovered.
Server Side Crashes After Prepare
The server prepares its transaction branch and the client doesn't. Only the server is left with a prepared transaction that needs to be recovered, and the recovery happens automatically as soon as a new REST request is served.
Client Side Crashes Before Prepare
As in the above scenario, the server prepares its transaction branch and the client doesn't. Only the server is left with a prepared transaction that needs to be recovered, and the recovery happens automatically as soon as a new REST request is served.
Client Side Crashes After Prepare
The server prepares and commits its transaction branch, while the client prepares its transaction branch but does not commit it. Only the client is left with a prepared transaction that needs to be recovered, and the recovery happens automatically as soon as a new transaction is started by the client.
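The outcomes of the four scenarios follow the general two-phase commit rule: a global transaction can be committed only if every branch reached at least the "prepared" state, otherwise it is rolled back. The following Python sketch is a simplified model of that rule using the three states described in the introduction. It is not the internal logic of LIXA, and all the names are hypothetical:

```python
from enum import Enum


class BranchState(Enum):
    NON_PREPARED = "non-prepared"
    PREPARED = "prepared"
    CONFIRMED = "confirmed"


def recovery_action(states: list[BranchState]) -> str:
    """Simplified two-phase commit rule: commit only if no branch
    is still in the non-prepared state, otherwise roll back."""
    if not states:
        raise ValueError("at least one branch is required")
    if all(s is not BranchState.NON_PREPARED for s in states):
        return "commit"
    return "rollback"


# [server, client] states at the time of the crash, as summarized above
scenarios = {
    "server crashes before prepare": [BranchState.NON_PREPARED, BranchState.NON_PREPARED],
    "server crashes after prepare": [BranchState.PREPARED, BranchState.NON_PREPARED],
    "client crashes before prepare": [BranchState.PREPARED, BranchState.NON_PREPARED],
    "client crashes after prepare": [BranchState.CONFIRMED, BranchState.PREPARED],
}
for name, states in scenarios.items():
    print(f"{name}: {recovery_action(states)}")
```

Only the last scenario leads to a commit during recovery, which matches the "must be committed" message shown in the client log.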
Conclusion
The XTA API in conjunction with the LIXA state server enables the development of systems that implement ACID distributed transactions among two or more applications (services).
This article shows what happens when one of the two parties crashes: recovery is automatically performed without human intervention and strong consistency is guaranteed.
The issues related to the locks associated with the prepared transactions should not affect a microservices-based architecture: if a table is accessed by a single service and the locks are fine-grained (row lock level for relational databases), the only impact of a prepared transaction is a short recovery phase that's automatically performed by the next invocation of the same service.