PhxSQL is a high-availability and strong-consistency MySQL cluster built on Paxos and Percona.
Authors: Junchao Chen, Haochuan Cui, Duokai Huang, Ming Chen and Sifan Liu
Contact us: phxteam@tencent.com
#PhxSQL features:
- high availability by automatic failovers: the cluster works well when more than half of cluster nodes work and are interconnected.
- guarantee of data consistency among cluster nodes: replacing loss-less semi-sync between MySQL master and MySQL slaves with Paxos, PhxSQL ensures zero-loss binlogs between master and slaves and supports linearizable consistency, which is as strong as that of Zookeeper.
- complete compliance with MySQL for and MySQL client: PhxSQL supports up to serializable isolation level of transaction.
- easy deployment and easy maintenance: PhxSQL, powered by in-house implementation of Paxos, has only 4 components including MySQL and doesn't depend on zookeeper or etcd for anything. PhxSql supports automated cluster membership hot reconfiguration.
This project includes
- Source codes
- Third party submodules
- Pre-compiled binaries for Ubuntu 64bit system.
Projects on which this project depends are also published by tencent-wechat( phxpaxos, phxrpc, libco ). You can download or clone them with --recurse-submodule.
phxpaxos: http://github.com/tencent-wechat/phxpaxos
phxrpc: http://github.com/tencent-wechat/phxrpc
libco: http://github.com/tencent-wechat/libco
If you prefer pre-compiled binaries, just skip this part.
- PhxSQL
- phxsqlproxy
- phxbinlogsvr
- percona
- phx_percona
- plugin
- phxsync_phxrpc
- semisync
- third_party
- glog
- leveldb
- protobuf
- phxpaxos
- colib
- phxrpc
- tools
- phxrpc_package_config
Name | Introduction |
---|---|
phxsqlproxy | surrogate between MySQL client and PhxSql |
phxbinlogsvr | server for global binlog synchronization and storage, as well as management of mastership and membership |
percona | Source code of percona5.6.31-77.0 |
phx_percona/plugin/phxsync_phxrpc | A plugin running in MySql that intercepts MySQL binlogs and forwards them to phxbinlogsvr |
phx_percona/plugin/semisync | A semisync compatible with our modified plugin APIs of MySQL |
third_party/glog | GLOG library |
third_party/leveldb | LevelDB library |
third_party/protobuf | Google Protobuf 3.0+ library |
third_party/phxpaxos | PhxPaxos library |
third_party/colib | Libco library |
third_party/phxrpc | Phxrpc library |
PhxSQL needs 6 third party libs(glog, leveldb, protobuf, phxpaxos, colib, phxrpc). Please install them in phxsql/third_party directory or just link to third_party.
NOTE: Please make sure -fPIC is added while executing configure in GLOG and Protobuf as well as specifying --prefix=/the/current/absolute/path.
For example: ./configure CXXFLAGS=-fPIC --prefix=/home/root/phxsql/third_party/glog
.
Then download percona-server-5.6.31-77.0.tar.gz
Move percona-server-5.6\_5.6.31-77.0
to PhxSQL directory, rename or link as 'percona'
(NOTE: Only percona-server-5.6_5.6.31-77.0 is available)
- Execute
./autoinstall.sh && make && make install
- Execute 'make package' to generate a tar.gz package so you can transfer to your target hosts.
(NOTE: We put the binaries in install_package/sbin, configuration files in install_package/tools/etc_template, install scripts in install_package/tools. The 'make package' command will pack 'install_package' into 'phxsql-$version.tar.gz'. Please specify -prefix=/the/path/you/want/to/install while executing ./autoinstall.sh)
PhxSQL needs to run on more than 2 hosts. We suggest N >= 3 and N is an odd number, where N means the number of hosts.
-
Transfer phxsql.tar.gz to all of the hosts you want to install. Then do as the following steps:
-
Execute
tar -xvf phxsq.tar.gz .
-
Enter phxsql/tools, Execute
python install.py --help
to get the help of installation.(For example:
python2.7 install.py -i"your_inner_ip" -p 54321 -g 6000 -y 11111 -P 17000 -a 8001 -f/tmp/data/
)
-
-
After executing 'install.py' on all the hosts, Execute './phxbinlogsvr_tools_phxrpc -f InitBinlogSvrMaster -h"ip1,ip2,ip3" -p 17000' in any one hosts. 17000 should be replaced with the port on which phxbinlogsvr is listening.
-
The cluster is active while a message shows master initialization is finished.
-
You can execute some SQLs to check the status of cluster through
mysql -uroot -h"your_inner_ip" -P$phxsqlproxy_port
- Enter phxsql/tools/
- Execute
test_phxsql.sh phxsqlproxy_port ip1 ip2 ip3
PhxSQL have 3 configuration files in total.
NOTE:Modify tools/etc_temlate/my.cnf
before installation, Modify etc/my.cnf
after installation
Section name | Key name | comment |
---|---|---|
AgentOption | AgentPort | Port for the connection of binlogsvr and MySQL |
EventDataDir | Directory where to store the binlogsvr data | |
MaxFileSize | File size per data of phxbinlogsvr, the unit is B | |
MasterLease | Lease length of master, the unit is second | |
CheckPointTime | The data before CheckPointTime will be deleted by phxbinlogsvr, but it will not be deleted if some other PhxSQL nodes have not learned yet, the unit is minute | |
MaxDeleteCheckPointFileNum | The maximum number of files deleted each time by phxbinlogsvr | |
FollowIP | Enabled if it is a follower node and will learn binlog from this FollowIP , this node will not vote |
|
PaxosOption | PaxosLogPath | Directory where to store paxos data |
PaxosPort | Port for paxos to connect each other | |
PacketMode | The maximum size of paxos log for PhxPaxos,1 means 100M, but the network timeout will be 1 minute, 0 means 50M and network timeout is 2s(changed in dynamic). | |
Server | IP | IP for phxbinlogsvr to listen |
Port | Port for phxbinlogsvr to listen | |
LogFilePath | Directory to store log | |
LogLevel | Log level of phxbinlogsvr |
Section name | Key name | comment |
---|---|---|
Server | IP | IP for phxsqlproxy to listen |
Port | Port for phxsqlproxy to listen | |
LogFilePath | Directory to store log | |
LogLevel | Log level of phxbinlogsvr | |
MasterEnableReadPort | Enable readonly-port in master node |
phxsqlproxy is the surrogate of PhxSQL, all requests will be sent to phxsqlproxy and then be forwarded to MySQL.
It is the port configured in phxsqlproxy.conf
.
Every requests sent to this port will be forwarded to the master node to excute.
It is (MasterPort + 1). You can also specify it by setting SlavePort = xxxxx
in phxsqlproxy.conf
.
Every requests will be executed on the local MySQL. A master node will make a redirection to another slave nodes if MasterEnableReadPort = 0
(this will save the CPU/IO resource for write requests)
- Using
mysql -u$user -h$phxsqlproxyip -P$phxsqlproxyport -p$pwd
to connect to phxsqlproxy - Execute SQL command.
$phxsqlproxyip
can be any one IP of hosts in a clusters and$phxsqlproxyport
can beMasterPort
orSlavePort
.
PhxSQL provides a tool phxbinlogsvr_tools_phxrpc
to help the mangerment of PhxSQL.
PhxSQL cluster needs 1 MySQL admin accounts and 1 synchronization account. The default admin account is (root
, ""
), the default synchronization account is ( replica
, replica123
), They can be modified( and only be modifyed ) via phxbinlogsvr_tools_phxrpc
. DON'T DO THIS MANUALLY.
Following is some commands you may used frequently.
**Function:**Get the current master info from quorum nodes( IP and timeout ).
Arguments:
- Host: Any one IP of clusters nodes
- Port: Port which phxbinlogsvr is listening. like
17000
phxbinlogsvr_tools -f SetMySqlAdminInfo -h <host> -p <port> -u <admin username> -d <admin pwd> -U <new admin username> -D <new admin pwd>
Function: Set the user and password of admin account.
Arguments:
- Host: Any one IP of clusters nodes
- Port: Port which phxbinlogsvr is listening. like
17000
- Admin username: Current account user( default is
root
) - Admin pwd: Current account password( default is
""
) - New admin username: New user
- New admin pwd: New password
phxbinlogsvr_tools -f SetMySqlReplicaInfo -h <host> -p <port> -u <admin username> -d <admin pwd> -U <new replica username> -D <new replica pwd>
Function: Set the user and password of synchronization account.
Arguments:
- Host: Any one IP of clusters nodes
- Port: Port which phxbinlogsvr is listening. like
17000
- Admin username: Current account user( default is
root
) - Admin pwd: Current account password( default is
""
) - New replica username: New user
- New replica pwd: New password
Function: Membership of this cluster, all IPs and Ports included.
Arguments:
- Host: Any one IP of clusters nodes
- Port: Port which phxbinlogsvr is listening. like
17000
Execute phxbinlogsvr_tools_phxrpc -f RemoveMember -h<host> -p<port> -m <ip_of_nodeA>
to delete node A.
Once it is succesfully executed, A will not learn binlog after a small period.
- Execute
phxbinlogsvr_tools -f AddMember -h<host> -p<port> -m <ip_of_nodeA>
to add node A into the membership. - Install PhxSQL on A.
- A will begin to learn data after installation is finished.
- Copy a snapshot of MySQL from any other nodes to A.
- Kill phxbinlogsvr and access MySQL through the local port( or socket ). then execute
set super_read_only = 0; set read_only = 0
; - Dump the snapshot into MySQL.
- A will begin to work after a while.
Phxbinlogsvr
will pull the checkpoint in another node to reboot during reinstallation. It will self-kill after pulling is over(to make sure the consistency). You can reboot phxbinlogsvr
after a message like "All sm load state ok, start to exit"
appears.
phxbinlogsvr
will stop working if a data problem arise in MySQL. We suggest you to check the status of MySQL.
CPU : Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz * 24
Memory : 32G
Disk : SSD Raid10
Master -> Slave : 3 ~ 4ms
Client -> Master : 4ms
sysbench --oltp-tables-count=10 --oltp-table-size=1000000 --num-threads=500 --max-requests=100000 --report-interval=1 --max-time=200
Client Threads | Clusters | Test sets | |||||
---|---|---|---|---|---|---|---|
insert.lua (100% write) | select.lua (0% write) | OLTP.lua (20% write) | |||||
QPS | Response time(MS) | QPS | Response time(MS) | QPS | Response time(MS) | ||
200 | PhxSQL | 5076 | 39.34/56.93 | 46334 | 4.21/5.12 | 25657 | 140.16/186.39 |
200 | MySQL semi-sync | 4055 | 49.27/66.64 | 47528 | 4.10/5.00 | 20391 | 176.39/226.76 |
500 | PhxSQL | 8260 | 60.41/83.14 | 105928 | 4.58/5.81 | 46543 | 192.93/242.85 |
500 | MySQL semi-sync | 7072 | 70.60/91.72 | 121535 | 4.17/5.08 | 33229 | 270.38/345.84 |
NOTE:The 2 Response times means average and 95% percentile