Amazon RDS: skipping replication errors by repositioning the slave

When MySQL replication stops on an error, you can normally run

set global sql_slave_skip_counter=1; start slave;

and replication will run again.

On Amazon RDS instances this standard way of skipping a replication error is not allowed, but there is a stored procedure you can call instead:

CALL mysql.rds_skip_repl_error;

Recently I found that the replication slave on our RDS instance had stopped on an error while trying to change a MySQL user. On RDS you don’t have the SUPER privilege, and therefore some actions are not allowed. This was the slave status:

*************************** 1. row ***************************
 Slave_IO_State: Waiting for master to send event
 Master_Host: 172.17.40.12
 Master_User: replication
 Master_Port: 8061
 Connect_Retry: 60
 Master_Log_File: mysql-bin.008964
 Read_Master_Log_Pos: 291508012
 Relay_Log_File: relaylog.038300
 Relay_Log_Pos: 995634
 Relay_Master_Log_File: mysql-bin.008960
 Slave_IO_Running: Yes
 Slave_SQL_Running: No
 Replicate_Do_DB: 
 Replicate_Ignore_DB: 
 Replicate_Do_Table: 
 Replicate_Ignore_Table: mysql.plugin,mysql.rds_monitor,mysql.rds_sysinfo,mysql.rds_replication_status,mysql.rds_history,innodb_memcache.config_options,innodb_memcache.cache_policies
 Replicate_Wild_Do_Table: 
 Replicate_Wild_Ignore_Table: 
 Last_Errno: 1064
 Last_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IDENTIFIED WITH 'mysql_native_password' AS '*5AD1182583E7AA6A5DB40F1D258C60EE60A' at line 1' on query. Default database: ''. Query: 'ALTER USER 'root'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*5AD1182583E7AA6A5DB40F1D258C60EE60A50B17' PASSWORD EXPIRE NEVER'
 Skip_Counter: 0
 Exec_Master_Log_Pos: 924431317
 Relay_Log_Space: 3682220649
 Until_Condition: None
 Until_Log_File: 
 Until_Log_Pos: 0
 Master_SSL_Allowed: No
 Master_SSL_CA_File: 
 Master_SSL_CA_Path: 
 Master_SSL_Cert: 
 Master_SSL_Cipher: 
 Master_SSL_Key: 
 Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
 Last_IO_Errno: 0
 Last_IO_Error: 
 Last_SQL_Errno: 1064
 Last_SQL_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IDENTIFIED WITH 'mysql_native_password' AS '*5AD1182583E7AA6A5DB40F1D258C60EE60A' at line 1' on query. Default database: ''. Query: 'ALTER USER 'root'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*5AD1182583E7AA6A5DB40F1D258C60EE60A50B17' PASSWORD EXPIRE NEVER'
 Replicate_Ignore_Server_Ids: 
 Master_Server_Id: 12
 Master_UUID: d56c1b0d-e30f-11e6-8587-002590d648a8
 Master_Info_File: mysql.slave_master_info
 SQL_Delay: 0
 SQL_Remaining_Delay: NULL
 Slave_SQL_Running_State: 
 Master_Retry_Count: 86400
 Master_Bind: 
 Last_IO_Error_Timestamp: 
 Last_SQL_Error_Timestamp: 170418 15:48:02
 Master_SSL_Crl: 
 Master_SSL_Crlpath: 
 Retrieved_Gtid_Set: 
 Executed_Gtid_Set: 
 Auto_Position: 0

OK, so it did not accept the ALTER USER statement. No problem: we can skip it and solve it differently on RDS. However, no matter how often I issued the rds_skip_repl_error call, replication stayed in error, and I noticed that Exec_Master_Log_Pos was not changing.

I have no clue why it would not skip the error. I decided to skip the error myself by repositioning the slave to the next log position.

I took the current master binlog file and the executed master log position from the slave status:

Relay_Master_Log_File: mysql-bin.008960
Exec_Master_Log_Pos: 924431317
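
Picking these two fields out of a long status dump by eye is easy to get wrong. As a minimal sketch, assuming the `SHOW SLAVE STATUS\G` output is available as text on standard input, a small shell helper can print them. The function name `exec_position` is hypothetical and not part of MySQL or RDS.

```shell
# Hypothetical helper: reads `SHOW SLAVE STATUS\G` text on stdin and prints
# "<Relay_Master_Log_File> <Exec_Master_Log_Pos>" on one line.
# Exits non-zero if either field is missing.
exec_position() {
  awk -F': ' '
    { key = $1; gsub(/^[ \t]+/, "", key) }   # field names are padded with leading spaces
    key == "Relay_Master_Log_File" { file = $2 }
    key == "Exec_Master_Log_Pos"   { pos = $2 }
    END { if (file != "" && pos != "") print file, pos; else exit 1 }
  '
}
```

It could be fed from the mysql client, for example `mysql -h <rds-endpoint> -u <user> -p -e 'SHOW SLAVE STATUS\G' | exec_position`, where the endpoint and user are placeholders.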

On the master database I opened the binlog to see what the next position number would be:

sudo mysqlbinlog --start-position=924431317 /mysql2/mysql_logs/mysql-bin.008960 | more

This showed the following output

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#170418 6:51:41 server id 12 end_log_pos 123 CRC32 0x34124130 Start: binlog v 4, server v 5.7.17-log created 170418 6:51:41
BINLOG '
XZv1WA8MAAAAdwAAAHsAAAAAAAQANS43LjE3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA
ATBBEjQ=
'/*!*/;
# at 924431317
#170418 10:25:45 server id 17 end_log_pos 924431382 CRC32 0xd99f7dea Anonymous_GTID last_committed=640906 sequence_number=640907
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 924431382
#170418 10:25:45 server id 17 end_log_pos 924431583 CRC32 0xa0a09ddd Query thread_id=8348 exec_time=7 error_code=0
SET TIMESTAMP=1492503945/*!*/;
SET @@session.pseudo_thread_id=8348/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
ALTER USER 'root'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*5AD1182583E7AA6A5DB40F1D258C60EE60A50B17' PASSWORD EXPIRE NEVER
/*!*/;
# at 924431583
#170418 10:25:52 server id 13 end_log_pos 924431648 CRC32 0x2363cf6d Anonymous_GTID last_committed=640907 sequence_number=640908
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 924431648
#170418 10:25:52 server id 13 end_log_pos 924431730 CRC32 0x776144a5 Query thread_id=43003 exec_time=0 error_code=0
SET TIMESTAMP=1492503952/*!*/;
SET @@session.sql_mode=4194304/*!*/;
/*!\C utf8mb4 *//*!*/;
SET @@session.character_set_client=45,@@session.collation_connection=45,@@session.collation_server=33/*!*/;
BEGIN
/*!*/;
# at 924431730
....

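In this output I could indeed see the next position: the failing ALTER USER event ends at end_log_pos 924431583, which is where the next event starts.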
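
The same lookup can be scripted. The sketch below assumes the binlog contains a GTID or Anonymous_GTID event in front of every transaction, as in the MySQL 5.7 output above, and simply reports the offset of the first such event after the position where the slave is stuck. The function name `next_trx_pos` is hypothetical.

```shell
# Hypothetical helper: reads mysqlbinlog text output on stdin and prints the
# offset ("# at N") of the first GTID event located after the given position.
# Assumes every transaction is preceded by a GTID/Anonymous_GTID event.
next_trx_pos() {
  awk -v start="$1" '
    /^# at [0-9]+$/ { pos = $3; next }        # offset of the event that follows
    /GTID/ && /end_log_pos/ && (pos + 0) > (start + 0) { print pos; exit }
  '
}
```

With the values from this post it would be called as `sudo mysqlbinlog --start-position=924431317 /mysql2/mysql_logs/mysql-bin.008960 | next_trx_pos 924431317`. It is still worth reading the skipped statement in the mysqlbinlog output before repositioning, since everything between the two positions is never applied on the slave.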

Next step was to point the RDS server to this new position in the same master binlog file. Although replication was in error I first had to explicitly stop the slave:

CALL mysql.rds_stop_replication;

Then give a new master position

CALL mysql.rds_set_external_master (
 '172.17.40.12' -- slave's master address here 
 , 8061
 , 'replication'
 , 'xxxxxxxx'
 , 'mysql-bin.008960' -- mysql_binary_log_file_name
 , 924431583 -- mysql_binary_log_file_location
 , 0
);

And the RDS slave was running fine again 🙂 (if replication does not resume on its own, CALL mysql.rds_start_replication; starts it).
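
To confirm the result, the two thread fields from the slave status are the ones to look at: both Slave_IO_Running and Slave_SQL_Running should say Yes. A minimal sketch of that check, assuming the `SHOW SLAVE STATUS\G` text is piped in on standard input (the function name `replication_ok` is hypothetical):

```shell
# Hypothetical helper: reads `SHOW SLAVE STATUS\G` text on stdin and succeeds
# (exit status 0) only when both replication threads report "Yes".
replication_ok() {
  awk -F': ' '
    { key = $1; gsub(/^[ \t]+/, "", key) }   # strip the leading padding
    key == "Slave_IO_Running"  { io = $2 }
    key == "Slave_SQL_Running" { sql = $2 }
    END { exit !(io == "Yes" && sql == "Yes") }
  '
}
```

For example, `mysql -h <rds-endpoint> -u <user> -p -e 'SHOW SLAVE STATUS\G' | replication_ok && echo "replication running"`, with the endpoint and user as placeholders.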
