MySQL binlog slave in Rust
One of the things that was a bit magical to me was how various systems downstream of the database got their data. Things like indexing, caching, ElasticSearch, or other data pipelines were a bit of a black box to me.
At my previous startup, we used ElasticSearch to power the bank selection autocomplete feature. Because the data rarely changed, we could simply rebuild the whole ElasticSearch index whenever it did. If the data changed often, or if there were much more of it, rebuilding each time wouldn’t scale.
After digging a little bit, I decided to write code that connects to MySQL as a slave and streams the binlog from it. A preliminary search showed me that systems at many companies are built on the binlog. I also decided to use this opportunity to learn a little bit of Rust. The result is my fork of rust-mysql-simple here.
Connect to MySQL as Slave
The rust-mysql-simple project already has logic for connecting to a MySQL database. What I needed to do was implement the COM_REGISTER_SLAVE and COM_BINLOG_DUMP MySQL commands.
I added the register_as_slave and request_binlog_stream functions by following the MySQL documentation for what bytes to send. I also used go-mysql as a reference. One detail I had to learn: the MySQL protocol expects integers in little-endian byte order.
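To make the byte layout concrete, here is a minimal sketch of how the two command payloads could be assembled. It is not the code from my fork: the helper names (put_short_str, register_slave_payload, binlog_dump_payload) are made up for illustration, and the field order follows the COM_REGISTER_SLAVE and COM_BINLOG_DUMP descriptions in the MySQL protocol documentation. It assumes the existing connection code adds the usual packet header (3-byte length plus sequence id) around each payload.

```rust
// Illustrative helpers only; names and signatures are hypothetical.
const COM_BINLOG_DUMP: u8 = 0x12;
const COM_REGISTER_SLAVE: u8 = 0x15;

/// Appends a string prefixed by a single length byte.
fn put_short_str(buf: &mut Vec<u8>, s: &str) {
    assert!(s.len() <= u8::MAX as usize, "string too long for a 1-byte length");
    buf.push(s.len() as u8);
    buf.extend_from_slice(s.as_bytes());
}

/// Builds the COM_REGISTER_SLAVE payload (without the packet header).
fn register_slave_payload(
    server_id: u32,
    host: &str,
    user: &str,
    password: &str,
    port: u16,
) -> Vec<u8> {
    let mut buf = vec![COM_REGISTER_SLAVE];
    buf.extend_from_slice(&server_id.to_le_bytes()); // this slave's server id
    put_short_str(&mut buf, host);
    put_short_str(&mut buf, user);
    put_short_str(&mut buf, password);
    buf.extend_from_slice(&port.to_le_bytes()); // port the slave reports
    buf.extend_from_slice(&0u32.to_le_bytes()); // replication rank, ignored
    buf.extend_from_slice(&0u32.to_le_bytes()); // master id, usually 0
    buf
}

/// Builds the COM_BINLOG_DUMP payload (without the packet header).
fn binlog_dump_payload(server_id: u32, binlog_file: &str, binlog_pos: u32) -> Vec<u8> {
    let mut buf = vec![COM_BINLOG_DUMP];
    buf.extend_from_slice(&binlog_pos.to_le_bytes()); // start position in the file
    buf.extend_from_slice(&0u16.to_le_bytes()); // flags, none set
    buf.extend_from_slice(&server_id.to_le_bytes());
    buf.extend_from_slice(binlog_file.as_bytes()); // runs to the end of the packet
    buf
}
```

Every multi-byte integer goes through to_le_bytes, which is where the little-endian requirement shows up. A call such as binlog_dump_payload(1001, "mysql-bin.000001", 4) would ask the server to stream from the first event of that file, since position 4 sits just past the 4-byte magic number at the start of a binlog file.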
Implementing the functionality wasn’t too difficult, but I did get tripped up by Rust’s ownership model for a bit. Once I got past that, though, I could see how Rust helps prevent memory problems. When I used to work in C++, memory ownership was always something to be very careful about, especially in a multithreaded environment.
I didn’t implement handling for everything MySQL sends in the binlog for replication, but what I did implement allowed me to see how these systems can be connected together. This was my main objective.
Facebook Wormhole
Connecting this back to my experience at Facebook, I now see that the internal system called Wormhole acts as a pub/sub system that distributes DB changes to other systems.
It’s not trivial to build, but the overall concept makes sense once you understand how the “magical” stuff roughly works.
Other Systems in the Industry
While I was researching, I came across several other similar pipelines in the industry. For example, Uber’s Schemaless storage system, LinkedIn’s Databus, and Alibaba’s Canal.