Cryptocurrency prices have risen sharply of late: Bitcoin’s price has rocketed by over 200% in the past 3 months, and some of the major altcoins have also seen spectacular growth. As the whole cryptocurrency market revives, crypto exchanges around the world are growing at high speed, giving traders more choice than ever before. Whether an exchange is established or newly launched, ensuring the security, efficiency and stability of every single trade requires strong technological support. So what difficulties are encountered when building a crypto exchange?
1. Breaking Down the Technological Difficulties of Crypto Exchanges
1. Security
Without a doubt, the safety of users’ assets is the number one priority, as it is the lifeline of every crypto exchange. Hacking attempts are unfortunately a nearly daily occurrence, and hacks are reported even at the most renowned exchanges. Just a few weeks ago, 7,000 BTC was stolen from Binance, the third major hack on the platform, and there are countless more examples at other exchanges. Of course, besides the security of assets, the security of user information is also very important.
2. High Availability
Availability is the proportion of time that the platform’s systems are in working order. Cryptocurrencies are traded 24/7, so any malfunction, be it the abnormal exit of a process, a server breakdown or a network failure, may prevent trades from going through, and the direct result is a loss for customers. Positions, especially highly leveraged ones, may simply be liquidated. The same applies to system maintenance and upgrades, and many of the large exchanges shut down for maintenance quite frequently. For a serious trader it is definitely unacceptable for an exchange to be down for maintenance while huge movements are happening in the market. Has any crypto exchange managed to avoid this so far? To my knowledge, no. Even the major crypto exchanges suffer from frequent maintenance downtime, system malfunctions or system overload. Achieving high availability places very high demands on architecture design and fault tolerance, and it also requires that no errors occur during operations, as a single wrongly entered command can destroy everything. The system must also go through complete, realistic failure recovery drills before it can evolve into one that is highly stable and usable.
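Since availability is defined as a proportion of time, any availability target translates directly into a yearly downtime budget. The following is a minimal sketch of that arithmetic; the function names are illustrative and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def downtime_budget_minutes(availability: float) -> float:
    """Return the maximum yearly downtime, in minutes, for an availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_from_downtime(downtime_minutes: float) -> float:
    """Return the availability fraction achieved with the given yearly downtime."""
    return 1.0 - downtime_minutes / MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, target in [("two nines", 0.99), ("three nines", 0.999), ("four nines", 0.9999)]:
        print(f"{label}: {downtime_budget_minutes(target):.1f} minutes per year")
```

Each additional nine cuts the budget by a factor of ten, which is why a single multi-hour maintenance window is enough to rule out four nines for that year.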
3. High Concurrency / High Throughput
The core indicator here is transactions per second (TPS). Because of the amplifying effect of leverage, each customer trades contracts more frequently than coin-to-coin pairs. Quantitative traders placing orders through the API generate far more orders than individual traders, especially those doing high-frequency trading. Once the number of concurrent orders exceeds the system’s load capacity, overload occurs and orders cannot be placed, which results in missed trading opportunities, losses, or even unjustifiable liquidations. High concurrency is therefore a capability on which every crypto exchange competes; BitMEX, for example, is often overloaded.
4. Minimal Delays
Delay refers to the time from when an order is placed until its order status is received. If the order executes immediately, as a market order does, the delay also covers the time taken to receive each push message for the transaction. Day traders, especially high-frequency traders, are very concerned about delay, as a delay of even 1 millisecond, or even 1 microsecond, may cost them a trading opportunity.
5. High Reliability
This refers to the reliability of data: the guarantee that user data will not be lost or corrupted even in extreme cases such as program abnormalities, server downtime, hardware failures, data center power outages, and even earthquakes. To stay reliable through anomalies and failures, programs and data are stored in multiple copies. Keeping those copies consistent is difficult, especially under strong consistency, which requires that all copies agree at any time.
2. How Does Bybit Overcome These Difficulties? What Advantages Does It Have Over Other Exchanges?
1. Security
Bybit has partnered with SlowMist, a well-known security team in the crypto community, to carry out detailed auditing and penetration testing, and has put multi-dimensional security protection in place in the following areas:
On the wallet side, Bybit uses a hierarchical deterministic (HD) cold wallet and offline signing technology. The machine that stores the private key never connects to a network, so hackers cannot reach it remotely. In addition, internal processes are strictly controlled. Transferring coins requires multiple signatures from multiple people. Each coin withdrawal is double-confirmed through automatic plus manual review. The signature server has independent access control and 24-hour camera monitoring, and only permitted personnel may enter. No funds have been stolen from Bybit since it went online. Bybit also runs quasi-real-time automatic reconciliation, taking a snapshot of the data every minute and reconciling it, so that in special circumstances our technical team can locate problems in a timely manner.
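The article does not say exactly what the per-minute reconciliation compares. One plausible form of such a check, shown here purely as an assumption, is to compare the per-coin totals in a snapshot of the internal ledger with the totals actually held in wallets. The function name and data layout are hypothetical.

```python
from decimal import Decimal
from typing import Dict, List, Tuple


def reconcile(
    user_balances: Dict[str, Dict[str, Decimal]],
    wallet_totals: Dict[str, Decimal],
) -> List[Tuple[str, Decimal, Decimal]]:
    """Compare per-coin ledger totals with wallet totals.

    Returns a list of (coin, ledger_total, wallet_total) for every mismatch.
    An empty list means the snapshot reconciles.
    """
    # Sum every user's balance per coin to get the ledger's view of liabilities.
    ledger_totals: Dict[str, Decimal] = {}
    for balances in user_balances.values():
        for coin, amount in balances.items():
            ledger_totals[coin] = ledger_totals.get(coin, Decimal("0")) + amount

    # Check every coin known to either side, in a stable order.
    mismatches = []
    for coin in sorted(set(ledger_totals) | set(wallet_totals)):
        ledger = ledger_totals.get(coin, Decimal("0"))
        wallet = wallet_totals.get(coin, Decimal("0"))
        if ledger != wallet:
            mismatches.append((coin, ledger, wallet))
    return mismatches
```

`Decimal` is used instead of floats so that sums of coin amounts compare exactly. Running a check like this on every snapshot narrows any discrepancy to a one-minute window.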
Regarding vulnerability protection (including server vulnerabilities, network vulnerabilities, software vulnerabilities, account vulnerabilities, business logic vulnerabilities, etc.), the Bybit technical team and the SlowMist team carried out extensive auditing and penetration testing both before and after launch to eliminate security vulnerabilities and protect the security of users’ account assets and information. In addition, Bybit runs a vulnerability report reward program. To date, a number of white hats in the security community have reported low-risk vulnerabilities to Bybit. These caused no damage to user accounts or trading systems, and no hacker has abused them for profit. No high-risk vulnerabilities have been found at Bybit so far, but of course we continue to take precautions against them.
Other aspects, such as internal R&D and operation and maintenance processes, data access rights, internal system permissions, office network and office equipment security, mailbox security, etc., have also been strictly controlled within Bybit.
2. High Availability
Bybit has achieved 99.99% availability, which means less than 53 minutes of non-tradable time a year (0.01% of a year is about 52.6 minutes). To achieve four nines of availability, the following are required:
1. Online Maintenance & Hot Updates.
Most exchanges require downtime for system maintenance and upgrades, which leaves users unable to trade and makes for a poor experience. Bybit performs all maintenance and upgrades while staying online and never deliberately stops its servers. This places high demands on the technical architecture, program development, and operations support. All data migrations, service restarts, operational changes and so on must be made smooth enough that users do not notice them. The cost of doing so is high, but it is well worth it, because we must put the user’s trading experience and opinion above everything else. In fact, we make 3-5 system upgrades or changes every week, which users hardly notice unless we change the user experience or release new features.
2. Failure Recovery Architecture, Auto Failover (Automatic Failure Recovery Switching Mechanism) and Failure Preparedness Drills.
Large-scale services often encounter various errors and failures, such as a program exiting abnormally or restarting, server hardware failure and network failure. To keep the service continuously available through such failures, every service must be deployed as multiple copies or as a cluster. When anomalies and failures occur, service switching or failure recovery must be automated to ensure high availability. Bybit runs many service clusters internally, and automatic failover is achieved through service discovery and leader election. There is also a high-performance, reliable message bus that implements message acknowledgement and replay. When a service is switched, the program can switch over seamlessly according to the message sequence number. In addition, failure preparedness drills are very important. Going without them is as dangerous as letting a learner who has only passed the theory test drive straight onto the road.
Bybit message structure
3. Gray release. When core features are launched or updated, Bybit generally uses a gray release. The feature is first released to a small group of users (who may not even notice), generally active users who are willing to send Bybit feedback and suggestions. If no problems appear after a period of use, the user range is gradually expanded until the feature is released to all users. This limits the exposure to bugs or stability issues in new features or new systems and greatly reduces the scope of their impact. In the unlikely event that a user suffers a loss while using a new feature or a new system, the Bybit platform will compensate the user in full.
4. Multi-dimensional monitoring. This covers hardware, network, services (health status, performance indicators, access counts and frequency, errors and exceptions, etc.), client interfaces (errors and exceptions, page performance, network status, hardware and software environment, etc.) and business indicators (registration volume, site traffic, order amount, etc.). Bybit monitors every valuable indicator, with early warning beforehand and quick response afterwards, and alerts the responsible persons through different channels such as email, SMS and telephone depending on the severity of the problem.
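The staged rollout described in item 3 above is commonly implemented by hashing each user into a stable bucket and raising a percentage threshold over time. The sketch below illustrates that general technique; it is an assumption about how such a rollout could be built, not Bybit's implementation, and the function and feature names are hypothetical.

```python
import hashlib
from typing import FrozenSet


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100) for the given feature."""
    # Hashing feature and user together gives each feature its own, independent
    # split of the user base, and the same user always lands in the same bucket.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_gray_release(
    user_id: str,
    feature: str,
    percent: int,
    allowlist: FrozenSet[str] = frozenset(),
) -> bool:
    """Return True if the user should see the feature at this rollout stage."""
    # Volunteers (for example, active users who send feedback) always get it first.
    if user_id in allowlist:
        return True
    return rollout_bucket(user_id, feature) < percent
```

Because the bucket is fixed per user, raising `percent` from 10 to 50 only adds users; nobody who already has the feature loses it, and setting `percent` back to 0 rolls the feature back for everyone outside the allowlist.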
3. High Concurrency / High Throughput
Bybit’s matching service is written in C++ with a fully in-memory design and epoll-based network access. It achieves 100,000+ TPS on a single thread (in fact, with the debug log removed, stress tests have reached 300,000+ TPS) and can be tuned to a million TPS based on future needs. In addition to the high-performance matching service, Bybit recently launched a new in-memory order system that achieves 10,000 TPS on a single machine. With n machines deployed, the total is 10,000 × n TPS. It is important to note that the order logic of a perpetual contract is much more complicated than that of a coin-to-coin trade, because of order-placement rules such as margin calculation and reduce-only. The order service can be parallelized and served by sharding. Each user’s orders are processed by an independent coroutine, so the large order volume of some high-frequency trading users does not affect the orders of other users.
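The sharding idea above can be sketched as a stable hash from user to shard, so that all of one user's orders reach the same shard in arrival order. This is a simplified illustration under assumptions: the function names are hypothetical, and the per-user coroutines are not modelled here.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shard_for_user(user_id: str, shard_count: int) -> int:
    """Pick a shard with a stable hash so a user always maps to the same shard."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # crc32 is used because Python's built-in hash() of a str varies between processes.
    return zlib.crc32(user_id.encode("utf-8")) % shard_count


def route_orders(
    orders: Iterable[Tuple[str, str]], shard_count: int
) -> Dict[int, List[Tuple[str, str]]]:
    """Group (user_id, order) pairs by shard, preserving arrival order."""
    shards: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for user_id, order in orders:
        shards[shard_for_user(user_id, shard_count)].append((user_id, order))
    return dict(shards)


def cluster_capacity_tps(per_machine_tps: int, machines: int) -> int:
    """Total order throughput if each machine handles its shards independently."""
    return per_machine_tps * machines
```

The linear estimate in `cluster_capacity_tps` holds only while the shards share no state, which is why routing by user matters: margin and reduce-only checks for one user never need data from another shard.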
4. Minimal Delay
Matching for a single coin on a single thread at 100,000 TPS works out to an average of 10 microseconds per order, so a single order match completes within 20 microseconds. Order matching is the core part of the exchange, so the ultimate performance must be pursued. To that end, the matching service uses the Disruptor lock-free in-memory queue, which was invented by the world’s leading exchange LMAX, together with zero-copy memory techniques. In addition to order matching, our quote system, order system, and push system have also been moved in-memory, and parts of the position system are undergoing the same transformation. Our goal is that, from the placing of a market order through the order status, the transaction push and the position update push, the internal processing time of the entire transaction link stays within 5 milliseconds. Of course, from the perspective of the client, the time for the whole transaction also includes network transmission time, which depends on the user’s network.
Disruptor/Ring Buffer Theory
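The core data structure behind the Disruptor is a preallocated ring buffer addressed by ever-increasing sequence numbers. The sketch below shows only that indexing scheme for a single producer and a single consumer. It is not the LMAX implementation, which additionally relies on memory barriers and cache-line padding, and Python cannot reproduce its performance; the class and method names are illustrative.

```python
from typing import Any, List, Tuple


class RingBuffer:
    """Single-producer, single-consumer ring buffer indexed by sequence numbers."""

    def __init__(self, capacity: int) -> None:
        # A power-of-two capacity lets "sequence & mask" replace the slower modulo.
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._slots: List[Any] = [None] * capacity  # allocated once, then reused
        self._mask = capacity - 1
        self._next_write = 0  # sequence number of the next slot to fill
        self._next_read = 0   # sequence number of the next slot to read

    def try_publish(self, item: Any) -> bool:
        """Store an item; return False if the consumer has not freed a slot yet."""
        if self._next_write - self._next_read == len(self._slots):
            return False  # buffer full: publishing would overwrite unread data
        self._slots[self._next_write & self._mask] = item
        self._next_write += 1  # advancing the sequence makes the item visible
        return True

    def try_consume(self) -> Tuple[bool, Any]:
        """Return (True, item) for the oldest unread item, or (False, None) if empty."""
        if self._next_read == self._next_write:
            return False, None
        item = self._slots[self._next_read & self._mask]
        self._next_read += 1
        return True, item
```

Each counter is written by exactly one side, producer or consumer, which is what allows a native implementation to avoid locks, and reusing preallocated slots avoids per-message allocation on the hot path.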
5. High Reliability
This refers to the high reliability of data: user data must be guaranteed to be correct under any circumstances and must never be lost. All data must have multiple redundant copies and must be recoverable after a failure. In other words, the RPO (recovery point objective) must be 0, with no data loss allowed. In most cases we can achieve a seconds-level RTO (recovery time objective), meaning that data and services recover automatically within seconds. Bybit’s core message bus is persistent and clustered, so as long as incoming messages are not lost, everything else can be rebuilt. Because messages are traceable, the system can recover data from any point in time. For example, suppose all persisted order data after time t is lost. We can quickly find the message number n that corresponds to time t and replay all order messages from n+1 onwards to restore the order data. Our system design guarantees that the data generated by the replay is exactly the same as the data that was lost.
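The recovery procedure in this paragraph, finding message n for time t and replaying from n+1, can be sketched as follows. The message fields (`seq`, `ts`, `type`, `order_id`, `qty`) and function names are assumptions made for illustration, and the log is assumed to be sorted by sequence number with non-decreasing timestamps.

```python
from bisect import bisect_right
from typing import Dict, List


def apply_message(orders: Dict[str, dict], message: dict) -> None:
    """Apply one order message to the order store. Must be deterministic."""
    if message["type"] == "new":
        orders[message["order_id"]] = {"qty": message["qty"], "status": "open"}
    elif message["type"] == "cancel" and message["order_id"] in orders:
        orders[message["order_id"]]["status"] = "cancelled"


def recover(orders_at_t: Dict[str, dict], log: List[dict], t: int) -> Dict[str, dict]:
    """Rebuild the current order store from the state persisted up to time t."""
    timestamps = [m["ts"] for m in log]
    # Index of the first message with ts > t, i.e. message n+1 in the text.
    start = bisect_right(timestamps, t)
    recovered = {order_id: dict(order) for order_id, order in orders_at_t.items()}
    for message in log[start:]:
        apply_message(recovered, message)
    return recovered
```

The guarantee that replay reproduces the lost data rests entirely on `apply_message` being deterministic: if it consulted the wall clock or a random source, two replays of the same log could diverge.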
Professional and experienced architects know that achieving any one of the above is not that difficult, but achieving high availability, low latency, high reliability and high concurrency all at the same time is very difficult. This is captured by the classic CAP theorem in theoretical computer science: a distributed computing system cannot simultaneously guarantee all three of consistency, availability, and partition tolerance. So how does Bybit solve this problem and achieve an optimal balance?
Firstly, to achieve high availability it is best to build stateless or clustered services, but stateless and clustered services need to synchronize state frequently over the network, which conflicts with low latency. When a single round of processing takes 20 microseconds or even 10 microseconds, there is hardly room to synchronize state over the network.
Secondly, to achieve low latency the service must be memory-based, but in-memory data is lost if the process exits or the machine loses power, which makes it unreliable. Achieving low latency and high reliability at the same time, with no data lost on power failure, is no easy feat.
Thirdly, to be highly reliable the data must have multiple redundant copies, and multiple copies bring consistency problems. To be highly available as well, the copies must be kept highly consistent, so that redundant data is available in real time when problems arise. This brings us back to the first point: the general approach to strong consistency is to build a cluster, but that requires synchronizing data over the network, at the expense of low latency.
How do we solve it? Take the matching engine, the most critical component, as an example. The matching engine is stateful (it holds a large amount of data, such as the order book, in memory) and must process orders one by one, in order, at a single point (a single thread). Strictly speaking, it can neither process orders out of sequence nor be scaled out in parallel. To be highly available, there must be multiple servers, so that when one goes down another can take over immediately. The difficulty is keeping the data of the two matching engines strongly consistent. The general approach is to build a cluster and use a distributed consensus algorithm such as Paxos or Raft, or a quorum protocol, to ensure strong consistency, but this relies on a large amount of network communication to synchronize state, so the delay of each operation is obviously much higher than the 20 microseconds of standalone processing. And because order matching cannot be scaled out in parallel, it would certainly become the throughput bottleneck of the whole system. Therefore, if the matching system is built as a cluster, high availability and strong consistency are satisfied, but partition tolerance, low latency and high concurrency are all sacrificed.
We have several clever designs here:
a) Request sequencing is done by a front-end message queue cluster, which generates a globally unique order number. Both matching engines subscribe to the messages from this cluster as their input, so their inputs are completely identical.
b) The messages in message queue cluster C are persistent.
c) The matching engines use only deterministic algorithms, for example generating transaction IDs without time information or random numbers, so that the results the 2 matching engines write to their respective output message queue clusters are completely identical.
d) Each matching result message has a unique and consecutive sequence number, so a back-end application (such as the market server) can switch freely between the two data sources, A and B.
What makes the above design unique is:
1) Order matching does not depend on the network, so it is partition-tolerant and its delay can be extremely short.
2) Because the delay of each operation is lower, throughput is higher; since matching is the one bottleneck that cannot be scaled out, the concurrency of the entire system rises with it.
3) High availability is achieved: when either server goes down, back-end services can switch to the other message queue cluster and resume processing from the next message.
4) The messages in message queue cluster C are all persistent, so data can be fully recovered by replaying the messages from any point in time, which achieves high reliability.
5) Relying on the pre-assigned unique order number, message sequencing and deterministic algorithms, the two matching engines output the same results without communicating with each other, thus achieving consistency. In this scenario it can be regarded as strong consistency.
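The design above can be illustrated with a toy model: two engines consume the same sequenced input, derive trade IDs only from that input, and emit consecutively numbered results, so a consumer can switch between outputs A and B by sequence number. This is a deliberately simplified sketch under assumptions. The engine here only matches orders at exactly equal prices, and all class, function and field names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class MatchingEngine:
    """Toy deterministic engine: fills against resting opposite orders at the same price."""

    def __init__(self) -> None:
        # (side, price) -> queue of [input_seq, remaining_qty], oldest first
        self.book: Dict[Tuple[str, int], Deque[List[int]]] = {}
        self.output: List[dict] = []  # emitted trades, numbered 1, 2, 3, ...

    def on_order(self, seq: int, side: str, price: int, qty: int) -> None:
        opposite = "sell" if side == "buy" else "buy"
        resting = self.book.setdefault((opposite, price), deque())
        while qty > 0 and resting:
            maker = resting[0]
            fill = min(qty, maker[1])
            maker[1] -= fill
            qty -= fill
            if maker[1] == 0:
                resting.popleft()
            # The trade id comes only from input sequence numbers, never from
            # clocks or random numbers, so every replica produces the same id.
            self.output.append({
                "out_seq": len(self.output) + 1,
                "trade_id": f"T{seq}-{maker[0]}",
                "price": price,
                "qty": fill,
            })
        if qty > 0:  # rest the unfilled remainder on the book
            self.book.setdefault((side, price), deque()).append([seq, qty])


def consume_with_failover(source_a: List[dict], source_b: List[dict], fail_after: int) -> List[dict]:
    """Read from A until it fails after `fail_after` events, then resume from B."""
    received = source_a[:fail_after]
    last_seq = received[-1]["out_seq"] if received else 0
    received += [event for event in source_b if event["out_seq"] > last_seq]
    return received


if __name__ == "__main__":
    sequenced_input = [(1, "sell", 100, 5), (2, "buy", 100, 3), (3, "buy", 100, 4), (4, "sell", 100, 2)]
    engine_a, engine_b = MatchingEngine(), MatchingEngine()
    for message in sequenced_input:  # both engines see the identical input stream
        engine_a.on_order(*message)
        engine_b.on_order(*message)
    print(engine_a.output == engine_b.output)
    print(consume_with_failover(engine_a.output, engine_b.output, 1) == engine_a.output)
```

Note that the two engines never exchange a message: agreement comes from identical input plus deterministic processing, which is what keeps network round trips out of the matching path.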
For order matching, this solution seems to achieve C, A, and P at the same time, along with low latency and high throughput. Does it break the CAP theorem? In fact, no. The problem has simply been pushed to the front-end message queue cluster, which still cannot satisfy C, A and P at the same time, so neither can our system as a whole. In exchange, however, the matching engine gains high availability, high throughput, low latency, high reliability, strong consistency and partition tolerance. For the core, non-scalable module of the entire trading platform, we consider this design almost perfect, because it meets almost all of our needs.
Bybit order matching failure recovery architecture
3. What Are the Technical Challenges, and How Will Bybit Improve in the Future?
As an exchange that has been online for less than a year, Bybit is not inferior to the world’s leading exchanges in the above aspects, and is even ahead of them in some. However, the technical challenges above clearly cannot be solved completely overnight. Many exchanges have spent years trying and still have not resolved them. Bybit also needs to keep working hard in areas such as latency, continuing to optimize towards the goal of 5 milliseconds for the entire transaction link, and fault tolerance, improving off-site disaster recovery capabilities. As for the browser experience, our performance is still not perfect and needs more optimization. We understand how important technical support is to an exchange. Bybit’s operating philosophy is “client first, technology matters”. A good user experience rests on stable and reliable technology. We shoulder this mission and responsibility and carry this dream: our goal is to build an exchange that is technologically advanced, even leading, in the field of cryptocurrencies, and to win the trust and choice of users through our core competitiveness.