Solana Internals Part 3
The Transaction Processing Unit (TPU)
January 23, 2022

Solana recently experienced severe performance degradation due to network congestion. The TPS (number of transactions processed per second) dropped by orders of magnitude (from thousands to tens) for several hours.

Technically, the problem was caused by performance bugs in Solana, in particular in the transaction processing unit (TPU). During market volatility, bots sprayed large volumes of duplicate spam transactions, which bogged down the TPU.

This article elaborates on the design of the TPU and highlights some intricacies.

Transactions

What Is Included In a Transaction?

When a user submits a transaction, it includes a pre-compiled representation of a sequence of instructions, called a “message”:

The message must be signed by one or more keypairs:

The resulting signatures are also included in the transaction and, together with the message content, are sent to the Solana cluster via RPCRequest:
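As a rough mental model of the pieces described above, the following sketch shows a transaction as a message plus a list of signatures. The type and field names are simplified stand-ins chosen for illustration, not the actual Solana SDK definitions, and `has_expected_signature_count` is a hypothetical helper that performs a structural check only (no cryptography).

```rust
/// Simplified stand-in for one pre-compiled instruction (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub struct CompiledInstruction {
    /// Index into `account_keys` of the program to invoke.
    pub program_id_index: u8,
    /// Indices into `account_keys` of the accounts passed to the program.
    pub accounts: Vec<u8>,
    /// Opaque instruction data interpreted by the program.
    pub data: Vec<u8>,
}

/// Simplified stand-in for the message that gets signed.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    /// How many of the leading `account_keys` must sign the message.
    pub num_required_signatures: u8,
    pub account_keys: Vec<[u8; 32]>,
    pub recent_blockhash: [u8; 32],
    pub instructions: Vec<CompiledInstruction>,
}

/// Simplified stand-in for a transaction: the message plus its signatures.
#[derive(Debug, Clone, PartialEq)]
pub struct Transaction {
    /// One 64-byte signature per required signer.
    pub signatures: Vec<[u8; 64]>,
    pub message: Message,
}

impl Transaction {
    /// Hypothetical structural check: the number of signatures matches what
    /// the message asks for. It does not verify any signature.
    pub fn has_expected_signature_count(&self) -> bool {
        self.signatures.len() == self.message.num_required_signatures as usize
            && self.signatures.len() <= self.message.account_keys.len()
    }
}
```

A structural check like this is cheap, which is why it makes sense to reject malformed transactions before spending CPU time on signature verification.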

The Transaction Processing Unit

The TPU processes an incoming transaction in three main stages.

  1. fetch_stage batches input from a UDP socket and sends it to stage 2.
  2. sigverify_stage verifies whether the signature in the transaction is valid and sends the transaction to stage 3.
  3. banking_stage processes the verified transaction.

All three stages run on different threads that communicate via message passing using crossbeam_channel (a multi-producer multi-consumer channel).
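The three-stage, channel-connected layout can be illustrated with a small sketch. To stay dependency-free it uses `std::sync::mpsc` from the standard library in place of crossbeam_channel, and the `Packet` type with its `valid` flag is a toy stand-in for a real packet and its signature check. The channel names mirror the ones used in this article.

```rust
use std::sync::mpsc::channel;
use std::thread;

/// Toy packet: `valid` stands in for "the signature checks out".
#[derive(Debug, Clone, PartialEq)]
pub struct Packet {
    pub id: u32,
    pub valid: bool,
}

/// Runs three stages on separate threads and returns the ids of the packets
/// that reach the final stage, in arrival order.
pub fn run_pipeline(input: Vec<Packet>) -> Vec<u32> {
    let (packet_sender, packet_receiver) = channel::<Packet>();
    let (verified_sender, verified_receiver) = channel::<Packet>();

    // Stage 1 (fetch): forward incoming packets to the next stage.
    let fetch = thread::spawn(move || {
        for packet in input {
            if packet_sender.send(packet).is_err() {
                break; // the receiving side has gone away
            }
        }
        // `packet_sender` is dropped here, which closes the first channel.
    });

    // Stage 2 (sigverify): drop invalid packets, forward the rest.
    let sigverify = thread::spawn(move || {
        for packet in packet_receiver {
            if packet.valid && verified_sender.send(packet).is_err() {
                break;
            }
        }
        // `verified_sender` is dropped here, which closes the second channel.
    });

    // Stage 3 (banking): consume verified packets until the channel closes.
    let banking = thread::spawn(move || {
        verified_receiver.iter().map(|p| p.id).collect::<Vec<u32>>()
    });

    fetch.join().expect("fetch thread panicked");
    sigverify.join().expect("sigverify thread panicked");
    banking.join().expect("banking thread panicked")
}
```

Each stage owns only its end of a channel, so shutdown propagates naturally: when an upstream sender is dropped, the downstream loop ends.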

1. fetch_stage

The TPU creates a channel of unbounded capacity with (packet_sender, packet_receiver):

The fetch_stage reads the packets on the transaction sockets, and simply forwards them to the sigverify_stage using packet_sender.

2. sigverify_stage

The sigverify_stage receives the transaction packets from packet_receiver and uses TransactionSigVerifier to verify whether the signature in each packet is valid.

It assumes each packet contains one transaction, and the packets are verified in parallel using all available CPU cores (and it can also be done on GPU if available).
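To illustrate the idea of spreading per-packet verification across CPU cores, here is a minimal sketch using scoped threads from the standard library. It is not how the validator schedules this work; `verify_in_parallel` is a hypothetical helper, and the `verify` closure is a placeholder for a real signature check.

```rust
use std::thread;

/// Splits `packets` into roughly equal chunks, one per available core, and
/// applies `verify` to every packet. Results are returned in input order.
pub fn verify_in_parallel<F>(packets: &[Vec<u8>], verify: F) -> Vec<bool>
where
    F: Fn(&[u8]) -> bool + Sync,
{
    if packets.is_empty() {
        return Vec::new();
    }
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Ceiling division so that every packet lands in some chunk.
    let chunk_size = (packets.len() + workers - 1) / workers;
    let verify = &verify;

    thread::scope(|scope| {
        // Spawn one worker per chunk before joining any of them.
        let handles: Vec<_> = packets
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|p| verify(p.as_slice()))
                        .collect::<Vec<bool>>()
                })
            })
            .collect();
        // Joining in spawn order keeps results aligned with the input.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("verifier thread panicked"))
            .collect()
    })
}
```

Because each packet is assumed to hold exactly one transaction, the checks are independent of each other and need no shared mutable state.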

Note that the TPU creates another channel (verified_sender, verified_receiver), and it uses verified_sender to forward the verified transactions to the next stage (banking_stage).

The verifier is of significant interest. It not only verifies signatures but also filters out redundant packets and discards excess packets in order to improve performance. The fixes for the recent performance degradation were applied in this component.

It contains three steps:

  • deduper: the filter that removes duplicated transactions (typically sent by bots).
  • discard_excess_packets: the filter that discards excess packets from each IP address. It groups packets by IP address and allocates max_packets evenly across addresses.
  • verify_batches: uses ed25519_dalek to verify message signatures in the packets that were not discarded in the previous steps.
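The deduplication step can be pictured with a small sketch. This version keeps an exact set of packet hashes, which is an assumption made for simplicity; the structure and hashing used in the validator may differ. `SimpleDeduper` is a hypothetical name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Remembers a hash of every packet it has seen (illustrative only).
#[derive(Default)]
pub struct SimpleDeduper {
    seen: HashSet<u64>,
}

impl SimpleDeduper {
    /// Returns true when `packet` is new and false when its hash was already
    /// recorded. Two distinct packets with colliding hashes would be treated
    /// as duplicates; the sketch accepts that trade-off.
    pub fn is_new(&mut self, packet: &[u8]) -> bool {
        let mut hasher = DefaultHasher::new();
        packet.hash(&mut hasher);
        self.seen.insert(hasher.finish())
    }

    /// Keeps only the first occurrence of each packet, preserving order.
    pub fn dedup(&mut self, packets: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
        packets.into_iter().filter(|p| self.is_new(p)).collect()
    }
}
```

Running this filter before signature verification means a flood of identical packets costs one hash each rather than one signature check each.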

The discard_excess_packets function is defined as:
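The following is not that definition but a minimal sketch of the even-allocation idea described above, assuming a simple round-robin over sender addresses. The function name `select_packets_to_keep` and its signature are hypothetical.

```rust
use std::collections::{BTreeMap, VecDeque};
use std::net::IpAddr;

/// Given the sender address of each packet (by index), returns the indices of
/// the packets to keep. At most `max_packets` are kept, and the budget is
/// handed out one packet per address per round.
pub fn select_packets_to_keep(senders: &[IpAddr], max_packets: usize) -> Vec<usize> {
    // Group packet indices by sender, preserving arrival order per sender.
    let mut by_addr: BTreeMap<IpAddr, VecDeque<usize>> = BTreeMap::new();
    for (index, addr) in senders.iter().enumerate() {
        by_addr.entry(*addr).or_default().push_back(index);
    }

    let mut kept = Vec::new();
    // Round-robin: each pass takes one packet from every address that still
    // has packets, until the budget is spent or nothing is left.
    while kept.len() < max_packets && !by_addr.is_empty() {
        by_addr.retain(|_, queue| {
            if kept.len() < max_packets {
                if let Some(index) = queue.pop_front() {
                    kept.push(index);
                }
            }
            // Drop addresses whose queue has been exhausted.
            !queue.is_empty()
        });
    }
    kept.sort_unstable();
    kept
}
```

With four packets from one address, one each from two others, and a budget of four, the noisy address keeps only two packets while the quiet ones keep theirs. That is the property that limits how much a single spamming source can crowd out everyone else.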

The ed25519_dalek::PublicKey::verify function is defined as:

It takes a signature and a message as input, and verifies the signature with respect to the message using the key pair’s public key.

Note that the ed25519_dalek::PublicKey::verify function is non-trivial and subtle, and it has not been audited.

3. banking_stage

The banking_stage creates a thread which executes in a loop to process the received transactions batch by batch. The number of transactions in each batch is limited by

The banking_stage uses an important component called bank to load and execute transactions. The function is defined as:

For each transaction, the bank uses MessageProcessor to process the transaction message:

This method calls each instruction in the message over the set of loaded accounts.

For each instruction, it calls the program entrypoint and verifies that the result of the call does not violate the bank’s accounting rules.
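The call-then-verify loop can be sketched as follows. The sketch assumes a single accounting rule for illustration, namely that the total balance across accounts is unchanged by an instruction, and it works on a copy so that a failed message leaves the accounts untouched. All types here are toy stand-ins, and `Entrypoint`, `transfer_one`, and `mint_one` are hypothetical.

```rust
/// Toy account that only tracks a balance.
#[derive(Debug, Clone, PartialEq)]
pub struct Account {
    pub lamports: u64,
}

#[derive(Debug, PartialEq)]
pub enum InstructionError {
    ProgramFailed,
    UnbalancedInstruction,
}

/// Hypothetical program entrypoint: mutates the accounts and may fail.
pub type Entrypoint = fn(&mut [Account]) -> Result<(), InstructionError>;

/// Runs every instruction in order and checks the accounting rule after each.
pub fn process_message(
    instructions: &[Entrypoint],
    accounts: &mut Vec<Account>,
) -> Result<(), InstructionError> {
    // Work on a copy so that an error leaves the caller's accounts unchanged.
    let mut working = accounts.clone();
    for &entrypoint in instructions {
        let before: u128 = working.iter().map(|a| u128::from(a.lamports)).sum();
        entrypoint(working.as_mut_slice())?;
        let after: u128 = working.iter().map(|a| u128::from(a.lamports)).sum();
        if before != after {
            return Err(InstructionError::UnbalancedInstruction);
        }
    }
    *accounts = working;
    Ok(())
}

/// Well-behaved toy program: moves one lamport from account 0 to account 1.
pub fn transfer_one(accounts: &mut [Account]) -> Result<(), InstructionError> {
    accounts[0].lamports = accounts[0]
        .lamports
        .checked_sub(1)
        .ok_or(InstructionError::ProgramFailed)?;
    accounts[1].lamports += 1;
    Ok(())
}

/// Misbehaving toy program: creates a lamport out of thin air.
pub fn mint_one(accounts: &mut [Account]) -> Result<(), InstructionError> {
    accounts[0].lamports += 1;
    Ok(())
}
```

A message made of two `transfer_one` calls succeeds, while a message that includes `mint_one` is rejected as a whole, including the effects of any earlier instructions in it.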

Internally, the bank creates an InvokeContext to execute each instruction:

Each transaction has a limited compute budget (by default 200_000 units), defined in ComputeBudget:
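The sketch below is not the ComputeBudget definition. It only illustrates how a per-transaction unit limit can be enforced with a simple meter. `SimpleComputeMeter`, `BudgetExceeded`, and `DEFAULT_UNITS` are hypothetical names; the 200_000 figure is the default mentioned above.

```rust
/// Default per-transaction limit quoted in the text.
pub const DEFAULT_UNITS: u64 = 200_000;

/// Error returned when an operation would exceed the remaining budget.
#[derive(Debug, PartialEq)]
pub struct BudgetExceeded;

/// Tracks how many compute units a transaction has left (illustrative only).
pub struct SimpleComputeMeter {
    remaining: u64,
}

impl SimpleComputeMeter {
    pub fn new(limit: u64) -> Self {
        Self { remaining: limit }
    }

    /// Charges `units` against the budget. On overrun, the meter is drained
    /// to zero and an error is returned so execution can be aborted.
    pub fn consume(&mut self, units: u64) -> Result<(), BudgetExceeded> {
        match self.remaining.checked_sub(units) {
            Some(left) => {
                self.remaining = left;
                Ok(())
            }
            None => {
                self.remaining = 0;
                Err(BudgetExceeded)
            }
        }
    }

    pub fn remaining(&self) -> u64 {
        self.remaining
    }
}
```

A meter like this gives every transaction a hard upper bound on the work it can trigger, regardless of what the invoked programs do.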

Executing an instruction inside the bank involves many complications, such as

  • loading the specified programs
  • creating the rbpf VM to execute the BPF code
  • dealing with CPI (cross-program invocation) via syscalls
  • verifying that the called programs have not misbehaved
  • measuring the computing units, etc.

We will elaborate on these details and the bank life cycle in the next article.


About sec3 (Formerly Soteria)

sec3 is a security research firm that prepares Solana projects for millions of users. sec3’s Launch Audit is a rigorous, researcher-led code examination that investigates and certifies mainnet-grade smart contracts; sec3’s continuous auditing software platform, X-ray, integrates with GitHub to progressively scan pull requests, helping projects fortify code before deployment; and sec3’s post-deployment security solution, WatchTower, ensures funds stay safe. sec3 is building technology-based scalable solutions for Web3 projects to ensure protocols stay safe as they scale.

To learn more about sec3, please visit https://www.sec3.dev