Fix: S2S Compression Bug Causes Panic Near 2MiB Limit
Hey guys! Today, we're diving deep into a tricky bug that can cause some serious headaches in your server-to-server (s2s) streaming. This bug surfaces when dealing with compression, specifically when the compression process inadvertently expands data near the 2 MiB frame limit, leading to a panic. Let's break down the issue, understand the root cause, and explore a recommended fix to keep your services running smoothly.
Detail Bug Report
For those who want to dive into the nitty-gritty details, you can check out the original bug report here: https://app.detail.dev/org_89d327b3-b883-4365-b6a3-46b6701342a9/bugs/bug_7019df41-099b-4651-a159-8165f7cbc7cf
Summary: The Heart of the Matter
- Context: The s2s streaming protocol, implemented in `api/src/v1/stream/s2s.rs`, uses compression (specifically Gzip and Zstd) to shrink protobuf messages before transmission, reducing bandwidth usage and speeding up server-to-server traffic.
- Bug: Here's the catch: when compressing large chunks of data close to the 2 MiB frame limit, the compression process itself can increase the overall size of the data, because compression algorithms add header and framing overhead. If the compressed data exceeds the `MAX_FRAME_BYTES` limit, the `SessionMessage::encode()` function panics, effectively crashing the server. This is especially likely for data that is already compressed, encrypted, or inherently random, since such data doesn't compress well and can easily exceed the limit once the overhead is added.
- Actual vs. Expected: The `CompressedData::compress()` method compresses any payload of at least 1024 bytes, and it always does so when that condition is met, even if compression makes the payload larger. The expected behavior is to detect when compression isn't beneficial (i.e., it increases the payload size or pushes it past the frame size limit) and fall back to sending the data uncompressed. This would avoid the panic and ensure the data is still transmitted, albeit without the benefits of compression.
- Impact: The practical impact of this bug is significant. When servers encounter large, incompressible payloads in the s2s streaming protocol, they can crash, leading to service disruptions, data loss, and a generally poor user experience.
Code with Bug: Spotting the Culprit
The problematic code lies within the `CompressedData::compress()` method in `api/src/v1/stream/s2s.rs`:
```rust
fn compress(compression: CompressionAlgorithm, data: Vec<u8>) -> std::io::Result<Self> {
    if compression == CompressionAlgorithm::None || data.len() < COMPRESSION_THRESHOLD_BYTES {
        return Ok(Self {
            compression: CompressionAlgorithm::None,
            payload: data.into(),
        });
    }
    let mut buf = Vec::with_capacity(data.len());
    match compression {
        CompressionAlgorithm::Gzip => {
            let mut encoder = GzEncoder::new(buf, Compression::default());
            encoder.write_all(data.as_slice())?;
            buf = encoder.finish()?;
        }
        CompressionAlgorithm::Zstd => {
            zstd::stream::copy_encode(data.as_slice(), &mut buf, 0)?;
        }
        CompressionAlgorithm::None => unreachable!("handled above"),
    };
    let payload = Bytes::from(buf.into_boxed_slice());
    Ok(Self {
        compression, // <-- BUG 🔴 returns compressed even if larger/exceeds frame limit
        payload,
    })
}
```
The issue is that the code unconditionally returns the compressed data, even if it's larger than the original or exceeds the frame limit.
Later, in `SessionMessage::encode()`:

```rust
pub fn encode(&self) -> Bytes {
    let encoded_size = FLAG_TOTAL_SIZE + self.payload_size();
    assert!(
        encoded_size <= MAX_FRAME_BYTES,
        "payload exceeds encoder limit"
    );
    // ...
}
```
This assertion is where the panic occurs when the encoded size exceeds the `MAX_FRAME_BYTES` limit.
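To make the failure mode concrete, here is a minimal, self-contained sketch of that assertion, assuming the constants implied by the report (`FLAG_TOTAL_SIZE = 1`, `MAX_FRAME_BYTES = 2 MiB`); `encode_len` is a hypothetical stand-in for the size check inside `SessionMessage::encode()`, not a function from `s2s.rs`:

```rust
const MAX_FRAME_BYTES: usize = 2 * 1024 * 1024; // 2_097_152
const FLAG_TOTAL_SIZE: usize = 1;

/// Hypothetical stand-in for the size check in `SessionMessage::encode()`:
/// computes the encoded frame size and panics if it exceeds the limit.
fn encode_len(payload_size: usize) -> usize {
    let encoded_size = FLAG_TOTAL_SIZE + payload_size;
    assert!(encoded_size <= MAX_FRAME_BYTES, "payload exceeds encoder limit");
    encoded_size
}
```

A payload of 2,097,100 bytes encodes fine, but a payload that compression has expanded to 2,097,453 bytes trips the assertion and panics.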
Evidence (Example): Seeing is Believing
Let's consider a payload of 2,097,100 bytes of random (incompressible) data:
- Input: uncompressed data = 2,097,100 bytes
- After compression (gzip): ~353 bytes of overhead → 2,097,453 bytes
- Encoded size: `FLAG_TOTAL_SIZE (1) + 2,097,453 = 2,097,454` bytes
- Frame limit: `2,097,454 > MAX_FRAME_BYTES (2,097,152)` → panic at the `encode()` assertion
- If sent uncompressed: `1 + 2,097,100 = 2,097,101` bytes → fits within the limit
This clearly demonstrates that forcing compression can push near-limit payloads beyond the frame size, causing a panic. However, if the data were sent uncompressed, it would fit within the limit, and the process would succeed.
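The arithmetic can be checked mechanically. A small sketch, assuming the same constants as above; `fits_in_frame` is a hypothetical helper introduced here for illustration, not part of the source:

```rust
const MAX_FRAME_BYTES: usize = 2 * 1024 * 1024; // 2_097_152
const FLAG_TOTAL_SIZE: usize = 1;

/// Hypothetical helper: does a payload of this size, plus the one-byte
/// framing flag, fit within a single frame?
fn fits_in_frame(payload_len: usize) -> bool {
    FLAG_TOTAL_SIZE + payload_len <= MAX_FRAME_BYTES
}
```

The original 2,097,100-byte payload fits, while the same payload after ~353 bytes of gzip overhead does not.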
Recommended Fix: A Smart Solution
The solution is to modify the CompressedData::compress() method to be more intelligent about when it applies compression. Specifically, it should check if compression actually reduces the size of the data or if it would cause the frame size limit to be exceeded. If either of these conditions is true, the method should fall back to sending the data uncompressed.
Here's the proposed fix:
```rust
fn compress(compression: CompressionAlgorithm, data: Vec<u8>) -> std::io::Result<Self> {
    if compression == CompressionAlgorithm::None || data.len() < COMPRESSION_THRESHOLD_BYTES {
        return Ok(Self {
            compression: CompressionAlgorithm::None,
            payload: data.into(),
        });
    }
    let mut buf = Vec::with_capacity(data.len());
    match compression {
        CompressionAlgorithm::Gzip => {
            let mut encoder = GzEncoder::new(buf, Compression::default());
            encoder.write_all(data.as_slice())?;
            buf = encoder.finish()?;
        }
        CompressionAlgorithm::Zstd => {
            zstd::stream::copy_encode(data.as_slice(), &mut buf, 0)?;
        }
        CompressionAlgorithm::None => unreachable!("handled above"),
    };
    let compressed_frame_size = FLAG_TOTAL_SIZE + buf.len();
    if buf.len() >= data.len() || compressed_frame_size > MAX_FRAME_BYTES { // <-- FIX 🟢 fallback when compression expands or would overflow frame
        return Ok(Self {
            compression: CompressionAlgorithm::None,
            payload: data.into(),
        });
    }
    Ok(Self {
        compression,
        payload: Bytes::from(buf.into_boxed_slice()),
    })
}
```
Explanation of the Fix
- Check Compression Efficiency: the line `if buf.len() >= data.len() || compressed_frame_size > MAX_FRAME_BYTES` is the heart of the fix. It checks two crucial conditions. First, `buf.len() >= data.len()` determines whether the compressed data is at least as large as the original; if compression increases the size, it's not beneficial, and we should skip it. Second, `compressed_frame_size > MAX_FRAME_BYTES` checks whether the compressed data, including the framing overhead (`FLAG_TOTAL_SIZE`), exceeds the maximum allowed frame size; if it does, we also skip compression.
- Fallback to Uncompressed: if either of the above conditions is true, the code executes the early `return Ok(Self { ... })` block. This creates a `CompressedData` instance with `compression: CompressionAlgorithm::None`, indicating that the data will be sent uncompressed, and `payload: data.into()`, which contains the original, uncompressed data.
- Proceed with Compression (if Beneficial): if both conditions are false (i.e., compression reduces the size and the compressed data fits within the frame limit), the code proceeds to create a `CompressedData` instance with the compressed data and the appropriate compression algorithm.
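The decision rule above can be unit-tested in isolation. Here is a minimal sketch, assuming the same constants as before; `should_use_compressed` is a hypothetical extraction of the fix's condition for illustration, not a function that exists in `s2s.rs`:

```rust
const MAX_FRAME_BYTES: usize = 2 * 1024 * 1024; // 2_097_152
const FLAG_TOTAL_SIZE: usize = 1;

/// Hypothetical extraction of the fix's fallback condition: keep the
/// compressed payload only if it is strictly smaller than the original
/// AND, with framing overhead, still fits within the frame limit.
fn should_use_compressed(original_len: usize, compressed_len: usize) -> bool {
    compressed_len < original_len
        && FLAG_TOTAL_SIZE + compressed_len <= MAX_FRAME_BYTES
}
```

A compressible payload (say, 1 MiB that gzip shrinks to 100 KiB) keeps compression, while the incompressible near-limit payload from the example (2,097,100 bytes expanded to 2,097,453) falls back to uncompressed.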
Benefits of the Fix
- Prevents Panics: By intelligently deciding when to compress, the fix avoids the scenario where compression leads to exceeding the frame size limit, thus preventing panics and server crashes.
- Maintains Functionality: Even when compression is skipped, the data is still transmitted, ensuring that the s2s communication remains functional.
- Optimizes Bandwidth (When Possible): The fix still leverages compression when it's beneficial, reducing bandwidth usage and improving performance.
By implementing this fix, you can ensure that your s2s streaming protocol is more robust and reliable, especially when dealing with large, potentially incompressible payloads. Happy coding, folks!