Boost Compression With Higher-Order Linear Predictors (LPC)
Hey guys! Let's dive into an exciting feature that could seriously level up our compression game. We're talking about implementing Higher-Order Linear Predictors (LPC) to make PhantomCodec even more efficient. This is all about squeezing more out of our neural voltage data (LFP) while keeping CPU costs super low. Get ready for some serious compression gains!
Context & Motivation
Right now, we're using a pretty basic method called simple Delta Encoding (current - previous). While it works, it's not the most efficient, especially for neural voltage data (LFP). Think of LFP data as smooth, wavy signals. Simple delta encoding leaves a lot of untapped potential for better compression. Basically, we can do better, and that's what this feature is all about!
The Feature: Linear Predictive Coding (LPC)
Instead of just storing x[t] - x[t-1], we're going to store the prediction error. This means we predict the next value and then store the difference between the actual value and our prediction. Here's how it works:
Second-Order Predictor (Recommended)
For most of our use cases, a second-order predictor should be the sweet spot. It balances accuracy and computational cost effectively.
Predictor: P[t] = 2 * x[t-1] - x[t-2]
Stored Value: Error = x[t] - P[t]
This predictor is based on the idea that the wave will keep going at the same velocity (meaning the first derivative stays constant). This aligns really well with how LFP signals behave, making it a much better model than simple deltas. Basically, it's like saying, "Hey, if the signal is going up at this rate, it'll probably keep going up at about the same rate." This clever prediction reduces the amount of information we need to store, resulting in better compression.
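To make that concrete, here's a tiny hypothetical helper (not PhantomCodec code) that computes the order-2 residual for a single sample. For a perfectly linear ramp the prediction is exact, so the stored residual is zero:

```rust
// Hypothetical helper showing the order-2 predictor on one sample.
// For a perfectly linear ramp, the prediction is exact and the
// stored residual is zero.
fn lpc2_residual(x_t: i32, x_prev1: i32, x_prev2: i32) -> i32 {
    let prediction = 2 * x_prev1 - x_prev2; // P[t] = 2*x[t-1] - x[t-2]
    x_t - prediction                        // Error = x[t] - P[t]
}
```

On the ramp 10, 20, 30 the delta residual would be 10, while the LPC2 residual is 0.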
Higher-Order Options
If you are feeling more adventurous, we could consider higher-order predictors. These can capture more complex patterns in the data, but they also come with a higher computational cost.
```text
// 3rd Order (models constant acceleration)
P[t] = 3*x[t-1] - 3*x[t-2] + x[t-3]

// Adaptive LPC (for varying signal characteristics)
P[t] = a₁*x[t-1] + a₂*x[t-2] + ... + aₙ*x[t-n]
```
Why This Improves PhantomCodec
Let's break down why LPC is such a game-changer for PhantomCodec:
1. Smaller Residuals = Fewer Bits
The name of the game in compression is reducing the size of the data we need to store. The Error values we get from LPC will be significantly smaller (closer to 0) than the deltas we're currently using. And smaller numbers mean we need fewer bits when we use Varint/Rice/tANS to encode them. It's all about efficiency, guys!
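Here's a small sketch of that effect. Zigzag mapping and bit length are standard tricks for coding signed residuals (not necessarily PhantomCodec's exact encoding): signed values map to unsigned (0, -1, 1, -2 become 0, 1, 2, 3), and the Varint/Rice cost then grows with the bit length of the mapped value:

```rust
// Map a signed residual to an unsigned value so small-magnitude
// residuals (positive or negative) become small unsigned numbers.
fn zigzag(v: i32) -> u32 {
    ((v << 1) ^ (v >> 31)) as u32
}

// Rough cost in bits for a residual: the bit length of its zigzag
// image, with a 1-bit floor so a residual of 0 still costs something.
fn bits_needed(v: i32) -> u32 {
    (32 - zigzag(v).leading_zeros()).max(1)
}
```

A delta-sized residual of ±200 costs about 9 bits here, while an LPC2-sized residual of ±50 costs about 7, which lines up with the bits-per-sample estimates later in this proposal.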
2. Expected Compression Gain
Here's where things get really exciting. We're not just talking about a small improvement; we're aiming for a significant leap in compression.
- Current: With the simple delta + Rice setup, we're hitting around ~71% size.
- Target: Our goal is to reach ~50-60% size using LPC + Rice/tANS.
- That's a 10-20 percentage point reduction in output size (roughly a 15-30% relative improvement) with minimal CPU impact.
3. Negligible Overhead
Now, you might be thinking, "This sounds great, but what's the catch?" Well, the beauty of LPC is that it's incredibly cheap: order-2 prediction is just a couple of integer multiplies and subtracts per sample, a negligible cost next to the entropy-coding stage. So we're getting a big compression boost without bogging down the CPU. It's a win-win situation!
Implementation Plan
Alright, let's get down to the nitty-gritty of how we're going to implement this feature. We'll break it down into manageable phases to keep things organized.
Phase 1: Core LPC Module
First, we'll build the foundation by creating a dedicated LPC module.
- [ ] Create `src/lpc.rs` module
- [ ] Implement `compute_residuals_order2(samples: &[i16]) -> Vec<i16>`
- [ ] Implement `restore_from_residuals_order2(residuals: &[i16], initial: (i16, i16)) -> Vec<i16>`
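One possible shape for those two functions (a sketch, not the final implementation). The asymmetric signatures suggest residuals cover t ≥ 2 and the first two samples travel separately, matching restore's `initial` parameter; that reading is an assumption:

```rust
// Sketch of the Phase 1 API. i32 intermediates keep 2*x[t-1] - x[t-2]
// from overflowing i16; residuals are assumed to fit back into i16.
pub fn compute_residuals_order2(samples: &[i16]) -> Vec<i16> {
    samples
        .windows(3)
        .map(|w| {
            let pred = 2 * w[1] as i32 - w[0] as i32; // P[t] = 2*x[t-1] - x[t-2]
            (w[2] as i32 - pred) as i16               // Error = x[t] - P[t]
        })
        .collect()
}

pub fn restore_from_residuals_order2(residuals: &[i16], initial: (i16, i16)) -> Vec<i16> {
    let mut out = vec![initial.0, initial.1];
    for &e in residuals {
        // Re-run the same predictor on the already-restored samples,
        // then add the stored error back to recover the original value.
        let pred = 2 * out[out.len() - 1] as i32 - out[out.len() - 2] as i32;
        out.push((pred + e as i32) as i16);
    }
    out
}
```

Round-tripping `compute` then `restore` must reproduce the input exactly; that property is what the Phase 3 benchmarks should assert before measuring ratios.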
Phase 2: Integration with Existing Strategies
Next, we'll integrate the LPC module with our existing compression strategies.
- [ ] Add `PredictorMode` enum: `Delta`, `LPC2`, `LPC3`, `Adaptive`
- [ ] Modify `compress()` to accept predictor mode parameter
- [ ] Update block header to store predictor mode (2 bits)
Phase 3: Benchmarking
Of course, we need to put our changes to the test. We'll run benchmarks to see how LPC performs in the real world.
- [ ] Benchmark on MC_Maze dataset
- [ ] Compare compression ratios: Delta vs LPC2 vs LPC3
- [ ] Measure CPU overhead (should be <1% increase)
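Before the real MC_Maze runs, a synthetic sanity check is cheap to write. The sketch below (a 10 Hz sine sampled at 1 kHz standing in for low-frequency LFP; both numbers are illustrative assumptions) uses mean |residual| as a rough proxy for the bits Rice/tANS will spend:

```rust
// Synthetic stand-in for LFP: a 10 Hz sine, 1 kHz sampling, amplitude 1000.
fn synthetic_lfp(n: usize) -> Vec<i16> {
    (0..n)
        .map(|t| {
            let phase = 2.0 * std::f64::consts::PI * 10.0 * t as f64 / 1000.0;
            (1000.0 * phase.sin()) as i16
        })
        .collect()
}

// Mean absolute residual for delta vs LPC2; smaller means cheaper to code.
fn mean_abs_residuals(samples: &[i16]) -> (f64, f64) {
    let delta = samples
        .windows(2)
        .map(|w| (w[1] as i32 - w[0] as i32).abs() as f64)
        .sum::<f64>() / (samples.len() - 1) as f64;
    let lpc2 = samples
        .windows(3)
        .map(|w| (w[2] as i32 - (2 * w[1] as i32 - w[0] as i32)).abs() as f64)
        .sum::<f64>() / (samples.len() - 2) as f64;
    (delta, lpc2)
}
```

On this signal LPC2 residuals come out roughly an order of magnitude smaller than deltas, which is the effect the real benchmarks should confirm.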
Phase 4: Auto-Selection (Optional)
For the grand finale, we could even implement a feature that automatically selects the best predictor based on the input data. This could further optimize compression without requiring any manual tweaking.
- [ ] Implement `select_best_predictor(samples: &[i16]) -> PredictorMode`
- [ ] Analyze first block to choose optimal predictor for the stream
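A sketch of that selector, trimmed to two modes. The cost metric (sum of |residual| over a probe block) is an assumption; production code might estimate the Rice parameter or empirical entropy instead:

```rust
// Trimmed two-variant version of the planned enum (the plan also has
// LPC3 and Adaptive; `Lpc2` here is the plan's `LPC2` renamed to
// satisfy Rust naming lints).
#[derive(Debug, PartialEq)]
pub enum PredictorMode {
    Delta,
    Lpc2,
}

// Pick whichever predictor yields the smallest total residual magnitude
// on a probe block.
pub fn select_best_predictor(samples: &[i16]) -> PredictorMode {
    let delta_cost: i64 = samples
        .windows(2)
        .map(|w| (w[1] as i32 - w[0] as i32).abs() as i64)
        .sum();
    let lpc2_cost: i64 = samples
        .windows(3)
        .map(|w| (w[2] as i32 - (2 * w[1] as i32 - w[0] as i32)).abs() as i64)
        .sum();
    if lpc2_cost < delta_cost {
        PredictorMode::Lpc2
    } else {
        PredictorMode::Delta
    }
}
```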
Technical Specification
Let's get technical and outline the specifics of how we'll implement this feature.
Block Header Extension
We'll need to make a small adjustment to the block header to accommodate the predictor mode.
Existing Header:

```text
[Strategy ID: 4 bits][Flags: 4 bits]
```

New Header:

```text
[Strategy ID: 4 bits][Predictor: 2 bits][Flags: 2 bits]
```

Predictor Values:

```text
0b00 = Delta (current behavior)
0b01 = LPC Order 2
0b10 = LPC Order 3
0b11 = Reserved/Adaptive
```
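Packing and unpacking that byte is a few shifts and masks. In this sketch the strategy sits in the high nibble; that field order is an assumption, since the plan only fixes the widths (4 + 2 + 2 bits) and the predictor codes:

```rust
// Pack [Strategy ID: 4 bits][Predictor: 2 bits][Flags: 2 bits] into one byte.
pub fn pack_header(strategy: u8, predictor: u8, flags: u8) -> u8 {
    debug_assert!(strategy < 16 && predictor < 4 && flags < 4);
    (strategy << 4) | (predictor << 2) | flags
}

// Recover (strategy, predictor, flags) from the header byte.
pub fn unpack_header(byte: u8) -> (u8, u8, u8) {
    (byte >> 4, (byte >> 2) & 0b11, byte & 0b11)
}
```

Because `0b00` keeps the current delta behavior, old streams decode unchanged under the new header layout.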
API Changes
We'll also need to update the API to allow users to specify the predictor mode.
```rust
pub struct CompressOptions {
    pub strategy: StrategyId,
    pub predictor: PredictorMode, // NEW
    pub lossy_bits: Option<u8>,
}

pub fn compress_with_options(
    samples: &[i16],
    options: CompressOptions,
) -> Result<Vec<u8>, CodecError>;
```
Expected Performance
Alright, let's talk numbers. Here's a sneak peek at the performance gains we're expecting:
| Predictor | Residual Range | Avg Bits/Sample | Compressed Size |
|---|---|---|---|
| Delta | ±200 | ~8-9 bits | ~71% |
| LPC2 | ±50 | ~6-7 bits | ~55% |
| LPC3 | ±30 | ~5-6 bits | ~50% |
Mathematical Background
For those of you who love the math behind the magic, here's a quick explanation of why LPC works so well for LFP signals.
For a signal following x[t] = A·sin(ωt + φ), with ω in radians per sample:

- Delta residual: ≈ A·ω·cos(ωt) (still oscillating, scaled by ω)
- LPC2 residual: ≈ −A·ω²·sin(ωt) (scaled by ω², so much smaller when ω is small)
Neural LFP signals are dominated by low frequencies (1-100 Hz), making LPC2 particularly effective.
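For the math lovers, here is the exact second difference behind that small-ω approximation, using the same symbols (ω in radians per sample):

```latex
% Exact order-2 residual of a sampled sinusoid x[t] = A sin(\omega t + \varphi)
\begin{aligned}
e[t] &= x[t] - 2\,x[t-1] + x[t-2] \\
     &= 2A\sin\!\big(\omega(t-1)+\varphi\big)\cos\omega
        - 2A\sin\!\big(\omega(t-1)+\varphi\big) \\
     &= -4\sin^2\!\big(\tfrac{\omega}{2}\big)\,
        A\sin\!\big(\omega(t-1)+\varphi\big) \\
     &\approx -A\,\omega^{2}\sin(\omega t + \varphi)
        \quad \text{for } \omega \ll 1 .
\end{aligned}
```

The middle step uses the sum-to-product identity on x[t] + x[t-2]; the last step applies sin(ω/2) ≈ ω/2, which is exactly where the low-frequency assumption enters.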
Priority
Let's prioritize Phase 1: it's a low-effort task that promises a high compression ratio gain. This will allow us to see the benefits of LPC quickly and efficiently.
Labels
enhancement compression good first issue