KeyError in `vllm bench serve` with Custom Percentile Metrics

by Kenji Nakamura

Hey everyone! 👋 We've run into a bit of a snag while using vllm bench serve, and we wanted to share the details so we can squash this bug together! It seems like a KeyError pops up when we try to save results while specifying non-default --percentile-metrics. Let's dive into the specifics!

The Issue: KeyError During Result Saving

So, here's the deal. When you run vllm bench serve and tell it to calculate specific percentile metrics (like ttft and tpot) and then save the results, it throws a KeyError. This only happens when you deviate from the default percentile metrics. It's like the program is saying, "Hey, I don't know how to save these custom metrics!"

Steps to Reproduce 📝

To make sure you can see exactly what we're talking about, here’s how you can reproduce the bug:

  1. First, make sure you have vllm installed and set up correctly. You know, the usual drill.

  2. Run the following command:

```
vllm bench serve \
    --model Qwen/Qwen2.5-0.5B \
    --percentile-metrics ttft,tpot \
    --save-results
```

You can use `Qwen/Qwen2.5-0.5B` or any other model you prefer. The important parts are the `--percentile-metrics` flag with custom metrics and the `--save-results` flag.
  3. Boom! 💥 You should see a KeyError in the output when the script tries to save the results.

Diving Deeper: The Root Cause 🕵️‍♀️

Alright, let’s put on our detective hats and figure out what’s really going on. When you specify percentile metrics like ttft (Time To First Token) and tpot (Time Per Output Token), the benchmarking script calculates these metrics just fine. The problem arises when it tries to save these metrics to a file. It looks like there’s a part of the code that doesn’t know how to handle these custom percentile metrics when saving the results.

In essence, the script calculates the metrics, but the saving mechanism only anticipates the default set of metrics. It’s like having a fancy calculator that can do all sorts of calculations, but the “save” button only works for basic math. Not very helpful, right?
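To make the suspected failure mode concrete, here's a minimal, self-contained sketch. It is not vLLM's actual code; the metric names and the shape of the results dictionary are assumptions for illustration only:

```python
# Hypothetical illustration of the suspected bug -- NOT vLLM's real internals.
# Assumption: the saving routine indexes the computed metrics with a
# hardcoded list of keys, regardless of what the user requested.
DEFAULT_METRICS = ["ttft", "tpot", "itl", "e2el"]  # assumed default metric set

def save_results(computed: dict) -> dict:
    # Fragile: assumes every default metric was actually computed.
    return {name: computed[name] for name in DEFAULT_METRICS}

# With --percentile-metrics ttft,tpot only these two metrics get computed...
computed = {"ttft": {"p99": 123.4}, "tpot": {"p99": 5.6}}
save_results(computed)  # ...so this raises KeyError: 'itl'
```

If something along these lines is happening, any run that computes fewer (or different) metrics than the hardcoded list will blow up at save time, which matches the behavior we're seeing.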

Impact and Why It Matters ⚠️

So, why is this bug important? Well, when you're benchmarking your models, you often want to track specific metrics that are important for your use case. For example, ttft is crucial for interactive applications where the initial response time is critical. If you can’t save these metrics, it makes it harder to compare different models or configurations and optimize your setup.

Imagine you’re trying to fine-tune your model for the fastest possible response time. You run a bunch of benchmarks with different settings, carefully tweaking parameters. But then, you can’t save the results for your custom metrics! It’s like trying to bake a cake without a recipe – you might get something tasty, but you won’t know exactly what you did right.

Possible Culprits and Potential Fixes 🛠️

Okay, let’s brainstorm some potential fixes! Based on the error, it seems like the issue lies in the result-saving part of the script. Here are a few things we might want to check:

  1. The Saving Function: We need to look at the function that actually saves the benchmark results. It probably has a hardcoded list of metrics it knows how to save.
  2. Handling Custom Metrics: We need to make sure that the saving function can handle arbitrary percentile metrics specified by the user.
  3. Data Structures: It’s possible that the data structure used to store the results isn’t flexible enough to accommodate custom metrics. Maybe we need to use a dictionary or a more adaptable structure.

Here’s a rough idea of how we might fix this (a code sketch follows the steps below):

  • Step 1: Locate the function responsible for saving the benchmark results. This might be in a separate module or class.
  • Step 2: Identify where the code assumes a fixed set of metrics. Look for hardcoded lists or dictionaries.
  • Step 3: Modify the code to dynamically handle percentile metrics. This might involve using a loop to iterate over the specified metrics and save them individually.
  • Step 4: Ensure the saved data includes the names and values of the custom metrics.
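Here's a rough sketch of what step 3 could look like, assuming the computed results live in a dictionary keyed by metric name. The function and variable names are illustrative only, not vLLM's actual API:

```python
# Illustrative fix sketch -- iterate over the user-selected metrics instead of
# a hardcoded list. Names and data layout are assumptions for this example.
def save_results(computed: dict, selected_metrics: list[str]) -> dict:
    saved = {}
    for name in selected_metrics:
        if name not in computed:  # skip anything that wasn't computed
            continue
        for percentile, value in computed[name].items():
            saved[f"{percentile}_{name}"] = value
    return saved

computed = {"ttft": {"p50": 40.2, "p99": 123.4}, "tpot": {"p50": 3.1, "p99": 5.6}}
print(save_results(computed, ["ttft", "tpot"]))
# {'p50_ttft': 40.2, 'p99_ttft': 123.4, 'p50_tpot': 3.1, 'p99_tpot': 5.6}
```

The key idea is that the saved keys are derived from whatever was passed to `--percentile-metrics`, so the saving code can never get out of sync with the metrics that were actually computed.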

Our Environment 🌍

To give you a complete picture, here’s the output of python collect_env.py from our environment:

Your output of `python collect_env.py` here

This will help the vllm team (and anyone else who wants to help) understand the context in which the bug occurred. Knowing the versions of Python, CUDA, and other relevant libraries can be super helpful in tracking down the issue.

In Conclusion: Let's Fix This Together! 🤝

So, that’s the scoop on the KeyError bug in vllm bench serve. It’s a bit of a nuisance, but with a little teamwork, we can definitely get it sorted out! If you have any insights, suggestions, or even better, a fix, please chime in! Let’s make vllm even more awesome together!


🐛 Describe the bug

When using the vllm bench serve command with custom percentile metrics and the --save-results option, a KeyError occurs during the saving of results. This issue prevents users from effectively saving and analyzing benchmark results with specific percentile metrics.

To reproduce the bug:

  1. Run the following command:

```
vllm bench serve \
    --model Qwen/Qwen2.5-0.5B \
    --percentile-metrics ttft,tpot \
    --save-results
```

  2. Observe the KeyError that occurs when the script attempts to save the results.
