vllm.v1.core.sched.output ¶
CachedRequestData dataclass ¶
Source code in vllm/v1/core/sched/output.py
_req_id_to_num_output_tokens cached property ¶
Cached mapping from req_id to num_output_tokens, for O(1) lookup.
This cached property is safe because CachedRequestData instances are created fresh in each scheduling iteration and are not mutated while that iteration's details are computed.
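The caching pattern described above can be sketched with `functools.cached_property`. This is a minimal illustration, not the vLLM source: `CachedRequestDataSketch` is a hypothetical stand-in that carries only two of the real fields, and it assumes `req_ids` and `num_output_tokens` are aligned by index.

```python
from dataclasses import dataclass
from functools import cached_property


@dataclass
class CachedRequestDataSketch:
    """Hypothetical stand-in holding only the two fields used here."""

    req_ids: list[str]
    num_output_tokens: list[int]

    @cached_property
    def _req_id_to_num_output_tokens(self) -> dict[str, int]:
        # Built once on first access, then stored on the instance.
        # Assumes the two lists are aligned by index.
        return dict(zip(self.req_ids, self.num_output_tokens))


data = CachedRequestDataSketch(req_ids=["a", "b"], num_output_tokens=[3, 7])
lookup = data._req_id_to_num_output_tokens

# Mutating the source list afterwards does not refresh the cached dict,
# which is why the pattern relies on instances not being mutated.
data.num_output_tokens[0] = 99
stale_value = data._req_id_to_num_output_tokens["a"]
```

Because the mapping is computed only once, it stays correct only as long as the instance is treated as read-only after construction, which matches the safety argument above.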
__init__ ¶
__init__(
req_ids: list[str],
resumed_req_ids: set[str],
new_token_ids: list[list[int]],
all_token_ids: dict[str, list[int]],
new_block_ids: list[tuple[list[int], ...] | None],
num_computed_tokens: list[int],
num_output_tokens: list[int],
) -> None
anon_repr ¶
anon_repr() -> str
is_context_phase ¶
make_empty classmethod ¶
make_empty() -> CachedRequestData
GrammarOutput dataclass ¶
NewRequestData dataclass ¶
__init__ ¶
__init__(
req_id: str,
prompt_token_ids: list[int] | None,
mm_features: list[MultiModalFeatureSpec],
sampling_params: SamplingParams | None,
pooling_params: PoolingParams | None,
block_ids: tuple[list[int], ...],
num_computed_tokens: int,
lora_request: LoRARequest | None,
prompt_embeds: Tensor | None = None,
prefill_token_ids: list[int] | None = None,
) -> None
__repr__ ¶
__repr__() -> str
anon_repr ¶
anon_repr() -> str
from_request classmethod ¶
from_request(
request: Request,
block_ids: tuple[list[int], ...],
prefill_token_ids: list[int] | None = None,
) -> NewRequestData
SchedulerOutput dataclass ¶
ec_connector_metadata class-attribute instance-attribute ¶
ec_connector_metadata: ECConnectorMetadata | None = None
has_structured_output_requests class-attribute instance-attribute ¶
has_structured_output_requests: bool = False
kv_connector_metadata class-attribute instance-attribute ¶
kv_connector_metadata: KVConnectorMetadata | None = None
num_invalid_spec_tokens class-attribute instance-attribute ¶
num_invalid_spec_tokens: dict[str, int] | None = None
pending_structured_output_tokens class-attribute instance-attribute ¶
pending_structured_output_tokens: bool = False
scheduled_spec_decode_tokens instance-attribute ¶
scheduled_spec_decode_tokens: dict[str, list[int]]
__init__ ¶
__init__(
scheduled_new_reqs: list[NewRequestData],
scheduled_cached_reqs: CachedRequestData,
num_scheduled_tokens: dict[str, int],
total_num_scheduled_tokens: int,
scheduled_spec_decode_tokens: dict[str, list[int]],
scheduled_encoder_inputs: dict[str, list[int]],
num_common_prefix_blocks: list[int],
finished_req_ids: set[str],
free_encoder_mm_hashes: list[str],
preempted_req_ids: set[str] | None = None,
has_structured_output_requests: bool = False,
pending_structured_output_tokens: bool = False,
num_invalid_spec_tokens: dict[str, int] | None = None,
kv_connector_metadata: KVConnectorMetadata | None = None,
ec_connector_metadata: ECConnectorMetadata | None = None,
scheduler_step: int = 0,
) -> None
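The names `num_scheduled_tokens` and `total_num_scheduled_tokens` suggest that the total is the sum of the per-request counts. The sketch below works under that assumption, which the signature alone does not confirm; `is_consistent` is a hypothetical helper and the sample values are invented.

```python
# Invented per-request token counts, keyed by request ID.
num_scheduled_tokens = {"req-0": 16, "req-1": 1, "req-2": 4}

# Assumption: the total is the sum of the per-request counts.
total_num_scheduled_tokens = sum(num_scheduled_tokens.values())


def is_consistent(per_request: dict[str, int], total: int) -> bool:
    """Hypothetical check that a total matches its per-request breakdown."""
    return total == sum(per_request.values())


ok = is_consistent(num_scheduled_tokens, total_num_scheduled_tokens)
```

Carrying the total as its own field would avoid re-summing the dictionary wherever only the aggregate is needed.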
make_empty classmethod ¶
make_empty() -> SchedulerOutput