vllm.v1.engine ¶
Modules:
| Name | Description |
|---|---|
async_llm | |
coordinator | |
core | |
core_client | |
detokenizer | |
exceptions | |
input_processor | |
llm_engine | |
logprobs | |
output_processor | |
parallel_sampling | |
utils | |
FINISH_REASON_STRINGS module-attribute ¶
EngineCoreOutput ¶
Bases: Struct
Source code in vllm/v1/engine/__init__.py
kv_transfer_params class-attribute instance-attribute ¶
new_prompt_logprobs_tensors class-attribute instance-attribute ¶
new_prompt_logprobs_tensors: LogprobsTensors | None = None
EngineCoreOutputs ¶
Bases: Struct
Source code in vllm/v1/engine/__init__.py
journey_events class-attribute instance-attribute ¶
journey_events: list[RequestJourneyEvent] | None = None
EngineCoreRequest ¶
Bases: Struct
Source code in vllm/v1/engine/__init__.py
params property ¶
params: SamplingParams | PoolingParams
Return the processed params (sampling or pooling).
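The `params` property above returns whichever of the two parameter objects was set during processing. A minimal plain-Python sketch of that pattern (vLLM's actual base class is a msgspec `Struct`, and `SamplingParams` / `PoolingParams` here are hypothetical stand-ins for the real classes):

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical stand-ins for vLLM's SamplingParams / PoolingParams.
@dataclass
class SamplingParams:
    temperature: float = 1.0

@dataclass
class PoolingParams:
    dimensions: Optional[int] = None

@dataclass
class Request:
    # Exactly one of the two is expected to be set after input processing.
    sampling_params: Optional[SamplingParams] = None
    pooling_params: Optional[PoolingParams] = None

    @property
    def params(self) -> Union[SamplingParams, PoolingParams]:
        """Return the processed params (sampling or pooling)."""
        p = self.sampling_params if self.sampling_params is not None else self.pooling_params
        assert p is not None, "request has no params set"
        return p

req = Request(sampling_params=SamplingParams(temperature=0.7))
assert isinstance(req.params, SamplingParams)
```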
EngineCoreRequestType ¶
Bases: Enum
Request types defined as hex byte strings, so they can be sent over sockets without a separate encoding step.
Source code in vllm/v1/engine/__init__.py
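To illustrate the idea behind byte-string request types: each member's value is already a raw byte, so a message frame can be tagged by prepending it directly, with no extra serialization step. This is a hypothetical sketch (the member names and byte values are illustrative, not the real `EngineCoreRequestType` definitions):

```python
from enum import Enum

class RequestType(Enum):
    # Illustrative one-byte values; the real enum's members/values may differ.
    ADD = b"\x00"
    ABORT = b"\x01"
    UTILITY = b"\x02"

def frame(req_type: RequestType, payload: bytes) -> bytes:
    # The type byte is sent as-is, ahead of the already-encoded payload.
    return req_type.value + payload

msg = frame(RequestType.ADD, b"{...}")
assert msg == b"\x00{...}"
# The receiver recovers the type by value lookup on the first byte.
assert RequestType(msg[:1]) is RequestType.ADD
```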
FinishReason ¶
Bases: IntEnum
Reason a request finished - stop, length, abort, or error.
Int rather than str for more compact serialization.
- stop - a stop string was emitted
- length - max_tokens was consumed, or max_model_len was reached
- abort - aborted by client
- error - retryable request-level internal error (e.g., KV load failure). Invariant: always converted to 500 Internal Server Error.
Source code in vllm/v1/engine/__init__.py
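The "int rather than str" design can be sketched as an `IntEnum` paired with a tuple mapping each value back to its display string (the member names follow the docstring above; the tuple here is an assumed stand-in for `FINISH_REASON_STRINGS`):

```python
import json
from enum import IntEnum

# Assumed stand-in for FINISH_REASON_STRINGS, ordered to match the enum values.
REASON_STRINGS = ("stop", "length", "abort", "error")

class Reason(IntEnum):
    STOP = 0
    LENGTH = 1
    ABORT = 2
    ERROR = 3

    def __str__(self) -> str:
        # Map the int back to its human-readable name.
        return REASON_STRINGS[self.value]

# An IntEnum serializes as a bare int, which is more compact than a string:
assert json.dumps(Reason.LENGTH) == "1"
assert str(Reason.LENGTH) == "length"
```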
ReconfigureDistributedRequest ¶
Bases: Struct
Source code in vllm/v1/engine/__init__.py
ReconfigureRankType ¶
Bases: IntEnum
Rank type for a distributed reconfiguration request.
Source code in vllm/v1/engine/__init__.py
UtilityOutput ¶
Bases: Struct