Support PDF inputs for BYOK endpoints #323836

Open
AntonioLujanoLuna wants to merge 5 commits into microsoft:main from AntonioLujanoLuna:main

@AntonioLujanoLuna

Summary

Adds explicit PDF input support for BYOK models using the Responses or
Messages APIs.

Previously, PDF attachments could be represented in chat prompts but were
lost while converting between raw prompt messages and VS Code language-model
messages. Custom models also had no way to advertise PDF support independently
of vision support.

This change:

  • Adds fileInputMimeTypes to the proposed language-model capabilities and
    Custom Endpoint model configuration.
  • Advertises application/pdf for configured Responses and Messages endpoints.
  • Preserves PDF data through raw-to-VS Code and VS Code-to-raw message
    conversion.
  • Converts PDFs into Anthropic document blocks, including tool-result
    content.
  • Keeps PDF input disabled for Chat Completions endpoints.
  • Adds regression tests for capability propagation, message conversion,
    prompt rendering, and endpoint-type gating.
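
As a rough illustration of the document-block conversion listed above, the sketch below turns a PDF data part into an Anthropic `document` content block with a base64 source. `PdfDataPart`, `toBase64`, and `toDocumentBlock` are hypothetical names, not the PR's actual code; the part shape (a `mimeType` plus `Uint8Array` data) and the pass-through behaviour for non-PDF parts are assumptions.

```typescript
// Hypothetical stand-in for a language-model data part (assumed shape).
interface PdfDataPart {
	mimeType: string;
	data: Uint8Array;
}

// Anthropic Messages API document block with a base64-encoded PDF source.
interface AnthropicDocumentBlock {
	type: 'document';
	source: { type: 'base64'; media_type: 'application/pdf'; data: string };
}

const B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Dependency-free base64 encoder so the sketch runs in any JS runtime.
function toBase64(bytes: Uint8Array): string {
	let out = '';
	for (let i = 0; i < bytes.length; i += 3) {
		const b0 = bytes[i];
		const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
		const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
		out += B64[b0 >> 2];
		out += B64[((b0 & 3) << 4) | (b1 >> 4)];
		out += i + 1 < bytes.length ? B64[((b1 & 15) << 2) | (b2 >> 6)] : '=';
		out += i + 2 < bytes.length ? B64[b2 & 63] : '=';
	}
	return out;
}

// Returns a document block for PDF parts and undefined for anything else
// (assumption: other MIME types are handled by a different code path).
function toDocumentBlock(part: PdfDataPart): AnthropicDocumentBlock | undefined {
	if (part.mimeType !== 'application/pdf') {
		return undefined;
	}
	return {
		type: 'document',
		source: { type: 'base64', media_type: 'application/pdf', data: toBase64(part.data) },
	};
}
```

The same block shape could be placed inside a tool-result content array, which is the case the summary calls out separately.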

How to test

  1. Configure a BYOK Custom Endpoint model with "apiType": "responses" or
    "apiType": "messages" and:

    "fileInputMimeTypes": ["application/pdf"]

Copilot AI review requested due to automatic review settings July 1, 2026 07:23

Copilot AI left a comment

Contributor

Pull request overview

Adds first-class PDF file-input capability signaling and end-to-end PDF payload preservation for BYOK models (Responses/Messages APIs), including conversions to Anthropic document blocks and coverage across capability propagation, endpoint gating, and prompt rendering.

Changes:

  • Introduces fileInputMimeTypes as a proposed language model capability and wires it through VS Code model metadata and Copilot endpoint abstractions.
  • Preserves PDF payloads across raw↔VS Code message conversions and converts PDFs into Anthropic document blocks (including in tool results).
  • Adds regression tests for PDF capability propagation, prompt rendering, conversion correctness, and disabling for Chat Completions.
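
The endpoint-type gating described above might look roughly like the following. This is a minimal sketch under stated assumptions: only `apiType`, its `responses` and `messages` values, and `fileInputMimeTypes` appear in the PR text; the `chatCompletions` value, the type names, and the choice to return an empty list (rather than leaving the field unset) are hypothetical.

```typescript
// Hypothetical config shape; 'chatCompletions' is an assumed identifier.
type ApiType = 'responses' | 'messages' | 'chatCompletions';

interface CustomEndpointModelConfig {
	apiType: ApiType;
	fileInputMimeTypes?: readonly string[];
}

const PDF_MIME_TYPE = 'application/pdf';

// Returns the MIME types a model would advertise for file input.
function advertisedFileInputMimeTypes(config: CustomEndpointModelConfig): readonly string[] {
	// Chat Completions endpoints never advertise file input, even if configured.
	if (config.apiType === 'chatCompletions') {
		return [];
	}
	// The configuration schema is described as PDF-only, so drop anything else.
	return (config.fileInputMimeTypes ?? []).filter(mime => mime === PDF_MIME_TYPE);
}
```

Filtering at the provider boundary keeps a misconfigured Chat Completions model from receiving PDF parts it cannot serialize.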

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `src/vscode-dts/vscode.proposed.languageModelCapabilities.d.ts` | Adds proposed fileInputMimeTypes on language model capabilities surface. |
| `src/vscode-dts/vscode.proposed.chatProvider.d.ts` | Adds proposed fileInputMimeTypes to LanguageModelChatCapabilities for providers. |
| `src/vs/workbench/contrib/chat/common/languageModels.ts` | Extends workbench-side chat model metadata to carry fileInputMimeTypes. |
| `src/vs/workbench/api/common/extHostLanguageModels.ts` | Forwards fileInputMimeTypes between extension host and main thread model representations. |
| `extensions/copilot/src/platform/networking/common/networking.ts` | Adds optional fileInputMimeTypes to Copilot endpoint interface. |
| `extensions/copilot/src/platform/endpoint/vscode-node/test/extChatEndpoint.spec.ts` | Tests raw→VS Code conversion for PDF document parts. |
| `extensions/copilot/src/platform/endpoint/vscode-node/extChatEndpoint.ts` | Converts raw PDF document parts into LanguageModelDataPart and exposes endpoint capability. |
| `extensions/copilot/src/platform/endpoint/test/node/chatModelCapabilities.spec.ts` | Tests precedence rules for explicit PDF support vs legacy family/vision heuristics. |
| `extensions/copilot/src/platform/endpoint/common/chatModelCapabilities.ts` | Implements modelSupportsPDFDocuments using explicit MIME types, with legacy fallback. |
| `extensions/copilot/src/extension/prompts/node/panel/test/fileVariable.spec.ts` | Adds coverage ensuring PDF rendering works for explicitly configured custom models. |
| `extensions/copilot/src/extension/prompts/node/panel/fileVariable.tsx` | Switches PDF gating to rely on modelSupportsPDFDocuments rather than supportsVision directly. |
| `extensions/copilot/src/extension/conversation/vscode-node/test/languageModelAccess.test.ts` | Tests prompt rendering preserves PDF parts into raw document content parts. |
| `extensions/copilot/src/extension/conversation/vscode-node/languageModelAccessPrompt.tsx` | Renders PDF LanguageModelDataPart as prompt-tsx `<Document>` blocks. |
| `extensions/copilot/src/extension/byok/vscode-node/test/customEndpointProvider.spec.ts` | Tests file-input advertisement behavior across API types (Responses/Messages vs Chat Completions). |
| `extensions/copilot/src/extension/byok/vscode-node/test/byokModelInfo.spec.ts` | Tests BYOK model info includes configured fileInputMimeTypes. |
| `extensions/copilot/src/extension/byok/vscode-node/customEndpointProvider.ts` | Gates configured file inputs based on inferred/declared API type. |
| `extensions/copilot/src/extension/byok/vscode-node/anthropicProvider.ts` | Advertises PDF input support for Anthropic BYOK models. |
| `extensions/copilot/src/extension/byok/common/test/anthropicMessageConverter.spec.ts` | Adds test for PDF data part → Anthropic document block conversion. |
| `extensions/copilot/src/extension/byok/common/byokProvider.ts` | Extends BYOK capabilities/model-info mapping to include fileInputMimeTypes. |
| `extensions/copilot/src/extension/byok/common/anthropicMessageConverter.ts` | Converts PDF parts to Anthropic document blocks (including inside tool results). |
| `extensions/copilot/package.json` | Adds fileInputMimeTypes to Custom Endpoint model configuration schema (PDF only). |
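
The precedence rule summarized for `chatModelCapabilities.ts` (explicit MIME types first, legacy heuristics as fallback) could be sketched as below. This is not the real `modelSupportsPDFDocuments`: the type and function names are hypothetical, the legacy heuristic is passed in as a parameter because its details are not shown in this PR text, and treating an explicit empty list as "no PDF support" is an assumption.

```typescript
// Hypothetical capability shape for illustration only.
interface ModelCapabilitiesSketch {
	fileInputMimeTypes?: readonly string[];
	supportsVision: boolean;
	family: string;
}

type LegacyHeuristic = (model: ModelCapabilitiesSketch) => boolean;

function supportsPdfSketch(model: ModelCapabilitiesSketch, legacy: LegacyHeuristic): boolean {
	// An explicit declaration wins, including an explicit empty list (assumption).
	if (model.fileInputMimeTypes !== undefined) {
		return model.fileInputMimeTypes.includes('application/pdf');
	}
	// No declaration: defer to the legacy family/vision heuristic.
	return legacy(model);
}
```

Checking against `undefined` rather than truthiness keeps the "declared but empty" and "not declared" cases distinct.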
Comments suppressed due to low confidence (1)

src/vs/workbench/api/common/extHostLanguageModels.ts:253

  • fileInputMimeTypes is forwarded only when truthy, so an explicit empty array is lost. That prevents providers from explicitly advertising “no supported file inputs” and can break endpoint-type gating where an empty list is meaningful.
					configurationSchema: m.configurationSchema as IJSONSchema | undefined,
					capabilities: m.capabilities ? {
						vision: m.capabilities.imageInput,
						...(m.capabilities.fileInputMimeTypes ? { fileInputMimeTypes: m.capabilities.fileInputMimeTypes } : {}),
						editTools: m.capabilities.editTools,
						toolCalling: !!m.capabilities.toolCalling,
						agentMode: !!m.capabilities.toolCalling
					} : undefined,
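
When weighing this suppressed comment, it may help to recall that an empty array is truthy in JavaScript and TypeScript, so a conditional spread of this shape drops `undefined` but still forwards `[]`. The sketch below mirrors only the shape of the snippet; `forwardCapabilities` and its parameter type are hypothetical and not part of the PR.

```typescript
// Hypothetical reduction of the conditional-spread pattern in the snippet.
function forwardCapabilities(caps: { fileInputMimeTypes?: string[] }): { fileInputMimeTypes?: string[] } {
	return {
		...(caps.fileInputMimeTypes ? { fileInputMimeTypes: caps.fileInputMimeTypes } : {}),
	};
}

// An explicit empty array is truthy, so the key is kept.
const withEmptyList = forwardCapabilities({ fileInputMimeTypes: [] });
console.log('fileInputMimeTypes' in withEmptyList); // true

// A missing value is falsy, so the key is omitted.
const withoutList = forwardCapabilities({});
console.log('fileInputMimeTypes' in withoutList); // false
```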

Comment on lines 165 to 170
capabilities: {
	toolCalling: capabilities.toolCalling,
	imageInput: capabilities.vision,
	...(capabilities.fileInputMimeTypes ? { fileInputMimeTypes: capabilities.fileInputMimeTypes } : {}),
	editTools: capabilities.editTools,
},
Comment on lines 462 to 467
capabilities: {
	supportsImageToText: model.metadata.capabilities?.vision ?? false,
	...(model.metadata.capabilities?.fileInputMimeTypes ? { fileInputMimeTypes: model.metadata.capabilities.fileInputMimeTypes } : {}),
	supportsToolCalling: !!model.metadata.capabilities?.toolCalling,
	editToolsHint: model.metadata.capabilities?.editTools,
},
@AntonioLujanoLuna

Author

@microsoft-github-policy-service agree

4 participants