Create eval run
Create a new evaluation run. This is the endpoint that will kick off grading.
Path parameters
eval_id string Required
The ID of the evaluation to create a run for.
Request body
name string
The name of the run.
metadata object or null
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
data_source object
Details about the run's data source.
JsonlRunDataSource object Required
A JsonlRunDataSource object that specifies a JSONL file that matches the eval.
type string Required Defaults: jsonl
The type of data source. Always jsonl.
source object
EvalJsonlFileContentSource object Required
type string Required Defaults: file_content
The type of jsonl source. Always file_content.
content array Required
The content of the jsonl file.
items object
item object Required
sample object
EvalJsonlFileIdSource object Required
type string Required Defaults: file_id
The type of jsonl source. Always file_id.
id string Required
The identifier of the file.
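As a rough illustration of the two source variants above, the following Python sketch builds a jsonl data source either from inline rows or from an uploaded file ID. The helper names (inline_jsonl_source, file_jsonl_source) and the file ID value are hypothetical; only the field names are taken from the reference above.

```python
import json


def inline_jsonl_source(rows):
    """Build a jsonl data source whose rows are embedded in the request body."""
    return {
        "type": "jsonl",
        "source": {
            "type": "file_content",
            # Each content entry wraps one row under "item"; "sample" is optional.
            "content": [{"item": row} for row in rows],
        },
    }


def file_jsonl_source(file_id):
    """Build a jsonl data source that points at a previously uploaded file."""
    return {"type": "jsonl", "source": {"type": "file_id", "id": file_id}}


rows = [
    {
        "input": "Tech Company Launches Advanced Artificial Intelligence Platform",
        "ground_truth": "Technology",
    }
]
inline = inline_jsonl_source(rows)
by_file = file_jsonl_source("file-abc123")  # hypothetical file ID

print(json.dumps(inline, indent=2))
print(json.dumps(by_file, indent=2))
```

Inline content is convenient for a handful of rows; for larger datasets the file_id form keeps the request body small.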
CompletionsRunDataSource object Required
A CompletionsRunDataSource object describing a model sampling configuration.
type string Required Defaults: completions
The type of run data source. Always completions.
input_messages object
TemplateInputMessages object
type string Required
The type of input messages. Always template.
template array Required
A list of chat messages forming the prompt or context. May include variable references to the "item" namespace, e.g. {{item.name}}.
Input message object
A message input to the model with a role indicating instruction following hierarchy. Instructions given with the developer or system role take precedence over instructions given with the user role. Messages with the assistant role are presumed to have been generated by the model in previous interactions.
role string Required
The role of the message input. One of user, assistant, system, or developer.
content string or array
Text, image, or audio input to the model, used to generate a response. Can also contain previous assistant responses.
Text input string Required
A text input to the model.
Input item content list array Required
A list of one or many input items to the model, containing different content types.
Input text object
A text input to the model.
type string Required Defaults: input_text
The type of the input item. Always input_text.
text string Required
The text input to the model.
Input image object
An image input to the model.
type string Required Defaults: input_image
The type of the input item. Always input_image.
image_url string or null
The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.
file_id string or null
The ID of the file to be sent to the model.
detail string Required
The detail level of the image to be sent to the model. One of high, low, or auto. Defaults to auto.
Input file object
A file input to the model.
type string Required Defaults: input_file
The type of the input item. Always input_file.
file_id string or null
The ID of the file to be sent to the model.
filename string
The name of the file to be sent to the model.
file_data string
The content of the file to be sent to the model.
type string
The type of the message input. Always message.
Eval message object object
A message input to the model with a role indicating instruction following hierarchy. Instructions given with the developer or system role take precedence over instructions given with the user role. Messages with the assistant role are presumed to have been generated by the model in previous interactions.
role string Required
The role of the message input. One of user, assistant, system, or developer.
content string or object
Text inputs to the model - can contain template strings.
Text input string Required
A text input to the model.
Input text object Required
A text input to the model.
type string Required Defaults: input_text
The type of the input item. Always input_text.
text string Required
The text input to the model.
Output text object Required
A text output from the model.
type string Required
The type of the output text. Always output_text.
text string Required
The text output from the model.
type string
The type of the message input. Always message.
ItemReferenceInputMessages object
type string Required
The type of input messages. Always item_reference.
item_reference string Required
A reference to a variable in the "item" namespace, e.g. "item.name".
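To make the two input_messages forms more concrete, here is a small Python sketch showing their shapes and mimicking how a {{item.name}}-style placeholder might be filled from one data-source row. The render_template function and its regular expression are illustrative assumptions, not the service's actual templating engine, which may support more syntax than this.

```python
import re

# Matches placeholders such as {{item.input}} or {{ item.input }} (assumed syntax).
PLACEHOLDER = re.compile(r"\{\{\s*item\.(\w+)\s*\}\}")


def render_template(messages, item):
    """Return a copy of the messages with item placeholders substituted."""
    rendered = []
    for message in messages:
        text = PLACEHOLDER.sub(lambda m: str(item[m.group(1)]), message["content"])
        rendered.append({"role": message["role"], "content": text})
    return rendered


# Form 1: a template whose messages may reference fields of the current item.
template_messages = {
    "type": "template",
    "template": [
        {"role": "developer", "content": "Categorize the headline."},
        {"role": "user", "content": "{{item.input}}"},
    ],
}

# Form 2: a reference to a variable in the "item" namespace.
item_reference_messages = {"type": "item_reference", "item_reference": "item.name"}

row = {"input": "Global Stocks Mixed", "ground_truth": "Markets"}
rendered = render_template(template_messages["template"], row)
print(rendered[1]["content"])
```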
sampling_params object
temperature number Defaults: 1
A higher temperature increases randomness in the outputs.
max_completion_tokens integer
The maximum number of tokens in the generated output.
top_p number Defaults: 1
An alternative to temperature for nucleus sampling; 1.0 includes all tokens.
seed integer Defaults: 42
A seed value to initialize the randomness during sampling.
model string
The name of the model to use for generating completions (e.g. "o3-mini").
source object
EvalJsonlFileContentSource object Required
type string Required Defaults: file_content
The type of jsonl source. Always file_content.
content array Required
The content of the jsonl file.
items object
item object Required
sample object
EvalJsonlFileIdSource object Required
type string Required Defaults: file_id
The type of jsonl source. Always file_id.
id string Required
The identifier of the file.
StoredCompletionsRunDataSource object Required
A StoredCompletionsRunDataSource configuration describing a set of filters.
type string Required Defaults: stored_completions
The type of source. Always stored_completions.
metadata object or null
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
model string or null
An optional model to filter by (e.g., 'gpt-4o').
created_after integer or null
An optional Unix timestamp to filter items created after this time.
created_before integer or null
An optional Unix timestamp to filter items created before this time.
limit integer or null
An optional maximum number of items to return.
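The filters above could be assembled along the following lines. This Python sketch converts datetimes to Unix timestamps and checks metadata against the documented length limits; it treats 16 as an upper bound on the number of pairs, which is an assumption about the wording. The function names are hypothetical.

```python
from datetime import datetime, timezone


def validate_metadata(metadata):
    """Check metadata against the documented limits (16 pairs assumed as a maximum)."""
    if len(metadata) > 16:
        raise ValueError("metadata supports at most 16 key-value pairs")
    for key, value in metadata.items():
        if not isinstance(key, str) or len(key) > 64:
            raise ValueError(f"metadata key must be a string of at most 64 characters: {key!r}")
        if not isinstance(value, str) or len(value) > 512:
            raise ValueError(f"metadata value for {key!r} must be a string of at most 512 characters")
    return metadata


def stored_completions_source(model=None, metadata=None,
                              created_after=None, created_before=None, limit=None):
    """Build a stored_completions source; unset filters are sent as null."""
    def to_unix(moment):
        # Timezone-aware datetimes convert unambiguously to Unix seconds.
        return None if moment is None else int(moment.timestamp())

    return {
        "type": "stored_completions",
        "metadata": None if metadata is None else validate_metadata(metadata),
        "model": model,
        "created_after": to_unix(created_after),
        "created_before": to_unix(created_before),
        "limit": limit,
    }


source = stored_completions_source(
    model="gpt-4o",
    metadata={"team": "news"},  # hypothetical metadata filter
    created_after=datetime(2025, 1, 1, tzinfo=timezone.utc),
    limit=100,
)
print(source)
```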
ResponsesRunDataSource object Required
A ResponsesRunDataSource object describing a model sampling configuration.
type string Required Defaults: completions
The type of run data source. Always completions.
input_messages object
input_messages object
type string Required
The type of input messages. Always template.
template array Required
A list of chat messages forming the prompt or context. May include variable references to the "item" namespace, e.g. {{item.name}}.
ChatMessage object
role string Required
The role of the message (e.g. "system", "assistant", "user").
content string Required
The content of the message.
Eval message object object
A message input to the model with a role indicating instruction following hierarchy. Instructions given with the developer or system role take precedence over instructions given with the user role. Messages with the assistant role are presumed to have been generated by the model in previous interactions.
role string Required
The role of the message input. One of user, assistant, system, or developer.
content string or object
Text inputs to the model - can contain template strings.
Text input string Required
A text input to the model.
Input text object Required
A text input to the model.
type string Required Defaults: input_text
The type of the input item. Always input_text.
text string Required
The text input to the model.
Output text object Required
A text output from the model.
type string Required
The type of the output text. Always output_text.
text string Required
The text output from the model.
type string
The type of the message input. Always message.
input_messages object
type string Required
The type of input messages. Always item_reference.
item_reference string Required
A reference to a variable in the "item" namespace, e.g. "item.name".
sampling_params object
temperature number Defaults: 1
A higher temperature increases randomness in the outputs.
max_completion_tokens integer
The maximum number of tokens in the generated output.
top_p number Defaults: 1
An alternative to temperature for nucleus sampling; 1.0 includes all tokens.
seed integer Defaults: 42
A seed value to initialize the randomness during sampling.
model string
The name of the model to use for generating completions (e.g. "o3-mini").
source object
EvalJsonlFileContentSource object Required
type string Required Defaults: file_content
The type of jsonl source. Always file_content.
content array Required
The content of the jsonl file.
items object
item object Required
sample object
EvalJsonlFileIdSource object Required
type string Required Defaults: file_id
The type of jsonl source. Always file_id.
id string Required
The identifier of the file.
EvalResponsesSource object Required
An EvalResponsesSource object describing a run data source configuration.
type string Required
The type of run data source. Always responses.
metadata object or null
Metadata filter for the responses. This is a query parameter used to select responses.
model string or null
The name of the model to find responses for. This is a query parameter used to select responses.
instructions_search string or null
Optional search string for instructions. This is a query parameter used to select responses.
created_after integer or null
Only include items created after this timestamp (inclusive). This is a query parameter used to select responses.
created_before integer or null
Only include items created before this timestamp (inclusive). This is a query parameter used to select responses.
has_tool_calls boolean or null
Whether the response has tool calls. This is a query parameter used to select responses.
reasoning_effort string or null Defaults: medium
o-series models only
Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
temperature number or null
Sampling temperature. This is a query parameter used to select responses.
top_p number or null
Nucleus sampling parameter. This is a query parameter used to select responses.
users array or null
List of user identifiers. This is a query parameter used to select responses.
items string
allow_parallel_tool_calls boolean or null
Whether to allow parallel tool calls. This is a query parameter used to select responses.
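Putting the request body together, the sketch below prepares (but does not send) a POST request using only the Python standard library. The URL, headers, and eval ID mirror the curl example in this section; the build_create_run_request helper and the placeholder API key are assumptions made for illustration.

```python
import json
import os
import urllib.request


def build_create_run_request(eval_id, name, data_source, api_key):
    """Prepare a POST request for creating an eval run without sending it."""
    url = f"https://api.openai.com/v1/evals/{eval_id}/runs"
    body = json.dumps({"name": name, "data_source": data_source}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# A completions data source with a template prompt and one inline row.
data_source = {
    "type": "completions",
    "model": "gpt-4o-mini",
    "input_messages": {
        "type": "template",
        "template": [
            {"role": "developer", "content": "Categorize the news headline."},
            {"role": "user", "content": "{{item.input}}"},
        ],
    },
    "sampling_params": {"temperature": 1, "top_p": 1, "seed": 42},
    "source": {
        "type": "file_content",
        "content": [{"item": {"input": "Example headline", "ground_truth": "Technology"}}],
    },
}

request = build_create_run_request(
    "eval_67e579652b548190aaa83ada4b125f47",
    "gpt-4o-mini",
    data_source,
    os.environ.get("OPENAI_API_KEY", "sk-placeholder"),  # placeholder key, not a real credential
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect or log before any grading is kicked off.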
Response
The EvalRun object matching the specified ID.
curl https://api.openai.com/v1/evals/eval_67e579652b548190aaa83ada4b125f47/runs \
  -X POST \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name":"gpt-4o-mini","data_source":{"type":"completions","input_messages":{"type":"template","template":[{"role":"developer","content":"Categorize a given news headline into one of the following topics: Technology, Markets, World, Business, or Sports.\n\n# Steps\n\n1. Analyze the content of the news headline to understand its primary focus.\n2. Extract the subject matter, identifying any key indicators or keywords.\n3. Use the identified indicators to determine the most suitable category out of the five options: Technology, Markets, World, Business, or Sports.\n4. Ensure only one category is selected per headline.\n\n# Output Format\n\nRespond with the chosen category as a single word. For instance: \"Technology\", \"Markets\", \"World\", \"Business\", or \"Sports\".\n\n# Examples\n\n**Input**: \"Apple Unveils New iPhone Model, Featuring Advanced AI Features\" \n**Output**: \"Technology\"\n\n**Input**: \"Global Stocks Mixed as Investors Await Central Bank Decisions\" \n**Output**: \"Markets\"\n\n**Input**: \"War in Ukraine: Latest Updates on Negotiation Status\" \n**Output**: \"World\"\n\n**Input**: \"Microsoft in Talks to Acquire Gaming Company for $2 Billion\" \n**Output**: \"Business\"\n\n**Input**: \"Manchester United Secures Win in Premier League Football Match\" \n**Output**: \"Sports\" \n\n# Notes\n\n- If the headline appears to fit into more than one category, choose the most dominant theme.\n- Keywords or phrases such as \"stocks\", \"company acquisition\", \"match\", or technological brands can be good indicators for classification.\n"},{"role":"user","content":"{{item.input}}"}]},"sampling_params":{"temperature":1,"max_completions_tokens":2048,"top_p":1,"seed":42},"model":"gpt-4o-mini","source":{"type":"file_content","content":[{"item":{"input":"Tech Company Launches Advanced Artificial Intelligence Platform","ground_truth":"Technology"}}]}}}'
{
  "object": "eval.run",
  "id": "evalrun_67e57965b480819094274e3a32235e4c",
  "eval_id": "eval_67e579652b548190aaa83ada4b125f47",
  "report_url": "https://platform.openai.com/evaluations/eval_67e579652b548190aaa83ada4b125f47&run_id=evalrun_67e57965b480819094274e3a32235e4c",
  "status": "queued",
  "model": "gpt-4o-mini",
  "name": "gpt-4o-mini",
  "created_at": 1743092069,
  "result_counts": {
    "total": 0,
    "errored": 0,
    "failed": 0,
    "passed": 0
  },
  "per_model_usage": null,
  "per_testing_criteria_results": null,
  "data_source": {
    "type": "completions",
    "source": {
      "type": "file_content",
      "content": [
        {
          "item": {
            "input": "Tech Company Launches Advanced Artificial Intelligence Platform",
            "ground_truth": "Technology"
          }
        }
      ]
    },
    "input_messages": {
      "type": "template",
      "template": [
        {
          "type": "message",
          "role": "developer",
          "content": {
            "type": "input_text",
            "text": "Categorize a given news headline into one of the following topics: Technology, Markets, World, Business, or Sports.\n\n# Steps\n\n1. Analyze the content of the news headline to understand its primary focus.\n2. Extract the subject matter, identifying any key indicators or keywords.\n3. Use the identified indicators to determine the most suitable category out of the five options: Technology, Markets, World, Business, or Sports.\n4. Ensure only one category is selected per headline.\n\n# Output Format\n\nRespond with the chosen category as a single word. For instance: \"Technology\", \"Markets\", \"World\", \"Business\", or \"Sports\".\n\n# Examples\n\n**Input**: \"Apple Unveils New iPhone Model, Featuring Advanced AI Features\" \n**Output**: \"Technology\"\n\n**Input**: \"Global Stocks Mixed as Investors Await Central Bank Decisions\" \n**Output**: \"Markets\"\n\n**Input**: \"War in Ukraine: Latest Updates on Negotiation Status\" \n**Output**: \"World\"\n\n**Input**: \"Microsoft in Talks to Acquire Gaming Company for $2 Billion\" \n**Output**: \"Business\"\n\n**Input**: \"Manchester United Secures Win in Premier League Football Match\" \n**Output**: \"Sports\" \n\n# Notes\n\n- If the headline appears to fit into more than one category, choose the most dominant theme.\n- Keywords or phrases such as \"stocks\", \"company acquisition\", \"match\", or technological brands can be good indicators for classification.\n"
          }
        },
        {
          "type": "message",
          "role": "user",
          "content": {
            "type": "input_text",
            "text": "{{item.input}}"
          }
        }
      ]
    },
    "model": "gpt-4o-mini",
    "sampling_params": {
      "seed": 42,
      "temperature": 1.0,
      "top_p": 1.0,
      "max_completions_tokens": 2048
    }
  },
  "error": null,
  "metadata": {}
}
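A freshly created run reports a status of queued and zeroed result counts, so client code should not assume results are present. The Python sketch below reads a trimmed copy of the response above and computes a pass rate only when there are graded items; the summarize_run helper is hypothetical.

```python
import json

# Trimmed copy of the example response, keeping only the fields used here.
run_json = """
{
  "object": "eval.run",
  "id": "evalrun_67e57965b480819094274e3a32235e4c",
  "status": "queued",
  "result_counts": {"total": 0, "errored": 0, "failed": 0, "passed": 0}
}
"""


def summarize_run(run):
    """Extract the run ID, status, and pass rate (None when nothing is graded yet)."""
    counts = run["result_counts"]
    total = counts["total"]
    # Guard against division by zero while the run has no results.
    pass_rate = counts["passed"] / total if total else None
    return {"id": run["id"], "status": run["status"], "pass_rate": pass_rate}


summary = summarize_run(json.loads(run_json))
print(summary)
```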