Create eval run

POST https://api.openai.com/v1/evals/{eval_id}/runs

Creates a new evaluation run for the given eval. Calling this endpoint kicks off grading.

Path parameters

eval_id
string
Required
The ID of the evaluation to create a run for.

Request body

name
string
The name of the run.
metadata
object or null
Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and for querying objects via the API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
data_source
object
Details about the run's data source.
- JsonlRunDataSource
  object
  Required
  A JsonlRunDataSource object that specifies a JSONL file matching the eval.
  type
  string
  Required
  Defaults: jsonl
  The type of data source. Always jsonl.
  jsonl
  string
  source
  object
  EvalJsonlFileContentSource
  object
  Required
  type
  string
  Required
  Defaults: file_content
  The type of jsonl source. Always file_content.
  file_content
  string
  content
  array
  Required
  The content of the jsonl file.
  items
  object
  item
  object
  Required
  sample
  object
  EvalJsonlFileIdSource
  object
  Required
  type
  string
  Required
  Defaults: file_id
  The type of jsonl source. Always file_id.
  file_id
  string
  id
  string
  Required
  The identifier of the file.
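As a rough illustration of the two `source` variants described above, the following Python sketch builds a `jsonl` data source either from inline rows or from an uploaded file ID. The helper names (`jsonl_from_rows`, `jsonl_from_file_id`) and the file ID are hypothetical; only the field names come from this reference.

```python
import json


def jsonl_from_rows(rows):
    """Build a jsonl data source with inline file_content rows.

    Each row is wrapped as {"item": {...}}; the optional "sample"
    field is left out.
    """
    return {
        "type": "jsonl",
        "source": {
            "type": "file_content",
            "content": [{"item": dict(row)} for row in rows],
        },
    }


def jsonl_from_file_id(file_id):
    """Build a jsonl data source that points at an already uploaded file."""
    return {"type": "jsonl", "source": {"type": "file_id", "id": file_id}}


inline = jsonl_from_rows([
    {
        "input": "Tech Company Launches Advanced Artificial Intelligence Platform",
        "ground_truth": "Technology",
    }
])
by_id = jsonl_from_file_id("file-abc123")  # hypothetical file ID

print(json.dumps(inline, indent=2))
print(json.dumps(by_id))
```

Either dictionary would be passed as the `data_source` field of the request body.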
- CompletionsRunDataSource
  object
  Required
  A CompletionsRunDataSource object describing a model sampling configuration.
  type
  string
  Required
  Defaults: completions
  The type of run data source. Always completions.
  completions
  string
  input_messages
  object
  TemplateInputMessages
  object
  type
  string
  Required
  The type of input messages. Always template.
  template
  string
  template
  array
  Required
  A list of chat messages forming the prompt or context. May include variable references to the "item" namespace, e.g., {{item.name}}.
  Input message
  object
  A message input to the model with a role indicating instruction following hierarchy. Instructions given with the developer or system role take precedence over instructions given with the user role. Messages with the assistant role are presumed to have been generated by the model in previous interactions.
  role
  string
  Required
  The role of the message input. One of user, assistant, system, or developer.
  user
  string
  assistant
  string
  system
  string
  developer
  string
  content
  string or array
  Text, image, or audio input to the model, used to generate a response. Can also contain previous assistant responses.
  Text input
  string
  Required
  A text input to the model.
  Input item content list
  array
  Required
  A list of one or many input items to the model, containing different content types.
  Input text
  object
  A text input to the model.
  type
  string
  Required
  Defaults: input_text
  The type of the input item. Always input_text.
  input_text
  string
  text
  string
  Required
  The text input to the model.
  Input image
  object
  An image input to the model. Learn about image inputs.
  type
  string
  Required
  Defaults: input_image
  The type of the input item. Always input_image.
  input_image
  string
  image_url
  string or null
  image_url
  string
  The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.
  image_url
  null
  file_id
  string or null
  file_id
  string
  The ID of the file to be sent to the model.
  file_id
  null
  detail
  string
  Required
  The detail level of the image to be sent to the model. One of high, low, or auto. Defaults to auto.
  low
  string
  high
  string
  auto
  string
  Input file
  object
  A file input to the model.
  type
  string
  Required
  Defaults: input_file
  The type of the input item. Always input_file.
  input_file
  string
  file_id
  string or null
  file_id
  string
  The ID of the file to be sent to the model.
  file_id
  null
  filename
  string
  The name of the file to be sent to the model.
  file_data
  string
  The content of the file to be sent to the model.
  type
  string
  The type of the message input. Always message.
  message
  string
  Eval message object
  object
  A message input to the model with a role indicating instruction following hierarchy. Instructions given with the developer or system role take precedence over instructions given with the user role. Messages with the assistant role are presumed to have been generated by the model in previous interactions.
  role
  string
  Required
  The role of the message input. One of user, assistant, system, or developer.
  user
  string
  assistant
  string
  system
  string
  developer
  string
  content
  string or object
  Text inputs to the model - can contain template strings.
  Text input
  string
  Required
  A text input to the model.
  Input text
  object
  Required
  A text input to the model.
  type
  string
  Required
  Defaults: input_text
  The type of the input item. Always input_text.
  input_text
  string
  text
  string
  Required
  The text input to the model.
  Output text
  object
  Required
  A text output from the model.
  type
  string
  Required
  The type of the output text. Always output_text.
  output_text
  string
  text
  string
  Required
  The text output from the model.
  type
  string
  The type of the message input. Always message.
  message
  string
  ItemReferenceInputMessages
  object
  type
  string
  Required
  The type of input messages. Always item_reference.
  item_reference
  string
  item_reference
  string
  Required
  A reference to a variable in the "item" namespace, e.g., "item.name".
  sampling_params
  object
  temperature
  number
  Defaults: 1
  A higher temperature increases randomness in the outputs.
  max_completion_tokens
  integer
  The maximum number of tokens in the generated output.
  top_p
  number
  Defaults: 1
  An alternative to temperature for nucleus sampling; 1.0 includes all tokens.
  seed
  integer
  Defaults: 42
  A seed value used to initialize the randomness during sampling.
  model
  string
  The name of the model to use for generating completions (e.g. "o3-mini").
  source
  object
  EvalJsonlFileContentSource
  object
  Required
  type
  string
  Required
  Defaults: file_content
  The type of jsonl source. Always file_content.
  file_content
  string
  content
  array
  Required
  The content of the jsonl file.
  items
  object
  item
  object
  Required
  sample
  object
  EvalJsonlFileIdSource
  object
  Required
  type
  string
  Required
  Defaults: file_id
  The type of jsonl source. Always file_id.
  file_id
  string
  id
  string
  Required
  The identifier of the file.
  StoredCompletionsRunDataSource
  object
  Required
  A StoredCompletionsRunDataSource configuration describing a set of filters.
  type
  string
  Required
  Defaults: stored_completions
  The type of source. Always stored_completions.
  stored_completions
  string
  metadata
  object or null
  Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and for querying objects via the API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
  model
  string or null
  An optional model to filter by (e.g., 'gpt-4o').
  created_after
  integer or null
  An optional Unix timestamp to filter items created after this time.
  created_before
  integer or null
  An optional Unix timestamp to filter items created before this time.
  limit
  integer or null
  An optional maximum number of items to return.
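To make the shape of a completions data source more concrete, here is a minimal Python sketch that assembles one from template messages, a source, and optional sampling parameters. The helpers `template_messages` and `completions_data_source` are hypothetical names introduced for illustration; the field names and allowed roles are taken from the parameter descriptions above.

```python
import json

ALLOWED_ROLES = {"user", "assistant", "system", "developer"}


def template_messages(*pairs):
    """Turn (role, text) pairs into a template input_messages object."""
    template = []
    for role, text in pairs:
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {role!r}")
        template.append({"role": role, "content": text})
    return {"type": "template", "template": template}


def completions_data_source(model, input_messages, source, **sampling_params):
    """Assemble a completions data source; sampling_params are optional."""
    data_source = {
        "type": "completions",
        "model": model,
        "input_messages": input_messages,
        "source": source,
    }
    if sampling_params:
        data_source["sampling_params"] = sampling_params
    return data_source


messages = template_messages(
    ("developer", "Categorize a given news headline into one topic."),
    ("user", "{{item.input}}"),  # filled in per row from the "item" namespace
)
source = {
    "type": "file_content",
    "content": [
        {
            "item": {
                "input": "Tech Company Launches Advanced Artificial Intelligence Platform",
                "ground_truth": "Technology",
            }
        }
    ],
}
run_source = completions_data_source(
    "gpt-4o-mini", messages, source,
    temperature=1, top_p=1, seed=42, max_completion_tokens=2048,
)
print(json.dumps(run_source, indent=2))
```

Validating the role locally is a design choice of this sketch, not something the API requires of clients; it simply surfaces typos before a request is sent.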
- ResponsesRunDataSource
  object
  Required
  A ResponsesRunDataSource object describing a model sampling configuration.
  type
  string
  Required
  Defaults: responses
  The type of run data source. Always responses.
  responses
  string
  input_messages
  object
  input_messages
  object
  type
  string
  Required
  The type of input messages. Always template.
  template
  string
  template
  array
  Required
  A list of chat messages forming the prompt or context. May include variable references to the "item" namespace, e.g., {{item.name}}.
  ChatMessage
  object
  role
  string
  Required
  The role of the message (e.g. "system", "assistant", "user").
  content
  string
  Required
  The content of the message.
  Eval message object
  object
  A message input to the model with a role indicating instruction following hierarchy. Instructions given with the developer or system role take precedence over instructions given with the user role. Messages with the assistant role are presumed to have been generated by the model in previous interactions.
  role
  string
  Required
  The role of the message input. One of user, assistant, system, or developer.
  user
  string
  assistant
  string
  system
  string
  developer
  string
  content
  string or object
  Text inputs to the model - can contain template strings.
  Text input
  string
  Required
  A text input to the model.
  Input text
  object
  Required
  A text input to the model.
  type
  string
  Required
  Defaults: input_text
  The type of the input item. Always input_text.
  input_text
  string
  text
  string
  Required
  The text input to the model.
  Output text
  object
  Required
  A text output from the model.
  type
  string
  Required
  The type of the output text. Always output_text.
  output_text
  string
  text
  string
  Required
  The text output from the model.
  type
  string
  The type of the message input. Always message.
  message
  string
  input_messages
  object
  type
  string
  Required
  The type of input messages. Always item_reference.
  item_reference
  string
  item_reference
  string
  Required
  A reference to a variable in the "item" namespace, e.g., "item.name".
  sampling_params
  object
  temperature
  number
  Defaults: 1
  A higher temperature increases randomness in the outputs.
  max_completion_tokens
  integer
  The maximum number of tokens in the generated output.
  top_p
  number
  Defaults: 1
  An alternative to temperature for nucleus sampling; 1.0 includes all tokens.
  seed
  integer
  Defaults: 42
  A seed value used to initialize the randomness during sampling.
  model
  string
  The name of the model to use for generating completions (e.g. "o3-mini").
  source
  object
  EvalJsonlFileContentSource
  object
  Required
  type
  string
  Required
  Defaults: file_content
  The type of jsonl source. Always file_content.
  file_content
  string
  content
  array
  Required
  The content of the jsonl file.
  items
  object
  item
  object
  Required
  sample
  object
  EvalJsonlFileIdSource
  object
  Required
  type
  string
  Required
  Defaults: file_id
  The type of jsonl source. Always file_id.
  file_id
  string
  id
  string
  Required
  The identifier of the file.
  EvalResponsesSource
  object
  Required
  An EvalResponsesSource object describing a run data source configuration.
  type
  string
  Required
  The type of run data source. Always responses.
  responses
  string
  metadata
  object or null
  Metadata filter for the responses. This is a query parameter used to select responses.
  model
  string or null
  The name of the model to find responses for. This is a query parameter used to select responses.
  instructions_search
  string or null
  Optional search string for instructions. This is a query parameter used to select responses.
  created_after
  integer or null
  Only include items created after this timestamp (inclusive). This is a query parameter used to select responses.
  created_before
  integer or null
  Only include items created before this timestamp (inclusive). This is a query parameter used to select responses.
  has_tool_calls
  boolean or null
  Whether the response has tool calls. This is a query parameter used to select responses.
  reasoning_effort
  string or null
  Defaults: medium
  o-series models only
  Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
  low
  string
  medium
  string
  high
  string
  temperature
  number or null
  Sampling temperature. This is a query parameter used to select responses.
  top_p
  number or null
  Nucleus sampling parameter. This is a query parameter used to select responses.
  users
  array or null
  List of user identifiers. This is a query parameter used to select responses.
  items
  string
  allow_parallel_tool_calls
  boolean or null
  Whether to allow parallel tool calls. This is a query parameter used to select responses.
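The EvalResponsesSource filters listed above are all nullable, so one plausible way to build the object is to include only the filters that are actually set. The sketch below does that in Python; `responses_source` is a hypothetical helper, the timestamp is made up, and dropping unset filters (rather than sending explicit nulls) is an assumption of this sketch.

```python
KNOWN_FILTERS = {
    "metadata", "model", "instructions_search", "created_after",
    "created_before", "has_tool_calls", "reasoning_effort",
    "temperature", "top_p", "users", "allow_parallel_tool_calls",
}


def responses_source(**filters):
    """Build an EvalResponsesSource, keeping only the filters that were set."""
    unknown = set(filters) - KNOWN_FILTERS
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    effort = filters.get("reasoning_effort")
    if effort is not None and effort not in {"low", "medium", "high"}:
        raise ValueError(f"unsupported reasoning_effort: {effort!r}")
    source = {"type": "responses"}
    # Skip filters left as None so the payload only carries real constraints.
    source.update({key: value for key, value in filters.items() if value is not None})
    return source


src = responses_source(
    model="gpt-4o-mini",
    created_after=1743000000,  # hypothetical Unix timestamp
    has_tool_calls=False,
    users=None,
)
print(src)
```

Note that `has_tool_calls=False` is kept because it is a real constraint, while `users=None` is dropped.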

Response

The newly created EvalRun object.

Example request

curl https://api.openai.com/v1/evals/eval_67e579652b548190aaa83ada4b125f47/runs \
  -X POST \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name":"gpt-4o-mini","data_source":{"type":"completions","input_messages":{"type":"template","template":[{"role":"developer","content":"Categorize a given news headline into one of the following topics: Technology, Markets, World, Business, or Sports.\n\n# Steps\n\n1. Analyze the content of the news headline to understand its primary focus.\n2. Extract the subject matter, identifying any key indicators or keywords.\n3. Use the identified indicators to determine the most suitable category out of the five options: Technology, Markets, World, Business, or Sports.\n4. Ensure only one category is selected per headline.\n\n# Output Format\n\nRespond with the chosen category as a single word. For instance: \"Technology\", \"Markets\", \"World\", \"Business\", or \"Sports\".\n\n# Examples\n\n**Input**: \"Apple Unveils New iPhone Model, Featuring Advanced AI Features\"  \n**Output**: \"Technology\"\n\n**Input**: \"Global Stocks Mixed as Investors Await Central Bank Decisions\"  \n**Output**: \"Markets\"\n\n**Input**: \"War in Ukraine: Latest Updates on Negotiation Status\"  \n**Output**: \"World\"\n\n**Input**: \"Microsoft in Talks to Acquire Gaming Company for $2 Billion\"  \n**Output**: \"Business\"\n\n**Input**: \"Manchester United Secures Win in Premier League Football Match\"  \n**Output**: \"Sports\" \n\n# Notes\n\n- If the headline appears to fit into more than one category, choose the most dominant theme.\n- Keywords or phrases such as \"stocks\", \"company acquisition\", \"match\", or technological brands can be good indicators for classification.\n"},{"role":"user","content":"{{item.input}}"}]},"sampling_params":{"temperature":1,"max_completion_tokens":2048,"top_p":1,"seed":42},"model":"gpt-4o-mini","source":{"type":"file_content","content":[{"item":{"input":"Tech Company Launches Advanced Artificial Intelligence Platform","ground_truth":"Technology"}}]}}}'
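The same request can be prepared from Python. The sketch below uses only the standard library and stops short of sending anything; `build_create_run_request` is a hypothetical helper name, and the body is trimmed (the long developer prompt and the sampling parameters are omitted) to keep the example short.

```python
import json
import os
import urllib.request

API_BASE = "https://api.openai.com/v1"


def build_create_run_request(eval_id, body, api_key):
    """Prepare (but do not send) the POST request that creates an eval run."""
    return urllib.request.Request(
        url=f"{API_BASE}/evals/{eval_id}/runs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


body = {
    "name": "gpt-4o-mini",
    "data_source": {
        "type": "completions",
        "model": "gpt-4o-mini",
        "input_messages": {
            "type": "template",
            "template": [{"role": "user", "content": "{{item.input}}"}],
        },
        "source": {
            "type": "file_content",
            "content": [
                {
                    "item": {
                        "input": "Tech Company Launches Advanced Artificial Intelligence Platform",
                        "ground_truth": "Technology",
                    }
                }
            ],
        },
    },
}

request = build_create_run_request(
    "eval_67e579652b548190aaa83ada4b125f47",
    body,
    os.environ.get("OPENAI_API_KEY", ""),
)
print(request.get_method(), request.full_url)

# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(request) as response:
#     run = json.load(response)
```

Separating request construction from sending makes the payload easy to inspect or log before any grading is started.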

Example response

{
  "object": "eval.run",
  "id": "evalrun_67e57965b480819094274e3a32235e4c",
  "eval_id": "eval_67e579652b548190aaa83ada4b125f47",
  "report_url": "https://platform.openai.com/evaluations/eval_67e579652b548190aaa83ada4b125f47&run_id=evalrun_67e57965b480819094274e3a32235e4c",
  "status": "queued",
  "model": "gpt-4o-mini",
  "name": "gpt-4o-mini",
  "created_at": 1743092069,
  "result_counts": {
    "total": 0,
    "errored": 0,
    "failed": 0,
    "passed": 0
  },
  "per_model_usage": null,
  "per_testing_criteria_results": null,
  "data_source": {
    "type": "completions",
    "source": {
      "type": "file_content",
      "content": [
        {
          "item": {
            "input": "Tech Company Launches Advanced Artificial Intelligence Platform",
            "ground_truth": "Technology"
          }
        }
      ]
    },
    "input_messages": {
      "type": "template",
      "template": [
        {
          "type": "message",
          "role": "developer",
          "content": {
            "type": "input_text",
            "text": "Categorize a given news headline into one of the following topics: Technology, Markets, World, Business, or Sports.\n\n# Steps\n\n1. Analyze the content of the news headline to understand its primary focus.\n2. Extract the subject matter, identifying any key indicators or keywords.\n3. Use the identified indicators to determine the most suitable category out of the five options: Technology, Markets, World, Business, or Sports.\n4. Ensure only one category is selected per headline.\n\n# Output Format\n\nRespond with the chosen category as a single word. For instance: \"Technology\", \"Markets\", \"World\", \"Business\", or \"Sports\".\n\n# Examples\n\n**Input**: \"Apple Unveils New iPhone Model, Featuring Advanced AI Features\"  \n**Output**: \"Technology\"\n\n**Input**: \"Global Stocks Mixed as Investors Await Central Bank Decisions\"  \n**Output**: \"Markets\"\n\n**Input**: \"War in Ukraine: Latest Updates on Negotiation Status\"  \n**Output**: \"World\"\n\n**Input**: \"Microsoft in Talks to Acquire Gaming Company for $2 Billion\"  \n**Output**: \"Business\"\n\n**Input**: \"Manchester United Secures Win in Premier League Football Match\"  \n**Output**: \"Sports\" \n\n# Notes\n\n- If the headline appears to fit into more than one category, choose the most dominant theme.\n- Keywords or phrases such as \"stocks\", \"company acquisition\", \"match\", or technological brands can be good indicators for classification.\n"
          }
        },
        {
          "type": "message",
          "role": "user",
          "content": {
            "type": "input_text",
            "text": "{{item.input}}"
          }
        }
      ]
    },
    "model": "gpt-4o-mini",
    "sampling_params": {
      "seed": 42,
      "temperature": 1.0,
      "top_p": 1.0,
      "max_completion_tokens": 2048
    }
  },
  "error": null,
  "metadata": {}
}
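A freshly created run comes back with status `queued` and all result counts at zero, so client code that reports progress should tolerate an empty total. The following Python sketch shows one way to summarize an `eval.run` object; `summarize_run` is a hypothetical helper, and it reads only fields that appear in the example response.

```python
def summarize_run(run):
    """Return a one-line status summary for an eval.run object."""
    counts = run.get("result_counts") or {}
    total = counts.get("total", 0)
    passed = counts.get("passed", 0)
    # A queued run reports zero totals, so avoid dividing by zero.
    rate_text = f"{passed / total:.0%}" if total else "n/a"
    return f"{run['id']} [{run['status']}] passed {passed}/{total} ({rate_text})"


run = {
    "id": "evalrun_67e57965b480819094274e3a32235e4c",
    "status": "queued",
    "result_counts": {"total": 0, "errored": 0, "failed": 0, "passed": 0},
}
print(summarize_run(run))
# prints: evalrun_67e57965b480819094274e3a32235e4c [queued] passed 0/0 (n/a)
```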
