Create eval run

POST https://api.openai.com/v1/evals/{eval_id}/runs

Create a new evaluation run. Calling this endpoint starts grading.

Path parameters

  • eval_id
    string
    Required
    The ID of the evaluation to create a run for.
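
As a minimal sketch, the request URL can be assembled from the eval ID as shown here. The helper name `eval_runs_url` is hypothetical; only the base URL and path come from the endpoint above.

```python
from urllib.parse import quote

BASE_URL = "https://api.openai.com/v1"


def eval_runs_url(eval_id: str) -> str:
    """Return the URL for creating a run under the given eval (hypothetical helper)."""
    if not eval_id:
        raise ValueError("eval_id is required")
    # Percent-encode the ID so unexpected characters cannot alter the path.
    return f"{BASE_URL}/evals/{quote(eval_id, safe='')}/runs"


print(eval_runs_url("eval_67e579652b548190aaa83ada4b125f47"))
```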

Request body

  • name
    string
    The name of the run.
  • metadata
    object or null
    Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
  • data_source
    object
    Details about the run's data source.
    • JsonlRunDataSource
      object
      Required
      A JsonlRunDataSource object that specifies a JSONL file matching the eval.
      • type
        string
        Required
        Defaults: jsonl

        The type of data source. Always jsonl.

        • jsonl
          string
      • source
        object
        • EvalJsonlFileContentSource
          object
          Required
          • type
            string
            Required
            Defaults: file_content

            The type of jsonl source. Always file_content.

            • file_content
              string
          • content
            array
            Required
            The content of the jsonl file.
            • items
              object
              • item
                object
                Required
              • sample
                object
        • EvalJsonlFileIdSource
          object
          Required
          • type
            string
            Required
            Defaults: file_id

            The type of jsonl source. Always file_id.

            • file_id
              string
          • id
            string
            Required
            The identifier of the file.
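
The two JSONL source variants described above can be sketched as plain dictionaries. This is an illustration under assumptions: the helper names `jsonl_from_rows` and `jsonl_from_file` are hypothetical, and `file-abc123` is a placeholder rather than a real file ID.

```python
import json


def jsonl_from_rows(rows: list[dict]) -> dict:
    """Build a `jsonl` data source with inline rows (`file_content`)."""
    content = []
    for row in rows:
        if "item" not in row:
            raise ValueError("each row needs an 'item' object")
        entry = {"item": row["item"]}
        if "sample" in row:  # `sample` is not marked Required, so it is optional here
            entry["sample"] = row["sample"]
        content.append(entry)
    return {"type": "jsonl", "source": {"type": "file_content", "content": content}}


def jsonl_from_file(file_id: str) -> dict:
    """Build a `jsonl` data source that points at an uploaded file (`file_id`)."""
    return {"type": "jsonl", "source": {"type": "file_id", "id": file_id}}


inline = jsonl_from_rows(
    [{"item": {"input": "Tech Company Launches Advanced Artificial Intelligence Platform",
               "ground_truth": "Technology"}}]
)
by_id = jsonl_from_file("file-abc123")  # placeholder ID, not a real file
print(json.dumps(inline, indent=2))
```

Inline rows suit small smoke tests; a file ID is probably the better fit once the dataset grows beyond a handful of rows.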
    • CompletionsRunDataSource
      object
      Required
      A CompletionsRunDataSource object describing a model sampling configuration.
      • type
        string
        Required
        Defaults: completions

        The type of run data source. Always completions.

        • completions
          string
      • input_messages
        object
        • TemplateInputMessages
          object
          • type
            string
            Required

            The type of input messages. Always template.

            • template
              string
          • template
            array
            Required
            A list of chat messages forming the prompt or context. May include variable references to the "item" namespace, e.g. {{item.name}}.
            • Input message
              object

              A message input to the model with a role indicating instruction following hierarchy. Instructions given with the developer or system role take precedence over instructions given with the user role. Messages with the assistant role are presumed to have been generated by the model in previous interactions.

              • role
                string
                Required

                The role of the message input. One of user, assistant, system, or developer.

                • user
                  string
                • assistant
                  string
                • system
                  string
                • developer
                  string
              • content
                string or array
                Text, image, or audio input to the model, used to generate a response. Can also contain previous assistant responses.
                • Text input
                  string
                  Required
                  A text input to the model.
                • Input item content list
                  array
                  Required
                  A list of one or many input items to the model, containing different content types.
                  • Input text
                    object
                    A text input to the model.
                    • type
                      string
                      Required
                      Defaults: input_text

                      The type of the input item. Always input_text.

                      • input_text
                        string
                    • text
                      string
                      Required
                      The text input to the model.
                  • Input image
                    object

                    An image input to the model. Learn about image inputs.

                    • type
                      string
                      Required
                      Defaults: input_image

                      The type of the input item. Always input_image.

                      • input_image
                        string
                    • image_url
                      string or null
                      • image_url
                        string
                        The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.
                      • image_url
                        null
                    • file_id
                      string or null
                      • file_id
                        string
                        The ID of the file to be sent to the model.
                      • file_id
                        null
                    • detail
                      string
                      Required

                      The detail level of the image to be sent to the model. One of high, low, or auto. Defaults to auto.

                      • low
                        string
                      • high
                        string
                      • auto
                        string
                  • Input file
                    object
                    A file input to the model.
                    • type
                      string
                      Required
                      Defaults: input_file

                      The type of the input item. Always input_file.

                      • input_file
                        string
                    • file_id
                      string or null
                      • file_id
                        string
                        The ID of the file to be sent to the model.
                      • file_id
                        null
                    • filename
                      string
                      The name of the file to be sent to the model.
                    • file_data
                      string
                      The content of the file to be sent to the model.
              • type
                string

                The type of the message input. Always message.

                • message
                  string
            • Eval message object
              object

              A message input to the model with a role indicating instruction following hierarchy. Instructions given with the developer or system role take precedence over instructions given with the user role. Messages with the assistant role are presumed to have been generated by the model in previous interactions.

              • role
                string
                Required

                The role of the message input. One of user, assistant, system, or developer.

                • user
                  string
                • assistant
                  string
                • system
                  string
                • developer
                  string
              • content
                string or object
                Text inputs to the model - can contain template strings.
                • Text input
                  string
                  Required
                  A text input to the model.
                • Input text
                  object
                  Required
                  A text input to the model.
                  • type
                    string
                    Required
                    Defaults: input_text

                    The type of the input item. Always input_text.

                    • input_text
                      string
                  • text
                    string
                    Required
                    The text input to the model.
                • Output text
                  object
                  Required
                  A text output from the model.
                  • type
                    string
                    Required

                    The type of the output text. Always output_text.

                    • output_text
                      string
                  • text
                    string
                    Required
                    The text output from the model.
              • type
                string

                The type of the message input. Always message.

                • message
                  string
        • ItemReferenceInputMessages
          object
          • type
            string
            Required

            The type of input messages. Always item_reference.

            • item_reference
              string
          • item_reference
            string
            Required
            A reference to a variable in the "item" namespace, e.g. "item.name".
      • sampling_params
        object
        • temperature
          number
          Defaults: 1
          A higher temperature increases randomness in the outputs.
        • max_completion_tokens
          integer
          The maximum number of tokens in the generated output.
        • top_p
          number
          Defaults: 1
          An alternative to temperature for nucleus sampling; 1.0 includes all tokens.
        • seed
          integer
          Defaults: 42
          A seed value to initialize the randomness during sampling.
      • model
        string
        The name of the model to use for generating completions (e.g. "o3-mini").
      • source
        object
        • EvalJsonlFileContentSource
          object
          Required
          • type
            string
            Required
            Defaults: file_content

            The type of jsonl source. Always file_content.

            • file_content
              string
          • content
            array
            Required
            The content of the jsonl file.
            • items
              object
              • item
                object
                Required
              • sample
                object
        • EvalJsonlFileIdSource
          object
          Required
          • type
            string
            Required
            Defaults: file_id

            The type of jsonl source. Always file_id.

            • file_id
              string
          • id
            string
            Required
            The identifier of the file.
        • StoredCompletionsRunDataSource
          object
          Required
          A StoredCompletionsRunDataSource configuration describing a set of filters
          • type
            string
            Required
            Defaults: stored_completions

            The type of source. Always stored_completions.

            • stored_completions
              string
          • metadata
            object or null
            Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
          • model
            string or null
            An optional model to filter by (e.g., 'gpt-4o').
          • created_after
            integer or null
            An optional Unix timestamp to filter items created after this time.
          • created_before
            integer or null
            An optional Unix timestamp to filter items created before this time.
          • limit
            integer or null
            An optional maximum number of items to return.
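
A completions data source combines a model, templated input messages, sampling parameters, and a source. The sketch below assembles one with a `stored_completions` source. All helper names are hypothetical, and the `{{item.<field>}}` scan is only a local convenience check; the server's actual template handling may differ.

```python
import json
import re


def completions_data_source(model, messages, source, **sampling_params):
    """Assemble a `completions` data source from its documented parts."""
    data_source = {
        "type": "completions",
        "model": model,
        "input_messages": {"type": "template", "template": messages},
        "source": source,
    }
    if sampling_params:
        data_source["sampling_params"] = sampling_params
    return data_source


def stored_completions_source(**filters):
    """Build a `stored_completions` source, omitting filters left as None."""
    source = {"type": "stored_completions"}
    source.update({k: v for k, v in filters.items() if v is not None})
    return source


def referenced_item_fields(messages):
    """List the `item` fields referenced as {{item.<field>}} in string contents."""
    pattern = re.compile(r"\{\{\s*item\.([A-Za-z_][A-Za-z0-9_.]*)\s*\}\}")
    fields = []
    for message in messages:
        if isinstance(message["content"], str):
            fields.extend(pattern.findall(message["content"]))
    return fields


messages = [
    {"role": "developer", "content": "Categorize the headline."},
    {"role": "user", "content": "{{item.input}}"},
]
data_source = completions_data_source(
    "gpt-4o-mini",
    messages,
    stored_completions_source(model="gpt-4o", limit=100, created_after=None),
    temperature=1,
    top_p=1,
    seed=42,
)
print(referenced_item_fields(messages))
print(json.dumps(data_source, indent=2))
```

Listing the referenced fields before submitting a run makes it easier to spot a template that names a field the dataset does not contain.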
    • ResponsesRunDataSource
      object
      Required
      A ResponsesRunDataSource object describing a model sampling configuration.
      • type
        string
        Required
        Defaults: responses

        The type of run data source. Always responses.

        • responses
          string
      • input_messages
        object
        • input_messages
          object
          • type
            string
            Required

            The type of input messages. Always template.

            • template
              string
          • template
            array
            Required
            A list of chat messages forming the prompt or context. May include variable references to the "item" namespace, e.g. {{item.name}}.
            • ChatMessage
              object
              • role
                string
                Required
                The role of the message (e.g. "system", "assistant", "user").
              • content
                string
                Required
                The content of the message.
            • Eval message object
              object

              A message input to the model with a role indicating instruction following hierarchy. Instructions given with the developer or system role take precedence over instructions given with the user role. Messages with the assistant role are presumed to have been generated by the model in previous interactions.

              • role
                string
                Required

                The role of the message input. One of user, assistant, system, or developer.

                • user
                  string
                • assistant
                  string
                • system
                  string
                • developer
                  string
              • content
                string or object
                Text inputs to the model - can contain template strings.
                • Text input
                  string
                  Required
                  A text input to the model.
                • Input text
                  object
                  Required
                  A text input to the model.
                  • type
                    string
                    Required
                    Defaults: input_text

                    The type of the input item. Always input_text.

                    • input_text
                      string
                  • text
                    string
                    Required
                    The text input to the model.
                • Output text
                  object
                  Required
                  A text output from the model.
                  • type
                    string
                    Required

                    The type of the output text. Always output_text.

                    • output_text
                      string
                  • text
                    string
                    Required
                    The text output from the model.
              • type
                string

                The type of the message input. Always message.

                • message
                  string
        • input_messages
          object
          • type
            string
            Required

            The type of input messages. Always item_reference.

            • item_reference
              string
          • item_reference
            string
            Required
            A reference to a variable in the "item" namespace, e.g. "item.name".
      • sampling_params
        object
        • temperature
          number
          Defaults: 1
          A higher temperature increases randomness in the outputs.
        • max_completion_tokens
          integer
          The maximum number of tokens in the generated output.
        • top_p
          number
          Defaults: 1
          An alternative to temperature for nucleus sampling; 1.0 includes all tokens.
        • seed
          integer
          Defaults: 42
          A seed value to initialize the randomness during sampling.
      • model
        string
        The name of the model to use for generating completions (e.g. "o3-mini").
      • source
        object
        • EvalJsonlFileContentSource
          object
          Required
          • type
            string
            Required
            Defaults: file_content

            The type of jsonl source. Always file_content.

            • file_content
              string
          • content
            array
            Required
            The content of the jsonl file.
            • items
              object
              • item
                object
                Required
              • sample
                object
        • EvalJsonlFileIdSource
          object
          Required
          • type
            string
            Required
            Defaults: file_id

            The type of jsonl source. Always file_id.

            • file_id
              string
          • id
            string
            Required
            The identifier of the file.
        • EvalResponsesSource
          object
          Required
          An EvalResponsesSource object describing a run data source configuration.
          • type
            string
            Required

            The type of run data source. Always responses.

            • responses
              string
          • metadata
            object or null
            Metadata filter for the responses. This is a query parameter used to select responses.
          • model
            string or null
            The name of the model to find responses for. This is a query parameter used to select responses.
          • instructions_search
            string or null
            Optional search string for instructions. This is a query parameter used to select responses.
          • created_after
            integer or null
            Only include items created after this timestamp (inclusive). This is a query parameter used to select responses.
          • created_before
            integer or null
            Only include items created before this timestamp (inclusive). This is a query parameter used to select responses.
          • has_tool_calls
            boolean or null
            Whether the response has tool calls. This is a query parameter used to select responses.
          • reasoning_effort
            string or null
            Defaults: medium

            o-series models only

            Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.

            • low
              string
            • medium
              string
            • high
              string
          • temperature
            number or null
            Sampling temperature. This is a query parameter used to select responses.
          • top_p
            number or null
            Nucleus sampling parameter. This is a query parameter used to select responses.
          • users
            array or null
            List of user identifiers. This is a query parameter used to select responses.
            • items
              string
          • allow_parallel_tool_calls
            boolean or null
            Whether to allow parallel tool calls. This is a query parameter used to select responses.
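
Because every filter on the `responses` source is nullable, one reasonable approach is to send only the filters that are actually set. The sketch below does that. The helper names are hypothetical, and it assumes the timestamp filters take Unix seconds, as the `created_at` value in the example response suggests.

```python
import json
from datetime import datetime, timezone


def responses_source(**filters):
    """Build a `responses` source; filters left as None are omitted."""
    effort = filters.get("reasoning_effort")
    if effort is not None and effort not in {"low", "medium", "high"}:
        raise ValueError("reasoning_effort must be low, medium, or high")
    source = {"type": "responses"}
    source.update({k: v for k, v in filters.items() if v is not None})
    return source


def unix_seconds(year, month, day):
    """Convert a UTC calendar date to an integer Unix timestamp (assumed unit)."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


source = responses_source(
    model="gpt-4o-mini",
    created_after=unix_seconds(2025, 3, 1),
    has_tool_calls=False,
    users=None,  # left unset, so it is dropped from the payload
)
print(json.dumps(source, indent=2))
```

Note that `has_tool_calls=False` is kept in the payload: only `None` is treated as "no filter".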

Response

The newly created EvalRun object.

Example request
curl https://api.openai.com/v1/evals/eval_67e579652b548190aaa83ada4b125f47/runs \
-X POST \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{"name":"gpt-4o-mini","data_source":{"type":"completions","input_messages":{"type":"template","template":[{"role":"developer","content":"Categorize a given news headline into one of the following topics: Technology, Markets, World, Business, or Sports.\n\n# Steps\n\n1. Analyze the content of the news headline to understand its primary focus.\n2. Extract the subject matter, identifying any key indicators or keywords.\n3. Use the identified indicators to determine the most suitable category out of the five options: Technology, Markets, World, Business, or Sports.\n4. Ensure only one category is selected per headline.\n\n# Output Format\n\nRespond with the chosen category as a single word. For instance: \"Technology\", \"Markets\", \"World\", \"Business\", or \"Sports\".\n\n# Examples\n\n**Input**: \"Apple Unveils New iPhone Model, Featuring Advanced AI Features\" \n**Output**: \"Technology\"\n\n**Input**: \"Global Stocks Mixed as Investors Await Central Bank Decisions\" \n**Output**: \"Markets\"\n\n**Input**: \"War in Ukraine: Latest Updates on Negotiation Status\" \n**Output**: \"World\"\n\n**Input**: \"Microsoft in Talks to Acquire Gaming Company for $2 Billion\" \n**Output**: \"Business\"\n\n**Input**: \"Manchester United Secures Win in Premier League Football Match\" \n**Output**: \"Sports\" \n\n# Notes\n\n- If the headline appears to fit into more than one category, choose the most dominant theme.\n- Keywords or phrases such as \"stocks\", \"company acquisition\", \"match\", or technological brands can be good indicators for classification.\n"} , {"role":"user","content":"{{item.input}}"}]},"sampling_params":{"temperature":1,"max_completions_tokens":2048,"top_p":1,"seed":42},"model":"gpt-4o-mini","source":{"type":"file_content","content":[{"item":{"input":"Tech Company Launches Advanced Artificial Intelligence Platform","ground_truth":"Technology"}}]}}'
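
For comparison, the same kind of request can be prepared with Python's standard library. This is a sketch under assumptions, not an official client: the helper names are hypothetical, `file-abc123` and `sk-placeholder` are placeholders, and the metadata check reads the documented limits as "at most 16 pairs". The request is built but not sent.

```python
import json
import os
import urllib.request


def validate_metadata(metadata):
    """Check the documented metadata limits: 16 pairs, 64-char keys, 512-char values."""
    if len(metadata) > 16:
        raise ValueError("metadata allows at most 16 key-value pairs")
    for key, value in metadata.items():
        if not isinstance(key, str) or len(key) > 64:
            raise ValueError(f"metadata key too long or not a string: {key!r}")
        if not isinstance(value, str) or len(value) > 512:
            raise ValueError(f"metadata value too long or not a string for {key!r}")


def build_create_run_request(eval_id, name, data_source, metadata=None, api_key=None):
    """Return an unsent urllib Request for POST /v1/evals/{eval_id}/runs."""
    body = {"name": name, "data_source": data_source}
    if metadata is not None:
        validate_metadata(metadata)
        body["metadata"] = metadata
    api_key = api_key or os.environ.get("OPENAI_API_KEY", "")
    return urllib.request.Request(
        f"https://api.openai.com/v1/evals/{eval_id}/runs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_run_request(
    "eval_67e579652b548190aaa83ada4b125f47",
    "gpt-4o-mini",
    {"type": "jsonl", "source": {"type": "file_id", "id": "file-abc123"}},  # placeholder file ID
    metadata={"team": "news"},
    api_key="sk-placeholder",  # placeholder key for illustration
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then json.load() the response body.
```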
Example response
{
  "object": "eval.run",
  "id": "evalrun_67e57965b480819094274e3a32235e4c",
  "eval_id": "eval_67e579652b548190aaa83ada4b125f47",
  "report_url": "https://platform.openai.com/evaluations/eval_67e579652b548190aaa83ada4b125f47&run_id=evalrun_67e57965b480819094274e3a32235e4c",
  "status": "queued",
  "model": "gpt-4o-mini",
  "name": "gpt-4o-mini",
  "created_at": 1743092069,
  "result_counts": {
    "total": 0,
    "errored": 0,
    "failed": 0,
    "passed": 0
  },
  "per_model_usage": null,
  "per_testing_criteria_results": null,
  "data_source": {
    "type": "completions",
    "source": {
      "type": "file_content",
      "content": [
        {
          "item": {
            "input": "Tech Company Launches Advanced Artificial Intelligence Platform",
            "ground_truth": "Technology"
          }
        }
      ]
    },
    "input_messages": {
      "type": "template",
      "template": [
        {
          "type": "message",
          "role": "developer",
          "content": {
            "type": "input_text",
            "text": "Categorize a given news headline into one of the following topics: Technology, Markets, World, Business, or Sports.\n\n# Steps\n\n1. Analyze the content of the news headline to understand its primary focus.\n2. Extract the subject matter, identifying any key indicators or keywords.\n3. Use the identified indicators to determine the most suitable category out of the five options: Technology, Markets, World, Business, or Sports.\n4. Ensure only one category is selected per headline.\n\n# Output Format\n\nRespond with the chosen category as a single word. For instance: \"Technology\", \"Markets\", \"World\", \"Business\", or \"Sports\".\n\n# Examples\n\n**Input**: \"Apple Unveils New iPhone Model, Featuring Advanced AI Features\" \n**Output**: \"Technology\"\n\n**Input**: \"Global Stocks Mixed as Investors Await Central Bank Decisions\" \n**Output**: \"Markets\"\n\n**Input**: \"War in Ukraine: Latest Updates on Negotiation Status\" \n**Output**: \"World\"\n\n**Input**: \"Microsoft in Talks to Acquire Gaming Company for $2 Billion\" \n**Output**: \"Business\"\n\n**Input**: \"Manchester United Secures Win in Premier League Football Match\" \n**Output**: \"Sports\" \n\n# Notes\n\n- If the headline appears to fit into more than one category, choose the most dominant theme.\n- Keywords or phrases such as \"stocks\", \"company acquisition\", \"match\", or technological brands can be good indicators for classification.\n"
          }
        },
        {
          "type": "message",
          "role": "user",
          "content": {
            "type": "input_text",
            "text": "{{item.input}}"
          }
        }
      ]
    },
    "model": "gpt-4o-mini",
    "sampling_params": {
      "seed": 42,
      "temperature": 1.0,
      "top_p": 1.0,
      "max_completions_tokens": 2048
    }
  },
  "error": null,
  "metadata": {}
}
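
A freshly created run is returned with status `queued` and zeroed `result_counts`, so code that reads the response should tolerate an empty total. The sketch below summarizes an EvalRun payload; the helper name `summarize_run` is hypothetical and the JSON string is a trimmed copy of the example response.

```python
import json


def summarize_run(run: dict) -> dict:
    """Extract the status and a pass rate from an EvalRun payload."""
    counts = run.get("result_counts") or {}
    total = counts.get("total", 0)
    passed = counts.get("passed", 0)
    return {
        "id": run["id"],
        "status": run["status"],
        "total": total,
        # A freshly queued run has total == 0, so avoid dividing by zero.
        "pass_rate": passed / total if total else None,
    }


# Trimmed copy of the example response.
raw = (
    '{"object": "eval.run", "id": "evalrun_67e57965b480819094274e3a32235e4c", '
    '"status": "queued", '
    '"result_counts": {"total": 0, "errored": 0, "failed": 0, "passed": 0}}'
)
print(summarize_run(json.loads(raw)))
```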