Get Conversation By Id
Retrieve a Conversation by ID, including all of its Requests.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
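As a sketch, a request to this endpoint with the Authorization header might look like the following; the base URL and path here are assumptions based on this page's title, and the conversation ID is a placeholder.

```python
# Hedged sketch: base URL and endpoint path are assumed from this page's
# title, not confirmed by the reference itself.
import requests

PULZE_API_BASE = "https://api.pulze.ai"           # assumed base URL
conversation_id = "<conversation-id>"             # placeholder

resp = requests.get(
    f"{PULZE_API_BASE}/v1/conversations/{conversation_id}",  # assumed path
    headers={"Authorization": "Bearer <token>"},  # your auth token
)
resp.raise_for_status()
conversation = resp.json()  # the Conversation, including all of its Requests
```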
Body
Response
The ID of the app that performed the request
The Auth0 ID of the user that performed the request
The response object
The fully qualified model name used by PulzeEngine
The type of response object
Options: text_completion, chat.completion
The Response contains a list of choices. The role is in *.message.role and the content in *.message.content (or, for text completions, in the *.text attribute).
Options: user, assistant, system
If the tool_choice parameter was passed, and the model supports it, this will return a list of the different calls/functions used.
The reason the model stopped generating tokens. Possible values: "stop", "length", "content_filter", "function_call", "max_tokens", ...
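Putting the choice fields above together, a caller might extract the answer text like this; a sketch based only on the field paths described on this page, not a client library.

```python
# Sketch: walks the field paths described above (*.message.role,
# *.message.content, *.text) -- an illustration, not a confirmed schema.
def extract_answer(response: dict) -> str:
    choice = response["choices"][0]
    if choice.get("finish_reason") == "length":
        pass  # the model ran out of tokens; the answer may be truncated
    if response.get("object") == "chat.completion":
        return choice["message"]["content"]  # role lives in choice["message"]["role"]
    return choice["text"]                    # text_completion responses
```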
Creation timestamp -- in milliseconds (!)
Metadata of the response
The ID of the app this request belongs to
The model used in the request
The name of the model. Can belong to many providers
The fully qualified (namespaced) model name
The provider for the model.
The owner of the model. Sometimes, for a provider/model combination, many instances exist, trained on different data
Extra model settings inferred from namespace
Generated artifacts
Search results
The time it took for the Provider to return a response
Custom labels (metadata) sent along in the request
If an error occurs, it will be stored here
The score for the currently used LLM
Temperature used for the request
Maximum number of tokens that can be used in the request+response. Leave empty to make it automatic, or set to -1 to use the maximum context size (model-dependent)
x > -1
Status code of the response
The number of retries needed to get the answer. null or 0 means no retries were required
Extra data
Show a warning -- deprecation messages, etc.
This ID gets generated by the database when we save the request
The timestamp of the request, in milliseconds
ID of the request
The rating given to this request. It can be good (True), bad (False) or none (None == null)
An optional text that accompanies the feedback's rating
Number of tokens the request used
Number of tokens the response used
Number of tokens of (request + response)
Cost (in $) of the prompt
Cost (in $) of the response
Cost (in $) of the (request + response)
Cost (in $) saved on prompt costs, in comparison to the benchmark model
Cost (in $) saved on completion costs, in comparison to the benchmark model
Cost (in $) saved in total, in comparison to the benchmark model
When a request requires multiple intermediate calls, those are stored as 'no costs incurred' -- that way we can record the costs, but don't charge the user
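For orientation, the token and cost fields above add up as you would expect; the numbers below are made up purely for illustration.

```python
# Illustration only -- made-up numbers showing how the fields relate.
prompt_tokens, completion_tokens = 120, 350
total_tokens = prompt_tokens + completion_tokens  # tokens of (request + response)

prompt_cost, completion_cost = 0.00012, 0.00070   # in $
total_cost = prompt_cost + completion_cost        # cost of (request + response)

benchmark_cost = 0.00390                          # what the benchmark model would have cost
total_savings = benchmark_cost - total_cost       # total savings vs. the benchmark
```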
The name of the provider's model which was used to answer the request
The payload sent with the request
Specify the model you'd like Pulze to use (optional). Can be the full model name, or a subset for multi-matching. Defaults to our dynamic routing, i.e. the best model for this request. See https://docs.pulze.ai/overview/models
The maximum number of tokens that the response can contain.
Optionally specify the temperature for this request only. Leave empty to allow Pulze to guess it for you.
0 < x < 1
A value between 0.0 and 1.0 that controls the probability of the model generating a particular token. See https://octo.ai/docs/text-gen-solution/rest-api#input-parameters
A unique name for the function
A JSON schema object with the parameter definitions
A description of the function. Might help the LLM to figure out the structure and parameters
function
Options: none, auto
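A sketch of one function definition using the three per-function fields above (unique name, description, JSON-schema parameters); the surrounding payload key ("functions" here) is an assumption, since this page only documents the per-function fields.

```python
# Sketch: only the three per-function fields come from this page; the
# wrapping "functions" key is an assumption.
payload = {
    "messages": [{"role": "user", "content": "Weather in Zurich?"}],
    "functions": [{
        "name": "get_current_weather",                # a unique name for the function
        "description": "Get the weather for a city",  # helps the LLM choose it
        "parameters": {                               # JSON schema with the parameter definitions
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    "tool_choice": "auto",  # or "none", per the options above
}
```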
How many completions to generate for each prompt. @default 1
x > 1
Specify if you want the response to be streamed or to be returned as a standard HTTP request. Currently we only support streaming for OpenAI-compatible models.
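Since streaming is only supported for OpenAI-compatible models, a stream=True call through an OpenAI-style client might look like the sketch below; the base_url is an assumption, and the model name and token are placeholders.

```python
# Sketch: assumes Pulze exposes an OpenAI-compatible endpoint at this
# base_url; model name and token are placeholders.
from openai import OpenAI

client = OpenAI(api_key="<token>", base_url="https://api.pulze.ai/v1")  # assumed URL
stream = client.chat.completions.create(
    model="pulze",                                     # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,                                       # stream instead of one HTTP response
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```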
COMING SOON https://platform.openai.com/docs/api-reference/completions/create#completions/create-logprobs Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens.
0 < x < 5
Stop responding when this sequence of characters is generated. Leave empty to allow the model to decide.
https://platform.openai.com/docs/api-reference/completions/create#completions/create-presence_penalty Increase the model's likelihood to talk about new topics
-2 < x < 2
https://platform.openai.com/docs/api-reference/completions/create#completions/create-frequency_penalty Increase the model's likelihood to not repeat tokens/words
-2 < x < 2
The number of responses to generate. Out of those, it will return the best n.
x > 1
COMING SOON https://platform.openai.com/docs/api-reference/completions/create#completions/create-logit_bias Modify the likelihood of specified tokens appearing in the completion.
See here for a detailed explanation on how to use: https://help.openai.com/en/articles/5247780-using-logit-bias-to-define-token-probability
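The parameter is not live yet, but the OpenAI format linked above maps token IDs to a bias between -100 and 100; as a sketch:

```python
# Sketch of the OpenAI logit_bias format linked above: token ID -> bias
# in [-100, 100]. Token IDs are tokenizer-specific; these are examples.
payload = {
    "prompt": "Once upon a time",
    "logit_bias": {
        "50256": -100,  # -100 effectively bans this token
        "11": 5,        # a small positive bias encourages it
    },
}
```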
https://platform.openai.com/docs/api-reference/chat/create#chat-create-response_format An object specifying the format that the model must output. Must be one of "text" or "json_object". Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. To help ensure you don't forget, the API will throw an error if the string "JSON" does not appear somewhere in the context.
Options: text, json_object
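A minimal JSON-mode payload following the description above; note the word "JSON" in the system message, which the API requires. The {"type": ...} wrapper follows the OpenAI shape this field links to.

```python
# Sketch: response_format set to json_object, with "JSON" present in the
# context as the description above requires.
payload = {
    "messages": [
        {"role": "system", "content": "Reply with a JSON object only."},
        {"role": "user", "content": "List three primary colors."},
    ],
    "response_format": {"type": "json_object"},  # wrapper shape per the linked OpenAI docs
}
```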
Options: user, assistant, system
If the tool_choice parameter was passed, and the model supports it, this will return a list of the different calls/functions used.
The Assistant ID to use for this request
The Assistant version ID to use for this request
The Conversation ID to use for this request
The Parent Request ID to use for this request
The list of plugins to enable for the request
Images to be analyzed
A list of file urls that should be included. Images, audio, etc.
Custom overrides for this request, sent as HTTP headers
Custom labeling of this request. If None, no labels will be stored.
HTTP Headers, without the custom pulze-* headers
Overrides the weights of the app for this request.
Prioritizes cost when selecting the most optimized models for your use case.
Prioritizes latency and reduces the time delay between submitting a request and receiving the response.
Prioritizes the quality and readability of the generated responses.
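A sketch of overriding the three weights above for one request; the field name and the 0..1 scale are assumptions, and only the cost/latency/quality axes come from this page.

```python
# Sketch: the "weights" key and the 0..1 scale are assumptions; the three
# axes (cost, latency, quality) are the ones documented above.
payload = {
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    "weights": {"cost": 0.6, "latency": 0.3, "quality": 0.1},
}
```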
Overrides the policies of the app for this request. See LLMModelPolicies for more info.
The level of privacy for a given request
0 = (UNSUPPORTED -- public logs)
1 = Log request, response and all of its metadata (Normal mode)
2 = Log neither the request prompt nor the response text. Logs are still visible, and all of the request metadata accessible. Retrievable as a log. (TBD)
3 = Do not log at all. Internally, a minimal representation may be stored for billing: model name, tokens used, which app it belongs to, and timestamp. Not retrievable as a log. (TBD)
Options: 1, 2, 3
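For illustration, selecting level 2 (mask prompt and response, keep metadata) could look like this; the field name is an assumption, only the level semantics come from above.

```python
# Sketch: "privacy_level" is an assumed field name; the level semantics
# (1 = log everything, 2 = mask prompt/response, 3 = don't log) are from above.
payload = {
    "messages": [{"role": "user", "content": "Handle with care."}],
    "privacy_level": 2,
}
```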
The maximum cost allowed for a request. Only works with compounded requests that require multiple LLM calls. If the value is reached, it will exit with an exception.
x > 0.0001
If an LLM call fails, how many other models should Pulze try, chosen by quality descending? It will be a maximum of N+1 models (original + N other models)
0 < x < 5
If an LLM call fails, how many times should Pulze retry the call to the same LLM? There will be a maximum of N+1 calls (original + N retries)
0 < x < 3
Optimize the internal / intermediate LLM requests, for a big gain in speed and cost savings, at the cost of a potential, and very slight, penalty on quality. The final request ("SYNTHESIZE") is always performed using your original settings.
Options: 0, 1
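A combined sketch of the reliability settings above; every field name here is an assumption, and only the semantics and ranges come from this page.

```python
# Sketch: all field names are assumed; semantics/ranges are from above.
payload = {
    "messages": [{"role": "user", "content": "ping"}],
    "max_cost": 0.05,                 # abort with an exception once this spend is reached
    "failover_models": 2,             # on failure, try up to 2 other models (quality descending)
    "retries": 1,                     # on failure, retry the same LLM once (N+1 calls total)
    "optimize_internal_requests": 1,  # 1 = speed/cost-optimize intermediate calls
}
```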
Prompt ID that we will use for requests
Feature flags for this request
Whether to include citations to data sources in the response
Whether to automatically select the appropriate tool to aid in generation
Learn from liked responses
How much is logged? 1: everything, 2: mask request+response (but show log), 3: Not visible, not retrievable, no information stored.
Options: 1, 2, 3
The prompt in text format
The type of request (text completion or chat) the user sends and expects back
Options: completions, chat_completions
The response in text format
The status code of the request to the AI model
True if the request was performed from a sandbox app
When the request was performed
Time it took for the LLM to respond
Reference to the ID of the parent of this log. A log has a parent when it's a subrequest used to retrieve the final answer.
The parent of the Request, if any. Requests which are part of a series of sub-requests (like multiple LLM calls, or RAG) will have the final, resulting Log as parent.
The ID of the app that performed the request
The Auth0 ID of the user that performed the request
The response object
The fully qualified model name used by PulzeEngine
The type of response object
Options: text_completion, chat.completion
The Response contains a list of choices. The role is in *.message.role and the content in *.message.content (or, for text completions, in the *.text attribute).
The reason the model stopped generating tokens. Possible values: "stop", "length", "content_filter", "function_call", "max_tokens", ...
Creation timestamp -- in milliseconds (!)
Metadata of the response
The ID of the app this request belongs to
The model used in the request
Cost (in $) of the request
Price difference -- compared with GPT-4
Generated artifacts
Search results
The time it took for the Provider to return a response
Custom labels (metadata) sent along in the request
If an error occurs, it will be stored here
A ranking of the best models for a given request
The score for the currently used LLM
Temperature used for the request
Maximum number of tokens that can be used in the request+response. Leave empty to make it automatic, or set to -1 to use the maximum context size (model-dependent)
x > -1
Status code of the response
The number of retries needed to get the answer. null or 0 means no retries were required
Extra data
Show a warning -- deprecation messages, etc.
This ID gets generated by the database when we save the request
The timestamp of the request, in milliseconds
ID of the request
The rating given to this request. It can be good (True), bad (False) or none (None == null)
An optional text that accompanies the feedback's rating
Number of tokens the request used
Number of tokens the response used
Number of tokens of (request + response)
Cost (in $) of the prompt
Cost (in $) of the response
Cost (in $) of the (request + response)
Cost (in $) saved on prompt costs, in comparison to the benchmark model
Cost (in $) saved on completion costs, in comparison to the benchmark model
Cost (in $) saved in total, in comparison to the benchmark model
When a request requires multiple intermediate calls, those are stored as 'no costs incurred' -- that way we can record the costs, but don't charge the user
The name of the provider's model which was used to answer the request
The payload sent with the request
Specify the model you'd like Pulze to use (optional). Can be the full model name, or a subset for multi-matching. Defaults to our dynamic routing, i.e. the best model for this request. See https://docs.pulze.ai/overview/models
The maximum number of tokens that the response can contain.
Optionally specify the temperature for this request only. Leave empty to allow Pulze to guess it for you.
0 < x < 1
A value between 0.0 and 1.0 that controls the probability of the model generating a particular token. See https://octo.ai/docs/text-gen-solution/rest-api#input-parameters
Options: none, auto
How many completions to generate for each prompt. @default 1
x > 1
Specify if you want the response to be streamed or to be returned as a standard HTTP request. Currently we only support streaming for OpenAI-compatible models.
COMING SOON https://platform.openai.com/docs/api-reference/completions/create#completions/create-logprobs Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens.
0 < x < 5
Stop responding when this sequence of characters is generated. Leave empty to allow the model to decide.
https://platform.openai.com/docs/api-reference/completions/create#completions/create-presence_penalty Increase the model's likelihood to talk about new topics
-2 < x < 2
https://platform.openai.com/docs/api-reference/completions/create#completions/create-frequency_penalty Increase the model's likelihood to not repeat tokens/words
-2 < x < 2
The number of responses to generate. Out of those, it will return the best n.
x > 1
COMING SOON https://platform.openai.com/docs/api-reference/completions/create#completions/create-logit_bias Modify the likelihood of specified tokens appearing in the completion.
See here for a detailed explanation on how to use: https://help.openai.com/en/articles/5247780-using-logit-bias-to-define-token-probability
https://platform.openai.com/docs/api-reference/chat/create#chat-create-response_format An object specifying the format that the model must output. Must be one of "text" or "json_object". Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. To help ensure you don't forget, the API will throw an error if the string "JSON" does not appear somewhere in the context.
Options: text, json_object
Options: user, assistant, system
If the tool_choice parameter was passed, and the model supports it, this will return a list of the different calls/functions used.
The Assistant ID to use for this request
The Assistant version ID to use for this request
The Conversation ID to use for this request
The Parent Request ID to use for this request
The list of plugins to enable for the request
Images to be analyzed
A list of file urls that should be included. Images, audio, etc.
Custom overrides for this request, sent as HTTP headers
Custom labeling of this request. If None, no labels will be stored.
HTTP Headers, without the custom pulze-* headers
Overrides the weights of the app for this request.
Overrides the policies of the app for this request. See LLMModelPolicies for more info.
Feature flags for this request
How much is logged? 1: everything, 2: mask request+response (but show log), 3: Not visible, not retrievable, no information stored.
Options: 1, 2, 3
The prompt in text format
The type of request (text completion or chat) the user sends and expects back
Options: completions, chat_completions
The response in text format
The status code of the request to the AI model
True if the request was performed from a sandbox app
When the request was performed
Time it took for the LLM to respond
Reference to the ID of the parent of this log. A log has a parent when it's a subrequest used to retrieve the final answer.
The parent of the Request, if any. Requests which are part of a series of sub-requests (like multiple LLM calls, or RAG) will have the final, resulting Log as parent.
The ID of the app that performed the request
The Auth0 ID of the user that performed the request
The response object
The fully qualified model name used by PulzeEngine
The type of response object
Options: text_completion, chat.completion
The Response contains a list of choices. The role is in *.message.role and the content in *.message.content (or, for text completions, in the *.text attribute).
Creation timestamp -- in milliseconds (!)
Metadata of the response
This ID gets generated by the database when we save the request
Tokens used
The timestamp of the request, in milliseconds
ID of the request
The rating given to this request. It can be good (True), bad (False) or none (None == null)
An optional text that accompanies the feedback's rating
Number of tokens the request used
Number of tokens the response used
Number of tokens of (request + response)
Cost (in $) of the prompt
Cost (in $) of the response
Cost (in $) of the (request + response)
Cost (in $) saved on prompt costs, in comparison to the benchmark model
Cost (in $) saved on completion costs, in comparison to the benchmark model
Cost (in $) saved in total, in comparison to the benchmark model
When a request requires multiple intermediate calls, those are stored as 'no costs incurred' -- that way we can record the costs, but don't charge the user
The name of the provider's model which was used to answer the request
The payload sent with the request
Specify the model you'd like Pulze to use (optional). Can be the full model name, or a subset for multi-matching. Defaults to our dynamic routing, i.e. the best model for this request. See https://docs.pulze.ai/overview/models
The maximum number of tokens that the response can contain.
Optionally specify the temperature for this request only. Leave empty to allow Pulze to guess it for you.
0 < x < 1
A value between 0.0 and 1.0 that controls the probability of the model generating a particular token. See https://octo.ai/docs/text-gen-solution/rest-api#input-parameters
Options: none, auto
How many completions to generate for each prompt. @default 1
x > 1
Specify if you want the response to be streamed or to be returned as a standard HTTP request. Currently we only support streaming for OpenAI-compatible models.
COMING SOON https://platform.openai.com/docs/api-reference/completions/create#completions/create-logprobs Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens.
0 < x < 5
Stop responding when this sequence of characters is generated. Leave empty to allow the model to decide.
https://platform.openai.com/docs/api-reference/completions/create#completions/create-presence_penalty Increase the model's likelihood to talk about new topics
-2 < x < 2
https://platform.openai.com/docs/api-reference/completions/create#completions/create-frequency_penalty Increase the model's likelihood to not repeat tokens/words
-2 < x < 2
The number of responses to generate. Out of those, it will return the best n.
x > 1
COMING SOON https://platform.openai.com/docs/api-reference/completions/create#completions/create-logit_bias Modify the likelihood of specified tokens appearing in the completion.
See here for a detailed explanation on how to use: https://help.openai.com/en/articles/5247780-using-logit-bias-to-define-token-probability
https://platform.openai.com/docs/api-reference/chat/create#chat-create-response_format An object specifying the format that the model must output. Must be one of "text" or "json_object". Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. To help ensure you don't forget, the API will throw an error if the string "JSON" does not appear somewhere in the context.
The Assistant ID to use for this request
The Assistant version ID to use for this request
The Conversation ID to use for this request
The Parent Request ID to use for this request
The list of plugins to enable for the request
Images to be analyzed
A list of file urls that should be included. Images, audio, etc.
Custom overrides for this request, sent as HTTP headers
How much is logged? 1: everything, 2: mask request+response (but show log), 3: Not visible, not retrievable, no information stored.
Options: 1, 2, 3
The prompt in text format
The type of request (text completion or chat) the user sends and expects back
Options: completions, chat_completions
The response in text format
The status code of the request to the AI model
True if the request was performed from a sandbox app
When the request was performed
Time it took for the LLM to respond
Reference to the ID of the parent of this log. A log has a parent when it's a subrequest used to retrieve the final answer.
The parent of the Request, if any. Requests which are part of a series of sub-requests (like multiple LLM calls, or RAG) will have the final, resulting Log as parent.
The ID of the app that performed the request
The Auth0 ID of the user that performed the request
The response object
The timestamp of the request, in milliseconds
ID of the request
The rating given to this request. It can be good (True), bad (False) or none (None == null)
An optional text that accompanies the feedback's rating
Number of tokens the request used
Number of tokens the response used
Number of tokens of (request + response)
Cost (in $) of the prompt
Cost (in $) of the response
Cost (in $) of the (request + response)
Cost (in $) saved on prompt costs, in comparison to the benchmark model
Cost (in $) saved on completion costs, in comparison to the benchmark model
Cost (in $) saved in total, in comparison to the benchmark model
When a request requires multiple intermediate calls, those are stored as 'no costs incurred' -- that way we can record the costs, but don't charge the user
The name of the provider's model which was used to answer the request
The payload sent with the request
How much is logged? 1: everything, 2: mask request+response (but show log), 3: Not visible, not retrievable, no information stored.
Options: 1, 2, 3
The prompt in text format
The type of request (text completion or chat) the user sends and expects back
Options: completions, chat_completions
The response in text format
The status code of the request to the AI model
True if the request was performed from a sandbox app
When the request was performed
Time it took for the LLM to respond
Reference to the ID of the parent of this log. A log has a parent when it's a subrequest used to retrieve the final answer.
The parent of the Request, if any. Requests which are part of a series of sub-requests (like multiple LLM calls, or RAG) will have the final, resulting Log as parent.
The children of the Request. Will equal None unless you use eager loading in the query
The model this request used. Optional because it's not always populated
The collections this request is associated with
The children of the Request. Will equal None unless you use eager loading in the query
The ID of the app that performed the request
The Auth0 ID of the user that performed the request
The response object
The timestamp of the request, in milliseconds
ID of the request
The rating given to this request. It can be good (True), bad (False) or none (None == null)
An optional text that accompanies the feedback's rating
Number of tokens the request used
Number of tokens the response used
Number of tokens of (request + response)
Cost (in $) of the prompt
Cost (in $) of the response
Cost (in $) of the (request + response)
Cost (in $) saved on prompt costs, in comparison to the benchmark model
Cost (in $) saved on completion costs, in comparison to the benchmark model
Cost (in $) saved in total, in comparison to the benchmark model
When a request requires multiple intermediate calls, those are stored as 'no costs incurred' -- that way we can record the costs, but don't charge the user
The name of the provider's model which was used to answer the request
The payload sent with the request
How much is logged? 1: everything, 2: mask request+response (but show log), 3: Not visible, not retrievable, no information stored.
Options: 1, 2, 3
The prompt in text format
The type of request (text completion or chat) the user sends and expects back
Options: completions, chat_completions
The response in text format
The status code of the request to the AI model
True if the request was performed from a sandbox app
When the request was performed
Time it took for the LLM to respond
Reference to the ID of the parent of this log. A log has a parent when it's a subrequest used to retrieve the final answer.
The parent of the Request, if any. Requests which are part of a series of sub-requests (like multiple LLM calls, or RAG) will have the final, resulting Log as parent.
The children of the Request. Will equal None unless you use eager loading in the query
The model this request used. Optional because it's not always populated
The collections this request is associated with
The model this request used. Optional because it's not always populated
True if the model supports function/tool calls
True if the model supports json-formatted responses
True if the model supports n and best_of -- i.e., multiple responses
True if the model supports frequency_penalty and presence_penalty
True if the model supports streaming responses
True if the model supports image recognition (vision)
The cost of a completion token, in USD
The cost of a prompt token, in USD
A (usually 0) cost added on top of a request. Some models charge per request, not only per token
The unit of billing for this model
Options: tokens, characters
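Putting the capability flags above to work, a caller might strip unsupported features from a payload before sending it; the flag names below are assumptions based on the descriptions, not a confirmed schema.

```python
# Sketch: flag names ("supports_streaming", etc.) are assumed from the
# descriptions above, not a confirmed schema.
def strip_unsupported(model: dict, payload: dict) -> dict:
    if payload.get("stream") and not model.get("supports_streaming"):
        payload["stream"] = False      # fall back to a standard HTTP response
    if "functions" in payload and not model.get("supports_functions"):
        payload.pop("functions")       # model cannot do function/tool calls
    if payload.get("n", 1) > 1 and not model.get("supports_multiple_responses"):
        payload["n"] = 1               # n / best_of not supported
    return payload
```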
The name of the model. Can belong to many providers
The fully qualified (namespaced) model name
The max_tokens for this model
The most recent data this model has been trained with
A description of the model
A URL to the model's page or more information
Whether it's RAG-tuned or not
Whether it's fine-tuned or not
True if the model is open source
True if the model complies with GDPR
True if the model is of type Chat Completions, False if it's a Text Completion model.
The ID of this model
The app_id that has access to this model (if only one)
The org_id that has access to this model
The user (auth0_id) who created the model
When the model was added. Auto-populated in DB
When the model was updated. Auto-populated in DB
True if the model is publicly accessible to all
Test models are only used for testing and do not perform any LLM requests
Model has been created and shared by Pulze
This determines if the model will be available + pre-selected when users create new apps.
The provider for the model.
The owner of the model. Sometimes, for a provider/model combination, many instances exist, trained on different data
Extra model settings inferred from namespace
Store the name of the model the API requires
For models whose deprecation date is known (past or future); used to show warnings, or to show errors and deny service
The ID of the parent, in case it's not a base model
The ID of the prompt used for this model
The children of the Request. Will equal None unless you use eager loading in the query
The ID of the app that performed the request
The Auth0 ID of the user that performed the request
The response object
The fully qualified model name used by PulzeEngine
The type of response object
Options: text_completion, chat.completion
The Response contains a list of choices. The role is in *.message.role and the content in *.message.content (or, for text completions, in the *.text attribute).
Creation timestamp -- in milliseconds (!)
Metadata of the response
This ID gets generated by the database when we save the request
Tokens used
The timestamp of the request, in milliseconds