...
- An End User is analyzing data using their BI Tool and determines that predictive analytics for the data would be valuable and they wish to "train" a model with the data for that purpose. This step is the traditional step when a user interacts with BI.
(a) Obtain a token with permissions associated with the user making the request. This token is passed to the AI platform, allowing it to access the training data by running a SQL statement against the datastore.
(b) The BI tool, on behalf of the user, requests the AI platform, through OBAIC, to train/prepare a model that accepts features of a certain type (numeric, categorical, text, etc.).
API to train model using provided dataset
Model configuration is based on configs from the open-source Ludwig project. At a minimum, we should be able to define inputs and outputs in a fairly standard way. Other model configuration parameters are subsumed by the options field.
The data stanza provides a bearer token allowing the ML provider to access the required data table(s) for training. The provided SQL query indicates how the training data should be extracted from the source.
Do not confuse the Bearer token, which is used to authenticate with OBAIC, with the dbToken, which is created in 2(a) and which the AI platform uses to access the data source for training.
HTTP Request Value Method POST
Header Authorization: Bearer {token}
URL {prefix}/models/
Query Parameters {
"dbToken": "D41C4A382C27A4B5DF824E2D4F148",
"inputs":[
{
"name":"customerAge",
"type":"numeric"
},
{
"name":"activeInLastMonth",
"type":"binary"
}
],
"outputs":[
{
"name":"canceledMembership",
"type":"binary"
}
],
"modelOptions": {"providerSpecificOption": "value"
},
"data":{
"sourceType":"snowflake",
"endpoint":"some/endpoint",
"bearerToken":"...",
"query":"SELECT foo FROM bar WHERE baz"
}
}
Alternatively, we may also consider supporting a SQL-like syntax for model training
If we go beyond just a REST API, a SQL-like syntax is an alternative, as it is also well known.
Using BigQuery ML model creation as an example and generalizing:
CREATE MODEL (
customerAge WITH ENCODING (
type=numeric
),
activeInLastMonth WITH ENCODING (
type=binary
),
canceledMembership WITH DECODING (
type=binary
)
)
FROM myData (
sourceType=snowflake,
endpoint="some/endpoint",
bearerToken=<...>
) AS (SELECT foo FROM BAR)
WITH OPTIONS ();
200: Training is started and the corresponding ID is returned for future reference
HTTP Response Value Header Content-Type: application/json; charset=utf-8
Body {
"modelID": "d677b054-2cd4-4711-959b-971af0081a73"
}
modelID is generated and returned to the caller if training is started successfully. It is used to check the status of the training, or for future inference (see the Inference section below).
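The training call above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `build_train_request` and `parse_model_id` are hypothetical, the JSON document is assumed to travel in the request body, the request `Content-Type` header is assumed (the spec above only shows it for the response), and any prefix or token values are placeholders. The request is built but not sent.

```python
import json
import urllib.request


def build_train_request(prefix, token, db_token, inputs, outputs, data,
                        model_options=None):
    """Build (but do not send) a POST {prefix}/models/ training request.

    Hypothetical helper: the field names mirror the sample request above.
    """
    body = {
        "dbToken": db_token,          # token the AI platform uses for the data source
        "inputs": inputs,             # list of {"name": ..., "type": ...}
        "outputs": outputs,           # list of {"name": ..., "type": ...}
        "modelOptions": model_options or {},  # provider-specific options
        "data": data,                 # sourceType, endpoint, bearerToken, query
    }
    return urllib.request.Request(
        url=prefix.rstrip("/") + "/models/",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Bearer " + token,  # authenticates with OBAIC
            "Content-Type": "application/json",  # assumed, not stated in the spec
        },
    )


def parse_model_id(response_body):
    """Extract modelID from the JSON body of a 200 response."""
    return json.loads(response_body)["modelID"]
```

A caller would pass the returned request to `urllib.request.urlopen` and feed the response body to `parse_model_id`; keeping construction separate from sending makes the payload easy to inspect.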
- The AI platform provides the implementation to fulfill the request by connecting to the data source with the provided token and retrieving the set of training data specified in SQL. How the AI platform interacts with the data source to perform the training is up to the platform.
The BI tool polls for the status or retrieves the training result. If the training is still in progress, the status is returned. When training is completed, the results and performance of the model are returned.
API to get model status
HTTP Request Value Method GET
Header Authorization: Bearer {token}
URL {prefix}/modelStatus?modelID=
Query Parameters modelID (type: String): The modelID returned from a previous OBAIC call, either from training or from the list of models.
200: Status of the model returned
HTTP Response Value Header Content-Type: application/json; charset=utf-8
Body {
"modelID": "d677b054-2cd4-4711-959b-971af0081a73",
"status": "training",
"progress": "80"
}
modelID is the same ID provided in the request
status can be training | inferencing | ready
progress is the estimated progress of the current status
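The polling step can be sketched as below. This is a rough illustration, not a prescribed client: `poll_model_status` and its parameters are hypothetical, the `fetch_status` callable stands in for the GET {prefix}/modelStatus call, and treating `ready` as the signal that training has finished is an assumption based on the status values listed above.

```python
import time


def poll_model_status(fetch_status, model_id, interval_seconds=5.0,
                      max_attempts=60, sleep=time.sleep):
    """Call fetch_status(model_id) until the status is "ready".

    fetch_status is any callable returning the parsed status body, e.g.
    {"modelID": ..., "status": "training", "progress": "80"}.
    Raises TimeoutError if the model is not ready after max_attempts calls.
    """
    for attempt in range(max_attempts):
        status = fetch_status(model_id)
        if status.get("modelID") != model_id:
            raise ValueError("status response is for a different model")
        if status["status"] == "ready":  # assumed to mean training has finished
            return status
        if attempt < max_attempts - 1:
            sleep(interval_seconds)      # wait before asking again
    raise TimeoutError("model %s not ready after %d attempts"
                       % (model_id, max_attempts))
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any particular HTTP library and lets it be exercised without a live AI platform.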
- The BI tool presents the result to the user in its own way, which is the "secret sauce" unique to each BI tool.
Protocol - Inference
1. When a BI user wants to extend their capabilities with AI, the BI tool reaches out to the AI platform and requests a list of available models that the credential of the provided token is authorized to see.
...
2. After the list of models is returned, the BI user can selectively retrieve the details of the model(s). This step can also be called right after a newly trained model is completed, as described in the previous section, since modelID is returned as a result of the training request.
...
{
  "id": "6d4b571a-80ca-41ef-bc67-b158f4352ad8",
  "name": "Model 1",
  "revision": 3,
  "format": {
    "name": "PMML",
    "version": "4.3"
  },
  "algorithm": "Neural Network",
  "tags": [
    "Anomaly detection",
    "Banking"
  ],
  "dependency": "",
  "creator": "John Doe",
  "description": "This is a predictive model, refer to {input} and {output} for detailed format of each field, such as value range of a field, as well as possible predictions the model will give. You may also refer to the example data here.",
  "input": {
    "fields": [
      {
        "name": "Account ID",
        "opType": "categorical",
        "dataType": "string",
        "taxonomy": "ID",
        "example": "account abc-001",
        "allowMissing": false,
        "description": "unique value"
      },
      {
        "name": "Account Balance",
        "opType": "continuous",
        "dataType": "double",
        "taxonomy": "currency",
        "example": "1,378,560.00",
        "allowMissing": true,
        "description": "Minimum: 0, Maximum: 999,999,999.00"
      }
    ],
    "ref": "http://dmg.org/pmml/v4-3/pmml-4-3.xsd"
  },
  "output": {
    "fields": [
      {
        "name": "Churn",
        "opType": "continuous",
        "dataType": "string",
        "taxonomy": "ID",
        "example": "0.67",
        "allowMissing": false,
        "description": "the possibility of the account stop doing business with a company over 6 months"
      }
    ],
    "ref": "http://dmg.org/pmml/v4-3/pmml-4-3.xsd"
  },
  "performance": {
    "metric": "accuracy",
    "value": 0.85
  },
  "rating": 5,
  "url": "uri://link_to_the_model"
}
3. The BI tool will use the information retrieved from the AI platform to display to the user, including what types of models are available and their performance. It can optionally match the data and suggest what may be a good match based on what the user has.
4. The user interacts with the results the BI tool presents and decides which model is a good fit for making a prediction on a certain set of data. Please note that the model can also be returned as the result of the training step described in the previous section. In that case, the user may bypass these 2 steps and go directly to see the result.
5. Once the BI user/developer decides which model to run for prediction, they take the appropriate actions in the BI tool to prepare the data. The BI tool then compiles everything required for the prediction, calls OBAIC, and requests that it run that model with the data.
Use POST instead of GET
Query should be in the body
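Step 3 mentions that the BI tool can optionally match the user's data against a model. One simple way to do that is sketched below. The helper name `missing_required_fields` is hypothetical, and reading `"allowMissing": false` as "this input must be supplied" is an assumption about the model detail format shown in step 2.

```python
def missing_required_fields(model_detail, available_columns):
    """Return names of model inputs that the user's data cannot supply.

    model_detail is the parsed model detail JSON; an input field counts as
    required when its allowMissing flag is false (assumed interpretation).
    """
    available = set(available_columns)
    fields = model_detail.get("input", {}).get("fields", [])
    return [
        field["name"]
        for field in fields
        if not field.get("allowMissing", False)
        and field["name"] not in available
    ]
```

A BI tool could rank or filter candidate models by how short this list is for the columns the user has selected; an empty list suggests the data covers every required input.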
...
Pass by value is recommended only for small data sets
...
- Define potential value for each parameter in the API call
- Formally define JSON in http://json-schema.org/ so that future development can validate the JSON structure
- Define data pipeline to transform data before running
- Define containerized model so that prediction can run in BI instead of in AI
- Define format of nextPageToken
- Define different types of errorCode and message for each API call
FAQ
- Why should AI share models with BI?
- The setting of OBAIC assumes an organization owns both the BI tool(s) and the AI platform(s). However, they are 2 (or more) discrete entities and may not have a good way to integrate. Hence OBAIC comes in to connect the dots.
- Why should AI vendors care to participate in the OBAIC standard?
- Most AI model training and execution is done by a very small set of data scientists. The OBAIC protocol extends the influence and reach of AI vendors. They will no longer need to work with BI partners to create one-off implementations and continually maintain them.
- Why should BI vendors care to participate in the OBAIC standard?
- Providing end users with access to predictions/prescriptions is the desired goal. Writing one-off drivers to support every AI flavor of the week is not. The OBAIC standard gives BI vendors the opportunity to build and support 1 driver while enabling all customers/prospects to bring their own AI implementation(s) to the table, so that deals are not lost because a custom driver is not in place for a customer to expand or a prospect to purchase.
- Who owns the model and data?
- The AI platform owns the models but shares them with BI tools through OBAIC. The data is owned by the business, but BI has been authorized to use it and to re-share it with AI for training and inference.
- How do you deal with Security?
- Calls are made over HTTPS and authorized using the bearer token standard.
...