The following methods are exposed by the Prediction API:
Action | HTTP method |
---|---|
Invoke training | POST |
Check training status | GET |
Predict | POST |
Predict against a hosted model | POST |
Stream additional training data [ Categorical models only ] | PUT |
Delete a trained model | DELETE |
Important: The Google Prediction API supports only HTTPS calls; HTTP is no longer supported. If you are receiving an "HTTP 404 Not Found" error, check to be sure that you are accessing the Prediction API using HTTPS, not HTTP.
Invoke Training
Invoke training on your data by sending a POST request as described below. Note that each time you call this method, it will clear out any existing model with the same name. After making this request, you must make another request to check training status to determine when training is complete.
You must have read permission on the Google Storage object that holds your training data. By default, only the creator of a Google Storage object has read access to it. See the Google Storage documentation to learn how to read or modify object ACLs.
REST Request
POST https://www.googleapis.com/prediction/v1.2/training?key=api_key
{ "id":"mybucket/mydata" }
- api_key
- [ Present only when not using OAuth2 ] This is the user's Google API access key from the Identities page of the Google APIs console. Ignored when using OAuth2.
- mybucket/mydata
- The full Google Storage path to your training data. The Google Storage path you specify here will be used to identify this model in all future requests.
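The request above could be assembled as follows. This is a minimal sketch using Python's standard library, assuming API-key authentication rather than OAuth2. The helper name `build_training_request`, the placeholder key, and the `Content-Type` header are assumptions for illustration and are not specified by the API description; the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.googleapis.com/prediction/v1.2"


def build_training_request(data_path, api_key):
    """Build (but do not send) the POST request that starts training.

    data_path is the Google Storage path, for example "mybucket/mydata".
    """
    url = BASE_URL + "/training?" + urllib.parse.urlencode({"key": api_key})
    body = json.dumps({"id": data_path}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the JSON body is declared with a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_training_request("mybucket/mydata", "api_key")
```

Passing `request` to `urllib.request.urlopen()` would actually invoke training, which clears any existing model with the same name.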
REST Response
Training requests are asynchronous; if successful, the request returns immediately with the following reply, indicating that training has begun. You must check training status to learn when training is complete. Training can take up to 10 minutes, depending on the complexity and size of the data, but will typically take less time. A successful return value is a simple echo of the data location:
{ "kind":"prediction#training", "id":"training_bucket/training_data", "selfLink":"https://www.googleapis.com/prediction/v1.2/URL_of_resource" }
- kind
- The resource type of the reply.
- id
- The name of the model, which is the path to the training data in Google Storage.
- selfLink
- The URL to re-request this resource.
Check Training Status
After you invoke training, send this GET request to determine the status of the training session. The URL specifies the Google Storage location of your training file.
Only the user who trained a model can check its training status.
Note: If you are retraining an existing model, the modelInfo field will show an accuracy value even if the new training is not complete. This number is the accuracy of the previously trained model, which remains usable until the new model has finished training.
REST Request
GET https://www.googleapis.com/prediction/v1.2/training/mybucket%2Fmydata?key=api_key
- api_key
- [ Present only when not using OAuth2 ] This is the user's Google API access key from the Identities page of the Google APIs console. Ignored when using OAuth2.
REST Response
The reply varies slightly, depending on whether you are training a categorical or a regression model.
Complete Status Reply:
{
"kind":"prediction#training",
"id":"training_bucket/training_data",
"selfLink":"https://www.googleapis.com/prediction/v1.2/URL_of_resource",
"modelInfo":{
"modelType":"classification | regression",
"classificationAccuracy":0.XX, // Categorical models only!
"meanSquaredError":X.XX // Regression models only!
},
"trainingStatus":status
}
- kind
- The resource type of the reply.
- id
- The name of the model, which is the path to the training data in Google Storage.
- selfLink
- The URL to call to re-request this resource.
- modelType
- The type of model. This will be either "classification" or "regression".
- classificationAccuracy
- [ Categorical models only ] A number between 0.0 and 1.0, where 1.0 is 100% accurate. This is an estimate of prediction accuracy, based on the amount and quality of the training data. You can use it as a guide to decide whether the results are accurate enough for your needs. The estimate is more reliable if your real input data is similar to your training data.
- meanSquaredError
- [ Regression models only ] A number 0.0 or greater, representing the mean squared error. The mean squared error is the average of the square of the difference between the predicted and actual values, so lower values indicate a better fit. This is an estimate of prediction accuracy, based on the amount and quality of the training data. You can use it as a guide to decide whether the results are accurate enough for your needs. The estimate is more reliable if your real input data is similar to your training data.
- trainingStatus
- The status of the training request. It will be one of the following values: RUNNING; DONE; ERROR; ERROR: TRAINING JOB NOT FOUND
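Because training is asynchronous, a client typically polls this status request until trainingStatus leaves RUNNING. The following is a sketch only: `wait_for_training` and its `fetch_status` parameter are hypothetical names, and `fetch_status` stands in for whatever code performs the GET request and parses the JSON reply. The default of 60 polls at 10-second intervals is an arbitrary choice that covers the 10-minute upper bound mentioned earlier.

```python
import time


def wait_for_training(fetch_status, poll_seconds=10.0, max_polls=60):
    """Poll until trainingStatus is DONE, raising on errors or timeout.

    fetch_status is any callable returning the parsed status reply (a dict).
    """
    for _ in range(max_polls):
        reply = fetch_status()
        status = reply["trainingStatus"]
        if status == "DONE":
            return reply
        if status.startswith("ERROR"):
            # Covers both "ERROR" and "ERROR: TRAINING JOB NOT FOUND".
            raise RuntimeError("Training failed: " + status)
        time.sleep(poll_seconds)
    raise TimeoutError("Training still RUNNING after %d polls" % max_polls)


# Canned replies stand in for real HTTP responses in this illustration.
replies = iter([
    {"trainingStatus": "RUNNING"},
    {"trainingStatus": "DONE", "modelInfo": {"modelType": "classification"}},
])
result = wait_for_training(lambda: next(replies), poll_seconds=0)
```

Injecting the fetch function keeps the loop independent of the HTTP client and of the authentication method in use.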
Predict
To request a prediction, send data in the same format as a single row of the training data, minus the first column (the result column).
Here are some things to be aware of when requesting a prediction:
- Successive queries with the same data should return the same result, if you have not retrained the system.
- You can compare regression values or categorical scores from successive queries in order to rank results.
- Only the user who trained a model can send prediction requests to it.
REST Request
POST https://www.googleapis.com/prediction/v1.2/training/mybucket%2Fmyobject/predict?key=api_key
{ "input":{ "csvInstance":[ col1_value, col2_value, ... ] } }
- api_key
- [ Present only when not using OAuth2 ] This is the user's Google API access key from the Identities page of the Google APIs console. Ignored when using OAuth2.
- POST data
- Where col1_value, col2_value, and so on are feature values in the same order as the columns of your training table. Note that string fields must be surrounded by escaped quotes. Here's an example request to predict a person's height, based on the example training data:
{ "input":{ "csvInstance":["M", 1.59, 1.51,"France"] } }
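A prediction request like the one above could be built as in this sketch, which assumes API-key authentication. The helper name `build_predict_request` and the `Content-Type` header are illustrative assumptions. Note that the "/" in the model name is percent-encoded as %2F in the URL path, and that `json.dumps` takes care of quoting string fields.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.googleapis.com/prediction/v1.2"


def build_predict_request(model, features, api_key):
    """Build (but do not send) a prediction request for a trained model."""
    # safe="" forces "/" to be encoded, giving mybucket%2Fmyobject.
    encoded_model = urllib.parse.quote(model, safe="")
    query = urllib.parse.urlencode({"key": api_key})
    url = "%s/training/%s/predict?%s" % (BASE_URL, encoded_model, query)
    body = json.dumps({"input": {"csvInstance": features}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the JSON body is declared with a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request(
    "mybucket/myobject", ["M", 1.59, 1.51, "France"], "api_key"
)
```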
The response syntax varies, depending on whether the training model uses categorical or regression data.
Categorical Response
For categorical tasks, the system responds with an output in this format (the outputValue field appears only for regression models):
{
"kind":"prediction#output",
"id":"training_bucket/training_data",
"selfLink":"https://www.googleapis.com/prediction/v1.2/URL_of_resource",
"outputValue":X.XX, // Regression only
"outputLabel":"most_likely_category", // Categorical only
"outputMulti":[{"label":"category1", "score":0.XX}, // Categorical only
{"label":"category2", "score":0.XX}, // Categorical only
...]
}
- kind
- The resource type of the reply.
- id
- The name of the model, which is the path to the training data in Google Storage.
- selfLink
- The URL to re-request this resource.
- outputValue
- [ Regression models only ] A predicted value for the submitted item, calculated based on given values in the training data.
- outputLabel
- [ Categorical models only ] The category that best fits the submitted data.
- outputMulti
- [ Categorical models only ] The results, with one entry for every category in the training table, along with a score assigned to that category. The largest score is the most likely match. A value will be returned for every category present in the training data; you cannot currently specify how many categories to return. Each entry has the following members:
- label
- The category being described.
- score
- A score associated with this category. The largest score is the most likely match. See notes below.
Notes on Categorical Model Scores
- Score values range from 0.0–1.0, with 1.0 being the highest. All values should add up to 1.0. Note: if you used an earlier version of the API that used a different range, you must retrain your data model in order for scores to be scaled to 0.0–1.0.
- Consider having a cutoff value above which the categorization is useful and below which you might ignore it. We can't advise a hard cutoff value; instead, try running a few queries for borderline items, and use that as an approximate cutoff value for your categories.
- These values are not probabilities; that is, they are not the confidence that a rating is correct. They are a measure of how closely a category seems to conform to the query item.
- Scores are relative to each other: compare them within a single response to rank categories, rather than reading any one score as an absolute measure.
- It is hard to say absolutely what is a significant difference in scores. For example, is 0.42 "significantly" better than 0.33? Is 0.25 "twice as good" as 0.125? Instead, assume that the highest value is the best fit, and choose a cutoff value below which you won't use the best fit. You'll have to experiment with the system to determine a meaningful cutoff value for your data.
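The cutoff approach described in these notes can be sketched as follows. The function name `best_label`, the sample scores, and the cutoff values are all hypothetical; a suitable cutoff has to be found by experimenting with your own data.

```python
def best_label(output_multi, cutoff):
    """Return the highest-scoring label, or None if its score is below cutoff.

    output_multi is the parsed outputMulti list from a categorical reply.
    """
    top = max(output_multi, key=lambda entry: entry["score"])
    return top["label"] if top["score"] >= cutoff else None


# Hypothetical outputMulti values, for illustration only.
scores = [
    {"label": "category1", "score": 0.25},
    {"label": "category2", "score": 0.75},
]
accepted = best_label(scores, cutoff=0.5)
rejected = best_label(scores, cutoff=0.9)
```

Here `accepted` is "category2" because its score clears the cutoff, while `rejected` is None because no category reaches 0.9.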
Predict Against a Hosted Model
Sending a prediction request against a hosted model is nearly the same as sending one against any other model; the only difference is the request URL, as shown below. Using hosted models is convenient when you don't have the time, resources, or expertise to build a model for a specific topic.
REST Request
POST https://www.googleapis.com/prediction/v1.2/hostedmodels/model_name/predict?key=api_key
{ "input":{ "csvInstance":[ col1_value, col2_value, ... ] } }
- api_key
- [ Present only when not using OAuth2 ] This is the user's Google API access key from the Identities page of the Google APIs console. Ignored when using OAuth2.
- model_name
- The name of the hosted model. The model's documentation should give this value.
- POST data
- Where col1_value, col2_value, and so on are feature values in the same order as the columns of the model training table. See the model documentation to learn the example format. Note that string fields must be surrounded by escaped quotes.
REST Response
The response is the same as the response to non-hosted models, except that the selfLink member does not point to a valid resource.
Stream Additional Data
You can add additional training examples to a trained category model with an API call. This is useful if you have a regular stream of new information that you'd like to add to your model as it becomes available, rather than having to recompile, re-upload, and retrain the data with batches of new data. The model will not be retrained each time it receives a new example; rather, it retrains after every N new examples have been added, where N is a small number.
Note that the system may weight newer streamed examples more than earlier examples. If you do not want this, you should periodically retrain the system.
You can only stream additional examples to a categorization model.
Note: If you retrain a model against its original training data file, all the streamed data will be lost.
REST Request
The request takes a JSON object with parameters.
PUT https://www.googleapis.com/prediction/v1.2/training/mybucket%2Fmydata
{ "classLabel": my_label, "csvInstance": [ col1, col2, ... colN ] }
- classLabel
- The category label to assign to this example. Only category examples can be streamed to an existing model.
- csvInstance
- The example data as an array of columns, in the same format as the CSV file.
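A streaming update could be assembled as in the sketch below. The helper name `build_stream_request`, the sample label and column values, and the `Content-Type` header are assumptions for illustration. Authentication is omitted here, mirroring the request shown above; add an API key or OAuth2 credentials as your setup requires.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.googleapis.com/prediction/v1.2"


def build_stream_request(model, class_label, features):
    """Build (but do not send) a PUT request adding one training example."""
    # safe="" forces "/" to be encoded, giving mybucket%2Fmydata.
    url = "%s/training/%s" % (BASE_URL, urllib.parse.quote(model, safe=""))
    body = json.dumps(
        {"classLabel": class_label, "csvInstance": features}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the JSON body is declared with a JSON content type.
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical label and column values.
request = build_stream_request("mybucket/mydata", "my_label", ["a", 1.5])
```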
REST Response
When successful, the following object is returned.
{ "kind": "prediction#training", "id": "mybucket%2Fmydata", "selfLink": "https://www.googleapis.com/prediction/v1.2/training/mybucket/mydata" }
- kind
- The resource type of this JSON object.
- id
- The URI-encoded model name.
- selfLink
- A link to this resource.
Delete a Trained Model
Only the user who trained a model can delete it.
To delete a previously trained model from the Google Prediction API, send the following DELETE request:
REST Request
DELETE https://www.googleapis.com/prediction/v1.2/training/mybucket%2Fmydata?key=api_key
- api_key
- [ Present only when not using OAuth2 ] This is the user's Google API access key from the Identities page of the Google APIs console. Ignored when using OAuth2.
REST Response
An empty response is returned if the model exists and is successfully deleted, and a 400 error if the model does not exist.
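The two outcomes described above could be handled as in this sketch, which assumes API-key authentication. The names `build_delete_request` and `delete_model` are hypothetical, and the `opener` parameter exists only so the logic can be exercised without a network call. Treating every 400 reply as "model does not exist" follows the description above and is an assumption; other causes of a 400 are not distinguished.

```python
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://www.googleapis.com/prediction/v1.2"


def build_delete_request(model, api_key):
    """Build (but do not send) the DELETE request for a trained model."""
    encoded_model = urllib.parse.quote(model, safe="")
    query = urllib.parse.urlencode({"key": api_key})
    url = "%s/training/%s?%s" % (BASE_URL, encoded_model, query)
    return urllib.request.Request(url, method="DELETE")


def delete_model(request, opener=urllib.request.urlopen):
    """Send the request; return True if deleted, False on a 400 reply."""
    try:
        with opener(request) as response:
            response.read()  # The body is empty on success.
        return True
    except urllib.error.HTTPError as error:
        if error.code == 400:  # Documented reply for a missing model.
            return False
        raise


request = build_delete_request("mybucket/mydata", "api_key")
```

Calling `delete_model(request)` would send the request with `urllib.request.urlopen`.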