This Reference Guide is a detailed technical reference of the collections, resources, methods, and authentication requirements for the Prediction API v1.3.
Introduction
Contents and Overview
Data | Method | REST URI (relative to https://www.googleapis.com/prediction/v1.3) | Access |
---|---|---|---|
Hostedmodels Collection ↳ Hostedmodels Resource | Hosted model prediction | POST | AUTHENTICATED |
Training Collection ↳ Training Resource | Train | POST | AUTHENTICATED |
Training Collection ↳ Training Resource | Streaming training | PUT | AUTHENTICATED |
Training Collection ↳ Training Resource | Get training status / model info | GET | AUTHENTICATED |
Training Collection ↳ Training Resource | Predict | POST | AUTHENTICATED |
Training Collection ↳ Training Resource | Delete a model | DELETE | AUTHENTICATED |
Hostedmodels Collection
Hosted Models
Hosted models are trained models that anyone can call. These models can be free, but most have a usage fee associated with them, as described in their documentation. See the list of available hosted models in the hosted model gallery. Sending a prediction request to a hosted model is nearly the same as sending one to any other model; the only difference is the request URL. Using hosted models is convenient when you don't have the time, resources, or expertise to build a model for a specific topic. If you have a model that you'd like to make public, follow the submission links in the hosted model gallery.
Hostedmodels Resource
Hostedmodel resources are not needed or returned by any method calls.
prediction.hostedmodels.predict (AUTHENTICATED)
Run a prediction request against a hosted model. The hosted model name is part of the model URL in this format:
https://www.googleapis.com/prediction/vx.x/hostedmodels/{model_name}/predict
For example, if the access URL were this:
https://www.googleapis.com/prediction/v1.3/hostedmodels/sample.languageid/predict
then the model name would be "sample.languageid".
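The URL format above can be sketched in Python. This is a minimal illustration, not part of the API: the helper names `hosted_predict_url` and `hosted_model_name` are hypothetical, and the v1.3 base URL is taken from the overview table.

```python
from urllib.parse import quote, unquote, urlparse

BASE_URL = "https://www.googleapis.com/prediction/v1.3"


def hosted_predict_url(model_name: str) -> str:
    """Build the predict URL for a hosted model name (hypothetical helper)."""
    return f"{BASE_URL}/hostedmodels/{quote(model_name, safe='')}/predict"


def hosted_model_name(url: str) -> str:
    """Recover the model name from a hosted model predict URL."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected path: prediction/<version>/hostedmodels/<model_name>/predict
    if len(parts) != 5 or parts[2] != "hostedmodels" or parts[4] != "predict":
        raise ValueError(f"not a hosted model predict URL: {url}")
    return unquote(parts[3])


url = hosted_predict_url("sample.languageid")
print(url)
print(hosted_model_name(url))  # sample.languageid
```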
Input data is an object with the following syntax:
{ "input":{ "csvInstance":[ col1_value, col2_value, ... ] } }
Where col1_value, col2_value, and so on are entity features, as described by the hosted model's documentation. Note that string fields must be surrounded by escaped quotes.
Here's an example request to a hosted model that predicts a person's height, if the model expects a string gender ("M" or "F"), two height numbers, and a string country name:
{ "input":{ "csvInstance":["M", 1.59, 1.51,"France"] } }
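One way to produce this request body without hand-escaping quotes is to let a JSON serializer do the quoting. The sketch below assumes Python's standard `json` module; the function name `prediction_request_body` is hypothetical.

```python
import json


def prediction_request_body(features: list) -> str:
    """Serialize feature values into the {"input": {"csvInstance": [...]}} body."""
    return json.dumps({"input": {"csvInstance": features}})


# Strings are quoted by the serializer; numbers are emitted as JSON numbers.
body = prediction_request_body(["M", 1.59, 1.51, "France"])
print(body)
```

Building the body from native values keeps string and numeric columns distinct, which matters when a model expects a mix of both.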
Notes on categorical model scores in the response:
- Score values range from 0.0–1.0, with 1.0 being the highest. All values should add up to 1.0. Note: if you used an earlier version of the API that used a different range, you must retrain your data model in order for scores to be scaled to 0.0–1.0.
- Consider having a cutoff value above which the categorization is useful and below which you might ignore it. We can't advise a hard cutoff value; instead, try running a few queries for borderline items, and use that as an approximate cutoff value for your categories.
- These values are not probabilities; that is, they are not the confidence that a rating is correct. They are a measure of how closely a category seems to conform to the query item.
- Scores are relative to each other, and do not need to add up to a specific value (for example, to 1.0).
- It is hard to say absolutely what is a significant difference in scores. For example, is 0.42 "significantly" better than 0.33? Is 0.25 "twice as good" as 0.125? Instead, assume that the highest value is the best fit, and choose a cutoff value below which you won't use the best fit. You'll have to experiment with the system to determine what is a meaningful cutoff value for your data.
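The cutoff approach described in these notes might look like the following sketch. The response dictionary is made-up illustrative data, the cutoff values are arbitrary, and `best_label` is a hypothetical helper rather than anything the API provides.

```python
from typing import Optional


def best_label(output: dict, cutoff: float) -> Optional[str]:
    """Return the highest-scoring label, or None if it falls below the cutoff."""
    scores = output.get("outputMulti", [])
    if not scores:
        return None
    top = max(scores, key=lambda entry: entry["score"])
    return top["label"] if top["score"] >= cutoff else None


# Illustrative response only; real scores depend on the model and the input.
sample = {
    "kind": "prediction#output",
    "outputLabel": "French",
    "outputMulti": [
        {"label": "French", "score": 0.62},
        {"label": "Spanish", "score": 0.27},
        {"label": "English", "score": 0.11},
    ],
}

print(best_label(sample, cutoff=0.5))  # French
print(best_label(sample, cutoff=0.7))  # None
```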
POST https://www.googleapis.com/prediction/v1.3/hostedmodels/hostedModelName/predict
{ "kind": "prediction#output", "id": string, "selfLink": string, "outputLabel": string, "outputMulti": [ { "label": string, "score": double } ], "outputValue": double }
Property Name | Value | Description |
---|---|---|
kind | string | What kind of resource this is. |
id | string | The name of the hosted model. |
selfLink | string | A URL to re-request this resource. |
outputLabel | string | [ Present in categorical models only ] A predicted value for the submitted item, calculated based on given values in the training data. |
outputMulti[] | list | [ Present in categorical models only ] The results, with one entry for every category in the training table, along with a score assigned to that category. The largest score is the most likely match. A value will be returned for every category present in the training data; you cannot currently specify how many categories to return. See the notes above in the method description. |
outputMulti[].label | string | The category being described. |
outputMulti[].score | double | A score associated with this category; the largest score is the most likely. See the notes above. |
outputValue | double | [ Present in regression models only ] The estimated value for the submitted item. |
Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction
Training Collection
Training Resource
A Training resource describes a trained prediction model.
{ "kind": "prediction#training", "id": string, "selfLink": string, "utility": [ { label_n: double }, ... ], "modelInfo": { "numberInstances": long, "modelType": string, "numberClasses": long, "classificationAccuracy": double, "classWeightedAccuracy": double, "confusionMatrix": { actual_label_name: { predicted_label_name: double, ... }, ... }, "confusionMatrixRowTotals": { label_name: double }, "meanSquaredError": double }, "trainingStatus": string }
Property Name | Value | Description |
---|---|---|
kind | string | What kind of resource this is. |
id | string | The name of the model. This is the bucket/object path of the training data in Google Storage. |
selfLink | string | A URL to re-request this resource. |
utility[] | list | [ Categorical models only ] Input only, for training requests. See prediction.training.insert() for details. Format is: [{'label1': val_1},{'label2': val_2}] where each value is a positive double-precision number. Not all labels must be specified; the default value for unspecified labels is 1.0. Labels must match example labels exactly. Example: 'utility': [ {'not_spam' : 5}, {'spam' : 1} ] |
modelInfo | object | An object containing information about the model. Present on replies; do not include this member in requests. |
modelInfo.numberInstances | long | The number of training entries the model was built from, including streaming training entries. This is less than or equal to the number of entries in the training data plus any streaming training entries, because an entry that could not be imported or parsed is not counted. Use this number to check for import errors or to count how many training examples make up the model. |
modelInfo.modelType | string | The type of model. This will be either "classification" or "regression". |
modelInfo.numberClasses | long | [ Categorical models only ] Describes the number of categories in the training data and any streaming updates. |
modelInfo.classificationAccuracy | double | [ Categorical models only ] A number between 0.0 and 1.0, where 1.0 is 100% accurate. This is an estimate of prediction accuracy, based on the amount and quality of the training data. You can use this as a guide to decide whether the results are accurate enough for your needs. This estimate will be more reliable if your real input data is similar to your training data. |
modelInfo.classWeightedAccuracy | double | [ Categorical models only ] Similar to modelInfo.classificationAccuracy, but takes any utility weights into account. |
modelInfo.confusionMatrix | object | [ Categorical models only ] Describes a confusion matrix of labels that the Prediction engine properly and improperly categorized during training, as assessed during a post-training self-assessment. See prediction.training.get() for details. |
modelInfo.confusionMatrixRowTotals | object | The total number of labels assigned to each category. |
modelInfo.meanSquaredError | double | [ Regression models only ] A number 0.0 or greater, representing the mean squared error. The mean squared error is the average of the square of the difference between the predicted and actual values. This is an estimate of prediction accuracy, based on the amount and quality of the training data. You can use this as a guide to decide whether the results are accurate enough for your needs. This estimate will be more reliable if your real input data is similar to your training data. |
trainingStatus | string | The status of the training request. It will be one of the following values: RUNNING; DONE; ERROR; ERROR: TRAINING JOB NOT FOUND |
prediction.training.get (AUTHENTICATED)
Returns information about a trained model; most often used to request the training status of a model. Training is an asynchronous process; after invoking training by calling prediction.training.insert(), you must call get() and examine the trainingStatus member of the returned resource to learn the training status.
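The check-until-done pattern described above could be sketched as a small polling loop. This is an assumption-laden illustration: `fetch_training` is a hypothetical callable that stands in for an authenticated GET returning the Training resource as a dict, and the polling interval and limit are arbitrary.

```python
import time
from typing import Callable


def wait_for_training(fetch_training: Callable[[], dict],
                      poll_seconds: float = 10.0,
                      max_polls: int = 60) -> dict:
    """Poll until trainingStatus is DONE; raise on an ERROR status or timeout."""
    for _ in range(max_polls):
        resource = fetch_training()
        status = resource.get("trainingStatus", "")
        if status == "DONE":
            return resource
        if status.startswith("ERROR"):
            # Covers both "ERROR" and "ERROR: TRAINING JOB NOT FOUND".
            raise RuntimeError(f"training failed: {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("training did not finish within the polling budget")


# Stand-in for real GET calls: the first reply is RUNNING, the second is DONE.
_replies = iter([{"trainingStatus": "RUNNING"}, {"trainingStatus": "DONE"}])
result = wait_for_training(lambda: next(_replies), poll_seconds=0)
print(result["trainingStatus"])  # DONE
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP client or authentication scheme.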
Important: Only the user who trained a model can call this method.
This method returns a modelInfo.confusionMatrix property that describes a confusion matrix of labels properly and improperly applied to each training entry during training. This is useful for evaluating the accuracy of training over your data; if the matrix indicates that specific values are often confused, you might want to change your training data structure.
Here is an example confusion matrix for a language identification model. In this model, for all entries with the label "French", 12 were properly identified as French and 0.5 were improperly identified as English. You can see the values for items labeled "Spanish" and "English" as well. Numbers can be fractions because they are averaged across multiple training runs. confusionMatrixRowTotals describes the total number of each label applied.
"confusionMatrix": { "French": { "French": 12.0, "English": 0.5 }, "Spanish": { "Spanish": 6.0, "English": 1.0 }, "English": { "French": 0.5, "Spanish": 2.0, "English": 20.0 } }, "confusionMatrixRowTotals": { "French": 12.5, "Spanish": 7.0, "English": 22.5 } }
Note: If you are retraining an existing model, the modelInfo field will show an accuracy value even if the new training is not complete. This number is the accuracy of the previously trained model, which remains usable until the new model has finished training.
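To make the example confusion matrix above concrete, the sketch below recomputes the row totals from the matrix and derives the fraction of each label that was identified correctly. The helper names are hypothetical, and the per-label fraction is an illustrative derived metric, not a field the API returns.

```python
confusion_matrix = {
    "French": {"French": 12.0, "English": 0.5},
    "Spanish": {"Spanish": 6.0, "English": 1.0},
    "English": {"French": 0.5, "Spanish": 2.0, "English": 20.0},
}


def row_totals(matrix: dict) -> dict:
    """Sum each actual-label row; this should match confusionMatrixRowTotals."""
    return {actual: sum(predicted.values()) for actual, predicted in matrix.items()}


def per_label_accuracy(matrix: dict) -> dict:
    """Fraction of each actual label that was predicted as that same label."""
    totals = row_totals(matrix)
    return {
        actual: matrix[actual].get(actual, 0.0) / totals[actual]
        for actual in matrix
        if totals[actual] > 0
    }


totals = row_totals(confusion_matrix)
accuracy = per_label_accuracy(confusion_matrix)
print(totals)  # {'French': 12.5, 'Spanish': 7.0, 'English': 22.5}
print({label: round(value, 3) for label, value in accuracy.items()})
```

A label with a low fraction here is one the model often confuses with another, which is the signal the text suggests using when revising training data.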
GET https://www.googleapis.com/prediction/v1.3/training/bucket%2Fobject
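In the URL above, the slash between bucket and object appears percent-encoded as `%2F` so that the model name occupies a single path segment. A minimal sketch of producing that path with Python's standard library follows; `training_url` is a hypothetical helper name.

```python
from urllib.parse import quote

BASE_URL = "https://www.googleapis.com/prediction/v1.3"


def training_url(bucket: str, obj: str) -> str:
    """Build the training resource URL, encoding '/' in the model name as %2F."""
    model_id = f"{bucket}/{obj}"
    # safe="" forces quote() to encode the slash instead of leaving it as-is.
    return f"{BASE_URL}/training/{quote(model_id, safe='')}"


print(training_url("mybucket", "myobject"))
# https://www.googleapis.com/prediction/v1.3/training/mybucket%2Fmyobject
```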
Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction
prediction.training.insert (AUTHENTICATED)
Asynchronous request to train your model.
Invoke training on your data by sending a POST request as described below. Note that each time you call this method, it will clear out any existing model with the same name. After making this request, you must call prediction.training.get() to check training status and determine when training is complete.
You must have read permission on the Google Storage object that holds your training data. By default, a Google Storage object grants read access only to the object creator. See the Google Storage documentation to learn how to read or modify object ACLs.
Request data is a Training resource with the following properties:
- id - The bucket/object name of the model.
- utility [ Optional, categorical models only ] - Assigns a numeric weight to one or more categories in the training data. The purpose of this property is to prevent false positives by assigning a relative weight to specific categories, where the higher the value, the higher the cost of mislabeling something that is actually in that category as something else. For example, in a spam identification model, identifying some spam as non-spam is relatively lower cost than identifying some non-spam as spam. Therefore you would include a utility property with the following value (assuming your non-spam examples have the label 'not_spam'): 'utility':[{'not_spam':5.0}]. Unlisted labels receive a default weight of 1.0, so the previous example would assign 'spam' a utility value of 1.0.
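A training request body with the two properties above might be assembled as in this sketch. The function name `training_request_body` is hypothetical; the list-of-single-key-objects layout for `utility` follows the format shown in the Training resource table.

```python
import json
from typing import Optional


def training_request_body(bucket: str, obj: str,
                          utility: Optional[dict] = None) -> str:
    """Build a Training resource body with an id and optional utility weights."""
    body = {"id": f"{bucket}/{obj}"}
    if utility:
        for label, weight in utility.items():
            # Utility values are described as positive double-precision numbers.
            if weight <= 0:
                raise ValueError(f"utility for {label!r} must be positive")
        # One single-key object per label, e.g. [{"not_spam": 5.0}].
        body["utility"] = [{label: weight} for label, weight in utility.items()]
    return json.dumps(body)


print(training_request_body("mybucket", "myobject", {"not_spam": 5.0}))
# {"id": "mybucket/myobject", "utility": [{"not_spam": 5.0}]}
```

Labels left out of the mapping are simply omitted from the body, which relies on the documented default weight of 1.0.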
Training requests are asynchronous; if successful, the request returns immediately with the following reply, indicating that training has begun. Check training status to learn when training is complete. Training can take up to 10 minutes, depending on the complexity and size of the data, but will typically take less time. A successful response is a simple echo of the data location as shown here:
{ "kind":"prediction#training", "id":"bucket/object", "selfLink":"https://www.googleapis.com/prediction/v1.3/URL_of_resource" }
Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction
prediction.training.update (AUTHENTICATED)
[ Categorical models only ] Streaming training: trains a previously trained model against a new example. This is useful if you have a regular stream of new information that you'd like to add to your model as it becomes available, rather than having to recompile, re-upload, and retrain with batches of new data. The model is not retrained each time it receives a new example; rather, it retrains after every N new examples have been added, where N is a small number.
Note that the system may weight newer streamed examples more than earlier examples. If you do not want this, you should add the examples to your training data and retrain the system against all the data by calling prediction.training.insert().
Note: If you retrain a model against its original training data file, all the streamed data will be lost.
The request takes a JSON object with the following parameters:
{ "classLabel": my_label, "csvInstance": [ col1, col2, ..., colN ] }
- classLabel - The category label to assign to this example. Only category examples can be streamed to an existing model.
- csvInstance - The example data as an array of columns, in the same format as the CSV file.
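If new examples arrive as rows in the same layout as the training file, a streaming update body could be derived from each row roughly as follows. This sketch rests on two assumptions that the text above does not state: that the label is the first CSV column, and that numeric-looking fields should be sent as JSON numbers. The function name is hypothetical.

```python
import csv
import json


def update_body_from_csv_row(line: str) -> str:
    """Turn one CSV row (assumed: label first, then features) into an update body."""
    row = next(csv.reader([line]))
    label, raw_features = row[0], row[1:]
    features = []
    for value in raw_features:
        try:
            # Assumption: numeric-looking fields are numeric features.
            features.append(float(value))
        except ValueError:
            features.append(value)
    return json.dumps({"classLabel": label, "csvInstance": features})


print(update_body_from_csv_row('"French","Bonjour tout le monde"'))
# {"classLabel": "French", "csvInstance": ["Bonjour tout le monde"]}
```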
PUT https://www.googleapis.com/prediction/v1.3/training/bucket%2Fobject
Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction
prediction.training.delete (AUTHENTICATED)
Deletes a trained model. Only the user who inserted (trained) a model can delete it.
If successful, an empty response is returned. Otherwise, an appropriate HTTP or Prediction API error will be returned.
DELETE https://www.googleapis.com/prediction/v1.3/training/bucket%2Fobject
Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction
prediction.training.predict (AUTHENTICATED)
Run a prediction request against your model.
Input data is an object with the following syntax:
{ "input":{ "csvInstance":[ col1_value, col2_value, ... ] } }
Where col1_value, col2_value, and so on are entity features, matching the feature columns of your training data. Note that string fields must be surrounded by escaped quotes.
Here's an example request to a model that predicts a person's height, if the model expects a string gender ("M" or "F"), two height numbers, and a string country name:
{ "input":{ "csvInstance":["M", 1.59, 1.51,"France"] } }
Notes on categorical model response scores:
- Score values range from 0.0–1.0, with 1.0 being the highest. All values should add up to 1.0. Note: if you used an earlier version of the API that used a different range, you must retrain your data model in order for scores to be scaled to 0.0–1.0.
- Consider having a cutoff value above which the categorization is useful and below which you might ignore it. We can't advise a hard cutoff value; instead, try running a few queries for borderline items, and use that as an approximate cutoff value for your categories.
- These values are not probabilities; that is, they are not the confidence that a rating is correct. They are a measure of how closely a category seems to conform to the query item.
- Scores are relative to each other, and do not need to add up to a specific value (for example, to 1.0).
- It is hard to say absolutely what is a significant difference in scores. For example, is 0.42 "significantly" better than 0.33? Is 0.25 "twice as good" as 0.125? Instead, assume that the highest value is the best fit, and choose a cutoff value below which you won't use the best fit. You'll have to experiment with the system to determine what is a meaningful cutoff value for your data.
POST https://www.googleapis.com/prediction/v1.3/training/mybucket%2Fmyobject/predict
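Putting the URL and body together, a predict request could be assembled as in the sketch below. The request object is built but never sent. The `Authorization: Bearer` header is an assumption about how the access token is presented (this guide only states that a token with the prediction scope is required), and `build_predict_request` is a hypothetical helper.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://www.googleapis.com/prediction/v1.3"


def build_predict_request(model_id: str, features: list,
                          access_token: str) -> urllib.request.Request:
    """Assemble (but do not send) a POST request for training.predict."""
    url = f"{BASE_URL}/training/{quote(model_id, safe='')}/predict"
    body = json.dumps({"input": {"csvInstance": features}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the token is passed as an OAuth 2.0 bearer token.
            "Authorization": f"Bearer {access_token}",
        },
    )


request = build_predict_request(
    "mybucket/myobject", ["M", 1.59, 1.51, "France"], "ACCESS_TOKEN"
)
print(request.get_method(), request.full_url)
```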
{ "kind": "prediction#output", "id": string, "selfLink": string, "outputLabel": string, "outputMulti": [ { "label": string, "score": double } ], "outputValue": double }
Property Name | Value | Description |
---|---|---|
kind | string | What kind of resource this is. |
id | string | The name of the model. |
selfLink | string | A URL to re-request this resource. |
outputLabel | string | [ Present in categorical models only ] A predicted value for the submitted item, calculated based on given values in the training data. |
outputMulti[] | list | [ Present in categorical models only ] The results, with one entry for every category in the training table, along with a score assigned to that category. The largest score is the most likely match. A value will be returned for every category present in the training data; you cannot currently specify how many categories to return. See the notes above in the method description. |
outputMulti[].label | string | The category being described. |
outputMulti[].score | double | A score associated with this category; the largest score is the most likely. See the notes above. |
outputValue | double | [ Present in regression models only ] The estimated value for the submitted data. |
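Because the populated fields differ by model type, client code may want to branch on which ones are present. The following is a minimal sketch with made-up response values; `summarize_output` is a hypothetical helper.

```python
def summarize_output(output: dict) -> str:
    """Describe a prediction#output resource as a short string."""
    if "outputValue" in output:
        # Regression models return a single numeric estimate.
        return f"regression value: {output['outputValue']}"
    if "outputLabel" in output:
        # Categorical models return the chosen label plus per-category scores.
        ranked = sorted(output.get("outputMulti", []),
                        key=lambda entry: entry["score"], reverse=True)
        scores = ", ".join(f"{e['label']}={e['score']:.2f}" for e in ranked)
        return f"label: {output['outputLabel']} ({scores})"
    raise ValueError("resource has neither outputLabel nor outputValue")


# Illustrative responses only.
print(summarize_output({"kind": "prediction#output", "outputValue": 1.62}))
print(summarize_output({
    "kind": "prediction#output",
    "outputLabel": "spam",
    "outputMulti": [
        {"label": "not_spam", "score": 0.2},
        {"label": "spam", "score": 0.8},
    ],
}))
```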
Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction