Get started with the Censius Python SDK
To install the latest Python package, you can simply use pip:

```bash
pip install censius
```

You can also install a specific version of the `censius` package:

```bash
pip install censius==VERSION_HERE
```
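To confirm which version is installed, here is a quick check using only the Python standard library (nothing Censius-specific is assumed):

```python
from importlib import metadata

# Query the installed distribution's version without importing the package itself.
print(metadata.version("censius"))
```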
Initialisation
CensiusClient()
Prerequisite: create a Project from the dashboard.
The first step is to instantiate a `CensiusClient` object with an API key and a project ID. These credentials are used to authenticate every call made through the SDK.

- `api_key`: can be found in the console navigation bar (on the left side).
- `project_id`: can be found on the project list page; each project is associated with an integer ID.

```python
from censius.ml import CensiusClient, ModelType, DatasetType, ExplanationType, Dataset

client = CensiusClient(api_key=YOUR_API_KEY, project_id=YOUR_PROJECT_ID)
```
register_dataset()
You can use the `register_dataset` API to register a dataset to the Censius platform. Download the Titanic dataset used in the example here.
| Arguments | Type | Description | |
| --- | --- | --- | --- |
| name | string | The name of the dataset | Required |
| file | dataframe | File that stores the feature values. Currently, only CSV files are supported. | Required |
| features | list<dict> | The list of columns for the dataset: a list of dictionaries, each containing two keys, `name` and `type`. Valid values of `type` are `DatasetType.STRING`, `DatasetType.INT`, `DatasetType.BOOLEAN`, and `DatasetType.DECIMAL`. Optional keys for categorical variable representation: `categorical` (boolean) and `category_map` (a map of encoded key to actual value). | Required |
| timestamp | dict | The default timestamp column to be processed, if it is part of the dataset. The accepted timestamp type is `DatasetType.UNIX_MS`, which represents UNIX format in milliseconds. | Optional |
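The table above documents a `timestamp` argument, but the example below does not pass one. As an illustration only, a plausible shape mirroring the `name`/`type` pattern used for feature entries would be the following; treat the exact dict keys as an assumption, not confirmed API:

```python
# Assumed shape of the optional timestamp argument, following the
# name/type convention used for feature entries (unverified assumption).
timestamp = {"name": "event_time", "type": DatasetType.UNIX_MS}
```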
Dataset feature and target variable names are only accepted if they satisfy the following constraint: lowercase alphanumeric characters and underscores (e.g. age_in_years).
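A quick way to check names against this constraint before registering; the regex is ours, not part of the SDK:

```python
import re

# Lowercase alphanumeric characters and underscores only, per the constraint above.
VALID_NAME = re.compile(r"[a-z0-9_]+\Z")

def is_valid_name(name: str) -> bool:
    return bool(VALID_NAME.match(name))

assert is_valid_name("age_in_years")
assert not is_valid_name("Age-In-Years")
```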
```python
import pandas as pd
from datetime import datetime

dataframe_object = pd.read_csv("training-titanic.csv")

datasetDetails = client.register_dataset(
    name="titanic_dataset",
    file=dataframe_object,
    features=[
        {"name": "age_in_years", "type": DatasetType.INT},
        {
            "name": "gender",
            "type": DatasetType.INT,
            "categorical": True,
            "category_map": {0: "male", 1: "female"},
        },
        {"name": "pclass", "type": DatasetType.INT},
        {"name": "sibsp", "type": DatasetType.INT},
        {"name": "parch", "type": DatasetType.INT},
        {"name": "fare", "type": DatasetType.DECIMAL},
        {"name": "survived", "type": DatasetType.INT},
    ],
)
datasetId = datasetDetails["dataset_id"]
```
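If you don't have the CSV handy, a minimal stand-in DataFrame with the same columns is enough to experiment with; the values below are made up:

```python
import pandas as pd

# Hypothetical rows matching the registered schema above.
dataframe_object = pd.DataFrame(
    {
        "age_in_years": [23, 41],
        "gender": [0, 1],
        "pclass": [3, 1],
        "sibsp": [0, 1],
        "parch": [0, 0],
        "fare": [6.975, 71.2833],
        "survived": [1, 0],
    }
)
```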
register_model()
You can use this API to register a new model to the Censius platform. For subsequent updates to the model with new versions, `register_new_model_version()` should be called.
| Arguments | Type | Description | |
| --- | --- | --- | --- |
| model_id | string | The ID of the model | Required |
| model_name | string | The name of the model | Required |
| model_type | enum | The type of the model's targets. Currently supported values are `ModelType.BINARY_CLASSIFICATION` and `ModelType.REGRESSION`. | Required |
| model_version | string | A string representing the version of the model | Required |
| training_info | dict | Records the ID of the dataset the model is trained on | Required |
| targets | list<string> | The columns the model predicts | Required |
| features | list<string> | The columns the model uses to predict the targets | Required |
```python
uniq_key = datetime.today().strftime('%M%H%d')
input_features = ["age_in_years", "gender", "pclass", "sibsp", "parch", "fare"]
target_feature = ["survived"]
model_id = "titanic_model_" + uniq_key  # must be unique across the tenant

modelDetails = client.register_model(
    model_id=model_id,
    model_name="titanic model",
    model_type=ModelType.BINARY_CLASSIFICATION,
    model_version="v1",
    training_info={"method": Dataset.ID, "id": datasetId},
    targets=target_feature,
    features=input_features,
)
modelId = modelDetails["userDefinedModelID"]
modelVersion = modelDetails["version"]
```
`model_id` must be unique across an entire tenant. The combination of `model_id` and `model_version` must be unique as well.

register_new_model_version()
You can use this API to add a new version to an existing model, for example, a "v2" of the model.
| Arguments | Type | Description | |
| --- | --- | --- | --- |
| model_id | string | The ID of the model | Required |
| model_version | string | A string representing the version of the model | Required |
| training_info | dict | Records the ID of the dataset the model is trained on | Required |
| targets | list<string> | The columns the model predicts | Required |
| features | list<string> | The columns the model uses to predict the targets | Required |
```python
newVersion = "v2"
client.register_new_model_version(
    model_id=modelId,
    model_version=newVersion,
    training_info={"method": Dataset.ID, "id": datasetId},
    targets=["survived"],
    features=["gender", "pclass", "sibsp"],
)
```
`model_version` must be unique here, as you are registering a new version of an existing model.

Logging predictions, features, and explanations
log()
This function logs individual predictions and features (and, optionally, actuals). It can be integrated into the production environment to log these values as predictions are made.
| Arguments | Type | Description | |
| --- | --- | --- | --- |
| prediction_id | string | The ID of this prediction log. This can be used to update the actual of this log later. | Required |
| model_id | string | The model ID against which you want to log the prediction | Required |
| model_version | string | The version of the model against which you want to log the prediction | Required |
| features | dict | A dict with feature names as keys and processed feature values as values. | Required |
| prediction | dict | A dictionary with target names as keys and, as values, a dict containing a `label` key and, optionally, a `confidence` key. For example, `"Loan Status": {"label": 2, "confidence": 0.2}` | Required |
| timestamp | int | UNIX epoch timestamp in milliseconds, or `time.time.now()` to indicate the current time. | Required |
| actual | dict | A dictionary containing the actual for the prediction log. The keys are the target features and the values are the ground truth values of the feature. | Optional |
When using `time.time.now()`, remember that the time is calculated in UTC on the client side, not the server side.

Logging a single prediction
```python
import time

predictionId = "<unique_id>"  # should be a random or system-assigned unique ID

client.log(
    prediction_id=predictionId,
    model_id=modelId,
    model_version=modelVersion,
    features={
        "age_in_years": 23.0,
        "gender": 0,
        "pclass": 3,
        "sibsp": 0,
        "parch": 0,
        "fare": 6.975,
    },
    prediction={"survived": {"label": 1, "confidence": 1}},
    timestamp=int(round(time.time() * 1000)),
)
```
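The SDK only requires `prediction_id` to be unique; one convenient way to generate such an ID (our choice, not a Censius requirement) is a UUID:

```python
import uuid

# Random, effectively collision-free prediction ID.
predictionId = str(uuid.uuid4())
```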
Logs are currently aggregated every 60 minutes by default. This can be changed in custom deployments; reach out to us if you need a different frequency.
log_actual()
If the actual wasn't available when `log()` was called, it can be updated at a later time using `log_actual()`. This can be the case for certain types of models where the ground truth isn't immediately available.

| Arguments | Type | Description | |
| --- | --- | --- | --- |
| prediction_id | int | The prediction ID against which you want to update the actual | Required |
| actual | dict | A dictionary containing the actual to be updated. The keys are the target feature headings and the values are the ground truth values of the feature. | Required |
| model_id | string | The model ID of the prediction for which you need to update the actual | Required |
| model_version | string | The model version of the prediction for which you need to update the actual | Required |
Keys in the `actual` attribute should match the `target` attribute of the model. For example, if your model target column is `Loan`, the `actual` attribute should be a dict of the format `{"Loan": ACTUAL_VALUE}` when updating the actual.
```python
client.update_actual(
    prediction_id=predictionId,
    model_id=modelId,
    model_version=modelVersion,
    actual={
        "survived": 1,
    },
)
```
log_explanations()
You can use this API to log explanation values for an existing prediction log.
| Arguments | Type | Description | |
| --- | --- | --- | --- |
| prediction_id | int | The prediction ID against which you want to log the explanation | Required |
| model_id | string | The model ID of the prediction for which you are logging the explanation | Required |
| model_version | string | The model version of the prediction for which you are logging the explanation | Required |
| explanation_type | enum | The type of explanation. Currently supports `ExplanationType.SHAP` | Required |
| explanation_values | dict | A dictionary containing features and their explanations. The keys are the feature names and the values are the explanation values. | Required |
```python
client.log_explanation(
    prediction_id=predictionId,
    model_id=modelId,
    model_version=modelVersion,
    explanation_type=ExplanationType.SHAP,
    explanation_values={
        "age_in_years": 0.467479,
        "gender": 0.038536,
        "pclass": 0.665614,
        "sibsp": 0.607935,
        "parch": 0.240294,
        "fare": -0.522526,
    },
)
```
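The SDK does not compute explanation values for you. As a sketch of how you might produce them with the open-source `shap` package (not part of the Censius SDK; `model` and `row` are assumed to be your fitted tree-based classifier and a single-row feature DataFrame):

```python
import shap

# Build an explainer for a fitted tree-based model (e.g. RandomForestClassifier).
explainer = shap.TreeExplainer(model)

# SHAP values for one prediction; depending on your shap version, binary
# classifiers may return one array per class, so take the positive class.
shap_values = explainer.shap_values(row)
positive_class_values = shap_values[1] if isinstance(shap_values, list) else shap_values

explanation_values = dict(zip(row.columns, positive_class_values[0]))
```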
Bulk Log Insertion
bulk_log()
This function enables you to send prediction, actual, and explanation logs in bulk. It can be integrated into a production environment where you collect the model logs and send them all together in a single insertion call (for example, at a once-a-day frequency).
A `bulk_log` call has to match one of the following cases (a sketch of case 5 follows this list):
1. All log details (predictions, actuals, and explanations) present in the `bulk_log` call.
2. A combination of predictions and explanations.
3. Just the predictions.
4. A combination of actuals and explanations (the predictions have to be sent prior).
5. Just the actuals (the predictions have to be sent prior).

Note: Logging just the explanations is not accepted.
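As a sketch of case 5, a delayed-actuals call might look like this; the DataFrame contents and column names are illustrative, and the prediction IDs must already have been logged:

```python
import pandas as pd

# Hypothetical ground-truth values arriving after the predictions were logged.
actuals_df = pd.DataFrame(
    {
        "log_id": ["pred_001", "pred_002"],  # same IDs used when logging predictions
        "actual_survived": [1, 0],           # ground-truth labels
    }
)

client.bulk_log(
    input=actuals_df,
    prediction_id_column="log_id",
    model_id=modelId,
    model_version=modelVersion,
    actuals="actual_survived",
)
```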
| Arguments | Type | Description | |
| --- | --- | --- | --- |
| input | Pandas DataFrame | Pandas DataFrame of bulk logs containing prediction, actual, and explanation values. | Required |
| model_id | string | The model ID against which you want to log the bulk insertion. | Required |
| model_version | string | The version of the model for which you want to log the bulk insertion. | Required |
| prediction_id_column | string | Name of the ID column in the input DataFrame. The values of this column must be NOT NULL and unique. | Required |
| predictions | object | A `Prediction.Tabular` object that collects information about the prediction and feature columns in the input DataFrame. More details in the Prediction.Tabular table below. | Optional |
| actuals | string | Name of the column in the input DataFrame which holds the actual values. | Optional |
| explanations | object | An `Explanation.Tabular` object that collects information about the explanation values, explanation type, and feature columns in the input DataFrame. More details in the Explanation.Tabular table below. | Optional |
Prediction.Tabular
| Arguments | Type | Description | |
| --- | --- | --- | --- |
| timestamp_column | timestamp | Name of the column which specifies the timestamp for each prediction in the input DataFrame. | Required |
| prediction_column | string | Name of the column which specifies the prediction values in the input DataFrame. This column must be NOT NULL. | Required |
| prediction_confidence_column | float | Name of the column which specifies the prediction confidence (score) value in the input DataFrame. This column must be NOT NULL. | Required |
| features | list<object> | List of objects mapping registered features to column names in the input DataFrame. Example: `{"feature": "Age", "input_column": "age_in_years"}`. Here, "Age" is the feature name used while registering the model and "age_in_years" is the column in the DataFrame which holds the "Age" feature values in the bulk logs. | Optional |
Explanation.Tabular
| Arguments | Type | Description | |
| --- | --- | --- | --- |
| type | enum | The type of explanation. Currently supports `ExplanationType.SHAP` | Required |
| explanation_mapper | list<object> | List of objects mapping registered features to column names in the input DataFrame. Example: `{"feature": "Age", "input_column": "age_shap"}`. Here, "Age" is the feature name used while registering the model and "age_shap" is the column in the DataFrame which holds the SHAP values of the "Age" feature in the bulk logs. | Required |
```python
from censius.ml import CensiusClient, Prediction, Explanation, ExplanationType
import pandas as pd

BULK_LOG_CSV_PATH = "<path-to-csv>"
bulk_log_data = pd.read_csv(BULK_LOG_CSV_PATH)

client.bulk_log(
    input=bulk_log_data[:],
    prediction_id_column="log_id",
    model_id=modelId,
    model_version=modelVersion,
    predictions=Prediction.Tabular(
        timestamp_column="timestamp",
        prediction_column="prediction_survived",
        prediction_confidence_column="prediction_confidence",
        features=[
            {"feature": "age_in_years", "input_column": "age_in_years"},
            {"feature": "gender", "input_column": "gender"},
            {"feature": "pclass", "input_column": "pclass"},
            {"feature": "sibsp", "input_column": "sibsp"},
            {"feature": "parch", "input_column": "parch"},
            {"feature": "fare", "input_column": "fare"},
        ],
    ),
    actuals="actual_survived",
    explanations=Explanation.Tabular(
        type=ExplanationType.SHAP,
        explanation_mapper=[
            {"feature": "age_in_years", "input_column": "age_shap"},
            {"feature": "gender", "input_column": "gender_shap"},
            {"feature": "pclass", "input_column": "pclass_shap"},
            {"feature": "sibsp", "input_column": "sibsp_shap"},
            {"feature": "parch", "input_column": "parch_shap"},
            {"feature": "fare", "input_column": "fare_shap"},
        ],
    ),
)
```
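Since `prediction_id_column` values must be non-null and unique, a quick pre-flight check on the DataFrame can save a failed call; the assertions are ours, not an SDK feature:

```python
# Validate the ID column before calling bulk_log.
ids = bulk_log_data["log_id"]
assert ids.notna().all(), "prediction IDs must be NOT NULL"
assert ids.is_unique, "prediction IDs must be unique"
```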
Updating model metadata
update_model_iteration()
If a model is retrained in the production environment, you can use this function to mark the start and end time of the production data that the model was retrained on. The newly added model iteration will be registered along with the provided metadata. These iteration details are helpful when doing Root Cause Analysis for violations raised by monitors. Iterations are represented across monitors in the edit monitors section.
| Arguments | Type | Description | |
| --- | --- | --- | --- |
| model_id | string | The ID of the model for which you want to update the metadata | Required |
| model_version | string | The version of the model for which you want to update the metadata | Required |
| model_iteration_id | string | Client-side ID which acts as a correlation ID on the Censius dashboard. | Optional |
| release_datetime | int | UNIX epoch timestamp in milliseconds conveying the release time of the model (example: 946684800000) | Required |
| dataset_start_datetime | int | UNIX epoch timestamp in milliseconds conveying the training dataset start time (example: 946684800000) | Required |
| dataset_end_datetime | int | UNIX epoch timestamp in milliseconds conveying the training dataset end time (example: 946684800000). Note: dataset_end_datetime must be greater than dataset_start_datetime. | Required |
| training_accuracy | float | Training accuracy of the current iteration. Must be between 0 and 100. | Optional |
| validation_accuracy | float | Validation accuracy of the current iteration. Must be between 0 and 100. | Optional |
| area_under_curve | float | AUC of the current iteration. Must be between 0 and 1. | Optional |
| samples_count | int | Number of samples the current iteration was trained on. | Optional |
```python
import time

timestampMS = int(round(time.time() * 1000))

client.update_model_iteration(
    model_id=modelId,
    model_version=modelVersion,
    release_datetime=timestampMS,
    dataset_start_datetime=1671517394000,
    dataset_end_datetime=1672726994000,
    training_accuracy=95.0,    # 0-100 scale, per the table above
    validation_accuracy=95.0,  # 0-100 scale, per the table above
    area_under_curve=0.24,     # 0-1 scale
    samples_count=100,
)
```
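If you track the dataset window as datetime objects, here is one way to convert them to the epoch-millisecond integers this API expects; the helper is ours, not part of the SDK:

```python
from datetime import datetime, timezone

def to_epoch_ms(dt: datetime) -> int:
    """Convert a datetime to UNIX epoch milliseconds, treating naive values as UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

start_ms = to_epoch_ms(datetime(2022, 12, 20))
end_ms = to_epoch_ms(datetime(2023, 1, 3))
assert end_ms > start_ms  # dataset_end_datetime must exceed dataset_start_datetime
```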