Neuton
Explainability Office ®
When building machine learning solutions, being able to interpret a model and its predictions is just as important as evaluating their quality.

That is why we created the Explainability Office, a unique set of tools that lets users evaluate model quality at every stage, trace the logic behind the model's analysis, and so understand why certain predictions were made.
Interpretation
To expose the decision-making process and identify internal patterns, we simulate the output of the model across the full range of input variable values and present the result as comprehensible slices of the multidimensional space. We also rank these slices by their influence on each specific prediction.
Model Interpreter
The Model Interpreter is a tool that visualizes the logic of the model and the direction and effect of changes in individual variables. It also shows the importance of these variables in relation to the target variable.
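As a rough illustration of the slicing idea, the sketch below sweeps one input over a grid while holding the other inputs of a single row fixed, then ranks features by how far the output moves. The names `one_dimensional_slice`, `rank_by_influence` and `toy_model` are hypothetical, and ranking by output range is an assumption; the platform's actual method is not described here.

```python
def one_dimensional_slice(predict, row, feature, grid):
    """Model output as `feature` sweeps over `grid`; other inputs stay fixed."""
    return [(value, predict({**row, feature: value})) for value in grid]


def rank_by_influence(predict, row, grids):
    """Order features by the output range of their slice (assumed influence measure)."""
    spans = {}
    for feature, grid in grids.items():
        outputs = [y for _, y in one_dimensional_slice(predict, row, feature, grid)]
        spans[feature] = max(outputs) - min(outputs)
    return sorted(spans.items(), key=lambda item: item[1], reverse=True)


def toy_model(x):
    # Hypothetical stand-in for a trained model.
    return 2.0 * x["rooms"] + 0.5 * x["area"] - 0.25 * x["age"]


row = {"rooms": 3, "area": 70, "age": 20}
area_slice = one_dimensional_slice(toy_model, row, "area", [50, 60, 70, 80])
ranking = rank_by_influence(
    toy_model,
    row,
    {"rooms": [1, 2, 3, 4, 5], "area": [50, 60, 70, 80], "age": [0, 10, 20, 30, 40]},
)
```

Here `area_slice` traces how the prediction for this one row responds to `area`, and `ranking` lists the features from most to least influential for that row under the assumed measure.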
Feature Importance Matrix (FIM)
After the model has been trained, the platform displays a chart of the 10 features that had the most significant impact on the model's predictive power. You can also select any other features to see their importance. FIM has two modes, displaying either only the original features or the features produced by feature engineering. For classification tasks, you can see the feature importance for every class.
Quality
Evaluate model quality at every stage:
Model Lifecycle: Data
Exploratory Data Analysis (EDA)
Exploratory Data Analysis (EDA) is a tool that automates graphical data analysis of a training dataset. It highlights the most important statistics for each individual variable, for the dataset as a whole, for the relationships between variables, and for each variable's relation to the target variable. Given the potentially wide feature space, up to 20 of the most important features with the highest statistical significance are selected for EDA based on machine learning modeling.
Model Lifecycle: Training
Model Quality Diagram
The Model Quality Diagram simplifies the evaluation of model quality by letting users view the model through several metrics simultaneously in a single graphical view. We offer an extensive list of quality metrics that are popular in the data science community.
Model Lifecycle: Prediction
In addition to well-known indicators of model quality (e.g. probability and credibility interval), we calculate a set of additional row-level explainability indicators:
Confidence Interval
For regression problems, the Confidence Interval shows the range within which the predicted value may vary, and with what probability.
Model-to-Data Relevance Indicator
The Model-to-Data Relevance Indicator calculates the statistical differences between the data uploaded for predictions and the data used for model training. Significant differences in the data may indicate metric decay (degradation of the model's prediction quality).
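To make "statistical differences" concrete, here is a minimal sketch that compares one numeric feature between the training data and new data using the two-sample Kolmogorov-Smirnov statistic. The choice of this statistic is an assumption for illustration only; the text does not say which measure the indicator actually uses, and `ks_statistic` and the sample values are hypothetical.

```python
import bisect


def ks_statistic(train, new):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    a, b = sorted(train), sorted(new)
    gap = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


train_age = [22, 25, 31, 35, 40, 44, 52, 58]
similar_age = [23, 27, 30, 36, 41, 45, 50, 57]   # resembles the training data
shifted_age = [55, 60, 62, 65, 70, 71, 75, 80]   # drawn from an older population

drift_similar = ks_statistic(train_age, similar_age)
drift_shifted = ks_statistic(train_age, shifted_age)
```

A value near 0 suggests the new data looks like the training data for that feature, while a value near 1 suggests a shift large enough that prediction quality may suffer.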
Model Lifecycle: Application
Historical Model-to-Data Relevance Indicator
Historical Model-to-Data Relevance is an excellent signal that a model should be retrained. The indicator is also available for downloadable models, which makes it possible to manage the model lifecycle even outside the platform.
COMING SOON
Validate Model on New Data
Validate Model on New Data shows model metrics on new data to help determine whether the model should be retrained to reflect the statistical changes and dependencies in new data. It also shows metrics in multidimensional space (Model Quality Diagram).
The «Exploratory Data Analysis» tool presents its charts in the following sections:
Dataset overview
This section displays brief data statistics of your training dataset and provides the following information: problem type, dataset dimension and number of missing values recorded.
Continuous data distribution and relation to the target variable
Visualization of each continuous variable yields two plots:
Variable density distribution chart
A Density plot visualizing the distribution of data across all rows in the dataset.
This chart is a variation of a histogram that uses kernel smoothing to plot values, which suppresses noise and produces a smoother distribution curve. The peaks of a density plot show where values are concentrated over the interval.
Feature relation to the target variable (different for regression and classification task types)
This chart is presented in one of two formats, depending on task type: a line chart showing how the continuous variable changes as the continuous target variable changes (regression task type) or a histogram showing the mean value of the continuous variable for each class of the target variable (classification task type).
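The kernel smoothing behind the density chart described above can be sketched as a Gaussian kernel density estimate. This is a minimal illustration, assuming a Gaussian kernel and a hand-picked `bandwidth`; the kernel and bandwidth the platform uses are not stated, and `gaussian_kde` and the sample values are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Each sample contributes a small Gaussian bump centred on its value.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


values = [1.0, 1.2, 1.1, 0.9, 1.05, 5.0, 5.2, 4.9]
density = gaussian_kde(values, bandwidth=0.3)

step = 0.1
grid = [i * step for i in range(71)]          # 0.0 .. 7.0
curve = [density(x) for x in grid]
peak_x = grid[curve.index(max(curve))]        # where values are most concentrated
area = sum(curve) * step                      # should be close to 1
```

A larger bandwidth gives a smoother curve with fewer peaks; a smaller one follows the individual data points more closely.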
Discrete data distribution and relation to the target variable
Visualization of each categorical variable yields two plots:
Histogram displaying feature categories count
Feature categories relation to the target variable (different for regression and classification task types)
This chart is presented in one of two formats, depending on task type: a histogram displaying the mean target variable for each feature category (regression task type) or a histogram displaying the count of each target class within each feature category (classification task type).
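The two aggregations behind these category charts can be sketched as follows. The function names and the sample data are hypothetical and serve only to show what each histogram summarizes.

```python
from collections import Counter, defaultdict


def mean_target_by_category(categories, targets):
    """Regression case: mean target value for each feature category."""
    sums, counts = defaultdict(float), Counter()
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in counts}


def class_counts_by_category(categories, classes):
    """Classification case: count of each target class within each category."""
    table = defaultdict(Counter)
    for category, label in zip(categories, classes):
        table[category][label] += 1
    return {category: dict(counter) for category, counter in table.items()}


colors = ["red", "blue", "red", "green", "blue", "red"]
prices = [10.0, 20.0, 14.0, 30.0, 22.0, 12.0]        # continuous target
labels = ["yes", "no", "yes", "no", "yes", "no"]     # class target

mean_price = mean_target_by_category(colors, prices)
label_counts = class_counts_by_category(colors, labels)
```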
Feature correlations
Visualization of the correlations in the data yields two plots:
Heatmap displaying the pairwise correlation of the 10 most important variables, both with each other and with the target variable (the 10 most important features are selected based on each feature's pairwise correlation with the target variable).
Horizontal histogram displaying the pairs of independent variables that are highly correlated with each other. A pair is included if its mutual correlation exceeds 0.7.
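Selecting the highly correlated pairs can be sketched as below. The 0.7 threshold comes from the text; the use of the Pearson coefficient and of its absolute value are assumptions, and `pearson`, `highly_correlated_pairs` and the sample columns are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def highly_correlated_pairs(columns, threshold=0.7):
    """Feature pairs whose absolute correlation exceeds `threshold`, strongest first."""
    pairs = []
    for (name_a, xs), (name_b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if abs(r) > threshold:
            pairs.append((name_a, name_b, r))
    return sorted(pairs, key=lambda pair: abs(pair[2]), reverse=True)


features = {
    "area": [50, 60, 70, 80, 90],
    "rooms": [2, 2, 3, 3, 4],
    "age": [30, 5, 20, 10, 25],
}
pairs = highly_correlated_pairs(features)
```

With this sample data only the `area` and `rooms` columns move together closely enough to be reported.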
Target variable distribution
Visualization of the target variable statistics is presented in one of two formats:
Violin plot displaying the distribution, median and outliers in the target variable (regression task type).
Histogram/count plot displaying the number and percentage of each of the target classes throughout the whole dataset (classification task type).
Outliers Visualization
Visualization of the outliers in the data is presented as one of two plots:
Scatter plot displaying the variable distribution in relation to the target variable (regression task type);
Box plot displaying the variable distribution, quantiles, median and outliers (classification task type). Outliers are marked in purple, as indicated in the plot legend.
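For reference, a conventional box plot flags points lying more than 1.5 interquartile ranges beyond the quartiles. The sketch below applies that common rule; whether the platform uses the same rule is an assumption, and `iqr_outliers` and the sample readings are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR below the first or above the third quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


readings = [10, 12, 11, 13, 12, 11, 14, 95]
flagged = iqr_outliers(readings)
```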
Time Dependencies
A time dependency plot is created if a date-time column is present in the data. Visualization of time dependency yields three plots, each displaying a line chart of how the target variable changes over time. The charts differ in their level of data aggregation:
Chart 1: No aggregation. Target variable value is plotted against each date point in the data.
Charts 2 and 3: Dynamic aggregation. Data is aggregated by years/months/weeks/days/hours/minutes, with the aggregation options selected automatically based on the data timeframe.
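Aggregating the target over time periods can be sketched as follows. Averaging within each period is an assumption (the text does not say which aggregate is plotted), only some of the listed levels are shown, and `aggregate_mean` and the sample points are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Period label format for each aggregation level covered by this sketch.
PERIOD_FORMATS = {
    "year": "%Y",
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H",
}


def aggregate_mean(points, level):
    """Average the target value of (timestamp, value) points per period."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        buckets[timestamp.strftime(PERIOD_FORMATS[level])].append(value)
    return {
        period: sum(vals) / len(vals) for period, vals in sorted(buckets.items())
    }


points = [
    (datetime(2023, 1, 5, 9), 10.0),
    (datetime(2023, 1, 5, 15), 14.0),
    (datetime(2023, 1, 20, 9), 18.0),
    (datetime(2023, 2, 1, 9), 20.0),
]
by_month = aggregate_mean(points, "month")
by_day = aggregate_mean(points, "day")
```

Coarser levels produce fewer, smoother points, which is why a level suited to the overall data timeframe makes long-term trends easier to read.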
Missing Values Visualization
Missing Values Visualization yields two histograms of the missing values in the data. Each feature is drawn as a bar of equal size, with markers showing the row indexes at which its values are missing.
The «Missing Values Map Overall Dataset» plot displays all the data feature bars without feature names and with missing values percentage indicator. The purpose of this plot is to give an overall visual representation of the missing values in the data.
The «Missing Values Map and percentage» plot displays only the columns that contain missing values, together with their feature names, the percentage of missing values, and the locations of the missing values in the dataset.
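The information shown in the second plot can be sketched as a per-column summary. This is an illustration only: rows are assumed to be dictionaries with `None` marking a missing value, and `missing_summary` and the sample rows are hypothetical.

```python
def missing_summary(rows, columns):
    """For each column with gaps: percentage missing and the affected row indexes."""
    summary = {}
    for column in columns:
        missing_rows = [i for i, row in enumerate(rows) if row.get(column) is None]
        if missing_rows:  # columns without missing values are left out
            summary[column] = {
                "percent": 100.0 * len(missing_rows) / len(rows),
                "rows": missing_rows,
            }
    return summary


rows = [
    {"age": 31, "city": "Oslo", "income": 52000},
    {"age": None, "city": "Rome", "income": None},
    {"age": 45, "city": "Lima", "income": 61000},
    {"age": 28, "city": "Kyiv", "income": None},
]
summary = missing_summary(rows, ["age", "city", "income"])
```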