Dataiku 11.1 update improves Data Science and MLOps

Dataiku has introduced the latest update to its data science and machine learning platform, Dataiku 11.1. This update includes enhancements to existing capabilities as well as new features for data scientists, ML engineers, and analysts.

Dataiku 11 introduced a guided task within VisualML to simplify the development and deployment of time series forecasting models. Update 11.1 now allows users to optimize hyperparameters for these models. According to Dataiku, the optimization uses a k-fold cross-validation strategy that respects time ordering: each validation fold comes after its training set, and the validation folds do not overlap.
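To make the idea of time-ordered folds concrete, here is a minimal sketch in plain Python. It is not Dataiku's implementation; the function name `time_ordered_folds` and the expanding-window layout are assumptions chosen for illustration, and the samples are assumed to be sorted from oldest to newest.

```python
def time_ordered_folds(n_samples, n_folds):
    """Yield (train_indices, validation_indices) pairs that respect time order.

    Hypothetical helper for illustration only. Assumes index 0 is the
    oldest sample and index n_samples - 1 is the newest.
    """
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        # The training window grows with each fold; validation always follows it.
        train_end = n_samples - (n_folds - k + 1) * fold_size
        val_end = train_end + fold_size
        yield list(range(train_end)), list(range(train_end, val_end))


for train, val in time_ordered_folds(10, 3):
    print("train:", train, "validate:", val)
```

Each validation fold starts where its training window ends, so no fold is evaluated on data older than what the model was trained on, and no sample appears in two validation folds.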

When k-fold cross-validation is enabled for binary or multiclass classification tasks, a new stratified option splits samples so that each fold keeps the same class proportions as the entire population, which can be used to eliminate sampling bias during cross-validation. The company says this approach allows users to more accurately model situations seen during forecasting, or when users model past data to make predictions with forward-looking data. There are also new model comparison capabilities for time series models that allow data scientists to compare models on performance metrics, time series resampling settings, feature processing, and training details.
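The stratified option mentioned above can be illustrated with a short sketch in plain Python. It is an assumption-laden illustration rather than Dataiku's code: the helper `stratified_folds` is hypothetical, and it deals samples round-robin per class without shuffling, whereas real implementations usually shuffle first.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds):
    """Assign sample indices to folds so class proportions stay roughly equal.

    Hypothetical helper for illustration only; no shuffling is applied.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        # Deal each class's samples round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Toy data: one third positive, two thirds negative.
labels = ["pos"] * 4 + ["neg"] * 8
for fold in stratified_folds(labels, 4):
    print(sorted(fold), [labels[i] for i in sorted(fold)])
```

With this toy data, every fold ends up with one positive and two negative samples, mirroring the 1:2 ratio of the full set.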

Hyperparameter optimization for prediction models using a k-fold cross-validation strategy is now available. Source: Dataiku

Being able to explain model behavior and troubleshoot unexpected or incorrect predictions is valuable when clarifying model output for stakeholders. The Dataiku platform supports explainability through its VisualML interpretation functionalities, and in 11.1 this has been extended to image classification and object detection models for computer vision users. The What If section now contains a heatmap that highlights which areas of an image had the greatest impact on the model’s prediction. When users hover over the images for each predicted class, the heatmap is overlaid on the image to show exactly which pixels the model focused on for its prediction.

The platform’s explainer features are now also available for externally sourced models brought to Dataiku through the MLflow integration: “Data scientists can now calculate partial dependence to see how the model is affected by values in each variable, subpopulation analysis to track potential outliers on subsets of data and individual explanations for deep diving extreme probabilities,” Dataiku said in a company blog post.
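For readers unfamiliar with partial dependence, the following is a minimal sketch of the general technique in plain Python, not Dataiku's or MLflow's API. The function `partial_dependence`, the toy model, and the feature names `age` and `income` are all hypothetical and exist only to show the calculation: force one feature to a fixed value for every row, average the predictions, and repeat across a grid of values.

```python
def partial_dependence(predict, rows, feature, grid):
    """Return the average prediction when `feature` is forced to each grid value.

    Hypothetical helper for illustration only. `predict` takes one row (a dict)
    and returns a number.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the input rows are left untouched
            modified[feature] = value  # override only the feature under study
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    # Stand-in for a trained model: a simple linear rule.
    return 2.0 * row["age"] + row["income"]


rows = [{"age": 1.0, "income": 10.0}, {"age": 3.0, "income": 20.0}]
curve = partial_dependence(toy_model, rows, "age", [0.0, 1.0, 2.0])
print(curve)
```

The resulting curve rises by 2.0 for each unit of `age`, which matches the coefficient in the toy model and shows how the technique isolates one variable's effect.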

A new heatmap overlay shows which pixels the computer vision model is focused on when making a prediction. Source: Dataiku

For those who used Dataiku’s MLflow integration to import models, the reverse is now possible. Models developed in Dataiku 11.1 can now be exported in the open source MLflow format for ML engineers who wish to deploy models outside of Dataiku. Users can also export Dataiku models directly to Python code for use outside of Dataiku.

Dataiku 11.1 also adds two new chart types for data visualization. There is now a treemap for visualizing relationships between items in categorical and hierarchical data. The second addition is a KPI chart that displays individual summary values as single numbers, with conditional formatting to track KPI progress.

Other platform enhancements include support for additional data link types and table descriptions, improved data exploration, cleaning and exporting, and new coding capabilities. Visit the release notes or a blog by Christina Hsiao of Dataiku to read more about 11.1.

Related items:

Dataiku 11 Release offers an enhanced set of AI tools

Dataiku launches new documentary series ‘AI & Us’.

Dataiku raises $400M in quest to democratize AI
