Quick Start Guide
Get up and running with SuperML Java 2.0.0 in just a few minutes! This guide will walk you through setting up the framework, training your first model with AutoML, and creating professional visualizations.
5-Minute Quickstart with AutoML & Visualization
Step 1: Add Dependency
Complete Framework (Recommended)
<dependency>
<groupId>org.superml</groupId>
<artifactId>superml-bundle-all</artifactId>
<version>2.0.0</version>
</dependency>
Modular Installation (Advanced)
<!-- Core + Linear Models + Visualization -->
<dependency>
<groupId>org.superml</groupId>
<artifactId>superml-core</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.superml</groupId>
<artifactId>superml-linear-models</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.superml</groupId>
<artifactId>superml-visualization</artifactId>
<version>2.0.0</version>
</dependency>
Step 2: AutoML - Your First Model (One Line!)
import org.superml.datasets.Datasets;
import org.superml.autotrainer.AutoTrainer;
import org.superml.visualization.VisualizationFactory;
public class QuickStartAutoML {
public static void main(String[] args) {
// 1. Load a dataset
var dataset = Datasets.loadIris();
// 2. AutoML - One line training!
var result = AutoTrainer.autoML(dataset.X, dataset.y, "classification");
System.out.println("π― Best Algorithm: " + result.getBestAlgorithm());
System.out.println("π Best Score: " + result.getBestScore());
System.out.println("βοΈ Best Parameters: " + result.getBestParams());
// 3. Professional visualization (GUI + ASCII fallback)
VisualizationFactory.createDualModeConfusionMatrix(
dataset.y,
result.getBestModel().predict(dataset.X),
new String[]{"Setosa", "Versicolor", "Virginica"}
).display();
}
}
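Run it the same way as the pipeline example (see Step 4 below), substituting this class name:
mvn compile exec:java -Dexec.mainClass="QuickStartAutoML"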
Step 3: Traditional ML Pipeline with Visualization
import org.superml.datasets.Datasets;
import org.superml.linear_model.LogisticRegression;
import org.superml.preprocessing.StandardScaler;
import org.superml.pipeline.Pipeline;
import org.superml.model_selection.ModelSelection;
import org.superml.metrics.Metrics;
import org.superml.visualization.VisualizationFactory;
public class QuickStartPipeline {
public static void main(String[] args) {
// 1. Load and split data
var dataset = Datasets.loadIris();
var split = ModelSelection.trainTestSplit(dataset.X, dataset.y, 0.2, 42);
// 2. Create ML pipeline
var pipeline = new Pipeline()
.addStep("scaler", new StandardScaler())
.addStep("classifier", new LogisticRegression().setMaxIter(1000));
// 3. Train the pipeline
pipeline.fit(split.XTrain, split.yTrain);
// 4. Make predictions
double[] predictions = pipeline.predict(split.XTest);
double[][] probabilities = pipeline.predictProba(split.XTest);
// 5. Evaluate performance
var metrics = Metrics.classificationReport(split.yTest, predictions);
System.out.println("Accuracy: " + String.format("%.3f", metrics.accuracy));
System.out.println("F1-Score: " + String.format("%.3f", metrics.f1Score));
System.out.println("Class probabilities for first 3 samples:");
for (int i = 0; i < 3; i++) {
System.out.println("Sample " + (i + 1) + ": " + java.util.Arrays.toString(probabilities[i]));
}
// 6. Create professional confusion matrix (GUI or ASCII)
VisualizationFactory.createDualModeConfusionMatrix(
split.yTest,
predictions,
new String[]{"Setosa", "Versicolor", "Virginica"}
).display();
// 7. Predicted vs. actual plot (regression-style view of the class labels)
VisualizationFactory.createRegressionPlot(
split.yTest,
predictions,
"Iris Classification Results"
).display();
}
}
Step 4: Run and See Results
mvn compile exec:java -Dexec.mainClass="QuickStartPipeline"
Expected output (exact values may vary):
Accuracy: 1.000
F1-Score: 1.000
Class probabilities for first 3 samples:
Sample 1: [0.000, 0.020, 0.980]
Sample 2: [0.980, 0.020, 0.000]
Sample 3: [0.000, 1.000, 0.000]
Core Concepts
Estimators
All models implement the Estimator interface:
// Training
model.fit(X, y);
// Prediction
double[] predictions = model.predict(X);
// Parameters
Map<String, Object> params = model.getParams();
model.setParams(params);
Datasets
Built-in datasets for quick experimentation:
// Classification datasets
var iris = Datasets.loadIris();
var wine = Datasets.loadWine();
// Regression datasets
var boston = Datasets.loadBoston();
var diabetes = Datasets.loadDiabetes();
// Synthetic data
var classification = Datasets.makeClassification(1000, 20, 2);
var regression = Datasets.makeRegression(1000, 10);
Model Selection
Split data and validate models:
// Train/test split
var split = ModelSelection.trainTestSplit(X, y, 0.2, 42);
// Cross-validation
double[] scores = ModelSelection.crossValidate(model, X, y, 5);
double meanScore = Arrays.stream(scores).average().orElse(0.0);
Building Pipelines
Chain preprocessing and models together:
import org.superml.pipeline.Pipeline;
import org.superml.preprocessing.StandardScaler;
// Create a pipeline
var pipeline = new Pipeline()
.addStep("scaler", new StandardScaler())
.addStep("classifier", new LogisticRegression());
// Train the entire pipeline
pipeline.fit(X, y);
// Make predictions (automatically applies preprocessing)
double[] predictions = pipeline.predict(X);
Model Evaluation
Comprehensive metrics for model evaluation:
// Classification metrics
double accuracy = Metrics.accuracy(yTrue, yPred);
double precision = Metrics.precision(yTrue, yPred);
double recall = Metrics.recall(yTrue, yPred);
double f1 = Metrics.f1Score(yTrue, yPred);
// Confusion matrix
int[][] confMatrix = Metrics.confusionMatrix(yTrue, yPred);
// Regression metrics
double mse = Metrics.meanSquaredError(yTrue, yPred);
double mae = Metrics.meanAbsoluteError(yTrue, yPred);
double r2 = Metrics.r2Score(yTrue, yPred);
Hyperparameter Tuning
Automatically find the best parameters:
import org.superml.model_selection.GridSearchCV;
// Define parameter grid
Map<String, Object[]> paramGrid = Map.of(
"maxIterations", new Object[]{500, 1000, 1500},
"learningRate", new Object[]{0.001, 0.01, 0.1}
);
// Create grid search
var gridSearch = new GridSearchCV(
new LogisticRegression(), paramGrid, 5);
// Find best parameters
gridSearch.fit(X, y);
// Get results
System.out.println("Best score: " + gridSearch.getBestScore());
System.out.println("Best params: " + gridSearch.getBestParams());
Kaggle Integration
Train models on real Kaggle datasets with one line:
import org.superml.datasets.KaggleTrainingManager;
import org.superml.datasets.KaggleIntegration.KaggleCredentials;
// Setup Kaggle credentials (see Kaggle Integration guide)
var credentials = KaggleCredentials.fromDefaultLocation();
var trainer = new KaggleTrainingManager(credentials);
// Train on any Kaggle dataset
var results = trainer.trainOnDataset("titanic", "titanic", "survived");
// Get best model
var bestResult = results.get(0);
System.out.println("Best algorithm: " + bestResult.algorithm);
System.out.println("Best score: " + bestResult.score);
Available Algorithms
Supervised Learning
Classification:
- LogisticRegression - Binary and multiclass classification
- Ridge - L2 regularized classification (when used with discrete targets)
Regression:
- LinearRegression - Ordinary least squares
- Ridge - L2 regularized regression
- Lasso - L1 regularized regression with feature selection
Unsupervised Learning
Clustering:
- KMeans - K-means clustering with k-means++ initialization (see the sketch below)
Preprocessing
- StandardScaler - Feature standardization (z-score normalization)
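Clustering and preprocessing are the only entries above without a snippet elsewhere in this guide. Here is a minimal sketch, assuming KMeans takes the cluster count in its constructor and that both classes follow the fit/predict/transform style of the estimators shown earlier (the constructor arity and the fitTransform call are assumptions):
import org.superml.cluster.KMeans;
import org.superml.datasets.Datasets;
import org.superml.preprocessing.StandardScaler;
import java.util.Arrays;
public class QuickClustering {
public static void main(String[] args) {
var dataset = Datasets.loadIris();
// Standardize features first; k-means is sensitive to feature scale
var scaler = new StandardScaler();
double[][] XScaled = scaler.fitTransform(dataset.X);
// Cluster into 3 groups (the constructor argument is an assumption)
var kmeans = new KMeans(3);
kmeans.fit(XScaled);
double[] labels = kmeans.predict(XScaled);
System.out.println("First 5 cluster labels: " + Arrays.toString(Arrays.copyOf(labels, 5)));
}
}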
Project Structure
src/main/java/org/superml/
├── core/             # Base interfaces
├── linear_model/     # Linear algorithms
├── cluster/          # Clustering algorithms
├── preprocessing/    # Data preprocessing
├── metrics/          # Evaluation metrics
├── model_selection/  # Cross-validation & tuning
├── pipeline/         # ML pipelines
└── datasets/         # Data loading & Kaggle integration
Next Steps
- Try More Examples: Check out Basic Examples
- Learn Pipelines: Read the Pipeline System guide
- Explore Kaggle: Try Kaggle Integration
- Optimize Models: Learn Hyperparameter Tuning
- Production Ready: Study Performance Optimization
Tips for Success
- Start Simple: Begin with basic models before complex pipelines
- Use Built-in Datasets: Great for learning and testing
- Validate Everything: Always use cross-validation for model evaluation
- Log Performance: Use the logging framework to track training progress (see the sketch after this list)
- Read the Examples: Real code examples are in the examples/ folder
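A minimal sketch of the logging tip, assuming SLF4J (a common Java logging facade) is available on the classpath; the logger name, messages, and timing are illustrative:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.superml.datasets.Datasets;
import org.superml.linear_model.LogisticRegression;
public class TrainingWithLogging {
private static final Logger log = LoggerFactory.getLogger(TrainingWithLogging.class);
public static void main(String[] args) {
var dataset = Datasets.loadIris();
var model = new LogisticRegression().setMaxIter(1000);
log.info("Training LogisticRegression on {} samples", dataset.X.length);
long start = System.currentTimeMillis();
model.fit(dataset.X, dataset.y);
log.info("Training finished in {} ms", System.currentTimeMillis() - start);
}
}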
Ready to build amazing ML applications? Let's go!
Algorithm Quick Examples
Tree-Based Algorithms
// Decision Tree
DecisionTree dt = new DecisionTree("gini", 10);
dt.fit(XTrain, yTrain);
double[] predictions = dt.predict(XTest);
// Random Forest
RandomForest rf = new RandomForest(100, 15);
rf.fit(XTrain, yTrain);
double[] rfPredictions = rf.predict(XTest);
// Gradient Boosting
GradientBoosting gb = new GradientBoosting(100, 0.1, 6);
gb.fit(XTrain, yTrain);
double[] gbPredictions = gb.predict(XTest);
Multiclass Classification
// One-vs-Rest with any binary classifier
LogisticRegression base = new LogisticRegression();
OneVsRestClassifier ovr = new OneVsRestClassifier(base);
ovr.fit(XTrain, yTrain);
// Direct multinomial approach
SoftmaxRegression softmax = new SoftmaxRegression();
softmax.fit(XTrain, yTrain);
double[][] probabilities = softmax.predictProba(XTest);
// Enhanced LogisticRegression (auto multiclass)
LogisticRegression lr = new LogisticRegression().setMultiClass("auto");
lr.fit(XTrain, yTrain); // Automatically handles multiclass
Linear Models
// Logistic Regression
LogisticRegression lr = new LogisticRegression()
.setMaxIter(1000)
.setRegularization("l2")
.setC(1.0);
// Ridge Regression
Ridge ridge = new Ridge()
.setAlpha(1.0)
.setNormalize(true);
// Lasso Regression
Lasso lasso = new Lasso()
.setAlpha(0.1)
.setMaxIter(1000);
30-Second Examples
Binary Classification
var data = Datasets.makeClassification(1000, 10, 2);
var split = DataLoaders.trainTestSplit(data.X,
Arrays.stream(data.y).asDoubleStream().toArray(), 0.2, 42);
RandomForest rf = new RandomForest(50, 10);
rf.fit(split.XTrain, split.yTrain);
System.out.println("Accuracy: " + rf.score(split.XTest, split.yTest));
Multiclass Classification
var data = Datasets.loadIris(); // 3-class problem
var split = DataLoaders.trainTestSplit(data.X,
Arrays.stream(data.y).asDoubleStream().toArray(), 0.3, 42);
SoftmaxRegression softmax = new SoftmaxRegression();
softmax.fit(split.XTrain, split.yTrain);
double[][] probas = softmax.predictProba(split.XTest);
Regression
var data = Datasets.makeRegression(800, 5, 1, 0.1);
var split = DataLoaders.trainTestSplit(data.X, data.y, 0.2, 42);
GradientBoosting gb = new GradientBoosting(100, 0.05, 6);
gb.fit(split.XTrain, split.yTrain);
System.out.println("RΒ² Score: " + gb.score(split.XTest, split.yTest));
Advanced Features Showcase
AutoML with Hyperparameter Optimization
import org.superml.datasets.Datasets;
import org.superml.autotrainer.AutoTrainer;
import org.superml.model_selection.GridSearchCV;
public class AdvancedAutoML {
public static void main(String[] args) {
// Load dataset
var dataset = Datasets.makeClassification(1000, 20, 5, 42);
// Advanced AutoML with custom configuration
var config = new AutoTrainer.Config()
.setAlgorithms("logistic", "randomforest", "gradientboosting")
.setSearchStrategy("random") // or "grid", "bayesian"
.setCrossValidationFolds(5)
.setMaxEvaluationTime(300) // 5 minutes max
.setEnsembleMethods(true);
var result = AutoTrainer.autoMLWithConfig(dataset.X, dataset.y, config);
System.out.println("π Best Model Performance:");
System.out.println(" Algorithm: " + result.getBestAlgorithm());
System.out.println(" CV Score: " + String.format("%.4f", result.getBestScore()));
System.out.println(" Parameters: " + result.getBestParams());
// Get ensemble if available
if (result.hasEnsemble()) {
System.out.println("π€ Ensemble Performance: " +
String.format("%.4f", result.getEnsembleScore()));
}
}
}
Production Inference with Monitoring
import org.superml.inference.InferenceEngine;
import org.superml.persistence.ModelPersistence;
import org.superml.drift.DriftDetector;
public class ProductionInference {
public static void main(String[] args) {
// Load trained model
var model = ModelPersistence.load("my_iris_model.json");
// Setup inference engine
var engine = new InferenceEngine()
.setModelCache(true)
.setPerformanceMonitoring(true)
.setBatchSize(100);
// Register model
engine.registerModel("iris_classifier", model);
// Setup drift monitoring
var driftDetector = new DriftDetector("iris_classifier")
.setThreshold(0.05)
.setAlertCallback(alert -> {
System.out.println("Drift detected: " + alert.getMessage());
});
// Make predictions with monitoring (one iris-like sample; the values are illustrative)
double[][] newData = {{5.1, 3.5, 1.4, 0.2}};
double[] predictions = engine.predict("iris_classifier", newData);
// Monitor for drift
driftDetector.checkDrift(newData, predictions);
System.out.println("Prediction: " + predictions[0]);
System.out.println("Inference time: " + engine.getLastInferenceTime() + " μs");
}
}
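The inference example above loads my_iris_model.json from disk; here is a hedged sketch of how such a file might be produced, assuming ModelPersistence exposes a save(model, path) counterpart to the load(path) call used above:
import org.superml.datasets.Datasets;
import org.superml.linear_model.LogisticRegression;
import org.superml.persistence.ModelPersistence;
public class SaveModelForProduction {
public static void main(String[] args) {
var dataset = Datasets.loadIris();
var model = new LogisticRegression().setMaxIter(1000);
model.fit(dataset.X, dataset.y);
// save(model, path) is assumed symmetric with ModelPersistence.load(path)
ModelPersistence.save(model, "my_iris_model.json");
System.out.println("Model written to my_iris_model.json");
}
}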
Kaggle Competition Integration
import org.superml.kaggle.KaggleTrainingManager;
import org.superml.kaggle.KaggleIntegration.KaggleCredentials;
public class KaggleCompetition {
public static void main(String[] args) {
// Setup Kaggle credentials
var credentials = KaggleCredentials.fromDefaultLocation();
var manager = new KaggleTrainingManager(credentials);
// One-line training on any Kaggle dataset
var config = new KaggleTrainingManager.TrainingConfig()
.setAlgorithms("logistic", "randomforest", "xgboost")
.setGridSearch(true)
.setSaveModels(true)
.setSubmissionFormat(true);
var results = manager.trainOnDataset(
"titanic", // competition name
"titanic", // dataset name
"survived", // target column
config
);
// Best model results
var bestResult = results.get(0);
System.out.println("π Best Model: " + bestResult.algorithm);
System.out.println("π CV Score: " + String.format("%.4f", bestResult.cvScore));
System.out.println("πΎ Model saved: " + bestResult.modelFilePath);
System.out.println("π€ Submission: " + bestResult.submissionFilePath);
}
}
Visualization Examples
Professional GUI Charts
import org.superml.visualization.VisualizationFactory;
import org.superml.datasets.Datasets;
import org.superml.linear_model.LogisticRegression;
import java.util.Arrays;
public class VisualizationShowcase {
public static void main(String[] args) {
var dataset = Datasets.loadIris();
// Train a simple model so there are predictions to visualize
var model = new LogisticRegression();
model.fit(dataset.X, dataset.y);
double[] predictions = model.predict(dataset.X);
// 1. Interactive Confusion Matrix (XChart GUI)
VisualizationFactory.createXChartConfusionMatrix(
dataset.y,
predictions,
new String[]{"Setosa", "Versicolor", "Virginica"}
).display();
// 2. Feature Scatter Plot with Clusters
VisualizationFactory.createXChartScatterPlot(
dataset.X,
dataset.y,
"Iris Dataset Features",
"Sepal Length", "Sepal Width"
).display();
// 3. Model Performance Comparison
VisualizationFactory.createModelComparisonChart(
Arrays.asList("LogisticRegression", "RandomForest", "SVM"),
Arrays.asList(0.95, 0.97, 0.94),
"Model Performance Comparison"
).display();
// 4. Automatic fallback to ASCII if no GUI
VisualizationFactory.createDualModeConfusionMatrix(dataset.y, predictions)
.setAsciiMode(true) // Force ASCII mode
.display();
}
}
Module Selection Guide
Minimal Setup (Core ML only)
<!-- Just core algorithms -->
<dependency>
<groupId>org.superml</groupId>
<artifactId>superml-core</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.superml</groupId>
<artifactId>superml-linear-models</artifactId>
<version>2.0.0</version>
</dependency>
Standard ML Pipeline
<!-- Core + preprocessing + model selection -->
<dependency>
<groupId>org.superml</groupId>
<artifactId>superml-core</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.superml</groupId>
<artifactId>superml-linear-models</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.superml</groupId>
<artifactId>superml-preprocessing</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.superml</groupId>
<artifactId>superml-model-selection</artifactId>
<version>2.0.0</version>
</dependency>
AutoML & Visualization
<!-- Add AutoML and professional visualization -->
<dependency>
<groupId>org.superml</groupId>
<artifactId>superml-autotrainer</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.superml</groupId>
<artifactId>superml-visualization</artifactId>
<version>2.0.0</version>
</dependency>
Production Deployment
<!-- Add inference engine and model persistence -->
<dependency>
<groupId>org.superml</groupId>
<artifactId>superml-inference</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.superml</groupId>
<artifactId>superml-persistence</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.superml</groupId>
<artifactId>superml-drift</artifactId>
<version>2.0.0</version>
</dependency>
Everything (Recommended for Development)
<!-- Complete framework -->
<dependency>
<groupId>org.superml</groupId>
<artifactId>superml-bundle-all</artifactId>
<version>2.0.0</version>
</dependency>
Next Steps
Learning Path
- Start Here: Run the AutoML example above
- Core Concepts: Try the pipeline example
- Advanced Features: Experiment with visualization
- Production: Explore inference and persistence
- Competitions: Try Kaggle integration
- Custom Solutions: Build your own ML applications
Essential Documentation
- Modular Architecture - Understanding the 21-module system
- Algorithm Reference - Complete guide to all 12+ algorithms
- Examples Collection - 11 comprehensive examples
- API Reference - Complete API documentation
- Production Guide - Deployment and monitoring
Code Examples
All code examples are available in the superml-examples module:
- BasicClassification.java - Fundamental concepts
- AutoMLExample.java - Automated machine learning
- XChartVisualizationExample.java - Professional GUI charts
- ProductionInferenceExample.java - High-performance serving
- KaggleIntegrationExample.java - Competition workflows
Ready to build amazing ML applications with SuperML Java 2.0.0!
Start with AutoML for instant results, then dive deeper into the modular architecture for custom solutions.