Quick Start Guide

Get up and running with SuperML Java 2.0.0 in just a few minutes! This guide will walk you through setting up the framework, training your first model with AutoML, and creating professional visualizations.

πŸš€ 5-Minute Quickstart with AutoML & Visualization

Step 1: Add Dependency

<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-bundle-all</artifactId>
    <version>2.0.0</version>
</dependency>
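
If you build with Gradle instead of Maven, the same coordinates translate to the following (assuming the artifact is published to a repository your build resolves, e.g. Maven Central):

implementation 'org.superml:superml-bundle-all:2.0.0'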

Modular Installation (Advanced)

<!-- Core + Linear Models + Visualization -->
<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-core</artifactId>
    <version>2.0.0</version>
</dependency>
<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-linear-models</artifactId>
    <version>2.0.0</version>
</dependency>
<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-visualization</artifactId>
    <version>2.0.0</version>
</dependency>

Step 2: AutoML - Your First Model (One Line!)

import org.superml.datasets.Datasets;
import org.superml.autotrainer.AutoTrainer;
import org.superml.visualization.VisualizationFactory;

public class QuickStartAutoML {
    public static void main(String[] args) {
        // 1. Load a dataset
        var dataset = Datasets.loadIris();
        
        // 2. AutoML - One line training!
        var result = AutoTrainer.autoML(dataset.X, dataset.y, "classification");
        
        System.out.println("🎯 Best Algorithm: " + result.getBestAlgorithm());
        System.out.println("πŸ“Š Best Score: " + result.getBestScore());
        System.out.println("βš™οΈ Best Parameters: " + result.getBestParams());
        
        // 3. Professional visualization (GUI + ASCII fallback)
        VisualizationFactory.createDualModeConfusionMatrix(
            dataset.y, 
            result.getBestModel().predict(dataset.X),
            new String[]{"Setosa", "Versicolor", "Virginica"}
        ).display();
    }
}

Step 3: Traditional ML Pipeline with Visualization

import org.superml.datasets.Datasets;
import org.superml.linear_model.LogisticRegression;
import org.superml.preprocessing.StandardScaler;
import org.superml.pipeline.Pipeline;
import org.superml.model_selection.ModelSelection;
import org.superml.metrics.Metrics;
import org.superml.visualization.VisualizationFactory;

public class QuickStartPipeline {
    public static void main(String[] args) {
        // 1. Load and split data
        var dataset = Datasets.loadIris();
        var split = ModelSelection.trainTestSplit(dataset.X, dataset.y, 0.2, 42);
        
        // 2. Create ML pipeline
        var pipeline = new Pipeline()
            .addStep("scaler", new StandardScaler())
            .addStep("classifier", new LogisticRegression().setMaxIter(1000));
        
        // 3. Train the pipeline
        pipeline.fit(split.XTrain, split.yTrain);
        
        // 4. Make predictions and class probabilities
        double[] predictions = pipeline.predict(split.XTest);
        double[][] probabilities = pipeline.predictProba(split.XTest);  // forwarded to the pipeline's final step
        
        // 5. Evaluate performance
        var metrics = Metrics.classificationReport(split.yTest, predictions);
        System.out.println("πŸ“ˆ Accuracy: " + String.format("%.3f", metrics.accuracy));
        System.out.println("πŸ“Š F1-Score: " + String.format("%.3f", metrics.f1Score));

        // Class probabilities for the first three test samples
        System.out.println("\nClass probabilities for first 3 samples:");
        for (int i = 0; i < 3; i++) {
            System.out.printf("Sample %d: [%.3f, %.3f, %.3f]%n", i + 1,
                probabilities[i][0], probabilities[i][1], probabilities[i][2]);
        }
        
        // 6. Create professional confusion matrix (GUI or ASCII)
        VisualizationFactory.createDualModeConfusionMatrix(
            split.yTest, 
            predictions,
            new String[]{"Setosa", "Versicolor", "Virginica"}
        ).display();
        
        // 7. Predicted vs. actual values plot
        VisualizationFactory.createRegressionPlot(
            split.yTest, 
            predictions,
            "Iris Classification Results"
        ).display();
    }
}

Step 4: Run and See Results

mvn compile exec:java -Dexec.mainClass="QuickStartPipeline"

Expected output:

πŸ“ˆ Accuracy: 1.000
πŸ“Š F1-Score: 1.000

Class probabilities for first 3 samples:
Sample 1: [0.000, 0.020, 0.980]
Sample 2: [0.980, 0.020, 0.000]
Sample 3: [0.000, 1.000, 0.000]

πŸ”§ Core Concepts

Estimators

All models implement the Estimator interface:

// Training
model.fit(X, y);

// Prediction
double[] predictions = model.predict(X);

// Parameters
Map<String, Object> params = model.getParams();
model.setParams(params);
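
Because every model shares this interface, different algorithms can be handled through the same code path. A small sketch, assuming fit is declared on Estimator itself (check the interface definition for the exact method set):

// Train several models through the common interface
Estimator[] models = { new LogisticRegression(), new Ridge(), new Lasso() };
for (Estimator model : models) {
    model.fit(X, y);  // identical training call for every concrete model
}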

Datasets

Built-in datasets for quick experimentation:

// Classification datasets
var iris = Datasets.loadIris();
var wine = Datasets.loadWine();

// Regression datasets  
var boston = Datasets.loadBoston();
var diabetes = Datasets.loadDiabetes();

// Synthetic data
var classification = Datasets.makeClassification(1000, 20, 2);  // samples, features, classes
var regression = Datasets.makeRegression(1000, 10);             // samples, features

Model Selection

Split data and validate models:

// Train/test split (20% held out for testing, random seed 42)
var split = ModelSelection.trainTestSplit(X, y, 0.2, 42);

// 5-fold cross-validation
double[] scores = ModelSelection.crossValidate(model, X, y, 5);
double meanScore = Arrays.stream(scores).average().orElse(0.0);

πŸ—οΈ Building Pipelines

Chain preprocessing and models together:

import org.superml.pipeline.Pipeline;
import org.superml.preprocessing.StandardScaler;

// Create a pipeline
var pipeline = new Pipeline()
    .addStep("scaler", new StandardScaler())
    .addStep("classifier", new LogisticRegression());

// Train the entire pipeline
pipeline.fit(X, y);

// Make predictions (automatically applies preprocessing)
double[] predictions = pipeline.predict(X);

πŸ“Š Model Evaluation

Comprehensive metrics for model evaluation:

// Classification metrics
double accuracy = Metrics.accuracy(yTrue, yPred);
double precision = Metrics.precision(yTrue, yPred);
double recall = Metrics.recall(yTrue, yPred);
double f1 = Metrics.f1Score(yTrue, yPred);

// Confusion matrix
int[][] confMatrix = Metrics.confusionMatrix(yTrue, yPred);

// Regression metrics
double mse = Metrics.meanSquaredError(yTrue, yPred);
double mae = Metrics.meanAbsoluteError(yTrue, yPred);
double r2 = Metrics.r2Score(yTrue, yPred);

πŸ” Hyperparameter Tuning

Automatically find the best parameters:

import org.superml.model_selection.GridSearchCV;

// Define parameter grid
Map<String, Object[]> paramGrid = Map.of(
    "maxIterations", new Object[]{500, 1000, 1500},
    "learningRate", new Object[]{0.001, 0.01, 0.1}
);

// Create grid search
var gridSearch = new GridSearchCV(
    new LogisticRegression(), paramGrid, 5);

// Find best parameters
gridSearch.fit(X, y);

// Get results
System.out.println("Best score: " + gridSearch.getBestScore());
System.out.println("Best params: " + gridSearch.getBestParams());

🌐 Kaggle Integration

Train models on real Kaggle datasets with one line:

import org.superml.kaggle.KaggleTrainingManager;
import org.superml.kaggle.KaggleIntegration.KaggleCredentials;

// Setup Kaggle credentials (see Kaggle Integration guide)
var credentials = KaggleCredentials.fromDefaultLocation();
var trainer = new KaggleTrainingManager(credentials);

// Train on any Kaggle dataset
var results = trainer.trainOnDataset("titanic", "titanic", "survived");

// Get best model
var bestResult = results.get(0);
System.out.println("Best algorithm: " + bestResult.algorithm);
System.out.println("Best score: " + bestResult.score);

πŸ“ˆ Available Algorithms

Supervised Learning

Classification:

  • LogisticRegression - Binary and multiclass classification
  • Ridge - L2 regularized classification (when used with discrete targets)

Regression:

  • LinearRegression - Ordinary least squares
  • Ridge - L2 regularized regression
  • Lasso - L1 regularized regression with feature selection

Unsupervised Learning

Clustering:

  • KMeans - K-means clustering with k-means++ initialization
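
KMeans is the only algorithm in this list without an example elsewhere on this page, so here is a minimal sketch. The constructor argument (number of clusters) is an assumption based on the framework's conventions; check the class for its actual signature:

import org.superml.cluster.KMeans;
import org.superml.datasets.Datasets;

var data = Datasets.makeClassification(300, 4, 3);  // 300 samples, 4 features, 3 classes
var kmeans = new KMeans(3);         // hypothetical signature: k = 3 clusters
kmeans.fit(data.X);                 // unsupervised: no labels passed
double[] clusterLabels = kmeans.predict(data.X);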

Preprocessing

  • StandardScaler - Feature standardization (z-score normalization)
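
Outside of a Pipeline, the scaler can also be applied manually. A minimal sketch, assuming the usual fit/transform convention:

import org.superml.preprocessing.StandardScaler;

var scaler = new StandardScaler();
scaler.fit(XTrain);                                   // learn per-feature mean and variance
double[][] XTrainScaled = scaler.transform(XTrain);   // standardize training data
double[][] XTestScaled = scaler.transform(XTest);     // reuse training statistics on test data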

πŸ“ Project Structure

src/main/java/org/superml/
β”œβ”€β”€ core/                    # Base interfaces
β”œβ”€β”€ linear_model/           # Linear algorithms
β”œβ”€β”€ cluster/                # Clustering algorithms
β”œβ”€β”€ preprocessing/          # Data preprocessing
β”œβ”€β”€ metrics/               # Evaluation metrics
β”œβ”€β”€ model_selection/       # Cross-validation & tuning
β”œβ”€β”€ pipeline/              # ML pipelines
└── datasets/              # Data loading & Kaggle integration

🎯 Next Steps

  1. Try More Examples: Check out Basic Examples
  2. Learn Pipelines: Read the Pipeline System guide
  3. Explore Kaggle: Try Kaggle Integration
  4. Optimize Models: Learn Hyperparameter Tuning
  5. Production Ready: Study Performance Optimization

πŸ’‘ Tips for Success

  • Start Simple: Begin with basic models before complex pipelines
  • Use Built-in Datasets: Great for learning and testing
  • Validate Everything: Always use cross-validation for model evaluation
  • Log Performance: Use the logging framework to track training progress (see the sketch after this list)
  • Read the Examples: Real code examples are in the examples/ folder
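
A minimal sketch for the logging tip, assuming SLF4J (the de-facto Java logging facade) is on the classpath; SuperML's own logging setup may differ:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.superml.datasets.Datasets;
import org.superml.linear_model.LogisticRegression;

public class TrainingWithLogging {
    private static final Logger log = LoggerFactory.getLogger(TrainingWithLogging.class);

    public static void main(String[] args) {
        var dataset = Datasets.loadIris();
        var model = new LogisticRegression().setMaxIter(1000);

        long start = System.currentTimeMillis();
        model.fit(dataset.X, dataset.y);
        log.info("Training finished in {} ms", System.currentTimeMillis() - start);
    }
}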


🎯 Algorithm Quick Examples

Tree-Based Algorithms

// Decision Tree (criterion, max depth)
DecisionTree dt = new DecisionTree("gini", 10);
dt.fit(XTrain, yTrain);
double[] predictions = dt.predict(XTest);

// Random Forest (number of trees, max depth)
RandomForest rf = new RandomForest(100, 15);
rf.fit(XTrain, yTrain);
double[] rfPredictions = rf.predict(XTest);

// Gradient Boosting (number of estimators, learning rate, max depth)
GradientBoosting gb = new GradientBoosting(100, 0.1, 6);
gb.fit(XTrain, yTrain);
double[] gbPredictions = gb.predict(XTest);

Multiclass Classification

// One-vs-Rest with any binary classifier
LogisticRegression base = new LogisticRegression();
OneVsRestClassifier ovr = new OneVsRestClassifier(base);
ovr.fit(XTrain, yTrain);

// Direct multinomial approach
SoftmaxRegression softmax = new SoftmaxRegression();
softmax.fit(XTrain, yTrain);
double[][] probabilities = softmax.predictProba(XTest);

// Enhanced LogisticRegression (auto multiclass)
LogisticRegression lr = new LogisticRegression().setMultiClass("auto");
lr.fit(XTrain, yTrain);  // Automatically handles multiclass

Linear Models

// Logistic Regression
LogisticRegression lr = new LogisticRegression()
    .setMaxIter(1000)
    .setRegularization("l2")
    .setC(1.0);

// Ridge Regression
Ridge ridge = new Ridge()
    .setAlpha(1.0)
    .setNormalize(true);

// Lasso Regression
Lasso lasso = new Lasso()
    .setAlpha(0.1)
    .setMaxIter(1000);

πŸš€ 30-Second Examples

Binary Classification

var data = Datasets.makeClassification(1000, 10, 2);
var split = DataLoaders.trainTestSplit(data.X, 
    Arrays.stream(data.y).asDoubleStream().toArray(), 0.2, 42);

RandomForest rf = new RandomForest(50, 10);
rf.fit(split.XTrain, split.yTrain);
System.out.println("Accuracy: " + rf.score(split.XTest, split.yTest));

Multiclass Classification

var data = Datasets.loadIris();  // 3-class problem
var split = DataLoaders.trainTestSplit(data.X, 
    Arrays.stream(data.y).asDoubleStream().toArray(), 0.3, 42);

SoftmaxRegression softmax = new SoftmaxRegression();
softmax.fit(split.XTrain, split.yTrain);
double[][] probas = softmax.predictProba(split.XTest);

Regression

var data = Datasets.makeRegression(800, 5, 1, 0.1);
var split = DataLoaders.trainTestSplit(data.X, data.y, 0.2, 42);

GradientBoosting gb = new GradientBoosting(100, 0.05, 6);
gb.fit(split.XTrain, split.yTrain);
System.out.println("RΒ² Score: " + gb.score(split.XTest, split.yTest));

🎯 Advanced Features Showcase

AutoML with Hyperparameter Optimization

import org.superml.datasets.Datasets;
import org.superml.autotrainer.AutoTrainer;
import org.superml.model_selection.GridSearchCV;

public class AdvancedAutoML {
    public static void main(String[] args) {
        // Load a synthetic dataset (1000 samples, 20 features, 5 classes, seed 42)
        var dataset = Datasets.makeClassification(1000, 20, 5, 42);
        
        // Advanced AutoML with custom configuration
        var config = new AutoTrainer.Config()
            .setAlgorithms("logistic", "randomforest", "gradientboosting")
            .setSearchStrategy("random")  // or "grid", "bayesian"
            .setCrossValidationFolds(5)
            .setMaxEvaluationTime(300)  // 5 minutes max
            .setEnsembleMethods(true);
        
        var result = AutoTrainer.autoMLWithConfig(dataset.X, dataset.y, config);
        
        System.out.println("πŸ† Best Model Performance:");
        System.out.println("   Algorithm: " + result.getBestAlgorithm());
        System.out.println("   CV Score: " + String.format("%.4f", result.getBestScore()));
        System.out.println("   Parameters: " + result.getBestParams());
        
        // Get ensemble if available
        if (result.hasEnsemble()) {
            System.out.println("πŸ€– Ensemble Performance: " + 
                String.format("%.4f", result.getEnsembleScore()));
        }
    }
}

Production Inference with Monitoring

import org.superml.inference.InferenceEngine;
import org.superml.persistence.ModelPersistence;
import org.superml.drift.DriftDetector;

public class ProductionInference {
    public static void main(String[] args) {
        // Load trained model
        var model = ModelPersistence.load("my_iris_model.json");
        
        // Setup inference engine
        var engine = new InferenceEngine()
            .setModelCache(true)
            .setPerformanceMonitoring(true)
            .setBatchSize(100);
        
        // Register model
        engine.registerModel("iris_classifier", model);
        
        // Setup drift monitoring
        var driftDetector = new DriftDetector("iris_classifier")
            .setThreshold(0.05)
            .setAlertCallback(alert -> {
                System.out.println("🚨 Drift detected: " + alert.getMessage());
            });
        
        // Make predictions with monitoring (one sample with four iris-style features)
        double[][] newData = {{5.1, 3.5, 1.4, 0.2}};
        double[] predictions = engine.predict("iris_classifier", newData);
        
        // Monitor for drift
        driftDetector.checkDrift(newData, predictions);
        
        System.out.println("🎯 Prediction: " + predictions[0]);
        System.out.println("⚑ Inference time: " + engine.getLastInferenceTime() + "μs");
    }
}
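
The inference example above loads my_iris_model.json from disk, so a model file has to be written during training first. A minimal sketch, assuming ModelPersistence offers a save counterpart to the load call used above (hypothetical signature):

import org.superml.datasets.Datasets;
import org.superml.linear_model.LogisticRegression;
import org.superml.persistence.ModelPersistence;

public class SaveModel {
    public static void main(String[] args) {
        var dataset = Datasets.loadIris();
        var model = new LogisticRegression().setMaxIter(1000);
        model.fit(dataset.X, dataset.y);

        // Hypothetical counterpart to ModelPersistence.load(...)
        ModelPersistence.save(model, "my_iris_model.json");
    }
}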

Kaggle Competition Integration

import org.superml.kaggle.KaggleTrainingManager;
import org.superml.kaggle.KaggleIntegration.KaggleCredentials;

public class KaggleCompetition {
    public static void main(String[] args) {
        // Setup Kaggle credentials
        var credentials = KaggleCredentials.fromDefaultLocation();
        var manager = new KaggleTrainingManager(credentials);
        
        // One-line training on any Kaggle dataset
        var config = new KaggleTrainingManager.TrainingConfig()
            .setAlgorithms("logistic", "randomforest", "xgboost")
            .setGridSearch(true)
            .setSaveModels(true)
            .setSubmissionFormat(true);
        
        var results = manager.trainOnDataset(
            "titanic",           // competition name
            "titanic",           // dataset name  
            "survived",          // target column
            config
        );
        
        // Best model results
        var bestResult = results.get(0);
        System.out.println("πŸ† Best Model: " + bestResult.algorithm);
        System.out.println("πŸ“Š CV Score: " + String.format("%.4f", bestResult.cvScore));
        System.out.println("πŸ’Ύ Model saved: " + bestResult.modelFilePath);
        System.out.println("πŸ“€ Submission: " + bestResult.submissionFilePath);
    }
}

πŸ“Š Visualization Examples

Professional GUI Charts

import org.superml.visualization.VisualizationFactory;
import org.superml.datasets.Datasets;
import org.superml.linear_model.LogisticRegression;
import java.util.Arrays;

public class VisualizationShowcase {
    public static void main(String[] args) {
        var dataset = Datasets.loadIris();
        
        // Train a simple model so we have predictions to visualize
        var model = new LogisticRegression().setMaxIter(1000);
        model.fit(dataset.X, dataset.y);
        double[] predictions = model.predict(dataset.X);
        
        // 1. Interactive Confusion Matrix (XChart GUI)
        VisualizationFactory.createXChartConfusionMatrix(
            dataset.y,
            predictions,
            new String[]{"Setosa", "Versicolor", "Virginica"}
        ).display();
        
        // 2. Feature Scatter Plot with Clusters
        VisualizationFactory.createXChartScatterPlot(
            dataset.X,
            dataset.y,
            "Iris Dataset Features",
            "Sepal Length", "Sepal Width"
        ).display();
        
        // 3. Model Performance Comparison
        VisualizationFactory.createModelComparisonChart(
            Arrays.asList("LogisticRegression", "RandomForest", "SVM"),
            Arrays.asList(0.95, 0.97, 0.94),
            "Model Performance Comparison"
        ).display();
        
        // 4. Automatic fallback to ASCII if no GUI
        VisualizationFactory.createDualModeConfusionMatrix(dataset.y, predictions)
            .setAsciiMode(true)  // Force ASCII mode
            .display();
    }
}

πŸ”§ Module Selection Guide

Minimal Setup (Core ML only)

<!-- Just core algorithms -->
<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-core</artifactId>
    <version>2.0.0</version>
</dependency>
<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-linear-models</artifactId>
    <version>2.0.0</version>
</dependency>

Standard ML Pipeline

<!-- Core + preprocessing + model selection -->
<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-core</artifactId>
    <version>2.0.0</version>
</dependency>
<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-linear-models</artifactId>
    <version>2.0.0</version>
</dependency>
<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-preprocessing</artifactId>
    <version>2.0.0</version>
</dependency>
<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-model-selection</artifactId>
    <version>2.0.0</version>
</dependency>

AutoML & Visualization

<!-- Add AutoML and professional visualization -->
<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-autotrainer</artifactId>
    <version>2.0.0</version>
</dependency>
<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-visualization</artifactId>
    <version>2.0.0</version>
</dependency>

Production Deployment

<!-- Add inference engine and model persistence -->
<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-inference</artifactId>
    <version>2.0.0</version>
</dependency>
<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-persistence</artifactId>
    <version>2.0.0</version>
</dependency>
<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-drift</artifactId>
    <version>2.0.0</version>
</dependency>
<!-- Complete framework -->
<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-bundle-all</artifactId>
    <version>2.0.0</version>
</dependency>

πŸŽ“ Next Steps

Learning Path

  1. Start Here: Run the AutoML example above
  2. Core Concepts: Try the pipeline example
  3. Advanced Features: Experiment with visualization
  4. Production: Explore inference and persistence
  5. Competitions: Try Kaggle integration
  6. Custom Solutions: Build your own ML applications

Code Examples

All code examples are available in the superml-examples module:

  • BasicClassification.java - Fundamental concepts
  • AutoMLExample.java - Automated machine learning
  • XChartVisualizationExample.java - Professional GUI charts
  • ProductionInferenceExample.java - High-performance serving
  • KaggleIntegrationExample.java - Competition workflows

Ready to build amazing ML applications with SuperML Java 2.0.0! πŸš€

Start with AutoML for instant results, then dive deeper into the modular architecture for custom solutions.