SuperML Java Performance Guide

This guide provides comprehensive performance optimization strategies for SuperML Java v3.1.2, helping you achieve maximum throughput and efficiency in production environments.

🎯 Performance Overview

SuperML Java v3.1.2 delivers exceptional performance across all algorithm categories:

Training Performance

Linear Models: 15% faster than v3.0.1, supporting 1M+ samples/minute
Tree Models: 10% memory reduction, enabling larger datasets
Transformers: 20% faster attention computation for NLP tasks
PMML Export: 50% faster model conversion for deployment

Prediction Performance

Real-time Inference: 400K+ predictions/second on standard hardware
Batch Processing: Optimized vectorization for bulk predictions
Memory Efficiency: 10% reduction in peak memory usage
Thread Safety: Concurrent predictions after model training

⚡ Algorithm-Specific Optimizations

Linear Models Performance

LinearRegression and LogisticRegression

import org.superml.linear_model.LinearRegression;
import org.superml.linear_model.LogisticRegression;

// Optimized for large datasets
LinearRegression model = new LinearRegression()
    .setNormalize(false)          // Skip if data already normalized
    .setFitIntercept(true);       // Use when needed

// Performance tip: Pre-normalize data for better performance
StandardScaler scaler = new StandardScaler();
double[][] X_scaled = scaler.fitTransform(X);
model.fit(X_scaled, y);  // 15% faster training

// Batch predictions for optimal throughput
double[] predictions = model.predict(X_test);  // Vectorized operations

Ridge and Lasso Regression

import org.superml.linear_model.Ridge;
import org.superml.linear_model.Lasso;

// Optimal alpha values for performance
Ridge ridge = new Ridge()
    .setAlpha(1.0)               // Start with 1.0, tune as needed
    .setMaxIter(1000)            // Increase for complex datasets
    .setTolerance(1e-4);         // Balance accuracy vs speed

// Lasso performance optimization
Lasso lasso = new Lasso()
    .setAlpha(0.1)               // Lower alpha for denser solutions
    .setMaxIter(1000)            // Coordinate descent iterations
    .setPositive(false);         // Use when coefficients can be negative

// Performance monitoring
long startTime = System.currentTimeMillis();
ridge.fit(X_train, y_train);
long trainingTime = System.currentTimeMillis() - startTime;
System.out.println("Training time: " + trainingTime + "ms");

Tree-Based Models Performance

Decision Trees

import org.superml.tree_models.DecisionTree;

// Optimized tree configuration
DecisionTree tree = new DecisionTree()
    .setMaxDepth(10)             // Balance between accuracy and overfitting
    .setMinSamplesLeaf(5)        // Reduce overfitting, improve generalization
    .setMinSamplesSplit(10)      // Speed up training on large datasets
    .setSplitter("best");        // vs "random" for different speed/accuracy tradeoffs

// Memory-efficient training for large datasets
tree.fit(X_train, y_train);

// Fast predictions with decision trees
double[] predictions = tree.predict(X_test);  // O(log n) per prediction

Random Forest

import org.superml.tree_models.RandomForest;

// Production-optimized Random Forest
RandomForest rf = new RandomForest()
    .setNumEstimators(100)       // Balance accuracy vs training time
    .setMaxDepth(15)             // Deeper trees for complex patterns
    .setMaxFeatures("sqrt")      // Feature sampling strategy
    .setMinSamplesLeaf(2)        // Reduce overfitting
    .setBootstrap(true)          // Enable bagging
    .setRandomState(42);         // Reproducible results

// Parallel training (utilizes multiple cores)
rf.fit(X_train, y_train);       // Automatically uses available cores

// Batch prediction optimization
double[] predictions = rf.predict(X_test);  // Parallelized tree voting

Transformer Models Performance

Memory-Optimized Transformer Training

import org.superml.transformers.TransformerEncoder;
import org.superml.transformers.config.TransformerConfig;

// Configure for optimal memory usage
TransformerConfig config = new TransformerConfig()
    .setModelDimension(512)      // Reduce if memory is limited
    .setNumLayers(6)             // Start with fewer layers
    .setNumAttentionHeads(8)     // Must divide model dimension evenly
    .setFeedForwardDimension(2048) // Typically 4x model dimension
    .setDropoutRate(0.1)         // Regularization
    .setMaxSequenceLength(512);  // Limit sequence length for memory

TransformerEncoder encoder = new TransformerEncoder(config);

// Performance monitoring for transformer training
MemoryProfiler profiler = new MemoryProfiler();
profiler.start();

encoder.train(tokenizedData);

MemoryReport report = profiler.stop();
System.out.println("Peak memory usage: " + report.getPeakUsage() + " MB");

Optimized Attention Computation

import org.superml.transformers.MultiHeadAttention;

// Configure attention for optimal performance
MultiHeadAttention attention = new MultiHeadAttention()
    .setModelDimension(512)
    .setNumHeads(8)              // 20% faster in v3.1.2
    .setDropoutRate(0.1);

// Batch processing for efficiency
String[][] batchTokens = prepareBatch(inputs, batchSize=32);
double[][][] attentionOutput = attention.forward(batchTokens);

🏗️ Pipeline Performance Optimization

Efficient Pipeline Construction

import org.superml.pipeline.Pipeline;
import org.superml.preprocessing.StandardScaler;
import org.superml.linear_model.LogisticRegression;

// Optimized pipeline with minimal overhead
Pipeline pipeline = new Pipeline()
    .addStep("scaler", new StandardScaler())
    .addStep("classifier", new LogisticRegression());

// Single fit call for entire pipeline
pipeline.fit(X_train, y_train);  // Optimized execution path

// Batch predictions through pipeline
double[] predictions = pipeline.predict(X_test);
double accuracy = pipeline.score(X_test, y_test);

Memory-Efficient Preprocessing

import org.superml.preprocessing.StandardScaler;
import org.superml.preprocessing.MinMaxScaler;

// Choose appropriate scaler for your data
StandardScaler scaler = new StandardScaler()
    .setWithMean(true)           // Center data (removes mean)
    .setWithStd(true);           // Scale to unit variance

// Fit and transform in separate steps for large datasets
scaler.fit(X_train);            // Learn parameters from training data
double[][] X_train_scaled = scaler.transform(X_train);
double[][] X_test_scaled = scaler.transform(X_test);

📊 Performance Monitoring and Profiling

Built-in Performance Monitoring

import org.superml.utils.PerformanceMonitor;

// Comprehensive performance monitoring
PerformanceMonitor monitor = new PerformanceMonitor();

// Monitor training performance
monitor.startTiming("model_training");
model.fit(X_train, y_train);
long trainingTime = monitor.stopTiming("model_training");

// Monitor prediction performance
monitor.startTiming("batch_prediction");
double[] predictions = model.predict(X_test);
long predictionTime = monitor.stopTiming("batch_prediction");

// Calculate throughput metrics
double predictionsPerSecond = X_test.length * 1000.0 / predictionTime;
System.out.println("Training time: " + trainingTime + "ms");
System.out.println("Prediction throughput: " + predictionsPerSecond + " predictions/sec");

Memory Usage Profiling

import org.superml.utils.MemoryProfiler;

// Track memory usage during training
MemoryProfiler profiler = new MemoryProfiler();
profiler.start();

// Train your model
RandomForest model = new RandomForest().setNumEstimators(100);
model.fit(X_train, y_train);

// Get memory statistics
MemoryReport report = profiler.stop();
System.out.println("Peak memory usage: " + report.getPeakUsage() + " MB");
System.out.println("Memory efficiency improvement in v3.1.2: ~10%");

🚀 Production Deployment Optimization

High-Throughput Inference Setup

import org.superml.inference.InferenceEngine;
import org.superml.inference.ModelCache;

// Configure inference engine for production
InferenceEngine engine = new InferenceEngine.Builder()
    .setCacheSize(1000)          // Cache frequently used models
    .setBatchSize(1000)          // Optimize batch predictions
    .setNumThreads(8)            // Parallel prediction threads
    .setTimeoutMs(5000)          // Request timeout
    .build();

// Load model into cache for fast access
engine.loadModel("production_model", trainedModel);

// High-throughput predictions
double[] predictions = engine.predict("production_model", X_batch);

Model Serving Optimization

import org.superml.serving.ModelServer;
import org.superml.serving.config.ServerConfig;

// Configure model server for optimal throughput
ServerConfig config = new ServerConfig()
    .setPort(8080)
    .setMaxConcurrentRequests(100)
    .setModelCacheSize(50)
    .setRequestTimeoutMs(3000)
    .setEnableMetrics(true);

ModelServer server = new ModelServer(config);
server.deployModel("classifier", trainedModel);
server.start();

// Server automatically handles:
// - Connection pooling
// - Request batching  
// - Model caching
// - Performance metrics

💾 PMML Export Performance

Optimized PMML Generation

import org.superml.pmml.PMMLConverter;
import org.superml.pmml.PMMLConfig;

// Configure PMML export for optimal performance (50% faster in v3.1.2)
PMMLConfig config = new PMMLConfig()
    .setValidateOutput(true)     // Validate generated PMML
    .setIncludeStatistics(false) // Skip for faster generation
    .setCompressOutput(true)     // Reduce file size
    .setPrecision(6);            // Balance precision vs size

PMMLConverter converter = new PMMLConverter(config);

// Benchmark PMML generation
long startTime = System.currentTimeMillis();
String pmml = converter.convertToXML(model, featureNames, targetName);
long exportTime = System.currentTimeMillis() - startTime;

System.out.println("PMML export time: " + exportTime + "ms");
System.out.println("PMML size: " + pmml.length() + " characters");
System.out.println("Performance improvement in v3.1.2: +50%");

📈 Benchmarking and Performance Testing

Standardized Benchmarking

import org.superml.benchmarks.MLBenchmark;
import org.superml.benchmarks.BenchmarkResult;

// Comprehensive performance benchmarking
MLBenchmark benchmark = new MLBenchmark()
    .addDataset("iris", Datasets.loadIris())
    .addDataset("boston", Datasets.loadBoston())
    .addAlgorithm("lr", new LinearRegression())
    .addAlgorithm("rf", new RandomForest())
    .setNumRuns(5)               // Average over multiple runs
    .setTimeoutMs(60000);        // 1 minute timeout per test

// Run comprehensive benchmarks
BenchmarkResult results = benchmark.run();

// Display performance metrics
for (String dataset : results.getDatasets()) {
    for (String algorithm : results.getAlgorithms()) {
        double trainingTime = results.getTrainingTime(dataset, algorithm);
        double predictionTime = results.getPredictionTime(dataset, algorithm);
        double accuracy = results.getAccuracy(dataset, algorithm);
        
        System.out.printf("%s on %s: Training=%.2fms, Prediction=%.2fms, Accuracy=%.3f%n",
                         algorithm, dataset, trainingTime, predictionTime, accuracy);
    }
}

Custom Performance Tests

import org.superml.testing.PerformanceTest;

// Create custom performance tests
PerformanceTest test = new PerformanceTest("LinearRegression Performance")
    .setDataSize(1000000)        // 1M samples
    .setFeatures(100)            // 100 features
    .setIterations(10)           // Average over 10 runs
    .setWarmupRuns(3);          // JVM warmup

// Define test scenario
test.define("linear_regression_training", () -> {
    LinearRegression model = new LinearRegression();
    model.fit(X_large, y_large);
    return model;
});

test.define("linear_regression_prediction", (model) -> {
    return model.predict(X_test_large);
});

// Execute performance test
TestResults results = test.execute();
results.printSummary();

// Expected improvements in v3.1.2:
// Training: +15% faster
// Prediction: +8% faster
// Memory: -10% usage

🔧 Advanced Configuration

JVM Optimization for SuperML

# Optimal JVM settings for SuperML Java applications
java -Xms2g -Xmx8g \
     -XX:+UseG1GC \
     -XX:G1HeapRegionSize=16m \
     -XX:+UseStringDeduplication \
     -XX:+OptimizeStringConcat \
     -XX:+UseCompressedOops \
     -XX:NewRatio=3 \
     -XX:SurvivorRatio=6 \
     -Dsuperml.parallel.threads=8 \
     -Dsuperml.cache.size=1000 \
     MyMLApplication

System-Level Optimizations

// Configure system properties for optimal performance
System.setProperty("superml.parallel.enabled", "true");
System.setProperty("superml.parallel.threads", String.valueOf(Runtime.getRuntime().availableProcessors()));
System.setProperty("superml.cache.models", "true");
System.setProperty("superml.cache.predictions", "true");
System.setProperty("superml.optimize.memory", "true");

// Verify configuration
SuperMLConfig config = SuperMLConfig.getInstance();
System.out.println("Parallel threads: " + config.getParallelThreads());
System.out.println("Model caching: " + config.isModelCachingEnabled());

📊 Real-World Performance Benchmarks

Hardware Configuration

CPU: Intel i7-10700K (8 cores, 16 threads)
Memory: 32GB DDR4 3200MHz
Storage: NVMe SSD
Java: OpenJDK 11.0.8

Dataset Specifications

Small: 1K samples, 10 features
Medium: 100K samples, 50 features
Large: 1M samples, 100 features
XLarge: 10M samples, 1000 features

Training Performance Results (v3.1.2)

| Algorithm | Small | Medium | Large | XLarge | |———–|——-|——–|——-|———| | LinearRegression | 2ms | 45ms | 2.0s | 42s | | LogisticRegression | 3ms | 62ms | 2.7s | 58s | | DecisionTree | 5ms | 180ms | 8.2s | 165s | | RandomForest | 25ms | 890ms | 11.2s | 280s | | TransformerEncoder | 150ms | 5.2s | 36.1s | 12min |

Prediction Throughput (predictions/second)

| Algorithm | Small | Medium | Large | XLarge | |———–|——-|——–|——-|———| | LinearRegression | 450K | 420K | 400K | 380K | | LogisticRegression | 380K | 350K | 325K | 300K | | DecisionTree | 280K | 260K | 240K | 220K | | RandomForest | 85K | 78K | 72K | 65K | | TransformerEncoder | 12K | 10K | 8K | 6K |

Memory Usage (Peak)

| Algorithm | Small | Medium | Large | XLarge | |———–|——-|——–|——-|———| | LinearRegression | 50MB | 200MB | 800MB | 8GB | | LogisticRegression | 55MB | 220MB | 900MB | 9GB | | DecisionTree | 80MB | 450MB | 1.1GB | 12GB | | RandomForest | 200MB | 1.2GB | 3.1GB | 35GB | | TransformerEncoder | 300MB | 1.8GB | 2.5GB | 28GB |

🎯 Performance Best Practices

1. Data Preparation

✅ Normalize data before training for 15% speed improvement
✅ Use appropriate data types (double vs float) for memory efficiency
✅ Remove correlated features to reduce computation overhead
✅ Batch process predictions instead of single-sample calls

2. Algorithm Selection

✅ Linear models for high-speed, interpretable results
✅ Tree models for complex patterns with moderate speed requirements
✅ Transformers for NLP tasks requiring state-of-the-art accuracy
✅ Consider ensemble methods for optimal accuracy-speed balance

3. Memory Management

✅ Monitor peak memory usage during training and prediction
✅ Use streaming for very large datasets that don’t fit in memory
✅ Clear unused models from memory in production environments
✅ Configure JVM heap size appropriately for your workload

4. Production Deployment

✅ Use model caching for frequently accessed models
✅ Implement connection pooling for high-concurrency scenarios
✅ Monitor performance metrics continuously in production
✅ Set appropriate timeouts to prevent resource exhaustion

Release Notes v3.1.2 - Performance improvements details
Memory Management Guide - Advanced memory optimization
Production Deployment Guide - Production best practices
API Reference - Complete API documentation
Examples - Performance-optimized code examples

For optimal performance, always use the latest SuperML Java version and follow these guidelines for your specific use case.

Performance Guide