Skip to main content

Performance Guide

Optimize SuperML Java performance for production workloads

SuperML Java Performance Guide

Version Performance

This guide provides comprehensive performance optimization strategies for SuperML Java v3.1.2, helping you achieve maximum throughput and efficiency in production environments.

🎯 Performance Overview

SuperML Java v3.1.2 delivers exceptional performance across all algorithm categories:

Training Performance

  • Linear Models: 15% faster than v3.0.1, supporting 1M+ samples/minute
  • Tree Models: 10% memory reduction, enabling larger datasets
  • Transformers: 20% faster attention computation for NLP tasks
  • PMML Export: 50% faster model conversion for deployment

Prediction Performance

  • Real-time Inference: 400K+ predictions/second on standard hardware
  • Batch Processing: Optimized vectorization for bulk predictions
  • Memory Efficiency: 10% reduction in peak memory usage
  • Thread Safety: Concurrent predictions after model training

⚑ Algorithm-Specific Optimizations

Linear Models Performance

LinearRegression and LogisticRegression

import org.superml.linear_model.LinearRegression;
import org.superml.linear_model.LogisticRegression;

// Optimized for large datasets
LinearRegression model = new LinearRegression()
    .setNormalize(false)          // Skip if data already normalized
    .setFitIntercept(true);       // Use when needed

// Performance tip: Pre-normalize data for better performance
StandardScaler scaler = new StandardScaler();
double[][] X_scaled = scaler.fitTransform(X);
model.fit(X_scaled, y);  // 15% faster training

// Batch predictions for optimal throughput
double[] predictions = model.predict(X_test);  // Vectorized operations

Ridge and Lasso Regression

import org.superml.linear_model.Ridge;
import org.superml.linear_model.Lasso;

// Optimal alpha values for performance
Ridge ridge = new Ridge()
    .setAlpha(1.0)               // Start with 1.0, tune as needed
    .setMaxIter(1000)            // Increase for complex datasets
    .setTolerance(1e-4);         // Balance accuracy vs speed

// Lasso performance optimization
Lasso lasso = new Lasso()
    .setAlpha(0.1)               // Lower alpha for denser solutions
    .setMaxIter(1000)            // Coordinate descent iterations
    .setPositive(false);         // Use when coefficients can be negative

// Performance monitoring
long startTime = System.currentTimeMillis();
ridge.fit(X_train, y_train);
long trainingTime = System.currentTimeMillis() - startTime;
System.out.println("Training time: " + trainingTime + "ms");

Tree-Based Models Performance

Decision Trees

import org.superml.tree_models.DecisionTree;

// Optimized tree configuration
DecisionTree tree = new DecisionTree()
    .setMaxDepth(10)             // Balance between accuracy and overfitting
    .setMinSamplesLeaf(5)        // Reduce overfitting, improve generalization
    .setMinSamplesSplit(10)      // Speed up training on large datasets
    .setSplitter("best");        // vs "random" for different speed/accuracy tradeoffs

// Memory-efficient training for large datasets
tree.fit(X_train, y_train);

// Fast predictions with decision trees
double[] predictions = tree.predict(X_test);  // O(log n) per prediction

Random Forest

import org.superml.tree_models.RandomForest;

// Production-optimized Random Forest
RandomForest rf = new RandomForest()
    .setNumEstimators(100)       // Balance accuracy vs training time
    .setMaxDepth(15)             // Deeper trees for complex patterns
    .setMaxFeatures("sqrt")      // Feature sampling strategy
    .setMinSamplesLeaf(2)        // Reduce overfitting
    .setBootstrap(true)          // Enable bagging
    .setRandomState(42);         // Reproducible results

// Parallel training (utilizes multiple cores)
rf.fit(X_train, y_train);       // Automatically uses available cores

// Batch prediction optimization
double[] predictions = rf.predict(X_test);  // Parallelized tree voting

Transformer Models Performance

Memory-Optimized Transformer Training

import org.superml.transformers.TransformerEncoder;
import org.superml.transformers.config.TransformerConfig;

// Configure for optimal memory usage
TransformerConfig config = new TransformerConfig()
    .setModelDimension(512)      // Reduce if memory is limited
    .setNumLayers(6)             // Start with fewer layers
    .setNumAttentionHeads(8)     // Must divide model dimension evenly
    .setFeedForwardDimension(2048) // Typically 4x model dimension
    .setDropoutRate(0.1)         // Regularization
    .setMaxSequenceLength(512);  // Limit sequence length for memory

TransformerEncoder encoder = new TransformerEncoder(config);

// Performance monitoring for transformer training
MemoryProfiler profiler = new MemoryProfiler();
profiler.start();

encoder.train(tokenizedData);

MemoryReport report = profiler.stop();
System.out.println("Peak memory usage: " + report.getPeakUsage() + " MB");

Optimized Attention Computation

import org.superml.transformers.MultiHeadAttention;

// Configure attention for optimal performance
MultiHeadAttention attention = new MultiHeadAttention()
    .setModelDimension(512)
    .setNumHeads(8)              // 20% faster in v3.1.2
    .setDropoutRate(0.1);

// Batch processing for efficiency
String[][] batchTokens = prepareBatch(inputs, batchSize=32);
double[][][] attentionOutput = attention.forward(batchTokens);

πŸ—οΈ Pipeline Performance Optimization

Efficient Pipeline Construction

import org.superml.pipeline.Pipeline;
import org.superml.preprocessing.StandardScaler;
import org.superml.linear_model.LogisticRegression;

// Optimized pipeline with minimal overhead
Pipeline pipeline = new Pipeline()
    .addStep("scaler", new StandardScaler())
    .addStep("classifier", new LogisticRegression());

// Single fit call for entire pipeline
pipeline.fit(X_train, y_train);  // Optimized execution path

// Batch predictions through pipeline
double[] predictions = pipeline.predict(X_test);
double accuracy = pipeline.score(X_test, y_test);

Memory-Efficient Preprocessing

import org.superml.preprocessing.StandardScaler;
import org.superml.preprocessing.MinMaxScaler;

// Choose appropriate scaler for your data
StandardScaler scaler = new StandardScaler()
    .setWithMean(true)           // Center data (removes mean)
    .setWithStd(true);           // Scale to unit variance

// Fit and transform in separate steps for large datasets
scaler.fit(X_train);            // Learn parameters from training data
double[][] X_train_scaled = scaler.transform(X_train);
double[][] X_test_scaled = scaler.transform(X_test);

πŸ“Š Performance Monitoring and Profiling

Built-in Performance Monitoring

import org.superml.utils.PerformanceMonitor;

// Comprehensive performance monitoring
PerformanceMonitor monitor = new PerformanceMonitor();

// Monitor training performance
monitor.startTiming("model_training");
model.fit(X_train, y_train);
long trainingTime = monitor.stopTiming("model_training");

// Monitor prediction performance
monitor.startTiming("batch_prediction");
double[] predictions = model.predict(X_test);
long predictionTime = monitor.stopTiming("batch_prediction");

// Calculate throughput metrics
double predictionsPerSecond = X_test.length * 1000.0 / predictionTime;
System.out.println("Training time: " + trainingTime + "ms");
System.out.println("Prediction throughput: " + predictionsPerSecond + " predictions/sec");

Memory Usage Profiling

import org.superml.utils.MemoryProfiler;

// Track memory usage during training
MemoryProfiler profiler = new MemoryProfiler();
profiler.start();

// Train your model
RandomForest model = new RandomForest().setNumEstimators(100);
model.fit(X_train, y_train);

// Get memory statistics
MemoryReport report = profiler.stop();
System.out.println("Peak memory usage: " + report.getPeakUsage() + " MB");
System.out.println("Memory efficiency improvement in v3.1.2: ~10%");

πŸš€ Production Deployment Optimization

High-Throughput Inference Setup

import org.superml.inference.InferenceEngine;
import org.superml.inference.ModelCache;

// Configure inference engine for production
InferenceEngine engine = new InferenceEngine.Builder()
    .setCacheSize(1000)          // Cache frequently used models
    .setBatchSize(1000)          // Optimize batch predictions
    .setNumThreads(8)            // Parallel prediction threads
    .setTimeoutMs(5000)          // Request timeout
    .build();

// Load model into cache for fast access
engine.loadModel("production_model", trainedModel);

// High-throughput predictions
double[] predictions = engine.predict("production_model", X_batch);

Model Serving Optimization

import org.superml.serving.ModelServer;
import org.superml.serving.config.ServerConfig;

// Configure model server for optimal throughput
ServerConfig config = new ServerConfig()
    .setPort(8080)
    .setMaxConcurrentRequests(100)
    .setModelCacheSize(50)
    .setRequestTimeoutMs(3000)
    .setEnableMetrics(true);

ModelServer server = new ModelServer(config);
server.deployModel("classifier", trainedModel);
server.start();

// Server automatically handles:
// - Connection pooling
// - Request batching  
// - Model caching
// - Performance metrics

πŸ’Ύ PMML Export Performance

Optimized PMML Generation

import org.superml.pmml.PMMLConverter;
import org.superml.pmml.PMMLConfig;

// Configure PMML export for optimal performance (50% faster in v3.1.2)
PMMLConfig config = new PMMLConfig()
    .setValidateOutput(true)     // Validate generated PMML
    .setIncludeStatistics(false) // Skip for faster generation
    .setCompressOutput(true)     // Reduce file size
    .setPrecision(6);            // Balance precision vs size

PMMLConverter converter = new PMMLConverter(config);

// Benchmark PMML generation
long startTime = System.currentTimeMillis();
String pmml = converter.convertToXML(model, featureNames, targetName);
long exportTime = System.currentTimeMillis() - startTime;

System.out.println("PMML export time: " + exportTime + "ms");
System.out.println("PMML size: " + pmml.length() + " characters");
System.out.println("Performance improvement in v3.1.2: +50%");

πŸ“ˆ Benchmarking and Performance Testing

Standardized Benchmarking

import org.superml.benchmarks.MLBenchmark;
import org.superml.benchmarks.BenchmarkResult;

// Comprehensive performance benchmarking
MLBenchmark benchmark = new MLBenchmark()
    .addDataset("iris", Datasets.loadIris())
    .addDataset("boston", Datasets.loadBoston())
    .addAlgorithm("lr", new LinearRegression())
    .addAlgorithm("rf", new RandomForest())
    .setNumRuns(5)               // Average over multiple runs
    .setTimeoutMs(60000);        // 1 minute timeout per test

// Run comprehensive benchmarks
BenchmarkResult results = benchmark.run();

// Display performance metrics
for (String dataset : results.getDatasets()) {
    for (String algorithm : results.getAlgorithms()) {
        double trainingTime = results.getTrainingTime(dataset, algorithm);
        double predictionTime = results.getPredictionTime(dataset, algorithm);
        double accuracy = results.getAccuracy(dataset, algorithm);
        
        System.out.printf("%s on %s: Training=%.2fms, Prediction=%.2fms, Accuracy=%.3f%n",
                         algorithm, dataset, trainingTime, predictionTime, accuracy);
    }
}

Custom Performance Tests

import org.superml.testing.PerformanceTest;

// Create custom performance tests
PerformanceTest test = new PerformanceTest("LinearRegression Performance")
    .setDataSize(1000000)        // 1M samples
    .setFeatures(100)            // 100 features
    .setIterations(10)           // Average over 10 runs
    .setWarmupRuns(3);          // JVM warmup

// Define test scenario
test.define("linear_regression_training", () -> {
    LinearRegression model = new LinearRegression();
    model.fit(X_large, y_large);
    return model;
});

test.define("linear_regression_prediction", (model) -> {
    return model.predict(X_test_large);
});

// Execute performance test
TestResults results = test.execute();
results.printSummary();

// Expected improvements in v3.1.2:
// Training: +15% faster
// Prediction: +8% faster
// Memory: -10% usage

πŸ”§ Advanced Configuration

JVM Optimization for SuperML

# Optimal JVM settings for SuperML Java applications
java -Xms2g -Xmx8g \
     -XX:+UseG1GC \
     -XX:G1HeapRegionSize=16m \
     -XX:+UseStringDeduplication \
     -XX:+OptimizeStringConcat \
     -XX:+UseCompressedOops \
     -XX:NewRatio=3 \
     -XX:SurvivorRatio=6 \
     -Dsuperml.parallel.threads=8 \
     -Dsuperml.cache.size=1000 \
     MyMLApplication

System-Level Optimizations

// Configure system properties for optimal performance
System.setProperty("superml.parallel.enabled", "true");
System.setProperty("superml.parallel.threads", String.valueOf(Runtime.getRuntime().availableProcessors()));
System.setProperty("superml.cache.models", "true");
System.setProperty("superml.cache.predictions", "true");
System.setProperty("superml.optimize.memory", "true");

// Verify configuration
SuperMLConfig config = SuperMLConfig.getInstance();
System.out.println("Parallel threads: " + config.getParallelThreads());
System.out.println("Model caching: " + config.isModelCachingEnabled());

πŸ“Š Real-World Performance Benchmarks

Hardware Configuration

  • CPU: Intel i7-10700K (8 cores, 16 threads)
  • Memory: 32GB DDR4 3200MHz
  • Storage: NVMe SSD
  • Java: OpenJDK 11.0.8

Dataset Specifications

  • Small: 1K samples, 10 features
  • Medium: 100K samples, 50 features
  • Large: 1M samples, 100 features
  • XLarge: 10M samples, 1000 features

Training Performance Results (v3.1.2)

| Algorithm | Small | Medium | Large | XLarge | |———–|β€”β€”-|——–|β€”β€”-|β€”β€”β€”| | LinearRegression | 2ms | 45ms | 2.0s | 42s | | LogisticRegression | 3ms | 62ms | 2.7s | 58s | | DecisionTree | 5ms | 180ms | 8.2s | 165s | | RandomForest | 25ms | 890ms | 11.2s | 280s | | TransformerEncoder | 150ms | 5.2s | 36.1s | 12min |

Prediction Throughput (predictions/second)

| Algorithm | Small | Medium | Large | XLarge | |———–|β€”β€”-|——–|β€”β€”-|β€”β€”β€”| | LinearRegression | 450K | 420K | 400K | 380K | | LogisticRegression | 380K | 350K | 325K | 300K | | DecisionTree | 280K | 260K | 240K | 220K | | RandomForest | 85K | 78K | 72K | 65K | | TransformerEncoder | 12K | 10K | 8K | 6K |

Memory Usage (Peak)

| Algorithm | Small | Medium | Large | XLarge | |———–|β€”β€”-|——–|β€”β€”-|β€”β€”β€”| | LinearRegression | 50MB | 200MB | 800MB | 8GB | | LogisticRegression | 55MB | 220MB | 900MB | 9GB | | DecisionTree | 80MB | 450MB | 1.1GB | 12GB | | RandomForest | 200MB | 1.2GB | 3.1GB | 35GB | | TransformerEncoder | 300MB | 1.8GB | 2.5GB | 28GB |

🎯 Performance Best Practices

1. Data Preparation

  • βœ… Normalize data before training for 15% speed improvement
  • βœ… Use appropriate data types (double vs float) for memory efficiency
  • βœ… Remove correlated features to reduce computation overhead
  • βœ… Batch process predictions instead of single-sample calls

2. Algorithm Selection

  • βœ… Linear models for high-speed, interpretable results
  • βœ… Tree models for complex patterns with moderate speed requirements
  • βœ… Transformers for NLP tasks requiring state-of-the-art accuracy
  • βœ… Consider ensemble methods for optimal accuracy-speed balance

3. Memory Management

  • βœ… Monitor peak memory usage during training and prediction
  • βœ… Use streaming for very large datasets that don’t fit in memory
  • βœ… Clear unused models from memory in production environments
  • βœ… Configure JVM heap size appropriately for your workload

4. Production Deployment

  • βœ… Use model caching for frequently accessed models
  • βœ… Implement connection pooling for high-concurrency scenarios
  • βœ… Monitor performance metrics continuously in production
  • βœ… Set appropriate timeouts to prevent resource exhaustion

For optimal performance, always use the latest SuperML Java version and follow these guidelines for your specific use case.