
Python Machine Learning Frameworks: Scikit-learn, TensorFlow, and PyTorch Compared

If you ask 10 ML engineers which framework to use, you’ll get 12 opinions. I’ve shipped production code in all the major ones, and I’m going to give you the unfiltered truth about each.

Spoiler: there’s no “best” framework. There’s only “best for your situation.” Let me help you figure out yours.

Series Progress: Part 1: Foundations → Part 2: Types of ML → Part 3: Python Frameworks (You are here) → Part 4: MLOps → Part 5: Enterprise Apps

Figure 1: Comparing Python’s major ML frameworks (scikit-learn, TensorFlow, PyTorch, XGBoost)

The Big Three (And When I Actually Use Each)

| Framework    | Best For                                               | My Honest Take                       |
|--------------|--------------------------------------------------------|--------------------------------------|
| Scikit-learn | Tabular data, quick prototypes, production classical ML | My default. If it works here, stop.  |
| TensorFlow   | Large-scale deep learning, mobile/edge, Google shops   | Great ecosystem, verbose code        |
| PyTorch      | Research, custom architectures, NLP/vision             | Most Pythonic, debugger-friendly     |

Scikit-learn: The One You Should Start With

I don’t care how hyped deep learning is—80% of production ML is still tabular data solved with classical algorithms. And for that, scikit-learn is unbeatable.

Why I Love It

  • Consistent API: Every model has fit(), predict(), score(). Learn once, use everywhere.
  • Batteries included: Preprocessing, model selection, metrics, pipelines—all there.
  • Battle-tested: The algorithms work. They’re not bleeding edge, but they’re reliable.
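That “learn once, use everywhere” claim is easy to demonstrate. A minimal sketch on synthetic data (`make_classification` is just a stand-in for real features):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)  # stand-in for real features

# Three very different algorithms, one identical interface.
scores = {}
for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=0),
              SVC()):
    model.fit(X, y)
    scores[type(model).__name__] = round(model.score(X, y), 2)

print(scores)
```

Swapping algorithms is a one-line change, which is exactly what makes fast prototyping possible.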

Production Pipeline Example

# production_pipeline.py
# This pattern has served me well across dozens of projects

from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
import joblib

# Define column groups
numeric_cols = ['age', 'income', 'credit_score']
categorical_cols = ['employment_type', 'region', 'product_type']

# Preprocessing for numeric: fill missing, then scale
numeric_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

# Preprocessing for categorical: fill missing, then encode
categorical_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='constant', fill_value='_missing_')),
    ('encoder', OneHotEncoder(handle_unknown='ignore', sparse_output=False))
])

# Combine preprocessing
preprocessor = ColumnTransformer([
    ('num', numeric_transformer, numeric_cols),
    ('cat', categorical_transformer, categorical_cols)
])

# Full pipeline: preprocessing + model
pipeline = Pipeline([
    ('prep', preprocessor),
    ('clf', GradientBoostingClassifier(random_state=42))
])

# Hyperparameter tuning
param_grid = {
    'clf__n_estimators': [100, 200],
    'clf__max_depth': [3, 5, 7],
    'clf__learning_rate': [0.05, 0.1]
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring='roc_auc', n_jobs=-1)

# Fit and save
# search.fit(X_train, y_train)
# joblib.dump(search.best_estimator_, 'model_v1.joblib')

# Loading in production:
# model = joblib.load('model_v1.joblib')
# predictions = model.predict(new_data)  # handles preprocessing!
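To see that save/load round trip end to end, here’s a runnable, scaled-down version of the same pattern on synthetic data (the column names match the example above; the values and target are made up for illustration):

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    'age': rng.integers(18, 80, n).astype(float),
    'income': rng.normal(50_000, 15_000, n),
    'credit_score': rng.integers(300, 850, n).astype(float),
    'employment_type': rng.choice(['salaried', 'self_employed'], n),
    'region': rng.choice(['north', 'south', 'east'], n),
    'product_type': rng.choice(['basic', 'premium'], n),
})
target = (df['income'] > 50_000).astype(int)  # made-up target for the demo

numeric_cols = ['age', 'income', 'credit_score']
categorical_cols = ['employment_type', 'region', 'product_type']

pipe = Pipeline([
    ('prep', ColumnTransformer([
        ('num', Pipeline([('imputer', SimpleImputer(strategy='median')),
                          ('scaler', StandardScaler())]), numeric_cols),
        ('cat', Pipeline([('imputer', SimpleImputer(strategy='constant',
                                                    fill_value='_missing_')),
                          ('encoder', OneHotEncoder(handle_unknown='ignore'))]),
         categorical_cols),
    ])),
    ('clf', GradientBoostingClassifier(random_state=42)),
])
pipe.fit(df, target)

# Save and reload: the loaded model carries the whole preprocessing chain.
path = os.path.join(tempfile.mkdtemp(), 'model_v1.joblib')
joblib.dump(pipe, path)
loaded = joblib.load(path)
print((loaded.predict(df) == pipe.predict(df)).all())  # True
```

The point of serializing the whole pipeline (not just the classifier) is that production code never has to reimplement preprocessing.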

TensorFlow: The Enterprise Deep Learning Tank

TensorFlow 1.x was painful. I’ll say it. Session graphs, tf.placeholder, cryptic error messages—it was a rite of passage.

TensorFlow 2.x with Keras is… actually pleasant. They fixed a lot.
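A big part of what they fixed is eager execution: operations run immediately, and gradients come from `tf.GradientTape` instead of static session graphs and placeholders. A tiny illustration:

```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x  # runs eagerly - no session, no placeholders

grad = tape.gradient(y, x)  # d(x^2)/dx = 2x
print(float(grad))  # 6.0
```

You can drop a breakpoint anywhere in that block and inspect real values, which was effectively impossible in 1.x graph mode.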

Medical Imaging Example

# medical_imaging_classifier.py
# Transfer learning for X-ray classification

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_shape=(224, 224, 3), num_classes=2):
    """
    Build a transfer learning model for medical imaging.
    EfficientNet base - good accuracy/size tradeoff.
    """
    
    # Data augmentation - CRITICAL for medical imaging
    augmentation = keras.Sequential([
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
        layers.RandomContrast(0.1),
    ], name='augmentation')
    
    # Pre-trained base - freeze initially
    base = keras.applications.EfficientNetB0(
        weights='imagenet',
        include_top=False,
        input_shape=input_shape
    )
    base.trainable = False
    
    # Build the full model
    inputs = keras.Input(shape=input_shape)
    x = augmentation(inputs)
    x = keras.applications.efficientnet.preprocess_input(x)
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.3)(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    
    model = keras.Model(inputs, outputs)
    
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-4),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy', keras.metrics.AUC(name='auc')]
    )
    
    return model

# Training with proper callbacks
callbacks = [
    keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
    keras.callbacks.ModelCheckpoint('best_model.keras', save_best_only=True),
    keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=3),
]

model = build_model()
model.summary()  # prints the architecture itself; returns None, so don't wrap it in print()
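The frozen base is usually only phase one. Once the new head has converged, a common second phase unfreezes the top of the backbone and fine-tunes at a much lower learning rate. A sketch of that pattern — note `weights=None` is only to keep the example light (in practice you’d keep the ImageNet weights), and BatchNorm layers are often left frozen, which this simplified version skips:

```python
import tensorflow as tf
from tensorflow import keras

# Phase 2 of transfer learning: unfreeze the top of the backbone.
base = keras.applications.EfficientNetB0(
    weights=None,              # assumption: 'imagenet' in real use; None avoids the download here
    include_top=False,
    input_shape=(224, 224, 3),
)
base.trainable = True
for layer in base.layers[:-20]:   # freeze everything except the last 20 layers
    layer.trainable = False

model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(2, activation='softmax'),
])

# Much lower LR than the initial head-only phase, or you wreck the pretrained features.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    loss='sparse_categorical_crossentropy',
)
print(sum(l.trainable for l in base.layers))  # 20
```

How many layers to unfreeze is a tuning knob; 20 here is an arbitrary starting point, not a recommendation.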

PyTorch: The Researcher’s Choice (That’s Production Ready Now)

PyTorch used to be “the research framework.” You’d prototype in PyTorch, then rewrite in TensorFlow for production. That’s changed.

With PyTorch 2.0 and TorchServe, it’s now a first-class production option.
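One piece of that production story is getting the model out of Python: `torch.jit.script` compiles an `nn.Module` into TorchScript, which TorchServe and C++ runtimes can load without the original class definition. A minimal sketch with a throwaway `TinyNet` (the name is just for illustration):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Stand-in model - any nn.Module works the same way."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 1)

    def forward(self, x):
        return self.fc(x)

model = TinyNet().eval()
scripted = torch.jit.script(model)   # compile to TorchScript
out = scripted(torch.randn(2, 4))    # behaves exactly like the original module
print(tuple(out.shape))              # (2, 1)
# scripted.save('model.pt')  # loadable from C++/TorchServe without the Python class
```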

Financial Time Series Example

# financial_lstm.py
# LSTM for time series prediction

import torch
import torch.nn as nn

class FinancialLSTM(nn.Module):
    """LSTM with attention for financial time series."""
    
    def __init__(self, input_size, hidden_size=128, num_layers=2, dropout=0.2):
        super().__init__()
        
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout if num_layers > 1 else 0
        )
        
        # Simple attention mechanism
        self.attention = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, 1),
            nn.Softmax(dim=1)
        )
        
        self.fc = nn.Sequential(
            nn.Linear(hidden_size, 64),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(64, 1)
        )
    
    def forward(self, x):
        lstm_out, _ = self.lstm(x)
        weights = self.attention(lstm_out)
        context = torch.sum(weights * lstm_out, dim=1)
        return self.fc(context)

# Training loop
def train_epoch(model, dataloader, optimizer, criterion, device):
    model.train()
    total_loss = 0
    
    for X_batch, y_batch in dataloader:
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)
        
        optimizer.zero_grad()
        predictions = model(X_batch).squeeze(-1)  # (batch, 1) -> (batch,) to match targets
        loss = criterion(predictions, y_batch)
        loss.backward()
        
        # Gradient clipping - important for LSTMs
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        total_loss += loss.item()
    
    return total_loss / len(dataloader)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = FinancialLSTM(input_size=10).to(device)
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")
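One detail the model code glosses over is input shape: with `batch_first=True`, the LSTM expects `(batch, seq_len, features)`. A common way to get there from a flat time series is a sliding window — the window size and the “predict the next step of the first feature” target below are assumptions for illustration:

```python
import numpy as np
import torch

def make_windows(series: np.ndarray, window: int):
    """Turn a (time, features) array into (samples, window, features) tensors."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:, 0]  # target: next value of the first feature
    return (torch.tensor(X, dtype=torch.float32),
            torch.tensor(y, dtype=torch.float32))

series = np.random.default_rng(0).normal(size=(100, 10))  # 100 steps, 10 features
X, y = make_windows(series, window=20)
print(tuple(X.shape), tuple(y.shape))  # (80, 20, 10) (80,)
```

Wrap these tensors in a `TensorDataset`/`DataLoader` and they feed straight into the training loop above.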

What About XGBoost/LightGBM?

For tabular data competitions, gradient boosting libraries often beat everything else:

# These are my go-to for tabular classification/regression
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# XGBoost: more mature ecosystem; often a slight edge in accuracy
xgb = XGBClassifier(
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
    n_jobs=-1,
    random_state=42
)

# LightGBM: Faster training, handles categoricals natively
lgbm = LGBMClassifier(
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
    n_jobs=-1,
    random_state=42,
    verbose=-1
)

# Both work in sklearn pipelines

The Decision Matrix

Start with Scikit-learn if:

  • Your data fits in memory and is tabular
  • You’re prototyping
  • Interpretability matters
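On the interpretability point: scikit-learn gives you model-agnostic answers like permutation importance out of the box. A sketch on the built-in breast cancer dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

# Shuffle each feature and measure the score drop - works for any fitted estimator.
result = permutation_importance(clf, X_test, y_test, n_repeats=5, random_state=42)
top = result.importances_mean.argsort()[::-1][:3]
for i in top:
    print(X.columns[i], round(result.importances_mean[i], 3))
```

Getting an equivalent answer out of a deep network takes considerably more machinery.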

Move to TensorFlow if:

  • You need mobile/edge deployment
  • Your infrastructure is GCP-heavy
  • You need the TFX ecosystem

Choose PyTorch if:

  • You’re doing NLP with Hugging Face
  • You’re implementing custom architectures
  • Research-to-production speed matters

Hard-Won Lessons

  • Don’t use deep learning for tabular data unless you have millions of rows.
  • Framework lock-in is real. ONNX helps with portability.
  • Version everything. Python, libraries, CUDA.
  • GPUs are expensive. Most production models run on CPU.

What’s Next

In Part 4, we’ll dig into MLOps—experiment tracking, model versioning, CI/CD, monitoring, and all the things that make ML actually work at scale.



What framework do you use and why? Hit me up on GitHub or leave a comment.

