πŸ”— GraphFeatureAggregation

🟑 Intermediate βœ… Stable πŸ”₯ Popular

🎯 Overview

The GraphFeatureAggregation layer treats each input feature as a node in a graph and uses self-attention mechanisms to learn relationships between features. It projects features into an embedding space, computes pairwise attention scores, and aggregates feature information based on these scores.

This layer is particularly powerful for tabular data where features have inherent relationships, providing a way to learn and exploit these relationships automatically without manual feature engineering.

πŸ” How It Works

The GraphFeatureAggregation layer processes data through a graph-based transformation; a minimal sketch of the computation follows the diagram below:

  1. Feature Embedding: Projects each scalar feature to an embedding
  2. Pairwise Scoring: Computes pairwise concatenated embeddings and scores them
  3. Attention Matrix: Normalizes the scores with softmax to create a dynamic adjacency matrix
  4. Feature Aggregation: Aggregates neighboring features via a weighted sum
  5. Output Projection: Projects back to original dimension with residual connection
graph TD
    A[Input Features] --> B[Feature Embedding]
    B --> C[Pairwise Scoring]
    C --> D[Attention Matrix]
    D --> E[Feature Aggregation]
    E --> F[Output Projection]
    A --> G[Residual Connection]
    F --> G
    G --> H[Transformed Features]

    style A fill:#e6f3ff,stroke:#4a86e8
    style H fill:#e8f5e9,stroke:#66bb6a
    style B fill:#fff9e6,stroke:#ffb74d
    style C fill:#f3e5f5,stroke:#9c27b0
    style D fill:#e1f5fe,stroke:#03a9f4
    style E fill:#fff3e0,stroke:#ff9800
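
The snippet below is a minimal, illustrative sketch of these five steps written directly with keras.ops, independent of the layer itself. The weight tensors (W_p, a, W_o), their shapes, and the exact wiring are assumptions made for illustration, not the layer's actual variables.

import keras
from keras import ops

# Minimal sketch of the five steps above; W_p, a, W_o are random stand-ins
# for the layer's learned parameters (illustrative, not the real internals).
batch, num_features, embed_dim = 4, 6, 8

x = keras.random.normal((batch, num_features))   # (B, F) input features
W_p = keras.random.normal((embed_dim,))           # per-feature embedding weights
a = keras.random.normal((2 * embed_dim, 1))       # attention scoring vector
W_o = keras.random.normal((embed_dim, 1))         # output projection

# 1. Feature embedding: each scalar feature becomes an embed_dim vector -> (B, F, D)
h = ops.expand_dims(x, -1) * W_p

# 2. Pairwise scoring: concatenate every (i, j) embedding pair and score it
h_i = ops.tile(ops.expand_dims(h, 2), (1, 1, num_features, 1))   # (B, F, F, D)
h_j = ops.tile(ops.expand_dims(h, 1), (1, num_features, 1, 1))   # (B, F, F, D)
pairs = ops.concatenate([h_i, h_j], axis=-1)                     # (B, F, F, 2D)
scores = ops.leaky_relu(ops.squeeze(ops.matmul(pairs, a), -1), negative_slope=0.2)

# 3. Attention matrix: softmax over neighbours gives a dynamic adjacency matrix
adjacency = ops.softmax(scores, axis=-1)                          # (B, F, F)

# 4. Feature aggregation: weighted sum of neighbouring embeddings
aggregated = ops.matmul(adjacency, h)                             # (B, F, D)

# 5. Output projection back to one value per feature, plus the residual
output = ops.squeeze(ops.matmul(aggregated, W_o), -1) + x         # (B, F)
print(output.shape)  # (4, 6)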

πŸ’‘ Why Use This Layer?

| Challenge | Traditional Approach | GraphFeatureAggregation's Solution |
|---|---|---|
| Feature relationships | Manual feature engineering | 🎯 Automatic learning of feature relationships |
| Graph structure | No graph structure | ⚑ Graph-based feature processing |
| Attention mechanisms | No attention | 🧠 Self-attention for feature interactions |
| Dynamic adjacency | Static relationships | πŸ”— Dynamic adjacency matrix learning |

πŸ“Š Use Cases

  • Tabular Data: Learning feature relationships in tabular data
  • Graph Neural Networks: Graph-based processing for tabular data
  • Feature Engineering: Automatic feature relationship learning
  • Attention Mechanisms: Self-attention for feature interactions
  • Dynamic Relationships: Learning dynamic feature relationships

πŸš€ Quick Start

Basic Usage

import keras
from kerasfactory.layers import GraphFeatureAggregation

# Create sample input data
batch_size, num_features = 32, 10
x = keras.random.normal((batch_size, num_features))

# Apply graph feature aggregation
graph_layer = GraphFeatureAggregation(embed_dim=8, dropout_rate=0.1)
output = graph_layer(x, training=True)

print(f"Input shape: {x.shape}")           # (32, 10)
print(f"Output shape: {output.shape}")     # (32, 10)

In a Sequential Model

import keras
from kerasfactory.layers import GraphFeatureAggregation

model = keras.Sequential([
    keras.layers.Dense(32, activation='relu'),
    GraphFeatureAggregation(embed_dim=16, dropout_rate=0.1),
    keras.layers.Dense(16, activation='relu'),
    GraphFeatureAggregation(embed_dim=8, dropout_rate=0.1),
    keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In a Functional Model

import keras
from kerasfactory.layers import GraphFeatureAggregation

# Define inputs
inputs = keras.Input(shape=(20,))  # 20 features

# Apply graph feature aggregation
x = GraphFeatureAggregation(embed_dim=16, dropout_rate=0.1)(inputs)

# Continue processing
x = keras.layers.Dense(32, activation='relu')(x)
x = GraphFeatureAggregation(embed_dim=16, dropout_rate=0.1)(x)
x = keras.layers.Dense(16, activation='relu')(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs, outputs)

Advanced Configuration

import keras
from kerasfactory.layers import GraphFeatureAggregation

# Advanced configuration with multiple graph layers
def create_graph_network():
    inputs = keras.Input(shape=(25,))  # 25 features

    # Multiple graph layers with different configurations
    x = GraphFeatureAggregation(
        embed_dim=16,
        dropout_rate=0.1,
        leaky_relu_alpha=0.2
    )(inputs)

    x = keras.layers.Dense(32, activation='relu')(x)
    x = keras.layers.BatchNormalization()(x)

    x = GraphFeatureAggregation(
        embed_dim=12,
        dropout_rate=0.1,
        leaky_relu_alpha=0.2
    )(x)

    x = keras.layers.Dense(24, activation='relu')(x)
    x = keras.layers.Dropout(0.2)(x)

    # Multi-task output
    classification = keras.layers.Dense(3, activation='softmax', name='classification')(x)
    regression = keras.layers.Dense(1, name='regression')(x)

    return keras.Model(inputs, [classification, regression])

model = create_graph_network()
model.compile(
    optimizer='adam',
    loss={'classification': 'categorical_crossentropy', 'regression': 'mse'},
    loss_weights={'classification': 1.0, 'regression': 0.5}
)

πŸ“– API Reference

kerasfactory.layers.GraphFeatureAggregation

This module implements a GraphFeatureAggregation layer that treats features as nodes in a graph and uses attention mechanisms to learn relationships between features. This approach is useful for tabular data where features have inherent relationships.

Classes

GraphFeatureAggregation
GraphFeatureAggregation(
    embed_dim: int = 8,
    dropout_rate: float = 0.0,
    leaky_relu_alpha: float = 0.2,
    name: str | None = None,
    **kwargs: Any
)

Graph-based feature aggregation layer with self-attention for tabular data.

This layer treats each input feature as a node and projects it into an embedding space. It then computes pairwise attention scores between features and aggregates feature information based on these scores. Finally, it projects the aggregated features back to the original feature space and adds a residual connection.

The process involves:
  1. Projecting each scalar feature to an embedding (shape: [batch, num_features, embed_dim]).
  2. Computing pairwise concatenated embeddings and scoring them via a learnable attention vector.
  3. Normalizing the scores with softmax to yield a dynamic adjacency (attention) matrix.
  4. Aggregating neighboring features via weighted sum.
  5. Projecting back to a vector of original dimension, then adding a residual connection.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| embed_dim | int | Dimensionality of the projected feature embeddings. | 8 |
| dropout_rate | float | Dropout rate to apply on attention weights. | 0.0 |
| leaky_relu_alpha | float | Alpha parameter for the LeakyReLU activation. | 0.2 |
| name | str \| None | Optional name for the layer. | None |

Input shape

2D tensor with shape: (batch_size, num_features)

Output shape

2D tensor with shape: (batch_size, num_features) (same as input)

Example
import keras
from kerasfactory.layers import GraphFeatureAggregation

# Tabular data with 10 features
x = keras.random.normal((32, 10))

# Create the layer with an embedding dimension of 8 and dropout rate of 0.1
graph_layer = GraphFeatureAggregation(embed_dim=8, dropout_rate=0.1)
y = graph_layer(x, training=True)
print("Output shape:", y.shape)  # Expected: (32, 10)

Initialize the GraphFeatureAggregation layer.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| embed_dim | int | Embedding dimension. | 8 |
| dropout_rate | float | Dropout rate. | 0.0 |
| leaky_relu_alpha | float | Alpha parameter for LeakyReLU. | 0.2 |
| name | str \| None | Name of the layer. | None |
| **kwargs | Any | Additional keyword arguments. | {} |

Source code in kerasfactory/layers/GraphFeatureAggregation.py
def __init__(
    self,
    embed_dim: int = 8,
    dropout_rate: float = 0.0,
    leaky_relu_alpha: float = 0.2,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the GraphFeatureAggregation layer.

    Args:
        embed_dim: Embedding dimension.
        dropout_rate: Dropout rate.
        leaky_relu_alpha: Alpha parameter for LeakyReLU.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set public attributes
    self.embed_dim = embed_dim
    self.dropout_rate = dropout_rate
    self.leaky_relu_alpha = leaky_relu_alpha

    # Initialize instance variables
    self.num_features: int | None = None
    self.projection: layers.Dense | None = None
    self.attention_a: layers.Dense | None = None
    self.attention_bias: layers.Dense | None = None
    self.leaky_relu: layers.LeakyReLU | None = None
    self.dropout_layer: layers.Dropout | None = None
    self.out_proj: layers.Dense | None = None

    # Validate parameters during initialization
    self._validate_params()
    # Call parent's __init__
    super().__init__(name=name, **kwargs)

πŸ”§ Parameters Deep Dive

embed_dim (int)

  • Purpose: Dimensionality of the projected feature embeddings
  • Range: 4 to 64+ (typically 8-32)
  • Impact: Larger values = more expressive embeddings but more parameters
  • Recommendation: Start with 8-16, scale based on data complexity

dropout_rate (float)

  • Purpose: Dropout rate applied to attention weights
  • Range: 0.0 to 0.5 (typically 0.1-0.2)
  • Impact: Higher values = more regularization
  • Recommendation: Use 0.1-0.2 for regularization

leaky_relu_alpha (float)

  • Purpose: Alpha parameter for LeakyReLU activation
  • Range: 0.0 to 1.0 (typically 0.2)
  • Impact: Controls the negative slope of LeakyReLU
  • Recommendation: Use 0.2 for most applications
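
As rough starting points based on the recommendations above (heuristics, not measured optima), the two configurations below contrast a narrow table with a wider, more complex one:

from kerasfactory.layers import GraphFeatureAggregation

# Narrow table (~10 features): small embedding, light regularization
narrow_layer = GraphFeatureAggregation(embed_dim=8, dropout_rate=0.1, leaky_relu_alpha=0.2)

# Wider or more complex table: more expressive embedding, a bit more dropout
wide_layer = GraphFeatureAggregation(embed_dim=32, dropout_rate=0.2, leaky_relu_alpha=0.2)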

πŸ“ˆ Performance Characteristics

  • Speed: ⚑⚑⚑ Fast for small to medium models; compute scales quadratically with the number of features
  • Memory: πŸ’ΎπŸ’ΎπŸ’Ύ Moderate memory usage due to attention computation
  • Accuracy: 🎯🎯🎯🎯 Excellent for feature relationship learning
  • Best For: Tabular data with inherent feature relationships
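
Because the attention matrix has shape (batch_size, num_features, num_features), a quick back-of-the-envelope count shows how the memory cost grows with the feature count:

# Attention entries per forward pass: batch_size * num_features**2
batch_size = 32
for num_features in (10, 100, 1000):
    entries = batch_size * num_features ** 2
    print(f"{num_features:>4} features -> {entries:,} attention entries")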

🎨 Examples

Example 1: Feature Relationship Learning

import keras
import numpy as np
from kerasfactory.layers import GraphFeatureAggregation

# Create a model for feature relationship learning
def create_relationship_learning_model():
    inputs = keras.Input(shape=(20,))  # 20 features

    # Multiple graph layers for different relationship levels
    x = GraphFeatureAggregation(
        embed_dim=16,
        dropout_rate=0.1,
        leaky_relu_alpha=0.2
    )(inputs)

    x = keras.layers.Dense(32, activation='relu')(x)
    x = keras.layers.BatchNormalization()(x)

    x = GraphFeatureAggregation(
        embed_dim=12,
        dropout_rate=0.1,
        leaky_relu_alpha=0.2
    )(x)

    x = keras.layers.Dense(16, activation='relu')(x)
    x = keras.layers.Dropout(0.2)(x)

    # Output
    outputs = keras.layers.Dense(1, activation='sigmoid')(x)

    return keras.Model(inputs, outputs)

model = create_relationship_learning_model()
model.compile(optimizer='adam', loss='binary_crossentropy')

# Test with sample data
sample_data = keras.random.normal((100, 20))
predictions = model(sample_data)
print(f"Relationship learning predictions shape: {predictions.shape}")

Example 2: Graph Structure Analysis

import keras
from kerasfactory.layers import GraphFeatureAggregation

# Analyze graph structure behavior
def analyze_graph_structure():
    # Create model with graph layer
    inputs = keras.Input(shape=(15,))
    x = GraphFeatureAggregation(embed_dim=8, dropout_rate=0.1)(inputs)
    outputs = keras.layers.Dense(1, activation='sigmoid')(x)

    model = keras.Model(inputs, outputs)

    # Test with different input patterns
    test_inputs = [
        keras.random.normal((10, 15)),  # Random data
        keras.random.normal((10, 15)) * 2,  # Scaled data
        keras.random.normal((10, 15)) + 1,  # Shifted data
    ]

    print("Graph Structure Analysis:")
    print("=" * 40)

    for i, test_input in enumerate(test_inputs):
        prediction = model(test_input)
        print(f"Test {i+1}: Prediction mean = {keras.ops.mean(prediction):.4f}")

    return model

# Analyze graph structure
# model = analyze_graph_structure()

Example 3: Attention Pattern Analysis

import keras
from kerasfactory.layers import GraphFeatureAggregation

# Analyze attention patterns in graph layer
def analyze_attention_patterns():
    # Create model with graph layer
    inputs = keras.Input(shape=(12,))
    x = GraphFeatureAggregation(embed_dim=8, dropout_rate=0.1)(inputs)
    outputs = keras.layers.Dense(1, activation='sigmoid')(x)

    model = keras.Model(inputs, outputs)

    # Test with sample data
    sample_data = keras.random.normal((50, 12))
    predictions = model(sample_data)

    print("Attention Pattern Analysis:")
    print("=" * 40)
    print(f"Input shape: {sample_data.shape}")
    print(f"Output shape: {predictions.shape}")
    print(f"Model parameters: {model.count_params()}")

    return model

# Analyze attention patterns
# model = analyze_attention_patterns()

πŸ’‘ Tips & Best Practices

  • Embedding Dimension: Start with 8-16, scale based on data complexity
  • Dropout Rate: Use 0.1-0.2 for regularization
  • LeakyReLU Alpha: Use 0.2 for most applications
  • Feature Relationships: Works best when features have inherent relationships
  • Residual Connections: Built-in residual connections for gradient flow
  • Attention Patterns: Monitor attention patterns for interpretability
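
As with standard Keras dropout, the dropout applied to the attention weights is only active when the layer is called with training=True; inference calls are deterministic. A quick check:

import keras
from kerasfactory.layers import GraphFeatureAggregation

layer = GraphFeatureAggregation(embed_dim=8, dropout_rate=0.2)
x = keras.random.normal((4, 10))

y_train = layer(x, training=True)    # attention dropout active
y_infer = layer(x, training=False)   # deterministic inference
print(y_train.shape, y_infer.shape)  # (4, 10) (4, 10)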

⚠️ Common Pitfalls

  • Embedding Dimension: Must be a positive integer
  • Dropout Rate: Must be between 0 and 1
  • Memory Usage: Scales quadratically with number of features
  • Overfitting: Monitor for overfitting with complex configurations
  • Feature Count: Consider feature pre-selection for very large feature sets
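
Parameter validation runs at construction time (via _validate_params in __init__), so invalid settings fail fast. The sketch below assumes the layer raises ValueError; the exact exception type may differ:

from kerasfactory.layers import GraphFeatureAggregation

try:
    GraphFeatureAggregation(embed_dim=0)        # embed_dim must be a positive integer
except ValueError as err:  # assumed exception type
    print("Rejected embed_dim=0:", err)

try:
    GraphFeatureAggregation(dropout_rate=1.5)   # dropout_rate must be between 0 and 1
except ValueError as err:  # assumed exception type
    print("Rejected dropout_rate=1.5:", err)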

πŸ“š Further Reading