# GraphFeatureAggregation

## Overview
The GraphFeatureAggregation layer treats each input feature as a node in a graph and uses self-attention mechanisms to learn relationships between features. It projects features into an embedding space, computes pairwise attention scores, and aggregates feature information based on these scores.
This layer is particularly powerful for tabular data where features have inherent relationships, providing a way to learn and exploit these relationships automatically without manual feature engineering.
## How It Works

The GraphFeatureAggregation layer transforms its input in five steps:
- Feature Embedding: Projects each scalar feature into an embedding vector
- Pairwise Scoring: Concatenates every pair of feature embeddings and scores it with a learnable attention vector
- Attention Matrix: Normalizes the scores with a softmax to produce a dynamic adjacency (attention) matrix
- Feature Aggregation: Aggregates neighboring features via a weighted sum
- Output Projection: Projects back to the original dimension and adds a residual connection
```mermaid
graph TD
    A[Input Features] --> B[Feature Embedding]
    B --> C[Pairwise Scoring]
    C --> D[Attention Matrix]
    D --> E[Feature Aggregation]
    E --> F[Output Projection]
    A --> G[Residual Connection]
    F --> G
    G --> H[Transformed Features]

    style A fill:#e6f3ff,stroke:#4a86e8
    style H fill:#e8f5e9,stroke:#66bb6a
    style B fill:#fff9e6,stroke:#ffb74d
    style C fill:#f3e5f5,stroke:#9c27b0
    style D fill:#e1f5fe,stroke:#03a9f4
    style E fill:#fff3e0,stroke:#ff9800
```
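In code, the five steps look roughly like the sketch below, written with plain `keras.ops` calls and randomly initialized stand-in weights. This is only a conceptual walkthrough of the computation; the layer's actual implementation, weight names, and shapes may differ.

```python
import keras
from keras import ops

# Illustrative shapes and random stand-in weights (not the layer's real parameters).
batch, num_features, embed_dim = 32, 10, 8
x = keras.random.normal((batch, num_features))                 # (batch, F)

# 1. Feature embedding: each scalar feature gets its own embedding vector.
w_embed = keras.random.normal((num_features, embed_dim))
h = ops.expand_dims(x, -1) * w_embed                           # (batch, F, D)

# 2. Pairwise scoring: concatenate every (i, j) embedding pair and score it
#    with a learnable attention vector, passed through LeakyReLU.
h_i = ops.repeat(ops.expand_dims(h, 2), num_features, axis=2)  # (batch, F, F, D)
h_j = ops.repeat(ops.expand_dims(h, 1), num_features, axis=1)  # (batch, F, F, D)
pairs = ops.concatenate([h_i, h_j], axis=-1)                   # (batch, F, F, 2D)
a = keras.random.normal((2 * embed_dim, 1))
scores = ops.leaky_relu(ops.squeeze(ops.matmul(pairs, a), -1), negative_slope=0.2)

# 3. Softmax over neighbors -> dynamic adjacency (attention) matrix.
attn = ops.softmax(scores, axis=-1)                            # (batch, F, F)

# 4. Weighted aggregation of neighboring feature embeddings.
aggregated = ops.matmul(attn, h)                               # (batch, F, D)

# 5. Project each aggregated embedding back to a scalar and add the residual.
w_out = keras.random.normal((embed_dim, 1))
out = ops.squeeze(ops.matmul(aggregated, w_out), -1) + x       # (batch, F)
```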
## Why Use This Layer?

| Challenge | Traditional Approach | GraphFeatureAggregation's Solution |
|---|---|---|
| Feature relationships | Manual feature engineering | Learns pairwise feature relationships automatically |
| Graph structure | No graph structure over features | Treats features as graph nodes and aggregates over them |
| Attention mechanisms | No attention between features | Self-attention over feature interactions |
| Dynamic adjacency | Static, hand-defined relationships | Adjacency (attention) matrix learned dynamically from the data |
## Use Cases
- Tabular Data: Learning feature relationships in tabular data
- Graph Neural Networks: Graph-based processing for tabular data
- Feature Engineering: Automatic feature relationship learning
- Attention Mechanisms: Self-attention for feature interactions
- Dynamic Relationships: Learning dynamic feature relationships
## Quick Start

### Basic Usage
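A minimal sketch of standalone usage. The import path is assumed from the API reference below (`kerasfactory.layers.GraphFeatureAggregation`), and the hyperparameter values are only examples.

```python
import keras
from kerasfactory.layers import GraphFeatureAggregation  # assumed import path

# Tabular batch: 32 samples, 10 scalar features.
x = keras.random.normal((32, 10))

# Small embedding per feature, light dropout on the attention weights.
layer = GraphFeatureAggregation(embed_dim=8, dropout_rate=0.1)

y = layer(x)
print(y.shape)  # (32, 10) -- same shape as the input
```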
### In a Sequential Model
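A sketch of the layer inside `keras.Sequential` for binary classification on 10-feature tabular input; the head sizes and the import path are assumptions.

```python
import keras
from kerasfactory.layers import GraphFeatureAggregation  # assumed import path

model = keras.Sequential([
    keras.Input(shape=(10,)),                      # 10 scalar features per sample
    GraphFeatureAggregation(embed_dim=16, dropout_rate=0.1),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),   # e.g. binary classification head
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```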
### In a Functional Model
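The same idea with the functional API; again, the MLP head and the import path are illustrative assumptions.

```python
import keras
from kerasfactory.layers import GraphFeatureAggregation  # assumed import path

inputs = keras.Input(shape=(10,), name="features")

# Graph-based feature mixing, then a small MLP head.
x = GraphFeatureAggregation(embed_dim=16, dropout_rate=0.1)(inputs)
x = keras.layers.Dense(64, activation="relu")(x)
x = keras.layers.Dense(32, activation="relu")(x)
outputs = keras.layers.Dense(1, activation="sigmoid")(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```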
### Advanced Configuration
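A more elaborate sketch that sets every documented parameter explicitly and stacks two aggregation blocks. Stacking, the head architecture, and the layer names are design choices for illustration, not requirements of the layer.

```python
import keras
from kerasfactory.layers import GraphFeatureAggregation  # assumed import path

num_features = 20
inputs = keras.Input(shape=(num_features,))

# Wider embeddings and stronger attention dropout for a larger feature set.
x = GraphFeatureAggregation(
    embed_dim=32,           # more expressive per-feature embeddings
    dropout_rate=0.2,       # dropout applied to the attention weights
    leaky_relu_alpha=0.2,   # negative slope used when scoring feature pairs
    name="graph_feature_aggregation_1",
)(inputs)

# Because the layer preserves its input shape, blocks can be stacked so that
# higher-order feature relationships form on top of the first pass.
x = GraphFeatureAggregation(embed_dim=16, dropout_rate=0.1,
                            name="graph_feature_aggregation_2")(x)

x = keras.layers.Dense(64, activation="relu")(x)
x = keras.layers.Dropout(0.3)(x)
outputs = keras.layers.Dense(1, activation="sigmoid")(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
```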
## API Reference

### kerasfactory.layers.GraphFeatureAggregation
This module implements a GraphFeatureAggregation layer that treats features as nodes in a graph and uses attention mechanisms to learn relationships between features. This approach is useful for tabular data where features have inherent relationships.
#### GraphFeatureAggregation
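Constructor signature, as implied by the parameter table below:

```python
GraphFeatureAggregation(
    embed_dim: int = 8,
    dropout_rate: float = 0.0,
    leaky_relu_alpha: float = 0.2,
    name: str | None = None,
    **kwargs: Any,
)
```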
Graph-based feature aggregation layer with self-attention for tabular data.
This layer treats each input feature as a node and projects it into an embedding space. It then computes pairwise attention scores between features and aggregates feature information based on these scores. Finally, it projects the aggregated features back to the original feature space and adds a residual connection.
The process involves:
- Projecting each scalar feature to an embedding (shape: [batch, num_features, embed_dim]).
- Computing pairwise concatenated embeddings and scoring them via a learnable attention vector.
- Normalizing the scores with softmax to yield a dynamic adjacency (attention) matrix.
- Aggregating neighboring features via weighted sum.
- Projecting back to a vector of original dimension, then adding a residual connection.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| embed_dim | int | Dimensionality of the projected feature embeddings. | 8 |
| dropout_rate | float | Dropout rate to apply on attention weights. | 0.0 |
| leaky_relu_alpha | float | Alpha parameter for the LeakyReLU activation. | 0.2 |
| name | str \| None | Optional name for the layer. | None |
Input shape: 2D tensor with shape (batch_size, num_features)

Output shape: 2D tensor with shape (batch_size, num_features), identical to the input
Example
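A short usage sketch consistent with the shapes documented above (the import path is an assumption):

```python
import keras
from kerasfactory.layers import GraphFeatureAggregation  # assumed import path

# 16 samples, 10 scalar features each.
x = keras.random.normal((16, 10))

layer = GraphFeatureAggregation(embed_dim=8, dropout_rate=0.1)
y = layer(x)

assert y.shape == (16, 10)  # output has the same shape as the input
```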
Initialize the GraphFeatureAggregation layer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| embed_dim | int | Embedding dimension. | 8 |
| dropout_rate | float | Dropout rate. | 0.0 |
| leaky_relu_alpha | float | Alpha parameter for LeakyReLU. | 0.2 |
| name | str \| None | Name of the layer. | None |
| **kwargs | Any | Additional keyword arguments. | {} |
Source code: kerasfactory/layers/GraphFeatureAggregation.py, lines 57-91.
## Parameters Deep Dive

### embed_dim (int)
- Purpose: Dimensionality of the projected feature embeddings
- Range: 4 to 64+ (typically 8-32)
- Impact: Larger values = more expressive embeddings but more parameters
- Recommendation: Start with 8-16, scale based on data complexity
### dropout_rate (float)
- Purpose: Dropout rate applied to attention weights
- Range: 0.0 to 0.5 (typically 0.1-0.2)
- Impact: Higher values = more regularization
- Recommendation: Use 0.1-0.2 for regularization
### leaky_relu_alpha (float)
- Purpose: Alpha parameter for LeakyReLU activation
- Range: 0.0 to 1.0 (typically 0.2)
- Impact: Controls the negative slope of LeakyReLU
- Recommendation: Use 0.2 for most applications
## Performance Characteristics

- Speed: Fast for small to medium feature counts; compute scales quadratically with the number of features
- Memory: Moderate; the pairwise attention computation stores a features × features attention matrix per sample
- Accuracy: Excellent for learning feature relationships
- Best For: Tabular data with inherent feature relationships
## Examples

### Example 1: Feature Relationship Learning
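A sketch of an end-to-end run on synthetic tabular data in which some features are deliberately related, so the layer has relationships to discover. The data-generating process, hyperparameters, and import path are all illustrative assumptions.

```python
import numpy as np
import keras
from kerasfactory.layers import GraphFeatureAggregation  # assumed import path

# Synthetic tabular data with built-in feature relationships.
rng = np.random.default_rng(42)
n_samples, num_features = 1000, 8
X = rng.normal(size=(n_samples, num_features)).astype("float32")
X[:, 1] = 0.7 * X[:, 0] + 0.3 * rng.normal(size=n_samples)  # feature 1 tracks feature 0
X[:, 3] = X[:, 2] * X[:, 4]                                  # feature 3 is an interaction
y = (X[:, 0] + X[:, 3] > 0.0).astype("float32")              # label depends on the relationships

inputs = keras.Input(shape=(num_features,))
h = GraphFeatureAggregation(embed_dim=16, dropout_rate=0.1)(inputs)
h = keras.layers.Dense(32, activation="relu")(h)
outputs = keras.layers.Dense(1, activation="sigmoid")(h)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=64, validation_split=0.2, verbose=0)

loss, acc = model.evaluate(X, y, verbose=0)
print(f"accuracy: {acc:.3f}")
```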
### Example 2: Graph Structure Analysis
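One way to look at the graph structure the layer encodes is to inspect its trainable weights through the standard Keras layer API. The sketch below builds the layer and lists them; the exact variable names and weight layout are implementation details and may differ between versions.

```python
import keras
from kerasfactory.layers import GraphFeatureAggregation  # assumed import path

num_features = 6
layer = GraphFeatureAggregation(embed_dim=8, name="graph_agg")

# Calling the layer once builds its weights.
x = keras.random.normal((4, num_features))
_ = layer(x)

# The learned parameters that define the feature graph: the per-feature
# embedding projection, the pairwise attention vector, and the output
# projection (names and shapes depend on the implementation).
for w in layer.weights:
    print(w.name, tuple(w.shape))

# Note: the adjacency (attention) matrix itself is not a stored weight; it is
# recomputed from the attention scores for every batch, so it adapts to the
# inputs at inference time.
```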
### Example 3: Attention Pattern Analysis
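Without relying on the layer's internal attribute names, the attention-driven mixing can be probed from the outside with a simple finite-difference sensitivity check: nudge one input feature and see how every transformed feature responds. This is a black-box approximation, not a readout of the actual attention matrix, and it is only meaningful on a trained layer (here the layer is untrained, so the pattern reflects random initialization).

```python
import numpy as np
import keras
from kerasfactory.layers import GraphFeatureAggregation  # assumed import path

num_features = 6
layer = GraphFeatureAggregation(embed_dim=8)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, num_features)).astype("float32")
baseline = keras.ops.convert_to_numpy(layer(x))

# Perturb one input feature at a time and measure how every transformed
# feature moves; this approximates the feature-to-feature influence pattern.
eps = 1e-2
influence = np.zeros((num_features, num_features))
for j in range(num_features):
    x_pert = x.copy()
    x_pert[0, j] += eps
    delta = keras.ops.convert_to_numpy(layer(x_pert)) - baseline
    influence[j] = np.abs(delta[0]) / eps  # effect of feature j on each output feature

np.set_printoptions(precision=2, suppress=True)
print(influence)  # row j: how strongly input feature j moves each output feature
```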
## Tips & Best Practices
- Embedding Dimension: Start with 8-16, scale based on data complexity
- Dropout Rate: Use 0.1-0.2 for regularization
- LeakyReLU Alpha: Use 0.2 for most applications
- Feature Relationships: Works best when features have inherent relationships
- Residual Connections: Built-in residual connections for gradient flow
- Attention Patterns: Monitor attention patterns for interpretability
## Common Pitfalls

- Embedding Dimension: Must be a positive integer
- Dropout Rate: Must be between 0.0 and 1.0
- Memory Usage: Scales quadratically with number of features
- Overfitting: Monitor for overfitting with complex configurations
- Feature Count: Consider feature pre-selection for very large feature sets
## Related Layers
- AdvancedGraphFeature - Advanced graph feature layer
- MultiHeadGraphFeaturePreprocessor - Multi-head graph preprocessing
- TabularAttention - Tabular attention mechanisms
- VariableSelection - Variable selection
## Further Reading
- Graph Neural Networks - Graph neural network concepts
- Self-Attention - Self-attention mechanism
- Feature Relationships - Feature relationship concepts
- KerasFactory Layer Explorer - Browse all available layers
- Feature Engineering Tutorial - Complete guide to feature engineering