GatedLinearUnit
Overview
The GatedLinearUnit applies a gated linear transformation to input tensors, controlling information flow in neural networks. It multiplies the output of a dense linear transformation with the output of a dense sigmoid transformation, creating a gating mechanism that filters information based on learned weights and biases.
This layer is particularly powerful for controlling information flow, implementing attention-like mechanisms, and creating sophisticated feature transformations in neural networks.
How It Works
The GatedLinearUnit processes data through a gated transformation:
- Linear Transformation: Applies dense linear transformation to input
- Sigmoid Transformation: Applies dense sigmoid transformation to input
- Gating Mechanism: Multiplies linear output with sigmoid output
- Information Filtering: The sigmoid output acts as a gate controlling information flow
- Output Generation: Produces gated and filtered features
```mermaid
graph TD
    A[Input Features] --> B[Linear Dense Layer]
    A --> C[Sigmoid Dense Layer]
    B --> D[Linear Output]
    C --> E["Sigmoid Output (Gate)"]
    D --> F[Element-wise Multiplication]
    E --> F
    F --> G[Gated Output]

    style A fill:#e6f3ff,stroke:#4a86e8
    style G fill:#e8f5e9,stroke:#66bb6a
    style B fill:#fff9e6,stroke:#ffb74d
    style C fill:#f3e5f5,stroke:#9c27b0
    style F fill:#e1f5fe,stroke:#03a9f4
```
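To make the mechanism concrete, the sketch below reproduces the same computation with stock Keras layers (a linear Dense layer, a sigmoid Dense layer, and an element-wise multiply). It only illustrates the idea described above; the actual kerasfactory implementation may differ in its details.

```python
import keras
from keras import layers

# Illustrative reconstruction of the gating mechanism with stock Keras layers
inputs = keras.Input(shape=(8,))
linear = layers.Dense(16)(inputs)                      # linear transformation
gate = layers.Dense(16, activation="sigmoid")(inputs)  # gate values in [0, 1]
gated = layers.Multiply()([linear, gate])              # element-wise gating
model = keras.Model(inputs, gated)
model.summary()
```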
Why Use This Layer?
| Challenge | Traditional Approach | GatedLinearUnit's Solution |
|---|---|---|
| Information Flow | No control over information flow | Gated control of information flow |
| Feature Filtering | All features treated equally | Selective filtering based on learned gates |
| Attention Mechanisms | Separate attention layers | Built-in gating for attention-like behavior |
| Feature Transformation | Simple linear transformations | Sophisticated gated transformations |
Use Cases
- Information Flow Control: Controlling how information flows through networks
- Feature Filtering: Filtering features based on learned importance
- Attention Mechanisms: Implementing attention-like behavior
- Feature Transformation: Sophisticated feature processing
- Ensemble Learning: As a component in ensemble architectures
Quick Start
Basic Usage
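A minimal standalone sketch, assuming the layer is importable as `from kerasfactory.layers import GatedLinearUnit` (inferred from the module path in the API reference below):

```python
import numpy as np
from kerasfactory.layers import GatedLinearUnit  # import path assumed

# A random batch of 32 samples with 8 features
x = np.random.normal(size=(32, 8)).astype("float32")

# Gate the 8 input features into 16 output units
glu = GatedLinearUnit(units=16)
y = glu(x)

print(y.shape)  # (32, 16)
```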
In a Sequential Model
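A sketch of dropping the layer into a Sequential model, under the same assumed import path; layer sizes are illustrative:

```python
import keras
from keras import layers
from kerasfactory.layers import GatedLinearUnit  # import path assumed

model = keras.Sequential([
    keras.Input(shape=(10,)),
    layers.Dense(32, activation="relu"),
    GatedLinearUnit(units=16),  # gated transformation of the hidden features
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```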
In a Functional Model
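A sketch using the functional API, again with the assumed import path and illustrative layer sizes:

```python
import keras
from keras import layers
from kerasfactory.layers import GatedLinearUnit  # import path assumed

inputs = keras.Input(shape=(20,))
x = layers.Dense(64, activation="relu")(inputs)
x = GatedLinearUnit(units=32)(x)  # gate the 64-dim features down to 32 units
x = layers.Dense(16, activation="relu")(x)
outputs = layers.Dense(1)(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="mse")
```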
Advanced Configuration
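A sketch of a deeper configuration: stacked GatedLinearUnit blocks with dropout, wrapped in a small builder function. The builder name, layer sizes, and training data are illustrative and not part of the library.

```python
import numpy as np
import keras
from keras import layers
from kerasfactory.layers import GatedLinearUnit  # import path assumed


def build_model(input_dim, glu_units=(64, 32), dropout_rate=0.2):
    """Stack several gated blocks, each followed by dropout."""
    inputs = keras.Input(shape=(input_dim,))
    x = inputs
    for units in glu_units:
        x = GatedLinearUnit(units=units, name=f"glu_{units}")(x)
        x = layers.Dropout(dropout_rate)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return keras.Model(inputs, outputs)


model = build_model(input_dim=30)
model.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Synthetic data, just to exercise the model end to end
X = np.random.normal(size=(256, 30)).astype("float32")
y = np.random.randint(0, 2, size=(256, 1))
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```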
API Reference
kerasfactory.layers.GatedLinearUnit
This module implements a GatedLinearUnit layer that applies a gated linear transformation to input tensors. It's particularly useful for controlling information flow in neural networks.
Classes
GatedLinearUnit
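The constructor signature below is a reconstruction from the parameter tables in this reference; the actual source may differ.

```python
import keras

class GatedLinearUnit(keras.layers.Layer):
    def __init__(self, units: int, name: str | None = None, **kwargs) -> None: ...
```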
GatedLinearUnit is a custom Keras layer that implements a gated linear unit.
This layer applies a dense linear transformation to the input tensor and multiplies the result with the output of a dense sigmoid transformation. The result is a tensor where the input data is filtered based on the learned weights and biases of the layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| units | int | Positive integer, dimensionality of the output space. | required |
| name | str | Name for the layer. | None |
Input shape
Tensor with shape: (batch_size, ..., input_dim)
Output shape
Tensor with shape: (batch_size, ..., units)
Example
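A short sketch consistent with the input and output shapes documented above (import path assumed):

```python
import numpy as np
from kerasfactory.layers import GatedLinearUnit  # import path assumed

layer = GatedLinearUnit(units=4, name="glu")

# 3-D input: (batch_size, timesteps, input_dim)
x = np.random.normal(size=(2, 5, 8)).astype("float32")
y = layer(x)

print(y.shape)  # (2, 5, 4): only the last dimension changes to `units`
```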
Initialize the GatedLinearUnit layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| units | int | Number of units in the layer. | required |
| name | str \| None | Name of the layer. | None |
| **kwargs | Any | Additional keyword arguments. | {} |
Parameters Deep Dive
units (int)
- Purpose: Dimensionality of the output space
- Range: 1 to 1000+ (typically 8-128)
- Impact: Determines the size of the gated output
- Recommendation: Start with 16-32, scale based on data complexity
Performance Characteristics
- Speed: Very fast - simple mathematical operations
- Memory: Low memory usage - minimal additional parameters
- Accuracy: Excellent for information flow control
- Best For: Networks requiring sophisticated information flow control
Examples
Example 1: Information Flow Control
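A sketch of Example 1: a GatedLinearUnit placed between an encoder block and the output head, so the learned gate decides how much of the encoded signal reaches the head. The model names, synthetic data, and probing step are illustrative.

```python
import numpy as np
import keras
from keras import layers
from kerasfactory.layers import GatedLinearUnit  # import path assumed

inputs = keras.Input(shape=(16,))
hidden = layers.Dense(32, activation="relu", name="encoder")(inputs)
gated = GatedLinearUnit(units=32, name="flow_gate")(hidden)  # learned gate on encoder output
outputs = layers.Dense(1, name="head")(gated)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")

# Fit briefly on synthetic data so the gate has something to learn
X = np.random.normal(size=(512, 16)).astype("float32")
y = 0.5 * X[:, :1] + np.random.normal(scale=0.1, size=(512, 1)).astype("float32")
model.fit(X, y, epochs=3, batch_size=64, verbose=0)

# Compare activation magnitudes before and after the gate: values pushed toward
# zero by the sigmoid gate are effectively blocked from the downstream head.
probe = keras.Model(inputs, [hidden, gated])
pre, post = probe.predict(X[:8], verbose=0)
print("mean |pre-gate| :", float(np.abs(pre).mean()))
print("mean |post-gate|:", float(np.abs(post).mean()))
```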
Example 2: Feature Filtering Analysis
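A sketch of Example 2: inspecting which output units the gate tends to suppress by looking at the average magnitude of the gated activations. The layer is untrained here purely to show the analysis pattern; in practice you would run this on a trained model. The 0.1 threshold is arbitrary, and `keras.ops.convert_to_numpy` assumes Keras 3.

```python
import numpy as np
import keras
from kerasfactory.layers import GatedLinearUnit  # import path assumed

glu = GatedLinearUnit(units=8)

X = np.random.normal(size=(1000, 12)).astype("float32")
gated = keras.ops.convert_to_numpy(glu(X))

# Units whose gated activations stay near zero are being filtered out
per_unit_scale = np.abs(gated).mean(axis=0)
suppressed = np.where(per_unit_scale < 0.1)[0]

print("mean |activation| per unit:", np.round(per_unit_scale, 3))
print("mostly suppressed units:", suppressed.tolist())
```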
Example 3: Attention-like Behavior
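A sketch of Example 3: on sequence data the sigmoid gate produces a value in [0, 1] for every timestep and unit, so it acts like a learned soft mask over the sequence features, which is the attention-like behavior described above. Sequence length and layer sizes are illustrative.

```python
import keras
from keras import layers
from kerasfactory.layers import GatedLinearUnit  # import path assumed

seq_len, n_features = 24, 8

inputs = keras.Input(shape=(seq_len, n_features))
x = layers.Dense(32, activation="relu")(inputs)
x = GatedLinearUnit(units=32)(x)  # per-timestep, per-unit soft masking
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```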
Tips & Best Practices
- Units: Start with 16-32 units, scale based on data complexity
- Information Flow: Use GLU to control how information flows through networks
- Feature Filtering: GLU can act as a learned feature filter
- Attention: GLU can implement attention-like mechanisms
- Combination: Works well with other Keras layers
- Regularization: Consider adding dropout after GLU layers (see the sketch after this list)
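A minimal sketch of the dropout-after-GLU pattern from the regularization tip (import path assumed):

```python
import keras
from keras import layers
from kerasfactory.layers import GatedLinearUnit  # import path assumed

inputs = keras.Input(shape=(16,))
x = GatedLinearUnit(units=32)(inputs)
x = layers.Dropout(0.2)(x)  # regularize the gated features
outputs = layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
```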
Common Pitfalls
- Units: Must be a positive integer
- Output Size: The output size is determined by the units parameter
- Gradient Flow: GLU can affect gradient flow - monitor training
- Overfitting: Can overfit on small datasets - use regularization
- Memory Usage: Memory scales with the units parameter
Related Layers
- GatedResidualNetwork - GRN using GLU
- VariableSelection - Variable selection with gating
- SparseAttentionWeighting - Sparse attention weighting
- TabularAttention - Attention mechanisms
Further Reading
- Gated Linear Units - Original GLU paper
- Information Flow in Neural Networks - Information flow concepts
- Attention Mechanisms - Attention mechanism concepts
- KerasFactory Layer Explorer - Browse all available layers
- Feature Engineering Tutorial - Complete guide to feature engineering