[Avg. reading time: 15 minutes]
Anomaly Detection
Anomaly detection and predictive maintenance are important parts of the IoT upper stack. They help analyze device and sensor data to detect unusual behavior early and reduce the chance of equipment failure.
Anomaly Detection in IoT
Anomaly detection identifies data points or patterns that do not match normal system behavior.
In IoT systems, this is useful for:
- detecting abnormal sensor readings
- identifying device malfunctions
- spotting unusual operational behavior
- triggering alerts before failures become serious
This is especially valuable in industrial IoT, smart manufacturing, healthcare, logistics, and other environments where sensor data arrives continuously.
Common Approaches
Statistical Methods
Statistical approaches define a baseline of normal behavior and flag values that deviate significantly from it.
Examples:
- mean and standard deviation
- z-score
- moving averages
- seasonal thresholds
These methods are simple and fast, but they may struggle when the data is complex or changes over time.
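As a sketch of the statistical approach, the z-score method can be implemented with the standard library alone. The readings and the threshold of 2.5 are illustrative assumptions, not fixed rules:

```python
import statistics

def zscore_anomalies(values, threshold=2.5):
    """Return the values whose z-score exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical temperature readings with one spike
readings = [20.1, 20.3, 19.8, 20.0, 20.2, 19.9, 20.1, 20.0, 19.7, 35.0]
print(zscore_anomalies(readings))  # the 35.0 spike stands out
```

Note that a single large outlier inflates the standard deviation itself, which is one reason fixed z-score thresholds struggle on small or drifting datasets.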
Machine Learning Techniques
Machine learning models learn patterns from historical data and identify points that do not fit those patterns.
Examples:
- Isolation Forest
- One-Class SVM
- Local Outlier Factor
- clustering-based approaches
These methods are useful when normal behavior is not easy to define with simple rules.
Deep Learning Models
Deep learning models can detect anomalies in high-dimensional or sequential IoT data.
Examples:
- autoencoders
- LSTM-based sequence models
- transformer-based time-series models
These models are powerful, but they usually require more data, more tuning, and more compute.
Isolation Forest
Isolation Forest is one of the most practical algorithms for anomaly detection.
Unlike many other methods, it does not rely on distance or density. Instead, it works on a simple idea:
Anomalies are few and different, so they are easier to isolate than normal points.
Core Idea
Isolation Forest builds many random trees.
In each tree:
- a feature is selected randomly
- a split value is selected randomly
- the data is repeatedly divided until individual points become isolated
A point that gets isolated quickly is more likely to be an anomaly.
A point that needs more splits to isolate is more likely to be normal.
Why It Works
Normal points usually belong to dense regions of the dataset, so they take more splits to separate.
Anomalies are often far away from the bulk of the data, so they get isolated in fewer steps.
That is why:
- shorter path length → more anomalous
- longer path length → more normal
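The isolation idea can be sketched in a few lines of plain Python. This is a simplified 1-D illustration, not the full algorithm (which subsamples the data and caps tree depth):

```python
import random

def path_length(x, data, rng):
    """Count random splits until x is alone in its partition."""
    part, depth = list(data), 0
    while len(part) > 1:
        split = rng.uniform(min(part), max(part))               # random split value
        part = [v for v in part if (v < split) == (x < split)]  # keep x's side
        depth += 1
    return depth

def average_path_length(x, data, trees=200, seed=0):
    """Average the isolation depth of x over many random trees."""
    rng = random.Random(seed)
    return sum(path_length(x, data, rng) for _ in range(trees)) / trees

data = [-100, 2, 11, 13, 100]
# The extreme points are isolated in fewer splits on average
print({x: round(average_path_length(x, data), 2) for x in data})
```

Running this shows the extremes (-100 and 100) with noticeably shorter average paths than the interior points, which is exactly the signal the algorithm exploits.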
Simple Example
Dataset: [-100, 2, 11, 13, 100]
In practice, Isolation Forest builds many trees (100 or more); here we show only 4 trees for illustration.
Tree 1

```
                     Root
                      |
             [Split at value = 7]
               /            \
        [-100, 2]      [11, 13, 100]
            |                 |
[Split at value = -49]  [Split at value = 56]
      /       \             /        \
  [-100]     [2]       [11, 13]    [100]
```

Path lengths:
- -100 → 2
- 2 → 2
- 11 → 3 (one further split separates [11, 13])
- 13 → 3 (one further split separates [11, 13])
- 100 → 2
Tree 2

```
          Root
           |
  [Split at value = 1]
     /          \
 [-100]   [2, 11, 13, 100]
                 |
        [Split at value = 50]
           /          \
    [2, 11, 13]      [100]
```

Approx path lengths:
- -100 → 1
- 100 → 2
- 2, 11, 13 → 3 to 4
Tree 3

```
                     Root
                      |
            [Split at value = 12]
               /            \
       [-100, 2, 11]     [13, 100]
             |                |
[Split at value = -40]  [Split at value = 57]
      /       \            /       \
  [-100]   [2, 11]      [13]     [100]
```

Path lengths:
- -100 → 2
- 2 → 3
- 11 → 3
- 13 → 2
- 100 → 2
Tree 4

```
               Root
                |
      [Split at value = 80]
         /            \
 [-100, 2, 11, 13]   [100]
         |
 [Split at value = -50]
      /       \
  [-100]   [2, 11, 13]
```

Approx path lengths:
- 100 → 1
- -100 → 2
- others → 3+
Average Path Length
- -100 → (2 + 1 + 2 + 2) / 4 = 1.75
- 2 → (2 + 3 + 3 + 3) / 4 = 2.75
- 11 → (3 + 3 + 3 + 3) / 4 = 3.00
- 13 → (3 + 3 + 2 + 3) / 4 = 2.75
- 100 → (2 + 2 + 2 + 1) / 4 = 1.75
Anomaly Score
s(x, n) = 2^(-E[h(x)] / c(n))
Where:
- E[h(x)] = average path length of x over all trees
- c(n) = normalization factor: the average path length of an unsuccessful search in a binary search tree built on n points
Score meaning:
- close to 1 → anomaly
- well below 0.5 → normal
- around 0.5 → no clear distinction
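With n = 5 and the average path lengths from the four example trees (note that the average for 100 is (2 + 2 + 2 + 1) / 4 = 1.75), the score can be computed directly. The formula for c(n) below follows the original Isolation Forest paper: c(n) = 2H(n−1) − 2(n−1)/n, where H(i) ≈ ln(i) + 0.5772 (Euler–Mascheroni constant):

```python
import math

EULER_GAMMA = 0.5772156649  # Euler–Mascheroni constant

def c(n):
    """Average path length of an unsuccessful BST search on n points."""
    if n <= 1:
        return 0.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # H(n-1) approximation
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(avg_path, n):
    """s(x, n) = 2^(-E[h(x)] / c(n))"""
    return 2.0 ** (-avg_path / c(n))

# Average path lengths from the four example trees (n = 5)
for x, h in [(-100, 1.75), (2, 2.75), (11, 3.0), (13, 2.75), (100, 1.75)]:
    print(x, round(anomaly_score(h, 5), 3))
```

The extremes score around 0.59 while the interior points score around 0.41–0.44, matching the interpretation below.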
Interpretation
The extreme values (-100 and 100) are isolated faster than the middle values.
That means:
- -100 and 100 → anomalies
- 2, 11, 13 → normal points
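The same conclusion can be checked with scikit-learn's IsolationForest, assuming scikit-learn and NumPy are installed. Here contamination=0.4 encodes the assumption that 2 of the 5 points are anomalies:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

X = np.array([-100, 2, 11, 13, 100], dtype=float).reshape(-1, 1)

model = IsolationForest(n_estimators=200, contamination=0.4, random_state=42)
labels = model.fit_predict(X)  # -1 = anomaly, +1 = normal
print(labels)                  # the extremes are flagged with -1
```

On real data the contamination rate is rarely known in advance, which is why threshold selection is listed as a limitation below.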
Key Points
- anomalies are few and different
- random splits isolate anomalies faster
- path length determines anomaly likelihood
- ensemble of trees improves reliability
- no distance calculations required
- scales well for large datasets
Advantages
- simple and intuitive
- fast and scalable
- works with high-dimensional data
- no need for distance calculations
- good for unsupervised learning
Limitations
- struggles with clustered anomalies
- sensitive when anomalies are near normal data
- randomness can cause variation in small datasets
- threshold selection is use-case dependent
Isolation Forest in IoT
Used for:
- temperature anomalies
- vibration anomalies
- pressure irregularities
- device failure prediction
- real-time alerting
Applications:
- predictive maintenance
- fault detection
- industrial monitoring
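A common deployment pattern for predictive maintenance is to fit the model on a history of normal readings and score new readings as they arrive. The data here is synthetic and the parameters (sensor range, contamination rate) are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic history of normal bearing temperatures (°C)
rng = np.random.default_rng(0)
history = rng.normal(loc=40.0, scale=1.5, size=(500, 1))

# Train on history; contamination sets the expected anomaly rate
detector = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
detector.fit(history)

# Score readings as they stream in; -1 triggers an alert
incoming = np.array([[40.2], [41.3], [75.0]])
preds = detector.predict(incoming)
for reading, label in zip(incoming.ravel(), preds):
    if label == -1:
        print(f"ALERT: abnormal reading {reading:.1f} °C")
```

The model is typically refit periodically so the learned baseline tracks slow drift in normal operating conditions.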