[Avg. reading time: 0 minutes]
Disclaimer
[Avg. reading time: 4 minutes]
Required Tools
Windows
Mac
Common Tools (Windows & Mac)
- Install this VS Code Extension: Remote Development

Configure Env using Dev Container
Go to the Terminal / Command Prompt and clone the repository:
git clone https://github.com/gchandra10/workspace-iot-upperstack.git
- Make sure Docker is running
- Open VSCode
- Go to File > Open Workspace from File
- Go to the workspace-rust-de folder and choose the workspace.
- When VS Code prompts to “Reopen in Container”, click it.
If VS Code doesn’t prompt, click the “Remote Connection” button at the bottom left of the screen.

Cloud Tools
[Avg. reading time: 1 minute]
Overview of IoT
- Introduction
- IoT Use Cases
- JOBS
- Computing Types
- Evolution of IoT
- Protocols
- IoT Stack Overview
- Lower Stack
- Upper Stack
- Puzzle
[Avg. reading time: 6 minutes]
Introduction
What is IoT
The Internet of Things is a system where physical objects are equipped with sensors, software, and network connectivity so they can collect data, communicate over the network, and trigger actions without continuous human involvement.
IoT is not just the device.
IoT is devices + data + connectivity + action.
Why IoT Matters
Operational Efficiency
- Automates repetitive and time-sensitive tasks
- Reduces manual monitoring and human error
- Enables real-time visibility into systems
Data Driven Decisions
- Sensors generate continuous time series data
- Decisions shift from intuition to measurable signals
- Analytics and ML sit on top of IoT, not the other way around
Quality of Life
- Healthcare monitoring, smart homes, traffic systems
- Problems are detected earlier, not after failure
- Convenience is a side effect, reliability is the real win
Economic Impact
- New products, new services, new pricing models
- Hardware vendors become data companies
- Entire industries move from reactive to predictive
What is not IoT
Devices that work only locally
- A USB temperature sensor dumping values to a laptop
- An electronic thermostat controlling temperature locally
- No network, no IoT
Systems with no outward data flow
- Hardware that performs an action but emits no telemetry
- If data never leaves the device, it is automation, not IoT
What MUST exist for something to be IoT
- Continuous or event based data generation
- Network communication
- Backend ingestion
- Storage, usually time series oriented
- Processing or decision making
- Optional but important feedback or control loop
Examples
Watch vs Smart Watch
CO Detector vs Smart CO Detector
- Senses CO locally
- Triggers a buzzer or alarm
- Operates entirely offline
vs
- Transmits CO readings or alarm events
- Uses a network to communicate
- Notifies an external system such as a phone app, home hub, or fire department
Read more
Local intelligence is embedded systems. Networked intelligence is IoT.
#IOT #Importance #smart #network
[Avg. reading time: 3 minutes]
Use Cases

Every IoT use case follows the same pattern:
sense → transmit → store → decide → act
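This loop can be sketched in a few lines of Python. The sensor reading, threshold, and action names below are hypothetical stand-ins for real devices and services, not part of any specific platform:

```python
# Minimal sketch of the IoT loop: sense -> transmit -> store -> decide -> act.
# All names and values are illustrative; a real system would use actual
# sensors, a network client (e.g., MQTT), and a time-series database.

store = []  # stand-in for a time-series database

def sense():
    """Read a (fake) sensor value."""
    return {"sensor": "thermostat-1", "temperature_c": 27.5}

def transmit(reading):
    """In practice: publish over the network. Here we just hand it onward."""
    return reading

def decide(reading, threshold_c=26.0):
    """Apply a simple rule to the reading."""
    return "cooling_on" if reading["temperature_c"] > threshold_c else "idle"

def act(action):
    """Drive the (fake) actuator."""
    return f"actuator set to {action}"

reading = sense()
store.append(transmit(reading))   # transmit -> store
print(act(decide(reading)))       # decide -> act
```

Every use case in this section is some variation of these five functions, differing only in what is sensed and what action is taken.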
1. Smart Homes
Use Case: Home automation for comfort, security, and energy efficiency.
Example: Smart thermostats like Nest adjust temperature based on occupancy and behavior. Smart locks and cameras like Ring stream events and alerts.
Temperature or motion sensed → data sent → rule applied → device reacts.
2. Healthcare
Use Case: Remote patient monitoring and early intervention.
Example: Wearables such as Fitbit and Apple Watch track vitals and activity and trigger alerts.
Vitals sensed → transmitted → analyzed → alert raised.
3. Industrial IoT (IIoT)
Use Case: Predictive maintenance and factory automation.
Example: Sensors monitor vibration, temperature, and pressure to predict failures before they occur using platforms like GE Predix.
Machine signals sensed → streamed → modeled → maintenance action triggered.
Similar patterns appear in smart shelves (inventory updates), Amazon Go, Tesla cars, smart meters, air-quality monitoring, and so on.
Why IoT Works Across All Fields
- Sensors are cheap
- Networks already exist
- Storage is inexpensive
- Compute and analytics are mature
#iotusecases #logistics #environmental
[Avg. reading time: 3 minutes]
JOBS
| Role | What They Actually Do | Core Skills |
|---|---|---|
| IoT Application Developer | Build web or mobile apps that display IoT data and trigger actions | APIs, REST, MQTT, Web or Mobile frameworks |
| IoT Solutions Architect | Design the full IoT system from devices to cloud and apps | Architecture, cloud IoT services, security |
| Cloud Integration Engineer | Connect devices to cloud storage, pipelines, and services | AWS or Azure, MQTT, REST, data pipelines |
| IoT Data Analyst | Analyze sensor and event data to extract insights | Python, SQL, time series data, dashboards |
| IoT Product Manager | Decide what gets built and why from a business angle | Product thinking, requirements, communication |
| IoT Security Specialist | Secure data, APIs, devices, and cloud integrations | Encryption, auth, IAM, threat modeling |
| IoT Test Engineer | Validate reliability, scale, and failure scenarios | Testing, automation, system validation |
| IoT Support or Operations | Keep systems running and debug failures | Monitoring, logs, troubleshooting |

#jobs #iotdevelopers #iotarchitects #dataecosystem
[Avg. reading time: 15 minutes]
Computing Types
Modern software systems use different computing approaches depending on where computation happens, how systems are structured, and when decisions are made.
There is no single “best” computing model. Each type exists to solve a specific class of problems related to scale, latency, reliability, cost, and complexity.
As systems evolved from single machines to globally distributed platforms and IoT systems, computing models also evolved:
- From centralized to distributed
- From monolithic to microservices
- From cloud-only to edge and fog
- From reactive to proactive
Understanding these computing types helps you:
- Choose the right architecture for a problem
- Understand why IoT systems cannot rely on cloud alone
- See how modern data and IoT platforms fit together
Centralized Computing
Single computer or location handles all processing and storage. All resources and decisions are managed from one central point.
Characteristics
- Single point of control
- Centralized decision making
- Consistent data
- Simpler security
- Easier maintenance
Examples
- Traditional banking systems
- Library systems
- School management systems
Typical setup
- Central server or mainframe
- All branches connect to HQ
- Single database
- Centralized processing
- One place for updates
Major drawback
- Single point of failure
Distributed Computing
Multiple computers work together as one logical system. Processing, storage, and management are spread across multiple machines or locations.
Characteristics
- Shared resources
- Fault tolerance
- High availability
- Horizontal scalability
- Load balancing
Example
- Google Search
- Multiple data centers
- Distributed query processing
- Replication and redundancy
Monolithic
Single application where all functionality is packaged into one codebase.
Characteristics
- One deployment unit
- Shared database
- Tightly coupled components
- Single technology stack
- All-or-nothing scaling
Advantages
- Simple to build
- Easy to deploy
- Good performance
- Lower initial cost
Disadvantages
- Hard to scale selectively
- Technology lock-in
Examples
- WordPress
- Early-stage applications (many start monolithic)
Microservices
Application built as independent, small services that communicate via APIs.
Characteristics
- Independent services
- Separate databases (often)
- Loosely coupled
- Different tech stacks possible
- Individual scaling
Advantages
- Scale only what is needed
- Team autonomy
- Technology flexibility
Disadvantages
- Operational overhead
- Higher complexity
- Latency and distributed failures
- Tooling sprawl if unmanaged
Cloud Computing
Cloud computing provides compute resources (servers, storage, databases, networking, software) over the internet with pay-as-you-go pricing.
Benefits
- Cost savings
  - No upfront infrastructure
  - Pay for usage
  - Reduced maintenance
- Scalability
  - Scale up or down on demand
  - Handle traffic spikes
- Accessibility
  - Access from anywhere
  - Global reach
- Reliability
  - Backups and disaster recovery
  - Multi-region options
- Automatic updates
  - Security patches
  - Managed services reduce ops work
Examples
- Cloud storage
- OTT streaming platforms
Service Models
- SaaS (Software as a Service)
  - Ready-to-use apps
  - Examples: Gmail, Dropbox, Slack
- PaaS (Platform as a Service)
  - App runtime and developer platforms
  - Examples: Heroku, Google App Engine
- IaaS (Infrastructure as a Service)
  - Compute, network, storage building blocks
  - Examples: AWS EC2, Azure VMs
Edge Computing
Edge computing moves computation and storage closer to where data is generated, near or on IoT devices.
Benefits
- Lower latency
- Works with limited internet
- Reduces bandwidth cost
- Better privacy (data stays local)
Simple examples
- Smart camera doing motion detection locally
- Smart thermostat adjusting temperature locally
- Factory robot making real-time decisions from sensors
Examples
- Smart Home Security
- Local video processing
- Only sends alerts or clips to cloud
- Tesla cars
- Local sensor fusion and obstacle detection
- Split-second decisions on device
Fog Computing
What it does
- Aggregates data from multiple edge devices
- Provides more compute than individual devices
- Filters and enriches data before sending to cloud
- Keeps latency lower than cloud-only systems
Examples
- Smart building local server processing many sensors
- Factory gateway analyzing multiple machines
- Farm gateway coordinating multiple sensors and controllers
Cloud vs Edge vs Fog
| Aspect | Cloud | Edge | Fog |
|---|---|---|---|
| Location | Central data centers | On/near device | Local network |
| Latency | High | Very low | Medium |
| Compute | Very high | Low | Medium |
| Storage | High | Very limited | Limited |
| Internet dependency | Required | Optional | Local network required |
| Data scope | Global | Single device | Multiple local devices |
| Typical use | Analytics, long-term storage | Real-time decisions | Aggregation, coordination |
| Example | AWS | Smart camera | Factory gateway |
Computing Evolution
Manual Computing
Calculations and decisions performed by humans.
Drawbacks
- Slow
- Error-prone
- Not scalable
Automated Computing
Computers execute workflows with minimal human involvement.
- Faster processing
- Higher accuracy
- Efficient resource use
Reactive Computing
System responds after events happen.
Examples
- Incident response
- Support tickets
- After-the-fact troubleshooting
Proactive Computing
System predicts and acts before failures happen.
Examples
- Predictive maintenance
- Capacity planning
- Anomaly detection
Remember the saying: “Prevention is better than cure.”
[Avg. reading time: 10 minutes]
Evolution of IoT
IoT evolved from isolated device communication to distributed, event-driven systems where intelligence is shared across edge, fog, and cloud.
Early Phase (2000–2010): Machine-to-Machine Era
Characteristics
- Direct device-to-system communication
- Mostly industrial use cases
- Proprietary protocols
- Vendor-locked implementations
Limitations
- No standardization
- Poor interoperability
- High cost
- Difficult to scale
Example: OnStar Vehicle Communication
- Direct vehicle to control-center connection
- Proprietary cellular network
- Centralized command system
Capabilities
- Emergency alerts
- Vehicle tracking
- Remote diagnostics
Limitations
- Closed ecosystem
- Single-vendor dependency
- High operational cost
Implementation: General Motors’ OnStar system (2000s)
Initial IoT Phase (2010–2015): Three-Layer Architecture
Architecture Layers
Perception Layer
- Sensors and actuators
- Data collection from physical world
Network Layer
- Connectivity
- Data transmission
Application Layer
- Basic analytics
- Visualization
- User interfaces
Key Advances
- Cloud computing adoption
- Open protocols emerge
- Improved interoperability
Example 1: Nest Learning Thermostat (1st Generation)
- Temperature and motion sensors
- Wi-Fi connectivity
- Cloud-backed mobile application
Impact
- Mainstream smart home adoption
- Remote monitoring and automation
Intermediate Phase (2015–2018): Five-Layer Architecture
The five-layer model emerged because cloud-only processing could not meet latency, scale, and enterprise integration needs.
Additional Layers
- Transport Layer: reliable data movement
- Processing Layer: analytics and rule engines
- Business Layer: enterprise integration and monetization
Improvements
- Better security models
- Edge computing introduced
- Improved scalability
- Structured data management
Example: Smart City - Barcelona
Architecture
- City-wide sensor networks
- High-speed transport networks
- Central data platforms
- Multiple city applications
- Business and governance layer
Results
- Reduced water consumption
- Improved traffic flow
- Optimized waste management
Modern Phase (2018–Present): Service-Oriented Architecture
Core Characteristics
- Microservices-based systems
- Edge–Cloud continuum
- Event-driven architecture
- Zero-trust security
- AI and ML integration
Key Capabilities
Distributed Intelligence
- Edge processing
- Fog computing
- Autonomous decision-making
Advanced Integration
- API-first design
- Event mesh
- Digital twins
Security
- Identity-based access
- End-to-end encryption
- Continuous threat detection
Scalability
- Containers
- Serverless computing
- Auto-scaling
Example: Tesla Vehicle Platform
Architecture
- Edge computing inside vehicles
- Cloud-based OTA updates
- AI-driven autopilot
- Digital vehicle twins
Impact
- Continuous improvement
- Predictive maintenance
- Fleet-level intelligence
Example: Amazon Go Stores
Technologies
- Computer vision
- Sensor fusion
- Edge AI
- Deep learning
Results
- Cashierless retail
- Reduced operational cost
- Improved customer experience
Emerging Trends in IoT
Autonomous IoT
- Self-healing systems
- Self-optimizing networks
- Cognitive decision-making
Sustainable IoT
- Energy-efficient design
- Green computing
- Resource optimization
Resilient IoT
- Fault tolerance
- Disaster recovery
- Business continuity
Example: Smart Agriculture
- Autonomous machinery
- Drone integration
- Soil and weather sensors
- Precision farming
Example: Smart Grids
- Grid sensors
- Smart meters
- Edge intelligence
- Automated fault recovery
- Demand response
Key Architectural Shifts Over Time:
- From Centralized → Distributed
- From Monolithic → Microservices
- From Cloud-centric → Edge-centric
- From Static → Dynamic
- From Manual → Automated
- From Reactive → Proactive
Impact on Design Considerations
Scalability
- Vertical → Horizontal
- Static → Elastic
Security
- Perimeter-based → Zero trust
- Reactive → Preventive
Integration
- Point-to-point → Event-driven
- Tight coupling → Loose coupling
Operations
- Manual → Automated
- Centralized → Distributed
[Avg. reading time: 14 minutes]
Protocols
A protocol, in the context of computing and communications, refers to a set of rules and conventions that dictate how data is transmitted and received over a network. Protocols ensure that different devices and systems can communicate with each other reliably and effectively. They define the format, timing, sequencing, and error-checking mechanisms used in data exchange.
Importance of Protocols
Interoperability: Allows different systems and devices from various manufacturers to work together.
Reliability: Ensures data is transmitted accurately and efficiently.
Standardization: Provides a common framework that developers can follow, leading to consistent implementations.
Commonly used Protocols
HTTP (HyperText Transfer Protocol): Used for transmitting web pages over the internet.
FTP (File Transfer Protocol): Used for transferring files between computers.
TCP/IP (Transmission Control Protocol/Internet Protocol): A suite of communication protocols used to interconnect network devices on the internet.
UDP (User Datagram Protocol): A communication protocol in the Internet Protocol Suite, used by networked devices to send short messages known as datagrams with minimal protocol mechanisms. Used in VoIP and live streaming.
Key Characteristics of Protocols
Syntax:
Defines the structure or format of the data.
Example: How data packets are formatted or how headers are structured.
Semantics:
Describes the meaning of each section of bits in the data.
Example: What specific bits represent, such as addressing information or control flags.
Timing:
Controls the sequencing and speed of data exchange.
Example: When data should be sent, how fast it should be sent, and how to handle synchronization.
Popular IoT Protocols
1. Bluetooth
Description: A short-range wireless technology standard used for exchanging data between fixed and mobile devices. It’s a key protocol in the IoT ecosystem.
Use Cases:
- Wearable devices (e.g., fitness trackers, smartwatches)
- Wireless peripherals (e.g., keyboards, mice, headphones)
- Home automation (e.g., smart locks, lighting control)
- Health monitoring devices
2. Zigbee
Description: A low-power, low-data-rate wireless mesh network standard ideal for IoT applications. It can handle larger networks, on the order of thousands of nodes, compared to Bluetooth’s limit of roughly 5 to 30 devices, and offers lower latency than Bluetooth. It needs a hub/controller to communicate (e.g., Google Nest, Apple HomePod).
Use Cases:
- Smart home devices (e.g., smart bulbs, thermostats, security systems)
- Industrial automation
- Smart energy applications (e.g., smart meters)
- Wireless sensor networks
3. NFC (Near Field Communication)
Description: A direct peer-to-peer communication system. A set of communication protocols for communication between two electronic devices over a distance of 4 cm (1.6 in) or less. No pairing or controller is needed.
Use Cases:
- Contactless payments (e.g., Apple Pay, Google Wallet)
- Access control (e.g., NFC-enabled door locks, Yubi Keys)
- Data exchange (e.g., transferring contacts, photos)
- Smart posters and advertising
Payment Terminal
- Phone → Terminal (direct)
- Terminal → Payment processor (separate connection)
Door Access
- Card → Reader (direct)
- Reader → Access control system (separate connection)
4. LoRaWAN (Long Range Wide Area Network)
Description: A low-power, long-range wireless protocol designed for IoT applications.
Use Cases:
- Smart cities (e.g., parking sensors, street lighting)
- Agriculture (e.g., soil moisture sensors)
- Asset tracking
- Environmental monitoring
5. MQTT (Message Queuing Telemetry Transport)
Description: A lightweight messaging protocol for small sensors and mobile devices optimized for high-latency or unreliable networks.
- It’s a lightweight messaging protocol designed for devices with limited resources
- Works like a postal service for IoT devices
- Uses a publish/subscribe model instead of direct device-to-device communication
- Perfect for IoT because it’s:
- Low bandwidth
- Battery efficient
- Reliable even with poor connections
Use Cases:
- Home automation (e.g., smart home controllers)
- Industrial automation
- Telemetry data collection
- Remote monitoring
6. CoAP (Constrained Application Protocol)
Description: A specialized web transfer protocol for use with constrained nodes and networks in the IoT.
Key Characteristics
- It’s a specialized web transfer protocol for resource-constrained IoT devices
- Works similarly to HTTP but optimized for IoT needs
- Uses UDP (User Datagram Protocol) instead of TCP, making it lighter and faster
- Built for machine-to-machine (M2M) applications
Use Cases:
- Smart energy and utility metering
- Building automation
- Environmental monitoring
- Resource-constrained devices
Main Features
- Built-in Resource Discovery
- Support for multicast and broadcast messages
- Simple proxy and caching capabilities
- Low overhead and parsing complexity
- Asynchronous message exchange
- URI support similar to HTTP (coap://endpoint/path)
Apart from these, there are a few more protocols worth knowing: Z-Wave, LTE-M, and RFID.
[Avg. reading time: 4 minutes]
IoT Protocol Stack Overview
Many IoT protocols span multiple layers.
This stack is a conceptual view used to understand responsibilities, not a strict OSI mapping.
| Layer | Purpose | Examples |
|---|---|---|
| Physical Layer | Handles hardware-level transmission such as sensors, actuators, radios, and modulation. | LoRa, BLE (PHY), Zigbee (PHY), Wi-Fi, Cellular (NB-IoT, LTE-M) |
| Data Link Layer | Manages MAC addressing, framing, error detection, and local delivery. | IEEE 802.15.4, BLE Link Layer, LoRaWAN |
| Network Layer | Handles addressing and routing across networks (IP or adapted IP). | IPv6, 6LoWPAN, RPL |
| Transport Layer | Provides end-to-end data delivery and reliability where required. | UDP, TCP |
| Security Layer | Ensures encryption, authentication, and integrity. | DTLS, TLS |
| Application Layer | Defines messaging, device interaction, and application semantics. | MQTT, CoAP, HTTP, LwM2M, AMQP |
IoT Stack Preferred Languages
| Stack Layer | Preferred Languages | Why |
|---|---|---|
| Lower Stack (Firmware / Device) | C / C++ / Rust (emerging) | Direct hardware access, deterministic performance, low memory footprint, real-time constraints, zero-cost abstractions. |
| Middle Stack (Gateway / Edge) | Rust / Python | Protocol translation, buffering, edge analytics, balance of performance and developer productivity. |
| Upper Stack (Cloud / Data) | Rust / Python | Large-scale data processing, APIs, stream processing, ML orchestration, cloud-native services. |
[Avg. reading time: 3 minutes]
Layers of IoT - Lower Stack
IoT architecture typically consists of several layers, each serving a specific function in the overall system. These layers can be broadly divided into the lower stack and the upper stack.
The lower stack focuses on the physical and network aspects of IoT systems. It includes the following layers:
Physical Devices and Sensors:
Devices and sensors that collect data from the environment. Examples: Smart thermostats, industrial sensors, wearable health monitors.
Device Hardware and Firmware:
Microcontrollers, processors, and firmware that manage device operations. Ensures proper functioning and communication of IoT devices.
Connectivity and Network Layer:
Communication protocols (Wi-Fi, Bluetooth, Zigbee, LoRaWAN, etc.) that transmit data. Network hardware like routers and gateways that facilitate data transmission.
Edge Computing:
Edge devices that process data locally to reduce latency and bandwidth usage. Edge analytics for real-time decision-making without relying on cloud processing.
Power Management:
Battery technologies and energy harvesting methods to power IoT devices. Ensures prolonged operational life of remote and portable devices.
[Avg. reading time: 5 minutes]
Layers of IoT - Upper Stack
IoT architecture typically consists of several layers, each serving a specific function in the overall system. These layers can be broadly divided into the lower stack and the upper stack.

The upper stack deals with application, data processing, and user interaction aspects of IoT systems. It includes the following layers:
Data Ingestion Layer
- Different Data formats (JSON, Binary)
- Message Brokers and queuing systems (RabbitMQ, Apache Kafka)
Data Processing & Storage
- Time Series Databases like InfluxDB / TimescaleDB.
- Hot vs Cold storage strategies.
- Data aggregation techniques.
- Edge vs Cloud processing decisions.
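One common aggregation technique is downsampling: collapsing raw readings into fixed time windows before moving them to cold storage. This is a minimal sketch of the idea in plain Python, not tied to InfluxDB or TimescaleDB; window size and sample data are illustrative:

```python
# Downsample raw (timestamp, value) readings into fixed windows by averaging.
from collections import defaultdict

def downsample(readings, window_s=60):
    """Average readings per window; returns {window_start_ts: average}."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts - ts % window_s].append(value)  # bucket by window start
    return {start: sum(vals) / len(vals) for start, vals in buckets.items()}

raw = [(0, 20.0), (30, 22.0), (60, 24.0), (90, 26.0)]
print(downsample(raw))  # two 60-second windows, one average each
```

Real time-series databases apply the same idea with continuous aggregates or retention policies, keeping raw data "hot" for a short period and only the downsampled series long-term.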
Analytical Layer
- Real-time analytics
- Visualization frameworks and tools
- Anomaly detection systems
Application Interface / Enablement
- API (RESTful services)
- User authentication / authorization
Enterprise Integration
- Data transformation and mapping
- Integration with legacy systems
#upperstack #data #integrationlayer
[Avg. reading time: 3 minutes]
Puzzle
1. For each of the following IoT components, identify whether it belongs to the upper stack or the lower stack, and explain why.
- 1.1. A mobile app that allows users to control their home lighting system.
- 1.2. A sensor that measures soil moisture levels in a farm.
- 1.3. A gateway that translates Zigbee protocol data to Wi-Fi for transmission to the cloud.
- 1.4. A cloud-based analytics platform that processes data from smart meters.
- 1.5. Firmware running on a smart thermostat that controls HVAC systems.
2. Determine whether the following statements are true or false.
- 2.1 Edge computing is part of the upper stack in IoT systems.
- 2.2 User authentication and data encryption are important aspects of the lower stack.
- 2.3 A smart refrigerator that sends notifications to your phone about expired food items involves both upper and lower stack components.
- 2.4 Zigbee and Bluetooth are commonly used for high-bandwidth IoT applications.
- 2.5 Predictive maintenance in industrial IoT primarily utilizes data from the upper stack.
[Avg. reading time: 3 minutes]
Data Processing
- Application Layer
- CPU Architecture
- Containers
- Python Environment
- Time Series Databases
- Data Visualization libraries
[Avg. reading time: 5 minutes]
Application Layer
Application Protocols
Lightweight protocols designed for IoT communication:
MQTT (Message Queuing Telemetry Transport):
Device → MQTT Broker → Server
Publish-subscribe model over TCP/IP.
Ideal for unreliable networks (e.g., remote sensors).
CoAP (Constrained Application Protocol):
RESTful, UDP-based protocol for low-power devices.
Features: Observe mode, resource discovery, DTLS security.
HTTP/HTTPS:
Used for cloud integration (less efficient than CoAP/MQTT).
LwM2M (Lightweight M2M):
Device management protocol built on CoAP.
Data Formats
JSON: Human-readable format for APIs and web services.
CBOR (Concise Binary Object Representation): Binary format for efficiency (used with CoAP).
XML: Less common due to larger payload size.
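Payload size is why format choice matters on constrained links. CBOR itself needs a third-party library, so this sketch only contrasts pretty-printed JSON with compact JSON using the standard library; the sample reading is hypothetical:

```python
import json

reading = {"device": "sensor-42", "temperature_c": 21.7, "ok": True}

pretty = json.dumps(reading, indent=2)                 # readable, larger
compact = json.dumps(reading, separators=(",", ":"))   # no spaces after , and :

print(len(pretty), len(compact))  # compact is smaller on the wire
```

A binary format like CBOR shrinks the payload further by dropping the text encoding entirely, at the cost of human readability.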
APIs and Services
RESTful APIs: Enable integration with cloud platforms (e.g., AWS IoT, Azure IoT).
WebSocket: Real-time bidirectional communication.
Device Management: Firmware updates, remote configuration (via LwM2M).
Security Mechanisms
DTLS (Datagram TLS): Secures CoAP communications.
TLS/SSL: Used for MQTT and HTTP.
Authentication: OAuth, API keys, X.509 certificates.
Why the Application Layer Matters
Efficiency: Protocols like CoAP minimize overhead for low-power devices.
Scalability: Supports thousands of devices in large-scale deployments.
Interoperability: Enables integration with existing web infrastructure (e.g., HTTP).
Security: Ensures data integrity and confidentiality in sensitive applications.
Challenges in IoT Application Layers
Fragmentation: Multiple protocols (CoAP, MQTT, HTTP) complicate interoperability.
Resource Constraints: Limited compute/memory on devices restricts protocol choices.
Latency: Real-time applications require optimized data formats and protocols.
#applicationlayer #protocols #formats #api #services
[Avg. reading time: 18 minutes]
MQTT - Message Queuing Telemetry Transport
MQTT is one of the most widely used messaging protocols in the Internet of Things (IoT).
It was originally developed by IBM in 1999 and later standardized by OASIS. MQTT became popular in IoT because it is simple, lightweight, and designed for unreliable networks.
MQTT works well on:
- Low bandwidth networks
- High latency connections
- Intermittent or unreliable connectivity
Unlike HTTP, MQTT uses a binary message format, making it far more efficient for constrained devices such as sensors and embedded systems.
Why MQTT Exists
Traditional request–response protocols like HTTP are inefficient for IoT devices.
MQTT was designed to:
- Minimize network usage
- Reduce device CPU and memory consumption
- Support asynchronous, event-driven communication
- Work reliably even when devices disconnect frequently
Core MQTT Concepts
Publish–Subscribe Model
MQTT uses a publish–subscribe architecture:
- Devices publish messages to a broker
- Devices subscribe to topics they are interested in
- The broker routes messages to matching subscribers
Devices never communicate directly with each other.
MQTT Components
MQTT Broker
The broker is the central message hub. Think of it like a post office:
- Receives messages from publishers
- Filters messages by topic
- Delivers messages to subscribers
Common brokers:
- Open source: Mosquitto
- Commercial: HiveMQ
Register with HiveMQ Cloud


Publishers
Devices that send data
Example:
- Temperature sensor publishing readings
- Garage door device publishing open or close status
Subscribers
Devices that receive data
Example:
- Mobile app receiving temperature updates
- Backend system monitoring device health
Topics
Topics are hierarchical strings used to route messages.
Example:
home/livingroom/temperature
- Publishers send messages to a topic
- Subscribers subscribe to topics of interest
- The broker matches topics and delivers messages
Topic Wildcards
MQTT supports topic wildcards for flexible subscriptions.
Single-level wildcard (+)
- Matches exactly one level
Example:
home/+/temperature
Multi-level wildcard (#)
- Matches all remaining levels
Example:
home/#
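The matching rules for + and # can be sketched in a few lines of Python. This is a simplified matcher for illustration; real brokers also handle special cases such as topics beginning with $:

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Simplified MQTT topic matching supporting + and # wildcards."""
    p_levels = pattern.split("/")
    t_levels = topic.split("/")
    for i, p in enumerate(p_levels):
        if p == "#":                       # multi-level: matches the rest
            return True
        if i >= len(t_levels):             # topic ran out of levels
            return False
        if p != "+" and p != t_levels[i]:  # '+' matches exactly one level
            return False
    return len(p_levels) == len(t_levels)

print(topic_matches("home/+/temperature", "home/livingroom/temperature"))  # True
print(topic_matches("home/#", "home/kitchen/humidity"))                    # True
print(topic_matches("home/+/temperature", "home/kitchen/humidity"))        # False
```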

Key Features of MQTT
- Lightweight and Efficient
  - Small packet size
  - Minimal protocol overhead
  - Ideal for constrained devices
- Bidirectional Communication
  - Devices can both publish and subscribe
  - Enables real-time updates and control
- Highly Scalable
  - Supports thousands to millions of devices
  - Widely used in large IoT and IIoT deployments
- Configurable Reliability
  - Supports different Quality of Service levels
  - Lets you trade reliability for performance
- Session Persistence and Buffering
  - Brokers can store messages when clients disconnect
  - Messages are delivered when clients reconnect
- Security Support
  - MQTT itself has no built-in security
  - Security is added using:
    - TLS encryption
    - Client authentication
    - Access control at the broker
```mermaid
graph LR
    B[MQTT Broker]
    CD1[Client Device]
    CD2[Client Device]
    CD3[Client Device]
    CD4[Client Device]
    CD5[Client Device]
    CD1 -->|Topic 2| B
    CD1 -->|Topic 1| B
    CD2 -->|Topic 2| B
    B -->|Topic 2| CD3
    B -->|Topic 1, Topic 3| CD4
    B -->|Topic 3| CD5
```
Quality of Service (QoS)
MQTT defines three QoS levels for message delivery. QoS is coordinated by the broker.
QoS 0 – At most once
- No acknowledgment
- Messages may be lost
- Lowest latency
- Use when message loss is acceptable
- Example: Temperature sensor every 2 seconds. High volume of data.
QoS 1 – At least once
- Message delivery is acknowledged
- Messages may be duplicated
- Commonly used in IoT
- Use when message loss is unacceptable and duplicate messages can be handled
- Deduplication is handled via the message ID
- Example: Smart meter readings. Door open/close.
QoS 2 – Exactly once
- Guarantees single delivery
- Highest overhead
- Increased latency
- Use only when message loss and duplication are both unacceptable.
- Example: control commands, critical alerts, factory machine shutdown.
Higher QoS levels consume more network and compute resources.
The effective QoS for delivery is the lower of the publisher’s and the subscriber’s levels:
- Pub QoS 1, Sub QoS 0 → delivered as QoS 0
- Pub QoS 2, Sub QoS 1 → delivered as QoS 1
- Pub QoS 0, Sub QoS 2 → delivered as QoS 0
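This downgrade rule is just a minimum, which a one-line sketch makes explicit:

```python
def effective_qos(pub_qos: int, sub_qos: int) -> int:
    """MQTT delivers at the minimum of the publisher's and subscriber's QoS."""
    return min(pub_qos, sub_qos)

assert effective_qos(1, 0) == 0  # Pub QoS 1, Sub QoS 0 -> QoS 0
assert effective_qos(2, 1) == 1  # Pub QoS 2, Sub QoS 1 -> QoS 1
assert effective_qos(0, 2) == 0  # Pub QoS 0, Sub QoS 2 -> QoS 0
```

In other words, a subscriber can never raise the reliability of a message above what the publisher requested, and vice versa.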
Message Persistence
Message persistence ensures messages are not lost when clients disconnect.
Non-persistent (Default)
- Messages are not stored
- Lost if subscriber is offline
- Suitable for non-critical data
Queued Persistent
- Broker stores messages for offline clients
- Messages delivered when client reconnects
Similar to: Emails waiting on a server until you connect
Persistent with Acknowledgment
- Messages stored until acknowledged
- Messages resent until confirmation
Used when: Guaranteed processing is required
Persistent Session Stores
When persistence is enabled, brokers may store:
- Client ID
- Subscription list
- Unacknowledged QoS messages
- Queued messages
CONN Car Company

Vehicles are shifting from hardware-defined to Software Defined Vehicles (EVs like Tesla).
MQTT is used for:
- Telemetry streaming
- Remote diagnostics
- Over-the-air updates
- Feature enablement
EV companies use MQTT to connect vehicles, cloud systems, and mobile apps reliably.


MQTT doesn’t stop here
MQTT integrates with:
- Cloud platforms
- Data pipelines
- Streaming systems
- Analytics and monitoring tools

Source YouTube Links
https://www.youtube.com/watch?v=brUsw_H9Gq8
https://www.youtube.com/watch?v=k103_LhF05w
Advanced Learning about Brokers
https://www.hivemq.com/blog/mqtt-brokers-beginners-guide/
Download the Open Source Broker to learn more https://mosquitto.org/
#mqtt #http #broker #publisher #subscriber
1: http://hivemq.com
[Avg. reading time: 8 minutes]
JSON
JSON (JavaScript Object Notation) is a lightweight, text-based data format that’s easy to read for both humans and machines. It was derived from JavaScript but is now language-independent, making it one of the most popular formats for data exchange between applications.
What is JSON Used For?
- Storing configuration settings
- Exchanging data between web servers and browsers
- APIs (Application Programming Interfaces)
- Storing structured data in files or databases
- Mobile app data storage
JSON Data Types:
Strings: Text wrapped in double quotes
{"name": "Rachel Green"}
Numbers: Integer or floating-point
{"age": 27, "height": 5.5}
Booleans: true or false
{"isStudent": true}
null: Represents no value
{"middleName": null}
Arrays: Ordered lists of values
{
  "hobbies": ["shopping", "singing", "swimming"]
}
Objects: Collections of key-value pairs
{
  "address": {
    "street": "123 Main St",
    "city": "NYC",
    "zipCode": "10001"
  }
}
Important Rules:
- All property names must be in double quotes
- Values can be strings, numbers, objects, arrays, booleans, or null
- Commas separate elements in arrays and properties in objects
- No trailing commas allowed
- No comments allowed in JSON
- Must use UTF-8 encoding
Example
{
  "studentInfo": {
    "firstName": "Monica",
    "lastName": "Geller",
    "age": 22,
    "isEnrolled": true,
    "courses": [
      {
        "name": "Web Development",
        "code": "CS101",
        "grade": 95.5
      },
      {
        "name": "Database Design",
        "code": "CS102",
        "grade": 88.0
      }
    ],
    "contact": {
      "email": "monica.g@friends.com",
      "phone": null,
      "address": {
        "street": "456 College Ave",
        "city": "Columbia",
        "state": "NY",
        "zipCode": "13357"
      }
    }
  }
}
Don'ts with JSON
- Using single quotes instead of double quotes
- Not enclosing property names in quotes
- Adding trailing commas
- Missing closing brackets or braces
- Using undefined or functions (not allowed in JSON)
- Adding comments (not supported in JSON)
Best Practices
- Always validate JSON using a JSON validator tool
- Pay attention to proper nesting of objects and arrays
- Ensure all opening brackets/braces have matching closing ones
- Check for proper use of commas
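In Python, the standard library's json module doubles as a quick validator: json.loads raises an error on anything that breaks the rules above. A small sketch (the sample data is made up):

```python
import json

raw = '{"firstName": "Rachel", "age": 27, "hobbies": ["shopping", "singing"]}'
data = json.loads(raw)          # parses, or raises json.JSONDecodeError
print(data["firstName"])        # Rachel

# Single quotes, trailing commas, and comments are rejected,
# exactly as the rules above require.
for bad in ["{'name': 'x'}", '{"a": 1,}', '{"a": 1 // comment\n}']:
    try:
        json.loads(bad)
    except json.JSONDecodeError:
        print("invalid:", bad)
```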
Naming Conventions
camelCase (e.g., firstName):
- Most popular in JavaScript/JSON
- Easy to read and type
- Matches JavaScript convention
Example:
{
  "firstName": "John",
  "lastLoginDate": "2024-12-20",
  "phoneNumber": "555-0123"
}
snake_case (underscores, e.g., first_name):
- Popular in Python and SQL
- Very readable
- Clear word separation
Example:
{
  "first_name": "John",
  "last_login_date": "2024-12-20",
  "phone_number": "555-0123"
}
kebab-case (hyphens, e.g., first-name):
- Common in URLs and HTML attributes
- NOT recommended for JSON
- Can cause issues because JavaScript reads the hyphen as a subtraction operator
- Requires bracket notation to access in JavaScript
Example of why it’s problematic:
// This won't work
data.first-name // JavaScript interprets as data.first minus name
// Must use bracket notation
data["first-name"] // Works but less convenient
[Avg. reading time: 9 minutes]
CBOR (Concise Binary Object Representation)
CBOR is a compact binary data format designed for efficiency, speed, and low overhead. It keeps JSON’s simplicity while delivering 30–50% smaller payloads and faster serialization, making it ideal for IoT, embedded systems, and high-throughput APIs.
https://cbor.dev
Why CBOR
JSON is human-friendly but wasteful for machines.
CBOR is Binary
- Binary encoding instead of text
- Smaller payloads
- Faster parsing
- Native binary support
- Better fit for constrained environments
Use CBOR when:
- Bandwidth is expensive
- Latency matters
- Devices are constrained
- Message rates are high
Key Features
Binary Format
- Compact payloads
- Lower bandwidth usage
- Faster transmission
Self-Describing
- Encodes type information directly
- No external schema required to decode
Schema-Less (Schema Optional)
- Works like JSON
- Supports validation using CDDL (Concise Data Definition Language)
Fast Serialization & Parsing
- No expensive string parsing
- Lower CPU overhead
Extensible
- Supports semantic tags for:
- Date / Time
- URIs
- Application-specific meanings
Data Types & Structure
CBOR natively supports JSON-like data structures:
Primitive Types:
- Integers (positive, negative)
- Byte strings (bstr)
- Text strings (tstr)
- Floating-point numbers (16, 32, 64 bit)
- Booleans (true, false)
- null and undefined values
Composite Types:
- Arrays (ordered lists)
- Maps (key-value pairs, similar to JSON objects)
Semantic Tags:
- Optional tags to add meaning (e.g., Tag 0 for date/time strings, Tag 32 for URIs).
Example: CBOR vs. JSON
JSON Object
{
"id": 123,
"name": "Temperature Sensor",
"value": 25.5,
"active": true
}
CBOR to/from JSON
CBOR Playground
CBOR Encoding (Hex Representation)
B9 0004 # map(4)
62 # text(2)
6964 # "id"
18 7B # unsigned(123)
64 # text(4)
6E616D65 # "name"
72 # text(18)
54656D70657261747572652053656E736F72 # "Temperature Sensor"
65 # text(5)
76616C7565 # "value"
FB 4039800000000000 # float64(25.5)
66 # text(6)
616374697665 # "active"
F5 # true
Size Comparison:
- JSON: ~70 bytes.
- CBOR: ~45 bytes (35% smaller)
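To make the encoding concrete, here is a stdlib-only sketch of a minimal CBOR encoder covering just the types in this example. It uses the short one-byte map header (A4) instead of the longer B9 0004 form shown in the dump above, so its output comes out slightly smaller; real code should use a library such as cbor2.

```python
import json
import struct

def cbor_encode(obj):
    # Minimal CBOR encoder covering only the types in the example above:
    # booleans, small unsigned ints, float64, short text strings, small maps.
    if isinstance(obj, bool):                 # major type 7 simple values
        return b"\xf5" if obj else b"\xf4"
    if isinstance(obj, int) and obj >= 0:     # major type 0 (unsigned int)
        if obj < 24:
            return bytes([obj])
        if obj < 256:
            return b"\x18" + bytes([obj])     # 0x18 = 1-byte uint follows
        raise ValueError("larger ints omitted in this sketch")
    if isinstance(obj, float):                # major type 7, 0xFB = float64
        return b"\xfb" + struct.pack(">d", obj)
    if isinstance(obj, str):                  # major type 3 (text string)
        data = obj.encode("utf-8")
        assert len(data) < 24, "sketch handles short strings only"
        return bytes([0x60 | len(data)]) + data
    if isinstance(obj, dict):                 # major type 5 (map)
        assert len(obj) < 24, "sketch handles small maps only"
        out = bytes([0xA0 | len(obj)])
        for k, v in obj.items():
            out += cbor_encode(k) + cbor_encode(v)
        return out
    raise TypeError(f"not covered in this sketch: {type(obj)}")

payload = {"id": 123, "name": "Temperature Sensor", "value": 25.5, "active": True}
cbor_bytes = cbor_encode(payload)
json_bytes = json.dumps(payload).encode("utf-8")
print(len(json_bytes), len(cbor_bytes))   # 72 53 on this payload
```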
| Feature | CBOR | JSON/XML |
|---|---|---|
| Payload Size | Compact binary encoding (~30-50% smaller). | Verbose text-based encoding |
| Parsing Speed | Faster (no string parsing). | Slower (text parsing required). |
| Data Types | Rich (supports bytes, floats, tags). | Limited (no native byte strings). |
| Schema Flexibility | Optional schemas (CDDL). | Often requires external schemas. |
| Human Readability | Requires tools to decode. | Easily readable. |
Limitations
Human-Unreadable: Requires tools (e.g., CBOR Playground) to decode.
Schema Validation: While optional, validation requires external tools like CDDL (Concise Data Definition Language).
When to Use CBOR
- Low-bandwidth networks (e.g., IoT over LoRaWAN or NB-IoT).
- High-performance systems needing fast serialization.
- Interoperability between devices and web services.
Demo Code
git clone https://github.com/gchandra10/python_cbor_examples
CBOR + MQTT = Perfect Match
CBOR is ideal for MQTT payloads
The demo shows how CBOR can be used as an MQTT payload format.
[Avg. reading time: 4 minutes]
XML
XML (eXtensible Markup Language) is moderately popular in IoT.
With JSON gaining popularity, XML is still used in legacy systems and regulated environments such as government/military systems.
It uses XSD (XML Schema Definition) to enforce strict data validation, ensuring integrity in critical applications like healthcare.
Legacy systems built on SOAP-based web services (newer ones use REST APIs) often use XML, requiring IoT devices to adopt XML for compatibility.
<sensorData>
  <deviceId>TEMP_SENSOR_01</deviceId>
  <location>living_room</location>
  <reading>
    <temperature>23.5</temperature>
    <unit>Celsius</unit>
    <timestamp>2025-01-29T14:30:00</timestamp>
  </reading>
</sensorData>
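As a quick sketch, Python's standard xml.etree module can parse the reading above without any third-party parser:

```python
import xml.etree.ElementTree as ET

doc = """<sensorData>
  <deviceId>TEMP_SENSOR_01</deviceId>
  <location>living_room</location>
  <reading>
    <temperature>23.5</temperature>
    <unit>Celsius</unit>
    <timestamp>2025-01-29T14:30:00</timestamp>
  </reading>
</sensorData>"""

root = ET.fromstring(doc)
device = root.findtext("deviceId")                  # direct child lookup
temp = float(root.findtext("reading/temperature"))  # nested path lookup
print(device, temp)  # TEMP_SENSOR_01 23.5
```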
Limitations of XML in IoT
- Verbosity: Larger payloads increase bandwidth and storage costs.
- Processing Overhead: Parsing XML can strain low-power IoT devices.
- Modern Alternatives: JSON and binary formats (e.g., Protocol Buffers) are more efficient for most IoT use cases.
XML vs. JSON trade-offs:
| Factor | XML | JSON |
|---|---|---|
| Payload Size | Verbose (larger files) | Compact (better for low-bandwidth IoT) |
| Parsing Speed | Slower (complex structure) | Faster (lightweight parsing) |
| Validation | Mature (XSD) | Growing (JSON Schema) |
| Adoption in New Projects | Rare (outside legacy/regulated use cases) | Dominant (preferred for new IoT systems) |
[Avg. reading time: 6 minutes]
TCP & UDP
- Transmission Control Protocol
- User Datagram Protocol
TCP and UDP are transport protocols. Their only job is to decide how data moves across the network.
Common IoT problems
- Sensors generate data continuously
- Networks are unreliable
- Devices are constrained
- Some data loss is acceptable and some is not.
UDP
- Sends data without confirmation
- No retries
- No ordering
- No connection
- Very low overhead
UDP Use Cases in IoT
- Battery powered devices
- High frequency telemetry
- Small payloads
- Occasional loss is acceptable
- Speed matters more than accuracy
Typical IoT usage
- CoAP
- Device discovery
- Heartbeats
- Periodic environmental measurements
Example
Smart street lighting
- Each lamp sends a heartbeat every 5 to 10 seconds
- Payload: device_id, status, battery, signal strength
- If ‘n’ heartbeats are missed, mark lamp as offline
- Losing one packet changes nothing.
Vehicle Telematics
- Fleet vehicles send location and health pings
- One ping every few seconds
- Next ping overrides the previous
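UDP's fire-and-forget behavior can be sketched with Python's socket module on loopback. The port, device ID, and payload fields here are invented for illustration; on a real network the datagram could simply vanish and the sender would never know:

```python
import json
import socket

# Receiver: a UDP socket bound to a random free loopback port.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
rx.settimeout(2.0)
port = rx.getsockname()[1]

# Sender: no connection, no handshake, no acknowledgment.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
heartbeat = {"device_id": "lamp-42", "status": "ok", "battery": 87}
tx.sendto(json.dumps(heartbeat).encode(), ("127.0.0.1", port))

data, _ = rx.recvfrom(1024)   # arrives on loopback; UDP itself never confirms
msg = json.loads(data)
print(msg["device_id"], msg["status"])
tx.close()
rx.close()
```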
TCP
- Confirms delivery
- Retries lost data
- Preserves order
- Maintains a connection
- Higher overhead
TCP use cases in IoT
- Data must not be lost
- Order matters
- Sessions last minutes or hours
Typical IoT usage
- MQTT
- HTTP
- HTTPS
- TLS secured pipelines
With MQTT
- Ordered messages
- Delivery guarantees using QoS
- Persistent sessions
- Broker side buffering
- Fan out to many subscribers
UDP vs TCP
| Question | UDP | TCP |
|---|---|---|
| Is delivery guaranteed | No | Yes |
| Is ordering preserved | No | Yes |
| Is it lightweight | Yes | No |
| Does MQTT use it | No | Yes |
| Does CoAP use it | Yes | No |
| Best for battery devices | Yes | Sometimes |
| Best for critical data | No | Yes |
                 SENSOR
                   |
         +-------------------+
         |                   |
      UDP Path            TCP Path
         |                   |
  No confirmation     Confirmed delivery
  No retry            Retry on failure
  Possible loss       Ordered messages
         |                   |
       CoAP             MQTT Broker
                             |
                    Persistent sessions
                             |
                    Cloud Applications
Ver 6.0.23
Last change: 2026-04-16
[Avg. reading time: 8 minutes]
MessagePack
A compact binary data interchange format
What is MessagePack
MessagePack is an efficient binary serialization format designed for fast and compact data exchange between systems.
Core properties
- Compact compared to text formats like JSON
- Fast serialization and deserialization
- Cross-language support across many ecosystems
- Flexible data model with optional extensions
Why MessagePack
MessagePack solves a very specific problem:
- JSON is easy to read but inefficient on the wire
- IoT and distributed systems care about bytes, latency, and CPU
- MessagePack keeps JSON-like simplicity but removes text overhead
In short, JSON Data model with Binary efficiency.
Key Use Cases
- IoT telemetry and device data
- Edge gateways aggregating high-frequency events
- Microservice-to-microservice communication
- Caching layers like Redis and Memcached
- Distributed systems logging and checkpoints
MessagePack vs JSON
- Binary and compact
- Faster to parse
- Smaller payloads for most data
- Not human-readable
- Debugging requires tooling
MessagePack vs CBOR
- MessagePack is simpler and lighter
- CBOR supports semantic tags like datetime and URI
- CBOR supports deterministic encoding for hashing and signatures
- Size differences are workload-dependent, not guaranteed
Comparison with Similar Formats
| Feature | MessagePack | JSON | CBOR |
|---|---|---|---|
| Encoding | Binary | Text | Binary |
| Human-readable | No | Yes | No |
| Data Size | Small (varies) | Large | Small (varies) |
| Schema Required | No | No | No |
| Standardization | Community | RFC 8259 | RFC 8949 |
| Binary Data Support | Native | Base64 | Native |
| Semantic Tags | No | No | Yes |
| Deterministic Encoding | No | No | Yes |
Key Differences:
- vs JSON: 20-30% smaller payloads, faster parsing, but not human-readable
- vs CBOR: More compact for simple types, CBOR has better semantic tagging
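To see why simple payloads come out smaller, here is a stdlib-only sketch that hand-encodes a tiny map using three MessagePack wire rules (positive fixint, fixstr, fixmap). It is illustrative only; real code should use the msgpack library:

```python
def msgpack_encode(obj):
    # Minimal MessagePack encoder sketch: small ints, short strings, small maps.
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                       # positive fixint: value in 1 byte
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        assert len(data) < 32, "sketch handles short strings only"
        return bytes([0xA0 | len(data)]) + data   # fixstr: 1-byte header
    if isinstance(obj, dict):
        assert len(obj) < 16, "sketch handles small maps only"
        out = bytes([0x80 | len(obj)])            # fixmap: 1-byte header
        for k, v in obj.items():
            out += msgpack_encode(k) + msgpack_encode(v)
        return out
    raise TypeError(f"not covered in this sketch: {type(obj)}")

wire = msgpack_encode({"id": 123})
print(wire.hex(), len(wire))   # 81a269647b 5  (JSON '{"id": 123}' is 11 bytes)
```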
Basic Operations
- packb() converts Python objects to MessagePack bytes
- unpackb() converts MessagePack bytes back to Python objects
Python Example
git clone https://github.com/gchandra10/python_messagepack_examples.git
MessagePack in IoT and Edge Systems
- Commonly used in edge gateways and ingestion pipelines
- Efficient for short, frequent telemetry messages
- Suitable for MQTT payloads where the broker is payload-agnostic
- Rarely used directly in regulated firmware layers
Important:
- MQTT does not care about payload format
- MessagePack is an application-layer choice, not a protocol requirement
Summary
When to Choose MessagePack
- Bandwidth or memory is constrained
- JSON is too verbose
- Binary data is common
- Speed matters more than readability
- Schema flexibility is acceptable
What MessagePack Does Not Do
- No schema enforcement
- No backward compatibility guarantees
- No semantic meaning for fields
- No built-in validation
- No deterministic encoding
Devices like the Apple Watch and Fitbit use Protocol Buffers where strict, FDA-regulated schema enforcement is required.
[Avg. reading time: 12 minutes]
Protocol Buffers
What are Protocol Buffers
- A method to serialize structured data into binary format
- Created by Google
- It's like JSON, but smaller and faster.
- Protocol Buffers are more commonly used in industrial IoT scenarios.
Why Protobuf is great for IoT
- Smaller size: Uses binary format instead of text, saving bandwidth
- Faster processing: Binary format means less CPU usage on IoT devices
- Strict schema: Helps catch errors early
- Language neutral: Works across different programming languages
- Great for limited devices: Uses less memory and battery power
- Extensibility: Add new fields to your message definitions without breaking existing code.
Industrial Use Cases
- Bridge structural sensors (vibration, stress)
- Factory equipment monitors
- Power grid sensors
- Oil/gas pipeline monitors
- Wind turbine telemetry
- Industrial HVAC systems
Why Industries prefer Protobuf:
- High data volume (thousands of readings per second)
- Need for efficient bandwidth usage
- Complex data structures
- Multiple systems need to understand the data
- Long-term storage requirements
- Cross-platform compatibility needs
graph LR
subgraph Bridge["Bridge Infrastructure"]
S1[Vibration Sensor] --> GW
S2[Strain Gauge] --> GW
S3[Temperature Sensor] --> GW
subgraph Gateway["Linux Gateway (Solar)"]
GW[Edge Gateway]
DB[(Local Storage)]
GW --> DB
end
end
subgraph Communication["Communication Methods"]
GW --> |4G/LTE| Cloud
GW --> |LoRaWAN| Cloud
GW --> |Satellite| Cloud
end
Cloud[Cloud Server] --> DA[Data Analysis]
style Bridge fill:#87CEEB,stroke:#333,stroke-width:2px,color:black
style Gateway fill:#90EE90,stroke:#333,stroke-width:2px,color:red
style Communication fill:#FFA500,stroke:#333,stroke-width:2px,color:black
style Cloud fill:#4169E1,stroke:#333,stroke-width:2px,color:white
style DA fill:#4169E1,stroke:#333,stroke-width:2px,color:white
style GW fill:#000000,stroke:#333,stroke-width:2px,color:white
style DB fill:#800020,stroke:#333,stroke-width:2px,color:white
classDef sensor fill:#00CED1,stroke:#333,stroke-width:1px,color:black
class S1,S2,S3 sensor
Consumer IoT devices (in general)
- Use simpler formats (JSON, proprietary)
- Have lower data volumes
- Work within closed ecosystems (Google Home, Apple HomeKit)
- Don’t need the optimization Protobuf provides
Data Types in Protobufs
Scalar Types:
int32, int64, uint32, uint64, sint32, sint64, fixed32, fixed64, sfixed32, sfixed64, float, double, bool, string, bytes
Composite Types:
- message: Defines a structured collection of other fields.
- enum: Defines a set of named integer constants.
Collections:
- repeated: Allows you to define a list of values of the same type. Like Array.
Steps involved in creating a Protobuf data file
Step 1: Define the data structure in a .proto text file.
Ex: my_data.proto
syntax = "proto3";

message MyData {
  int32 id = 1;
  string name = 2;
  float value = 3;
}
Step 2: Compile the .proto file into a Python class or Java class using the protoc compiler.
protoc --python_out=. my_data.proto
Generates my_data_pb2.py
Step 3: Use the Generated Python Class file and use it to store data.
Note: The protoc --version should be the same as (or close to) the minor version of the protobuf package from PyPI.
In my setup, protoc --version = 29.3 and PyPI protobuf = 5.29.2; the protobuf minor version is 29.2, which is close to 29.3.
See example.
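Under the hood, the generated class writes each field as a tag byte plus base-128 varints. This stdlib-only sketch hand-encodes the first two fields of the MyData message above to show the wire format; it is for illustration only, and real code should use the generated my_data_pb2 class:

```python
def varint(n: int) -> bytes:
    # Base-128 varint: 7 payload bits per byte, MSB set on all but the last byte.
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def encode_mydata(id_: int, name: str) -> bytes:
    # field 1 (int32 id): tag = (field_number << 3) | wire_type 0 (varint)
    msg = varint((1 << 3) | 0) + varint(id_)
    # field 2 (string name): wire type 2 (length-delimited)
    data = name.encode("utf-8")
    msg += varint((2 << 3) | 2) + varint(len(data)) + data
    return msg

print(encode_mydata(123, "sensor").hex())  # 087b120673656e736f72
```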
Demo Script
git clone https://github.com/gchandra10/python_protobuf_demo
flowchart LR
subgraph Sensor["Temperature/Humidity Sensor"]
S1[DHT22/BME280]
end
subgraph MCU["Microcontroller"]
M1[ESP32/Arduino]
end
subgraph Gateway["Gateway/Edge Device"]
G1[Raspberry Pi/\nIntel NUC]
end
subgraph Cloud["Cloud Server"]
C1[AWS/Azure/GCP]
end
S1 -->|Raw Data 23.5°C, 45%| M1
M1 -->|"JSON over MQTT {temp: 23.5,humidity: 45}"| G1
G1 -->|Protocol Buffers\nover HTTPS| C1
[Avg. reading time: 2 minutes]
HTTP Basics
HTTP (HyperText Transfer Protocol) is the foundation of data communication on the web, used to transfer data (such as HTML files and images).
GET - Retrieve a resource; what happens when you navigate to a URL or click a link.
POST - Send data to the server; what happens when you submit a form, like a username and password.
Popular HTTP Status Codes
200 Series (Success): 200 OK, 201 Created.
300 Series (Redirection): 301 Moved Permanently, 302 Found.
400 Series (Client Error): 400 Bad Request, 401 Unauthorized, 404 Not Found.
500 Series (Server Error): 500 Internal Server Error, 503 Service Unavailable.
We have already learned about Monolithic and Microservices architectures.
#http #status #monolithic #microservices
[Avg. reading time: 9 minutes]
Statefulness
The server stores information about the client’s current session in a stateful system. This is common in traditional web applications. Here’s what characterizes a stateful system:
Session Memory: The server remembers past interactions and may store session data like user authentication, preferences, and other activities.
Server Dependency: Since the server holds session data, the same server usually handles subsequent requests from the same client. This is important for consistency.
Resource Intensive: Maintaining state can be resource-intensive, as the server needs to manage and store session data for each client.
Example: A web application where a user logs in, and the server keeps track of their authentication status and interactions until they log out.
sequenceDiagram
participant C as Client
participant LB as Load Balancer
participant S1 as Server 1
participant S2 as Server 2
Note over C,S2: Initial Session Establishment
C->>LB: Initial Request
LB->>S1: Forward Request
S1-->>LB: Response (Session ID)
LB-->>C: Response (Session ID)
rect rgb(255, 255, 200)
Note over C,S2: Sticky Session Established
end
Note over C,S2: Session Continuation
C->>LB: Subsequent Request (with Session ID)
LB->>S1: Forward Request (based on Session ID)
S1-->>LB: Response (Data)
LB-->>C: Response (Data)
rect rgb(255, 255, 200)
Note over C,S2: Session Continues on Server 1
end
Note over C,S2: Session Termination
C->>LB: Logout Request
LB->>S1: Forward Logout Request
S1-->>LB: Confirmation
LB-->>C: Confirmation
rect rgb(255, 255, 200)
Note over C,S2: Session Ended
end
rect rgb(255, 255, 200)
Note right of S2: Server 2 remains unused due to stickiness
end
Stickiness (Sticky Sessions)
Stickiness or sticky sessions are used in stateful systems, particularly in load-balanced environments. It ensures that requests from a particular client are directed to the same server instance. This is important when:
Session Data: The server needs to maintain session data (like login status), and it’s stored locally on a specific server instance.
Load Balancers: In a load-balanced environment, without stickiness, a client’s requests could be routed to different servers, which might not have the client’s session data.
Trade-off: While it helps maintain session continuity, it can reduce the load balancing efficiency and might lead to uneven server load.
Methods of Implementing Stickiness
Cookie-Based Stickiness: The most common method, where the load balancer uses a special cookie to track the server assigned to a client.
IP-Based Stickiness: The load balancer routes requests based on the client’s IP address, sending requests from the same IP to the same server.
Custom Header or Parameter: Some load balancers can use custom headers or URL parameters to track and maintain session stickiness.
[Avg. reading time: 7 minutes]
Statelessness
In a stateless system, each request from the client must contain all the information the server needs to fulfill that request. The server does not store any state of the client’s session. This is a crucial principle of RESTful APIs. Characteristics include:
No Session Memory: The server remembers nothing about the user once the transaction ends. Each request is independent.
Scalability: Stateless systems are generally more scalable because the server doesn’t need to maintain session information. Any server can handle any request.
Simplicity and Reliability: The stateless nature makes the system simpler and more reliable, as there’s less information to manage and synchronize across systems.
Example: An API where each request contains an authentication token and all necessary data, allowing any server instance to handle any request.
sequenceDiagram
participant C as Client
participant LB as Load Balancer
participant S1 as Server 1
participant S2 as Server 2
C->>LB: Request 1
LB->>S1: Forward Request 1
S1-->>LB: Response 1
LB-->>C: Response 1
C->>LB: Request 2
LB->>S2: Forward Request 2
S2-->>LB: Response 2
LB-->>C: Response 2
rect rgb(255, 255, 200)
Note over C,S2: Each request is independent
end
In this diagram:
Request 1: The client sends a request to the load balancer.
Load Balancer to Server 1: The load balancer forwards Request 1 to Server 1.
Response from Server 1: Server 1 processes the request and sends a response back to the client.
Request 2: The client sends another request to the load balancer.
Load Balancer to Server 2: This time, the load balancer forwards Request 2 to Server 2.
Response from Server 2: Server 2 processes the request and responds to the client.
Statelessness: Each request is independent and does not rely on previous interactions. Different servers can handle other requests without needing a shared session state.
Token-Based Authentication
Common in stateless architectures, this method involves passing a token for authentication with each request instead of relying on server-stored session data. JWT (JSON Web Tokens) is a popular example.
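The idea can be sketched with Python's standard library: the client attaches the token to every request, so any server instance can verify it without looking up a session. The URL and token below are placeholders:

```python
import urllib.request

TOKEN = "eyJ...example-token"   # placeholder; a real JWT would go here

# Every request carries its own credentials; no server-side session needed.
req = urllib.request.Request(
    "https://api.example.com/v1/readings",   # hypothetical endpoint
    headers={"Authorization": f"Bearer {TOKEN}"},
)
print(req.get_header("Authorization"))
# Actually sending it would just be: urllib.request.urlopen(req)
```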
[Avg. reading time: 9 minutes]
REST API
REpresentational State Transfer is a software architectural style developers apply to web APIs.
REST APIs provide simple, uniform interfaces because they can be used to make data, content, algorithms, media, and other digital resources available through web URLs. Essentially, REST APIs are the most common APIs used across the web today.
Use of a uniform interface
HTTP Methods
GET: This method allows the server to find the data you requested and send it back to you.
POST: This method permits the server to create a new entry in the database.
PUT: If you perform the ‘PUT’ request, the server will update an entry in the database.
DELETE: This method allows the server to delete an entry in the database.
Sample REST API
https://api.zippopotam.us/us/08028
http://api.tvmaze.com/search/shows?q=friends
https://jsonplaceholder.typicode.com/posts
https://jsonplaceholder.typicode.com/posts/1
https://jsonplaceholder.typicode.com/posts/1/comments
https://reqres.in/api/users?page=2
https://reqres.in/api/users/2
More examples
http://universities.hipolabs.com/search?country=United+States
https://itunes.apple.com/search?term=pop&limit=1000
https://www.boredapi.com/api/activity
https://techcrunch.com/wp-json/wp/v2/posts?per_page=100&context=embed
CURL
Install curl (Client URL)
curl is a CLI tool available for every OS. On macOS, install it with:
brew install curl
Usage
curl https://api.zippopotam.us/us/08028
curl https://api.zippopotam.us/us/08028 -o zipdata.json
Browser based
VS Code based
Using Python
git clone https://github.com/gchandra10/python_read_restapi
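A minimal stdlib-only version of what the demo repo does (the requests library is the more common choice in practice). The network call is left commented out; the parsing step runs on an abridged, made-up sample response:

```python
import json
import urllib.request

def get_json(url: str, timeout: float = 10.0):
    # Minimal GET helper; urllib raises HTTPError on non-2xx responses.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# With network access you could call, for example:
# post = get_json("https://jsonplaceholder.typicode.com/posts/1")

# Offline illustration with an abridged sample response body:
sample = '{"userId": 1, "id": 1, "title": "sample title", "body": "sample body"}'
post = json.loads(sample)
print(post["id"], post["title"])
```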
Summary
Definition: REST (Representational State Transfer) API is a set of guidelines for building web services. A RESTful API is an API that adheres to these guidelines and allows for interaction with RESTful web services.
How It Works: REST uses standard HTTP methods like GET, POST, PUT, DELETE, etc. It is stateless, meaning each request from a client to a server must contain all the information needed to understand and complete the request.
Data Format: REST APIs typically exchange data in JSON or XML format.
Purpose: REST APIs are designed to be a simple and standardized way for systems to communicate over the web. They enable the backend services to communicate with front-end applications (like SPAs) or other services.
Use Cases: REST APIs are used in web services, mobile applications, and IoT (Internet of Things) applications for various purposes like fetching data, sending commands, and more.
[Avg. reading time: 8 minutes]
CPU Architecture Fundamentals
Introduction
CPU architecture defines:
- The instruction set a processor understands
- Register structure
- Memory addressing model
- Binary format
It determines what machine code can run on a processor.
If software is compiled for one architecture, it cannot run on another without translation.
Major CPU Architectures
Two architectures dominate in today's world.
1. amd64 (x86_64)
- Designed by AMD, adopted by Intel
- Dominates desktops and traditional servers
- Common in enterprise data centers
- Most Windows laptops
- Intel-based Macs
Characteristics:
- High performance
- Higher power consumption
2. arm64 (aarch64)
- Designed for power efficiency
- Common in embedded systems and mobile devices
- Raspberry Pi
- Apple Silicon (M*)
- Many IoT gateways
Characteristics:
- Energy efficient
- Dominant in IoT and edge computing
Mac/Linux
uname -m
Windows
echo %PROCESSOR_ARCHITECTURE%
systeminfo | findstr /B /C:"System Type"
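The same check can be done from Python with the standard platform module, which is handy in scripts that must pick the right binary for the host:

```python
import platform

# platform.machine() reports the CPU architecture the interpreter runs on.
# Typical values: "x86_64" / "AMD64" for amd64, "arm64" / "aarch64" for arm64.
raw = platform.machine().lower()
if raw in ("x86_64", "amd64"):
    arch = "amd64"
elif raw in ("arm64", "aarch64"):
    arch = "arm64"
else:
    arch = raw   # e.g. "armv7l" on 32-bit Raspberry Pi OS
print(arch)
```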
In IoT environments:
- Edge devices: usually arm64
- Cloud: often amd64 (ARM growing fast)
How Programming Languages Relate to Architecture
+----------------------+
| Source Code |
| (C, Rust, Python) |
+----------+-----------+
|
v
+----------------------+
| Compiler / |
| Interpreter |
+----------+-----------+
|
+-----------------+-----------------+
| |
v v
+---------------------+ +----------------------+
| amd64 Binary | | arm64 Binary |
| (x86_64 machine | | (ARM machine |
| instructions) | | instructions) |
+----------+----------+ +----------+-----------+
| |
v v
+---------------------+ +----------------------+
| Intel / AMD CPU | | ARM CPU |
| (Laptop, Server) | | (Raspberry Pi, |
| | | IoT Gateway) |
+---------------------+ +----------------------+
Compiled Languages
Examples: C, C++, Rust, Go
When compiled, they produce native machine code.
Compile on Windows - produces an amd64 binary.
Compile on Raspberry Pi or new Mac - produces an arm64 binary.
That binary cannot run on a different architecture.
Interpreted Languages
Examples: Python, Node.js
- Source code is architecture-neutral; the interpreter handles it.
- The interpreter (Python, Node) is architecture-specific.
- Native extensions are architecture-specific.
Java and Bytecode
+------------------+
| Java Source |
+--------+---------+
|
v
+------------------+
| Bytecode |
| (.class file) |
+--------+---------+
|
+-----------+-----------+
| |
v v
+------------------+ +------------------+
| JVM (amd64) | | JVM (arm64) |
+--------+---------+ +--------+---------+
| |
v v
Intel CPU ARM CPU
Java uses a different model.
Compile: javac MyApp.java
Produces: MyApp.class
This is bytecode, not native machine code.
Bytecode runs on the JVM (Java Virtual Machine).
The JVM is architecture-specific.
Same bytecode runs on amd64 JVM
Same bytecode runs on arm64 JVM
Java achieves portability through a virtual machine layer.
Cross Compilation
It is possible to cross-compile for an architecture different from the one you are building on.
Developer Laptop (amd64)
|
| build
v
amd64 binary
|
| deploy
v
Raspberry Pi (arm64)
|
X Fails (architecture mismatch)
Developer Laptop
|
| cross-build for arm64
v
arm64 binary
|
v
Raspberry Pi (runs successfully)
Architecture in IoT Upper Stack
| Layer | Typical Architecture |
|---|---|
| Microcontroller | ARM (32-bit or 64-bit) |
| Edge Gateway | arm64 |
| Cloud VM | amd64 or arm64 |
| Personal Machines | amd64 or arm64 |
[Avg. reading time: 7 minutes]
Containers
World before containers
Physical Machines
- 1 Physical Server
- 1 Host Machine (say some Linux)
- 3 Applications installed
Limitation:
- Need of physical server.
- Version dependency (Host and related apps)
- Patches "hopefully" not affecting applications.
- All apps should work with the same Host OS.
- 3 physical servers
- 3 host machines (different OS)
- 3 applications installed
Limitation:
- Need of physical server(s).
- Version dependency (Host and related apps)
- Patches "hopefully" not affecting applications.
- Maintenance of 3 machines.
- Network all three so they work together.
Virtual Machines
- Virtual machines emulate a real computer by virtualizing it to execute applications, running on top of a physical host.
- To emulate a real computer, virtual machines use a Hypervisor to create a virtual computer.
- On top of the Hypervisor runs a Guest OS, a virtualized operating system in which isolated applications execute.
- Applications that run in virtual machines have access to binaries and libraries on top of that guest OS.
( + ) Full isolation, full virtualization ( - ) Too many layers, heavy-duty servers.
Key Benefits
- Better resource utilization than separate physical servers
- Strong isolation between applications
- Ability to run different OS environments
- Easier backup and snapshot capabilities
- Better than single OS but still has overhead
- Each VM requires its own OS resources
- Slower startup times compared to containers
- Higher memory usage due to multiple OS instances
Containers
Containers are lightweight, portable environments that package an application with everything it needs to run—like code, runtime, libraries, and system tools—ensuring consistency across different environments. They run on the same operating system kernel and isolate applications from each other, which improves security and makes deployments easier.
- Containers are isolated processes that share resources with their host and, unlike VMs, don't virtualize the hardware and don't need a Guest OS.
- Containers share resources with other containers on the same host.
- This gives better performance than VMs (no separate guest OS).
- A Container Engine takes the place of the Hypervisor.
Pros
- Isolated Process
- Mounted Files
- Lightweight Process
Cons
- Same Host OS
- Security
[Avg. reading time: 3 minutes]
VMs or Containers
VMs are great for running multiple, isolated OS environments on a single hardware platform. They offer strong security isolation and are useful when applications need different OS versions or configurations.
Containers are lightweight and share the host OS kernel, making them faster to start and less resource-intensive. They’re perfect for microservices, CI/CD pipelines, and scalable applications.
Smart engineers focus on the right tool for the job rather than getting caught up in “better or worse” debates.
Use them in combination to make life better.
Popular container technologies
Docker: The most widely used container platform, known for its simplicity, portability, and extensive ecosystem.
Podman: A daemonless container engine that’s compatible with Docker but emphasizes security, running containers as non-root users.
We will be using Docker for this course.
[Avg. reading time: 1 minute]
What container does
It gives us the ability to build applications without worrying about their environment.

- Docker turns “my machine” into the machine
- Docker is not a magic wand.
- It only guarantees the environment is identical
- Correctness still depends on what you build and how you run it.
#worksforme #container #docker
[Avg. reading time: 6 minutes]
Docker Basics
At a conceptual level, Docker is built around two core abstractions:
- Images – what you build
- Containers – what you run
Everything else in Docker exists to build, store, distribute, and execute these two artifacts.

Images
- An image is an immutable, layered filesystem snapshot
- Built from a Dockerfile
- Each instruction creates a new read-only layer
- Images are content-addressed via SHA256 digests
Image is a versioned, layered blueprint
Key properties:
- Immutable
- Reusable
- Cached aggressively
- Portable across environments
Container
A container is a running instance of an image
- A writable layer on top of image layers
- Namespaces for isolation (PID, USER)
- Containers are processes, not virtual machines
- When the main process exits, the container stops
Image vs Container
| Aspect | Image | Container |
|---|---|---|
| Nature | Static | Dynamic |
| Mutability | Immutable | Mutable |
| Lifecycle | Build-time | Runtime |
| Role | Artifact | Instance |
Where Do Images Come From?
Docker Hub
- Default public container registry
- Hosts official and community images
- Supports tags, digests, vulnerability scans
- Docker Hub is default, not mandatory
Apart from Docker Hub, there are a few other common registries.
Private / On-Prem Registries
Enterprises widely use on-prem or private registries. JFrog Artifactory is extremely common in regulated environments.
#docker #container #repositories #hub
[Avg. reading time: 16 minutes]
Docker Examples
- Lists images available on the local machine
docker image ls
- To get a specific image
docker image pull <imagename>
docker image pull python:3.12-slim
- To inspect the downloaded image
docker image inspect python:3.12-slim
Check the architecture, open ports, etc.
- Create a container
docker create \
--name edge-http \
-p 8000:8000 \
python:3.12-slim \
python -m http.server
List the Image and container again
- Start the container
docker start edge-http
Open a browser and check that http://localhost:8000 shows the container's internal file structure.
docker inspect edge-http
- Shows all running containers
docker container ls
- Shows all containers
docker container ls -a
- Disk usage by images, containers, volumes
docker system df
- Logs Inspection
docker logs edge-http
docker inspect edge-http
- Stop and remove
docker stop edge-http
docker rm edge-http
docker run is a convenience wrapper around docker pull, docker create, and docker start.
Run an MQTT Broker
MQTT broker typically runs at edge or cloud.
- Create a new container
docker run -d \
--name mqtt-broker \
-p 1883:1883 \
eclipse-mosquitto:2.0
- Verify
docker container ls
docker logs mqtt-broker
- Stop and Delete
docker stop mqtt-broker
docker rm mqtt-broker
Deploy MySQL Database using Containers
Create the following folder
Linux / Mac
mkdir -p container/mysql
cd container/mysql
Windows
md container
cd container
md mysql
cd mysql
mkdir data
Note: If you already have MySQL Server installed in your machine then please change the port to 3307 as given below.
-p 3307:3306 \
Run the container
docker run --name mysql -d \
-p 3306:3306 \
-e MYSQL_ROOT_PASSWORD=root-pwd \
-e MYSQL_ROOT_HOST="%" \
-e MYSQL_DATABASE=mydb \
-e MYSQL_USER=remote_user \
-e MYSQL_PASSWORD=remote_user-pwd \
-v ./data:/var/lib/mysql \
docker.io/library/mysql:8.4.4
- -d : detached (background) mode
- -p 3306:3306 : maps MySQL's default port 3306 to port 3306 on the host (3307:3306 maps it to host port 3307 instead)
- -e MYSQL_ROOT_HOST="%" : allows logging in to MySQL from outside the container, e.g. with MySQL Workbench
Login to MySQL Container
docker exec -it mysql bash
Inside the container, connect with the MySQL client (enter the root password root-pwd when prompted), then run the SQL below:
mysql -u root -p
CREATE DATABASE IF NOT EXISTS iot_telemetry;
USE iot_telemetry;
CREATE TABLE telemetry (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
device_id VARCHAR(64),
temperature_c FLOAT,
humidity_pct FLOAT,
event_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO telemetry (device_id, temperature_c, humidity_pct)
VALUES
('esp32-001', 24.1, 51.2),
('esp32-002', 23.4, 49.8);
SELECT * FROM telemetry;
List all the Containers
docker container ls -a
Stop MySQL Container
docker stop mysql
Delete the container
docker rm mysql
Build your own Image
mkdir -p container
cd container
Calculator Example
Follow the README.md
Fork & Clone
git clone https://github.com/gchandra10/docker_mycalc_demo.git
Docker Compose
Docker Compose is a tool that lets you define and run multi-container Docker applications using a single YAML file.
Instead of manually running multiple docker run commands, you describe:
- Services (containers)
- Networks
- Volumes
- Environment variables
- Dependencies between services
…all inside a docker-compose.yml file.
Sample docker-compose.yaml
version: "3.9"
services:
app:
build: .
ports:
- "5000:5000"
depends_on:
- db
db:
image: postgres:15
environment:
POSTGRES_PASSWORD: example
docker compose up -d
docker compose down
Usecases
- Reproducible environments
- Clean dev setups
- Ideal for microservices
- Great for IoT stacks like broker + processor + DB
MQTT Python Docker Compose Example
https://github.com/gchandra10/docker-compose-mqtt-demo
Web App Demo
Fork & Clone
git clone https://github.com/gchandra10/docker_webapp_demo.git
Publish Image to Docker Hub
Login to Docker Hub
- Create a Repository “my_faker_calc”
- Under Account Settings
- Personal Access Token
- Create a PAT token with Read/Write access for 1 day
Replace gchandra10 with your Docker Hub username.
docker login
enter userid
enter PAT token
Then build the Image with your userid
docker build -t gchandra10/my_faker_calc:1.0 .
docker image ls
Copy the ImageID of gchandra10/my_faker_calc:1.0
Tag the ImageID with necessary version and latest
docker image tag <image_id> gchandra10/my_faker_calc:latest
Push the Images to Docker Hub (version and latest)
docker push gchandra10/my_faker_calc:1.0
docker push gchandra10/my_faker_calc:latest
Image Security
Trivy
Open Source Scanner.
https://trivy.dev/latest/getting-started/installation/
trivy image python:3.12-slim
# Focus on high risk only
trivy image --severity HIGH,CRITICAL python:3.12-slim
# Show only vulnerabilities that have a fix available
trivy image --ignore-unfixed python:3.12-slim
trivy image gchandra10/my_faker_calc
trivy image gchandra10/my_faker_calc --severity CRITICAL,HIGH --format table
trivy image gchandra10/my_faker_calc --severity CRITICAL,HIGH --output result.txt
Grype
Open Source Scanner
grype python:3.12-slim
Common Mitigation Rules
- Upgrade the base image
  - move to a newer version of Python if 3.12 has issues
- Minimize OS packages
  - check how many layers of packages are installed
- Pin library versions
  - in requirements.txt, make sure library versions are pinned for easy detection
- Run as non-root
  - create a local user instead of running as root
- Don't share secrets
  - don't copy .env or any secrets into your script or application
[Avg. reading time: 5 minutes]
Containers in IoT Architecture
Where Containers Exist
Runtime Layers
- Microcontrollers (ESP32, STM32)
  - Bare metal / RTOS / MicroPython
  - No Docker
- Edge Gateway (Raspberry Pi, Industrial PC)
  - Linux-based
  - Docker runs here
  - Hosts broker + processing services
- Cloud Infrastructure
  - Scalable ingestion, storage, APIs
Containers live above firmware.
What Runs in Containers at the Edge
Typical IoT gateway stack:
Edge Gateway
├── MQTT Broker (mosquitto)
├── Data Processor (Python service)
├── Local Buffer (SQLite / lightweight DB)
└── Forwarder to Cloud
Each service:
- Built as an image
- Run as an isolated container
- Independently restartable
Why Containers Matter at Edge
- Service isolation
- Independent restart
- Controlled upgrades
- Version pinning
- Reduced “works on my machine” problems
IoT systems must be deterministic.
Never use
mosquitto:latest
Always Pin versions
mosquitto:2.0.18
Resource Constraints at Edge
IoT is not cloud.
Resource Limits
Edge gateways have:
- Limited RAM
- Limited CPU
- Limited storage
docker run \
--memory=256m \
--cpus=1 \
--restart=always \
eclipse-mosquitto:2.0
Containers consume real hardware resources.
Persistence Matters
Edge devices lose power. Without volumes, state is lost.
- Use volumes to preserve:
- Logs
- Broker sessions
- Buffered sensor data
docker run \
-v mosq_data:/mosquitto/data \
eclipse-mosquitto:2.0
Networking and Security
- Use internal Docker networks
- Expose only required ports
- Avoid running containers as root
- Use minimal base images
- Scan for vulnerabilities
- Compromised gateway equals compromised fleet.
Deployment Flow in IoT
- Build image
- Push to private registry
- Gateway pulls image
- Run container with restart policy
- Monitor and update safely
Containers are how software moves from developer laptop to physical infrastructure.
Summary
- Firmware generates signals.
- Containers turn signals into systems.
Containers are the operational layer of the IoT upper stack.
[Avg. reading time: 24 minutes]
Python Environment
PEP
A PEP, or Python Enhancement Proposal, is a design document that proposes new features, processes, or conventions for Python. Several PEPs define the conventions and recommendations for writing readable, consistent, and maintainable Python code.
- PEP 8 : Style guide for Python code (most famous).
- PEP 20 : “The Zen of Python” (guiding principles).
- PEP 484 : Type hints (basis for MyPy).
- PEP 517/518 : Build system interfaces (basis for pyproject.toml, used by Poetry/UV).
- PEP 572 : Assignment expressions (the := walrus operator).
- PEP 440 : Version identifiers and dependency specifiers for Python packages.
PEP 8 (Popular one)
Indentation
- Use 4 spaces per indentation level
- Continuation lines should align with opening delimiter or be indented by 4 spaces.
Line Length
- Limit lines to a maximum of 79 characters.
- For docstrings and comments, limit lines to 72 characters.
Blank Lines
- Use 2 blank lines before top-level functions and class definitions.
- Use 1 blank line between methods inside a class.
Imports
- Imports should be on separate lines.
- Group imports into three sections: standard library, third-party libraries, and local application imports.
- Use absolute imports whenever possible.
# Correct
import os
import sys
# Wrong
import sys, os
Naming Conventions
- Use snake_case for function and variable names.
- Use CamelCase for class names.
- Use UPPER_SNAKE_CASE for constants.
- Avoid single-character variable names except for counters or indices.
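A quick sketch pulling these naming conventions together (all names here are illustrative):

```python
MAX_RETRIES = 3  # constant in UPPER_SNAKE_CASE


class SensorReader:  # class name in CamelCase
    def read_temperature(self) -> float:  # method name in snake_case
        ambient_offset = 0.5  # variable name in snake_case
        return 23.5 + ambient_offset


for i in range(MAX_RETRIES):  # single-character name is fine for a loop counter
    reading = SensorReader().read_temperature()
```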
Whitespace
- Don’t pad inside parentheses/brackets/braces.
- Use one space around operators and after commas, but not before commas.
- No extra spaces when aligning assignments.
Comments
- Write comments that are clear, concise, and helpful.
- Use complete sentences and capitalize the first word.
- Use # for inline comments, but avoid them where the code is self-explanatory.
Docstrings
- Use triple quotes (""") for multiline docstrings.
- Describe the purpose, arguments, and return values of functions and methods.
Code Layout
- Keep function definitions and calls readable.
- Avoid writing too many nested blocks.
Consistency
- Consistency within a project outweighs strict adherence.
- If you must diverge, be internally consistent.
PEP 20 - The Zen of Python
https://peps.python.org/pep-0020/
Simple is better than complex
Complex
result = (lambda x: (x*x + 2*x + 1))(5)
Simple
x = 5
result = (x + 1) ** 2
Readability counts
No Good
a=10;b=20;c=a+b;print(c)
Good
first_value = 10
second_value = 20
sum_of_values = first_value + second_value
print(sum_of_values)
Errors should never pass silently
No Good
try:
x = int("abc")
except:
pass
Good
try:
x = int("abc")
except ValueError as e:
print("Conversion failed:", e)
PEP 572
Walrus Operator :=
Assignment within Expression Operator
Old Way
inputs = []
current = input("Write something ('quit' to stop): ")
while current != "quit":
inputs.append(current)
current = input("Write something ('quit' to stop): ")
Using Walrus
inputs = []
while (current := input("Write something ('quit' to stop): ")) != "quit":
inputs.append(current)
Another Example
Old Way
import re
m = re.search(r"\d+", text)
if m:
print(m.group())
New Way
import re
if (m := re.search(r"\d+", text)):
print(m.group())
Linting
Linting is the process of automatically checking your Python code for:
- Syntax errors
- Stylistic issues (PEP 8 violations)
- Potential bugs or bad practices

Linting also:
- Keeps your code consistent and readable.
- Helps catch errors early before runtime.
- Encourages team-wide coding standards.
# Incorrect
import sys, os
# Correct
import os
import sys
# Bad spacing
x= 5+3
# Good spacing
x = 5 + 3
Ruff : Linter and Code Formatter
Ruff is a fast, modern tool written in Rust that helps keep your Python code:
- Consistent (follows PEP 8)
- Clean (removes unused imports, fixes spacing, etc.)
- Correct (catches potential errors)
Install
uv add ruff
Verify
ruff --version
ruff --help
example.py
import os, sys
def greet(name):
print(f"Hello, {name}")
def message(name): print(f"Hi, {name}")
def calc_sum(a, b): return a+b
greet('World')
greet('Ruff')
message('Ruff')
uv run ruff check example.py
uv run ruff check example.py --fix
uv run ruff format example.py --check
uv run ruff check example.py
PEP 484 - MyPy : Type Checking Tool
Python is a dynamically typed programming language, meaning both of these assignments are valid:
x = 26
x = "hello"
mypy adds an optional layer of static type checking on top of this dynamic typing.
mypy is a static type checker for Python. It checks your code against the type hints you provide, ensuring that the types are consistent throughout the codebase.
It primarily focuses on type correctness—verifying that variables, function arguments, return types, and expressions match the expected types.
What mypy checks:
- Variable reassignment types
- Function arguments
- Return types
- Expressions and operations
- Control flow narrowing
What mypy does not do:
- Runtime validation
- Performance checks
- Logical correctness
Install
uv add mypy
or
pip install mypy
Example 1 : sample.py
x = 1
x = 1.0
x = True
x = "test"
x = b"test"
print(x)
uv run mypy sample.py
or
mypy sample.py
Example 2: Type Safety
def add(a: int, b: int) -> int:
return a + b
print(add(100, 123))
print(add("hello", "world"))
Example 3: Return Type Violation
def divide(a: int, b: int) -> int:
if b == 0:
return "invalid"
return a // b
Example 4: Optional Types
from typing import Optional
def get_username(user_id: int) -> Optional[str]:
if user_id == 0:
return None
return "admin"
name = get_username(0)
print(name.upper())
What is wrong here? name can also be None, and None has no upper() method, so mypy flags the call.
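One way to satisfy mypy in Example 4 is to narrow the Optional before calling upper() (a minimal sketch; the else branch is our addition):

```python
from typing import Optional


def get_username(user_id: int) -> Optional[str]:
    if user_id == 0:
        return None
    return "admin"


name = get_username(0)
if name is not None:  # narrows Optional[str] to str, so .upper() is safe
    print(name.upper())
else:
    print("no user found")
```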
[Avg. reading time: 15 minutes]
Code Quality & Safety
Type Hinting/Annotation
Type Hint
A type hint is a notation that suggests what type a variable, function parameter, or return value should be. It provides hints to developers and tools about the expected type but does not enforce them at runtime. Type hints can help catch type-related errors earlier through static analysis tools like mypy, and they enhance code readability and IDE support.
Type Annotation
Type annotation refers to the actual syntax used to provide these hints. It involves adding type information to variables, function parameters, and return types. Type annotations do not change how the code executes; they are purely for informational and tooling purposes.
Benefits
- Improved Readability: Code with type annotations is easier to understand.
- Tooling Support: IDEs can provide better autocompletion and error checking.
- Static Analysis: Tools like mypy can check for type consistency, catching errors before runtime.
Basic Type Hints
age: int = 25
name: str = "Rachel"
is_active: bool = True
price: float = 19.99
Here, age is annotated as an int, and name is annotated as a str.
Collections
from typing import List, Set, Dict, Tuple
# List type hints
numbers: List[int] = [1, 2, 3]
names: List[str] = ["Alice", "Bob"]
# Set type hints
unique_ids: Set[int] = {1, 2, 3}
# Dictionary type hints
user_scores: Dict[str, int] = {"Alice": 95, "Bob": 87}
# Tuple type hints
point: Tuple[float, float] = (2.5, 3.0)
Function Annotations
def calculate_discount(price: float, discount_percent: float) -> float:
"""Calculate the final price after applying a discount."""
return price * (1 - discount_percent / 100)
def get_user_names(user_ids: List[int]) -> Dict[int, str]:
"""Return a mapping of user IDs to their names."""
return {uid: f"User {uid}" for uid in user_ids}
Advanced Type Hints
from typing import Optional, Union
def process_data(data: Optional[str] = None) -> str:
"""Process data with an optional input."""
if data is None:
return "No data provided"
return data.upper()
def format_value(value: Union[int, float, str]) -> str:
"""Format a value that could be integer, float, or string."""
return str(value)
Best Practices
- Consistency: Apply type hints consistently across your codebase.
- Documentation: Type hints complement but don’t replace docstrings.
- Type Checking: Use static type checkers like mypy.
# Run mypy on your code
mypy your_module.py
Secret Management
Proper secret management is crucial for application security. Secrets include API keys, database credentials, tokens, and other sensitive information that should never be hardcoded in your source code or committed to version control.
Define secrets either as shell environment variables or in a .env file.
Shell
export SECRET_KEY='your_secret_value'
Windows Users
Go to Environment Variables via the GUI and create one.
pip install python-dotenv
Create an empty file named .env
.env
SECRET_KEY=your_secret_key
DATABASE_URL=your_database_url
main.py
from dotenv import load_dotenv
import os
# Load environment variables from .env file
load_dotenv()
# Access the environment variables
secret_key = os.getenv("SECRET_KEY")
database_url = os.getenv("DATABASE_URL")
print(f"Secret Key: {secret_key}")
print(f"Database URL: {database_url}")
Best Practices
Never commit secrets to version control
- Use .gitignore to exclude .env files
- Regularly audit git history for accidental commits
Sample .gitignore
# .gitignore
.env
.env.*
!.env.example
*.pem
*.key
secrets/
Create a .env.example file with dummy values:
# .env.example
SECRET_KEY=your_secret_key_here
DATABASE_URL=postgresql://user:password@localhost:5432/dbname
API_KEY=your_api_key_here
DEBUG=False
Access Control
- Restrict environment variable access to necessary processes
- Use separate environment files for different environments (dev/staging/prod)
Secret Rotation
- Implement procedures for regular secret rotation
- Use separate secrets for different environments
Production Environments
Consider using cloud-native secret management services:
- AWS Secrets Manager
- Google Cloud Secret Manager
- Azure Key Vault
- HashiCorp Vault
PDOC
Python Documentation
pdoc is an automatic documentation generator for Python libraries. It builds on top of Python's built-in __doc__ attributes and type hints to create comprehensive API documentation. pdoc automatically extracts documentation from docstrings and generates HTML or Markdown output.
Docstring (Triple-quoted string)
def add(a: float, b: float) -> float:
"""
Add two numbers.
Args:
a (float): The first number to add.
b (float): The second number to add.
Returns:
float: The sum of the two numbers.
Example:
>>> add(2.5, 3.5)
6.0
"""
return a + b
def divide(a: float, b: float) -> float:
"""
Divide one number by another.
Args:
a (float): The dividend.
b (float): The divisor, must not be zero.
Returns:
float: The quotient of the division.
Raises:
ValueError: If the divisor (`b`) is zero.
Example:
>>> divide(10, 2)
5.0
"""
if b == 0:
raise ValueError("The divisor (b) must not be zero.")
return a / b
uv add pdoc
uv run pdoc filename.py -o ./docs
- pdoc.config.json allows customization
{
"docformat": "google",
"include": ["your_module"],
"exclude": ["tests", "docs"],
"template_dir": "custom_templates",
"output_dir": "api_docs"
}
[Avg. reading time: 8 minutes]
Error Handling
Python uses try/except blocks for error handling.
The basic structure is:
try:
# Code that may raise an exception
except ExceptionType:
# Code to handle the exception
finally:
# Code executes all the time
Uses
Improved User Experience: Instead of the program crashing, you can provide a user-friendly error message.
Debugging: Capturing exceptions can help you log errors and understand what went wrong.
Program Continuity: Allows the program to continue running or perform cleanup operations before terminating.
Guaranteed Cleanup: Ensures that certain operations, like closing files or releasing resources, are always performed.
Some key points
- You can catch specific exception types or use a bare except to catch any exception.
- Multiple except blocks can be used to handle different exceptions.
- An else clause can be added to run if no exception occurs.
- A finally clause will always execute, whether an exception occurred or not.
Without Try/Except
x = 10 / 0
Basic Try/Except
try:
x = 10 / 0
except ZeroDivisionError:
print("Error: Division by zero!")
Generic Exception
try:
file = open("nonexistent_file.txt", "r")
except:
print("An error occurred!")
Find the exact error
try:
file = open("nonexistent_file.txt", "r")
except Exception as e:
print(str(e))
Raise - Else and Finally
try:
x = -10
if x <= 0:
raise ValueError("Number must be positive")
except ValueError as ve:
print(f"Error: {ve}")
else:
print(f"You entered: {x}")
finally:
print("This will always execute")
try:
x = 10
if x <= 0:
raise ValueError("Number must be positive")
except ValueError as ve:
print(f"Error: {ve}")
else:
print(f"You entered: {x}")
finally:
print("This will always execute")
Nested Functions
def divide(a, b):
try:
result = a / b
return result
except ZeroDivisionError:
print("Error in divide(): Cannot divide by zero!")
raise # Re-raise the exception
def calculate_and_print(x, y):
try:
result = divide(x, y)
print(f"The result of {x} divided by {y} is: {result}")
except ZeroDivisionError as e:
print(str(e))
except TypeError as e:
print(str(e))
# Test the nested error handling
print("Example 1: Valid division")
calculate_and_print(10, 2)
print("\nExample 2: Division by zero")
calculate_and_print(10, 0)
print("\nExample 3: Invalid type")
calculate_and_print("10", 2)
[Avg. reading time: 22 minutes]
Faker
Faker: A Python Library for Generating Fake Data
Faker is a powerful Python library that generates realistic fake data for various purposes. It’s particularly useful for:
- Testing: Populating databases, testing APIs, and stress-testing applications with realistic-looking data.
- Development: Creating sample data for prototyping and demonstrations.
- Data Science: Generating synthetic datasets for training and testing machine learning models.
- Privacy: Anonymizing real data for sharing or testing while preserving data structures and distributions.
Key Features:
- Wide Range of Data Types: Generates names, addresses, emails, phone numbers, credit card details, dates, companies, jobs, texts, and much more.
- Customization: Allows you to customize the generated data using various parameters and providers.
- Locale Support: Supports multiple locales, allowing you to generate data in different languages and regions.
- Easy to Use: Simple and intuitive API with clear documentation.
from faker import Faker
fake = Faker()
print(fake.name()) # Output: A randomly generated name
print(fake.email()) # Output: A randomly generated email address
print(fake.address()) # Output: A randomly generated address
print(fake.date_of_birth()) # Output: A randomly generated date of birth
Using Faker in Data World
Data Exploration and Analysis: Generate synthetic datasets with controlled characteristics to explore data analysis techniques and algorithms.
Data Visualization: Create sample data to visualize different data distributions and patterns.
Data Cleaning and Transformation: Test data cleaning and transformation pipelines with realistic-looking dirty data.
Data Modeling: Build and test data models using synthetic data before applying them to real-world data.
Using Faker in IoT World
IoT Device Simulation: Simulate sensor data from various IoT devices, such as temperature, humidity, and location data.
IoT System Testing: Test IoT systems and applications with realistic-looking sensor data streams.
IoT Data Analysis: Generate synthetic IoT data for training and testing machine learning models for tasks like anomaly detection and predictive maintenance.
IoT Data Visualization: Create visualizations of simulated IoT data to gain insights into system behavior.
Luhn Algorithm (pronounced as Loon)
Used to detect accidental errors in data entry or transmission, particularly single-digit errors and transposition of adjacent digits.
The Luhn algorithm, also known as the modulus 10 or mod 10 algorithm, is a simple checksum formula used to validate a variety of identification numbers, such as credit card numbers, IMEI numbers and so on.
- Step 1: Starting from the rightmost digit, double the value of every second digit.
- Step 2: If doubling a number results in a two-digit number, add the digits together to get a single-digit number.
- Step 3: Now sum all the final digits.
- Step 4: If the sum is divisible by 10, then it's a valid number.
Example: 4532015112830366

Key Features
- Can detect 100% of single-digit errors
- Can detect around 98% of transposition errors
- Simple mathematical operations (addition and multiplication)
- Low computational overhead
Limitations
- Not cryptographically secure
- Cannot detect all possible errors
- Some error types (like multiple transpositions) might go undetected
Common Use Cases
- Device Authentication: Validating device identifiers
- Asset Tracking: Verifying equipment serial numbers
- Smart Meter Reading Validation: Ensuring meter readings are transmitted correctly
- Sensor Data Integrity: Basic error detection in sensor data transmission
git clone https://github.com/gchandra10/python_faker_demo.git
Damm Algorithm
The Damm Algorithm is a check digit algorithm created by H. Michael Damm in 2004. It uses a checksum technique intended to identify mistakes in data entry or transmission, especially when it comes to number sequences.
Perfect Error Detection:
- Detects all single-digit errors
- Detects all adjacent transposition errors
- No false positives or false negatives
To check whether 234 is a valid number:
Start: interim = 0
First digit (2):
- Row = 0 (current interim)
- Column = 2 (current digit)
- table[0][2] = 1
- New interim = 1
Second digit (3):
- Row = 1 (current interim)
- Column = 3 (current digit)
- table[1][3] = 2
- New interim = 2
Third digit (4):
- Row = 2 (current interim)
- Column = 4 (current digit)
- table[2][4] = 8
- Final interim = 8 (this becomes check digit)
As the final interim digit is not zero, 234 is not a valid number per the Damm algorithm.
Damm operation table (row = current interim digit, column = next input digit):
[0, 3, 1, 7, 5, 9, 8, 6, 4, 2],
[7, 0, 9, 2, 1, 5, 4, 8, 6, 3],
[4, 2, 0, 6, 8, 7, 1, 3, 5, 9],
[1, 7, 5, 0, 9, 8, 3, 4, 2, 6],
[6, 1, 2, 3, 0, 4, 5, 9, 7, 8],
[3, 6, 7, 4, 2, 0, 9, 5, 8, 1],
[5, 8, 6, 9, 7, 2, 0, 1, 3, 4],
[8, 9, 4, 5, 3, 6, 2, 0, 1, 7],
[9, 4, 3, 8, 6, 1, 7, 2, 0, 5],
[2, 5, 8, 1, 4, 3, 6, 7, 9, 0]
Let's try 57240, and suppose someone entered 57340 by mistake.
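The walkthrough above can be sketched in Python using the operation table as-is (the function name damm_valid is ours):

```python
DAMM_TABLE = [
    [0, 3, 1, 7, 5, 9, 8, 6, 4, 2],
    [7, 0, 9, 2, 1, 5, 4, 8, 6, 3],
    [4, 2, 0, 6, 8, 7, 1, 3, 5, 9],
    [1, 7, 5, 0, 9, 8, 3, 4, 2, 6],
    [6, 1, 2, 3, 0, 4, 5, 9, 7, 8],
    [3, 6, 7, 4, 2, 0, 9, 5, 8, 1],
    [5, 8, 6, 9, 7, 2, 0, 1, 3, 4],
    [8, 9, 4, 5, 3, 6, 2, 0, 1, 7],
    [9, 4, 3, 8, 6, 1, 7, 2, 0, 5],
    [2, 5, 8, 1, 4, 3, 6, 7, 9, 0],
]


def damm_valid(number: str) -> bool:
    interim = 0
    for ch in number:
        # row = current interim digit, column = next input digit
        interim = DAMM_TABLE[interim][int(ch)]
    return interim == 0  # valid when the final interim digit is 0


print(damm_valid("234"))    # False, as in the walkthrough
print(damm_valid("57240"))  # True
print(damm_valid("57340"))  # False: the single-digit error is caught
```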
Luhn is like a spell checker; Damm is like a grammar checker.
IOT Uses Cases with Algorithms
| Use Case | Algorithm Used | Description |
|---|---|---|
| Smart Metering (Electricity, Water, Gas) | Luhn | Consumer account numbers and meter IDs can use the Luhn algorithm to validate input during billing and monitoring. |
| IoT-based Credit Card Transactions | Luhn | When smart vending machines or POS terminals process card payments, Luhn ensures credit card numbers are valid. |
| IMEI Validation in Smart Devices | Luhn | IoT-enabled mobile and tracking devices use Luhn to validate IMEI numbers for device authentication. |
| Smart Parking Ticketing Systems | Luhn | Parking meters with IoT sensors can validate vehicle plate numbers or digital parking tickets using the Luhn algorithm. |
| Industrial IoT (IIoT) Sensor IDs | Damm | Factory sensors and devices generate unique IDs with the Damm algorithm to prevent ID entry errors and misconfigurations. |
| IoT-based Asset Tracking | Damm | Logistics and supply chain IoT devices use Damm to ensure tracking codes are error-free and resistant to transposition mistakes. |
| Connected Health Devices (Wearables, ECG Monitors) | Damm | Unique patient monitoring device IDs use Damm for error-free identification in hospital IoT systems. |
| IoT-enabled Vehicle Identification | Damm | Vehicle chassis numbers and VINs in IoT-based fleet management use Damm for better error detection. |
| Feature | Luhn Algorithm | Damm Algorithm |
|---|---|---|
| Type | Modulus-10 checksum | Noncommutative quasigroup checksum |
| Use Case | Credit card numbers, IMEI, etc. | Error detection in numeric sequences |
| Mathematical Basis | Weighted sum with modulus 10 | Quasigroup operations |
| Error Detection | Detects single-digit errors and most transpositions | Detects all single-digit and adjacent transposition errors |
| Processing Complexity | Simple addition and modulus operation | More complex due to quasigroup operations |
| Strengths | Simple and widely adopted | Stronger error detection capabilities |
| Weaknesses | Cannot detect all double transpositions | Less widely used and understood |
| Performance | Efficient for real-time validation | Slightly more computationally intensive |
For firmware updates, we can use SHA-256 or SHA-512 (hashing algorithms).
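A minimal sketch of hash-based firmware verification using Python's built-in hashlib (the payload bytes here are a placeholder, not a real firmware image):

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte payload."""
    return hashlib.sha256(data).hexdigest()


firmware = b"example firmware image"   # placeholder payload
expected = sha256_digest(firmware)     # digest published alongside the release

# After download, recompute the digest and compare before flashing the device
downloaded = b"example firmware image"
if sha256_digest(downloaded) == expected:
    print("firmware verified")
else:
    print("digest mismatch: reject update")
```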
[Avg. reading time: 7 minutes]
Logging
Python’s logging module provides a flexible framework for tracking events in your applications. It’s used to log messages to various outputs (console, files, etc.) with different severity levels like DEBUG, INFO, WARNING, ERROR, and CRITICAL.
Use Cases of Logging
- Debugging: Identify issues during development.
- Monitoring: Track events in production to monitor behavior.
- Audit Trails: Capture what has been executed for security or compliance.
- Error Tracking: Store errors for post-mortem analysis.
- Rotating Log Files: Prevent logs from growing indefinitely using size or time-based rotation.
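The rotating-log use case can be sketched with the standard library's RotatingFileHandler (the file name, sizes, and logger name are illustrative):

```python
import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger("gateway")
logger.setLevel(logging.INFO)

# Roll over at ~10 KB, keeping 3 old files (app.log.1, app.log.2, app.log.3)
handler = RotatingFileHandler("app.log", maxBytes=10_000, backupCount=3)
handler.setFormatter(
    logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
)
logger.addHandler(handler)

logger.info("gateway started")
logger.warning("sensor offline: esp32-002")
```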
Python Logging Levels
| Level | Usage | Numeric Value | Description |
|---|---|---|---|
| DEBUG | Detailed information for diagnosing problems. | 10 | Useful during development and debugging stages. |
| INFO | General information about program execution. | 20 | Highlights normal, expected behavior (e.g., program start, process completion). |
| WARNING | Indicates something unexpected but not critical. | 30 | Warns of potential problems or events to monitor (e.g., deprecated functions, nearing limits). |
| ERROR | An error occurred that prevented some part of the program from working. | 40 | Represents recoverable errors that might still allow the program to continue running. |
| CRITICAL | Severe errors indicating a major failure. | 50 | Marks critical issues requiring immediate attention (e.g., system crash, data corruption). |
INFO
import logging
logging.basicConfig(level=logging.INFO) # Set the logging level to INFO
logging.debug("This is a debug message.")
logging.info("This is an info message.")
logging.warning("This is a warning message.")
logging.error("This is an error message.")
logging.critical("This is a critical message.")
Error
import logging
logging.basicConfig(level=logging.ERROR) # Set the logging level to ERROR
logging.debug("This is a debug message.")
logging.info("This is an info message.")
logging.warning("This is a warning message.")
logging.error("This is an error message.")
logging.critical("This is a critical message.")
import logging
logging.basicConfig(
level=logging.DEBUG,
format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logging.debug("This is a debug message.")
logging.info("This is an info message.")
logging.warning("This is a warning message.")
More Examples
git clone https://github.com/gchandra10/python_logging_examples.git
[Avg. reading time: 11 minutes]
Time Series Database (TSDB)
A Time Series Database (TSDB) is a type of database designed specifically to store and query time-stamped data.
In many modern systems, data is continuously generated with a timestamp attached to every event. Examples include sensor readings from IoT devices, system metrics from servers, financial price movements, or application performance metrics. Traditional databases can store this data, but they are not optimized for the access patterns that time-based data requires.
A TSDB is built to efficiently handle large volumes of sequential, time-ordered data and make it easy to analyze trends, patterns, and changes over time.
Key Characteristics
Time-centric design
Data is stored with time as the primary dimension.
Queries typically ask questions like:
- What happened in the last 5 minutes?
- What is the average CPU usage per minute today?
- How did temperature change over the last 24 hours?
Because of this, TSDBs are optimized for time-range queries and chronological data access.
High ingestion rates
Many time-series systems generate data very frequently.
Examples:
- IoT sensors publishing readings every few seconds
- Servers emitting metrics every few milliseconds
- Stock markets generating price ticks continuously
TSDBs are optimized to ingest large volumes of data points efficiently without slowing down.
Efficient storage
Time-series data often contains repeating patterns or slowly changing values.
To optimize storage, TSDBs commonly use:
- Compression techniques
- Column-oriented storage
- Time-based partitioning
These techniques reduce storage costs while maintaining fast query performance.
Optimized time-series queries
TSDBs support operations commonly used when analyzing time-based data.
Filtering
Selecting data within a time range or based on tags/labels.
Aggregation
Calculating metrics such as average, sum, min, or max over time intervals.
Downsampling
Reducing high-resolution data into summarized intervals.
For example, converting per-second data into hourly averages.
These capabilities allow efficient analysis of both recent high-resolution data and long-term trends.
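As a small illustration of downsampling (using pandas and made-up per-second readings), high-resolution data can be collapsed into coarser averages:

```python
import pandas as pd

# six hypothetical per-second temperature readings
idx = pd.date_range("2024-01-01 00:00:00", periods=6, freq="s")
readings = pd.Series([20.0, 21.0, 22.0, 23.0, 24.0, 25.0], index=idx)

# downsample: collapse every 3 seconds into one average value
downsampled = readings.resample("3s").mean()
print(downsampled)
```

A real TSDB performs the same kind of aggregation server-side over much larger time ranges.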
Common Use Cases
IoT Systems
Devices such as sensors, wearables, smart meters, and industrial machines continuously generate timestamped measurements.
Examples:
- Temperature readings
- Pressure measurements
- Energy consumption data
System Monitoring
Monitoring platforms collect metrics from infrastructure and applications.
Examples:
- CPU usage
- Memory utilization
- Network throughput
- Request latency
Financial Markets
Market data is inherently time-based.
Examples:
- Stock prices
- Trading volume
- Tick-level market events
Scientific and Research Data
Many experiments produce sequential measurements over time.
Examples:
- Climate data
- Astronomy observations
- Simulation outputs
Popular Time Series Databases
InfluxDB
A widely used open-source TSDB designed specifically for high-throughput time-series workloads.
TimescaleDB
A PostgreSQL extension that adds efficient time-series capabilities while retaining the SQL ecosystem.
Prometheus
An open-source monitoring system that includes its own time-series database for collecting and querying metrics.
Apache Cassandra
Although not a dedicated TSDB, Cassandra is often used for time-series workloads due to its distributed architecture and scalability.
Summary
Time-series data is everywhere in modern systems. IoT platforms, monitoring systems, financial markets, and scientific experiments all generate large volumes of timestamped data.
A Time Series Database provides a specialized architecture to:
- efficiently ingest high-frequency data
- store time-ordered events compactly
- query trends and patterns quickly
Because of these optimizations, TSDBs have become an important component in observability platforms, IoT pipelines, and real-time analytics systems.
[Avg. reading time: 17 minutes]
InfluxDB
InfluxDB is a high-performance Time Series Database (TSDB) designed to store and analyze large volumes of timestamped data. It is commonly used in systems where data arrives continuously, such as IoT devices, monitoring platforms, telemetry pipelines, and financial market feeds.
InfluxDB is optimized for workloads where the primary queries involve time ranges, trends, aggregations, and real-time metrics.
With the release of InfluxDB 3, the platform has evolved significantly. Earlier versions relied on custom storage engines and specialized query languages, but InfluxDB 3 adopts a modern analytics architecture built on open standards.
The latest version uses:
- Apache Arrow for in-memory analytics
- Parquet for columnar storage
- DataFusion as the SQL query engine
- Object storage as the persistent storage layer
This architecture improves performance, scalability, and interoperability with modern data platforms.
Key Features
High Ingestion Performance
InfluxDB is designed to ingest millions of time-series data points per second, making it suitable for systems that generate high-frequency telemetry data.
Examples include:
- IoT sensor streams
- application monitoring metrics
- infrastructure telemetry
- financial tick data
Time-Series Optimized Storage
Data is stored using columnar formats (Parquet) which allow efficient compression and fast scanning of time-based data.
This significantly improves performance for queries such as:
- time-range filtering
- aggregations over time intervals
- trend analysis
SQL Querying
InfluxDB 3 introduces standard SQL as the primary query language.
This change allows developers and analysts to query time-series data using familiar SQL tools rather than learning a specialized query language.
Example:
SELECT
date_bin(INTERVAL '5 minutes', time) AS bucket,
AVG(temperature)
FROM sensor_data
WHERE time > now() - INTERVAL '1 hour'
GROUP BY bucket
ORDER BY bucket;
Scalability with Object Storage
InfluxDB 3 separates compute from storage and stores data in object storage systems such as cloud storage.
Benefits include:
- virtually unlimited storage
- lower storage costs
- improved scalability for large datasets
Built-in Visualization and Management
InfluxDB provides tools for:
- data exploration
- dashboards
- monitoring metrics
- administrative tasks
These tools help users quickly analyze real-time data streams.
Data Model
InfluxDB uses a simple time-series data model consisting of four main components.
Measurement
A measurement represents a logical category of data, similar to a table in relational databases.
Examples:
- temperature
- cpu_usage
- network_latency
Tags
Tags are indexed key-value pairs used to describe metadata and enable fast filtering.
Examples:
- location = kitchen
- host = server01
- device = sensor12
Because tags are indexed, queries that filter by tags perform efficiently.
Fields
Fields contain the actual measured values.
Examples:
- temperature = 22.5
- cpu_usage = 65
- humidity = 40
Fields are not indexed to allow faster write performance.
Timestamp
Every data point includes a timestamp, which records when the event occurred.
Time is the primary dimension for storing and querying data in InfluxDB.
Common Use Cases
- IoT Sensor Data
- Infrastructure Monitoring (CPU, Memory, Network, Disk IO)
- Observability & DevOps
- Financial Time-Series Data (Stocks, Trading, Market Indicators)
| Feature | InfluxDB 1.x | InfluxDB 2.x | InfluxDB 3.x |
|---|---|---|---|
| Query Language | InfluxQL | Flux | SQL |
| Storage Engine | Custom TSDB engine | Custom TSDB engine | Arrow + Parquet |
| Data Container | Database + Retention Policy | Bucket | Database |
| Storage Backend | Local storage | Local storage | Object storage |
| Query Engine | InfluxQL engine | Flux engine | DataFusion |
| Architecture | Single node / cluster | Improved platform with UI | Modern analytics architecture |
| Ecosystem Integration | Limited | Moderate | Strong integration with modern data stack |
InfluxDB 3 uses DataFusion
SQL
│
DataFusion Query Engine
│
Apache Arrow
│
Parquet files
│
Object Storage
InfluxDB 3 UI
In InfluxDB 3, the user interface is separated from the core database engine. Unlike earlier versions where the UI was bundled with the database, the new architecture treats the UI as a separate service.
This change aligns with the overall design philosophy of InfluxDB 3, where storage, compute, and management tools are decoupled.
Why the UI is separated
Independent scaling
The database engine focuses purely on data ingestion, storage, and query execution, while the UI handles visualization and user interaction.
Separating them allows each component to scale independently.
Cleaner architecture
By separating the UI from the database engine, the system becomes more modular. The core database can remain lightweight and optimized for high-performance time-series workloads, while the UI evolves independently.
Flexible deployment
Users are not required to run the UI if they do not need it. Many production deployments interact with InfluxDB through:
- APIs
- SQL clients
- monitoring tools
- custom applications
The UI becomes an optional management layer rather than a required component.
Faster development
Because the UI is no longer tightly coupled with the database engine, improvements to dashboards, visualization, and management features can be released independently without impacting the database core.
What the UI provides
The InfluxDB UI helps users:
- explore and query time-series data
- build dashboards and visualizations
- monitor metrics
- manage databases and ingestion
It acts as a convenient interface for interacting with InfluxDB, while the core database focuses on performance and scalability.
InfluxDB with IoT
Time Format
Epoch time represents the number of time units elapsed since:
1970-01-01 00:00:00 UTC
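For example, a nanosecond epoch timestamp (the precision used in line protocol) can be converted to a readable UTC time in Python:

```python
from datetime import datetime, timezone

epoch_ns = 1614556800000000000        # nanoseconds since 1970-01-01 00:00:00 UTC
epoch_s = epoch_ns / 1_000_000_000    # convert nanoseconds to seconds
dt = datetime.fromtimestamp(epoch_s, tz=timezone.utc)
print(dt.isoformat())                 # 2021-03-01T00:00:00+00:00
```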
#influxdb #tsdb #telegraf #sql
[Avg. reading time: 25 minutes]
InfluxDB Demo
Software
Cloud
Via Docker
mkdir influxdb
cd influxdb
docker-compose.yml
docker-compose.yml
name: influxdb3
services:
influxdb3-core:
container_name: influxdb3-core
image: influxdb:3-core
ports:
- 8181:8181
command:
- influxdb3
- serve
- --node-id=node0
- --object-store=file
- --data-dir=/var/lib/influxdb3/data
- --plugin-dir=/var/lib/influxdb3/plugins
volumes:
- ./.influxdb3/core/data:/var/lib/influxdb3/data
- ./.influxdb3/core/plugins:/var/lib/influxdb3/plugins
restart: unless-stopped
influxdb3-explorer:
image: influxdata/influxdb3-ui:latest
container_name: influxdb3-explorer
ports:
- "8888:80"
volumes:
- ./.influxdb3-ui/db:/db:rw
- ./.influxdb3-ui/config:/app-root/config:ro
environment:
SESSION_SECRET_KEY: "${SESSION_SECRET_KEY:-$(openssl rand -hex 32)}"
restart: unless-stopped
command: ["--mode=admin"]
Launch the containers
docker compose up -d
Create Token
docker exec influxdb3-core influxdb3 create token --admin
create a file at
./.influxdb3-ui/config/config.json
Add the following contents (paste the admin token you created into DEFAULT_API_TOKEN)
{
  "DEFAULT_INFLUX_SERVER": "http://influxdb3-core:8181",
  "DEFAULT_API_TOKEN": "",
  "DEFAULT_SERVER_NAME": "InfluxDB3 - Docker"
}
Restart Docker
docker compose restart
Load Data
Login via UI
http://localhost:8888
- Create a database; if prompted, set a retention period.
- Load data via Line Protocol, CSV, JSON, or programmatically
Line Protocol
Line protocol is InfluxDB’s text-based format for writing time series data into the database. It’s designed to be both human-readable and efficient for machine parsing.
Format of Sample Data

In InfluxDB, a “measurement” is the fundamental structure that stores time series data. You can think of a measurement as similar to a table in a traditional relational database.
Note:
- Use singular form for measurement names (e.g., “temperature” not “temperatures”)
- Be consistent with tag and field names
- Consider using a naming convention (e.g., snake_case or camelCase)
Example 1
temperature,location=kitchen value=22.5
- temperature : measurement
- location=kitchen : tags
- value=22.5 : field
- if the timestamp is missing, InfluxDB assumes the current timestamp
Example 2
temperature,location=kitchen,sensor=thermometer value=22.5 1614556800000000000
Example 3
Multiple Tags and Multiple Fields
temperature,location=kitchen,sensor=thermometer temp_c=22.5,humidity_pct=45.2
- location=kitchen,sensor=thermometer : Tags
- temp_c=22.5,humidity_pct=45.2 : Field
Example 4
temperature,location=kitchen,sensor=thermometer reading=22.5,battery_level=98,type="smart",active=true
Copy each section into the Line Protocol window one at a time; pasting everything at once assigns the same timestamp to every line, so later points overwrite earlier ones.
temperature,location=kitchen value=22.5
temperature,location=living_room value=21.8
temperature,location=bedroom value=20.3
temperature,location=kitchen value=23.1
temperature,location=living_room value=22.0
temperature,location=bedroom value=20.7
temperature,location=kitchen value=22.8
temperature,location=living_room value=21.5
temperature,location=bedroom value=20.1
temperature,location=kitchen value=23.5
temperature,location=living_room value=21.9
temperature,location=bedroom value=19.8
temperature,location=kitchen value=24.2
temperature,location=living_room value=22.3
temperature,location=bedroom value=20.5
temperature,location=kitchen value=23.7
temperature,location=living_room value=22.8
temperature,location=bedroom value=21.0
temperature,location=kitchen value=22.9
temperature,location=living_room value=22.5
temperature,location=bedroom value=20.8
humidity,location=kitchen value=45.2
humidity,location=living_room value=42.8
humidity,location=bedroom value=48.3
humidity,location=kitchen value=46.1
humidity,location=living_room value=43.5
humidity,location=bedroom value=49.1
humidity,location=kitchen value=45.8
humidity,location=living_room value=42.3
humidity,location=bedroom value=48.7
humidity,location=kitchen value=46.5
humidity,location=living_room value=44.2
humidity,location=bedroom value=49.8
humidity,location=kitchen value=47.2
humidity,location=living_room value=45.1
humidity,location=bedroom value=50.2
humidity,location=kitchen value=46.8
humidity,location=living_room value=44.8
humidity,location=bedroom value=49.6
humidity,location=kitchen value=45.9
humidity,location=living_room value=43.7
humidity,location=bedroom value=48.5
co2_ppm,location=kitchen value=612
co2_ppm,location=living_room value=578
co2_ppm,location=bedroom value=495
co2_ppm,location=kitchen value=635
co2_ppm,location=living_room value=582
co2_ppm,location=bedroom value=510
co2_ppm,location=kitchen value=621
co2_ppm,location=living_room value=565
co2_ppm,location=bedroom value=488
co2_ppm,location=kitchen value=642
co2_ppm,location=living_room value=595
co2_ppm,location=bedroom value=502
co2_ppm,location=kitchen value=658
co2_ppm,location=living_room value=612
co2_ppm,location=bedroom value=521
co2_ppm,location=kitchen value=631
co2_ppm,location=living_room value=586
co2_ppm,location=bedroom value=508
co2_ppm,location=kitchen value=618
co2_ppm,location=living_room value=572
co2_ppm,location=bedroom value=491
Demo how to Query
Sample queries (MySQL-like syntax)
create database my_db;
CREATE DATABASE my-weather RETENTION 30d;
ALTER DATABASE my-weather SET RETENTION 30d;
select * from system.databases;
show tables;
Write CSV data
Set the measurement as csv_measurement
time,location,value
1741176000,kitchen,22.5
1741176000,living_room,21.8
1741176000,bedroom,20.3
1741176060,kitchen,23.1
1741176060,living_room,22.0
1741176060,bedroom,20.7
1741176120,kitchen,22.8
1741176120,living_room,21.5
1741176120,bedroom,20.1
Write JSON data
Set the measurement as json_measurement
[
{"time":1741176000,"location":"kitchen","value":22.5},
{"time":1741176000,"location":"living_room","value":21.8},
{"time":1741176000,"location":"bedroom","value":20.3},
{"time":1741176060,"location":"kitchen","value":23.1},
{"time":1741176060,"location":"living_room","value":22.0},
{"time":1741176060,"location":"bedroom","value":20.7},
{"time":1741176120,"location":"kitchen","value":22.8},
{"time":1741176120,"location":"living_room","value":21.5},
{"time":1741176120,"location":"bedroom","value":20.1}
]
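The "programmatically" option can be sketched in plain Python: build a line-protocol string and POST it over HTTP. The write-endpoint path and token shown in the comments are assumptions for this Docker setup; check your server's API docs before using them.

```python
import urllib.request

def to_line_protocol(measurement, tags, fields, ts_ns=None):
    """Build one line-protocol record: measurement,tag=... field=... [timestamp]."""
    tag_str = ",".join(f"{k}={v}" for k, v in tags.items())
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    line = f"{measurement},{tag_str} {field_str}"
    if ts_ns is not None:
        line += f" {ts_ns}"
    return line

line = to_line_protocol("temperature", {"location": "kitchen"}, {"value": 22.5})
print(line)  # temperature,location=kitchen value=22.5

# Sending it to the local container is then a plain HTTP POST
# (endpoint path and token below are assumptions, not verified):
# req = urllib.request.Request(
#     "http://localhost:8181/api/v3/write_lp?db=my_db",
#     data=line.encode(), method="POST",
#     headers={"Authorization": "Bearer YOUR_TOKEN"},
# )
# urllib.request.urlopen(req)
```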
Login to Client CLI
docker exec -it influxdb3-core bash
Inside Container
export DEFAULT_TOKEN=""
influxdb3 query --database my-db "select * from yourmeasurement" --token $DEFAULT_TOKEN
Telegraf
Telegraf is a server-based agent that collects and sends metrics and events from databases, systems, and IoT sensors. Written in Go, it compiles into a single binary with no external dependencies and requires very little memory.
Add your host details
Mac / Linux
export MQTT_HOST_NAME=""
export MQTT_PORT=
export MQTT_USER_NAME=""
export MQTT_PASSWORD=""
export INFLUX_TOKEN=""
export INFLUX_DB_BUCKET=""
Windows
set MQTT_HOST_NAME=""
set MQTT_PORT=
set MQTT_USER_NAME=""
set MQTT_PASSWORD=""
set INFLUX_TOKEN=""
set INFLUX_DB_BUCKET=""
telegraf.conf
# Global agent configuration
[agent]
interval = "5s"
flush_interval = "10s"
omit_hostname = true
# MQTT Consumer Input Plugin
[[inputs.mqtt_consumer]]
servers = ["ssl://${MQTT_HOST_NAME}:${MQTT_PORT}"]
username = "${MQTT_USER_NAME}"
password = "${MQTT_PASSWORD}"
# Set custom measurement name
name_override = "my_python_sensor_temp"
# Topics to subscribe to
topics = [
"sensors/temp",
]
# Connection timeout
connection_timeout = "30s"
# TLS/SSL configuration
insecure_skip_verify = true
# QoS level
qos = 1
# Client ID
client_id = "telegraf_mqtt_consumer"
# Data format
data_format = "value"
data_type = "float"
# InfluxDB v2 Output Plugin
[[outputs.influxdb_v2]]
# URL for your local InfluxDB
urls = ["http://localhost:8181"]
# InfluxDB token
token = "${INFLUX_TOKEN}"
# Organization name
organization = ""
# Destination bucket
bucket = "${INFLUX_DB_BUCKET}"
# Add tags - match the location from your MQTT script
[outputs.influxdb_v2.tags]
location = "room1"
Run Telegraf
telegraf --config telegraf.conf --debug
Storing output in InfluxDB and S3
export MQTT_HOST_NAME=""
export MQTT_PORT=
export MQTT_USER_NAME=""
export MQTT_PASSWORD=""
export INFLUX_TOKEN=""
export INFLUX_DB_ORG=""
export INFLUX_DB_BUCKET=""
export S3_BUCKET=""
export AWS_REGION=""
export AWS_ACCESS_KEY_ID=""
export AWS_SECRET_ACCESS_KEY=""
telegraf.conf
# Global agent configuration
[agent]
interval = "5s"
flush_interval = "10s"
omit_hostname = true
# MQTT Consumer Input Plugin
[[inputs.mqtt_consumer]]
servers = ["ssl://${MQTT_HOST_NAME}:${MQTT_PORT}"]
username = "${MQTT_USER_NAME}"
password = "${MQTT_PASSWORD}"
# Set custom measurement name
name_override = "my_python_sensor_temp"
# Topics to subscribe to
topics = [
"sensors/temp",
]
# Connection timeout
connection_timeout = "30s"
# TLS/SSL configuration
insecure_skip_verify = true
# QoS level
qos = 1
# Client ID
client_id = "telegraf_mqtt_consumer"
# Data format
data_format = "value"
data_type = "float"
# InfluxDB v2 Output Plugin
[[outputs.influxdb_v2]]
# URL for your local InfluxDB
urls = ["http://localhost:8181"]
# InfluxDB token
token = "${INFLUX_TOKEN}"
# Organization name
organization = ""
# Destination bucket
bucket = "${INFLUX_DB_BUCKET}"
# Add tags - match the location from your MQTT script
[outputs.influxdb_v2.tags]
location = "room1"
# S3 Output Plugin with CSV format
[[outputs.remotefile]]
remote = 's3,provider=AWS,access_key_id=${AWS_ACCESS_KEY_ID},secret_access_key=${AWS_SECRET_ACCESS_KEY},region=${AWS_REGION}:${S3_BUCKET}'
# File naming
files = ['{{.Name}}-{{.Time.Format "2006-01-02"}}']
InfluxDB University
#telegraf #docker #measurement
[Avg. reading time: 0 minutes]
Data Visualization libraries
Popular tools
- Grafana
- Tableau
- PowerBI
- StreamLit
- Python MatplotLib
- Python Seaborn
[Avg. reading time: 8 minutes]
Grafana
Grafana is an open-source analytics and visualization platform that allows you to query, visualize, alert on, and understand your metrics from various data sources through customizable dashboards.
- Provides real-time monitoring of IoT device data through intuitive dashboards
- Supports visualization of time-series data (which is common in IoT applications)
- Offers powerful alerting capabilities for monitoring device health and performance
- Enables custom dashboards that can display metrics from multiple IoT devices in one view.
- InfluxDB is optimized for storing and querying time-series data generated by IoT sensors.
- The combination provides high-performance data ingestion for handling large volumes of IoT telemetry.
- InfluxDB’s data retention policies help manage IoT data storage efficiently.
- Grafana can easily visualize the time-series data stored in InfluxDB through simple queries.
- Both tools are lightweight enough to run on edge computing devices for local IoT monitoring.
Deploy InfluxDB/Grafana
Create a network
- Isolation and security - The dedicated network isolates your containers from each other and from the host system, reducing the attack surface.
- Container-to-container communication - Containers in the same network can communicate using their container names (like “myinflux” and “mygrafana”) as hostnames, making connections simpler and more reliable.
- Port conflict prevention - You avoid potential port conflicts on the host, as multiple applications can use the same internal port numbers within their isolated network.
- Simpler configuration - Services can reference each other by container name instead of IP addresses, making configuration more maintainable.
Updated docker-compose.yml
Stop the previous containers
docker compose down
docker-compose.yml
name: influxdb3
services:
influxdb3-core:
container_name: influxdb3-core
image: influxdb:3-core
ports:
- 8181:8181
command:
- influxdb3
- serve
- --node-id=node0
- --object-store=file
- --data-dir=/var/lib/influxdb3/data
- --plugin-dir=/var/lib/influxdb3/plugins
volumes:
- ./.influxdb3/core/data:/var/lib/influxdb3/data
- ./.influxdb3/core/plugins:/var/lib/influxdb3/plugins
restart: unless-stopped
influxdb3-explorer:
image: influxdata/influxdb3-ui:latest
container_name: influxdb3-explorer
ports:
- "8888:80"
volumes:
- ./.influxdb3-ui/db:/db:rw
- ./.influxdb3-ui/config:/app-root/config:ro
environment:
SESSION_SECRET_KEY: "${SESSION_SECRET_KEY:-$(openssl rand -hex 32)}"
restart: unless-stopped
command: ["--mode=admin"]
grafana:
image: grafana/grafana-oss:latest
container_name: grafana
ports:
- "3000:3000"
volumes:
- ./.grafana:/var/lib/grafana
depends_on:
- influxdb3-core
restart: unless-stopped
docker compose up -d
InfluxDB UI
http://localhost:8888
Grafana
http://localhost:3000
userid/pwd: admin/admin

InfluxDB Host: http://influxdb3-core:8181 (reachable by container name because all three services share the same Docker network)
Demo
Write SQL - Build Dashboards - Alerts
[Avg. reading time: 0 minutes]
Machine Learning with IoT
[Avg. reading time: 5 minutes]
IoT Data Characteristics
What is IoT Data?
IoT data is generated continuously from sensors and devices interacting with the physical world.
Unlike traditional datasets:
- It is time-dependent
- It arrives as a continuous stream
- It reflects real-world conditions, not controlled inputs
Examples
- Temperature readings every second
- Machine vibration signals
- GPS location streams

Key Characteristics of IoT Data
1. Time-Series Nature
- Data is ordered by time
- Past values influence future values
Example
- Temperature at 10:01 depends on 10:00
2. High Frequency & Volume
- Data generated every second (or faster)
- Quickly becomes large-scale
3. Noisy Data
- Sensors are imperfect
- External conditions introduce fluctuations
Example
- Temperature spikes due to environment, not actual issue
4. Missing Data
- Network issues
- Device downtime
- Transmission failures
5. Outliers & Spikes
- Sudden jumps or drops
- Could be real events OR sensor errors
6. Correlated Signals
- Multiple sensors interact
Example
- Temperature ↑ → Pressure ↑ → Humidity ↓
7. Continuous & Streaming
- Data is not static
- Always flowing
Data Quality Challenges in IoT
1. Missing Values
- Gaps in data streams
- Need interpolation or handling strategies
2. Duplicate Data
- Common with MQTT QoS1 (at-least-once delivery)
3. Out-of-Order Data
- Events may arrive late
- Timestamp handling becomes critical
4. Sensor Drift
- Sensors degrade over time
- Gradual deviation from true values
5. Noise vs Signal Problem
- Hard to distinguish real events from random fluctuations
Why This Matters for ML
Raw IoT data:
- Is not directly usable
- Leads to poor model performance
- Causes false alerts and missed predictions
Before applying ML, we must transform raw data into meaningful signals using Feature Engineering.
[Avg. reading time: 8 minutes]
Feature Engineering
Feature engineering is the process of transforming raw IoT sensor data into meaningful signals that machine learning models can understand.
Raw sensor data is:
- noisy
- incomplete
- difficult to interpret
Feature engineering converts it into:
- trends
- patterns
- changes over time
One-line takeaway
- Models don’t learn from raw data, they learn from engineered signals.
Why Feature Engineering is Critical in IoT
IoT data is fundamentally different from traditional datasets:
- continuous streams
- time-dependent
- affected by environment
Without feature engineering:
- models produce false alerts
- important patterns are missed
- predictions become unstable
Core Feature Types

1. Rolling / Window Features
Capture short-term behavior over a time window.
- rolling mean
- rolling standard deviation
- rolling min/max
Example
- average temperature over last 5 minutes
Purpose
- smooth noise
- identify stability vs fluctuation
| hr | temp |
|------|------|
| 1 | 20 |
| 2 | 21 |
| 3 | 35 |
| 4 | 22 |
Rolling Window (window = 2)
| hr | temp | rolling_mean_2 |
|----|------|----------------|
| 1 | 20 | 20 |
| 2 | 21 | 20.5 |
| 3 | 35 | 28 |
| 4 | 22 | 28.5 |
Rolling Window (window = 3)
| hr | temp | rolling_mean_3 |
|----|------|----------------|
| 1 | 20 | 20 |
| 2 | 21 | 20.5 |
| 3 | 35 | 25.3 |
| 4 | 22 | 26 |
window = 2 : current + previous value
window = 3 : current + last 2 values
A small window shows spikes; a large window smooths the data.
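The rolling-mean tables above can be reproduced with pandas; `min_periods=1` lets the first rows average whatever history exists so far:

```python
import pandas as pd

df = pd.DataFrame({"hr": [1, 2, 3, 4], "temp": [20, 21, 35, 22]})

# rolling mean over the current + previous value(s)
df["rolling_mean_2"] = df["temp"].rolling(window=2, min_periods=1).mean()
df["rolling_mean_3"] = df["temp"].rolling(window=3, min_periods=1).mean()
print(df)
```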
2. Lag Features
Use past values of a signal.
- temp(t-1), temp(t-5), temp(t-10)
Purpose
- help models learn trends
- capture temporal dependencies
| hr | temp | lag_1 |
| -- | ---- | ----- |
| 1 | 20 | - |
| 2 | 21 | 20 |
| 3 | 35 | 21 |
| 4 | 22 | 35 |
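The lag table above corresponds to a one-row `shift` in pandas:

```python
import pandas as pd

df = pd.DataFrame({"hr": [1, 2, 3, 4], "temp": [20, 21, 35, 22]})

# shift(1) moves each reading down one row, giving temp(t-1)
df["lag_1"] = df["temp"].shift(1)
print(df)
```

Larger lags (e.g., `shift(5)`) work the same way for temp(t-5).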
3. Rate of Change (Delta)
Measure how fast a signal changes.
- temp(t) - temp(t-1)
- pressure change per second
Purpose
- detect sudden spikes
- highlight abnormal behavior
Raw Data
| hr | temp |
|------|------|
| 1 | 20 |
| 2 | 21 |
| 3 | 35 |
| 4 | 22 |
Feature Engineering
| hr | temp | rolling_mean | delta |
|------|------|--------------|-------|
| 1 | 20 | 20 | - |
| 2 | 21 | 20.5 | +1 |
| 3 | 35 | 25.3 | +14 |
| 4 | 22 | 26 | -13 |
Insight
- spike at hr=3
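The delta column above is a one-liner with pandas `diff()`; with a simple threshold, both the jump at hr=3 and the drop back at hr=4 stand out:

```python
import pandas as pd

df = pd.DataFrame({"hr": [1, 2, 3, 4], "temp": [20, 21, 35, 22]})

# diff() computes temp(t) - temp(t-1)
df["delta"] = df["temp"].diff()

# flag rows where the change exceeds an (illustrative) threshold of 10
spikes = df[df["delta"].abs() > 10]
print(spikes)
```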
4. Aggregation Features
Summarize behavior over time.
- average over 10 minutes
- count of spikes
- max/min values
Purpose
- capture overall system behavior
5. Time-Based Features
Incorporate time context.
- hour of day
- day of week
Purpose
- capture seasonality patterns
6. Cross-Sensor Features
Combine multiple sensor readings.
- temperature + humidity
- pressure vs vibration
Purpose
- capture relationships between signals
- improve model accuracy
How Feature Engineering Connects to ML in IoT
Predictive Maintenance
- uses trends and long-term patterns
- detects gradual degradation
Anomaly Detection
- uses delta and rolling statistics
- identifies sudden spikes and instability
Classification
- uses patterns of behavior
- distinguishes device states (normal vs faulty)
Key Principle
Feature engineering bridges the gap between:
- raw sensor data
- intelligent ML decisions
#featureengineering #datacleaning
[Avg. reading time: 5 minutes]
Predictive Maintenance
Predictive maintenance uses IoT telemetry to anticipate equipment failure before it happens, enabling intervention at the right time instead of reacting after breakdowns.
This shifts operations from reactive → preventive → predictive.
Core Components
- Sensor Integration: Capture continuous signals like vibration, temperature, pressure, and acoustic patterns from equipment.
- Data Processing: Clean, normalize, and time-align high-frequency sensor streams for downstream use.
- Condition Monitoring: Track real-time metrics against thresholds or baseline behavior to detect deviations.
- Failure Prediction Models: Apply statistical or ML models (regression, classification, anomaly detection) trained on historical failure patterns.
Implementation Architecture
- Edge Layer: Perform lightweight filtering and anomaly detection close to the device to reduce latency and bandwidth.
- Fog Layer: Aggregate multiple devices, run near-real-time analytics, and coordinate localized decisions.
- Cloud Layer: Train models, store long-term telemetry, and run deeper analysis across fleets.
- Visualization & Alerting: Dashboards, alerts, and automated triggers for maintenance teams.
Why This Matters in Data Engineering
- Sensor data is high volume, high velocity, time-series heavy
- Requires streaming pipelines (MQTT → Kafka → TSDB / Lakehouse)
- Needs schema evolution + late arriving data handling
- Models depend on feature engineering over time windows (rolling stats, lag features)
- Poor design leads to unreliable predictions
Benefits
- Reduced Downtime: Failures are prevented, not reacted to.
- Cost Optimization: Avoid unnecessary scheduled maintenance.
- Extended Asset Life: Early detection prevents irreversible damage.
- Improved Safety: Reduces risk of catastrophic failures.
git clone https://github.com/gchandra10/python_iot_workflow_predictive_demo.git
Real Example
- Motor vibration increases gradually over time
- Edge detects anomaly spike
- Fog aggregates patterns across similar machines
- Cloud model predicts failure in ~5 days
- Alert triggered → maintenance scheduled
- Downtime avoided
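The "predicts failure in ~5 days" step can be sketched with a naive linear projection: fit a least-squares line to a rising vibration signal and extrapolate when it crosses a failure threshold. All numbers here are hypothetical; real systems use far richer models.

```python
# daily RMS vibration readings, mm/s (illustrative values)
vibration = [1.0, 1.1, 1.25, 1.4, 1.5, 1.65]
days = list(range(len(vibration)))

# least-squares slope of vibration vs. day
n = len(days)
mean_x = sum(days) / n
mean_y = sum(vibration) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, vibration)) \
        / sum((x - mean_x) ** 2 for x in days)

THRESHOLD = 2.3  # assumed vibration level at which failure is expected
days_to_failure = (THRESHOLD - vibration[-1]) / slope
print(f"Projected failure in ~{days_to_failure:.1f} days")
```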
#predictive #iot #edge #fog #timeseries
[Avg. reading time: 15 minutes]
Anomaly Detection
Anomaly detection and predictive maintenance are important parts of the IoT upper stack. They help analyze device and sensor data to detect unusual behavior early and reduce the chance of equipment failure.
Anomaly Detection in IoT
Anomaly detection identifies data points or patterns that do not match normal system behavior.
In IoT systems, this is useful for:
- detecting abnormal sensor readings
- identifying device malfunctions
- spotting unusual operational behavior
- triggering alerts before failures become serious
This is especially valuable in industrial IoT, smart manufacturing, healthcare, logistics, and other environments where sensor data arrives continuously.
Common Approaches
Statistical Methods
Statistical approaches define a baseline of normal behavior and flag values that deviate significantly from it.
Examples:
- mean and standard deviation
- z-score
- moving averages
- seasonal thresholds
These methods are simple and fast, but they may struggle when the data is complex or changes over time.
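A minimal z-score detector, using only the standard library and made-up readings with one obvious spike:

```python
from statistics import mean, stdev

readings = [22.1, 22.3, 22.2, 22.4, 30.5, 22.2]
mu, sigma = mean(readings), stdev(readings)

# flag anything more than 2 standard deviations from the mean
anomalies = [x for x in readings if abs(x - mu) / sigma > 2]
print(anomalies)  # [30.5]
```

The 2-sigma cutoff is a common default, but on drifting or seasonal data a fixed baseline like this quickly breaks down.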
Machine Learning Techniques
Machine learning models learn patterns from historical data and identify points that do not fit those patterns.
Examples:
- Isolation Forest
- One-Class SVM
- Local Outlier Factor
- clustering-based approaches
These methods are useful when normal behavior is not easy to define with simple rules.
Deep Learning Models
Deep learning models can detect anomalies in high-dimensional or sequential IoT data.
Examples:
- autoencoders
- LSTM-based sequence models
- transformer-based time-series models
These models are powerful, but they usually require more data, more tuning, and more compute.
Isolation Forest
Isolation Forest is one of the most practical algorithms for anomaly detection.
Unlike many other methods, it does not rely on distance or density. Instead, it works on a simple idea:
Anomalies are few and different, so they are easier to isolate than normal points.
Core Idea
Isolation Forest builds many random trees.
In each tree:
- a feature is selected randomly
- a split value is selected randomly
- the data is repeatedly divided until individual points become isolated
A point that gets isolated quickly is more likely to be an anomaly.
A point that needs more splits to isolate is more likely to be normal.
Why It Works
Normal points usually belong to dense regions of the dataset, so they take more splits to separate.
Anomalies are often far away from the bulk of the data, so they get isolated in fewer steps.
That is why:
- shorter path length → more anomalous
- longer path length → more normal
Simple Example
Dataset: [-100, 2, 11, 13, 100]
In practice, Isolation Forest builds many trees (100+).
Here we show only 4 trees for understanding.
Tree 1
- Root split at 7 → [-100, 2] and [11, 13, 100]
- [-100, 2] split at -49 → [-100] and [2]
- [11, 13, 100] split at 56 → [11, 13] and [100]
Path lengths:
- -100 → 2
- 2 → 2
- 11 → 3
- 13 → 3
- 100 → 2
Tree 2
- Root split at 1 → [-100] and [2, 11, 13, 100]
- [2, 11, 13, 100] split at 50 → [2, 11, 13] and [100]
Approx path lengths:
- -100 → 1
- 100 → 2
- 2, 11, 13 → 3 to 4
Tree 3
- Root split at 12 → [-100, 2, 11] and [13, 100]
- [-100, 2, 11] split at -40 → [-100] and [2, 11]
- [13, 100] split at 57 → [13] and [100]
Path lengths:
- -100 → 2
- 2 → 3
- 11 → 3
- 13 → 2
- 100 → 2
Tree 4
- Root split at 80 → [-100, 2, 11, 13] and [100]
- [-100, 2, 11, 13] split at -50 → [-100] and [2, 11, 13]
Approx path lengths:
- 100 → 1
- -100 → 2
- others → 3+
Average Path Length
- -100 → (2 + 1 + 2 + 2) / 4 = 1.75
- 2 → (2 + 3 + 3 + 3) / 4 = 2.75
- 11 → (3 + 3 + 3 + 3) / 4 = 3.00
- 13 → (3 + 3 + 2 + 3) / 4 = 2.75
- 100 → (2 + 2 + 2 + 1) / 4 = 1.75
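The path-length averaging above can be reproduced with a short simulation. This is a toy sketch of the isolation idea only, not the full algorithm (real Isolation Forest subsamples the data and caps tree depth differently); `path_length` and `avg_path_length` are illustrative helper names.

```python
import random

def path_length(x, data, depth=0, max_depth=50):
    """Count how many random splits it takes to isolate x from data."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    split = random.uniform(lo, hi)
    # keep only the points that fall on the same side of the split as x
    side = [v for v in data if (v < split) == (x < split)]
    if len(side) == len(data):          # degenerate split, retry
        return path_length(x, data, depth, max_depth)
    return path_length(x, side, depth + 1, max_depth)

def avg_path_length(x, data, trees=500):
    """Average isolation depth over many random trees."""
    return sum(path_length(x, data) for _ in range(trees)) / trees

random.seed(42)
data = [-100, 2, 11, 13, 100]
# the extremes (-100, 100) average fewer splits than the middle values
```

With enough trees, the extremes average noticeably shorter paths than the middle values, matching the hand-computed averages above.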
Anomaly Score
s(x, n) = 2^(-E[h(x)] / c(n))
Where:
- E[h(x)] = average path length
- c(n) = normalization factor
Score meaning:
- closer to 1 → anomaly
- closer to 0 → normal
Interpretation
The extreme values (-100 and 100) are isolated faster than the middle values.
That means:
- -100 and 100 → anomalies
- 2, 11, 13 → normal points
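The same dataset can be run through scikit-learn's implementation. A minimal sketch, assuming scikit-learn is installed; `fit_predict` returns -1 for anomalies and 1 for normal points, and lower `score_samples` values mean more anomalous.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

X = np.array([-100, 2, 11, 13, 100]).reshape(-1, 1)

clf = IsolationForest(n_estimators=100, contamination=0.4, random_state=42)
labels = clf.fit_predict(X)      # -1 = anomaly, 1 = normal
scores = clf.score_samples(X)    # lower score = more anomalous

for value, label, score in zip(X.ravel(), labels, scores):
    print(value, label, round(score, 3))
```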
Key Points
- anomalies are few and different
- random splits isolate anomalies faster
- path length determines anomaly likelihood
- ensemble of trees improves reliability
- no distance calculations required
- scales well for large datasets
Advantages
- simple and intuitive
- fast and scalable
- works with high-dimensional data
- no need for distance calculations
- good for unsupervised learning
Limitations
- struggles with clustered anomalies
- sensitive when anomalies are near normal data
- randomness can cause variation in small datasets
- threshold selection is use-case dependent
Isolation Forest in IoT
Used for:
- temperature anomalies
- vibration anomalies
- pressure irregularities
- device failure prediction
- real-time alerting
Applications:
- predictive maintenance
- fault detection
- industrial monitoring
#anomaly #predictivemaintenance
[Avg. reading time: 17 minutes]
ML Models quick intro
Supervised Learning
In supervised learning, classification and regression are two distinct types of tasks, differing primarily in the nature of their output and the problem they solve.
Training input: labeled historical data (e.g., sensor readings with timestamps of past failures).
Classification
Predicts discrete labels (categories or classes).
Example:
Binary: Failure (1) vs. No Failure (0).
Multi-class: Type of failure (bearing_failure, motor_overheat, lubrication_issue).
Regression
Predicts continuous numerical values.
Example:
Remaining Useful Life (RUL): 23.5 days until failure.
Time-to-failure: 15.2 hours.
Use Cases in Predictive Maintenance
Classification:
Answering yes/no questions:
- Will this motor fail in the next week?
- Is the current vibration pattern abnormal?
- Identifying the type of fault (e.g., electrical vs. mechanical).
Regression:
Quantifying degradation:
- How many days until the turbine blade needs replacement?
- What is the current health score (0–100%) of the compressor?
Algorithms
| Category | Algorithm | Description |
|---|---|---|
| Classification | Logistic Regression | Models probability of class membership. |
| Classification | Random Forest | Ensemble of decision trees for classification. |
| Classification | Support Vector Machines (SVM) | Maximizes margin between classes. |
| Classification | Neural Networks | Learns complex patterns and nonlinear decision boundaries. |

| Category | Algorithm | Description |
|---|---|---|
| Regression | Linear Regression | Models linear relationship between features and target. |
| Regression | Decision Trees (Regressor) | Tree-based model for predicting continuous values. |
| Regression | Gradient Boosting Regressors | Ensemble of weak learners (e.g., XGBoost, LightGBM). |
| Regression | LSTM Networks | Recurrent neural networks for time-series regression. |
Evaluation Metrics
Classification:
- Accuracy: % of correct predictions.
- Precision/Recall: Trade-off between false positives and false negatives.
- Precision: TP/(TP+FP)
- Recall: TP/(TP+FN)
- F1-Score: Harmonic mean of precision and recall.
Example:
Will the temperature exceed 90°F in 10 minutes?

Positive: it will cross 90°F. Negative: it will not cross 90°F.

True Positive

Model: Temp will cross 90. Actual: It did cross 90.

Result: Correct, and we are prepared.

False Positive

Model: Temp will cross 90. Actual: It didn't cross.

Result: Predicted heat, but it never happened.

True Negative

Model: Temp will stay below 90. Actual: Temp stayed below 90.

Result: Predicted low, and it happened.

False Negative

Model: Temp will stay below 90. Actual: Temp went above 90.

Result: Missed issue.

In IoT, False Negatives are risky; False Positives are an annoyance.
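Plugging hypothetical confusion-matrix counts for the 90°F question into the formulas above (the counts are made up for illustration):

```python
# hypothetical confusion-matrix counts for the 90°F question
tp, fp, tn, fn = 40, 10, 45, 5

accuracy = (tp + tn) / (tp + fp + tn + fn)          # 0.85
precision = tp / (tp + fp)                          # 0.8
recall = tp / (tp + fn)                             # ~0.889
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(accuracy, precision, round(recall, 3), round(f1, 3))
```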
Regression:
- Mean Absolute Error (MAE): Average absolute difference between predicted and actual values.
- Mean Squared Error (MSE): Penalizes larger errors.
- R² Score: How well the model explains variance in the data.
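A quick hand computation of these three regression metrics on made-up numbers:

```python
y_true = [10.0, 12.0, 14.0]
y_pred = [11.0, 11.0, 15.0]
n = len(y_true)

mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n    # 1.0
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n  # 1.0
mean = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot                                     # 0.625

print(mae, mse, r2)
```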
Unsupervised Learning
In unsupervised learning, clustering and anomaly detection serve distinct purposes and address different problems.
Primary Objective
Clustering
- Assigns each data point to a cluster (e.g., Cluster 1, Cluster 2).
- Outputs are groups of similar instances.
Goal: Group data points into clusters based on similarity.
- Focuses on discovering natural groupings or patterns in the data.
Example: Segmenting devices into groups based on usage.
| Room | Temp | Humidity | CO₂ | Occupancy |
|---|---|---|---|---|
| R1 | 22 | 40 | 500 | Low |
| R2 | 23 | 42 | 520 | Low |
| R3 | 28 | 60 | 900 | High |
| R4 | 29 | 65 | 950 | High |
Cluster 1 - R1 and R2, Cluster 2 - R3 and R4
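The room table above can be clustered with K-Means. A minimal sketch assuming scikit-learn is installed; with features this cleanly separated, the algorithm recovers the two groups.

```python
import numpy as np
from sklearn.cluster import KMeans

# rows R1..R4 with [temp, humidity, CO2]
X = np.array([
    [22, 40, 500],
    [23, 42, 520],
    [28, 60, 900],
    [29, 65, 950],
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # R1/R2 share one label, R3/R4 the other
```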
Anomaly Detection
- Labels data points as normal or anomalous (binary classification).
- Outputs are scores or probabilities indicating how “outlier-like” a point is.
Goal: Identify rare or unusual data points that deviate from the majority.
Focuses on detecting outliers or unexpected patterns.
Example: Flagging fraudulent credit card transactions.
Algorithms
| Category | Algorithm | Description |
|---|---|---|
| Clustering | K-Means | Partitions data into k spherical clusters. |
| Clustering | Hierarchical Clustering | Builds nested clusters using dendrograms. |
| Clustering | DBSCAN | Groups dense regions and identifies sparse regions as outliers. |
| Clustering | Gaussian Mixture Models (GMM) | Probabilistic clustering using a mixture of Gaussians. |
| Anomaly Detection | Isolation Forest | Isolates anomalies using random decision trees. |
| Anomaly Detection | One-Class SVM | Learns a boundary around normal data to detect outliers. |
| Anomaly Detection | Autoencoders | Reconstructs input data; anomalies yield high reconstruction error. |
| Anomaly Detection | Local Outlier Factor (LOF) | Detects anomalies by comparing local density of data points. |
Time Series
Forecasting and Anomaly Detection are two fundamental but distinct tasks, differing in their objectives, data assumptions, and outputs.
| Model | Type | Strengths | Limitations |
|---|---|---|---|
| ARIMA/SARIMA | Classical | Simple, interpretable, strong for univariate, seasonal data | Requires stationary data, manual tuning |
| Facebook Prophet | Additive model | Easy to use, handles holidays/seasonality, works with missing data | Slower for large datasets, limited to trend/seasonality modeling |
| Holt-Winters (Exponential Smoothing) | Classical | Lightweight, works well with level/trend/seasonality | Not good with irregular time steps or complex patterns |
| LSTM (Recurrent Neural Network) | Deep Learning | Learns long-term dependencies, supports multivariate | Requires lots of data, training is resource-intensive |
| XGBoost + Lag Features | Machine Learning | High performance, flexible with engineered features | Requires feature engineering, not “true” time series model |
| NeuralProphet | Hybrid (Prophet + NN) | Better performance than Prophet, supports regressors/events | Heavier than Prophet, still maturing |
| Temporal Fusion Transformer (TFT) | Deep Learning | SOTA for multivariate forecasts with interpretability | Overkill for small/medium IoT data, very heavy |
| Layer | Model(s) | Why |
|---|---|---|
| Edge | Holt-Winters, thresholds, micro-LSTM (TinyML), Prophet (inference) | Extremely lightweight, low latency |
| Fog | Prophet, ARIMA, Isolation Forest, XGBoost | Moderate compute, supports both real-time + near-real-time |
| Cloud | LSTM, TFT, NeuralProphet, Prophet (training), XGBoost | Can handle heavy training, multivariate data, batch scoring |
git clone https://github.com/gchandra10/python_iot_ml_demo.git
[Avg. reading time: 1 minute]
Security
- Introduction
- Application Layer
- Data Layer
- Communication Layer
- Number Systems
- Encryption
- IoT Privacy
- Auditing in IoT
[Avg. reading time: 9 minutes]
Introduction to IoT Security Challenges
IoT security is not a theory; it is real.
News articles
MS Azure blocks Largest DDoS Attack
Govt CISA Replace EOL Edge Devices
Why IoT Is Hard to Secure
| Reason | Explanation |
|---|---|
| Resource Constraints | Limited CPU, memory → hard to run strong security controls |
| Scale & Diversity | Thousands of devices, different vendors → hard to manage |
| Physical Exposure | Devices can be accessed or tampered with in the field |
| Long Lifespan | Devices run for years with poor or no updates |
| Insecure Defaults | Weak passwords, open ports, outdated firmware |
| Inconsistent Standards | Security exists, but not applied consistently |
What This Means in Practice
- You cannot rely on one layer
- You cannot patch easily
- You must assume devices are compromised
Security Layers in IoT
| Layer | Focus | Key Concerns |
|---|---|---|
| Device-Level | Hardware + firmware | Secure boot, tampering, firmware integrity |
| Upper Stack | Data, APIs, cloud | Auth, encryption, APIs, IAM |
Reality
If device layer fails, upper layers receive fake but valid-looking data.
Your dashboards will lie.
Upper Stack Attack Surfaces
Application
- Insecure APIs
- Weak authentication
- Poor input validation
- No rate limiting
Attack Example: Attacker sends 10,000 fake requests -> API crashes -> system unavailable
Data
- No encryption (in transit / at rest)
- Public cloud storage
- Weak access control
Attack Example: Open S3 bucket -> attacker downloads sensitive sensor data
Communication
- MITM attacks (MQTT, HTTP)
- Replay attacks
- Weak TLS/cert handling
Attack Example: Captured MQTT message replayed -> system thinks event happened again
Fake Publisher Attack
[ Device ]      [ Attacker ]
      \             /
       \           /
        ---> [ MQTT Broker ] ---> [ Cloud ] ---> [ Dashboard ]
Man-in-the-Middle (MITM)
[ Device ] ---> ❌ Attacker ---> [ MQTT Broker ]
Lower Stack Attack Surfaces
Device
- Firmware tampering
- Debug port access
- Insecure boot
Attack Example: Attacker plugs into device -> flashes modified firmware -> device becomes a bot
Network
- No segmentation
- Open ports
- Weak local protocols (BLE, Zigbee)
Attack Example: Compromise one device -> scan network -> take over others
Supply Chain
- Malicious firmware
- Vulnerable libraries
- Fake/cloned devices
Attack Example: Cheap cloned sensor sends manipulated data from day 1
Summary
- One weak layer breaks everything
- Device -> Network -> Cloud -> App (all connected)
- Example: weak device auth -> attacker sends fake data -> corrupts analytics
[Avg. reading time: 7 minutes]
Application Layer
Insecure APIs
Problem: APIs are the control plane of IoT systems. If they are weak, the entire system is exposed.
Common failures:
- No authentication or weak auth
- Over-permissive endpoints
- No encryption (HTTP instead of HTTPS)
- No rate limiting
Real-World Use Case:
- CloudPets breach (2017)
- API had no authentication
- Exposed millions of voice recordings
- Attackers accessed data directly from backend storage
Mitigation:
- Enforce strong auth:
- OAuth2 / JWT / Mutual TLS
- Authorization per endpoint (RBAC)
- Always use HTTPS
- Hide internal APIs behind gateways
- Add API gateway (rate limit + auth + logging)
Demo
git clone https://github.com/gchandra10/python_api_auth_demo.git
Poor Session Management
Sessions are often:
- Long-lived
- Reused across devices
- Stored insecurely
This leads to session hijacking or replay attacks.
Real-World Use Case:
- Smart thermostat app reused same session token
- Attacker reused token -> controlled devices remotely
Mitigation:
- Short-lived access tokens
- Refresh tokens with rotation
- Store tokens securely:
- Avoid localStorage
- Use HTTP-only cookies
- Invalidate sessions:
- Logout
- Password change
- Bind session to device/IP if possible
Weak Input Validation (XSS, Injection)
Without validation -> injection attacks:
- XSS
- SQL Injection
- Command Injection
Real-World Example
- Smart fridge dashboard
- Attacker injected script -> executed on admin panel
- Stole session cookies
Mitigation
- Validate input schema strictly
- Sanitize inputs
- Escape outputs (HTML/JS)
- Use parameterized queries
- Never trust device-originated data
No Rate Limiting or Abuse Detection
Without rate limiting
- Brute force attacks succeed
- APIs get abused
- Devices become botnet nodes
Using bots, hackers can cause:
- Massive DDoS networks
- Internet outages driven by millions of compromised IoT devices
Mitigation
- Rate limit by IP / user / MAC address
- Detect anomalies such as repeated failures or sudden spikes
- Pause or block access when abuse is detected

Example: limit each IP to 5 requests per 60-second window.
from fastapi import FastAPI, Request, HTTPException
from time import time

app = FastAPI()
requests = {}

def rate_limit(ip):
    now = time()
    window = 60
    limit = 5
    requests.setdefault(ip, [])
    # keep only timestamps inside the current window
    requests[ip] = [t for t in requests[ip] if now - t < window]
    if len(requests[ip]) >= limit:
        return False
    requests[ip].append(now)
    return True

@app.get("/login")
def login(req: Request):
    ip = req.client.host
    if not rate_limit(ip):
        raise HTTPException(status_code=429)
    return {"ok": True}
#xss #ratelimiting #insecureapi
[Avg. reading time: 9 minutes]
Data Layer
Data in Transit (No Encryption)
Devices send data over MQTT, CoAP, or HTTP without encryption. Anyone on the network can read or modify it.
Real-World Use Case: A smart water meter system in a municipality was transmitting usage data over plain HTTP. Attackers intercepted and altered readings, affecting billing.
Mitigation:
- Use TLS (HTTPS, MQTT over TLS)
- Use DTLS for UDP-based protocols (CoAP)
- Enforce certificate validation and pinning
- Disable plaintext endpoints completely
import ssl
import paho.mqtt.client as mqtt

client = mqtt.Client()
client.tls_set(ca_certs="ca.crt",
               certfile="client.crt",
               keyfile="client.key",
               tls_version=ssl.PROTOCOL_TLS)
client.connect("broker.hivemq.com", 8883)
client.publish("iot/sensor", "secure message")
client.loop_start()
- ca.crt : Certificate Authority (CA) used to trust broker (on device) AND trust devices (on broker)
- client.crt : device identity (sent to broker)
- client.key : proof device owns that identity
Data at Rest (Unencrypted Databases)
Problem:
- Data stored on devices, gateways, or cloud is not encrypted.
- Anyone with access can extract it.
Real-World Use Case: In 2020, a smart door lock vendor left unencrypted SQLite DBs in devices. Attackers extracted access logs and user PINs directly from flash memory.
- Credential theft
- Sensitive data exposure
- Device compromise
Mitigation:
- Enable AES-based encryption for device-side storage
- Use full-disk encryption on gateways or fog nodes
- Enforce encryption at rest (e.g., AWS KMS, Azure SSE) in cloud databases
from cryptography.fernet import Fernet
key = Fernet.generate_key()
cipher = Fernet(key)
data = b"temperature=25"
encrypted = cipher.encrypt(data)
decrypted = cipher.decrypt(encrypted)
Insecure Cloud Storage (e.g., Public S3 Buckets)
Problem: Cloud object storage like AWS S3 or Azure Blob often gets misconfigured as public, leaking logs, firmware, or user data.
Real-World Use Case: A fitness tracker company exposed terabytes of GPS and health data by leaving their S3 bucket public and unprotected — affecting thousands of users.
Mitigation:
- Use least privilege IAM roles for all cloud resources
- Audit and scan for public buckets (AWS Macie, Prowler)
- Enable object-level encryption and access logging
- Set up guardrails and policies (e.g., SCPs, Azure Blueprints)
Lack of Data Integrity Checks
Problem: Without integrity checks, even if data is encrypted, an attacker can alter it in transit or at rest without detection.
Real-World Use Case: A smart agriculture system relied on soil sensor readings to trigger irrigation. An attacker tampered with packets to falsify dry-soil readings, wasting water.
Mitigation:
- Use Hash-based Message Authentication Code (HMAC) or digital signatures with shared secrets
- Implement checksums or hashes (SHA-256) on stored data
- Validate data consistency across nodes/cloud with audit trails
import hmac, hashlib

secret = b"key"
message = b"sensor_data=25"
signature = hmac.new(secret, message, hashlib.sha256).hexdigest()

# verify
valid = hmac.compare_digest(
    signature,
    hmac.new(secret, message, hashlib.sha256).hexdigest()
)
print(valid)
Sender:
- Generates HMAC using secret key
- Sends: message + signature
Receiver:
- Recomputes HMAC using same key
- Compares
#dataintransit #dataatrest #dataintegrity
[Avg. reading time: 5 minutes]
Communication Layer
MITM on MQTT / CoAP
Problem
MQTT and CoAP are lightweight protocols and are often deployed without strong encryption or authentication.
This makes them vulnerable to Man-in-the-Middle (MITM) attacks, where an attacker intercepts, reads, or alters traffic between the device and the broker/server.
Example Scenario
A smart lighting system uses MQTT over plain TCP without TLS.
An attacker on the same network spoofs the broker and sends fake commands, causing all lights to turn off remotely.
Mitigation
- Use MQTT over TLS on port 8883
- Use CoAP over DTLS
- Enable mutual authentication using client and server certificates
- Verify broker/server identity before accepting a connection
- Use certificate pinning where appropriate
- Disable anonymous access on MQTT brokers
Replay Attacks Due to Lack of Freshness
Problem
Some IoT systems do not check whether a message is fresh.
If timestamps, nonces, or sequence numbers are missing, an attacker can capture a valid message and replay it later.
Example Scenario
A smart lock accepts an unlock command without checking whether the message is new.
An attacker records a valid unlock message and replays it later to gain unauthorized access.
Mitigation
- Add a timestamp, nonce, or message counter to each request
- Reject duplicate or expired messages
- Track recently used nonces or counters
- Use challenge-response for critical actions
- Use short-lived tokens with expiration checks
Example
{
  "device_id": "lock01",
  "command": "unlock",
  "nonce": "839275abc123",
  "timestamp": "2025-04-01T10:23:00Z"
}
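A minimal freshness check over such messages might look like the sketch below; `is_fresh`, the 30-second window, and the in-memory nonce set are illustrative assumptions (a production system needs persistent, bounded nonce storage):

```python
from datetime import datetime, timezone

seen_nonces = set()

def is_fresh(msg, max_age_s=30, now=None):
    """Reject replayed (duplicate nonce) or stale (old timestamp) messages."""
    if msg["nonce"] in seen_nonces:
        return False
    ts = datetime.fromisoformat(msg["timestamp"].replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    if abs((now - ts).total_seconds()) > max_age_s:
        return False
    seen_nonces.add(msg["nonce"])
    return True
```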
[Avg. reading time: 8 minutes]
Number Systems
Binary
0 and 1
Octal
0-7
Decimal
Standard Number system.
Hex
0 to 9 and A to F
Base36
A-Z & 0-9
Great for generating short unique IDs. Packs more information into fewer characters.
An epoch timestamp in milliseconds, 1602374487561 (13 characters long), converts to the 8-character Base36 string “kg4cebk9”.
Popular Use Cases:
Base 36 is used for Dell Express Service Codes and many other applications which have a need to minimize human error.
Example: Processing 1 billion rows each hour for a day
- 1 billion rows x 13 bytes = 13 GB per hour x 24 hrs = 312 GB
- 1 billion rows x 8 bytes = 8 GB per hour x 24 hrs = 192 GB
pip install base36
import base36
base36.dumps(1602374487561)
base36.loads('kg4cebk9') == 1602374487561
Base 64:
Base64 encoding schemes are commonly used when binary data needs to be stored or transferred over media designed to handle textual data, ensuring the data remains intact without modification during transport.
Base64 encodes binary data into an ASCII character set known to pretty much every computer system, so the data can be transmitted without loss or modification.
2^6 = 64, so each Base64 character represents six bits, not eight.
Base64 encoding converts every three bytes of data (three bytes is 3*8=24 bits) into four base64 characters.
Example:
Convert Hi! to Base64
Character - Ascii - Binary
H= 72 = 01001000
i = 105 = 01101001
! = 33 = 00100001
Hi! = 01001000 01101001 00100001
010010 000110 100100 100001 = S G k h
https://www.base64encode.org/
How about converting Hi to Base64
010010 000110 1001
Add zeros at the end so the last group is 6 bits long
010010 000110 100100
Base64 is SGk=
= is the padding character, so the result is always a multiple of 4 characters.
Another Example
convert f to Base64
102 = 01100110
011001 100000
Zg==
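Python's standard library reproduces all three worked examples:

```python
import base64

print(base64.b64encode(b"Hi!"))  # b'SGkh'
print(base64.b64encode(b"Hi"))   # b'SGk='
print(base64.b64encode(b"f"))    # b'Zg=='
print(base64.b64decode(b"SGkh")) # b'Hi!'
```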
Think about sending an image (binary) as JSON: raw binary won't work, but Base64 works well.
Image to Base64
https://elmah.io/tools/base64-image-encoder/
View Base64 online
https://html.onlineviewer.net/
[Avg. reading time: 6 minutes]
Encryption in IoT Upper Stack
Two foundational concepts that help protect data are hashing and encryption.
Hashing
Hashing is like creating a digital fingerprint of data. It takes input (e.g., a message or file) and produces a fixed-length hash value.
- One-way function: You can’t reverse a hash to get the original data.
- Deterministic: Same input = same hash.
- Common use: Password storage, data integrity checks.
Use-case: When sending firmware updates to IoT devices, the server also sends a hash. The device re-hashes the update and compares — if it matches, the data wasn’t tampered with.
import hashlib
print(hashlib.sha256(b"iot-data").hexdigest())
Encryption
Encryption transforms readable data (plaintext) into an unreadable format (ciphertext) using a key. Only those with the key can decrypt it back.
Two Types
Symmetric
- Same key to encrypt and decrypt. Example: AES
Asymmetric
- Public key to encrypt, private key to decrypt. Example: RSA
Use-case: Secure communication between sensors and cloud, protecting sensitive telemetry, encrypting data at rest.
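To make the asymmetric idea concrete, here is a textbook RSA round trip with tiny primes. This is purely illustrative (real systems use 2048-bit keys and padding via a library such as `cryptography`), never production crypto:

```python
# toy RSA: the public key (e, n) encrypts, the private key (d, n) decrypts
p, q = 61, 53
n = p * q                  # modulus: 3233
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent (modular inverse): 2753

message = 65               # plaintext encoded as a number < n
cipher = pow(message, e, n)   # anyone can encrypt with (e, n)
plain = pow(cipher, d, n)     # only the key holder can decrypt
print(cipher, plain)          # plain decrypts back to 65
```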
sequenceDiagram
    participant Sensor
    participant Network
    participant Cloud
    Sensor->>Network: Temp = 28.5 (Plaintext)
    Network-->>Cloud: Temp = 28.5
    Note over Network: Data can be intercepted
    Sensor->>Network: AES(TLS): Encrypted Payload
    Network-->>Cloud: Encrypted Payload (TLS)
    Cloud-->>Cloud: Decrypt & Store
Encryption plays a critical role in securing IoT systems beyond the device level. Here’s how it applies across the upper layers of the stack:
- Data in Transit
- Data at Rest
Cloud & IAM Layer – Secrets and Identity
Purpose: Secure identity tokens, secrets, and environment variables.
Best Practices:
- Encrypt secrets using cloud-native KMS (e.g., AWS KMS, Azure Key Vault)
- Use tools like HashiCorp Vault to manage secrets
- Apply token expiration and rotation policies
[Avg. reading time: 8 minutes]
IoT Data Privacy
- IoT devices continuously collect highly sensitive data
- Location, biometrics, behavior, health signals
- Data collection is often passive and invisible
- Users lack control, visibility, and consent clarity
- The risk is not theoretical: regulatory fines, legal exposure, reputation damage
Popular Regulations
GDPR (EU)
Applies if data subjects are EU citizens.
Focus: Consent, Right to access/erase, Data minimization, Security by design, Data portability.
HIPAA (USA)
Applies to Protected Health Information (PHI).
Focus: Confidentiality, Integrity, Availability of electronic health data.
Requires Business Associate Agreements if third parties handle data.
How to Implement Privacy in IoT Systems
Privacy by Design
- Collect only necessary data
- Anonymize/pseudonymize where possible
- Use edge processing to reduce data sent to cloud
Security Practices
- Encrypted storage & transport (TLS 1.3)
- Mutual authentication (cert-based, JWT)
- Secure boot & firmware validation
User Controls
- Explicit opt-in for data collection
- Transparent data usage policies
- Easy delete/download of personal data
Audit & Monitoring
- Logging access to sensitive data
- Regular privacy impact assessments
What Industry is Doing Now
| Company/Platform | What They Do |
|---|---|
| Apple | Local processing for Siri; minimal cloud usage |
| Google Nest | Centralized cloud with opt-out data sharing |
| AWS IoT Core | Fine-grained access policies, audit logging |
| Azure IoT | GDPR-compliant SDKs; data residency controls |
| Fitbit (Google) | HIPAA-compliant services for health data |
Pros & Cons of IoT Privacy Measures
| Pros | Cons |
|---|---|
| Builds trust with users | May increase latency (edge compute) |
| Avoids fines & legal issues | Higher infra cost (storage, encryption) |
| Enables secure ecosystems | Limits on innovation using personal data |
| Competitive differentiator | Complex to manage cross-border compliance |
Data Masking
This is about obfuscating sensitive info during storage, transit, or access.
Types
- Static masking: Permanent (e.g., obfuscating device ID at ingestion)
- Dynamic masking: At query time (e.g., show only last 4 digits to analysts)
- Tokenization: Replacing values with reversible tokens
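A minimal sketch of static vs dynamic masking in Python (`mask_static` and `mask_dynamic` are hypothetical names; the hash-prefix scheme is an illustrative choice):

```python
import hashlib

def mask_static(device_id: str) -> str:
    """Static masking: irreversibly replace the ID at ingestion time."""
    return hashlib.sha256(device_id.encode()).hexdigest()[:12]

def mask_dynamic(value: str, visible: int = 4) -> str:
    """Dynamic masking: show only the last few characters at query time."""
    return "*" * max(len(value) - visible, 0) + value[-visible:]

print(mask_dynamic("sensor-ab12cd34"))  # only 'cd34' stays visible
```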
Use Cases
- Sharing data with 3rd parties without exposing PII
- Minimizing insider threats
- Compliance with HIPAA/GDPR
Tools & Approaches
- Telegraf Preprocessor modules (Static Masking)
- SQL-level masking (e.g., MySQL, SQL Server)
- API gateways that redact fields
- Custom middleware that masks data at stream-level (e.g., MQTT → InfluxDB)
[ IoT Device ]
| (Sensor Data)
| + TLS + Cert Auth
v
[ Edge Layer ]
- Filtering
- Aggregation
- Static Masking
- Anonymization
|
v
[ Message Broker (MQTT/Kafka) ]
- Encrypted Transport (TLS)
- AuthN/AuthZ
|
v
[ Stream Processing Layer ]
- Data Validation
- Tokenization
- Enrichment
|
v
[ Storage Layer ]
- Encrypted Storage
- Partitioned Data
- Masked Fields
|
v
[ Access Layer ]
- Dynamic Masking
- Role-Based Access
|
v
[ Applications / Dashboard ]
- Limited Views
- User Consent Controls
[Avg. reading time: 11 minutes]
Auditing in IoT
Auditing in IoT means recording who did what, when, from where, and to which device or data so incidents can be investigated and compliance requirements can be met.
Why Auditing Matters
IoT environments are hard to trust because devices are distributed, long-lived, and often remotely managed.
Without proper audit trails, you cannot reliably answer:
- Who accessed sensitive data
- Who changed device configuration
- Which API triggered a device action
- Whether a firmware update was authorized
- How an incident spread across systems
What to Audit
Device Activity
- Device boot and shutdown events
- Sensor status changes
- Configuration changes
- Local authentication attempts
- Connectivity loss and recovery
- Error and fault conditions
Data Access
- Who accessed sensitive data
- What data was accessed
- When it was accessed
- Whether it was viewed, exported, modified, or deleted
API Usage
- Authentication attempts
- Token usage
- Read and write operations
- Bulk exports
- Failed requests
- Rate limit violations
Firmware and Remote Control
- Firmware update start and completion
- Firmware version changes
- Update source and signature verification result
- Remote commands issued to devices
- Command success or failure
Best Practices
Use Tamper-Resistant Logging
- Store logs in append-only or write-once storage
- Restrict log deletion and modification
- Digitally sign critical audit records where needed
Standardize Time
- Sync systems with NTP
- Use UTC timestamps consistently
- Record time with enough precision for investigations
Add Correlation IDs
- Attach a correlation ID to each request or workflow
- Propagate that ID across device, broker, API, processing, and dashboard layers
- This makes incident tracing much easier
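A sketch of a structured audit record carrying a correlation ID (field names and logger name are illustrative assumptions):

```python
import json, logging, time, uuid

audit_log = logging.getLogger("audit")

def audit_event(action, user, device, corr_id=None):
    """Emit one audit record; reuse corr_id to link events across layers."""
    record = {
        "ts_utc": time.time(),
        "corr_id": corr_id or str(uuid.uuid4()),
        "action": action,
        "user": user,
        "device": device,
    }
    audit_log.info(json.dumps(record))
    return record

cmd = audit_event("firmware_update_start", "admin@example.com", "sensor-42")
# downstream layers log with the same corr_id for easy tracing
audit_event("firmware_update_done", "admin@example.com", "sensor-42",
            corr_id=cmd["corr_id"])
```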
Log Enough, But Not Everything
- Capture security-relevant actions
- Avoid dumping unnecessary personal data into logs
- Mask or hash sensitive values when possible
Separate Audit Logs from Application Logs
- Application logs help debugging
- Audit logs support accountability, forensics, and compliance
- Do not mix them carelessly
Common Tools
ELK Stack
- Elastic for indexing and search
- Logstash for ingest and transformation
- Kibana for dashboards and investigation
Good for:
- Large-scale search
- Centralized log analytics
- Security investigations
Grafana
- Lightweight alternative for log aggregation and visualization
- Often simpler to operate than a full ELK stack
Good for:
- Smaller teams
- Cost-conscious environments
- Fast operational dashboards
Retention Policies
Retention should balance:
- Compliance needs
- Security investigation needs
- Storage cost
- Privacy risk
Example Retention Guidelines
| Data Type | Retention Period |
|---|---|
| Raw sensor data | 7 to 30 days |
| Aggregated metrics | 6 to 12 months |
| User consent logs | 5 to 7 years |
| Health-related regulated data | 6+ years, depending on policy and law |
Storage Strategy
Use tiered storage so data moves through stages over time:
- Hot for recent searchable data
- Warm for less frequently accessed data
- Cold for long-term retention
- Delete after policy expiry
Enforcement Mechanisms
- Object storage lifecycle policies
- Blob storage lifecycle rules
- Database TTL where supported
- Scheduled archival and purge jobs
InfluxDB and TTL
For time-series workloads, TTL-style retention is useful because raw IoT telemetry grows fast.
Typical pattern:
- Keep high-resolution raw data for a short period
- Downsample into hourly or daily aggregates
- Retain aggregates much longer
- Expire raw data automatically
This reduces:
- Storage cost
- Query load
- Compliance risk from over-retention
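In InfluxDB 1.x this pattern maps to retention policies plus a continuous query; a sketch assuming a database named `iot` (policy and measurement names are illustrative):

```sql
-- keep raw telemetry for 7 days
CREATE RETENTION POLICY "raw_7d" ON "iot" DURATION 7d REPLICATION 1 DEFAULT

-- keep downsampled aggregates for a year
CREATE RETENTION POLICY "agg_1y" ON "iot" DURATION 52w REPLICATION 1

-- downsample raw readings into hourly means
CREATE CONTINUOUS QUERY "cq_hourly" ON "iot" BEGIN
  SELECT mean("value") INTO "iot"."agg_1y"."temperature_hourly"
  FROM "iot"."raw_7d"."temperature" GROUP BY time(1h)
END
```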
[Avg. reading time: 0 minutes]
Edge Computing
[Avg. reading time: 6 minutes]
Introduction
Edge computing enables data processing closer to the data source, reducing latency, bandwidth usage, and dependency on centralized cloud systems.
It is increasingly critical in systems that require real-time decision-making, offline capability, and AI inference at the edge.
Use Cases
Autonomous Vehicles
Process sensor data locally for real-time decisions (braking, steering), while periodically syncing models and telemetry with the cloud.
Smart Cities
Traffic lights and surveillance systems process data locally to reduce latency, while aggregated insights are sent to the cloud for planning.
Industrial Automation
Machines perform real-time monitoring and anomaly detection at the edge, with cloud used for long-term analytics and optimization.
Healthcare
Wearables and medical devices analyze patient vitals locally for immediate alerts, reducing reliance on continuous connectivity.
Agriculture
IoT sensors process soil and weather data locally to trigger irrigation decisions, minimizing cloud dependency in remote areas.
Supply Chain / Warehousing
Edge systems track inventory and movement in real time, while cloud systems handle forecasting and optimization.
Edge vs Cloud Responsibility
| Layer | Responsibility |
|---|---|
| Edge | Real-time processing, filtering, immediate decisions |
| Fog | Aggregation, intermediate processing |
| Cloud | Storage, analytics, model training, long-term insights |
Popular Tools & Technologies
- AWS Greengrass
- Azure IoT Edge
- K3s (Lightweight Kubernetes for edge clusters)
- NVIDIA Jetson (Edge AI hardware)
- TensorFlow Lite / ONNX Runtime (Edge ML inference)
- Apache IoTDB (Time-series storage for IoT)
Challenges in Edge Computing
Security Risks
Devices are physically exposed and harder to secure than centralized systems.
Device Management
Firmware updates, patching, and lifecycle management across thousands of devices is complex.
Scalability
Coordinating distributed edge nodes requires robust orchestration.
Interoperability
Heterogeneous devices and protocols complicate integration.
Observability
Monitoring and debugging distributed edge systems is difficult.
Network Reliability
Systems must handle intermittent connectivity and operate offline.
Model Drift (AI Systems)
Edge-deployed models can degrade over time without proper retraining and updates.
[Avg. reading time: 8 minutes]
Edge Decision Patterns
Edge systems are not just about processing data; they are also about deciding what to handle locally versus what to send to the cloud.
Patterns
Filter at Edge
- Send only important data
- Example: Send temperature only if > 50°C
Aggregate at Edge
- Combine data before sending
- Example: Send hourly average instead of raw stream
Act at Edge
- Immediate action without cloud
- Example: Turn off machine if overheating
Forward to Cloud
- Send raw or enriched data for analytics
- Example: ML training data
Why it matters
- Reduces bandwidth
- Improves latency
- Avoids cloud dependency
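The Filter at Edge pattern above can be sketched in a few lines of Python. The 50°C threshold and the reading format are illustrative assumptions, not a prescribed schema:

```python
THRESHOLD_C = 50.0  # illustrative threshold from the example above

def filter_at_edge(readings):
    """Forward only readings that cross the threshold; drop the rest locally."""
    return [r for r in readings if r["temp_c"] > THRESHOLD_C]

readings = [
    {"sensor": "s1", "temp_c": 21.5},
    {"sensor": "s1", "temp_c": 55.2},
    {"sensor": "s2", "temp_c": 49.9},
]
to_cloud = filter_at_edge(readings)  # only the 55.2 reading is forwarded
```

The same shape works for Act at Edge: replace the list comprehension with a local side effect (e.g., shutting a machine down) before anything touches the network.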
Offline-First Edge Systems
Edge systems must assume network failure is normal.
Key Concepts
Local Buffering
- Store data locally when network is down
Retry Mechanisms
- Send data when connection is restored
Eventual Sync
- Edge and cloud will sync later
Example
A delivery truck loses connectivity:
- Continues tracking locally
- Syncs data when back online
Risk
- Data duplication
- Out-of-order events
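A minimal sketch of local buffering with eventual sync, as in the delivery-truck example. The bounded queue and drop-oldest overflow policy are assumptions; a real device might persist to flash instead:

```python
from collections import deque

class EdgeBuffer:
    """Buffer readings locally while offline; drain when connectivity returns."""

    def __init__(self, maxlen=1000):
        # maxlen is a crude overflow policy: oldest readings are dropped
        # when the buffer fills (one possible choice, not the only one).
        self.queue = deque(maxlen=maxlen)

    def record(self, reading):
        self.queue.append(reading)

    def sync(self, send):
        """Try to send buffered readings; keep whatever fails for next time."""
        while self.queue:
            reading = self.queue[0]
            if not send(reading):     # network still down
                return False
            self.queue.popleft()      # remove only after a successful send
        return True
```

Because a send can succeed on the broker side but fail before the `popleft`, the same reading may be sent twice after a crash, which is exactly the duplication risk noted above.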
Data Reduction at Edge
Sending all raw data to the cloud is expensive and often unnecessary.
Techniques
Sampling
- Send every Nth record
Thresholding
- Send only when values cross limits
Compression
- Reduce payload size
Feature Extraction
- Send insights instead of raw data
- Example: send “anomaly detected” instead of full signal
Benefit
- Lower bandwidth cost
- Faster processing
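Sampling and aggregation, the two simplest techniques above, can be sketched directly. The values and window are illustrative:

```python
def sample_every_nth(readings, n):
    """Sampling: keep every Nth record."""
    return readings[::n]

def window_average(readings):
    """Aggregation: one summary value instead of the raw stream."""
    return sum(readings) / len(readings)

raw = [20.0, 20.5, 21.0, 21.5, 22.0, 22.5]
print(sample_every_nth(raw, 2))  # [20.0, 21.0, 22.0]
print(window_average(raw))       # 21.25
```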
Edge AI (Inference at Edge)
Edge devices can run ML models locally.
What runs at Edge
- Image classification
- Anomaly detection
- Voice recognition
What stays in Cloud
- Model training
- Heavy computation
- Model updates
Example
Security camera:
- Detects person locally
- Sends alert instead of full video
Challenge
- Limited compute power
- Model updates across devices
Edge Failure Scenarios
Edge systems fail differently than cloud systems.
Common Failures
Device Failure
- Hardware crash
Network Loss
- No connectivity to cloud
Data Loss
- Buffer overflow or corruption
Clock Drift
- Incorrect timestamps
Design Considerations
- Retry logic
- Local storage
- Idempotent processing
- Time synchronization
Edge vs Fog vs Cloud
Edge
- Closest to device
- Real-time decisions
- Limited compute
Fog
- Intermediate layer
- Aggregation and coordination
Cloud
- Centralized
- Storage, analytics, ML training
Example
Smart factory:
- Edge: machine sensor detects anomaly
- Fog: aggregates factory data
- Cloud: long-term analytics
Event-Driven Edge Systems
Edge systems are typically event-driven.
What is an Event?
A change or trigger:
- Temperature exceeds threshold
- Motion detected
- Device status change
Flow
Device → Event → Edge Processing → Action / Cloud
Example
Motion sensor:
- Detects movement
- Triggers camera recording
- Sends alert
Benefit
- Efficient processing
- Real-time response
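The motion-sensor flow above can be sketched as a tiny event dispatcher. The handler names and event shape are illustrative assumptions:

```python
# Minimal event-driven dispatch: Device -> Event -> Edge Processing -> Action.

handlers = {}

def on(event_type):
    """Register a handler for one event type."""
    def register(fn):
        handlers.setdefault(event_type, []).append(fn)
        return fn
    return register

def emit(event):
    """Run every handler registered for this event's type."""
    return [fn(event) for fn in handlers.get(event["type"], [])]

@on("motion_detected")
def start_recording(event):
    return f"camera recording in {event['zone']}"

@on("motion_detected")
def send_alert(event):
    return f"alert sent for {event['zone']}"

print(emit({"type": "motion_detected", "zone": "garage"}))
# ['camera recording in garage', 'alert sent for garage']
```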
[Avg. reading time: 7 minutes]
Edge Data & Consistency Challenges
Edge systems introduce unique challenges in how data is generated, transmitted, and synchronized across distributed environments.
Unlike centralized systems, edge devices operate independently and may not always be connected to the cloud.
Latency vs Consistency Tradeoff
Edge systems prioritize low latency over strict consistency.
- Decisions must be made instantly at the edge
- Cloud may receive delayed or stale data
Example
Smart thermostat:
- Adjusts temperature immediately (edge)
- Cloud dashboard updates later
Key Insight
You cannot have both:
- Real-time responsiveness
- Perfect global consistency
Time and Ordering Issues
Edge-generated data may arrive out of order.
Why it happens
- Network delays
- Offline buffering
- Device clock differences
Example
Sensor readings:
- Event at 10:05 arrives first
- Event at 10:01 arrives later
Impact
- Incorrect analytics
- Misleading dashboards
Approach
- Use event time instead of arrival time
- Apply windowing or reordering logic
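The reordering approach above can be sketched with a small buffer that always releases the oldest event by event time. The window size and event shape are assumptions:

```python
def reorder_by_event_time(events, window=3):
    """Hold up to `window` events, releasing the oldest by event time first."""
    buffer, ordered = [], []
    for ev in events:  # events arrive in ARRIVAL order
        buffer.append(ev)
        buffer.sort(key=lambda e: e["event_time"])
        if len(buffer) > window:
            ordered.append(buffer.pop(0))  # oldest event time leaves first
    ordered.extend(buffer)  # flush remaining events at end of stream
    return ordered

arrivals = [
    {"event_time": "10:05", "value": 3},
    {"event_time": "10:01", "value": 1},  # late arrival
    {"event_time": "10:03", "value": 2},
]
print([e["event_time"] for e in reorder_by_event_time(arrivals)])
# ['10:01', '10:03', '10:05']
```

An event arriving later than the window allows would still be emitted out of order; stream processors handle that case with explicit late-data policies.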
Idempotency and Duplicate Handling
Edge systems often retry sending data, leading to duplicates.
Why duplicates occur
- Network retries
- Device reconnects
- Message acknowledgment failures
Problem
- Same event processed multiple times
Solution
- Use unique event IDs
- Ensure operations are idempotent
Example
Inventory update should not be applied twice for the same scan.
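One minimal sketch of the inventory example: deduplicate by unique event ID so a retried scan is safe to process twice. The in-memory set stands in for whatever durable store a real system would use:

```python
processed_ids = set()          # stand-in for a durable dedupe store
stock = {"item-1": 10}

def apply_scan(event):
    """Idempotent inventory update: a duplicate scan is a no-op."""
    if event["event_id"] in processed_ids:
        return stock[event["item"]]   # already applied
    processed_ids.add(event["event_id"])
    stock[event["item"]] += event["delta"]
    return stock[event["item"]]

scan = {"event_id": "e-42", "item": "item-1", "delta": -1}
apply_scan(scan)        # stock -> 9
apply_scan(scan)        # retry of the same event: still 9
print(stock["item-1"])  # 9
```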
State Management at Edge
Edge devices maintain local state that may differ from cloud state.
Types of State
Transient State
- Buffers, queues, temporary storage
Persistent State
- Device configuration
- Local logs
Challenge
- Keeping edge and cloud in sync
Example
Warehouse scanner:
- Updates stock locally
- Syncs later with central system
Offline Data Synchronization
Edge systems must handle delayed synchronization with the cloud.
Behavior
- Store data locally
- Sync when connectivity is restored
Risks
- Duplicate data
- Conflicts between edge and cloud state
Strategy
- Conflict resolution rules
- Versioning or timestamps
Data Integrity and Loss
Data can be lost or corrupted at the edge.
Causes
- Power failure
- Storage limits
- Device crashes
Mitigation
- Local persistence
- Checkpointing
- Retry mechanisms
Summary
Edge systems require careful handling of:
- Inconsistent data
- Out-of-order events
- Duplicate messages
- Local vs global state
Designing for these challenges is critical for building reliable edge architectures.
[Avg. reading time: 9 minutes]
Edge System Design Checklist
Designing edge systems requires balancing latency, reliability, cost, and complexity.
This checklist provides a structured way to evaluate and design edge architectures.
1. Define the Objective
- What decision needs to be made at the edge?
- What is the acceptable latency?
- What happens if the system is offline?
Example
- Real-time alert → must run at edge
- Daily report → can be handled in cloud
2. Decide What Runs Where
Clearly separate responsibilities across layers.
| Layer | Responsibility |
|---|---|
| Edge | Real-time processing, filtering, immediate action |
| Fog | Aggregation, coordination |
| Cloud | Storage, analytics, model training |
Key Question
- Does this require immediate action?
- Yes → Edge
- No → Cloud
3. Handle Offline Scenarios
Assume network failure is normal.
- Can the system operate without cloud?
- How long can data be stored locally?
- What happens when storage is full?
Design Patterns
- Local buffering
- Retry with backoff
- Eventual synchronization
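Retry with backoff, the middle pattern above, can be sketched as follows. The retry count and delays are illustrative defaults:

```python
import time

def send_with_backoff(send, payload, retries=5, base_delay=0.1):
    """Retry a flaky send, doubling the wait after each failure."""
    for attempt in range(retries):
        if send(payload):
            return True
        time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
    return False  # give up; caller falls back to local buffering
```

Exponential backoff matters at fleet scale: if thousands of devices retry on a fixed interval after an outage, they reconnect in synchronized waves and can knock the backend over again.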
4. Design for Data Flow
Define how data moves through the system.
- What data is filtered at edge?
- What is aggregated?
- What is sent to cloud?
Checklist
- Avoid sending raw high-volume data
- Send only meaningful events or summaries
5. Plan for Failures
Edge systems fail frequently and unpredictably.
Common Failures
- Device crash
- Network loss
- Data corruption
Design Requirements
- Retry logic
- Local persistence
- Graceful degradation
6. Ensure Idempotency
Duplicate events are unavoidable.
- Can the same message be processed multiple times safely?
- Are unique IDs used for events?
Rule
- Every operation should be safe to repeat
7. Handle Time and Ordering
Data may arrive out of order.
- Are you using event time or arrival time?
- Can late-arriving data be handled?
Approach
- Use timestamps
- Allow reordering or windowing
8. Manage State
Edge devices maintain local state.
- What state is stored locally?
- How is it synced with the cloud?
Considerations
- State conflicts
- Versioning
- Recovery after restart
9. Design for Security
Edge devices are exposed and vulnerable.
- Is data encrypted in transit?
- Are devices authenticated?
- Can devices be compromised physically?
Minimum Requirements
- Secure communication (TLS)
- Device identity
- Access control
10. Plan Observability
You cannot fix what you cannot see.
- Can you monitor device health?
- Are logs available centrally?
- Can failures be traced?
Metrics to Track
- Device uptime
- Data throughput
- Error rates
11. Consider Cost Tradeoffs
Edge shifts cost from cloud to devices.
- Is edge hardware justified?
- Is bandwidth reduction significant?
Example
- Video streaming → process at edge, send alerts only
12. Think About Scale
Edge systems grow fast.
- Can you manage thousands of devices?
- How are updates deployed?
Challenges
- Firmware updates
- Configuration management
- Fleet monitoring
Final Thought
A good edge system is not just about processing data locally.
It is about designing for:
- Unreliable networks
- Distributed state
- Continuous failure
The best designs assume things will break and still work.
[Avg. reading time: 1 minute]
IoT Cloud Computing
- Introduction
- Consistency Models
- Cloud Services
- IoT Cloud Services
- High Availability
- Disaster Recovery
- Pros and Cons
- IFTTT
[Avg. reading time: 9 minutes]
IoT Cloud Computing
Definitions
Hardware: physical computers, equipment, and devices
Software: programs such as operating systems, Word, Excel
Web Site: read-only web pages such as company pages, portfolios, newspapers
Web Application: read-write pages such as online forms, Google Docs, email, Google apps
Advantages of Cloud for IoT
| Category | Advantage | Description |
|---|---|---|
| Scalability | Elastic infrastructure | Easily handle millions of IoT devices and sudden traffic spikes |
| Storage | Virtually unlimited data storage | Ideal for time-series sensor data, logs, images, video streams |
| Processing Power | High compute availability | Offload heavy ML, analytics, and batch processing to cloud |
| Integration | Seamless with APIs, services | Easily connect to AI/ML tools, databases, event processing |
| Cost Efficiency | Pay-as-you-go model | No upfront infra cost; optimize for usage |
| Global Reach | Edge zones and regional data centers | Connect globally distributed devices with low latency |
| Security | Built-in IAM, encryption, monitoring | Token-based auth, TLS, audit logs, VPCs |
| Rapid Development | PaaS tools and SDKs | Build, test, deploy faster using managed services |
| Maintenance-Free | No server management | Cloud handles uptime, patches, scaling |
| Disaster Recovery | Redundancy and backup | Automatic replication and geo-failover |
| Data Analytics | Integrated analytics platforms | Use BigQuery, Databricks, AWS Athena etc. for deep insights |
| Updates & OTA | Easy over-the-air updates to devices | Roll out firmware/software updates via cloud |
| Digital Twins | Model, simulate, and control remotely | Create cloud-based digital representations of devices/systems |
Types of Cloud Computing in IoT Context
Public Cloud (AWS, Azure, GCP, etc.)
Usage: Most common for IoT startups, scale-outs, and global deployments.
- Easy to onboard devices via managed IoT hubs
- Global reach with edge zones
- Rich AI/ML toolsets (SageMaker, Azure ML, etc.)
Example: A smart home company using AWS IoT Core + DynamoDB.
Private Cloud
Usage: Enterprises with strict data policies (e.g., manufacturing, healthcare).
- More control over data residency
- Can comply with HIPAA, GDPR, etc.
- Custom security and network setups
Example: A hospital managing patient monitoring devices on their private OpenStack cloud.
Hybrid Cloud
Usage: Popular in industrial IoT (IIoT) and smart infrastructure.
- Store sensitive data on-prem (private), offload non-critical analytics to cloud (public)
- Low latency control at the edge, cloud for training ML models
Example: A smart grid using on-prem SCADA + Azure for demand prediction.
Cloud Types in IoT – Comparison
| Cloud Type | Description | IoT Example | Advantages | Ideal For |
|---|---|---|---|---|
| Public Cloud | Hosted by providers like AWS, Azure, GCP | Smart home devices using AWS IoT Core | Scalable, global reach, pay-as-you-go | Startups, large-scale consumer IoT |
| Private Cloud | Dedicated infra for one org (e.g., on-prem OpenStack) | Hospital storing patient monitoring data securely | More control, security, compliance | Healthcare, government, regulated industries |
| Hybrid Cloud | Mix of public + private with data/apps moving between | Factory with local control + cloud analytics | Flexibility, optimized costs, lower latency | Industrial IoT, utilities, smart cities |
[Avg. reading time: 10 minutes]
Consistency Models
Eventual Consistency
A model where updates to data propagate across distributed nodes asynchronously. Temporary inconsistencies are allowed, but all replicas will eventually converge to the same state.
Example: A smart vehicle updates its GPS location while offline. The cloud reflects the update once connectivity is restored.
Use Cases
- Smart home devices
- Vehicle tracking systems
- Environmental monitoring
Limitations
- Not suitable for financial systems or real-time critical decisions
Read-Your-Writes (RYW)
Once a client performs a write, all subsequent reads by that client must reflect that write.
Example: A user turns OFF a smart light and immediately sees the updated OFF state in the app.
Use Cases
- Device control systems
- User-facing dashboards
Limitations
- Requires session or client-level tracking
Monotonic Reads
Once a value is read, subsequent reads should never return an older value.
Example
| Time | Reading |
|---|---|
| 10:00 | 102 kWh |
| 10:01 | 103 kWh |
| 10:02 | 101 kWh ❌ |
| 10:03 | 104 kWh |
Use Cases
- Energy meters
- GPS tracking
- Time-series monitoring
Limitations
- Requires ordering guarantees across replicas
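A minimal client-side sketch of the monotonic-reads guarantee, matching the kWh table above. Serving the cached last value when a replica regresses is one possible policy (an assumption); another is to wait for a fresher replica:

```python
class MonotonicReader:
    """Never return a value older than the last one served."""

    def __init__(self):
        self.last = None

    def read(self, value_from_replica):
        if self.last is not None and value_from_replica < self.last:
            return self.last          # suppress the backward read
        self.last = value_from_replica
        return value_from_replica

r = MonotonicReader()
print([r.read(v) for v in [102, 103, 101, 104]])  # [102, 103, 103, 104]
```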
Causal Consistency
Ensures that causally related operations are observed in the correct order across the system.
Example
Door opened → Alarm disabled
If reversed, system behavior becomes incorrect.
Use Cases
- Security systems
- Workflow-based automation
Limitations
- Harder to implement than eventual consistency
Last Write Wins (LWW)
When multiple updates occur, the update with the most recent timestamp overwrites previous values.
Example: Two users control the same smart light. The latest command determines the final state.
Use Cases
- Smart home controls
- IoT dashboards
Limitations
- Risk of losing valid updates due to clock skew
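LWW reduces to picking the update with the largest timestamp. The timestamps and update shape below are illustrative:

```python
def last_write_wins(updates):
    """Resolve conflicting updates by keeping the latest timestamp."""
    return max(updates, key=lambda u: u["ts"])

updates = [
    {"ts": 1700000010, "state": "ON",  "by": "user_a"},
    {"ts": 1700000012, "state": "OFF", "by": "user_b"},
]
print(last_write_wins(updates)["state"])  # OFF
```

Note that the comparison is only as trustworthy as the clocks producing `ts`; with skewed device clocks, an older write can "win", which is the limitation listed above.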
Optimistic Concurrency
Allows multiple updates without locking resources. Conflicts are detected after execution, and one operation may need to retry.
Example
| item_id | item_nm | stock |
|---|---|---|
| 1 | Apple | 10 |
Two users update simultaneously:
- +5 and -3 applied concurrently
- Conflict detected → one retries
Use Cases
- Low-conflict environments
- User-driven updates
Limitations
- Not suitable for high-frequency concurrent writes
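A compare-and-set sketch of the stock example above, using a version column to detect conflicts after the fact. The dict stands in for a database row:

```python
row = {"item_id": 1, "item_nm": "Apple", "stock": 10, "version": 1}

def update_stock(row, delta, expected_version):
    """Apply the change only if nobody else wrote since we read."""
    if row["version"] != expected_version:
        return False                  # conflict detected: caller must retry
    row["stock"] += delta
    row["version"] += 1
    return True

v = row["version"]
update_stock(row, +5, v)       # first writer succeeds -> stock 15
ok = update_stock(row, -3, v)  # second writer holds a stale version -> conflict
if not ok:
    update_stock(row, -3, row["version"])  # re-read and retry
print(row["stock"])  # 12
```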
Strong Consistency
All reads return the most recent write immediately across all nodes.
Example: A bank transaction reflects instantly across all systems.
Use Cases
- Financial systems
- Critical control systems
Limitations
- Higher latency
- Reduced availability in distributed systems
Session Consistency
Guarantees consistency within a single session but not across different clients.
Example: A user sees consistent device state during a session, but another user may see stale data.
Use Cases
- Mobile apps
- User-specific IoT dashboards
Limitations
- Not globally consistent
Bounded Staleness
Allows reads to lag behind writes by a defined time or number of versions.
Example: A dashboard may show data up to 5 seconds old.
Use Cases
- Monitoring dashboards
- Analytics systems
Limitations
- Requires defining acceptable staleness window
Term Mapping in IoT Context
| Concept | Relevance in IoT |
|---|---|
| Eventual Consistency | Edge devices syncing after offline periods |
| Read-Your-Writes | Immediate feedback for device control |
| Monotonic Reads | Prevents backward movement in sensor readings |
| Causal Consistency | Maintains correct event order in automation |
| Last Write Wins | Resolves conflicting device updates |
| Optimistic Concurrency | Handles rare update conflicts |
| Strong Consistency | Required for critical operations |
| Session Consistency | Ensures stable user experience |
| Bounded Staleness | Balances freshness and performance |
[Avg. reading time: 5 minutes]
Cloud Services
SaaS – Software as a Service
SaaS provides ready-to-use cloud applications. Example: Google Docs, Gmail. In IoT, it offers real-time dashboards, alerts, and analytics.
Pros
- No infrastructure management
- Fast deployment
- Built-in analytics and alerts
Cons
- Limited customization
- Possible vendor lock-in
- Data stored in vendor cloud
PaaS – Platform as a Service
PaaS provides the tools and services to build and deploy IoT apps, including SDKs, APIs, device management, rules engines, and ML pipelines.
Example: HiveMQ (MQTT)
Pros
- Scalable and customizable
- Device lifecycle and security handled
- Integration with ML, analytics tools
Cons
- Learning curve
- Requires cloud expertise
- Still dependent on vendor ecosystem
IaaS – Infrastructure as a Service
IaaS gives you virtual machines, storage, and networking. In IoT, it lets you build fully custom pipelines from scratch.
Example: Virtual Machine
Pros
- Full control over environment
- Highly customizable
- Can install any software
Cons
- You manage everything: scaling, patching, backups
- Not beginner-friendly
- Higher ops burden
FaaS – Function as a Service
FaaS lets you run small pieces of code (functions) in response to events, like an MQTT message or sensor spike. Also called serverless computing.
Example: AWS Lambda, Azure Functions
When a temperature sensor sends a value > 90°C to MQTT, a Lambda function triggers an alert and stores the value in a DB.
Pros
- No need to manage servers
- Scales automatically
- Event-driven and cost-effective
Cons
- Cold start delays
- Limited execution time and memory
- Stateless only
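The temperature-alert flow described above can be sketched in the shape of an AWS Lambda Python handler. `store_reading` and `send_alert` are hypothetical placeholders for whatever database and notification services a real deployment would use:

```python
ALERT_THRESHOLD_C = 90.0  # threshold from the example above

def store_reading(reading):
    # Hypothetical stand-in for a real database write.
    print(f"stored: {reading}")

def send_alert(reading):
    # Hypothetical stand-in for a real notification service.
    print(f"ALERT: {reading['temp_c']}C on {reading['sensor']}")

def lambda_handler(event, context):
    """Entry point invoked per incoming MQTT message (event shape assumed)."""
    reading = {"sensor": event["sensor"], "temp_c": float(event["temp_c"])}
    store_reading(reading)
    if reading["temp_c"] > ALERT_THRESHOLD_C:
        send_alert(reading)
        return {"status": "alert_sent"}
    return {"status": "stored"}

print(lambda_handler({"sensor": "boiler-1", "temp_c": "95.5"}, None))
```

The function holds no state between invocations, which is why the "stateless only" limitation above matters: anything that must survive the call has to be written to external storage.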

[Avg. reading time: 10 minutes]
IoT Cloud Services
BaaS – Backend as a Service
BaaS provides backend features like authentication, real-time databases, and cloud functions, useful for mobile or lightweight IoT apps.
Example: Firebase. To some extent OAuth services like Google.
Pros
- Easy to integrate with mobile/web apps
- Realtime sync and authentication
- Fast prototyping
Cons
- Not designed for heavy industrial use
- Vendor limitations on structure/storage
- Less control over backend logic
DaaS – Device as a Service
DaaS bundles hardware devices with software, support, and cloud services, often with subscription billing.
Example: A logistics company rents connected GPS trackers from a provider, who also offers a dashboard and device monitoring as part of the plan.
Analogy: renting (a house, a car, etc.)
Pros
- No hardware management
- Subscription model (OpEx instead of CapEx)
- Full-stack support
Cons
- Ongoing cost
- Tied to specific hardware/software ecosystem
- Less flexibility
Edge-aaS – Edge-as-a-Service
Edge-aaS enables local processing at the edge, closer to IoT devices. It reduces latency and bandwidth usage by handling logic locally.
Example: AWS Greengrass

Run Everything Locally
- Camera sends input to Pi
- Greengrass Lambda processes it in real time
- Result (e.g., “object: person”) can be:
  - Logged locally
  - Sent to AWS via MQTT
  - Used to trigger a message or alert
Pros
- Low latency, offline capable
- Reduces cloud traffic and cost
- Supports on-device inference
Cons
- More complex deployment
- Device resource limitations
- Must sync carefully with cloud
DTaaS – Digital Twin as a Service
DTaaS offers cloud-hosted platforms to create, manage, and simulate digital replicas of physical systems (machines, buildings, etc.).
Example: Siemens MindSphere
A manufacturing firm models its conveyor system using MindSphere to monitor, predict failures, and optimize throughput using simulated conditions.
Analogy: a flight or video game simulator
Pros
- Powerful simulation and monitoring
- Real-time mirroring of assets
- Integrates well with AI/ML
Cons
- Complex to model accurately
- Requires continuous data flow
- Can be costly at scale
Cloud Service Models for IoT
| Service Model | Full Form | IoT-Specific Role/Usage | Examples |
|---|---|---|---|
| SaaS | Software as a Service | Ready-to-use IoT dashboards, analytics, asset tracking | Ubidots, ThingSpeak, AWS SiteWise, Azure IoT Central |
| PaaS | Platform as a Service | Build, deploy, manage IoT apps with SDKs and device APIs | Azure IoT Hub, AWS IoT Core, Google Cloud IoT (legacy), Kaa IoT |
| IaaS | Infrastructure as a Service | Run VMs, store raw sensor data, scale infra | AWS EC2, Azure VMs, GCP Compute Engine |
| FaaS | Function as a Service | Event-driven micro-processing (e.g., react to MQTT events) | AWS Lambda, Azure Functions, Google Cloud Functions |
| DaaS | Device as a Service | Subscription-based hardware + cloud updates | Cisco DaaS, HP DaaS |
| BaaS | Backend as a Service | Auth, DB, messaging backend for IoT apps | Firebase, Parse Platform |
| Edge-aaS | Edge-as-a-Service | Run ML + logic at the edge, sync with cloud | AWS Greengrass, Azure IoT Edge, ClearBlade |
| DTaaS | Digital Twin as a Service | Simulate, monitor, and control physical devices virtually | Siemens MindSphere, PTC ThingWorx |
[Avg. reading time: 9 minutes]
High Availability
High Availability refers to how much uptime (availability) a system guarantees over a period — usually per year.
It’s expressed using “nines” — like 99%, 99.9%, etc. More 9’s = Less downtime.
Availability Formula
- Availability = (Total Time - Downtime) / Total Time
This formula is used in SLAs and monitoring systems to measure system reliability.
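The formula turns an availability target directly into a downtime budget; a short sketch reproduces the "nines" table below:

```python
def allowed_downtime_hours(availability_pct, period_hours=365 * 24):
    """Downtime budget implied by an availability target over a period."""
    return period_hours * (1 - availability_pct / 100)

for nines in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_hours(nines) * 60
    print(f"{nines}% -> {minutes:.1f} minutes/year")
```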
High Availability – Nines and Downtime
| Availability | Name | Allowed Downtime per Year | Per Month | Use Case Example |
|---|---|---|---|---|
| 99% | “Two nines” | ~3.65 days | ~7.2 hours | Small apps, dev/test environments |
| 99.9% | “Three nines” | ~8.76 hours | ~43.8 mins | Basic web services |
| 99.99% | “Four nines” | ~52.6 minutes | ~4.38 mins | Payment systems, APIs |
| 99.999% | “Five nines” | ~5.26 minutes | ~26.3 seconds | Medical, Telco, IoT control loops |
| 99.9999% | “Six nines” | ~31.5 seconds | ~2.63 seconds | Mission-critical systems |
How High Availability is Achieved
- Redundancy (multiple servers or instances)
- Failover mechanisms (automatic switching)
- Load balancing
- No single point of failure
- Multi-region deployments
- Continuous monitoring and auto-recovery
For IoT
- Smart Home Light Bulb → 99% is okay (a few hours of downtime is fine)
- Smart Grid Control System → 99.999% is essential (every second counts)
- Medical IoT (e.g., Heart Monitor) → Needs high availability
Beyond Just Nines
| Concept | Why It Matters in IoT + Cloud |
|---|---|
| Redundancy | Backup sensors, edge nodes, and cloud instances ensure system keeps running if one fails |
| Failover Systems | Automatically switch to standby components during failure |
| Load Balancing | Spreads traffic across devices or cloud zones to prevent overload |
| Latency vs Availability | A service may be “up” but still slow — availability ≠ performance |
| Disaster Recovery (DR) | Ensures systems and data can recover from outages or disasters |
| Geographic Distribution | Spreading across regions/availability zones improves uptime and resilience |
| SLA (Service Level Agreement) | Understand what cloud vendors promise and what downtime you’re actually allowed |
| Edge Processing | Enables critical operations to continue even if cloud is unreachable (e.g., AWS Greengrass) |
| Monitoring & Alerting | Detect and respond to failures fast using tools like CloudWatch, Datadog, Prometheus |
| Cost vs HA Tradeoff | Higher availability usually means higher costs — design smart based on use case |
Fun Discussion Pointers
For each system below, decide whether it needs edge or fog computing, whether it should use the cloud, and if so, how many 9's it requires.
- How many 9's do we need for a smart light switch at home?
- How many 9's do we need for a smart light switch at a bank ATM?
- A temperature sensor on a cold-storage truck is sending data to the cloud.
- You're designing an IoT wearable for elderly patients that detects falls. What should the design look like?
- What happens if the MQTT broker goes down? How would you make it fault-tolerant?
- A weather station publishes sensor data every 15 minutes. Does it need a highly available system?
[Avg. reading time: 12 minutes]
Disaster Recovery
What is Disaster Recovery in IoT?
Disaster Recovery (DR) in IoT refers to the process of restoring devices, communication, and data pipelines after failures affecting both physical and digital components.
These failures include:
- Device crashes or firmware corruption
- Network outages (edge ↔ cloud disconnect)
- Gateway / Fog node failures
- Cloud region outages
- Cyberattacks (e.g., ransomware, botnets)
Disaster Recovery vs High Availability (HA)
High Availability (HA)
- Focuses on preventing downtime
- Systems continue running with minimal interruption
Disaster Recovery (DR)
- Focuses on recovering after failure
- Accepts downtime but minimizes recovery impact
Simple View:
- HA = Avoid failure
- DR = Recover from failure
Why Disaster Recovery is Important in IoT
Physical Impact
- Failures can affect real-world systems
- Example: Smart grid, healthcare devices
Device State Recovery
- Requires restoring firmware, configs, and device identity
Connectivity Constraints
- Devices may go offline frequently
Data Integrity
- Missing telemetry can impact analytics and ML models
Types of Disaster Recovery Strategies
1. Backup and Restore
- Periodic backups of data and configurations
- Systems restored after failure
Pros:
- Low cost
- Simple implementation
Cons:
- High recovery time
- Possible data loss
Example:
Smart home system restoring device configs from cloud backup
2. Pilot Light
- Minimal system always running in another region
- Scaled up during disaster
Pros:
- Faster recovery than backup
- Cost-efficient
Cons:
- Requires scaling during recovery
Example:
IoT backend with minimal services active in secondary region
3. Warm Standby
- Fully functional but scaled-down system running
Pros:
- Faster recovery
- Moderate cost
Cons:
- Not instant failover
Example:
Industrial monitoring system with standby cloud environment
4. Active-Active (Multi-Region)
- Systems run simultaneously across regions
Pros:
- Near-zero downtime
- High resilience
Cons:
- High cost
- Complex architecture
Example:
Healthcare IoT system monitoring patients in real time
IoT-Specific Recovery Layers
Device-Level Recovery
- Local buffering of data
- Firmware rollback
- Auto-reconnect mechanisms
Example:
Sensor stores readings locally during outage and syncs later
Edge / Fog Recovery
- Redundant gateways
- Local processing fallback
- Sync to cloud after recovery
Example:
Factory continues operations using edge analytics
Cloud Recovery
- Multi-region deployment
- Broker failover (MQTT cluster)
- Stream processing recovery
Example:
Traffic rerouted to secondary region after outage
End-to-End Recovery
- Restore full pipeline (Device → Edge → Cloud)
- Replay missed data
- Restore dashboards and alerts
Example:
Fleet tracking system reconstructs missed routes
Key Concepts
RTO (Recovery Time Objective)
- Maximum acceptable time to restore system
Examples:
- Smart home: Minutes
- Healthcare device: Seconds
RPO (Recovery Point Objective)
- Maximum acceptable data loss
Examples:
- Weather station: Few minutes acceptable
- ICU monitor: Near zero
Backup Types
- Full Backup – Entire dataset and configurations
- Incremental Backup – Changes since last backup
- Differential Backup – Changes since last full backup
Replication
Synchronous Replication
- Data written to multiple locations simultaneously
- Low data loss, higher latency
Asynchronous Replication
- Data replicated with delay
- Faster, but risk of data loss
Disaster Recovery in Cloud for IoT
- Multi-region deployments
- Managed IoT services and brokers
- Automated backups
- Infrastructure as Code (IaC)
Example:
- Primary region processes IoT data
- Secondary region maintains backup/standby system
Common Challenges
- Device firmware inconsistencies
- Offline data conflicts during sync
- Broker single point of failure
- Data consistency issues
- Human error during recovery
Best Practices
- Define clear RTO and RPO targets
- Design offline-first devices
- Implement edge buffering and replay mechanisms
- Use multi-region deployments
- Maintain device state/shadow in cloud
- Automate backups and recovery
- Regularly test disaster recovery plans
Summary
Disaster Recovery in IoT ensures systems can recover across:
- Devices
- Communication layers
- Data pipelines
- Cloud infrastructure
A strong DR strategy minimizes downtime, protects data, and maintains continuity of real-world operations.
[Avg. reading time: 9 minutes]
IoT Cloud – Pros and Cons
Pros
1. Scalability
Cloud platforms can automatically scale to handle millions of devices and events.
Example:
A smart city traffic system can scale from 1,000 sensors to 1 million without redesigning infrastructure.
2. Data Storage & Processing
Virtually unlimited storage with built-in analytics and processing capabilities.
Example:
A fleet management system stores years of GPS and telemetry data to analyze driving patterns and fuel efficiency.
3. Integrated Services
Cloud providers offer ready-made services like ML, streaming, APIs, and dashboards.
Example:
An IoT healthcare app uses cloud ML services to detect anomalies in patient heart rate data in real time.
4. Rapid Development
Developers can build and deploy solutions quickly without managing infrastructure.
Example:
A startup builds a smart irrigation system using managed MQTT brokers and serverless functions within days.
5. Remote Access
Devices and data can be accessed from anywhere.
Example:
A factory manager monitors machine health across multiple plants using a centralized dashboard.
6. Built-in Security Features
Cloud platforms provide encryption, IAM, monitoring, and compliance tools.
Example:
Devices authenticate using certificates, and all data is encrypted using TLS before reaching the cloud.
7. Disaster Recovery & Reliability
Cloud systems offer high availability, backups, and failover mechanisms.
Example:
If one region fails, IoT data pipelines automatically switch to another region with minimal downtime.
Cons
1. Latency
Cloud communication introduces delays, especially for real-time or critical operations.
Example:
An autonomous vehicle cannot rely on cloud decisions for braking due to network delay.
2. Connectivity Dependency
IoT systems depend heavily on stable internet connectivity.
Example:
A smart home system fails to respond if the internet goes down.
3. Privacy Concerns
Sensitive data is transmitted and stored externally, increasing exposure risk.
Example:
Wearable devices sending health data to cloud servers may raise compliance concerns (HIPAA/GDPR).
4. Recurring Costs
Cloud usage incurs ongoing costs for storage, compute, and data transfer.
Example:
A video surveillance system streaming continuously to the cloud results in high monthly bills.
5. Vendor Lock-In
Heavy reliance on a specific cloud provider makes migration difficult.
Example:
Using proprietary IoT services (like device twins or rules engine) makes switching providers complex.
6. System Complexity
Managing distributed systems across device, edge, and cloud increases architectural complexity.
Example:
Debugging data loss across device → gateway → cloud pipeline can be challenging.
7. Data Transfer Costs
Frequent data movement between devices and cloud can become expensive.
Example:
Streaming raw sensor data every second instead of aggregating at the edge increases bandwidth costs significantly.
Summary
| Pros | Cons |
|---|---|
| Scalability | Latency |
| Data Storage | Connectivity Dependency |
| Integrated Services | Privacy Concerns |
| Rapid Development | Recurring Costs |
| Remote Access | Vendor Lock-In |
| Security Features | Complexity |
| Disaster Recovery | Data Transfer Costs |
[Avg. reading time: 2 minutes]
IFTTT
If This Then That
IFTTT (If This Then That) is primarily a cloud-based automation platform that connects various web services and devices to enable users to create simple conditional statements, known as applets. These applets allow one service or device to trigger actions in another, facilitating automation across different platforms.
IFTTT facilitates communication between cloud services and edge devices, enabling users to create automations that leverage both cloud-based processing and local edge computing capabilities. However, the core functionality of IFTTT itself remains cloud-centric.
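The trigger-action ("applet") pattern that IFTTT popularized can be sketched locally. All names below (`applet`, `dispatch`, the event shape) are hypothetical and are not the real IFTTT API; the point is just the conditional if-this-then-that structure.

```python
# Minimal trigger -> action registry, mimicking the applet idea locally.
applets = []

def applet(trigger, action):
    """Register a rule: when trigger(event) is true, run action(event)."""
    applets.append((trigger, action))

def dispatch(event):
    """Evaluate every registered applet against an incoming event."""
    fired = []
    for trigger, action in applets:
        if trigger(event):
            fired.append(action(event))
    return fired

# "If the living-room temperature rises above 30 C, then send an alert."
applet(
    trigger=lambda e: e["sensor"] == "living_room_temp" and e["value"] > 30,
    action=lambda e: f"ALERT: living room at {e['value']} C",
)

print(dispatch({"sensor": "living_room_temp", "value": 32}))
# -> ['ALERT: living room at 32 C']
```

In the real service, the trigger and action sides live in different cloud services (e.g., a weather feed triggering a smart plug), with IFTTT's cloud doing the dispatching.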
[Avg. reading time: 4 minutes]
Good Reads
ESP32 + MicroPython: https://github.com/gchandra10/esp32-demo
IoT Arduino Projects
IoT Related Tools
MQTT Explorer
GUI desktop tool for inspecting MQTT topics & messages
Wokwi
Online Arduino + ESP32 simulator. No hardware needed. VSCode / JetBrains supported.
Node-RED
Visual flow-based tool for IoT logic and automation
Career Path
Example: Roadmap for Python Learning
Cloud Providers
Run and code Python in the cloud. Free and affordable plans are good for demonstrations during interviews.
Cheap/Affordable GPUs for AI Workloads
AI Tools
---
[Avg. reading time: 4 minutes]
Notebooks vs IDE
| Feature | Notebooks (.ipynb) | Python Scripts (.py) |
|---|---|---|
| Use Case - DE | Quick prototyping, visualizing intermediate steps | Production-grade ETL, orchestration scripts |
| Use Case - DS | EDA, model training, visualization | Packaging models, deployment scripts |
| Interactivity | High – ideal for step-by-step execution | Low – executed as a whole |
| Visualization | Built-in (matplotlib, seaborn, plotly support) | Needs explicit code to save/show plots |
| Version Control | Harder to diff and merge | Easy to diff/merge in Git |
| Reusability | Lower, unless modularized | High – can be organized into functions, modules |
| Execution Context | Cell-based execution | Linear, top-to-bottom |
| Production Readiness | Poor (unless using tools like Papermill, nbconvert) | High – standard for CI/CD & Airflow etc. |
| Debugging | Easy with cell-wise changes | Needs breakpoints/logging |
| Integration | Jupyter, Colab, Databricks Notebooks | Any IDE (VSCode, PyCharm), scheduler integration |
| Documentation & Teaching | Markdown + code | Docstrings and comments only |
| Unit Tests | Not practical | Easily written using pytest, unittest |
| Package Management | Ad hoc, via %pip, %conda | Managed via requirements.txt, poetry, pipenv |
| Using Libraries | Easy for experimentation, auto-reloads supported | Cleaner imports, better for dependency resolution |
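The "Reusability" and "Unit Tests" rows above are easiest to see with a small example. The file names below are hypothetical; the same conversion logic often starts life in a notebook cell, but once it lives in a `.py` module it can be imported anywhere and tested with pytest.

```python
# sensor_utils.py (hypothetical module) -- importable, diffable, testable.

def celsius_to_fahrenheit(c: float) -> float:
    """Convert a Celsius reading to Fahrenheit."""
    return c * 9 / 5 + 32

# test_sensor_utils.py (hypothetical) -- discoverable by `pytest`;
# there is no practical equivalent for ad hoc notebook cells.
def test_celsius_to_fahrenheit():
    assert celsius_to_fahrenheit(0) == 32.0
    assert celsius_to_fahrenheit(100) == 212.0

test_celsius_to_fahrenheit()  # runs silently when the assertions pass
```

In a notebook, the same function would typically be redefined per session and verified by eyeballing a cell's output, which is fine for EDA but fragile for production pipelines.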
[Avg. reading time: 5 minutes]
Assignments
Note 1: LinkedIn Learning is Free for Rowan Students.
Note 2: Submissions should be LinkedIn Learning certificate URLs (no screenshots, Google Docs, or Drive links).
Assignment 1 - Python
Assignment 2 - Ethical Hacking IoT Devices
Assignment 3 - Learning Git and GitHub
Assignment 4 - Raspberry Pi
– Raspberry Pi Essential Training
Assignment 5 - Cloud
Extra Credit Choices (Optional)
(Extra credit must be submitted before the final exam.)
[Avg. reading time: 3 minutes]
Answers
Chapter 1
- For each of the following IoT components, identify whether it belongs to the upper stack or the lower stack and explain why.
- 1.1 Upper stack - It deals with user interaction and control applications.
- 1.2 Lower stack - It involves data collection from the environment.
- 1.3 Lower stack - It handles data transport and protocol translation.
- 1.4 Upper stack - It focuses on data processing and analytics.
- 1.5 Lower stack - It manages device operations and hardware control.
- Determine whether the following statements are true or false.
- 2.1 False - Edge computing is generally considered part of the lower stack.
- 2.2 False - These are aspects of the upper stack.
- 2.3 True - It involves hardware (lower stack) and application (upper stack) components.
- 2.4 False - They are used for low-bandwidth, short-range communication.
- 2.5 True - Predictive maintenance uses processed data and analytics from the upper stack.
Tags
amd
/Data Processing/CPU Architecture
anomaly
/Machine Learning with IoT/Anomaly Detection
api
/Data Processing/Application Layer
applicationlayer
/Data Processing/Application Layer
architecture
/Data Processing/CPU Architecture
arm
/Data Processing/CPU Architecture
auditing
/Security/Auditing in IoT
aws
/IoT Cloud Computing/Introduction
azure
/IoT Cloud Computing/Introduction
baas
/IoT Cloud Computing/IoT Cloud Services
base64
/Security/Number Systems
broker
/Data Processing/Application Layer/MQTT
cbor
/Data Processing/Application Layer/CBOR
centralized
/IoT Introduction/Computing Types
challenge
/Edge Computing/Edge Data & Consistency Challenges
checklist
/Edge Computing/Edge System Design Checklist
cloud
/IoT Cloud Computing/Introduction
/IoT Cloud Computing/Pros and Cons
codequality
/Data Processing/Python Environment/Code Quality & Safety
communicationlayer
/Security/Communication Layer
computing
/IoT Introduction/Computing Types
cons
/IoT Cloud Computing/Pros and Cons
consistency
/IoT Cloud Computing/Consistency Models
container
/Data Processing/Containers/Docker
/Data Processing/Containers/Docker Examples
/Data Processing/Containers/VMs or Containers
/Data Processing/Containers/What Container does
containers
/Data Processing/Containers
damm
/Data Processing/Python Environment/Faker
data
/IoT Introduction/Upper Stack
dataatrest
/Security/Data Layer
datacleaning
/Machine Learning with IoT/Feature Engineering
dataecosystem
/IoT Introduction/JOBS
dataformat
/Data Processing/Application Layer/CBOR
/Data Processing/Application Layer/JSON
/Data Processing/Application Layer/MessagePack
/Data Processing/Application Layer/XML
dataintegrity
/Security/Data Layer
dataintransit
/Security/Data Layer
dataviz
/Data Processing/Data Visualization libraries
debug
/Data Processing/Python Environment/Logging
decimal
/Security/Number Systems
docker
/Data Processing/Containers
/Data Processing/Containers/Container in IOT
/Data Processing/Containers/Docker
/Data Processing/Containers/Docker Examples
/Data Processing/Containers/VMs or Containers
/Data Processing/Containers/What Container does
/Data Processing/Time Series Databases/InfluxDB Demo
dockerhub
/Data Processing/Containers/Docker Examples
dr
/IoT Cloud Computing/Disaster Recovery
edge
/Machine Learning with IoT/ML with IoT
/Machine Learning with IoT/Predictive Maintenance
edgecomputing
/Edge Computing/Introduction
edgeconsistency
/Edge Computing/Edge Data & Consistency Challenges
edgedesign
/Edge Computing/Edge System Design Checklist
encryption
/Security/Encryption
environmental
/IoT Introduction/IoT Use Cases
error
/Data Processing/Python Environment/Error Handling
eventual
/IoT Cloud Computing/Consistency Models
evolution
/IoT Introduction/Evolution of IOT
faas
/IoT Cloud Computing/Cloud Services
/IoT Cloud Computing/IoT Cloud Services
faker
/Data Processing/Python Environment/Faker
featureengineering
/Machine Learning with IoT/Feature Engineering
firmware
/Security/Introduction
fog
/Machine Learning with IoT/Predictive Maintenance
formats
/Data Processing/Application Layer
gcp
/IoT Cloud Computing/Introduction
google
/Data Processing/Application Layer/Protocol Buffers
grafana
/Data Processing/Data Visualization libraries
/Data Processing/Data Visualization libraries/Grafana
ha
/IoT Cloud Computing/High Availability
hashing
/Security/Encryption
hex
/Security/Number Systems
hipaa
/Security/IoT Privacy
http
/Data Processing/Application Layer/HTTP & REST API
/Data Processing/Application Layer/MQTT
/IoT Introduction/Protocols
hub
/Data Processing/Containers/Docker
iaas
/IoT Cloud Computing/Cloud Services
/IoT Cloud Computing/IoT Cloud Services
ifttt
/IoT Cloud Computing/IFTTT
importance
/IoT Introduction/Introduction
influxdb
/Data Processing/Data Visualization libraries/Grafana
/Data Processing/Time Series Databases
/Data Processing/Time Series Databases/InfluxDB
/Security/Auditing in IoT
info
/Data Processing/Python Environment/Logging
insecureapi
/Security/Application Layer
integrationlayer
/IoT Introduction/Upper Stack
iot
/Data Processing/Containers/Container in IOT
/IoT Introduction/Computing Types
/IoT Introduction/Evolution of IOT
/IoT Introduction/Introduction
/IoT Introduction/Puzzle
/Machine Learning with IoT/ML with IoT
/Machine Learning with IoT/Predictive Maintenance
iotarchitects
/IoT Introduction/JOBS
iotdata
/Machine Learning with IoT/IoT Data Characteristics
iotdevelopers
/IoT Introduction/JOBS
iotusecases
/IoT Introduction/IoT Use Cases
jobs
/IoT Introduction/JOBS
json
/Data Processing/Application Layer/JSON
lint
/Data Processing/Python Environment
logging
/Data Processing/Python Environment/Logging
logistics
/IoT Introduction/IoT Use Cases
lowerstack
/IoT Introduction/Lower Stack
luhn
/Data Processing/Python Environment/Faker
masking
/Security/IoT Privacy
measurement
/Data Processing/Time Series Databases/InfluxDB Demo
messagepack
/Data Processing/Application Layer/MessagePack
microservices
/Data Processing/Application Layer/HTTP & REST API
mitm
/Security/Communication Layer
ml
/Machine Learning with IoT/ML with IoT
monolithic
/Data Processing/Application Layer/HTTP & REST API
mqtt
/Data Processing/Application Layer/MQTT
/IoT Introduction/Protocols
mypy
/Data Processing/Python Environment
network
/IoT Introduction/Introduction
noise
/Machine Learning with IoT/IoT Data Characteristics
octal
/Security/Number Systems
paas
/IoT Cloud Computing/Cloud Services
/IoT Cloud Computing/IoT Cloud Services
patterns
/Edge Computing/Edge Decision Patterns
pdoc
/Data Processing/Python Environment/Code Quality & Safety
pep
/Data Processing/Python Environment
physicaldevices
/IoT Introduction/Lower Stack
predictive
/Machine Learning with IoT/Predictive Maintenance
predictivemaintenance
/Machine Learning with IoT/Anomaly Detection
privacy
/Security/IoT Privacy
prometheus
/Data Processing/Time Series Databases
pros
/IoT Cloud Computing/Pros and Cons
protobuf
/Data Processing/Application Layer/Protocol Buffers
protocol
/IoT Introduction/IOT Stack Overview
/IoT Introduction/Protocols
protocols
/Data Processing/Application Layer
publisher
/Data Processing/Application Layer/MQTT
puzzle
/IoT Introduction/Puzzle
ratelimiting
/Security/Application Layer
repositories
/Data Processing/Containers/Docker
rest
/Data Processing/Application Layer/HTTP & REST API/REST API
restapi
/Data Processing/Application Layer/HTTP & REST API/REST API
rpo
/IoT Cloud Computing/Disaster Recovery
rto
/IoT Cloud Computing/Disaster Recovery
ruff
/Data Processing/Python Environment
saas
/IoT Cloud Computing/Cloud Services
/IoT Cloud Computing/IoT Cloud Services
safety
/Data Processing/Python Environment/Code Quality & Safety
secrets
/Security/Encryption
security
/Security/Introduction
services
/Data Processing/Application Layer
sla
/IoT Cloud Computing/High Availability
smart
/IoT Introduction/Introduction
sql
/Data Processing/Data Visualization libraries/Grafana
/Data Processing/Time Series Databases/InfluxDB
stack
/IoT Introduction/IOT Stack Overview
statefulness
/Data Processing/Application Layer/HTTP & REST API/Statefulness
statelessness
/Data Processing/Application Layer/HTTP & REST API/Statelessness
status
/Data Processing/Application Layer/HTTP & REST API
subscriber
/Data Processing/Application Layer/MQTT
telegraf
/Data Processing/Time Series Databases/InfluxDB
/Data Processing/Time Series Databases/InfluxDB Demo
timeseries
/Machine Learning with IoT/Predictive Maintenance
try
/Data Processing/Python Environment/Error Handling
tsdb
/Data Processing/Time Series Databases
/Data Processing/Time Series Databases/InfluxDB
ttl
/Security/Auditing in IoT
upperstack
/IoT Introduction/Upper Stack
vm
/Data Processing/Containers/VMs or Containers
worksforme
/Data Processing/Containers/What Container does
xml
/Data Processing/Application Layer/XML
xss
/Security/Application Layer