Every modern factory is full of PLCs — programmable logic controllers that run conveyor belts, monitor temperatures, count products, and control valves. They do their job reliably, day after day. But the data they generate? That usually stays locked inside, invisible to anyone without a SCADA licence or a ladder-logic background.
In this article I show how Python can bridge the gap between the operational technology (OT) world of PLCs and the information technology (IT) world of databases, dashboards, and machine learning. Whether you want a simple live overview for the production floor or a full analytics pipeline, Python is the glue that makes it possible — without replacing anything that already works.
The IT/OT divide: why it exists and why it matters
Walk into any manufacturing company and you will find two worlds that barely talk to each other. On one side is OT — the PLCs, HMIs, and SCADA systems that keep the factory running. On the other side is IT — the ERP systems, databases, and business tools that management uses to make decisions. The gap between them is not just technical; it is cultural. OT engineers prioritise uptime and determinism. IT teams prioritise flexibility and data access. Both are right, and both are needed.
The problem is that valuable production data — cycle times, reject rates, energy consumption, machine states — sits on the OT side with no easy way to get it to the people who need it. Production managers rely on paper logs or weekly reports. Quality engineers download CSV files from SCADA manually. And the management team makes decisions based on data that is days or weeks old.
Python sits perfectly in the middle. It is flexible enough to speak OT protocols like Modbus and OPC UA, yet powerful enough to feed databases, build dashboards, and run analytics. It does not replace the PLC or the SCADA — it extends their reach into the IT world.
Communication protocols: how to talk to a PLC
Before you can do anything useful with PLC data, you need to get the data out. That means speaking one of the industrial communication protocols the PLC supports. The two most common are Modbus and OPC UA, and each has different strengths.
Modbus (TCP / RTU)
- Simple, well-understood protocol from 1979
- Supported by nearly every PLC on the market
- Read/write registers directly by address
- TCP variant runs over standard Ethernet
- RTU variant uses RS-485 serial
- No built-in security or discovery
- Python library: pymodbus
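As a concrete sketch (assuming pymodbus 3.x, a hypothetical PLC at 192.168.1.10, and a float32 value spread across two holding registers — the address, unit id, and word order are placeholders you must take from your PLC's register map):

```python
import struct

def registers_to_float(high: int, low: int) -> float:
    """Decode two 16-bit registers into a float32 (big-endian word order)."""
    return struct.unpack(">f", struct.pack(">HH", high, low))[0]

def read_temperature(host: str = "192.168.1.10") -> float:
    """Read a float32 from two holding registers via Modbus TCP.

    Requires `pip install pymodbus`. Note that the keyword for the unit
    id has changed between pymodbus versions (it was `unit=` in 2.x).
    """
    from pymodbus.client import ModbusTcpClient  # pymodbus 3.x layout

    client = ModbusTcpClient(host)
    client.connect()
    try:
        result = client.read_holding_registers(100, count=2, slave=1)
        return registers_to_float(*result.registers)
    finally:
        client.close()

# The decoding is pure Python, so it can be checked without a PLC:
# 0x41CC, 0x0000 is the IEEE-754 float32 encoding of 25.5
```

Decoding deliberately lives in its own function so you can unit-test it against known register pairs before ever touching the network.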
OPC UA
- Modern, platform-independent standard
- Rich data model with types and hierarchy
- Built-in security (encryption, authentication)
- Supports subscriptions for event-driven data
- Browse available data points dynamically
- More complex to set up than Modbus
- Python libraries: opcua and its maintained async successor, asyncua
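A minimal OPC UA read with asyncua might look like this (the endpoint URL and node id are placeholders — browse the server, for example with a client like UaExpert, to find yours):

```python
import asyncio

async def read_plc_value(url: str = "opc.tcp://192.168.1.10:4840",
                         node_id: str = "ns=4;s=Temperature") -> float:
    """Read one value from an OPC UA server.

    Requires `pip install asyncua`; endpoint and node id are assumptions
    for illustration.
    """
    from asyncua import Client

    async with Client(url=url) as client:
        node = client.get_node(node_id)
        return await node.read_value()

# Run with: asyncio.run(read_plc_value())
```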
My rule of thumb: If the PLC already exposes an OPC UA server (most modern WAGO, Siemens, and Beckhoff PLCs do), use OPC UA — it is more structured and safer. If you are working with older equipment or simple sensors, Modbus TCP is often the fastest path. In many real-world projects I use both, depending on the machine.
MQTT as middleware: decoupling producers and consumers
Once you can read data from a PLC, you face a design question: should your Python dashboard connect directly to the PLC, or should there be something in between? For a single machine in a small setup, a direct connection works fine. But the moment you have multiple machines, multiple consumers (dashboard, database, alerting), or need to handle network interruptions gracefully, you want a message broker.
MQTT is the standard choice in industrial IoT. It is lightweight, supports quality-of-service levels, and decouples data producers (the PLC readers) from data consumers (dashboards, databases). A small Python script reads from the PLC via Modbus or OPC UA and publishes the values to an MQTT broker. Any number of consumers can then subscribe to the topics they care about, without the PLC knowing or caring.
This architecture scales beautifully. Adding a new machine means adding one more publisher script. Adding a new dashboard means adding one more subscriber. Nothing else changes. The broker (Mosquitto is my usual choice) handles routing and buffering.
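A publisher along these lines is only a few lines of Python (assuming paho-mqtt, a broker at the hypothetical host broker.local, and an illustrative factory/machine/tag topic layout):

```python
import json
import time

def make_message(machine: str, tag: str, value: float) -> tuple[str, str]:
    """Build an MQTT topic and JSON payload with a UTC epoch timestamp."""
    topic = f"factory/{machine}/{tag}"
    payload = json.dumps({"value": value, "ts": time.time()})
    return topic, payload

def publish_forever(read_value, host: str = "broker.local", interval: float = 5.0):
    """Poll `read_value()` and publish each reading with QoS 1.

    Requires `pip install paho-mqtt`; note that paho-mqtt 2.x also wants
    a CallbackAPIVersion argument to Client().
    """
    import paho.mqtt.client as mqtt

    client = mqtt.Client()
    client.connect(host)
    client.loop_start()  # background thread handles the network loop
    while True:
        topic, payload = make_message("press1", "temperature", read_value())
        client.publish(topic, payload, qos=1)
        time.sleep(interval)
```

Keeping topic and payload construction in a pure function makes the format easy to test and easy to keep consistent across all publisher scripts.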
Building the data pipeline step by step
Here is the practical workflow I follow in most projects. It is deliberately incremental — each step delivers value on its own, so you do not have to build the entire pipeline before seeing results.
Identify the data points
Talk to the operators and the PLC programmer. Which registers hold the values you need? What are the data types, units, and update rates? Document everything in a simple spreadsheet before writing a single line of code.
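That spreadsheet translates naturally into a small machine-readable register map, which the reader script can consume directly. A sketch (all names, addresses, and scales here are invented for illustration):

```python
# One row per data point -- keep this in sync with the spreadsheet
REGISTER_MAP = [
    {"name": "temperature", "address": 100, "type": "float32", "unit": "degC",  "scale": 1.0},
    {"name": "cycle_count", "address": 102, "type": "uint16",  "unit": "pcs",   "scale": 1.0},
    {"name": "flow_rate",   "address": 103, "type": "int16",   "unit": "l/min", "scale": 0.1},
]

WIDTH = {"float32": 2, "uint16": 1, "int16": 1}  # registers per data type

def check_register_map(rows) -> list:
    """Return any register addresses claimed twice (should be empty)."""
    claimed, overlaps = set(), []
    for row in rows:
        for addr in range(row["address"], row["address"] + WIDTH[row["type"]]):
            if addr in claimed:
                overlaps.append(addr)
            claimed.add(addr)
    return overlaps
```

A quick sanity check like `check_register_map` catches a surprisingly common mistake: a float32 occupies two registers, and the next data point must not start on the second one.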
Read and validate
Write a small Python script that reads those registers and prints them to the console. Verify that the values match what the HMI shows. This catches byte-order issues, scaling factors, and wrong addresses early.
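Two things this step typically uncovers are scale factors and signedness. A helper for the common "integer times 0.1" pattern (the scale and signedness here are device-specific assumptions, not universal):

```python
def scale_register(raw: int, scale: float = 0.1, signed: bool = True) -> float:
    """Convert a raw 16-bit register value to engineering units.

    Many PLCs store e.g. 23.5 degC as the integer 235; signed values
    arrive as two's-complement, so 0xFFFF means -1, not 65535.
    """
    if signed and raw >= 0x8000:
        raw -= 0x10000
    return raw * scale
```

Compare the scaled result against what the HMI shows before trusting it anywhere downstream.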
Publish to MQTT
Wrap the reader in a loop, publish each reading to an MQTT topic with a timestamp. Now any consumer on the network can access the data in real time without touching the PLC directly.
Store in a time-series database
Subscribe to the MQTT topics and write the data to a time-series database like InfluxDB or TimescaleDB. This gives you history, trend analysis, and the ability to query arbitrary time ranges.
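A subscriber that bridges MQTT into InfluxDB could be sketched like this (assuming paho-mqtt and influxdb-client for InfluxDB 2.x; the broker host, bucket, org, and token are all placeholders for your environment):

```python
def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """InfluxDB line protocol string, handy for debugging what gets written."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

def mqtt_to_influx(broker: str = "broker.local", bucket: str = "factory"):
    """Subscribe to factory/# and write each reading to InfluxDB.

    Requires `pip install paho-mqtt influxdb-client`.
    """
    import json
    import paho.mqtt.client as mqtt
    from influxdb_client import InfluxDBClient, Point
    from influxdb_client.client.write_api import SYNCHRONOUS

    db = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
    write_api = db.write_api(write_options=SYNCHRONOUS)

    def on_message(client, userdata, msg):
        # assumed topic layout: factory/<machine>/<tag>
        _, machine, tag = msg.topic.split("/")
        data = json.loads(msg.payload)
        point = Point("plc").tag("machine", machine).field(tag, float(data["value"]))
        write_api.write(bucket=bucket, record=point)

    client = mqtt.Client()
    client.on_message = on_message
    client.connect(broker)
    client.subscribe("factory/#", qos=1)
    client.loop_forever()
```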
Visualise
Connect a dashboard tool (Grafana, custom web app, or even a simple Python Dash application) to the database. Build the views that operators and managers actually need — not everything you can show, but what helps them make decisions.
Key insight: Steps 1 and 2 alone can take a day. Steps 3-5 take a few more days. Within a week you can have a working pipeline from PLC to dashboard. That is fast enough to prove value before anyone questions the investment.
Real-time vs batch: choosing the right rhythm
Not every data point needs to be updated every second. A common mistake is to treat everything as real-time, which floods the network and the database with data that nobody looks at in real time anyway. I split data into three tiers:
Real-time (< 1 second)
Safety-critical values, active alarms, machine states. These need to be visible immediately. Use MQTT with QoS 1 and a live dashboard widget.
Near-real-time (1-60 seconds)
Production counters, temperatures, flow rates. Updated every few seconds is fine. This covers most operational dashboards and trend charts.
Batch (minutes to hours)
Energy totals, shift summaries, quality reports. Aggregated data that is calculated periodically. Store raw data and compute summaries on a schedule.
Choosing the right tier for each data point keeps network traffic manageable, database size reasonable, and dashboard performance snappy. Over-polling a PLC can even interfere with its control cycle, so always check the PLC programmer's recommendations for safe polling intervals.
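The three tiers can live in one reader script: each tag is assigned a tier, and a single loop polls each tag at its tier's interval. A sketch (the tag names and intervals are illustrative):

```python
import time

TIERS = {"realtime": 0.5, "near": 10.0, "batch": 3600.0}  # poll intervals in seconds
TAGS = {  # hypothetical tag -> tier assignment
    "machine_state": "realtime",
    "temperature": "near",
    "energy_total": "batch",
}

def due_tags(last_read: dict, now: float) -> list:
    """Tags whose polling interval has elapsed since their last read."""
    return [tag for tag, tier in TAGS.items()
            if now - last_read.get(tag, float("-inf")) >= TIERS[tier]]

def poll_loop(read_tag):
    """One loop, three rhythms: each tag is read at its own tier's pace."""
    last_read = {}
    while True:
        now = time.monotonic()
        for tag in due_tags(last_read, now):
            read_tag(tag)  # read from the PLC and publish the value
            last_read[tag] = now
        time.sleep(0.1)
```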
Data storage: why time-series databases matter
You could store PLC data in a regular SQL database, and for small setups that works fine. But once you are logging dozens of data points every few seconds across multiple machines, a general-purpose database starts to struggle. Time-series databases are designed exactly for this pattern: large volumes of timestamped data with fast writes and efficient range queries.
InfluxDB is my default choice for most projects. It handles high write throughput, compresses historical data automatically, and integrates natively with Grafana. For projects that need SQL compatibility or complex joins with business data, TimescaleDB (a PostgreSQL extension) is an excellent alternative — you get time-series performance with the full power of SQL.
Whichever you choose, think about retention policies early. Raw data at 1-second resolution generates gigabytes per month. After a few weeks, you rarely need second-level granularity — minute or hour averages suffice. Set up automatic downsampling so the database stays fast and storage costs stay reasonable.
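Downsampling itself is conceptually simple. In production you would let the database do it (InfluxDB tasks, TimescaleDB continuous aggregates), but the idea fits in a few lines:

```python
def downsample(samples: list, window: int) -> list:
    """Average consecutive `window`-sized chunks of (timestamp, value) pairs.

    Mirrors what a database downsampling task does: keep the first
    timestamp of each chunk, average the values.
    """
    out = []
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        avg = sum(value for _, value in chunk) / len(chunk)
        out.append((chunk[0][0], avg))
    return out
```

With `window=60`, a day of 1-second readings (86,400 points) collapses to 1,440 one-minute averages — a 60x reduction with little practical loss after the first few weeks.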
Dashboards and visualisation
The dashboard is where all the effort pays off. It is the interface between the data and the people who make decisions. I have used several approaches, and the right one depends on the audience and the complexity of the project.
Grafana
- Excellent for time-series data and trends
- Native InfluxDB and MQTT support
- Fast to set up with pre-built panels
- Alerting built in (email, Slack, webhook)
- Best for technical or semi-technical users
Custom web dashboard
- Full control over layout and branding
- Tailored to specific workflows
- Better for non-technical end users
- Can integrate business logic and navigation
- More development effort up front
For most industrial projects, I start with Grafana because it delivers value within hours. If the project grows and the audience shifts from engineers to managers, I build a custom dashboard that hides complexity and focuses on the three or four numbers that actually drive decisions.
Security: do not forget the factory network
The moment you connect a PLC to anything on the IT network, security becomes critical. PLCs were designed to be reliable, not secure. Most Modbus connections have no authentication at all. That does not mean you cannot build a safe system — it means you need to think about network architecture carefully.
Network segmentation
Keep the OT network physically or logically separated from the IT network. Your Python gateway sits in a DMZ between them, with firewall rules that only allow the specific traffic needed.
Read-only access
In most analytics and dashboard scenarios, you only need to read from the PLC. Configure your connection as read-only and block write commands at the protocol level. Never give a dashboard the ability to change a setpoint.
Use OPC UA where possible
OPC UA supports TLS encryption and certificate-based authentication out of the box. It is designed for exactly this use case. If security is a concern (and it should be), OPC UA is strongly preferred over Modbus.
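With asyncua, enabling a signed-and-encrypted session is one extra call. A sketch (the certificate and key paths are placeholders, and the server must be configured to trust the client certificate):

```python
import asyncio

async def read_secure(url: str, node_id: str) -> float:
    """Read a value over a signed-and-encrypted OPC UA session.

    Requires `pip install asyncua`; security policy, certificate, and
    key file names are assumptions for illustration.
    """
    from asyncua import Client

    client = Client(url=url)
    # Policy, mode, client certificate, client private key
    await client.set_security_string(
        "Basic256Sha256,SignAndEncrypt,client_cert.der,client_key.pem"
    )
    async with client:
        return await client.get_node(node_id).read_value()
```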
Scaling: from one machine to a factory
A common trajectory I see: someone builds a Python script that reads one PLC and shows data on a screen next to the machine. It works great. Then the boss asks: "Can we do this for all 12 machines?" This is where architecture matters.
The key is to keep each component independent. One reader script per machine (or per PLC), all publishing to the same MQTT broker. One database receiving from all topics. One or more dashboards pulling from the database. Adding a machine means deploying one more reader script and adding a dashboard panel. Nothing else needs to change.
For deployment, I typically use Docker containers on a small industrial PC (a fanless mini-PC or even a Raspberry Pi 4 for lighter loads). Each reader script runs in its own container, with automatic restart on failure. The MQTT broker, database, and Grafana each get their own container. Docker Compose ties it all together in a single configuration file that anyone can deploy.
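A Docker Compose file for such a stack might look roughly like this (service names, the reader build path, and the PLC address are illustrative; versions and volumes will differ per project):

```yaml
services:
  mosquitto:
    image: eclipse-mosquitto:2
    ports: ["1883:1883"]
    restart: unless-stopped
  influxdb:
    image: influxdb:2.7
    volumes: ["influx-data:/var/lib/influxdb2"]
    restart: unless-stopped
  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
    restart: unless-stopped
  reader-press1:          # one reader container per machine
    build: ./reader
    environment:
      PLC_HOST: 192.168.10.11
    restart: unless-stopped
volumes:
  influx-data:
```

Adding machine number 13 is then a copy of the `reader-press1` service with a new name and PLC address.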
Practical tip: Start with one machine and get it right. Then template the reader script so adding a new machine is just a matter of copying a configuration file with the new PLC address and register map. I have seen factories go from zero to full coverage in two weeks once the template is proven.
Common pitfalls and lessons learned
After building several of these pipelines, here are the mistakes I see most often — including ones I have made myself.
Common mistakes
- Polling the PLC too fast and disrupting its control cycle
- Ignoring byte order (big-endian vs little-endian) when reading float values
- No error handling for network drops — the script crashes silently
- Storing everything at maximum resolution forever
- Building the dashboard first and the pipeline second
- Not involving the PLC programmer early enough
What works well
- Start with the 3-5 most important data points, not everything
- Validate readings against the HMI before trusting them
- Implement reconnection logic with exponential backoff
- Use retention policies from day one
- Build the pipeline end-to-end first, then widen it
- Sit with operators to understand what they actually want to see
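The reconnection advice above can be sketched in a few lines — capped exponential backoff so a flapping network does not hammer the PLC or the broker:

```python
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before reconnect attempt n: base * 2^n seconds, capped."""
    return min(cap, base * (2 ** attempt))

def connect_with_retry(connect, max_attempts: int = 8):
    """Call `connect()` until it succeeds, backing off exponentially."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except OSError:
            time.sleep(backoff_delay(attempt))
    raise ConnectionError(f"giving up after {max_attempts} attempts")
```

Wrap every network-facing call (Modbus connect, MQTT connect, database write) in something like this, and the pipeline survives the switch reboots and cable pulls that a factory floor will inevitably produce.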
The biggest lesson: This is not primarily a software project. It is a communication project. The Python code is straightforward. The hard part is understanding what data exists, what it means, and what decisions it should support. Spend the first day talking to people, not writing code.
Conclusion: Python as the bridge
Python is uniquely suited to bridge the IT/OT gap. It speaks industrial protocols, it has rich libraries for data processing and visualisation, and it is easy enough that a single engineer can build and maintain the entire pipeline. You do not need a team of software developers or an expensive middleware platform. A well-structured Python application, a message broker, and a time-series database can unlock years of trapped production data.
The result is not just a pretty dashboard. It is the ability to answer questions that were previously unanswerable: How long did machine 3 actually run last Tuesday? What is the average cycle time trend over the past month? Which shift has the lowest reject rate? These answers drive better decisions, and better decisions drive better products.
Interested in connecting your PLCs to a modern data stack? I work with WAGO, Siemens, and other PLC platforms and can set up a complete pipeline — from protocol integration to a dashboard your team will actually use. Get in touch for a free discovery call.