Six weeks. That’s how long the Maximo-to-data-warehouse integration at a coal handling plant had been silently failing before anyone noticed. Six weeks of work orders, equipment status changes, and maintenance completions that never reached the analytics platform. The dashboards showed a flatline that everyone assumed meant “nothing happened” rather than “the integration is dead.”

The root cause was a JMS connection to the outbound message queue that had dropped during a network maintenance window. Maximo continued operating normally — planners raised work orders, technicians completed them, supervisors approved them. The Integration Framework simply stopped sending the outbound messages and logged nothing about it.

How MIF outbound processing works (and fails)

Maximo’s Integration Framework sends data to external systems through a pipeline: a database event triggers an outbound message, the message is built from an Object Structure definition, optionally transformed via XSL, and dropped onto a JMS queue for delivery.

The architecture is sound in principle. JMS queues provide resilience — if the receiving system is down, messages wait in the queue. The problem is that the queue broker connection itself can fail, and Maximo’s behaviour when that happens ranges from “retry silently” to “stop trying and don’t tell anyone.”

In our case, the WebSphere message queue restarted after the network maintenance, but the Maximo cron task that processes the outbound queue didn’t re-establish its connection. The cron task was technically running — it showed as active in Maximo’s admin console — but it was cycling without processing anything. No error in SystemOut.log. No entry in the Maximo log. No integration message with a failed status. Nothing.

The three failure modes that catch everyone

1. The silent JMS disconnect

This is what hit us. The JMS connection drops, the cron task continues its schedule, and messages accumulate in Maximo’s message tracking table (MAXINTMSGTRK) without being delivered. From inside Maximo, the only way to detect it is to query that table directly and check for a growing backlog of messages in SEND status.

2. The XSL transformation that swallows errors

MIF uses XSL transformations to reshape Maximo’s object structure XML into whatever format the receiving system expects. A null value in an unexpected field, a namespace mismatch, or a character encoding issue can cause the transformation to produce empty output. The message is marked as processed — because the transformation ran — but the receiving system gets an empty or malformed payload and silently discards it.

I’ve seen this happen when an equipment description contained an ampersand that wasn’t XML-escaped. The transformation failed on that one record, produced an empty document, and the receiving system’s import process skipped it with a “no records to process” log entry. Nobody connected the two.
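A check like the one below, run on the receiving side before the import step, would have flagged that payload instead of letting it vanish. It is a minimal sketch using only Python's standard library; the function name and the ASSET element name are illustrative assumptions, not anything Maximo defines.

```python
import xml.etree.ElementTree as ET


def payload_problem(payload, record_tag):
    """Return a short description of what is wrong with a payload, or None if it looks usable.

    `record_tag` is the element name expected once per record (hypothetical here).
    """
    if not payload or not payload.strip():
        return "empty payload"
    try:
        root = ET.fromstring(payload)
    except ET.ParseError as exc:
        # An unescaped ampersand lands here: the document is not well-formed XML.
        return f"not well-formed XML: {exc}"
    # Count matching elements anywhere in the tree, ignoring any namespace prefix.
    count = sum(
        1
        for el in root.iter()
        if isinstance(el.tag, str) and el.tag.rsplit("}", 1)[-1] == record_tag
    )
    if count == 0:
        return f"no <{record_tag}> elements"
    return None
```

The point of returning a reason rather than a boolean is that "no records to process" and "the payload was broken" become two different log lines, which is exactly the distinction nobody could see at the time.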

3. The Object Structure that excludes your data

Maximo Object Structures define which fields are included in integration messages. If someone modifies the Object Structure (excludes a field, changes a relationship, or adjusts a WHERE clause on a sub-object), the outbound messages may start excluding data that downstream systems depend on. This is particularly insidious because the messages still flow; they’re just missing fields.

In one case, a Maximo upgrade changed the default Object Structure for work orders. The upgraded structure excluded a custom field that stored equipment operating hours at the time of work order completion. The data warehouse continued receiving work orders, but the operating hours column was suddenly all nulls. It took two months for someone to connect the missing data to the upgrade.
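A column that goes from populated to all nulls is easy to detect if you compare null rates between a baseline window and the current one. The sketch below assumes the received records are available as a list of dicts; the field names (wonum, op_hours) and the 0.5 jump threshold are illustrative assumptions.

```python
def null_rates(records, fields):
    """Fraction of records in which each field is missing, None, or an empty string."""
    total = len(records)
    rates = {}
    for field in fields:
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        rates[field] = missing / total if total else 0.0
    return rates


def fields_gone_dark(baseline, current, jump=0.5):
    """Fields whose null rate rose by at least `jump` compared with the baseline."""
    return sorted(
        field
        for field, rate in current.items()
        if rate - baseline.get(field, 0.0) >= jump
    )


# Example: operating hours were populated before the upgrade and absent after it.
before = [{"wonum": "1001", "op_hours": 5120}, {"wonum": "1002", "op_hours": 5133}]
after = [{"wonum": "1003", "op_hours": None}, {"wonum": "1004"}]
fields = ["wonum", "op_hours"]
suspects = fields_gone_dark(null_rates(before, fields), null_rates(after, fields))
```

Comparing against a baseline rather than a fixed limit matters because some fields are legitimately sparse; what you want to hear about is the change.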

What monitoring actually looks like

The fix isn’t in Maximo’s configuration. It’s in building external monitoring that doesn’t trust Maximo to report its own failures.

Message queue depth monitoring. Query MAXINTMSGTRK for messages in SEND status older than your expected processing interval. If messages are sitting for longer than 15 minutes and your cron task runs every 5 minutes, something is wrong. I run a Python script on a schedule that queries Maximo’s REST API to check this. If the count of stale messages exceeds a threshold, it fires an alert.
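The core of that check is small. The sketch below covers only the decision logic and assumes the tracking rows have already been fetched (via REST or SQL) into dicts with "status" and "created" keys; those key names, the function names, and the zero threshold are my assumptions for illustration, not Maximo's schema.

```python
from datetime import datetime, timedelta, timezone


def stale_messages(rows, now, max_age=timedelta(minutes=15), pending_status="SEND"):
    """Rows still waiting to be sent that are older than `max_age`."""
    cutoff = now - max_age
    return [r for r in rows if r["status"] == pending_status and r["created"] < cutoff]


def should_alert(rows, now, threshold=0, **kwargs):
    """True when the number of stale pending messages exceeds `threshold`."""
    return len(stale_messages(rows, now, **kwargs)) > threshold


# Example with fixed timestamps so the outcome is easy to follow.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"status": "SEND", "created": now - timedelta(minutes=40)},  # stale
    {"status": "SEND", "created": now - timedelta(minutes=3)},   # still within the window
    {"status": "PROCESSED", "created": now - timedelta(hours=2)},  # already delivered
]
alert = should_alert(rows, now)
```

Passing `now` in explicitly keeps the function testable; the scheduled script would pass `datetime.now(timezone.utc)`.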

End-to-end record count reconciliation. Compare the count of records modified in Maximo in the last 24 hours against the count of records received by the target system. This catches silent disconnects and transformation errors because it validates the outcome, not the mechanism. Object Structure changes need one more check, because those messages still arrive with fields missing: track the share of non-null values per field on the receiving side as well.
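As a rough sketch, the comparison can be a few lines once both sides can produce counts per object type for the same window. How those counts are gathered is left out; the function name, the entity names, and the tolerance parameter are assumptions for illustration.

```python
def reconcile(source_counts, target_counts, tolerance=0):
    """Return {entity: (source, target)} for every entity where the target is short.

    `tolerance` allows for a few records still in flight when the counts are taken.
    """
    gaps = {}
    for entity, src in source_counts.items():
        tgt = target_counts.get(entity, 0)
        if src - tgt > tolerance:
            gaps[entity] = (src, tgt)
    return gaps


# Example: work orders arrived in full, asset changes did not arrive at all.
maximo = {"WORKORDER": 120, "ASSET": 8}
warehouse = {"WORKORDER": 120}
gaps = reconcile(maximo, warehouse)
```

This is what separates a flatline that means "nothing happened" (zero on both sides, no gap) from one that means the integration is dead (records in Maximo, none in the warehouse).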

Heartbeat messages. Configure a lightweight integration that sends a small test message every hour. If the receiving system doesn’t get a heartbeat for two consecutive intervals, alert. This is crude, but it catches the JMS disconnect scenario within a couple of hours rather than waiting for someone to notice missing data.
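The receiving-side check can be sketched as below, assuming the target system records the timestamp of the last heartbeat it received. The function and parameter names are hypothetical; the one-hour interval and two missed intervals come from the description above.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_overdue(last_seen, now, interval=timedelta(hours=1), missed_allowed=2):
    """True once `missed_allowed` full intervals have passed with no heartbeat.

    `last_seen` is None when no heartbeat has ever been recorded, which also alerts.
    """
    if last_seen is None:
        return True
    return now - last_seen >= interval * missed_allowed


# Example: the last heartbeat arrived at 10:00 UTC.
last = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
late_morning = heartbeat_overdue(last, datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc))
noon = heartbeat_overdue(last, datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
```

Treating "never seen" as overdue is deliberate: a monitor that stays quiet until the first heartbeat arrives repeats the original problem.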

The lesson for data engineers

If you’re building a data platform that ingests from Maximo, don’t trust the integration to tell you it’s working. Enterprise systems are designed for their primary users — in Maximo’s case, maintenance planners and technicians. Integration is a secondary concern, and the error handling reflects that.

Build your monitoring on the receiving side. Count records. Compare timestamps. Alert on gaps. The six-week silent failure wasn’t a Maximo bug — it was a monitoring gap. The system behaved exactly as designed. It just wasn’t designed to tell anyone when the integration stopped working.