[IBM-Aspera] - [AoC API] - Service Disruption
Incident Report for IBM-Aspera Service Status
Resolved
We are working with our cloud provider on an RCA. The issue is now resolved.
Posted Jun 12, 2023 - 13:03 PDT
Monitoring
We no longer see the network issue affecting us. We are monitoring and evaluating the situation. We are currently catching up on processing transfer events.
Posted Jun 12, 2023 - 12:34 PDT
Investigating
We are currently working with our cloud provider to diagnose this network issue.
Posted Jun 12, 2023 - 11:11 PDT
Update
We are seeing various network-related problems in two separate Kubernetes clusters that host critical services. We do not observe any measurable packet loss, but REST API requests between internal services within the clusters, as well as external requests to services running on the clusters, time out, seemingly at random. This cannot be associated with any recent change made by the Aspera on Cloud team.

We have replaced our worker node pool and have attempted to migrate workloads to the new pool. Our metrics indicate that the situation has improved, but the issue does not appear to be fully mitigated yet.

Customers may currently see timeouts when using the Aspera on Cloud frontend webapp or the API. In particular, the Activity and Automation app will have delays.
Posted Jun 12, 2023 - 09:18 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jun 12, 2023 - 07:18 PDT
Investigating
Our engineering team is investigating an issue affecting the AoC API. There are problems ingesting events at the moment.
Posted Jun 12, 2023 - 01:21 PDT
This incident affected: IBM-Aspera API Services (api.ibmaspera.com).