RESOLVED: Users with Cloud Foundry applications are experiencing problems with starting/staging applications
  • IBM Cloud Platform
  • Dallas
  • INC0318659
  • Description

    SERVICES/COMPONENTS AFFECTED:
      - Cloud Foundry Application Management

    IMPACT:
      - Application provisioning
      - Application re-staging
      - Application state changes (stop/start)
      - Running applications are failing to restart after crashing
      - Services dashboards may experience issues too

    STATUS:
     
      - 2018-10-09 10:25 UTC - INVESTIGATING - The operations team is aware of the issues and is currently investigating.

      - 2018-10-09 20:05 UTC - INVESTIGATING - The operations team continues to investigate current application staging behavior to identify and resolve the issues.

      - 2018-10-09 23:44 UTC - INVESTIGATING - We are continuing to investigate and work toward resolution.  Remediation actions continue to be evaluated, including actions taken to improve network bandwidth and address performance.

      - 2018-10-10 00:03 UTC - MITIGATING - Remediation actions continue to be actively worked by the team.  At this time, provisioning of applications is improving and we are monitoring for continued progress.

      - 2018-10-10 01:36 UTC - MONITORING - Issues with Cloud Foundry applications in the region have been addressed.  We are monitoring the system closely to ensure ongoing stability.

      - 2018-10-10 04:00 UTC - INVESTIGATING - Evidence of a recurrence of staging issues has been identified, and an investigation is underway.

      - 2018-10-10 07:10 UTC - INVESTIGATING - The operations team is continuing investigation and working through mitigation analysis and actions.

      - 2018-10-10 09:27 UTC - INVESTIGATING - Team members continue to actively investigate and apply remediation actions to address current issues.

      - 2018-10-10 12:03 UTC - MITIGATING - Remediation actions continue to be applied.  The system is showing steady improvement and the team is continuing to actively address the environment.

      - 2018-10-10 15:11 UTC - MITIGATING - We are aggressively working on remediation actions. During this activity, success rates are fluctuating for application operations, including staging of new applications and restarts of existing applications after crashes.  Our efforts to improve application management continue.

      - 2018-10-10 21:19 UTC - MITIGATING - Our operations team is in the process of rolling out additional changes to address the root cause, which should completely mitigate the issue. This process will make continuous improvements, but the complete rollout will take some time. An estimate for when this will be complete is not currently available.

      - 2018-10-11 00:15 UTC - MITIGATING - The operations team is continuing to make progress on returning the environment to a healthy state. While an estimated completion time cannot be provided at this time, the team is confident in the fixes that have been identified and are being deployed. We’re using all available resources to achieve the best possible recovery.

      - 2018-10-11 03:00 UTC - MITIGATING - Operations team members continue to progress with mitigation actions and work toward resolution.  An estimated end time cannot be provided; however, strategies continue to be evaluated and applied to provide stability and improve the environment.

      - 2018-10-11 06:15 UTC - MITIGATING - The operations team is making progress towards the goal of a completely healthy environment. The team continues to focus on the actions that are required at this time; we are tracking that process but are not yet able to forecast when this activity will be complete.

      - 2018-10-11 10:15 UTC - MITIGATING - Our operations team continues to make progress towards a healthy environment. Our focus is solely on the actions that have been identified as critical in the recovery process. This is not a short-duration task, but one with incremental improvements.  There is no forecast for completion at this time.

      - 2018-10-11 12:18 UTC - MITIGATING - Our operations team identified a new performance improvement in the platform. Changes are being deployed to the system to improve the health and stability of the region. Initial monitoring shows a positive impact; however, we will continue to monitor the system very closely.

      - 2018-10-11 15:22 UTC - MONITORING - After making the performance improvement changes, our operations team confirmed that many applications recovered very quickly. A number of applications are still in the process of recovering, and we continue to observe their progress. Provisioning of new applications is also increasingly successful, but performance remains slow.

      - 2018-10-11 19:32 UTC - RESOLVED - Our operations team has observed that the platform has returned to normal operational ranges and all known issues appear to be resolved. If you continue to see problems, please open a support case.