A few scenarios fail and cannot be retried in multiple scenario runs

First, thanks in advance to the great Pollination forum!
Thanks for the kindness and hard work of the developers!

I have run a number of studies with many scenarios each, and some of them have a few failed runs, such as the examples below.

Pollination Cloud App (pollination.cloud)

Pollination Cloud App (pollination.cloud)

Pollination Cloud App (pollination.cloud)

Pollination Cloud App (pollination.cloud)

I could run a separate study for these failed runs, but collecting the data with Colibri at the end would then be a bit of a pain, so I'd like to retry the failed runs within this study. However, the retry button is greyed out and cannot be clicked.
Is it possible to retry these runs, and if so, how?

Hi @ziyudexiushisheji3,

Thank you for the kind words and for documenting the issues. This is a rare case that we have yet to resolve on our end. In your case, it happened for 6 out of about 4,000 runs: we had to delete the whole workflow for those runs, and because of that, you are not able to retry them from the interface.

I don't want to bore you with the technicalities of the matter, but we are taking action on two fronts:

  1. We are working with the Pipekit team to ensure this case doesn't happen on Argo. The fix requires a change to Argo, the technology Pollination uses to schedule studies on Kubernetes, so it might take some time.

  2. We are improving our handling of deleted runs so that the retry button can recreate them as new runs in the original job. @antoinedao is looking into this one.

Meanwhile, unfortunately, the only solution is to re-run these deleted workflows separately.
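For anyone in the same situation, here is a rough sketch of the bookkeeping that re-running failed runs separately involves: pick out the inputs of the runs that did not succeed, submit them as a new study, and merge the results back so they can be loaded as one table. `RunRecord`, the `"Succeeded"` status string, and the scenario `id` key are hypothetical stand-ins, not the Pollination API; you would fill them from however you export your run results.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Hypothetical record of one run: its input values, status, and outputs."""
    inputs: dict
    status: str
    outputs: dict = field(default_factory=dict)


def failed_inputs(runs):
    """Return the input sets of runs that did not succeed, ready for a new study."""
    return [run.inputs for run in runs if run.status != "Succeeded"]


def merge_runs(original, rerun, key):
    """Combine two studies into one dict keyed by an input value (e.g. a scenario id).

    Successful runs from the re-run study replace the failed runs of the original.
    """
    merged = {run.inputs[key]: run for run in original}
    for run in rerun:
        if run.status == "Succeeded":
            merged[run.inputs[key]] = run
    return merged
```

Keying the merge on a scenario input (rather than on run order) keeps the combined table stable even though the re-run study contains only a handful of runs.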

Hi again, @ziyudexiushisheji3,

I had a closer look at one of the studies that you shared, and the issue with that particular study is that those runs were never scheduled in the first place. The error message says "Failed to schedule run".

The reason for this error is that the input files were not uploaded to the project. This can happen for studies that upload many files to the server. In this case, I can see that the issue was on the GCS (Google Cloud Storage) side: their service was unavailable, so the files were not uploaded.

@mingbo, do we have a check to ensure the artifacts are uploaded, and do we retry the upload if the server returns an error?
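One way such a check could look, as a minimal sketch assuming a storage client that raises an error on a failed request: retry each upload with exponential backoff, confirm the file actually exists afterwards, and only schedule the study once every input passes. `upload_fn`, `exists_fn`, and `UploadError` are hypothetical placeholders, not functions from the Pollination or GCS client libraries.

```python
import time


class UploadError(Exception):
    """Raised by the (hypothetical) upload function when the server returns an error."""


def upload_with_retry(path, upload_fn, exists_fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Upload one artifact, retrying with exponential backoff, then confirm it exists.

    upload_fn(path) and exists_fn(path) are hypothetical callables standing in for
    whatever storage client is used. Returns the number of attempts it took.
    """
    for attempt in range(max_attempts):
        try:
            upload_fn(path)
            if exists_fn(path):  # verify the file really landed on the server
                return attempt + 1
        except UploadError:
            pass  # server error: fall through to the backoff below
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1 s, 2 s, 4 s, ... by default
    raise UploadError(f"{path} was not uploaded after {max_attempts} attempts")


def upload_all(paths, upload_fn, exists_fn, **kwargs):
    """Upload every input; schedule the study only if this returns without raising."""
    for path in paths:
        upload_with_retry(path, upload_fn, exists_fn, **kwargs)
```

The existence check after each upload matters as much as the retry: a run that is scheduled against a missing input file fails with exactly the "Failed to schedule run" message seen here, whereas failing the upload step early gives a clearer error before the study is submitted.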