Error when downloading the results of a large study

Hi Mostapha! The script you shared worked great, thank you for that. I made some changes to it to suit my workflow. I tried the script with 100 simulations and it worked fine. However, I ran into an issue downloading the SQL outputs from 200 simulated samples. I got this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-70b78be3ed2e> in <module>
     54     check_study_status(study=study)
     55 
---> 56     download_study_results(
     57         api_client=api_client, study=study, output_folder=results_folder
     58     )

<ipython-input-3-3d56e1ffb414> in download_study_results(api_client, study, output_folder)
     65     study_id = study.id
     66 
---> 67     _download_results(
     68         owner=owner, project=project, study_id=study_id, download_folder=output_folder,
     69         api_client=api_client

<ipython-input-3-3d56e1ffb414> in _download_results(owner, project, study_id, download_folder, api_client, page)
     25     response = requests.get(url, params=params, headers=api_client.headers)
     26     response_dict = response.json()
---> 27     runs = response_dict['resources']
     28     temp_dir = tempfile.TemporaryDirectory()
     29     # with tempfile.TemporaryDirectory() as temp_dir:

KeyError: 'resources'

Any idea as to why this happened? Thanks again for your help.

Hi @prateekwahi, this might have been a glitch in your internet connection. The size of the study shouldn't affect the request that failed in your code. I see that you have run larger studies after running this set. Were you able to recreate the issue?
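If it shows up again, one way to see what actually went wrong is to check the response before reading 'resources', so a failed request reports the API's status code and message instead of a bare KeyError. A minimal sketch of that check, reusing the url, params and api_client from your function:

import requests

response = requests.get(url, params=params, headers=api_client.headers)
if not response.ok:
    # report the failed request instead of hitting a KeyError further down
    raise RuntimeError(
        f'Runs request failed ({response.status_code}): {response.text}'
    )
runs = response.json().get('resources', [])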

Hi @mostapha!
Sorry for the late update. Yes, I was able to solve it. I had initially changed the request to download 1000 results per page, which worked for 100 SQL files but kept throwing errors for 200 iterations. What solved it was to keep downloading 25 outputs per page and use a while loop to read through all of the pages.

Here is the code: 🙂

import pathlib
import shutil
import tempfile
import zipfile

import requests

# ApiClient and download_artifact come from the Pollination client helpers
# defined earlier in this script.


def _download_results(
    owner: str, project: str, study_id: int, download_folder: pathlib.Path,
    api_client: ApiClient):
    """Download study results from Pollination, paging through all runs.

    Args:
        owner (str): The owner of the project.
        project (str): The project name.
        study_id (int): The ID of the study.
        download_folder (pathlib.Path): The path where results should be downloaded.
        api_client (ApiClient): The API client object.
    """
    per_page = 25
    page = 1
    while True:
        print(f'Downloading page {page}')
        url = f'https://api.pollination.cloud/projects/{owner}/{project}/runs'
        params = {
            'job_id': study_id,
            'status': 'Succeeded',
            'page': page,
            'per-page': per_page
        }
        response = requests.get(url, params=params, headers=api_client.headers)
        response_dict = response.json()
        runs = response_dict['resources']
        if not runs:
            break  # exit the loop once a page comes back empty
        with tempfile.TemporaryDirectory() as temp_dir:
            temp_folder = pathlib.Path(temp_dir)
            for run in runs:
                run_id = run['id']
                # the model_id input identifies which sample this run belongs to
                input_id = [
                    inp['value']
                    for inp in run['status']['inputs'] if inp['name'] == 'model_id'
                ][0]
                run_folder = temp_folder.joinpath(input_id)
                sql_file = run_folder.joinpath('eplusout.sql')
                out_file = download_folder.joinpath(f'{input_id}.sql')
                print(f'downloading {input_id}.sql to {out_file.as_posix()}')
                run_folder.mkdir(parents=True, exist_ok=True)
                download_folder.mkdir(parents=True, exist_ok=True)
                # request a signed URL for the run's SQL output and download it
                url = f'https://api.pollination.cloud/projects/{owner}/{project}/runs/{run_id}/outputs/sql'
                signed_url = requests.get(url, headers=api_client.headers)
                output = api_client.download_artifact(signed_url=signed_url.json())
                with zipfile.ZipFile(output) as zip_folder:
                    zip_folder.extractall(run_folder.as_posix())
                # copy eplusout.sql out of the temporary folder, renamed by model_id
                shutil.copy(sql_file.as_posix(), out_file.as_posix())
        page += 1  # go to the next page
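And in case it helps anyone else, this is roughly how I call it, following the wrapper shown in the traceback above. The owner and project strings below are just placeholders, and study and api_client come from earlier in the notebook:

results_folder = pathlib.Path('results')
_download_results(
    owner='my-account', project='my-project', study_id=study.id,
    download_folder=results_folder, api_client=api_client
)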

As always, thanks again for your support.
