Sanity check on a new Recipe

darren.lynch · May 12, 2021, 3:41pm

Hi guys, I wanted to get a quick sanity check before I go too far. I would describe the following as a test, although it will be part of my eventual recipe. It seems to be working as expected, at least from a Python POV.

    from os import listdir
    import fnmatch
    import pandas as pd
    import numpy as np
    import click
    
    
    
    def retun_CSV(directory):
        files = listdir(directory)
        
        csv = []
        
        for file in files:
            if fnmatch.fnmatch(file, '*.csv'):
                csv.append(file)
        return csv
    
    def read_directions(files, _results_dir, referenceSpeed):
        i=0
        labels = []
        for file in files:
            table = pd.read_csv(_results_dir + '\\' + file)
            #columns = table.columns
            if i ==0:
                points = table[['Points:0', 'Points:1', 'Points:2']]
                
            u = table['Velocities_n:0'].to_numpy()
            v = table['Velocities_n:1'].to_numpy()
            U = np.sqrt(u**2+v**2).reshape(-1,1)/referenceSpeed
            direction = (270+np.degrees(np.arctan2(v,u))).reshape(-1,1)
            label = float(file.replace('deg0.csv', '').replace('Results_', ''))
            labels.append(label)
            if i == 0:
                Speed = U
                Direction = direction
            else:
                Speed = np.concatenate((Speed, U), axis=1)
                Direction = np.concatenate((Direction, direction), axis=1)  
            
            i+=1
        
        point_labels = ['X', 'Y', 'Z']
        points.columns = point_labels
        
        Direction = pd.DataFrame(data=Direction, columns=labels).reset_index(drop=True)
        Speed = pd.DataFrame(data=Speed, columns=labels).reset_index(drop=True)
        
        return Speed, Direction, points
    
    def toFile(df, path, fileType):
        try:
            if fileType == 'CSV':
                df.to_csv(path)
            elif fileType == 'feather':
                df.columns = df.columns.astype(str)
                df.to_feather(path)
            return True
        except:
            return False
    
    @click.command()
    @click.option('--filetype', default='feather', help='file format, feather is recommended')
    @click.option('--refspeed', default=10, help='the speed at reference height, usually 10m/s at 10m')
    @click.option('--path', default='16 Direction Results', help='the location at which the directional CFD data is located in csv files from the vtk export')
    def formatCFD(filetype, refspeed, path):
    
        #_results_dir = r'16 Direction Results'
        _results_dir = path
        csv = retun_CSV(_results_dir)
        
        local_speed, local_direction, points = read_directions(csv, _results_dir, refspeed)
        
        #local_speed.to_feather('localSpeedFactor.feather')
        toFile(local_speed, 'results\localSpeedFactor.feather', filetype)
        toFile(local_direction, 'results\localDirection.feather', filetype)
        toFile(points, 'results\points.feather', filetype)
        
    if __name__ == '__main__':
        formatCFD()

And this is my entry.py:

    from dataclasses import dataclass
    from pollination_dsl.function import Function, command, Inputs, Outputs
    
    @dataclass
    class formatCFD(Function):
        """
        A function to take CFD data from individual csv exports from vtk to a single 
        file for local speed up factor and direction. Where the number of columns 
        will be number of directions, and number of rows will be the number of points 
        or sensors. This will also create points file with the XYZ coordinates of the
        sensors
        
        """
        
        path = Inputs.file(
            description='A directory containing the csv data',
            path='16 Direction Results'
        )
        
        fileType = Inputs.str(
            description='The file format, CSV or feather. feather is recommended as it writes faster, and is very compact',
            default = 'feather',
            spec={'type': 'string'}
        )
        
        refSpeed = Inputs.float(
            description='A value with units m/s, usually 10m/s, at the reference height of 10m',
            default=10.0,
            spec={'type': 'float',}
        )
        
        @command
        def formatCFD(self):
            return 'formatCFD.py --fileType --refSpeed --path > results\localSpeedFactor.feather results\localDirection.feather'
        
        speedFactor = Outputs.file(
            description='Output files for wind speedup.',
            path='results\localSpeedFactor.feather'
        )
        
        direction = Outputs.file(
            description='Output files for wind speedup.',
            path='results\localDirection.feather'
        )
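
For anyone following along, here is a minimal sketch of the per-point arithmetic in read_directions, using only the standard library. The function name speed_factor_and_direction and the velocity components are made up for illustration; they are not CFD results. It also shows that the direction expression, as written, can go above 360. Whether it should be wrapped with % 360 depends on the intended convention, which is an assumption on my side.

```python
import math


def speed_factor_and_direction(u, v, reference_speed=10.0):
    """Mirror the per-point arithmetic of read_directions for one sensor."""
    # Horizontal speed magnitude, normalised by the reference speed.
    factor = math.hypot(u, v) / reference_speed
    # Same expression as the script: 270 plus the atan2 angle in degrees.
    direction = 270 + math.degrees(math.atan2(v, u))
    return factor, direction


factor, direction = speed_factor_and_direction(3.0, 4.0)
print(factor)                # 0.5
print(round(direction, 2))   # 323.13

# A flow along -x gives 270 + 180 = 450 before any wrapping.
_, raw = speed_factor_and_direction(-1.0, 0.0)
print(raw, raw % 360)        # 450.0 90.0
```

Each column of the Speed and Direction tables is this calculation applied to every point of one direction file.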

Thanks, Darren

mostapha · May 12, 2021, 4:59pm

This is a great start @dlynch! I’ll have a close look and will let you know my comments.

mostapha · May 12, 2021, 9:23pm

Hi @dlynch, the code looks good. I made some small changes to make the code more cross-platform. There is some styling that could be improved, but that’s not important at this point.

    import pathlib
    import pandas as pd
    import numpy as np
    import click


    def return_CSV(directory):
        """Return csv files in a directory."""
        return list(pathlib.Path(directory).glob('*.csv'))


    def read_directions(files, _results_dir, referenceSpeed):
        i = 0
        labels = []
        for file in files:
            table = pd.read_csv(pathlib.Path(_results_dir, file).as_posix())
            # columns = table.columns
            if i == 0:
                points = table[["Points:0", "Points:1", "Points:2"]]

            u = table["Velocities_n:0"].to_numpy()
            v = table["Velocities_n:1"].to_numpy()
            U = np.sqrt(u ** 2 + v ** 2).reshape(-1, 1) / referenceSpeed
            direction = (270 + np.degrees(np.arctan2(v, u))).reshape(-1, 1)
            label = float(file.replace("deg0.csv", "").replace("Results_", ""))
            labels.append(label)
            if i == 0:
                Speed = U
                Direction = direction
            else:
                Speed = np.concatenate((Speed, U), axis=1)
                Direction = np.concatenate((Direction, direction), axis=1)

            i += 1

        point_labels = ["X", "Y", "Z"]
        points.columns = point_labels

        Direction = pd.DataFrame(data=Direction, columns=labels).reset_index(drop=True)
        Speed = pd.DataFrame(data=Speed, columns=labels).reset_index(drop=True)

        return Speed, Direction, points


    def toFile(df, path, fileType):
        try:
            if fileType == "CSV":
                df.to_csv(path)
            elif fileType == "feather":
                df.columns = df.columns.astype(str)
                df.to_feather(path)
            return True
        except:
            return False


    @click.command()
    @click.option(
        "--filetype", default="feather", help="file format, feather is recommended"
    )
    @click.option(
        "--refspeed", default=10, help="the speed at reference height, usually 10m/s at 10m"
    )
    @click.option(
        "--path",
        default="16 Direction Results",
        help="the location at which the directional CFD data is located in csv files from the vtk export",
    )
    def formatCFD(filetype, refspeed, path):

        # _results_dir = r'16 Direction Results'
        _results_dir = path
        csv = return_CSV(_results_dir)

        local_speed, local_direction, points = read_directions(csv, _results_dir, refspeed)

        # local_speed.to_feather('localSpeedFactor.feather')
        toFile(local_speed, "results/localSpeedFactor.feather", filetype)
        toFile(local_direction, "results/localDirection.feather", filetype)
        toFile(points, "results/points.feather", filetype)


    if __name__ == "__main__":
        formatCFD()
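
To show why the separator matters, here is a small sketch using pathlib's pure path classes, which apply Windows or POSIX rules regardless of the machine you run them on. The two strings are the output paths from the scripts in this thread; nothing else is assumed.

```python
from pathlib import PurePosixPath, PureWindowsPath

backslash = "results\\localSpeedFactor.feather"
forward = "results/localSpeedFactor.feather"

# Windows path rules accept both separators.
print(PureWindowsPath(backslash).parts)  # ('results', 'localSpeedFactor.feather')
print(PureWindowsPath(forward).parts)    # ('results', 'localSpeedFactor.feather')

# POSIX rules treat the backslash as an ordinary character in a single file name.
print(PurePosixPath(backslash).parts)    # ('results\\localSpeedFactor.feather',)
print(PurePosixPath(forward).parts)      # ('results', 'localSpeedFactor.feather')
```

With a forward slash the path splits into a folder and a file under both sets of rules, which is why it is the safer choice for code that may run on Linux.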

entry.py looks pretty good. I had to make a few changes to your code. Most of the changes are in how you should template the command. Here is the list:

1. I changed the style for inputs and outputs. What you have should be fine, but this is more in line with how we name inputs and outputs.
2. I set the path for the path input to a folder name with no whitespace. This just makes it easier to write the command. Keep in mind that path is where the input folder will be copied to; in this case the input folder will be copied to input_folder.
3. I changed the type for path to folder instead of file, since the input is a directory.
4. I added enum to the spec for the file_type input to make the validation more robust.
5. I added python3 at the start of the command so it can execute formatCFD.py.
6. I templated the command based on the inputs. For the folder I used the path directly. For the other two inputs, which are not files or folders, I templated the command using {{}}. The command becomes:

       python3 formatCFD.py --filetype {{self.file_type}} --refspeed {{self.ref_speed}} --path input_folder

7. Finally, I removed the stdout redirect. From what I can see, formatCFD.py generates those files directly; there is no output directed to stdout.
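
To make the enum check and the {{}} templating more concrete, here is a toy sketch of the idea. It is only an illustration and not how Pollination actually validates or renders a function: SPECS, validate and render are hypothetical names, and the option names follow the lowercase spellings declared with click in formatCFD.py.

```python
import re

# Hypothetical stand-ins for the input specs in entry.py.
SPECS = {
    "file_type": {"type": "string", "enum": ["feather", "CSV"]},
    "ref_speed": {"type": "float", "minimum": 0},
}

TEMPLATE = (
    "python3 formatCFD.py --filetype {{self.file_type}} "
    "--refspeed {{self.ref_speed}} --path input_folder"
)


def validate(name, value):
    """Check one value against the enum and minimum keys of its spec."""
    spec = SPECS[name]
    if "enum" in spec and value not in spec["enum"]:
        raise ValueError(f"{name} must be one of {spec['enum']}, got {value!r}")
    if "minimum" in spec and value < spec["minimum"]:
        raise ValueError(f"{name} must be >= {spec['minimum']}, got {value!r}")


def render(template, values):
    """Replace every {{self.<name>}} placeholder with its validated value."""
    for name, value in values.items():
        validate(name, value)
    return re.sub(
        r"\{\{\s*self\.(\w+)\s*\}\}",
        lambda match: str(values[match.group(1)]),
        template,
    )


print(render(TEMPLATE, {"file_type": "CSV", "ref_speed": 10.0}))
# python3 formatCFD.py --filetype CSV --refspeed 10.0 --path input_folder
```

A value outside the enum, such as "xlsx", is rejected before any command is built. The folder input never appears as a placeholder because its location is already fixed by path.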

    from dataclasses import dataclass
    from pollination_dsl.function import Function, command, Inputs, Outputs


    @dataclass
    class formatCFD(Function):
        """
        A function to take CFD data from individual csv exports from vtk to a single
        file for local speed up factor and direction. Where the number of columns
        will be number of directions, and number of rows will be the number of points
        or sensors. This will also create points file with the XYZ coordinates of the
        sensors.
        """

        path = Inputs.folder(
            description="A directory containing the csv data", path="input_folder"
        )

        file_type = Inputs.str(
            description="The file format, CSV or feather. feather is recommended as it "
            "writes faster, and is very compact",
            default="feather",
            spec={"type": "string", "enum": ["feather", "CSV"]}
        )

        ref_speed = Inputs.float(
            description="A value with units m/s, usually 10m/s, at the reference height of 10m",
            default=10.0,
            spec={
                "type": "float", "minimum": 0
            }
        )

        @command
        def formatCFD(self):
            # Option names must match the click options declared in formatCFD.py.
            return "python3 formatCFD.py --filetype {{self.file_type}} --refspeed {{self.ref_speed}} --path input_folder"

        speed_factor = Outputs.file(
            description="Output file for wind speedup.",
            path="results/localSpeedFactor.feather",
        )

        direction = Outputs.file(
            description="Output file for local wind direction.",
            path="results/localDirection.feather",
        )
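
One thing to watch with outputs that are written directly to results/: toFile catches every exception and returns False, and formatCFD does not check that return value. As far as I can tell, a missing results folder would therefore fail silently and the output files would simply not be there. Below is a minimal sketch of creating the folder first and letting errors surface. write_output is a hypothetical helper that writes plain text so the example stays self-contained; in formatCFD.py the same mkdir call would go before df.to_csv or df.to_feather.

```python
import tempfile
from pathlib import Path


def write_output(text, path):
    """Write text to path, creating the parent folder first.

    Errors are left to propagate so a failed write stops the run
    instead of being reported as a silent False.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return path


with tempfile.TemporaryDirectory() as workdir:
    # The results folder does not exist yet inside the temporary directory.
    target = write_output("X,Y,Z\n0,0,0\n", Path(workdir, "results", "points.csv"))
    print(target.exists())  # True
```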

Hope it helps! Let me know if you have any other questions.

darren.lynch · May 13, 2021, 9:32am

Thank you very much @Mostapha, I will study the changes. At the moment I have not 100% understood point 7, but I’ll try to get my head around it.

Much appreciated, Darren

darren.lynch · May 13, 2021, 10:05am

Just for completeness: formatCFD.py didn’t run straight away, mainly because of the new pathlib paths being passed around. Since that is a better way of doing it, I just changed a few lines to accept them. When creating the table, the path in ‘file’ already includes its directory, so there is no need to prepend it. I was also using the file name as the header for the columns, so I changed this to file.parts[1] to reference just the file name again. Just in case this is of interest to future readers.

    import pathlib
    import pandas as pd
    import numpy as np
    import click
    
    
    def return_CSV(directory):
        """Return csv files in a directory."""
        return list(pathlib.Path(directory).glob('*.csv'))
    
    
    def read_directions(files, _results_dir, referenceSpeed):
        global file
        i = 0
        labels = []
        for file in files:
            #table = pd.read_csv(pathlib.Path(_results_dir, file).as_posix())
            table = pd.read_csv(pathlib.Path(file).as_posix())
            # columns = table.columns
            if i == 0:
                points = table[["Points:0", "Points:1", "Points:2"]]
    
            u = table["Velocities_n:0"].to_numpy()
            v = table["Velocities_n:1"].to_numpy()
            U = np.sqrt(u ** 2 + v ** 2).reshape(-1, 1) / referenceSpeed
            direction = (270 + np.degrees(np.arctan2(v, u))).reshape(-1, 1)
            label = float(file.parts[1].replace("deg0.csv", "").replace("Results_", ""))
            labels.append(label)
            if i == 0:
                Speed = U
                Direction = direction
            else:
                Speed = np.concatenate((Speed, U), axis=1)
                Direction = np.concatenate((Direction, direction), axis=1)
    
            i += 1
    
        point_labels = ["X", "Y", "Z"]
        points.columns = point_labels
    
        Direction = pd.DataFrame(data=Direction, columns=labels).reset_index(drop=True)
        Speed = pd.DataFrame(data=Speed, columns=labels).reset_index(drop=True)
    
        return Speed, Direction, points
    
    
    def toFile(df, path, fileType):
        try:
            if fileType == "CSV":
                df.to_csv(path)
            elif fileType == "feather":
                df.columns = df.columns.astype(str)
                df.to_feather(path)
            return True
        except:
            return False
    
    
    @click.command()
    @click.option(
        "--filetype", default="feather", help="file format, feather is recommended"
    )
    @click.option(
        "--refspeed", default=10, help="the speed at reference height, usually 10m/s at 10m"
    )
    @click.option(
        "--path",
        default="16 Direction Results",
        help="the location at which the directional CFD data is located in csv files from the vtk export",
    )
    def formatCFD(filetype, refspeed, path):
    
        # _results_dir = r'16 Direction Results'
        _results_dir = path
        csv = return_CSV(_results_dir)
    
        local_speed, local_direction, points = read_directions(csv, _results_dir, refspeed)
    
        # local_speed.to_feather('localSpeedFactor.feather')
        toFile(local_speed, "results/localSpeedFactor.feather", filetype)
        toFile(local_direction, "results/localDirection.feather", filetype)
        toFile(points, "results/points.feather", filetype)
    
    
    if __name__ == "__main__":
        formatCFD()
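
A note for future readers: file.parts[1] only points at the file name when the path has exactly one folder in front of it. A minimal sketch of an alternative is below, assuming file names shaped like Results_<angle>deg0.csv, which I inferred from the replace calls. The helper name direction_label and the example paths are made up for illustration.

```python
from pathlib import PurePath


def direction_label(csv_path):
    """Parse the direction from a name shaped like Results_<angle>deg0.csv."""
    name = PurePath(csv_path).name  # last path component, whatever the depth
    return float(name.replace("deg0.csv", "").replace("Results_", ""))


print(direction_label("16 Direction Results/Results_22.5deg0.csv"))  # 22.5
print(direction_label("data/cfd/input_folder/Results_270deg0.csv"))  # 270.0

# With a deeper path, parts[1] is a folder name rather than the file name.
print(PurePath("data/cfd/input_folder/Results_270deg0.csv").parts[1])  # cfd
```

Sorting the paths by this label before reading them would also give a predictable column order, since glob does not promise one.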

mostapha · May 13, 2021, 1:48pm

Hi @dlynch, looks good. Should we start one/two repositories for this recipe and collaborate there? That should make it easier to review code and also send PRs.

darren.lynch · May 13, 2021, 1:51pm

Hi @Mostapha, sure, that works for me. I am currently pushing to an internal repository and am not sure how, or if, I can make it public. If this makes the most sense, I can sync with @adammer to see how we can best do this.

mostapha · May 13, 2021, 1:54pm

Sounds good! If you can add me to the repo I can start helping you there. I saw in the other post that you referenced GitLab. I think I have an account there too, but if it is somewhere on GitHub it will be far easier for me. I’m sure they are very similar, but using GitHub means I can start helping you right away!

Also let me know if it would be helpful to set up a call next week to review the overall process.

darren.lynch · May 13, 2021, 2:03pm

OK then, let me sync internally and I’ll either add you there or create a GitHub repository and add you to that. I am sure it would be most helpful to organize a call for next week, and equally great to meet you. I will pop something in tomorrow for the following week, if that’s OK? Any preferences?

Best,
Darren

darren.lynch · May 14, 2021, 10:07am

@Mostapha just so you know, we are working on migrating to GitHub. Hopefully I can invite you shortly.

adammer · May 14, 2021, 1:28pm

Try now @Mostapha! It took me a bit to make git-lfs work, but now it should be fine. You should have gotten an invite.