hossam.util

hs_make_normalize_values

hs_make_normalize_values(mean, std, size=100, round=2)

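Generate normally distributed data. Values are drawn with `np.random.normal`, rounded to `round` decimal places, and checked with `normaltest`; while the p-value is below 0.05 the sample is redrawn, for at most 100 attempts. If no draw passes, the last draw is returned.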

Parameters:

Name	Type	Description	Default
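`mean`	`float`	Mean of the distribution	required
`std`	`float`	Standard deviation of the distribution	required
`size`	`int`	Number of values to generate. Defaults to 100.	`100`
`round`	`int`	Number of decimal places to round to. Defaults to 2.	`2`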

Returns:

Type	Description
`ndarray`	Normally distributed values, rounded to `round` decimal places

Examples:

>>> from hossam.util import hs_make_normalize_values
>>> x = hs_make_normalize_values(mean=0.0, std=1.0, size=100)
>>> x.shape
(100,)
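The source below redraws the sample while the `normaltest` p-value is below 0.05, for at most 100 attempts. A rough worked estimate of how often that matters, assuming the test's p-value is approximately uniform on [0, 1] for normal data (the null hypothesis) and that rounding is fine relative to `std`:

```latex
P(\text{redraw}) \approx 0.05, \qquad
\mathbb{E}[\text{attempts}] \approx \frac{1}{1 - 0.05} \approx 1.05, \qquad
P(\text{all } 100 \text{ attempts rejected}) \approx 0.05^{100}
```

Under these assumptions the first draw is usually returned, and the 100-attempt cap acts as a safeguard rather than a limit reached in practice.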

Source code in hossam/util.py

def hs_make_normalize_values(
    mean: float, std: float, size: int = 100, round: int = 2
) -> np.ndarray:
    """Generate normally distributed data.

    Args:
        mean (float): Mean
        std (float): Standard deviation
        size (int, optional): Number of values. Defaults to 100.
        round (int, optional): Number of decimal places to round to. Defaults to 2.

    Returns:
        np.ndarray: Normally distributed data

    Examples:
        >>> from hossam.util import hs_make_normalize_values
        >>> x = hs_make_normalize_values(mean=0.0, std=1.0, size=100)
        >>> x.shape
        (100,)
    """
    p = 0.0
    x: np.ndarray = np.array([])
    attempts = 0
    max_attempts = 100  # guard against an infinite loop
    while p < 0.05 and attempts < max_attempts:
        x = np.random.normal(mean, std, size).round(round)
        _, p = normaltest(x)
        attempts += 1

    return x
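The function draws from NumPy's global random state, so results differ between calls unless `np.random.seed` is set beforehand. A minimal sketch of a reproducible variant, assuming SciPy's `scipy.stats.normaltest` is the `normaltest` used above; `make_normal_values_seeded` and its `seed`, `alpha`, and `decimals` parameters are hypothetical and not part of hossam:

```python
from typing import Optional

import numpy as np
from scipy.stats import normaltest


def make_normal_values_seeded(
    mean: float,
    std: float,
    size: int = 100,
    decimals: int = 2,
    seed: Optional[int] = None,
    alpha: float = 0.05,
    max_attempts: int = 100,
) -> np.ndarray:
    """Hypothetical variant of hs_make_normalize_values with an explicit seed."""
    rng = np.random.default_rng(seed)  # local generator; global state is untouched
    x = np.array([])
    for _ in range(max_attempts):
        # Draw, round, then test the rounded sample (same order as the original)
        x = rng.normal(mean, std, size).round(decimals)
        _, p = normaltest(x)
        if p >= alpha:
            break  # sample passes the normality test
    return x  # the last draw is returned even if no draw passed
```

Calling it twice with the same `seed` yields identical arrays; using a local `Generator` instead of `np.random.seed` avoids changing random state that other code may rely on.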

hs_make_normalize_data

hs_make_normalize_data(
    means=None, stds=None, sizes=None, rounds=2
)

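Generate a DataFrame whose columns follow normal distributions. Column `Xk` (k = 1, 2, ...) is generated by `hs_make_normalize_values` from the k-th entries of `means`, `stds`, and `sizes`. The three lists must have the same length; otherwise a `ValueError` is raised.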

Parameters:

Name	Type	Description	Default
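`means`	`list`	List of means. Defaults to [0, 0, 0].	`None`
`stds`	`list`	List of standard deviations. Defaults to [1, 1, 1].	`None`
`sizes`	`list`	List of sample sizes. All entries must be equal; pandas raises a `ValueError` when the columns have different lengths. Defaults to [100, 100, 100].	`None`
`rounds`	`int`	Number of decimal places to round to. Defaults to 2.	`2`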

Returns:

Type	Description
`DataFrame`	DataFrame with one normally distributed column per entry (`X1`, `X2`, ...)

Source code in hossam/util.py

def hs_make_normalize_data(
    means: list | None = None,
    stds: list | None = None,
    sizes: list | None = None,
    rounds: int = 2,
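) -> DataFrame:
    """Generate a DataFrame whose columns follow normal distributions.

    Args:
        means (list, optional): List of means. Defaults to [0, 0, 0].
        stds (list, optional): List of standard deviations. Defaults to [1, 1, 1].
        sizes (list, optional): List of sample sizes; all entries must be equal. Defaults to [100, 100, 100].
        rounds (int, optional): Number of decimal places to round to. Defaults to 2.

    Returns:
        DataFrame: DataFrame with normally distributed columns X1, X2, ...
    """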
    means = means if means is not None else [0, 0, 0]
    stds = stds if stds is not None else [1, 1, 1]
    sizes = sizes if sizes is not None else [100, 100, 100]

    if not (len(means) == len(stds) == len(sizes)):
        raise ValueError("means, stds, and sizes must have the same length.")

    data = {}
    for i in range(len(means)):
        data[f"X{i+1}"] = hs_make_normalize_values(
            means[i], stds[i], sizes[i], rounds
        )

    return DataFrame(data)
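Because `DataFrame(data)` is built from a dict of plain NumPy arrays, pandas raises a `ValueError` when the entries of `sizes` differ. A minimal sketch, assuming NaN padding is acceptable, that wraps each column in a `Series` so pandas aligns the columns on their index; `make_normal_frame_padded` is a hypothetical name, and plain `numpy.random.default_rng` draws stand in for `hs_make_normalize_values` (no normality retry):

```python
import numpy as np
from pandas import DataFrame, Series


def make_normal_frame_padded(means, stds, sizes, decimals=2, seed=None):
    """Hypothetical variant: columns of unequal length are padded with NaN."""
    if not (len(means) == len(stds) == len(sizes)):
        raise ValueError("means, stds, and sizes must have the same length.")
    rng = np.random.default_rng(seed)
    data = {}
    for i, (m, s, n) in enumerate(zip(means, stds, sizes), start=1):
        # A Series carries an index, so shorter columns are aligned and NaN-padded
        data[f"X{i}"] = Series(rng.normal(m, s, n).round(decimals))
    return DataFrame(data)
```

For example, `make_normal_frame_padded([0, 5], [1, 2], [100, 80], seed=0)` returns a 100 x 2 frame whose `X2` column ends with 20 NaN values.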

hs_pretty_table

hs_pretty_table(data, tablefmt='simple', headers='keys')

Print a DataFrame as a plain-text table using `tabulate`, with the index shown and numbers right-aligned.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	출력할 데이터프레임	required
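`tablefmt`	`str`	`tabulate` table format, such as "simple", "grid", or "github". Defaults to "simple".	`'simple'`
`headers`	`str | list`	How column headers are chosen: "keys" uses the DataFrame column names, or pass a list of header strings. Defaults to "keys".	`'keys'`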

Returns:

Type	Description
`None`	Nothing is returned; the table is printed to standard output

Examples:

>>> from hossam.util import hs_pretty_table
>>> from pandas import DataFrame
>>> hs_pretty_table(DataFrame({"a":[1,2],"b":[3,4]}))

Source code in hossam/util.py

def hs_pretty_table(data: DataFrame, tablefmt="simple", headers: str = "keys") -> None:
    """Print a DataFrame as a plain-text table using `tabulate`.

    Args:
        data (DataFrame): DataFrame to print
        tablefmt (str, optional): `tabulate` table format. Defaults to "simple".
        headers (str | list, optional): How column headers are chosen. Defaults to "keys".

    Returns:
        None

    Examples:
        >>> from hossam.util import hs_pretty_table
        >>> from pandas import DataFrame
        >>> hs_pretty_table(DataFrame({"a":[1,2],"b":[3,4]}))
    """

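    # NOTE: `tabulate` is the function here (it is called below), so this line sets an
    # attribute on the function object and does not change the tabulate module's
    # WIDE_CHARS_MODE setting.
    tabulate.WIDE_CHARS_MODE = False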
    print(
        tabulate(
            data, headers=headers, tablefmt=tablefmt, showindex=True, numalign="right"
        )
    )
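`hs_pretty_table` prints the table and returns `None`, so getting the table as a string (for logging or writing to a file) means capturing standard output. A minimal sketch using the standard library's `contextlib.redirect_stdout`; `capture_printed` is a hypothetical helper, and the commented usage line assumes hossam and pandas are installed:

```python
import io
from contextlib import redirect_stdout
from typing import Any, Callable


def capture_printed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> str:
    """Call func and return everything it printed to stdout as a string."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):  # temporarily send print() output to the buffer
        func(*args, **kwargs)
    return buffer.getvalue()


# Assumed usage with hossam installed:
# text = capture_printed(hs_pretty_table, DataFrame({"a": [1, 2], "b": [3, 4]}))
```

The helper is generic on purpose: it works with any function that prints, so it does not depend on how `hs_pretty_table` formats its output.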

hs_load_data

hs_load_data(
    key,
    index_col=None,
    timeindex=False,
    info=True,
    categories=None,
    local=None,
)

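Load a dataset by key, then apply basic preprocessing and optionally print a summary. If `key` ends with `.xlsx` or `.csv` (case-insensitive), it is treated as a file path and read directly with `read_excel` or `read_csv`; otherwise the dataset is fetched through `load_data(key, local)`.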

Parameters:

Name	Type	Description	Default
`key`	`str`	Dataset key (an identifier defined in metadata.json), or a path to a `.xlsx`/`.csv` file	required
`index_col`	`str`	Column to use as the index. Defaults to None.	`None`
`timeindex`	`bool`	If True, convert the index to a time-series index (DatetimeIndex). Defaults to False.	`False`
`info`	`bool`	If True, print data information (head, tail, descriptive statistics, category info). Defaults to True.	`True`
`categories`	`list`	Columns to convert to the category dtype. Defaults to None.	`None`
`local`	`str`	Local metadata path to use instead of the remote data; ignored when `key` is a file path. Defaults to None.	`None`

Returns:

Type	Description
`DataFrame`	DataFrame with preprocessing (index setting, category conversion) applied

Examples:

>>> from hossam.util import hs_load_data
>>> df = hs_load_data("AD_SALES", index_col=None, timeindex=False, info=False)
>>> from pandas import DataFrame
>>> isinstance(df, DataFrame)
True

Source code in hossam/util.py

def hs_load_data(key: str,
                index_col: str = None,
                timeindex: bool = False,
                info: bool = True,
                categories: list = None,
                local: str = None) -> DataFrame:
    """Load a dataset by key, then apply basic preprocessing and print summary information.

    Args:
        key (str): Dataset key (an identifier defined in metadata.json), or a path to a .xlsx/.csv file
        index_col (str, optional): Column to use as the index. Defaults to None.
        timeindex (bool, optional): If True, convert the index to a time-series index (DatetimeIndex). Defaults to False.
        info (bool, optional): If True, print data information (head, tail, descriptive statistics, category info). Defaults to True.
        categories (list, optional): Columns to convert to the category dtype. Defaults to None.
        local (str, optional): Local metadata path to use instead of the remote data. Defaults to None.

    Returns:
        DataFrame: DataFrame with preprocessing (index setting, category conversion) applied

    Examples:
        >>> from hossam.util import hs_load_data
        >>> df = hs_load_data("AD_SALES", index_col=None, timeindex=False, info=False)
        >>> from pandas import DataFrame
        >>> isinstance(df, DataFrame)
        True
    """

    k = key.lower()

    if k.endswith(".xlsx"):
        origin = read_excel(key)
    elif k.endswith(".csv"):
        origin = read_csv(key)
    else:
        origin = load_data(key, local)

    return __data_info(origin, index_col, timeindex, info, categories)
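The helper `__data_info` is not shown on this page. Based only on the parameter descriptions above, its index and category steps could look roughly like the following sketch; `apply_documented_preprocessing` is a hypothetical name, the `info` printing step is omitted, and the real helper may differ:

```python
from typing import List, Optional

import pandas as pd


def apply_documented_preprocessing(
    df: pd.DataFrame,
    index_col: Optional[str] = None,
    timeindex: bool = False,
    categories: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical sketch of the documented index/category handling."""
    out = df.copy()  # leave the caller's DataFrame untouched
    if index_col is not None:
        out = out.set_index(index_col)  # use the given column as the index
    if timeindex:
        out.index = pd.to_datetime(out.index)  # convert the index to a DatetimeIndex
    for col in categories or []:
        out[col] = out[col].astype("category")  # convert to the category dtype
    return out
```

For example, a frame with `date`, `region`, and `sales` columns, processed with `index_col="date"`, `timeindex=True`, and `categories=["region"]`, ends up with a DatetimeIndex and a categorical `region` column.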