mydatapreprocessing.helpers package

Helper functions that are used across the whole library.

They were written mostly for internal use, but were added to the public API because they may be helpful elsewhere.

mydatapreprocessing.helpers.get_copy_or_view(data: DataFrameOrArrayGeneric, inplace: bool) → DataFrameOrArrayGeneric[source]

The result of the DataFrame copy function needs to be cast to get correct type hints. This helper does that, so the returned object has the same type as the input.

Parameters:
  • data (DataFrameOrArrayGeneric) – Input data.
  • inplace (bool) – If True, the original object is returned. If False, a copy is returned.
Returns:

Copy or original data.

Return type:

DataFrameOrArrayGeneric

Example

>>> a = np.array([1, 2, 3])
>>> b = get_copy_or_view(a, inplace=True)
>>> id(a) == id(b)
True
>>> b = get_copy_or_view(a, inplace=False)
>>> id(a) == id(b)
False
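
The casting mentioned above can be illustrated with a minimal sketch. This is not the library's source: the name copy_or_view_sketch and the type variable DataT are hypothetical stand-ins for get_copy_or_view and DataFrameOrArrayGeneric, and the sketch assumes only that the input is a pandas DataFrame or a NumPy array.

```python
from typing import TypeVar, cast

import numpy as np
import pandas as pd

# Hypothetical stand-in for the library's DataFrameOrArrayGeneric.
DataT = TypeVar("DataT", pd.DataFrame, np.ndarray)


def copy_or_view_sketch(data: DataT, inplace: bool) -> DataT:
    """Return the same object if inplace is True, otherwise a copy."""
    if inplace:
        return data
    # The cast only informs the type checker that the copy has the
    # same type as the input; it does nothing at runtime.
    return cast(DataT, data.copy())
```

With this shape, a type checker can infer that passing a DataFrame returns a DataFrame and passing an array returns an array.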
mydatapreprocessing.helpers.check_column_in_df(df: pd.DataFrame, name_or_index: PandasIndex, source: None | str = None) → None[source]

Raise an error if the given column is not in the DataFrame.

Parameters:
  • df (pd.DataFrame) – Input data.
  • name_or_index (PandasIndex) – Integer index, name or pandas.Index.
  • source (str, optional) – Text the raised message can use to reference where the wanted column comes from. Defaults to None.
Raises:

KeyError – If column not found in DataFrame.

Example

>>> df = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "c"])
>>> check_column_in_df(df, "a")
>>> check_column_in_df(df, "z")
Traceback (most recent call last):
KeyError...
mydatapreprocessing.helpers.get_column_name(df: pd.DataFrame, index: PandasIndex) → str | pd.Index[source]

Return index that can be used to access column directly.

In user input, a column can be defined by its name or by its integer index, and selecting the column uses different syntax in each case. This function verifies that the column is available. If an integer index is given, it is converted to the corresponding column name, so the syntax for accessing the column is always the same.

Parameters:
  • df (pd.DataFrame) – Input data
  • index (PandasIndex) – Also integer index.

Example

>>> df = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "c"])
>>> get_column_name(df, "b")
'b'
>>> get_column_name(df, 2)
'c'
>>> get_column_name(df, "z")
Traceback (most recent call last):
KeyError...
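
The lookup described for get_column_name can be sketched roughly as follows. This is an assumption about the behavior, not the library's implementation: column_name_sketch is a hypothetical name, and the sketch treats every non-negative int as a positional index, which may differ from how the real function handles integer column labels.

```python
import pandas as pd


def column_name_sketch(df: pd.DataFrame, index):
    """Return a label usable as df[label], or raise KeyError."""
    if isinstance(index, int):
        # Assumption: integers are positions, not column labels.
        if not 0 <= index < len(df.columns):
            raise KeyError(f"Column position {index} is out of range.")
        return df.columns[index]
    if index not in df.columns:
        raise KeyError(f"Column {index!r} not found in DataFrame.")
    return index
```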
mydatapreprocessing.helpers.check_not_empty(data: DataFrameOrArrayGeneric)[source]

Check whether the data contain any values. In some functions, empty data would result in an error.

Parameters:
  • data (DataFrameOrArrayGeneric) – Data
Raises:

TypeError – If data.size == 0
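
This entry has no example, so here is a minimal sketch of the documented check. It relies only on the stated condition (TypeError when data.size == 0). The function name not_empty_sketch and the error message are hypothetical, not taken from the library.

```python
import numpy as np


def not_empty_sketch(data) -> None:
    """Raise TypeError when the data hold no values."""
    # Both NumPy arrays and pandas DataFrames expose a size attribute.
    if data.size == 0:
        raise TypeError("Input data are empty.")


not_empty_sketch(np.array([1, 2, 3]))  # passes silently

try:
    not_empty_sketch(np.array([]))
except TypeError as err:
    caught = str(err)
```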