mydatapreprocessing.consolidation.consolidation_config.subconfigurations package

Subconfigs subpackage.

class mydatapreprocessing.consolidation.consolidation_config.subconfigurations.Datetime(frozen=None, *a, **kw)

Bases: mypythontools.config.config_internal.Config

Define whether to set datetime index.

datetime_column

Name or index of datetime column that will be set as index and converted to datetime.

Type

PandasIndex | None

Default

None

If None, then no column will be set as index.

on_set_datetime_error

Define what happens if converting to datetime fails.

Type

Literal["ignore", "raise"]

Default

"ignore"
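A minimal sketch of what these two options describe, written in plain pandas rather than through the library's own API. The helper `set_datetime_index` is hypothetical, and treating "ignore" as "return the data unchanged" is an assumption about the behavior, not something the reference states.

```python
import pandas as pd


def set_datetime_index(df, datetime_column=None, on_set_datetime_error="ignore"):
    """Hypothetical helper: convert one column to datetime and use it as the index."""
    if datetime_column is None:
        return df  # No column is set as index.

    # Accept either a column name or a positional index.
    if isinstance(datetime_column, int):
        name = df.columns[datetime_column]
    else:
        name = datetime_column

    try:
        converted = pd.to_datetime(df[name])
    except (ValueError, TypeError):
        if on_set_datetime_error == "raise":
            raise
        return df  # Assumed meaning of "ignore": leave the data as it was.

    out = df.copy()
    out[name] = converted
    return out.set_index(name)
```

With a parsable column the result has a `DatetimeIndex`; with an unparsable one, the sketch either returns the input untouched or re-raises the pandas error, depending on `on_set_datetime_error`.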

class mydatapreprocessing.consolidation.consolidation_config.subconfigurations.RemoveMissingValues(frozen=None, *a, **kw)

Bases: mypythontools.config.config_internal.Config

Remove NaN values.

remove_all_column_with_nans_threshold

Delete whole columns based on the proportion of NaN values they contain.

Type

None | Numeric

Default

0.85

Value from 0 to 1. A column must contain at least this proportion of non-NaN numeric values to be kept. E.g. if the value is 0.9 and a column has 10 values, 90% must be numeric, which means at most 1 np.nan may be present; otherwise the column is deleted.
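The threshold arithmetic can be sketched in plain Python. The helper `columns_to_keep` is hypothetical and only illustrates the rule described here; the inclusive comparison (`>=`) is an assumption chosen to match the 10-value example, in which exactly 90% valid values is enough to keep the column.

```python
import math


def columns_to_keep(columns, threshold=0.85):
    """Hypothetical helper: names of columns whose non-NaN share reaches the threshold."""
    kept = []
    for name, values in columns.items():
        valid = sum(1 for v in values if not math.isnan(v))
        # Empty columns are dropped; otherwise compare the valid share to the threshold.
        if values and valid / len(values) >= threshold:
            kept.append(name)
    return kept


nan = math.nan
data = {
    "one_nan": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, nan],   # 90% valid
    "two_nans": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, nan, nan],  # 80% valid
}
print(columns_to_keep(data, threshold=0.9))  # ['one_nan']
```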

remove_nans_type

Remove rows that contain NaN values, or replace the remaining NaN values.

Type

None | Literal["interpolate", "mean", "neighbor", "remove"] | Any

Default

"interpolate"

If None, NaN values are not removed. To replace NaN with a concrete value, pass a float or int.
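As a rough single-column illustration, the sketch below covers three of the listed choices: "remove", "mean", and a concrete numeric replacement. The function `handle_nans` is hypothetical; "interpolate" and "neighbor" are left out because their exact behavior is not specified in this reference.

```python
import math


def handle_nans(values, remove_nans_type):
    """Hypothetical helper showing None, "remove", "mean", and a constant fill."""
    if remove_nans_type is None:
        return list(values)  # NaN values are kept.

    valid = [v for v in values if not math.isnan(v)]
    if remove_nans_type == "remove":
        return valid  # Drop the entries that are NaN.

    if remove_nans_type == "mean":
        fill = sum(valid) / len(valid)  # Mean of the non-NaN values.
    elif isinstance(remove_nans_type, (int, float)):
        fill = remove_nans_type  # Concrete replacement value.
    else:
        raise ValueError(f"Option not covered by this sketch: {remove_nans_type!r}")

    return [fill if math.isnan(v) else v for v in values]


column = [1.0, math.nan, 3.0]
print(handle_nans(column, "remove"))  # [1.0, 3.0]
print(handle_nans(column, "mean"))    # [1.0, 2.0, 3.0]
print(handle_nans(column, 0))         # [1.0, 0, 3.0]
```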

class mydatapreprocessing.consolidation.consolidation_config.subconfigurations.Resample(frozen=None, *a, **kw)

Bases: mypythontools.config.config_internal.Config

Change the sampling frequency.

resample

Frequency of resampled data.

Type

None | Literal["S", "min", "H", "M", "Y"] | str

Default

None

If None, then data are not resampled.

resample_function

Define whether resampled values are the sum of the original values or their average.

Type

Literal["sum", "mean"]

Default

"sum"
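The difference between the two aggregation choices can be seen with plain pandas resampling. This sketch assumes the frequency strings are pandas frequency aliases, which the listed values suggest but this reference does not confirm; it does not call the library itself.

```python
import pandas as pd

# Four samples taken 30 seconds apart.
index = pd.to_datetime(
    [
        "2024-01-01 00:00:00",
        "2024-01-01 00:00:30",
        "2024-01-01 00:01:00",
        "2024-01-01 00:01:30",
    ]
)
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=index)

# Resample to one-minute bins, aggregating each bin two ways.
summed = df.resample("min").sum()     # analogous to resample_function="sum"
averaged = df.resample("min").mean()  # analogous to resample_function="mean"

print(summed["value"].tolist())    # [3.0, 7.0]
print(averaged["value"].tolist())  # [1.5, 3.5]
```

Summing suits quantities that accumulate over time (counts, consumption), while the mean suits levels such as temperature.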

class mydatapreprocessing.consolidation.consolidation_config.subconfigurations.StringsToNumeric(frozen=None, *a, **kw)

Bases: mypythontools.config.config_internal.Config

Remove or replace string values with numeric.

cast_str_to_numeric

Try to convert strings to numeric.

Type

bool

Default

True

Errors are ignored, so a column that cannot be converted to numeric is left untouched.

embedding

Implement categorical encoding.

Type

None | Literal["label", "one-hot"]

Default

"label"

Create numbers from strings. ‘label’ gives each category (unique string) a concrete number, so the result has the same number of columns. ‘one-hot’ creates a new column for every category. Only columns where strings repeat (see unique_threshold) are used.
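The two encodings can be sketched on a single column in plain Python. Both helpers are hypothetical illustrations; in particular, numbering categories in order of first appearance is an arbitrary choice made here, not documented behavior.

```python
def label_encode(values):
    """Give each unique string one integer; the column count stays the same."""
    codes = {}
    # setdefault assigns the next free integer the first time a string is seen.
    return [codes.setdefault(v, len(codes)) for v in values]


def one_hot_encode(values):
    """Create one 0/1 column per unique string."""
    categories = list(dict.fromkeys(values))  # Unique values in first-seen order.
    return {c: [1 if v == c else 0 for v in values] for c in categories}


colors = ["red", "blue", "red"]
print(label_encode(colors))    # [0, 1, 0]
print(one_hot_encode(colors))  # {'red': [1, 0, 1], 'blue': [0, 1, 0]}
```

Label encoding keeps the data narrow but imposes an artificial ordering on the categories; one-hot encoding avoids that ordering at the cost of one extra column per category.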

only_numeric

Remove all non-numeric values.

Type

bool

Default

True

If True, all non-numeric columns are dropped. ‘cast_str_to_numeric’ and ‘embedding’ are applied before the columns are dropped.

unique_threshold

Remove string columns that have too many categories.

Type

Numeric

Default

0.6

E.g. 0.9 defines that if a column contains more than 90% of NOT unique values, it is deleted. Min is 0, max is 1. This removes IDs, hashes, etc.