mydatapreprocessing.consolidation.consolidation_config.subconfigurations package
Subconfigs subpackage.
-
class
mydatapreprocessing.consolidation.consolidation_config.subconfigurations.
Datetime
(frozen=None, *a, **kw) Bases:
mypythontools.config.config_internal.Config
Define whether to set datetime index.
-
datetime_column
Name or index of the datetime column that will be converted to datetime and set as the index.
Type
PandasIndex | None
Default
None
If None, then no column will be set as index.
-
on_set_datetime_error
Define what happens if converting to datetime fails.
Type
Literal["ignore", "raise"]
Default
"ignore"
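The difference between the two error modes can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper `to_datetime_index` is hypothetical, and it assumes that "ignore" leaves the values unchanged when conversion fails.

```python
from datetime import datetime


def to_datetime_index(raw_values, on_set_datetime_error="ignore"):
    """Hypothetical helper: convert ISO strings to datetimes, honoring the error mode."""
    try:
        return [datetime.fromisoformat(value) for value in raw_values]
    except (TypeError, ValueError):
        if on_set_datetime_error == "raise":
            raise
        # Assumption: "ignore" keeps the original values untouched.
        return list(raw_values)


converted = to_datetime_index(["2021-01-01", "2021-01-02"])
untouched = to_datetime_index(["2021-01-01", "not a date"])  # conversion fails, values kept
```

With `on_set_datetime_error="raise"`, the second call would raise a `ValueError` instead of returning the raw strings.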
-
-
class
mydatapreprocessing.consolidation.consolidation_config.subconfigurations.
RemoveMissingValues
(frozen=None, *a, **kw) Bases:
mypythontools.config.config_internal.Config
Remove NaN values.
-
remove_all_column_with_nans_threshold
Delete an entire column based on its share of NaN values.
Type
None | Numeric
Default
0.85
Value from 0 to 1. A column must contain at least this share of non-NaN numeric values to be kept. E.g. with a value of 0.9 and a column of 10 values, 90% must be numeric, which means at most 1 np.nan may be present; otherwise the column is deleted.
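The arithmetic in this example can be checked with a small sketch. The function `keep_column` is hypothetical and written in plain Python; it assumes the comparison is inclusive (a column with exactly the required share is kept) and that None disables the check, neither of which is confirmed by the library source.

```python
import math


def keep_column(values, threshold=0.85):
    """Hypothetical check: keep a column if its non-NaN share reaches the threshold."""
    if threshold is None:  # assumption: None disables the check
        return True
    if not values:
        return False
    non_nan = sum(1 for v in values if not (isinstance(v, float) and math.isnan(v)))
    return non_nan / len(values) >= threshold


nan = float("nan")
one_nan = [1.0] * 9 + [nan]       # 9 of 10 values are numeric (90%)
two_nans = [1.0] * 8 + [nan] * 2  # 8 of 10 values are numeric (80%)

print(keep_column(one_nan, threshold=0.9))   # True
print(keep_column(two_nans, threshold=0.9))  # False
```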
-
remove_nans_type
Remove rows that contain NaN values, or replace the remaining NaN values.
Type
None | Literal["interpolate", "mean", "neighbor", "remove"] | Any
Default
"interpolate"
If None, NaN values are not removed. To replace NaN values with a specific value, pass a float or an int.
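A minimal sketch of three of these options on a single column, in plain Python. The helper `handle_nans` is hypothetical and not the library's code; "interpolate" and "neighbor" are left out because their exact behavior is not described here, and "remove" drops entries of one column rather than whole rows.

```python
import math


def _is_nan(value):
    return isinstance(value, float) and math.isnan(value)


def handle_nans(values, remove_nans_type):
    """Hypothetical helper covering None, "remove", "mean" and a constant fill value."""
    if remove_nans_type is None:
        return list(values)  # NaN values stay in place
    if remove_nans_type == "remove":
        return [v for v in values if not _is_nan(v)]
    if remove_nans_type == "mean":
        present = [v for v in values if not _is_nan(v)]
        mean = sum(present) / len(present)
        return [mean if _is_nan(v) else v for v in values]
    if isinstance(remove_nans_type, (int, float)):
        return [remove_nans_type if _is_nan(v) else v for v in values]
    raise ValueError(f"Option not covered by this sketch: {remove_nans_type!r}")


column = [1.0, float("nan"), 3.0]

print(handle_nans(column, "remove"))  # [1.0, 3.0]
print(handle_nans(column, "mean"))    # [1.0, 2.0, 3.0]
print(handle_nans(column, 0))         # [1.0, 0, 3.0]
```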
-
-
class
mydatapreprocessing.consolidation.consolidation_config.subconfigurations.
Resample
(frozen=None, *a, **kw) Bases:
mypythontools.config.config_internal.Config
Change the sampling frequency.
-
resample
Frequency of resampled data.
Type
None | Literal["S", "min", "H", "M", "Y"] | str
Default
None
If None, then data are not resampled.
-
resample_function
Define whether resampled values are the sum of the original values or their average.
Type
Literal["sum", "mean"]
Default
"sum"
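The effect of the two aggregation choices can be illustrated with hourly buckets in plain Python. The function `resample_hourly` is a hypothetical stand-in for resampling with frequency "H"; it is not how the library performs resampling.

```python
from collections import defaultdict
from datetime import datetime


def resample_hourly(samples, resample_function="sum"):
    """Hypothetical helper: group (timestamp, value) pairs by hour and aggregate."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    if resample_function == "sum":
        return {hour: sum(vals) for hour, vals in sorted(buckets.items())}
    if resample_function == "mean":
        return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}
    raise ValueError(f"Unknown resample_function: {resample_function!r}")


samples = [
    (datetime(2021, 1, 1, 10, 0), 2.0),
    (datetime(2021, 1, 1, 10, 30), 4.0),
    (datetime(2021, 1, 1, 11, 15), 5.0),
]

hourly_sum = resample_hourly(samples, "sum")    # 10:00 bucket holds 6.0, 11:00 holds 5.0
hourly_mean = resample_hourly(samples, "mean")  # 10:00 bucket holds 3.0, 11:00 holds 5.0
```

"sum" suits quantities that accumulate over the interval (counts, volumes), while "mean" suits levels such as temperature or price.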
-
-
class
mydatapreprocessing.consolidation.consolidation_config.subconfigurations.
StringsToNumeric
(frozen=None, *a, **kw) Bases:
mypythontools.config.config_internal.Config
Remove or replace string values with numeric.
-
cast_str_to_numeric
Try to convert strings to numeric values.
Type
bool
Default
True
Errors are ignored, so a column that cannot be converted to numeric is left untouched.
-
embedding
Implement categorical encoding.
Type
None | Literal["label", "one-hot"]
Default
"label"
Create numbers from strings. 'label' gives each category (unique string) a concrete number, so the result has the same number of columns. 'one-hot' creates a new column for every category. Only columns where strings repeat (see unique_threshold) are used.
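The two encodings can be sketched on a single column in plain Python. Both helpers are hypothetical and only show the shape of the result; in particular, numbering categories in order of first appearance is an assumption, and the library may assign numbers differently.

```python
def label_encode(column):
    """Hypothetical helper: map each category to one integer, keeping a single column."""
    codes = {}
    for value in column:
        codes.setdefault(value, len(codes))  # assumption: numbered by first appearance
    return [codes[value] for value in column]


def one_hot_encode(column):
    """Hypothetical helper: create one 0/1 column per category."""
    categories = sorted(set(column))
    return {cat: [int(value == cat) for value in column] for cat in categories}


colors = ["red", "green", "red", "blue"]

print(label_encode(colors))    # [0, 1, 0, 2]
print(one_hot_encode(colors))  # {'blue': [0, 0, 0, 1], 'green': [0, 1, 0, 0], 'red': [1, 0, 1, 0]}
```

Label encoding keeps the column count but implies an order between categories; one-hot avoids that at the cost of one extra column per category.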
-
only_numeric
Remove all non-numeric values.
Type
bool
Default
True
If True, all non-numeric columns are dropped. 'cast_str_to_numeric' and 'embedding' are applied before the columns are dropped.
-
unique_threshold
Remove string columns that have too many categories.
Type
Numeric
Default
0.6
E.g. 0.9 defines that a column containing more than 90% of NOT unique values is deleted. Min is 0, max is 1. This removes ids, hashes etc.
-