mydatapreprocessing.consolidation.consolidation_config package

Consolidation config and subconfig classes.

mydatapreprocessing.consolidation.consolidation_config.default_consolidation_config

Default config that you can use directly. Use IntelliSense tooltips to see which options are available, or use the update method to configure several options at once.

Type: mydatapreprocessing.consolidation.consolidation_config.ConsolidationConfig
class mydatapreprocessing.consolidation.consolidation_config.ConsolidationConfig(frozen=None, *a, **kw)

Bases: mypythontools.config.config_internal.Config

Config class for the consolidate_data pipeline.

A default_consolidation_config object is already created. You can import it, edit it and use it. Static type checking and IntelliSense should work.
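The edit-then-use pattern described above can be sketched with a small stand-in class. SketchConfig and its fields are hypothetical and only mirror a few option names from this page; this is not the library's ConsolidationConfig, and the real update method may accept different arguments.

```python
from dataclasses import dataclass, fields


@dataclass
class SketchConfig:
    """Hypothetical stand-in; not the library's ConsolidationConfig."""

    check_shape_and_transform: bool = True
    data_length: int = 0
    dtype: str = "float32"
    inplace: bool = False

    def update(self, values: dict) -> None:
        """Set several options at once, rejecting unknown names."""
        known = {field.name for field in fields(self)}
        for name, value in values.items():
            if name not in known:
                raise AttributeError(f"Unknown option: {name}")
            setattr(self, name, value)


config = SketchConfig()
config.data_length = 1000  # edit a single option by attribute
config.update({"dtype": "float64", "inplace": True})  # bulk edit
```

Rejecting unknown names in update is a design choice of this sketch: a mistyped option fails loudly instead of being silently ignored.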

check_shape_and_transform

Check whether the correct shape is used and transpose if necessary.

Type

bool

Default

True

A table usually has many more rows than columns. If it does not, the dimensions may have been swapped when the data was loaded. This option checks the shape, transposes the data if necessary and logs the change.
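A minimal sketch of such a shape check in plain pandas. The helper name ensure_rows_are_samples is hypothetical, and the rule used here (transpose when there are more columns than rows) is an assumption about what the option does internally.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def ensure_rows_are_samples(df: pd.DataFrame) -> pd.DataFrame:
    """Transpose the table when it has more columns than rows."""
    rows, columns = df.shape
    if columns > rows:
        logger.warning("Data has more columns than rows; transposing.")
        return df.T
    return df


wide = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]])  # 2 rows, 4 columns
tall = ensure_rows_are_samples(wide)
print(tall.shape)  # (4, 2)
```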

data_length

Limit the data length after resampling.

Type

int

Default

0

If 0, then all the data is used.
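A rough sketch of a length limit in plain pandas. The helper limit_length is hypothetical, and keeping the most recent rows (the tail) is an assumption; this page does not say which end of the data is kept.

```python
import pandas as pd


def limit_length(df: pd.DataFrame, data_length: int = 0) -> pd.DataFrame:
    """Keep the last data_length rows; 0 keeps everything."""
    if data_length == 0:
        return df
    return df.iloc[-data_length:]


df = pd.DataFrame({"a": range(10)})
last_three = limit_length(df, 3)
everything = limit_length(df)
```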

datetime = None

Set datetime index and convert it to datetime type.
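What this amounts to can be sketched in plain pandas as follows. The column name "time" is made up for the example, and the snippet does not use the library's own option format, which is not described here.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "time": ["2021-01-01", "2021-01-02", "2021-01-03"],
        "value": [1.0, 2.0, 3.0],
    }
)

# Convert the column to datetime type, then use it as the index.
df["time"] = pd.to_datetime(df["time"])
df = df.set_index("time")
```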

dtype

Set output dtype.

Type

str | np.dtype | pd.Series | list[str | np.dtype]

Default

"float32"

For possible inputs, see the pandas astype function.
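As a quick illustration of inputs that pandas astype accepts, the sketch below converts a small made-up table once with a single dtype and once with a per-column mapping. It calls pandas directly and does not go through the config.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# One dtype for the whole table.
as_float32 = df.astype("float32")

# A different dtype per column, given as a mapping.
as_mixed = df.astype({"a": np.int64, "b": "float32"})
```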

first_column

Move the defined column to index 0.

Type

None | PandasIndex

Default

None
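Moving a column to index 0 might look like this in plain pandas. The column name "target" is made up for the example, and this is only one of several ways to reorder columns, not necessarily the one the library uses.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "target": [5, 6]})

# pop removes the column and returns it; insert puts it back at position 0.
df.insert(0, "target", df.pop("target"))
```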

inplace

Define whether to work on the inserted data itself or on a copy.

Type

bool

Default

False

The copy is created only once; internally, all the consolidating functions then work inplace. The syntax differs slightly from, for example, pandas: assign the result to a variable, e.g. df = consolidate_data(df), even with inplace. If True, your inserted data will be changed.
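The copy-once behaviour can be sketched with a hypothetical stand-in function. consolidate_sketch is not the library's consolidate_data; it only shows why the result should be assigned in both modes.

```python
import pandas as pd


def consolidate_sketch(df: pd.DataFrame, inplace: bool = False) -> pd.DataFrame:
    """Hypothetical stand-in for a pipeline that copies at most once."""
    if not inplace:
        df = df.copy()  # the only copy; later steps mutate df directly
    df["value"] = df["value"] * 2  # a step that changes its input
    return df


original = pd.DataFrame({"value": [1, 2, 3]})

result = consolidate_sketch(original)
untouched = original["value"].tolist()  # original is still unchanged here

changed = consolidate_sketch(original, inplace=True)
# Now original itself holds the doubled values, and changed is the same object.
```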

remove_missing_values = None

Define whether and how to remove NaN (not-a-number) values.
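Two common ways of handling NaN values in plain pandas are sketched below: dropping the affected rows and filling with the column mean. Which strategies the config actually offers is not described here, so both are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# Strategy 1: drop every row that contains a NaN.
dropped = df.dropna(axis=0)

# Strategy 2: replace each NaN with the mean of its column.
filled = df.fillna(df.mean())
```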

resample = None

Change the sampling frequency to the defined frequency if there is a datetime column. Values can be aggregated by sum or by average.
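The difference between the two aggregations can be seen in a small pandas sketch that resamples 12-hour data to daily frequency. The data and frequencies are made up, and the snippet calls pandas resample directly rather than the config.

```python
import pandas as pd

index = pd.date_range("2021-01-01", periods=4, freq="12h")
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=index)

# Two samples fall into each day; aggregate them by sum or by average.
daily_sum = df.resample("D").sum()
daily_mean = df.resample("D").mean()
```

Sum suits quantities that accumulate over time (such as consumption), while average suits levels (such as temperature).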

strings_to_numeric = None

Remove or replace string values with numbers.
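One way to replace strings with numbers in plain pandas is to_numeric with errors="coerce", which turns unparseable strings into NaN. This is a sketch of the general idea on made-up data, not necessarily how the library implements the option.

```python
import pandas as pd

df = pd.DataFrame({"a": ["1", "2", "x"], "b": ["0.5", "1.5", "2.5"]})

# Parse every column; values that are not numbers become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")
```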