_filter_funcs

Row-level filters and data cleaning helpers.

remove_known_errors()

python
phenofhy._filter_funcs.remove_known_errors(df, *, clinic_ranges=None)

Apply known error-removal helpers when required columns exist.

Parameters

  df: pandas.DataFrame
    Input dataframe.
  clinic_ranges: dict | None
    Optional mapping of clinic column ranges.

Returns

  out: pandas.DataFrame
    Filtered dataframe with known error rows removed.

Example

python
from phenofhy import _filter_funcs

cleaned = _filter_funcs.remove_known_errors(df)

apply_row_filters()

python
phenofhy._filter_funcs.apply_row_filters(df, *, ranges=None, exprs=None, inclusive="both", keep_na=False, ignore_missing_range_cols=False)

Apply range- and expression-based row filters.

Parameters

  df: pandas.DataFrame
    Input dataframe.
  ranges: dict | None
    Mapping of column -> (low, high) bounds.
  exprs: list[str] | None
    Optional list of pandas eval() expressions to AND into the mask.
  inclusive: str
    Bound inclusion for between().
  keep_na: bool
    If True, retain rows with NA in range columns.
  ignore_missing_range_cols: bool
    If True, skip missing range columns.

Returns

  out: pandas.DataFrame
    Filtered dataframe.

Example

python
from phenofhy import _filter_funcs

filtered = _filter_funcs.apply_row_filters(
    df,
    ranges={"clinic_measurements.weight": (30, 200)},
    keep_na=True,
)
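
To make the parameter interactions concrete, the following is a minimal sketch of how a range mask and expression masks could be combined in plain pandas. It is not phenofhy's implementation: `sketch_row_filters`, the demo column names, and the choice to raise `KeyError` for a missing range column are all assumptions made for illustration.

```python
import pandas as pd


def sketch_row_filters(df, ranges=None, exprs=None, inclusive="both",
                       keep_na=False, ignore_missing_range_cols=False):
    """Illustrative stand-in only; not the phenofhy implementation."""
    # Start with every row kept, then AND each condition into the mask.
    mask = pd.Series(True, index=df.index)

    for col, (low, high) in (ranges or {}).items():
        if col not in df.columns:
            if ignore_missing_range_cols:
                continue
            # Assumption: a missing range column is treated as an error.
            raise KeyError(f"range column not found: {col}")
        in_range = df[col].between(low, high, inclusive=inclusive)
        if keep_na:
            # between() is False for NA, so NA rows are re-admitted here.
            in_range = in_range | df[col].isna()
        mask &= in_range

    for expr in exprs or []:
        # Each expression must evaluate to one boolean per row.
        mask &= df.eval(expr).astype(bool)

    return df.loc[mask]


# Hypothetical demo data.
demo = pd.DataFrame({
    "weight": [25.0, 70.0, None, 250.0, 80.0],
    "age": [40, 17, 55, 30, 62],
})
kept = sketch_row_filters(
    demo,
    ranges={"weight": (30, 200)},
    exprs=["age >= 18"],
    keep_na=True,
)
# Rows 2 (weight is NA, retained by keep_na) and 4 remain.
```

With `keep_na=False` the same call would keep only row 4, because `between()` evaluates to False for missing values. Column names containing dots, such as `clinic_measurements.weight`, generally need backtick quoting inside pandas `eval()` expressions.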

floor_age_series()

python
phenofhy._filter_funcs.floor_age_series(s)

Floor ages to integer years, treating negatives as missing.

Parameters

  s: pandas.Series
    Input series of ages.

Returns

  out: pandas.Series
    Nullable Int64 series of floored ages.

Example

python
from phenofhy import _filter_funcs

floored = _filter_funcs.floor_age_series(df["derived.age_at_registration"])
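
The described behaviour (floor to whole years, negatives become missing, nullable `Int64` result) can be approximated in a few lines of pandas. This is a sketch under assumptions, not the library's code: `sketch_floor_age` is a hypothetical name and the input is assumed to be numeric already.

```python
import numpy as np
import pandas as pd


def sketch_floor_age(s):
    """Illustrative stand-in only; assumes a numeric input series."""
    # Negative ages (and existing NaN values) become missing.
    non_negative = s.where(s >= 0)
    # Floor to whole years, then cast to the nullable integer dtype
    # so missing values survive as <NA> instead of forcing floats.
    return np.floor(non_negative).astype("Int64")


# Hypothetical demo data.
ages = pd.Series([34.9, -1.0, None, 0.5, 67.0])
floored_demo = sketch_floor_age(ages)
# Values: 34, <NA>, <NA>, 0, 67 with dtype Int64
```

The nullable `Int64` dtype matters here because a plain NumPy integer column cannot hold missing values.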

filter_preferred_nonresponse()

python
phenofhy._filter_funcs.filter_preferred_nonresponse(df)

Remove rows with preferred non-response values in key demographics.

Parameters

  df: pandas.DataFrame
    Input dataframe.

Returns

  out: pandas.DataFrame
    Filtered dataframe with non-substantive responses removed.

Example

python
from phenofhy import _filter_funcs

cleaned = _filter_funcs.filter_preferred_nonresponse(df)
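
The page does not list which columns or response labels the function checks, so the sketch below only illustrates the general pattern of dropping rows whose demographic answers are non-substantive. Every name in it is hypothetical: the column names, the response labels, and `sketch_drop_nonresponse` itself are assumptions, not values taken from phenofhy.

```python
import pandas as pd

# Hypothetical labels and columns, chosen only for illustration.
NONRESPONSE_LABELS = ["Prefer not to answer", "Do not know"]
DEMOGRAPHIC_COLS = ["sex", "ethnicity"]


def sketch_drop_nonresponse(df, cols=DEMOGRAPHIC_COLS, labels=NONRESPONSE_LABELS):
    """Illustrative stand-in only; not the phenofhy implementation."""
    # Only check the columns that actually exist in this dataframe.
    present = [c for c in cols if c in df.columns]
    if not present:
        return df
    # Flag a row if any checked column holds a non-response label.
    is_nonresponse = df[present].isin(labels).any(axis=1)
    return df.loc[~is_nonresponse]


# Hypothetical demo data.
people = pd.DataFrame({
    "sex": ["Female", "Prefer not to answer", "Male", "Female"],
    "ethnicity": ["A", "B", "Do not know", "C"],
})
substantive = sketch_drop_nonresponse(people)
# Rows 0 and 3 remain.
```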