_filter_funcs
Row-level filters and data cleaning helpers.
remove_known_errors()
phenofhy._filter_funcs.remove_known_errors(df, *, clinic_ranges=None)
Apply known error-removal helpers when required columns exist.
Parameters
df: pandas.DataFrame
Input dataframe.
clinic_ranges: dict | None
Optional mapping of clinic column ranges.
Returns
out: pandas.DataFrame
Filtered dataframe with known error rows removed.
Example
from phenofhy import _filter_funcs
cleaned = _filter_funcs.remove_known_errors(df)
apply_row_filters()
phenofhy._filter_funcs.apply_row_filters(df, *, ranges=None, exprs=None, inclusive="both", keep_na=False, ignore_missing_range_cols=False)
Apply range- and expression-based row filters.
Parameters
df: pandas.DataFrame
Input dataframe.
ranges: dict | None
Mapping of column -> (low, high) bounds.
exprs: list[str] | None
Optional list of pandas eval() expressions to AND into the mask.
inclusive: str
Bound inclusion for between().
keep_na: bool
If True, retain rows with NA in range columns.
ignore_missing_range_cols: bool
If True, skip missing range columns.
Returns
out: pandas.DataFrame
Filtered dataframe.
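The parameter descriptions above suggest a mask built from between() checks and eval() expressions. The following is a minimal sketch of that idea in plain pandas, not phenofhy's actual implementation: the function name sketch_row_filters and the sample columns are hypothetical, and ignore_missing_range_cols is left out for brevity.

```python
import pandas as pd


def sketch_row_filters(df, ranges=None, exprs=None, inclusive="both", keep_na=False):
    """Illustrative stand-in only; phenofhy's real logic may differ."""
    # Start with every row kept, then AND each condition into the mask.
    mask = pd.Series(True, index=df.index)
    for col, (low, high) in (ranges or {}).items():
        in_range = df[col].between(low, high, inclusive=inclusive)
        if keep_na:
            # between() is False for NA, so re-admit missing values explicitly.
            in_range = in_range | df[col].isna()
        mask &= in_range
    for expr in exprs or []:
        mask &= df.eval(expr).astype(bool)
    return df.loc[mask]


# Hypothetical sample data with simple column names.
df = pd.DataFrame(
    {"weight": [25.0, 70.0, None, 250.0], "age": [40, 30, 55, 60]}
)
kept = sketch_row_filters(
    df, ranges={"weight": (30, 200)}, exprs=["age >= 18"], keep_na=True
)
strict = sketch_row_filters(df, ranges={"weight": (30, 200)})
```

With keep_na=True the row with a missing weight survives the range check; with the default it is dropped along with the out-of-range rows.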
Example
from phenofhy import _filter_funcs
filtered = _filter_funcs.apply_row_filters(
df,
ranges={"clinic_measurements.weight": (30, 200)},
keep_na=True,
)
floor_age_series()
phenofhy._filter_funcs.floor_age_series(s)
Floor ages to integer years, treating negatives as missing.
Parameters
s: pandas.Series
Input series of ages.
Returns
out: pandas.Series
Nullable Int64 series of floored ages.
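To make the described behaviour concrete, here is a rough sketch of one way to floor ages into a nullable Int64 series with pandas. It assumes nothing about phenofhy's internals; sketch_floor_age is a hypothetical name used only for illustration.

```python
import numpy as np
import pandas as pd


def sketch_floor_age(s):
    """Illustrative stand-in only; not the library's implementation."""
    ages = pd.to_numeric(s, errors="coerce")
    # where() keeps values meeting the condition; negatives and NaN become NaN.
    ages = ages.where(ages >= 0)
    # Floor to whole years, then cast to the nullable integer dtype.
    return np.floor(ages).astype("Int64")


ages = pd.Series([34.9, -1.0, None, 0.5])
floored = sketch_floor_age(ages)
```

The nullable Int64 dtype lets missing ages stay as NA instead of forcing the whole series back to float.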
Example
from phenofhy import _filter_funcs
floored = _filter_funcs.floor_age_series(df["derived.age_at_registration"])
filter_preferred_nonresponse()
phenofhy._filter_funcs.filter_preferred_nonresponse(df)
Remove rows with preferred non-response values in key demographics.
Parameters
df: pandas.DataFrame
Input dataframe.
Returns
out: pandas.DataFrame
Filtered dataframe with non-substantive responses removed.
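This reference does not list which columns or response labels the function checks, so the sketch below only illustrates the general pattern of dropping rows that carry a non-substantive answer. The column names, the labels, and sketch_drop_nonresponse are all hypothetical.

```python
import pandas as pd

# Hypothetical labels and columns, chosen for illustration only.
NONRESPONSE_LABELS = ["Prefer not to answer", "Do not know"]
DEMOGRAPHIC_COLS = ["sex", "ethnicity"]


def sketch_drop_nonresponse(df, cols=DEMOGRAPHIC_COLS, labels=NONRESPONSE_LABELS):
    """Illustrative stand-in only; phenofhy's real rules may differ."""
    # Only check the columns that are actually present in the dataframe.
    present = [c for c in cols if c in df.columns]
    # Flag a row if any checked column holds a non-response label.
    flagged = df[present].isin(labels).any(axis=1)
    return df.loc[~flagged]


people = pd.DataFrame(
    {
        "sex": ["Female", "Prefer not to answer", "Male"],
        "ethnicity": ["A", "B", "Do not know"],
    }
)
substantive = sketch_drop_nonresponse(people)
```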
Example
from phenofhy import _filter_funcs
cleaned = _filter_funcs.filter_preferred_nonresponse(df)