Process
Entity-aware processing, derivations, and data cleaning.
participant_fields()
phenofhy.process.participant_fields(input_data, *, derive="auto", derive_registry=None,
coalesce_rules=None, auto_row_filters=True, age_col="derived.age_at_registration",
min_age=18, max_age=110, floor_age=True, age_group_bins=None,
age_group_labels=None, extra_ranges=None, extra_exprs=None,
keep_na_in_ranges=False)
Process participant entity fields, with optional derivations and row filters.
Parameters
input_data: str | pandas.DataFrame
File path or DataFrame input.
derive: bool | list[str] | Literal["all", "auto"]
Derivation selection policy.
derive_registry: dict | None
Optional custom derive registry.
coalesce_rules: dict | None
Optional coalesce rule overrides.
auto_row_filters: bool
Whether to apply age-based filtering.
age_col: str
Age column for filtering.
min_age: int
Minimum age (inclusive).
max_age: int
Maximum age (inclusive).
floor_age: bool
Whether to floor age values before deriving.
age_group_bins: list[float] | None
Optional custom age bins.
age_group_labels: list[str] | None
Optional custom age labels.
extra_ranges: dict | None
Optional extra numeric ranges for filtering.
extra_exprs: list[str] | None
Optional query expressions for filtering.
keep_na_in_ranges: bool
Whether to keep rows with NA values when applying range filters.
Returns
out: pandas.DataFrame
Processed DataFrame.
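The age-related parameters above (min_age, max_age, floor_age, age_group_bins, age_group_labels) can be illustrated with a small pure-Python sketch. It is not phenofhy's implementation: the helper names filter_by_age and assign_age_group are hypothetical, and the sketch assumes that flooring happens before the range check, that rows with a missing age are dropped, and that age bins are left-closed and right-open.

```python
import math
from bisect import bisect_right


def filter_by_age(rows, age_col="derived.age_at_registration",
                  min_age=18, max_age=110, floor_age=True):
    """Keep rows whose age lies in [min_age, max_age], both ends inclusive.

    Assumptions for this sketch: rows are plain dicts, missing ages (None)
    are dropped, and flooring happens before the range check.
    """
    kept = []
    for row in rows:
        age = row.get(age_col)
        if age is None:
            continue  # assumed: rows without an age are excluded
        if floor_age:
            age = math.floor(age)  # e.g. 17.9 -> 17, which then fails min_age=18
            row = {**row, age_col: age}
        if min_age <= age <= max_age:
            kept.append(row)
    return kept


def assign_age_group(age, bins, labels):
    """Return the label of the bin [bins[i], bins[i+1]) that contains age.

    Hypothetical convention: bins are left-closed, right-open, with exactly
    one label per interval. Ages outside all bins map to None.
    """
    if len(labels) != len(bins) - 1:
        raise ValueError("need exactly one label per bin interval")
    if age is None or age < bins[0] or age >= bins[-1]:
        return None
    return labels[bisect_right(bins, age) - 1]
```

With the default bounds, ages 17.9, 18.2 and 110.5 floor to 17, 18 and 110, so only the last two rows survive; with floor_age=False, 110.5 would instead fall outside max_age.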
questionnaire_fields()
phenofhy.process.questionnaire_fields(input_data, *, derive="auto",
derive_registry=None, coalesce_rules=None, auto_row_filters=False,
age_col="derived.age_at_registration", min_age=18, max_age=110,
floor_age=True, extra_ranges=None, extra_exprs=None, keep_na_in_ranges=False)
Process questionnaire entity fields, with optional derivations.
Parameters
input_data: str | pandas.DataFrame
File path or DataFrame input.
derive: bool | list[str] | Literal["all", "auto"]
Derivation selection policy.
derive_registry: dict | None
Optional custom derive registry.
coalesce_rules: dict | None
Optional coalesce rule overrides.
auto_row_filters: bool
Whether to apply age-based filtering.
age_col: str
Age column for filtering.
min_age: int
Minimum age (inclusive).
max_age: int
Maximum age (inclusive).
floor_age: bool
Whether to floor age values before deriving.
extra_ranges: dict | None
Optional extra numeric ranges for filtering.
extra_exprs: list[str] | None
Optional query expressions for filtering.
keep_na_in_ranges: bool
Whether to keep rows with NA values when applying range filters.
Returns
out: pandas.DataFrame
Processed DataFrame.
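The interaction between extra_ranges and keep_na_in_ranges can be sketched in plain Python. This is an illustration under stated assumptions, not phenofhy's code: the helper apply_ranges is hypothetical, and it assumes extra_ranges maps a column name to an inclusive (low, high) pair and that a missing value passes a range only when keep_na is True.

```python
import math


def _is_na(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def apply_ranges(rows, ranges, keep_na=False):
    """Keep rows whose values fall inside every (low, high) range, inclusive.

    Hypothetical shape: ranges maps column name -> (low, high). A missing
    value fails the range unless keep_na is True, in which case it passes.
    """
    kept = []
    for row in rows:
        passes = True
        for col, (low, high) in ranges.items():
            value = row.get(col)
            if _is_na(value):
                if keep_na:
                    continue  # NA is allowed through this range
                passes = False
                break
            if not (low <= value <= high):
                passes = False
                break
        if passes:
            kept.append(row)
    return kept
```

The extra_exprs strings are described only as query expressions; if they are evaluated with pandas DataFrame.query, column names containing dots (such as derived.age_at_registration) would need backtick quoting.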
clinic_measurements_fields()
phenofhy.process.clinic_measurements_fields(input_data, *, derive="auto",
derive_registry=None, coalesce_rules=None, auto_row_filters=False,
age_col="derived.age_at_registration", min_age=18, max_age=110,
floor_age=True, extra_ranges=None, extra_exprs=None, keep_na_in_ranges=False)
Process clinic measurement fields, with optional BMI derivations.
Parameters
input_data: str | pandas.DataFrame
File path or DataFrame input.
derive: bool | list[str] | Literal["all", "auto"]
Derivation selection policy.
derive_registry: dict | None
Optional custom derive registry.
coalesce_rules: dict | None
Optional coalesce rule overrides.
auto_row_filters: bool
Whether to apply age-based filtering.
age_col: str
Age column for filtering.
min_age: int
Minimum age (inclusive).
max_age: int
Maximum age (inclusive).
floor_age: bool
Whether to floor age values before deriving.
extra_ranges: dict | None
Optional extra numeric ranges for filtering.
extra_exprs: list[str] | None
Optional query expressions for filtering.
keep_na_in_ranges: bool
Whether to keep rows with NA values when applying range filters.
Returns
out: pandas.DataFrame
Processed DataFrame.
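The BMI derivations mentioned above rest on the standard formula BMI = weight (kg) / height (m)^2. The sketch below shows that arithmetic together with the WHO adult BMI categories. The helper names and the kg/cm input units are assumptions; which BMI columns phenofhy actually derives, and from which source fields, is not specified here.

```python
def derive_bmi(weight_kg, height_cm):
    """Hypothetical BMI derive: weight (kg) divided by height (m) squared.

    Assumes weight in kilograms and height in centimetres; returns None
    when either input is missing or the height is not positive.
    """
    if weight_kg is None or height_cm is None or height_cm <= 0:
        return None
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def bmi_category(bmi):
    """Map a BMI value to the WHO adult categories."""
    if bmi is None:
        return None
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"
```

For example, 70 kg at 175 cm gives 70 / 1.75^2, roughly 22.86, which falls in the "normal" category.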
get_dummies()
phenofhy.process.get_dummies(df, codings_glob="./metadata/*.codings.csv",
coding_name="MEDICAT_1_M", col="questionnaire.medicat_1_m",
prefix="derived.medicates_", exclude_codes=(-7, -1, -3), user_map=None,
inplace=True)
Expand a coded multi-select column into dummy variables.
Parameters
df: pandas.DataFrame
Input dataframe to modify.
codings_glob: str
Glob path to codings CSV files.
coding_name: str
Coding name to match in codings CSVs.
col: str
Column containing coded values.
prefix: str
Prefix for generated dummy columns.
exclude_codes: tuple[int, ...] | None
Codes to exclude from expansion.
user_map: dict | None
Optional code-to-label mapping overrides.
inplace: bool
If True, modify df in place.
Returns
out: pandas.DataFrame | tuple[pandas.DataFrame, pandas.DataFrame]
The updated DataFrame, or a tuple of the updated DataFrame and a mapping DataFrame when the underlying helper also returns the mapping.
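The idea behind expanding a multi-select column can be sketched without pandas. This is a conceptual illustration only: expand_multiselect is a hypothetical helper, and it assumes each cell holds a list of integer codes (or None), that excluded codes get no dummy column, and that column names are the prefix plus a slugified label. phenofhy's actual cell format, naming scheme and handling of rows that contain only excluded codes may differ.

```python
import re


def expand_multiselect(rows, col, code_to_label, prefix="derived.medicates_",
                       exclude_codes=(-7, -1, -3)):
    """Hypothetical sketch: add one 0/1 column per non-excluded code."""
    excluded = set(exclude_codes or ())
    kept_codes = [c for c in code_to_label if c not in excluded]

    def slug(label):
        # "Blood pressure medication" -> "blood_pressure_medication"
        return re.sub(r"[^0-9a-z]+", "_", label.lower()).strip("_")

    names = {c: prefix + slug(code_to_label[c]) for c in kept_codes}
    out = []
    for row in rows:
        selected = set(row.get(col) or [])  # None is treated as no selection
        new_row = dict(row)  # copy so the input rows are left untouched
        for code in kept_codes:
            new_row[names[code]] = int(code in selected)
        out.append(new_row)
    return out
```

With a made-up coding such as {1: "Blood pressure medication", 2: "Cholesterol lowering medication", -3: "Prefer not to answer"}, a row answering [1, 2] gets 1 in both dummy columns, while a row answering only -3 gets 0 in both and no column is created for the excluded code.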
resolve_categoricals_and_labels()
phenofhy.process.resolve_categoricals_and_labels(df, traits, *, label_mode="labels",
codebook_csv=None, autodetect_coded_categoricals=True, autodetect_max_levels=10)
Prepare a DataFrame and its list of categorical traits for summary.
Parameters
df: pandas.DataFrame
Input dataframe.
traits: Iterable[str]
Trait columns to evaluate.
label_mode: Literal["labels", "codes"]
Whether to map codes to labels ("labels") or labels back to codes ("codes").
codebook_csv: str | None
Optional codings CSV path.
autodetect_coded_categoricals: bool
Whether to infer coded categoricals from numeric columns.
autodetect_max_levels: int
Maximum number of unique values for a numeric column to be treated as categorical.
Returns
out: tuple[pandas.DataFrame, list[str]]
Processed dataframe and categorical trait list.
Example
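from phenofhy import process
# Process participant fields from an exported phenotype CSV
df = process.participant_fields("outputs/raw/phenos.csv")
# Resolve categorical traits; label_mode defaults to "labels"
df2, cat_traits = process.resolve_categoricals_and_labels(df, traits=["derived.sex"])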
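The autodetect_max_levels rule and the two label_mode directions can be sketched in plain Python. The helpers is_coded_categorical and map_labels are hypothetical, as is the assumption that values missing from the codebook pass through unchanged; the sketch only illustrates the rule stated in the parameter descriptions, namely that a numeric column with few distinct values is treated as a coded categorical.

```python
import math


def is_coded_categorical(values, max_levels=10):
    """Hypothetical rule: numeric values with at most max_levels distinct
    non-missing levels are treated as a coded categorical."""
    distinct = set()
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            continue  # missing values do not count as a level
        if not isinstance(v, (int, float)):
            return False  # only numeric columns are candidates
        distinct.add(v)
        if len(distinct) > max_levels:
            return False
    return len(distinct) > 0


def map_labels(values, codebook, mode="labels"):
    """Map codes to labels (mode="labels") or labels to codes (mode="codes").

    Values not found in the codebook are returned unchanged (an assumption).
    """
    if mode == "labels":
        lookup = codebook
    else:
        lookup = {label: code for code, label in codebook.items()}
    return [lookup.get(v, v) for v in values]
```

Under this rule, a column holding only 0 and 1 is flagged as categorical, while a column with eleven distinct numeric values is not at the default threshold of 10.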