_derive_funcs
Core derivation helpers used by process.
expand_multi_code_column()
phenofhy._derive_funcs.expand_multi_code_column(df, codings_glob="./metadata/*.codings.csv",
coding_name="MEDICAT_1_M", col="questionnaire.medicat_1_m",
prefix="derived.medicates_", exclude_codes=(-7, -1, -3),
abbrev_map=MEDICAT_ABBREV, inplace=True)
Expand a multi-code column into binary indicator columns.
Parameters
df: pandas.DataFrame
Input dataframe containing the multi-code column.
codings_glob: str
Glob pattern for codings CSV files.
coding_name: str
Coding name to filter within the codings file.
col: str
Multi-code column to expand.
prefix: str
Prefix for derived indicator columns.
exclude_codes: Sequence[int]
Codes to exclude from expansion.
abbrev_map: dict | None
Optional mapping for abbreviating column names.
inplace: bool
If True, mutate the input dataframe.
Returns
mapping: pandas.DataFrame
Mapping metadata for created columns.
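The expansion idea can be illustrated with a minimal sketch. It assumes the multi-code column holds Python lists of integer codes, and that the codings have already been loaded into a DataFrame with code and meaning columns (the real function reads them from codings_glob and filters by coding_name). The helper name expand_multi_code_sketch and the code-keyed abbrev_map are hypothetical; this is not the library's implementation.

```python
import pandas as pd


def expand_multi_code_sketch(df, col, codings, prefix="derived.medicates_",
                             exclude_codes=(-7, -1, -3), abbrev_map=None):
    """Hypothetical sketch: add one 0/1 column per code, return a code -> column mapping."""
    abbrev_map = abbrev_map or {}  # assumed to be keyed by code
    rows = []
    for code, meaning in zip(codings["code"], codings["meaning"]):
        if code in exclude_codes:
            continue  # skip non-answers such as "do not know" / "prefer not to say"
        name = prefix + abbrev_map.get(code, str(code))
        # 1 if the participant's list of codes contains this code; missing values count as 0.
        df[name] = df[col].apply(
            lambda codes, c=code: int(isinstance(codes, list) and c in codes)
        ).astype("int8")
        rows.append({"code": code, "meaning": meaning, "column": name})
    return pd.DataFrame(rows)
```

Returning the mapping rather than the dataframe mirrors the documented return value, since the indicator columns are written into df directly.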
self_reported_sex()
phenofhy._derive_funcs.self_reported_sex(df)
Create derived.sex as numeric codes with categorical dtype.
Parameters
df: pandas.DataFrame
Input dataframe with demographic sex fields.
Returns
out: pandas.DataFrame
Dataframe with derived.sex added.
registration_date()
phenofhy._derive_funcs.registration_date(df)
Derive derived.registration_date as a datetime column.
Parameters
df: pandas.DataFrame
Input dataframe.
Returns
out: pandas.DataFrame
Dataframe with derived.registration_date added or coerced.
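"Added or coerced" most likely means parsing the raw value into a pandas datetime. A minimal sketch of that step, assuming a hypothetical source column named participant.registration_date; unparseable values become NaT instead of raising.

```python
import pandas as pd


def registration_date_sketch(df, src="participant.registration_date",
                             out="derived.registration_date"):
    # Hypothetical source column name; errors="coerce" turns bad values into NaT.
    df[out] = pd.to_datetime(df[src], errors="coerce")
    return df
```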
age_at_registration()
phenofhy._derive_funcs.age_at_registration(df)
Derive derived.age_at_registration as continuous age in years.
Parameters
df: pandas.DataFrame
Input dataframe with registration and birth date fields.
Returns
out: pandas.DataFrame
Dataframe with derived.age_at_registration added.
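Continuous age in years can be computed as elapsed days divided by the mean year length of 365.25 days. The sketch below assumes a full birth date in a hypothetical participant.birth_date column; the real function may instead combine birth year and month fields.

```python
import pandas as pd


def age_at_registration_sketch(df, reg_col="derived.registration_date",
                               birth_col="participant.birth_date",
                               out="derived.age_at_registration"):
    reg = pd.to_datetime(df[reg_col], errors="coerce")
    birth = pd.to_datetime(df[birth_col], errors="coerce")  # hypothetical column
    # Elapsed days / 365.25 gives fractional age in years; missing dates give NaN.
    df[out] = (reg - birth).dt.days / 365.25
    return df
```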
age_group()
phenofhy._derive_funcs.age_group(df, bins=None, labels=None)
Derive derived.age_group from continuous age.
Parameters
df: pandas.DataFrame
Input dataframe.
bins: list | None
Optional list of bin edges.
labels: list | None
Optional list of labels for bins.
Returns
out: pandas.DataFrame
Dataframe with derived.age_group added.
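Binning a continuous age with explicit edges and labels maps naturally onto pandas.cut. In this sketch the default bins and labels are hypothetical placeholders, since the library's own defaults are not listed here; right=False makes each bin closed on the left, so an age of exactly 30 falls in "30-39".

```python
import pandas as pd


def age_group_sketch(df, bins=None, labels=None,
                     age_col="derived.age_at_registration", out="derived.age_group"):
    # Hypothetical defaults used only for illustration.
    if bins is None:
        bins = [18, 30, 40, 50, 60, 70, float("inf")]
    if labels is None:
        labels = ["18-29", "30-39", "40-49", "50-59", "60-69", "70+"]
    # Bins are [lower, upper); ages outside all bins become NaN.
    df[out] = pd.cut(df[age_col], bins=bins, labels=labels, right=False)
    return df
```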
bmi()
phenofhy._derive_funcs.bmi(df)
Derive derived.bmi from height and weight.
Parameters
df: pandas.DataFrame
Input dataframe with clinic_measurements.height/weight.
Returns
out: pandas.DataFrame
Dataframe with derived.bmi added.
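BMI is weight in kilograms divided by the square of height in metres. A minimal sketch, assuming the columns are named clinic_measurements.height (in centimetres) and clinic_measurements.weight (in kilograms); both the exact names and the units are assumptions.

```python
import pandas as pd


def bmi_sketch(df, height_col="clinic_measurements.height",
               weight_col="clinic_measurements.weight", out="derived.bmi"):
    # Assumed units: height in cm, weight in kg.
    height_m = pd.to_numeric(df[height_col], errors="coerce") / 100
    weight_kg = pd.to_numeric(df[weight_col], errors="coerce")
    df[out] = weight_kg / height_m ** 2  # kg / m^2
    return df
```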
bmi_status()
phenofhy._derive_funcs.bmi_status(df)
Derive derived.bmi_status as categorical BMI class codes.
Parameters
df: pandas.DataFrame
Input dataframe with derived.bmi.
Returns
out: pandas.DataFrame
Dataframe with derived.bmi_status added.
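"Categorical BMI class codes" can be produced with pandas.cut using integer labels, which yields numeric codes stored with a categorical dtype. This sketch uses the WHO adult cut-offs (below 18.5, 18.5 to under 25, 25 to under 30, 30 and above); the integer codes 1 to 4 are illustrative and may not match the codes phenofhy assigns.

```python
import numpy as np
import pandas as pd


def bmi_status_sketch(df, bmi_col="derived.bmi", out="derived.bmi_status"):
    # WHO adult cut-offs; codes 1=underweight, 2=normal, 3=overweight, 4=obese (illustrative).
    bins = [-np.inf, 18.5, 25, 30, np.inf]
    df[out] = pd.cut(df[bmi_col], bins=bins, labels=[1, 2, 3, 4], right=False)
    return df
```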
vape_status()
phenofhy._derive_funcs.vape_status(df, numeric=True, out_col="derived.vape_status")
Derive vaping status as numeric codes with categorical dtype.
Parameters
df: pandas.DataFrame
Input dataframe with vaping-related questionnaire fields.
numeric: bool
Unused; kept for compatibility.
out_col: str
Output column name.
Returns
out: pandas.DataFrame
Dataframe with derived vaping status column.
smoke_status_v1()
phenofhy._derive_funcs.smoke_status_v1(df, numeric=True)
Derive derived.smoke_status_v1 as categorical smoking status codes.
Parameters
df: pandas.DataFrame
Input dataframe with smoking questionnaire fields.
numeric: bool
Unused; kept for compatibility.
Returns
out: pandas.DataFrame
Dataframe with derived.smoke_status_v1 added.
smoke_status_v2()
phenofhy._derive_funcs.smoke_status_v2(df, numeric=True)
Derive derived.smoke_status_v2 as categorical smoking status codes.
Parameters
df: pandas.DataFrame
Input dataframe with smoking questionnaire fields.
numeric: bool
Unused; kept for compatibility.
Returns
out: pandas.DataFrame
Dataframe with derived.smoke_status_v2 added.
medicat_expand()
phenofhy._derive_funcs.medicat_expand(df, codings_glob="./metadata/*.codings.csv",
coding_name="MEDICAT_1_M", col="questionnaire.medicat_1_m",
prefix="derived.medicates_", exclude_codes=(-7, -1, -3), abbrev_map=None)
Expand medicat multi-code column into binary flags.
Parameters
df: pandas.DataFrame
Input dataframe.
codings_glob: str
Glob pattern for codings CSV files.
coding_name: str
Coding name to filter within the codings file.
col: str
Multi-code column to expand.
prefix: str
Prefix for derived indicator columns.
exclude_codes: Sequence[int]
Codes to exclude from expansion.
abbrev_map: dict | None
Optional mapping for abbreviating column names.
Returns
out: pandas.DataFrame
Dataframe with derived medication flags added.
any_hospital_contact()
phenofhy._derive_funcs.any_hospital_contact(data, merged=None, before_registration=False)
Derive derived.any_hospital_contact as a binary flag.
Parameters
data: dict | pandas.DataFrame
Dict of entity DataFrames or a merged DataFrame.
merged: pandas.DataFrame | None
Optional merged DataFrame to attach the derived column to.
before_registration: bool
If True, count only pre-registration contacts.
Returns
out: pandas.DataFrame
Dataframe with derived.any_hospital_contact added.
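A binary contact flag reduces to "does this participant appear in any hospital event table, optionally restricted to events dated before registration". The sketch below is a simplified, hypothetical version: it takes a dict of event tables that share a pid and an event_date column, plus a participant-level frame holding derived.registration_date. These column names and the dict layout are assumptions, not phenofhy's actual schema.

```python
import pandas as pd


def any_hospital_contact_sketch(events, merged, before_registration=False,
                                pid="pid", date_col="event_date",
                                reg_col="derived.registration_date",
                                out="derived.any_hospital_contact"):
    """events: hypothetical dict of per-setting tables (e.g. A&E, inpatient, outpatient)."""
    ev = pd.concat([t[[pid, date_col]] for t in events.values()], ignore_index=True)
    if before_registration:
        # Keep only events dated strictly before each participant's registration.
        ev = ev.merge(merged[[pid, reg_col]], on=pid, how="inner")
        ev = ev[pd.to_datetime(ev[date_col]) < pd.to_datetime(ev[reg_col])]
    # 1 if the participant has at least one remaining event, else 0.
    merged[out] = merged[pid].isin(ev[pid]).astype("int8")
    return merged
```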
ae_visits()
phenofhy._derive_funcs.ae_visits(data, merged=None, before_registration=False)
Derive derived.ae_visits as count of A&E visits per participant.
Parameters
data: dict | pandas.DataFrame
Dict of entity DataFrames or a merged DataFrame.
merged: pandas.DataFrame | None
Optional merged DataFrame to attach the derived column to.
before_registration: bool
If True, count only pre-registration visits.
Returns
out: pandas.DataFrame
Dataframe with derived.ae_visits added.
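Per-participant visit counts follow a group-and-count pattern: optionally filter events by registration date, count rows per participant, then map the counts back so participants with no events get 0 rather than NaN. A hypothetical sketch for a single A&E event table with assumed pid and event_date columns:

```python
import pandas as pd


def count_visits_sketch(events, merged, before_registration=False, pid="pid",
                        date_col="event_date", reg_col="derived.registration_date",
                        out="derived.ae_visits"):
    ev = events[[pid, date_col]]
    if before_registration:
        # Attach each participant's registration date and keep earlier events only.
        ev = ev.merge(merged[[pid, reg_col]], on=pid, how="inner")
        ev = ev[pd.to_datetime(ev[date_col]) < pd.to_datetime(ev[reg_col])]
    counts = ev.groupby(pid).size()
    # Participants without any events get 0 rather than NaN.
    merged[out] = merged[pid].map(counts).fillna(0).astype("int64")
    return merged
```

Under the same assumptions, the pattern would carry over to apc_visits and op_visits by passing admitted-patient or outpatient event tables and a different output column name.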
apc_visits()
phenofhy._derive_funcs.apc_visits(data, merged=None, before_registration=False)
Derive derived.apc_visits as count of inpatient admissions per participant.
Parameters
data: dict | pandas.DataFrame
Dict of entity DataFrames or a merged DataFrame.
merged: pandas.DataFrame | None
Optional merged DataFrame to attach the derived column to.
before_registration: bool
If True, count only pre-registration admissions.
Returns
out: pandas.DataFrame
Dataframe with derived.apc_visits added.
op_visits()
phenofhy._derive_funcs.op_visits(data, merged=None, before_registration=False)
Derive derived.op_visits as count of outpatient visits per participant.
Parameters
data: dict | pandas.DataFrame
Dict of entity DataFrames or a merged DataFrame.
merged: pandas.DataFrame | None
Optional merged DataFrame to attach the derived column to.
before_registration: bool
If True, count only pre-registration visits.
Returns
out: pandas.DataFrame
Dataframe with derived.op_visits added.
total_hospital_contacts()
phenofhy._derive_funcs.total_hospital_contacts(data, merged=None, before_registration=False, winsorize_pct=0.99)
Derive derived.total_hospital_contacts as total visit counts.
Parameters
data: dict | pandas.DataFrame
Dict of entity DataFrames or a merged DataFrame.
merged: pandas.DataFrame | None
Optional merged DataFrame to attach the derived column to.
before_registration: bool
If True, count only pre-registration events.
winsorize_pct: float
Upper quantile, between 0 and 1, at which counts are winsorized (values above it are capped).
Returns
out: pandas.DataFrame
Dataframe with derived.total_hospital_contacts added.
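Winsorizing at an upper quantile means capping every value above that quantile at the quantile itself, which limits the influence of a few very heavy users. A minimal sketch, assuming the total is the sum of the three per-setting count columns documented above; whether phenofhy sums exactly these columns is an assumption.

```python
import pandas as pd


def total_contacts_sketch(df, count_cols=("derived.ae_visits", "derived.apc_visits",
                                          "derived.op_visits"),
                          winsorize_pct=0.99, out="derived.total_hospital_contacts"):
    # Assumed definition: sum of per-setting counts, treating missing counts as 0.
    total = df[list(count_cols)].fillna(0).sum(axis=1)
    # Cap values above the chosen quantile (linear interpolation) at that quantile.
    df[out] = total.clip(upper=total.quantile(winsorize_pct))
    return df
```

With one extreme participant among otherwise moderate counts, only that participant's total is pulled down to the 99th-percentile value; everyone below the cap is unchanged.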