_derive_funcs

Core derivation helpers used by process.

expand_multi_code_column()

phenofhy._derive_funcs.expand_multi_code_column(df, codings_glob="./metadata/*.codings.csv",
	coding_name="MEDICAT_1_M", col="questionnaire.medicat_1_m",
	prefix="derived.medicates_", exclude_codes=(-7, -1, -3),
	abbrev_map=MEDICAT_ABBREV, inplace=True)

Expand a multi-code column into binary indicator columns.

Parameters

  df: pandas.DataFrame
    Input dataframe containing the multi-code column.
  codings_glob: str
    Glob pattern for codings CSV files.
  coding_name: str
    Coding name to filter within the codings file.
  col: str
    Multi-code column to expand.
  prefix: str
    Prefix for derived indicator columns.
  exclude_codes: Sequence[int]
    Codes to exclude from expansion.
  abbrev_map: dict | None
    Optional mapping for abbreviating column names.
  inplace: bool
    If True, mutate the input dataframe.

Returns

  mapping: pandas.DataFrame
    Mapping metadata for created columns.
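
To illustrate what expanding a multi-code column into binary indicators means, here is a minimal pandas sketch. It is not the phenofhy implementation. It assumes each cell holds a list of integer codes, it uses a hypothetical `code_labels` dict in place of the codings CSV, and the helper name `expand_codes_sketch` is made up for this example.

```python
import pandas as pd


def expand_codes_sketch(df, col, code_labels, prefix, exclude_codes=(-7, -1, -3)):
    """Hypothetical helper: add one 0/1 column per non-excluded code in code_labels."""
    out = df.copy()
    for code, label in code_labels.items():
        if code in exclude_codes:
            continue  # excluded codes get no indicator column
        # 1 if the cell is a list/tuple containing this code, else 0 (missing cells give 0)
        out[f"{prefix}{label}"] = out[col].apply(
            lambda codes, c=code: int(isinstance(codes, (list, tuple)) and c in codes)
        )
    return out


df = pd.DataFrame({"questionnaire.medicat_1_m": [[1, 2], [2], [-7], None]})
labels = {1: "statin", 2: "insulin", -7: "none_of_the_above"}  # illustrative labels only
flags = expand_codes_sketch(df, "questionnaire.medicat_1_m", labels, "derived.medicates_")
```

In this sketch a missing cell yields 0 in every indicator; whether the real function uses 0 or a missing value there is not stated in this reference.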

self_reported_sex()

phenofhy._derive_funcs.self_reported_sex(df)

Create derived.sex as numeric codes with categorical dtype.

Parameters

  df: pandas.DataFrame
    Input dataframe with demographic sex fields.

Returns

  out: pandas.DataFrame
    Dataframe with derived.sex added.

registration_date()

phenofhy._derive_funcs.registration_date(df)

Derive derived.registration_date as a datetime column.

Parameters

  df: pandas.DataFrame
    Input dataframe.

Returns

  out: pandas.DataFrame
    Dataframe with derived.registration_date added or coerced.

age_at_registration()

phenofhy._derive_funcs.age_at_registration(df)

Derive derived.age_at_registration as continuous age in years.

Parameters

  df: pandas.DataFrame
    Input dataframe with registration and birth date fields.

Returns

  out: pandas.DataFrame
    Dataframe with derived.age_at_registration added.
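
One common way to compute a continuous age in years is to divide the elapsed days by 365.25. The sketch below shows that approach only as an assumption about how such a column could be built; the `birth_date` column name is hypothetical, since the actual input fields are not listed here.

```python
import pandas as pd

# Hypothetical input column "birth_date"; the real field names may differ.
df = pd.DataFrame({
    "birth_date": pd.to_datetime(["1980-06-15", "1995-01-01"]),
    "derived.registration_date": pd.to_datetime(["2020-06-15", "2020-07-01"]),
})

# Elapsed whole days divided by the average year length gives fractional years.
elapsed = df["derived.registration_date"] - df["birth_date"]
df["derived.age_at_registration"] = elapsed.dt.days / 365.25
```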

age_group()

phenofhy._derive_funcs.age_group(df, bins=None, labels=None)

Derive derived.age_group from continuous age.

Parameters

  df: pandas.DataFrame
    Input dataframe.
  bins: list | None
    Optional list of bin edges.
  labels: list | None
    Optional list of labels for bins.

Returns

  out: pandas.DataFrame
    Dataframe with derived.age_group added.
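
Binning a continuous age with explicit edges and labels can be sketched with `pandas.cut`. The edges and labels below are illustrative; the defaults applied when `bins=None` and `labels=None` are not documented in this reference, and the real function may bin differently.

```python
import pandas as pd

df = pd.DataFrame({"derived.age_at_registration": [18.0, 34.9, 52.3, 71.0]})

# Illustrative edges and labels, not the library defaults.
bins = [18, 30, 50, 70, 120]
labels = ["18-29", "30-49", "50-69", "70+"]

# right=False makes each interval closed on the left: [18, 30), [30, 50), ...
df["derived.age_group"] = pd.cut(
    df["derived.age_at_registration"], bins=bins, labels=labels, right=False
)
```

Note that `pandas.cut` expects one fewer label than bin edges.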

bmi()

phenofhy._derive_funcs.bmi(df)

Derive derived.bmi from height and weight.

Parameters

  df: pandas.DataFrame
    Input dataframe with clinic_measurements.height/weight.

Returns

  out: pandas.DataFrame
    Dataframe with derived.bmi added.
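
BMI is weight in kilograms divided by the square of height in metres. A minimal sketch of that formula follows; it assumes height is stored in centimetres and weight in kilograms, which this reference does not confirm.

```python
import pandas as pd

# Assumed units: height in cm, weight in kg. Check the source data before relying on this.
df = pd.DataFrame({
    "clinic_measurements.height": [170.0, 180.0],
    "clinic_measurements.weight": [65.0, 81.0],
})

height_m = df["clinic_measurements.height"] / 100.0  # cm to m
df["derived.bmi"] = df["clinic_measurements.weight"] / height_m**2
```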

bmi_status()

phenofhy._derive_funcs.bmi_status(df)

Derive derived.bmi_status as categorical BMI class codes.

Parameters

  df: pandas.DataFrame
    Input dataframe with derived.bmi.

Returns

  out: pandas.DataFrame
    Dataframe with derived.bmi_status added.
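
As a rough illustration of turning BMI into categorical class codes, the sketch below applies the standard WHO adult cut-offs (18.5, 25, 30) with `pandas.cut`. The integer codes 1 to 4 are placeholders chosen for this example; the codes and boundaries phenofhy actually uses are not given here.

```python
import pandas as pd

df = pd.DataFrame({"derived.bmi": [17.0, 22.5, 27.0, 33.0]})

# WHO adult cut-offs; the codes are illustrative, not phenofhy's coding.
edges = [0, 18.5, 25, 30, float("inf")]
codes = [1, 2, 3, 4]  # 1=underweight, 2=normal, 3=overweight, 4=obese

# right=False puts a BMI of exactly 25 in the overweight class, 30 in obese.
df["derived.bmi_status"] = pd.cut(df["derived.bmi"], bins=edges, labels=codes, right=False)
```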

vape_status()

phenofhy._derive_funcs.vape_status(df, numeric=True, out_col="derived.vape_status")

Derive vaping status as numeric codes with categorical dtype.

Parameters

  df: pandas.DataFrame
    Input dataframe with vaping-related questionnaire fields.
  numeric: bool
    Unused; kept for compatibility.
  out_col: str
    Output column name.

Returns

  out: pandas.DataFrame
    Dataframe with derived vaping status column.

smoke_status_v1()

phenofhy._derive_funcs.smoke_status_v1(df, numeric=True)

Derive derived.smoke_status_v1 as categorical smoking status codes.

Parameters

  df: pandas.DataFrame
    Input dataframe with smoking questionnaire fields.
  numeric: bool
    Unused; kept for compatibility.

Returns

  out: pandas.DataFrame
    Dataframe with derived.smoke_status_v1 added.

smoke_status_v2()

phenofhy._derive_funcs.smoke_status_v2(df, numeric=True)

Derive derived.smoke_status_v2 as categorical smoking status codes.

Parameters

  df: pandas.DataFrame
    Input dataframe with smoking questionnaire fields.
  numeric: bool
    Unused; kept for compatibility.

Returns

  out: pandas.DataFrame
    Dataframe with derived.smoke_status_v2 added.

medicat_expand()

phenofhy._derive_funcs.medicat_expand(df, codings_glob="./metadata/*.codings.csv",
	coding_name="MEDICAT_1_M", col="questionnaire.medicat_1_m",
	prefix="derived.medicates_", exclude_codes=(-7, -1, -3), abbrev_map=None)

Expand medicat multi-code column into binary flags.

Parameters

  df: pandas.DataFrame
    Input dataframe.
  codings_glob: str
    Glob pattern for codings CSV files.
  coding_name: str
    Coding name to filter within the codings file.
  col: str
    Multi-code column to expand.
  prefix: str
    Prefix for derived indicator columns.
  exclude_codes: Sequence[int]
    Codes to exclude from expansion.
  abbrev_map: dict | None
    Optional mapping for abbreviating column names.

Returns

  out: pandas.DataFrame
    Dataframe with derived medication flags added.

any_hospital_contact()

phenofhy._derive_funcs.any_hospital_contact(data, merged=None, before_registration=False)

Derive derived.any_hospital_contact as a binary flag.

Parameters

  data: dict | pandas.DataFrame
    Dict of entity DataFrames or a merged DataFrame.
  merged: pandas.DataFrame | None
    Optional merged DataFrame to attach the derived column to.
  before_registration: bool
    If True, count only pre-registration contacts.

Returns

  out: pandas.DataFrame
    Dataframe with derived.any_hospital_contact added.

ae_visits()

phenofhy._derive_funcs.ae_visits(data, merged=None, before_registration=False)

Derive derived.ae_visits as count of A&E visits per participant.

Parameters

  data: dict | pandas.DataFrame
    Dict of entity DataFrames or a merged DataFrame.
  merged: pandas.DataFrame | None
    Optional merged DataFrame to attach the derived column to.
  before_registration: bool
    If True, count only pre-registration visits.

Returns

  out: pandas.DataFrame
    Dataframe with derived.ae_visits added.
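
The per-participant counting with a pre-registration filter can be pictured as a join, a date filter, and a group count. The sketch below is one plausible way to do this in pandas, not the library's code; the event table layout and the `pid` and `arrival_date` column names are hypothetical.

```python
import pandas as pd

# Hypothetical event table: one row per A&E visit.
events = pd.DataFrame({
    "pid": [1, 1, 2, 1],
    "arrival_date": pd.to_datetime(["2019-01-05", "2021-03-01", "2018-07-20", "2019-11-11"]),
})
# Hypothetical merged participant table: one row per participant.
merged = pd.DataFrame({
    "pid": [1, 2, 3],
    "derived.registration_date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-01"]),
})

# Keep only events dated before each participant's registration.
joined = events.merge(merged, on="pid", how="left")
pre = joined[joined["arrival_date"] < joined["derived.registration_date"]]

# Count events per participant; participants without events get 0.
counts = pre.groupby("pid").size()
merged["derived.ae_visits"] = merged["pid"].map(counts).fillna(0).astype(int)
```

The same pattern would apply to inpatient and outpatient event tables, under the same assumptions.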

apc_visits()

phenofhy._derive_funcs.apc_visits(data, merged=None, before_registration=False)

Derive derived.apc_visits as count of inpatient admissions per participant.

Parameters

  data: dict | pandas.DataFrame
    Dict of entity DataFrames or a merged DataFrame.
  merged: pandas.DataFrame | None
    Optional merged DataFrame to attach the derived column to.
  before_registration: bool
    If True, count only pre-registration admissions.

Returns

  out: pandas.DataFrame
    Dataframe with derived.apc_visits added.

op_visits()

phenofhy._derive_funcs.op_visits(data, merged=None, before_registration=False)

Derive derived.op_visits as count of outpatient visits per participant.

Parameters

  data: dict | pandas.DataFrame
    Dict of entity DataFrames or a merged DataFrame.
  merged: pandas.DataFrame | None
    Optional merged DataFrame to attach the derived column to.
  before_registration: bool
    If True, count only pre-registration visits.

Returns

  out: pandas.DataFrame
    Dataframe with derived.op_visits added.

total_hospital_contacts()

phenofhy._derive_funcs.total_hospital_contacts(data, merged=None, before_registration=False, winsorize_pct=0.99)

Derive derived.total_hospital_contacts as total visit counts.

Parameters

  data: dict | pandas.DataFrame
    Dict of entity DataFrames or a merged DataFrame.
  merged: pandas.DataFrame | None
    Optional merged DataFrame to attach the derived column to.
  before_registration: bool
    If True, count only pre-registration events.
  winsorize_pct: float
    Upper percentile to winsorize counts (0-1).

Returns

  out: pandas.DataFrame
    Dataframe with derived.total_hospital_contacts added.
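
Upper-tail winsorizing caps extreme counts at a chosen quantile so that a few heavy users do not dominate summaries. The sketch below shows one simple way to do that with `Series.quantile` and `Series.clip`; whether phenofhy computes the cap the same way is an assumption, and the 0.75 cut-off is used only so the effect is visible on five values.

```python
import pandas as pd

counts = pd.Series([0, 1, 2, 3, 250], name="derived.total_hospital_contacts")

# Illustrative cut-off; the documented default for winsorize_pct is 0.99.
winsorize_pct = 0.75

# Values above the chosen upper quantile are replaced by that quantile.
cap = counts.quantile(winsorize_pct)
capped = counts.clip(upper=cap)
```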