Skip to content

Quickstart

Phenofhy helps you extract, process, and summarize OFH phenotype data inside the TRE.

Prerequisites

  • You are running inside the OFH TRE with the dx toolkit available.
  • config.json is available in /mnt/project/helpers (Phenofhy uses it to resolve files).
  • If you have not uploaded Phenofhy to the TRE yet, see the Installation page.

Download Phenofhy into your JupyterLab session

Before running the workflow, download the Phenofhy package into your JupyterLab Spark instance using dx. Replace project_name with your TRE project name:

python
!dx download "project_name:/applets/phenofhy/" -r

This assumes Phenofhy was uploaded under the applets folder and contains the beta source code (i.e., Python modules).

Minimal workflow

python
from phenofhy import extract, process, calculate, profile, utils

# 1) Extract a small set of fields
extract.fields(
    output_file="outputs/raw/phenos.csv",
    fields=[
        "participant.registration_year",
        "participant.registration_month",
        "participant.birth_year",
        "participant.birth_month",
        "participant.demog_sex_2_1",
        "questionnaire.smoke_status_2_1",
    ],
)

# 2) Process participant data (derives age, sex, age_group)
df = process.participant_fields("outputs/raw/phenos.csv")

# 3) Summaries
summary = calculate.summary(
    df,
    traits=["derived.age_at_registration", "derived.sex"],
    stratify="derived.sex",
)

# 4) Profile report
report = profile.phenotype_profile(
    df,
    phenotype="derived.age_at_registration",
    output="outputs/reports/age_profile.pdf",
)

# 5) Upload your results
report = utils.upload_files(
   files="outputs/reports/age_profile.pdf",
   dx_target="results"
)

Next steps

  • Use calculate.prevalence() for prevalence tables.
  • Use icd.match_icd_traits() to match ICD codes to traits.
  • Use process.questionnaire_fields() or process.clinic_measurements_fields() to work by entity.