Quickstart
Phenofhy helps you extract, process, and summarize OFH phenotype data inside the TRE.
Prerequisites
- You are running inside the OFH TRE with the
dxtoolkit available. config.jsonis available in/mnt/project/helpers(Phenofhy uses it to resolve files).- If you have not uploaded Phenofhy to the TRE yet, see the Installation page.
Download Phenofhy into your JupyterLab session
Before running the workflow, download the Phenofhy package into your JupyterLab Spark instance using dx. Replace project_name with your TRE project name:
python
!dx download "project_name:/applets/phenofhy/" -rThis assumes Phenofhy was uploaded under the applets folder and contains the beta source code (i.e., Python modules).
Minimal workflow
python
from phenofhy import extract, process, calculate, profile, utils
# 1) Extract a small set of fields
extract.fields(
output_file="outputs/raw/phenos.csv",
fields=[
"participant.registration_year",
"participant.registration_month",
"participant.birth_year",
"participant.birth_month",
"participant.demog_sex_2_1",
"questionnaire.smoke_status_2_1",
],
)
# 2) Process participant data (derives age, sex, age_group)
df = process.participant_fields("outputs/raw/phenos.csv")
# 3) Summaries
summary = calculate.summary(
df,
traits=["derived.age_at_registration", "derived.sex"],
stratify="derived.sex",
)
# 4) Profile report
report = profile.phenotype_profile(
df,
phenotype="derived.age_at_registration",
output="outputs/reports/age_profile.pdf",
)
# 5) Upload your results
report = utils.upload_files(
files="outputs/reports/age_profile.pdf",
dx_target="results"
)Next steps
- Use
calculate.prevalence()for prevalence tables. - Use
icd.match_icd_traits()to match ICD codes to traits. - Use
process.questionnaire_fields()orprocess.clinic_measurements_fields()to work by entity.