Pipeline
Utilities for metadata dictionaries and phenotype list processing.
metadata()
phenofhy.pipeline.metadata()
Load metadata dictionary files into DataFrames.
Returns
out: dict[str, pandas.DataFrame]
Dictionary with codings, data_dictionary, and entity_dictionary.
Raises
RuntimeError: Exception
If a required file is missing or fails to load.
FileNotFoundError: Exception
If an expected file path does not exist.
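A minimal sketch of how a caller might guard against the two documented exceptions and confirm that the three documented tables are present. The helper name `load_metadata_tables` and its `loader` parameter are hypothetical and not part of phenofhy; the loader is injected so the snippet runs without a DNAnexus environment.

```python
from typing import Callable, Mapping, Optional

# Keys the documentation says metadata() returns.
EXPECTED_KEYS = ("codings", "data_dictionary", "entity_dictionary")


def load_metadata_tables(
    loader: Callable[[], Mapping[str, object]],
) -> Optional[Mapping[str, object]]:
    """Call `loader` and return its tables, or None if loading fails.

    Hypothetical helper, not part of phenofhy. In real use `loader`
    would be `phenofhy.pipeline.metadata`.
    """
    try:
        tables = loader()
    except FileNotFoundError as exc:
        # FileNotFoundError is not a RuntimeError subclass,
        # so it needs its own branch.
        print(f"metadata file path does not exist: {exc}")
        return None
    except RuntimeError as exc:
        print(f"metadata failed to load: {exc}")
        return None

    # Fail loudly if the result does not match the documented shape.
    missing = [key for key in EXPECTED_KEYS if key not in tables]
    if missing:
        raise KeyError(f"metadata is missing expected tables: {missing}")
    return tables
```

With phenofhy installed, the call would presumably look like `load_metadata_tables(pipeline.metadata)`.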
run_preprocessing_pipeline()
phenofhy.pipeline.run_preprocessing_pipeline(fields, cohort_key="FULL_SAMPLE_ID",
derive_participant=True, derive_questionnaire=True,
derive_questionnaire_mode="auto", derive_clinic=True)
Extract and preprocess data with automatic field derivation.
This high-level function orchestrates the full phenotype preprocessing pipeline: extracting fields from DNAnexus, then applying entity-specific derivations.
Parameters
fields: list[str] | dict[str, str]
List of "entity.field" strings or dict of entity->field mappings.
cohort_key: str
Config key for the cohort dataset ID. Default: "FULL_SAMPLE_ID".
derive_participant: bool
Whether to derive participant fields (age groups, etc.). Default: True.
derive_questionnaire: bool
Whether to derive questionnaire fields. Default: True.
derive_questionnaire_mode: "all" | "auto"
Derivation mode for questionnaire fields. Default: "auto".
derive_clinic: bool
Whether to derive clinic measurement fields (BMI, etc.). Default: True.
Returns
out: pandas.DataFrame
Processed DataFrame ready for analysis.
Raises
RuntimeError: Exception
If extraction or processing fails.
ValueError: Exception
If fields format is unsupported.
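The two accepted shapes of fields can be illustrated with a small normalizer. This is only a sketch under stated assumptions, not the library's implementation: `normalize_fields` is a hypothetical name, and it assumes each dict entry maps one entity name to one field name.

```python
from typing import Dict, List, Union


def normalize_fields(fields: Union[List[str], Dict[str, str]]) -> List[str]:
    """Return a flat list of "entity.field" strings.

    Hypothetical helper, not part of phenofhy. Raises ValueError for
    any other input shape, mirroring the documented error.
    """
    if isinstance(fields, dict):
        # Assumption: one field name per entity key.
        return [f"{entity}.{field}" for entity, field in fields.items()]

    if isinstance(fields, (list, tuple)):
        normalized = []
        for item in fields:
            if not isinstance(item, str):
                raise ValueError(f"field must be a string, got {type(item).__name__}")
            entity, sep, field = item.partition(".")
            # Require a non-empty entity and field on either side of the dot.
            if not (sep and entity and field):
                raise ValueError(f'expected "entity.field", got {item!r}')
            normalized.append(item)
        return normalized

    raise ValueError(f"unsupported fields format: {type(fields).__name__}")
```

For example, `normalize_fields({"participant": "birth_year"})` and `normalize_fields(["participant.birth_year"])` produce the same list.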
Example
from phenofhy import pipeline
# Extract and preprocess specific fields
df = pipeline.run_preprocessing_pipeline(
    fields=["participant.birth_year", "participant.birth_month"],
    derive_participant=True,
    derive_questionnaire=True,
)
field_list()
phenofhy.pipeline.field_list(fields=None, output_file=None, fields_list_name=None,
input_file=None, input_file_name=None)
Build a merged metadata table from a phenotype list.
Parameters
fields: list[str] | dict[str, str] | str | None
List of "entity.field" strings, dict of entity->field mappings, or a file path or file ID.
output_file: str | None
If provided, write CSV to this path and return None.
fields_list_name: str | None
Optional filename when downloading a direct file ID.
input_file: any
Backward-compatible alias for fields.
input_file_name: str | None
Backward-compatible alias for fields_list_name.
Returns
out: pandas.DataFrame | None
Merged metadata table, or None if output_file is provided.
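The return contract above (a table when output_file is omitted, None when it is given) can be sketched with the standard library. The function `emit_table` is hypothetical and stands in for the documented behaviour only; it uses a list of dicts and the `csv` module instead of a pandas DataFrame so the snippet has no third-party dependencies.

```python
import csv
from typing import Dict, List, Optional


def emit_table(
    rows: List[Dict[str, str]],
    output_file: Optional[str] = None,
) -> Optional[List[Dict[str, str]]]:
    """Return `rows`, or write them to CSV and return None.

    Hypothetical stand-in for the documented contract of field_list():
    passing output_file switches the result from a table to None.
    """
    if output_file is None:
        return rows

    # Use the first row's keys as the CSV header; empty input gives an empty header.
    fieldnames = list(rows[0].keys()) if rows else []
    with open(output_file, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return None
```

A practical consequence for callers: code that passes output_file should not chain DataFrame methods onto the result, since the result is None in that case.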
Example
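from phenofhy import pipeline
# Load the metadata dictionaries (codings, data_dictionary, entity_dictionary)
meta = pipeline.metadata()
# Build the merged metadata table for two participant fields
fields = pipeline.field_list(fields=["participant.birth_year", "participant.birth_month"])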