Pipeline

Utilities for metadata dictionaries and phenotype list processing.

metadata()

```python
phenofhy.pipeline.metadata()
```

Load metadata dictionary files into DataFrames.

Returns

  out: dict[str, pandas.DataFrame]
    Dictionary with the keys "codings", "data_dictionary", and "entity_dictionary", one DataFrame per metadata file.

Raises

  RuntimeError: Exception
    If a required file is missing or fails to load.
  FileNotFoundError: Exception
    If an expected file path does not exist.
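
A defensive call site could look like the sketch below. It takes the loader as an argument so that it runs without phenofhy or DNAnexus access. The helper `load_metadata_tables`, the `EXPECTED_KEYS` tuple, and the two stand-in loaders are hypothetical names introduced for illustration; the key names are taken from the Returns description above.

```python
EXPECTED_KEYS = ("codings", "data_dictionary", "entity_dictionary")


def load_metadata_tables(loader):
    """Call a zero-argument metadata loader and check the documented keys.

    Returns the dictionary on success, or None if the loader raised one of
    the two documented exceptions.
    """
    try:
        tables = loader()
    except (FileNotFoundError, RuntimeError) as exc:
        print(f"Could not load metadata: {exc}")
        return None
    missing = [key for key in EXPECTED_KEYS if key not in tables]
    if missing:
        raise KeyError(f"metadata is missing expected tables: {missing}")
    return tables


# Stand-in loaders so the sketch runs anywhere (hypothetical, not phenofhy code).
def fake_loader_ok():
    return {key: [] for key in EXPECTED_KEYS}


def fake_loader_missing_file():
    raise FileNotFoundError("codings.csv")


tables = load_metadata_tables(fake_loader_ok)
failed = load_metadata_tables(fake_loader_missing_file)
```

With phenofhy installed and configured, the equivalent call would presumably be `load_metadata_tables(pipeline.metadata)`.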

run_preprocessing_pipeline()

```python
phenofhy.pipeline.run_preprocessing_pipeline(
    fields,
    cohort_key="FULL_SAMPLE_ID",
    derive_participant=True,
    derive_questionnaire=True,
    derive_questionnaire_mode="auto",
    derive_clinic=True,
)
```

Extract and preprocess data with automatic field derivation.

This high-level function orchestrates the full phenotype preprocessing pipeline: extracting fields from DNAnexus, then applying entity-specific derivations.

Parameters

  fields: list[str] | dict[str, str]
    A list of "entity.field" strings, or a dict mapping entity names to field names.
  cohort_key: str
    Config key for the cohort dataset ID. Default: "FULL_SAMPLE_ID".
  derive_participant: bool
    Whether to derive participant fields (age groups, etc.). Default: True.
  derive_questionnaire: bool
    Whether to derive questionnaire fields. Default: True.
  derive_questionnaire_mode: "all" | "auto"
    Derivation mode for questionnaire. Default: "auto".
  derive_clinic: bool
    Whether to derive clinic measurement fields (BMI, etc.). Default: True.

Returns

  out: pandas.DataFrame
    Processed DataFrame ready for analysis.

Raises

  RuntimeError: Exception
    If extraction or processing fails.
  ValueError: Exception
    If fields format is unsupported.

Example

```python
from phenofhy import pipeline

# Extract and preprocess specific fields
df = pipeline.run_preprocessing_pipeline(
    fields=["participant.birth_year", "participant.birth_month"],
    derive_participant=True,
    derive_questionnaire=True,
)
```
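
The two accepted shapes for `fields`, and the `ValueError` raised for anything else, can be illustrated with a small normalizer. This is a sketch, not phenofhy's implementation: `normalize_fields` is a hypothetical helper, and it assumes a dict maps each entity name to a single field name, as the `dict[str, str]` annotation suggests.

```python
def normalize_fields(fields):
    """Return a list of "entity.field" strings from either accepted shape."""
    if isinstance(fields, dict):
        # Assumption: one field name per entity key.
        return [f"{entity}.{field}" for entity, field in fields.items()]
    if isinstance(fields, (list, tuple)):
        for item in fields:
            if not isinstance(item, str):
                raise ValueError(f"expected 'entity.field', got {item!r}")
            entity, sep, field = item.partition(".")
            if not (entity and sep and field):
                raise ValueError(f"expected 'entity.field', got {item!r}")
        return list(fields)
    raise ValueError(f"unsupported fields format: {type(fields).__name__}")


as_list = normalize_fields(["participant.birth_year", "participant.birth_month"])
as_dict = normalize_fields({"participant": "birth_year"})
```

Validating the shape before calling the pipeline fails fast on a malformed list, before any extraction from DNAnexus starts.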

field_list()

```python
phenofhy.pipeline.field_list(
    fields=None,
    output_file=None,
    fields_list_name=None,
    input_file=None,
    input_file_name=None,
)
```

Build a merged metadata table from a phenotype list.

Parameters

  fields: list[str] | dict[str, str] | str | None
    A list of "entity.field" strings, a dict mapping entity names to field names, or a path or file ID pointing to a phenotype list.
  output_file: str | None
    If provided, write CSV to this path and return None.
  fields_list_name: str | None
    Optional filename when downloading a direct file ID.
  input_file: any
    Backward-compatible alias for fields.
  input_file_name: str | None
    Backward-compatible alias for fields_list_name.

Returns

  out: pandas.DataFrame | None
    Merged metadata table, or None if output_file is provided.

Example

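```python
from phenofhy import pipeline

# Load the metadata dictionaries into DataFrames
meta = pipeline.metadata()

# Build the merged metadata table for two participant fields
fields = pipeline.field_list(
    fields=["participant.birth_year", "participant.birth_month"]
)
```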
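
The alias parameters and the `output_file` return convention described above can be sketched with the standard library alone. Everything here is a hypothetical stand-in: `field_list_sketch` is not the library function, it handles only the list form of `fields`, it returns plain dict rows rather than a merged metadata DataFrame, and the rule that `fields` wins when both names are given is an assumption.

```python
import csv


def field_list_sketch(fields=None, output_file=None, input_file=None):
    """Mimic the documented calling convention of field_list()."""
    # Backward-compatible alias: fall back to input_file when fields is unset.
    if fields is None:
        fields = input_file
    if fields is None:
        raise ValueError("either fields or input_file must be provided")

    # Stand-in for the merged metadata table: one row per "entity.field".
    rows = []
    for item in fields:
        entity, field = item.split(".", 1)
        rows.append({"entity": entity, "field": field})

    if output_file is not None:
        # Write CSV and return None, as the Returns section describes.
        with open(output_file, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=["entity", "field"])
            writer.writeheader()
            writer.writerows(rows)
        return None
    return rows


table = field_list_sketch(input_file=["participant.birth_year"])
```

Callers that pass `output_file` should therefore not assign the result, since the table goes to disk and the return value is `None`.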