Pipeline
The InstaWell pipeline is intentionally deterministic: every step reads numbered CSV files, writes its own numbered outputs, and logs to `experiment.log`. This page describes what each stage expects and produces, and how to rerun it safely.
Directory Layout
Running `setup_experiment` creates `experiments/<name>/`. After all steps have run, the folder contains:
- `01_raw_organized_data.csv`
- `02_filtered_organized_data.csv`
- `03_averaged_data.csv`
- `03_averaged_data_long.csv`
- `04_bg_subtracted_data.csv`
- `04_bg_subtracted_data_long.csv`
- `05_min_max_scaled_data.csv`
- `05_min_max_scaled_data_long.csv`
- `06_derivative_data.csv`
- `06_derivative_data_long.csv`
- `07_min_temperatures.csv`
- `08_curve_params.csv`
- `08_curve_diagnostics.csv`
- `experiment.json`
- `experiment.log`
- `experiment_info.json`
- `plots/`
Each numeric prefix matches the step number of the function that writes the file, as documented below.
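Because every numbered output carries its step prefix, the folder contents show how far a run has progressed. The sketch below assumes the layout listed above; `completed_steps` and `first_missing_step` are hypothetical helpers, not part of InstaWell, and they only check whether files exist, not whether they are up to date.

```python
import re
from pathlib import Path

# Hypothetical helpers, not part of InstaWell.
# Numbered outputs look like "04_bg_subtracted_data.csv" (two-digit step prefix).
STEP_FILE = re.compile(r"^(\d{2})_.+\.csv$")


def completed_steps(exp_dir):
    """Return sorted step numbers that have at least one numbered CSV in exp_dir."""
    steps = set()
    for path in Path(exp_dir).iterdir():
        match = STEP_FILE.match(path.name)
        if match and path.is_file():
            steps.add(int(match.group(1)))
    return sorted(steps)


def first_missing_step(exp_dir, last_step=8):
    """Return the first step in 1..last_step with no CSV output, or None if all exist."""
    done = set(completed_steps(exp_dir))
    for step in range(1, last_step + 1):
        if step not in done:
            return step
    return None
```

For example, `first_missing_step("experiments/demo")` (with `demo` as a placeholder name) returns 4 when only steps 01–03 have written CSVs. Step 00 writes `experiment.json` rather than a numbered CSV, so the check starts at step 01.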
Step-by-Step
| # | Function | Description | Key Output |
|---|---|---|---|
| 00 | `setup_experiment` | Copies the raw and layout CSVs into the experiment folder, stores parsing metadata in `experiment.json`, and configures logging. The custom separator, placeholder (`^`), condition order, and NPC marker are all set here. | `experiment.json` |
| 01 | `ingest_data` | Melts the raw matrix into long form, expands condition strings using the configured separator, and records replicate metadata in `experiment_info.json`. | `01_raw_organized_data.csv` |
| 02 | `filter_wells` | Records wells to exclude. Must run even when the list is empty so that downstream steps find `02_filtered_organized_data.csv`. | `02_filtered_organized_data.csv`, `filtered_wells.txt` |
| 03 | `average_across_replicates` | Groups measurements by `Temperature` and `unqcond` and writes both wide and long tables, keeping column order based on the condition fields. | `03_averaged_data.csv`, `03_averaged_data_long.csv` |
| 04 | `subtract_background` | Finds the NPC trace for each (ligand, buffer, concentration) key, subtracts it from the matching protein traces, drops the NPC rows, and realigns the wide table. | `04_bg_subtracted_data.csv`, `04_bg_subtracted_data_long.csv` |
| 05 | `min_max_scale` | Scales each `unqcond` trace to [0, 1] to make shape comparisons easier. Logs warnings for zero-variance traces. | `05_min_max_scaled_data.csv`, `05_min_max_scaled_data_long.csv` |
| 06 | `calculate_derivative` | Computes -d(value)/d(Temperature) for each condition, rescales the derivatives, and merges the condition metadata back in for the long-form output. | `06_derivative_data.csv`, `06_derivative_data_long.csv` |
| 07 | `find_min_temperature` | Picks the temperature at the minimum derivative for each `unqcond` (a Tm-like point) and splits `unqcond` back into tidy condition columns. | `07_min_temperatures.csv` |
| 08 | `calculate_curve_params` | Fits Prism-style four-parameter logistic (4PL) curves in the log10 domain to Tm vs. concentration for each panel (ligand/buffer/protein), computes diagnostics, and stores confidence-interval metrics when the covariance matrix is available. | `08_curve_params.csv`, `08_curve_diagnostics.csv` |
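Steps 05–07 reduce each trace to a single Tm-like temperature. The plain-Python sketch below shows that chain for one trace; it is not the pipeline's implementation. It assumes the trace is sorted by temperature, uses central differences for the derivative, returns a flat trace for zero-variance input (the real step logs a warning; its return value is not documented here), and omits the derivative rescaling that step 06 applies. Because the derivative is negated, its minimum marks the temperature of steepest increase. The function names are hypothetical.

```python
# Hypothetical helpers illustrating steps 05-07; not InstaWell functions.

def scale_trace(values):
    """Min-max scale a trace to [0, 1] (step 05 analogue)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Zero-variance trace: step 05 logs a warning; this sketch returns zeros.
        return [0.0] * len(values)
    return [(v - lo) / span for v in values]


def negative_derivative(temps, values):
    """-d(value)/d(Temperature), central differences with one-sided ends (step 06 analogue)."""
    n = len(temps)
    if n < 2:
        raise ValueError("need at least two temperature points")
    out = []
    for i in range(n):
        left, right = max(i - 1, 0), min(i + 1, n - 1)
        out.append(-(values[right] - values[left]) / (temps[right] - temps[left]))
    return out


def tm_like_point(temps, neg_deriv):
    """Temperature at the minimum of the negated derivative (step 07 analogue)."""
    i_min = min(range(len(temps)), key=lambda i: neg_deriv[i])
    return temps[i_min]


temps = [30, 35, 40, 45, 50, 55, 60]
raw = [0, 1, 3, 8, 9, 9.5, 10]
scaled = scale_trace(raw)
tm = tm_like_point(temps, negative_derivative(temps, scaled))  # → 40
```

The steepest rise in `raw` is between 40 and 45 degrees; the central difference centered on 40 spans that jump, so it gives the most negative value.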
Step 08 and any downstream visuals that rely on its outputs (e.g., `min_temp_figure_generator(..., mode="log10_fit")`) are still under active development. Inspect the residuals and `08_curve_diagnostics.csv` before trusting the fit parameters.
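GraphPad Prism's variable-slope dose-response model is `Y = Bottom + (Top - Bottom) / (1 + 10^((LogEC50 - X) * HillSlope))` with `X = log10(concentration)`. Assuming step 08 uses this form (this page only says "Prism-style 4PL, log10 domain"), the sketch below evaluates a fitted curve and computes residuals, the residual sum of squares, and R². The function names and the diagnostic set are illustrative, not the columns of `08_curve_diagnostics.csv`. Zero concentrations have no log10 value, so the sketch requires positive concentrations.

```python
import math


def four_pl_log10(x, bottom, top, log_ec50, hill_slope):
    """Prism-style variable-slope 4PL at x = log10(concentration). Hypothetical helper."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_ec50 - x) * hill_slope))


def fit_diagnostics(concentrations, tms, params):
    """Residuals, RSS and R^2 for fitted (bottom, top, log_ec50, hill_slope).

    Concentrations must be > 0 because the model works on log10(concentration).
    """
    xs = [math.log10(c) for c in concentrations]
    predicted = [four_pl_log10(x, *params) for x in xs]
    residuals = [y - p for y, p in zip(tms, predicted)]
    rss = sum(r * r for r in residuals)
    mean_tm = sum(tms) / len(tms)
    tss = sum((y - mean_tm) ** 2 for y in tms)
    r_squared = 1.0 - rss / tss if tss > 0 else float("nan")
    return {"residuals": residuals, "rss": rss, "r_squared": r_squared}
```

With real data, `params` would come from a fitting routine such as `scipy.optimize.curve_fit(four_pl_log10, xs, tms, p0=...)`. Large or systematically patterned residuals are the signal to distrust the fitted parameters.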
Steps are idempotent: rerunning a function overwrites its own outputs and leaves earlier steps' files untouched. To reprocess with a different separator or layout, rerun steps 00–08.
CSV Schema Highlights
- `unqcond` – canonical string assembled from the configured condition fields using the separator (default `|`).
- `well_unqcond` – well name + separator + `unqcond`, used for per-well traces.
- Condition columns – automatically created from the layout parsing (e.g., `concentration`, `ligand`, `protein`, `buffer`). Additional layout fields are supported.
- `min_temperature` – output of step 07; downstream 4PL fits use the numeric `conc_num` column (converted via `utils.convert_concentration_to_float`).
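The `unqcond` and `well_unqcond` conventions can be sketched as a join and a split. This assumes `("concentration", "ligand", "protein", "buffer")` as the field order and uses made-up values (`10uM`, `ATP`, `P1`, `HEPES`); the helper names are hypothetical, and the pipeline takes the real order from `condition_fields`.

```python
# Assumed field order; the pipeline reads the real one from condition_fields.
CONDITION_FIELDS = ("concentration", "ligand", "protein", "buffer")


def make_unqcond(conditions, fields=CONDITION_FIELDS, sep="|"):
    """Join condition values in field order. Hypothetical helper."""
    return sep.join(str(conditions[field]) for field in fields)


def make_well_unqcond(well, unqcond, sep="|"):
    """Prefix the well name to an unqcond. Hypothetical helper."""
    return f"{well}{sep}{unqcond}"


def split_unqcond(unqcond, fields=CONDITION_FIELDS, sep="|"):
    """Split an unqcond back into {field: value}, analogous to step 07's tidy columns."""
    parts = unqcond.split(sep)
    if len(parts) != len(fields):
        raise ValueError(f"expected {len(fields)} components in {unqcond!r}, got {len(parts)}")
    return dict(zip(fields, parts))


u = make_unqcond({"concentration": "10uM", "ligand": "ATP", "protein": "P1", "buffer": "HEPES"})
# u == "10uM|ATP|P1|HEPES"; make_well_unqcond("A1", u) == "A1|10uM|ATP|P1|HEPES"
```

The round trip only works if no condition value contains the separator, which is why the separator is configurable.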
Troubleshooting
Layout parsing failures
- Ensure the layout CSV uses the same separator and placeholder that were passed to `setup_experiment`.
- Every non-empty cell in the layout (outside the `Well` column) must contain exactly as many components as `condition_fields`.
- Empty wells should be filled with the automatically computed mask (e.g., `^|^|^|^`).
The Dash layout validator reproduces the same parsing logic and surfaces helpful messages before pipeline execution.
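The layout rules above can be sketched as a per-cell check. This is a hypothetical helper, not the Dash validator or the pipeline's parser; it assumes the layout has been read into a `{well: cell_text}` mapping and fills empty cells with the placeholder mask.

```python
def check_layout_cells(cells, condition_fields, sep="|", placeholder="^"):
    """Validate {well: cell_text}; return (normalized_cells, error_messages).

    Hypothetical helper. Empty cells become the placeholder mask; non-empty
    cells must split into exactly len(condition_fields) components.
    """
    n = len(condition_fields)
    mask = sep.join([placeholder] * n)  # e.g. "^|^|^|^" for four fields
    normalized, errors = {}, []
    for well, cell in cells.items():
        text = (cell or "").strip()
        if not text:
            normalized[well] = mask
            continue
        parts = text.split(sep)
        if len(parts) != n:
            errors.append(
                f"{well}: expected {n} components separated by {sep!r}, "
                f"found {len(parts)} in {text!r}"
            )
        normalized[well] = text
    return normalized, errors
```

A cell written with the wrong separator (for example `10uM;ATP;P1;HEPES` when the configured separator is `|`) splits into a single component and is reported, which is the most common cause of the failures listed above.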
Missing NPC traces
- NPC rows are removed after background subtraction. If your downstream CSVs are missing certain conditions, confirm that:
  - `non_protein_control_marker` matches the layout's string exactly (case-sensitive).
  - Each ligand/buffer/concentration combination has an NPC trace. Protein traces without a matching NPC are left unsubtracted and logged as warnings.
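The NPC lookup can be sketched as follows. This is not `subtract_background` itself: it assumes, hypothetically, that traces are keyed by `(ligand, buffer, concentration, protein)` tuples on a shared temperature grid and that NPC wells carry the marker in the protein field. It shows why an exact marker match and one NPC per key matter.

```python
import logging

logger = logging.getLogger("npc_sketch")  # hypothetical logger name


def subtract_npc(traces, npc_marker):
    """Subtract the NPC trace sharing each (ligand, buffer, concentration) key.

    Hypothetical helper. traces maps (ligand, buffer, concentration, protein)
    to values on a shared temperature grid. NPC rows are dropped from the
    result; protein traces without an NPC are kept unsubtracted.
    """
    # Exact string comparison, so "npc" does not match "NPC".
    npc = {key[:3]: vals for key, vals in traces.items() if key[3] == npc_marker}
    result = {}
    for key, vals in traces.items():
        if key[3] == npc_marker:
            continue  # NPC rows do not appear in the output
        background = npc.get(key[:3])
        if background is None:
            logger.warning("No NPC trace for %s; leaving %r unsubtracted", key[:3], key[3])
            result[key] = list(vals)
        else:
            result[key] = [v - b for v, b in zip(vals, background)]
    return result
```

If the marker string differs from the layout (for example `npc` vs. `NPC`), no NPC is found, the would-be NPC wells survive as ordinary protein traces, and every protein trace is left unsubtracted with a warning.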
Column order drift
Column order in the wide tables is reconstructed at each step. Custom condition field orders are supported, but the default ordering logic (sort by ligand → protein → buffer → concentration) only runs when the field tuple equals ("concentration","ligand","protein","buffer").
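The default ordering can be sketched as a sort key that compares ligand, protein, and buffer as strings and concentration as a number. `conc_to_float` below is a hypothetical stand-in for `utils.convert_concentration_to_float` that only reads the leading number (no unit conversion), and leaving custom field orders in input order is this sketch's choice, not documented pipeline behavior.

```python
import re

DEFAULT_FIELDS = ("concentration", "ligand", "protein", "buffer")
_LEADING_NUMBER = re.compile(r"\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def conc_to_float(text):
    """Hypothetical stand-in: parse the leading number; unparseable values sort last."""
    match = _LEADING_NUMBER.match(text)
    return float(match.group(1)) if match else float("inf")


def order_unqconds(unqconds, fields, sep="|"):
    """Sort ligand -> protein -> buffer -> numeric concentration for the default fields."""
    if tuple(fields) != DEFAULT_FIELDS:
        return list(unqconds)  # sketch only: custom orders keep their input order

    def key(unqcond):
        conc, ligand, protein, buffer = unqcond.split(sep)
        return (ligand, protein, buffer, conc_to_float(conc))

    return sorted(unqconds, key=key)
```

Sorting the raw strings instead would place `10uM` before `1uM`, since `"0"` sorts before `"u"`; converting concentration to a number avoids that drift.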
Figure Generators by Step
| Stage | Figure | Notes |
|---|---|---|
| Raw / Filtered | `raw_figure_generator` | Colors traces by replicate well, with one subplot per combination of condition fields other than `series_by`. |
| Averaged / BG / Min-Max / Derivative | `processed_figure_generator(data_source=...)` | Builds discrete color maps from numeric concentration ordering. |
| Min Temperatures | `min_temp_figure_generator(mode=...)` | `mode` is `"linear"`, `"log1p"`, or `"log10_fit"`. `log10_fit` overlays the 4PL curve; weighting and color scales are configurable. |
Use the widget wrappers (`*_figures_widget`) in notebooks for quick browsing, or the Dash app to explore each stage interactively without writing code.