Pipeline

The InstaWell pipeline is intentionally deterministic: every step reads numbered CSV files, writes its own numbered outputs, and logs to experiment.log. This page describes what each stage expects, what it produces, and how to rerun it safely.

Directory Layout

Running setup_experiment creates experiments/<name>/. After a full pipeline run, the folder contains:

01_raw_organized_data.csv
02_filtered_organized_data.csv
03_averaged_data.csv
03_averaged_data_long.csv
04_bg_subtracted_data.csv
04_bg_subtracted_data_long.csv
05_min_max_scaled_data.csv
05_min_max_scaled_data_long.csv
06_derivative_data.csv
06_derivative_data_long.csv
07_min_temperatures.csv
08_curve_params.csv
08_curve_diagnostics.csv
experiment.json
experiment.log
experiment_info.json
plots/

The numeric prefixes match the step numbers of the functions documented below.
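
As a quick sanity check, a sketch like the following reports which numbered CSV outputs are still absent from an experiment folder. The file names are copied from the listing above; the helper name `missing_outputs` and the `experiments/demo` path are hypothetical and not part of InstaWell.

```python
from pathlib import Path

# CSV names copied from the directory listing above.
EXPECTED_OUTPUTS = [
    "01_raw_organized_data.csv",
    "02_filtered_organized_data.csv",
    "03_averaged_data.csv",
    "03_averaged_data_long.csv",
    "04_bg_subtracted_data.csv",
    "04_bg_subtracted_data_long.csv",
    "05_min_max_scaled_data.csv",
    "05_min_max_scaled_data_long.csv",
    "06_derivative_data.csv",
    "06_derivative_data_long.csv",
    "07_min_temperatures.csv",
    "08_curve_params.csv",
    "08_curve_diagnostics.csv",
]


def missing_outputs(experiment_dir):
    """Return the expected CSVs not yet present (hypothetical helper)."""
    root = Path(experiment_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]


# Hypothetical experiment folder; every file is reported if it does not exist.
for name in missing_outputs("experiments/demo"):
    print("missing:", name)
```

Because the names sort in step order, the first missing file usually points at the first step that has not run yet.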

Step-by-Step

| # | Function | Description | Key Output |
| --- | --- | --- | --- |
| 00 | `setup_experiment` | Copies raw/layout CSVs into the experiment folder, stores parsing metadata (`experiment.json`), and configures logging. Custom separator, placeholder (`^`), condition order, and NPC marker all live here. | `experiment.json` |
| 01 | `ingest_data` | Melts the raw matrix into long form, expands condition strings via the configured separator, and records replicate metadata (`experiment_info.json`). | `01_raw_organized_data.csv` |
| 02 | `filter_wells` | Records wells to exclude. Must run even when the list is empty so that downstream steps find `02_filtered_organized_data.csv`. | `02_filtered_organized_data.csv`, `filtered_wells.txt` |
| 03 | `average_across_replicates` | Groups measurements by `Temperature` + `unqcond` and outputs both wide and long tables. Maintains column order based on condition fields. | `03_averaged_data.csv`, `03_averaged_data_long.csv` |
| 04 | `subtract_background` | Finds NPC traces per (ligand, buffer, concentration) key, subtracts them from the matching protein traces, drops NPC rows, and realigns the wide table. | `04_bg_subtracted_data.csv`, `04_bg_subtracted_data_long.csv` |
| 05 | `min_max_scale` | Scales each `unqcond` trace to [0, 1] to make shape comparisons easier. Logs warnings for zero-variance traces. | `05_min_max_scaled_data.csv`, `05_min_max_scaled_data_long.csv` |
| 06 | `calculate_derivative` | Computes `-d(value)/d(Temperature)` for each condition, rescales derivatives, and merges condition metadata back in for the long-form outputs. | `06_derivative_data.csv`, `06_derivative_data_long.csv` |
| 07 | `find_min_temperature` | Picks the temperature at the minimum derivative for each `unqcond` (Tm-like point) and splits `unqcond` back into tidy columns. | `07_min_temperatures.csv` |
| 08 | `calculate_curve_params` | Fits Prism-style four-parameter logistic (4PL) curves in the log10 domain to Tm vs. concentration for each panel (ligand/buffer/protein), computes diagnostics, and stores CI metrics when covariance is available. | `08_curve_params.csv`, `08_curve_diagnostics.csv` |
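
To make the numeric core of steps 05 to 07 concrete, here is a minimal pure-Python sketch on a synthetic trace. It is not the InstaWell implementation: the function names are illustrative, the central-difference derivative is an assumed method, the handling of zero-variance traces is an assumption, and the derivative rescaling mentioned for step 06 is omitted.

```python
import math


def min_max_scale(values):
    """Scale one trace to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Zero-variance trace: returning zeros is an assumption of this sketch.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def negative_derivative(temps, values):
    """Estimate -d(value)/d(Temperature) with central differences
    (one-sided differences at the two ends)."""
    last = len(temps) - 1
    result = []
    for i in range(len(temps)):
        a, b = max(i - 1, 0), min(i + 1, last)
        result.append(-(values[b] - values[a]) / (temps[b] - temps[a]))
    return result


def temperature_at_min(temps, derivative):
    """Return the temperature at which the derivative is smallest."""
    idx = min(range(len(derivative)), key=derivative.__getitem__)
    return temps[idx]


# Synthetic melt curve: a sigmoid centred on 50 degrees.
temps = list(range(40, 61, 2))
signal = [100.0 + 900.0 / (1.0 + math.exp(-(t - 50) / 2.0)) for t in temps]

scaled = min_max_scale(signal)
deriv = negative_derivative(temps, scaled)
print(temperature_at_min(temps, deriv))
```

For a signal that rises through the transition, the negated derivative is most negative where the curve is steepest, which is why the minimum marks the Tm-like point.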

Curve fitting still stabilizing

Step 08 and any downstream visuals that rely on its outputs (e.g., min_temp_figure_generator(..., mode="log10_fit")) are still under active development. Inspect the residuals and 08_curve_diagnostics.csv before trusting the fit parameters in 08_curve_params.csv.
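
When checking a fit by hand, it can help to evaluate the curve yourself. The sketch below assumes that "Prism-style 4PL" refers to the variable-slope log(dose) vs. response model, Y = Bottom + (Top - Bottom) / (1 + 10^((LogEC50 - X) * HillSlope)); the parameterization InstaWell actually uses may differ, and the parameter and data values are made up for illustration.

```python
def four_pl_log10(log_conc, bottom, top, log_ec50, hill_slope):
    """Four-parameter logistic evaluated at a log10 concentration."""
    return bottom + (top - bottom) / (
        1.0 + 10.0 ** ((log_ec50 - log_conc) * hill_slope)
    )


def residuals(log_concs, observed, params):
    """Observed minus fitted value at each log10 concentration."""
    return [y - four_pl_log10(x, *params) for x, y in zip(log_concs, observed)]


def sum_of_squares(values):
    return sum(v * v for v in values)


# Illustrative parameters: (bottom, top, log_ec50, hill_slope).
params = (45.0, 55.0, -6.0, 1.0)
log_concs = [-8.0, -7.0, -6.0, -5.0, -4.0]
observed_tm = [45.2, 45.8, 50.1, 54.0, 55.1]  # made-up Tm values

res = residuals(log_concs, observed_tm, params)
print([round(r, 2) for r in res])
print(round(sum_of_squares(res), 3))
```

Residuals that trend in one direction across the concentration range, rather than scattering around zero, are a common sign that the fitted plateau or slope should not be trusted.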

Rerunning

Steps are idempotent: rerunning a function overwrites its own outputs but does not modify the outputs of earlier steps. To reprocess with a different separator or layout, re-run steps 00–08 in order.
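
The sketch below lists which functions to call again after one step changes. It assumes a strictly linear chain in which each step reads only from earlier steps, so everything downstream of a change is stale; `steps_to_rerun` is a hypothetical helper, and only the function names come from the table above.

```python
# Function names in pipeline order, taken from the step table.
PIPELINE_STEPS = [
    "setup_experiment",
    "ingest_data",
    "filter_wells",
    "average_across_replicates",
    "subtract_background",
    "min_max_scale",
    "calculate_derivative",
    "find_min_temperature",
    "calculate_curve_params",
]


def steps_to_rerun(changed_step):
    """Return the changed step and every step after it (hypothetical helper)."""
    start = PIPELINE_STEPS.index(changed_step)
    return PIPELINE_STEPS[start:]


# Changing the excluded wells means rerunning step 02 and everything after it.
for number, name in enumerate(PIPELINE_STEPS):
    if name in steps_to_rerun("filter_wells"):
        print(f"{number:02d} {name}")
```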

CSV Schema Highlights

- unqcond – canonical string assembled from the configured condition fields using the separator (default |).
- well_unqcond – well name + separator + unqcond, used for per-well traces.
- Condition columns – created automatically from the layout parsing (e.g., concentration, ligand, protein, buffer). Additional layout fields are supported.
- min_temperature – output of step 07. The 4PL fits in step 08 fit it against the numeric conc_num column (converted via utils.convert_concentration_to_float).
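
The string conventions above can be illustrated with a few lines of Python. This is a sketch of the described format, not InstaWell's code: the helper names and the example values (10uM, ATP, KinA, HEPES, well A1) are hypothetical, while the default separator and field order are the ones named on this page.

```python
SEPARATOR = "|"
CONDITION_FIELDS = ("concentration", "ligand", "protein", "buffer")


def build_unqcond(condition, fields=CONDITION_FIELDS, sep=SEPARATOR):
    """Join the condition values in field order."""
    return sep.join(str(condition[field]) for field in fields)


def build_well_unqcond(well, unqcond, sep=SEPARATOR):
    """Prefix the well name to an unqcond string."""
    return f"{well}{sep}{unqcond}"


def split_unqcond(unqcond, fields=CONDITION_FIELDS, sep=SEPARATOR):
    """Split an unqcond string back into one value per field."""
    return dict(zip(fields, unqcond.split(sep)))


condition = {
    "concentration": "10uM",
    "ligand": "ATP",
    "protein": "KinA",
    "buffer": "HEPES",
}
unqcond = build_unqcond(condition)
print(unqcond)                            # 10uM|ATP|KinA|HEPES
print(build_well_unqcond("A1", unqcond))  # A1|10uM|ATP|KinA|HEPES
print(split_unqcond(unqcond) == condition)
```

The round trip only works when no condition value contains the separator itself, which is one reason the separator is configurable.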

Troubleshooting

Layout parsing failures

- Ensure the layout CSV uses the same separator and placeholder provided to setup_experiment.
- Every non-empty cell in the layout (outside the Well column) must contain exactly as many components as condition_fields.
- Empty wells should be filled with the automatically computed mask (e.g., ^|^|^|^).

The Dash layout validator reproduces the same parsing logic and surfaces helpful messages before pipeline execution.
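
For a quick check outside the Dash app, the two layout rules can be approximated as below. This is a simplified stand-in rather than the validator's actual logic; `empty_mask` and `check_cell` are hypothetical names, and the cell values are invented.

```python
def empty_mask(n_fields, sep="|", placeholder="^"):
    """Build the mask for an empty well, one placeholder per field."""
    return sep.join([placeholder] * n_fields)


def check_cell(cell, condition_fields, sep="|"):
    """Return an error message for a malformed cell, or None if it is fine."""
    parts = cell.split(sep)
    if len(parts) != len(condition_fields):
        return (
            f"expected {len(condition_fields)} components "
            f"separated by {sep!r}, found {len(parts)} in {cell!r}"
        )
    return None


fields = ("concentration", "ligand", "protein", "buffer")
print(empty_mask(len(fields)))                    # ^|^|^|^
print(check_cell("10uM|ATP|KinA|HEPES", fields))  # None
print(check_cell("10uM|ATP|KinA", fields))
```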

Missing NPC traces

NPC rows are removed after background subtraction, so they do not appear in the outputs of step 04 or later. If your downstream CSVs are missing conditions you expected, or traces look unsubtracted, confirm that:
- non_protein_control_marker matches the layout’s string exactly (the comparison is case-sensitive).
- Each ligand/buffer/concentration combination has an NPC trace. Protein traces without a matching NPC are left unchanged, and a warning is logged.
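
The behaviour described above can be pictured with a small dictionary-based sketch. It assumes the NPC marker sits in the protein slot of the condition and that traces are keyed by (ligand, buffer, concentration, protein); both are assumptions for illustration, as are the function name `subtract_npc` and the sample values.

```python
import logging

logger = logging.getLogger("npc_sketch")


def subtract_npc(traces, npc_marker="NPC"):
    """Subtract the matching NPC trace from each protein trace.

    traces maps (ligand, buffer, concentration, protein) to a list of values.
    """
    # Index NPC traces by their (ligand, buffer, concentration) key.
    npc = {key[:3]: vals for key, vals in traces.items() if key[3] == npc_marker}
    result = {}
    for key, values in traces.items():
        if key[3] == npc_marker:
            continue  # NPC rows are dropped from the output
        background = npc.get(key[:3])
        if background is None:
            logger.warning("no NPC trace for %s; left unchanged", key[:3])
            result[key] = list(values)
        else:
            result[key] = [v - b for v, b in zip(values, background)]
    return result


traces = {
    ("ATP", "HEPES", "10uM", "KinA"): [5.0, 7.0],
    ("ATP", "HEPES", "10uM", "NPC"): [1.0, 2.0],
    ("ATP", "HEPES", "1uM", "KinA"): [3.0, 4.0],  # no NPC for this key
}
for key, values in subtract_npc(traces).items():
    print(key, values)
```

A marker that differs only in case (for example `npc`) would not match here, so every protein trace would fall into the "left unchanged" branch and the mislabeled control rows would stay in the output.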

Column order drift

Column order in the wide tables is reconstructed at each step. Custom condition field orders are supported, but the default ordering logic (sort by ligand → protein → buffer → concentration) only runs when the field tuple equals ("concentration","ligand","protein","buffer").
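
The conditional sort can be sketched as follows. Only the rule itself (sort by ligand, protein, buffer, then concentration, and only for the default field tuple) comes from this page; the function name `order_conditions`, the fallback of keeping the input order for custom tuples, and the use of already-numeric concentrations are assumptions.

```python
DEFAULT_FIELDS = ("concentration", "ligand", "protein", "buffer")


def order_conditions(conditions, fields):
    """Order condition dicts for the wide-table columns."""
    if tuple(fields) != DEFAULT_FIELDS:
        # Custom field order: this sketch simply keeps the input order.
        return list(conditions)
    return sorted(
        conditions,
        key=lambda c: (c["ligand"], c["protein"], c["buffer"], c["concentration"]),
    )


conditions = [
    {"concentration": 10.0, "ligand": "GTP", "protein": "KinA", "buffer": "HEPES"},
    {"concentration": 10.0, "ligand": "ATP", "protein": "KinA", "buffer": "HEPES"},
    {"concentration": 1.0, "ligand": "ATP", "protein": "KinA", "buffer": "HEPES"},
]

for c in order_conditions(conditions, DEFAULT_FIELDS):
    print(c["ligand"], c["concentration"])
```

Sorting on a numeric concentration rather than its string form avoids orderings such as "10" before "2".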

Figure Generators by Step

| Stage | Figure | Notes |
| --- | --- | --- |
| Raw / Filtered | `raw_figure_generator` | Colors by replicate well, one subplot per condition combo (except `series_by`). |
| Averaged / BG / Min-Max / Derivative | `processed_figure_generator(data_source=...)` | Builds discrete color maps from numeric concentration ordering. |
| Min Temperatures | `min_temp_figure_generator(mode="linear"\|"log1p"\|"log10_fit")` | `log10_fit` overlays the 4PL curve; weighting and color scales are configurable. |

Use the widget wrappers (*_figures_widget) inside notebooks for quick browsing, or the Dash app to interactively explore each stage without writing code.