I needed probes where the output was tiny, a few tokens at most, and where scoring was objective and deterministic. No judge model in the loop. That’s what led me to the final two probes:
Is it intentional that the note is duplicated?
。业内人士推荐wps作为进阶阅读
Россиян научили законно сдавать в аренду ипотечные квартиры14:44
PostgreSQL writes this to the WAL (write-ahead log).
Context engineering is about curating what the model sees. Harness engineering is about building the entire environment, tooling, and feedback loops that let agents do reliable work without you intervening. Give the agent the feedback loop, not just the editor.