6 Comments
Vishal Ahuja

Hi Viksit, can you please explain the following:

"In traditional systems, structure is a constraint. You define a flow upfront, and then hope your logic fits inside it. Differential programming flips that.

DSPy, for instance, watches what your program does, and builds the structure from that. Every call to a module is an edge. Every submodule is a node. The graph isn’t explicitly declared; it’s implicitly constructed through the core programming language’s own execution flow.

This means you’re not limited to static paths. You can branch on values, loop through reasoning steps, or conditionally call tools — and DSPy will still know what your agent did, and when.

And once the structure is traced, it becomes a target for learning and improves over time."

My understanding of DSPy is that it optimizes all the prompts in your workflow (given only input/output pairs, not the intermediate steps), but the workflow itself is defined by us, just as we would define a custom neural network in PyTorch. But when I read what you have written, it feels as if we provide the modules to DSPy and it figures out the pipeline too. Is that correct?

Viksit Gaur

hi @vishal, thanks for the comment! yes that’s correct. the system is tracing through the call graph and using various techniques in the optimizer to update the prompts in each step. check out the MIPROv2 algorithm for more details!

you don’t need to define a custom pytorch model necessarily. you just define dspy modules and let the optimizer do things.

you CAN define custom modules and optimize them which i’ll address in a future post! lmk if this helps!
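To make the "tracing through the call graph" idea concrete, here is a minimal pure-Python sketch. It is not DSPy's implementation or API: `Module`, `TRACE`, `run`, and the three toy modules are hypothetical names invented for illustration. It only shows how edges can be recorded as a side effect of ordinary Python execution, so a branch taken at runtime shows up in the recorded graph.

```python
from typing import Callable, List, Tuple

# Hypothetical tracer: NOT DSPy's implementation, just the define-by-run idea.
TRACE: List[Tuple[str, str]] = []  # (caller, callee) edges in execution order
_STACK: List[str] = ["root"]       # names of the modules currently executing


class Module:
    """Toy module whose calls are recorded as graph edges when executed."""

    def __init__(self, name: str, fn: Callable[[str], str]):
        self.name = name
        self.fn = fn

    def __call__(self, text: str) -> str:
        TRACE.append((_STACK[-1], self.name))  # record edge caller -> callee
        _STACK.append(self.name)
        try:
            return self.fn(text)
        finally:
            _STACK.pop()


# Stand-ins for LLM-backed steps; real modules would call a model here.
classify = Module("classify", lambda q: "math" if any(c.isdigit() for c in q) else "chat")
calculator = Module("calculator", lambda q: "used calculator")
chat = Module("chat", lambda q: "chatted")


def _agent(q: str) -> str:
    # Ordinary Python control flow decides which modules run.
    route = classify(q)
    if route == "math":
        return calculator(q)
    return chat(q)


agent = Module("agent", _agent)


def run(q: str):
    """Run the agent and return its output plus the edges traced for this call."""
    TRACE.clear()
    return agent(q), list(TRACE)
```

Calling `run("what is 2+2")` and `run("hello")` yields different edge lists, because the graph is whatever the program actually executed. The workflow is still written by the author in Python; what an optimizer would then tune is the behavior of each traced step.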

Sanmitra Pandharpur

Excellent post and great concepts on applying differentiable programming to agentic workflows. Does this apply just as well to complex multi-turn conversations? Also, can we configure different LLMs or SLMs for each of the paths or routers? Lastly, can this concept be nested to n levels for more advanced workflows? Thanks

Viksit Gaur

hey @sanmitra, thanks for the feedback and comment, appreciate it!

- yes, it does apply to multi-turn conversations too, as long as you define them in python using modules.

- you can! see my previous post on how to do routing through custom models. more in the future too on how they may work in an end to end optimization world

- can you elaborate on what you mean by nested?
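One possible reading of "nested to n levels" with a different model per path can be sketched in plain Python. This is an illustration under assumptions, not DSPy code: `Step`, `small_model`, and `large_model` are hypothetical stand-ins, and the "models" are stub functions that just wrap their input so the routing is visible. DSPy has its own mechanisms for scoping a language model to part of a program, which this sketch does not use.

```python
from typing import Callable, List, Optional

Model = Callable[[str], str]  # stand-in for an LLM or SLM client


def small_model(prompt: str) -> str:
    return f"small({prompt})"


def large_model(prompt: str) -> str:
    return f"large({prompt})"


class Step:
    """Toy nestable module (hypothetical, not DSPy): optional model, optional children."""

    def __init__(
        self,
        name: str,
        model: Optional[Model] = None,
        children: Optional[List["Step"]] = None,
    ):
        self.name = name
        self.model = model
        self.children = children or []

    def __call__(self, text: str, default: Model) -> str:
        model = self.model or default  # own model wins, else inherit from parent
        out = model(f"{self.name}:{text}")
        for child in self.children:    # children can themselves have children
            out = child(out, model)
        return out


pipeline = Step("router", model=small_model, children=[
    Step("plan", model=large_model, children=[Step("draft")]),  # draft inherits large_model
    Step("format"),                                             # format inherits small_model
])

result = pipeline("hi", default=small_model)
```

Here the router and formatter run on the small stub while the planning branch, including its nested `draft` step, runs on the large one. Nesting depth is bounded only by how deep the `children` lists go.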

Lingzhen Chen

Great article! I'm wondering if you have insights on how data-efficient the optimizers in DSPy are. For example, in a real-world scenario, are we talking about tens of samples or hundreds of samples to see a 20% improvement with a multi-agent workflow?

Viksit Gaur

this is a great question! i'm actually writing my next post on this -- will have an update for you soon!
