Differentiable Programming for Learnable…

Viksit Gaur

Jul 13

How learnable graphs reduce manual tuning, scale decision-making, and make LLM workflows self-improving.

6 Comments

Vishal Ahuja

Jul 14

Hi Viksit, can you please explain the following:

"In traditional systems, structure is a constraint. You define a flow upfront, and then hope your logic fits inside it. Differentiable programming flips that.

DSPy, for instance, watches what your program does, and builds the structure from that. Every call to a module is an edge. Every submodule is a node. The graph isn’t explicitly declared; it’s implicitly constructed through the core programming language’s own execution flow.

This means you’re not limited to static paths. You can branch on values, loop through reasoning steps, or conditionally call tools — and DSPy will still know what your agent did, and when.

And once the structure is traced, it becomes a target for learning and improves over time."

My understanding of DSPy is that it optimizes all the prompts in your workflow (given only input/output pairs, not the intermediate steps), but that we define the workflow ourselves, just as we would define a custom neural network in PyTorch. But when I read what you have written, it feels as if we provide the modules to DSPy and it figures out the pipeline too. Is that correct?

Viksit Gaur

Jul 14

hi @vishal, thanks for the comment! yes that’s correct. the system is tracing through the call graph and using various techniques in the optimizer to update the prompts in each step. check out the MIPROv2 algorithm for more details!

you don’t need to define a custom pytorch model necessarily. you just define dspy modules and let the optimizer do things.

you CAN define custom modules and optimize them which i’ll address in a future post! lmk if this helps!
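
For readers following this thread, here is a toy sketch of the tracing idea in the quoted passage: the control flow is ordinary Python, and the call graph is recorded as a side effect of running it. This is not DSPy's API or implementation. Every name here (`Module`, `Agent`, `run_traced`, the `prompt` attribute) is hypothetical and exists only to illustrate "every call to a module is an edge, every submodule is a node".

```python
# Toy illustration only: NOT the DSPy API. All names are hypothetical.

_TRACE = []           # recorded (caller, callee) edges
_STACK = ["<root>"]   # names of the modules currently executing


class Module:
    """Stand-in for a learnable module whose calls get traced."""

    def __init__(self, name):
        self.name = name
        # Assumed "learnable parameter" an optimizer could later rewrite.
        self.prompt = f"default prompt for {name}"

    def forward(self, *args, **kwargs):
        raise NotImplementedError

    def __call__(self, *args, **kwargs):
        # Each call adds an edge from the currently running module.
        _TRACE.append((_STACK[-1], self.name))
        _STACK.append(self.name)
        try:
            return self.forward(*args, **kwargs)
        finally:
            _STACK.pop()


class Classify(Module):
    def forward(self, question):
        # Placeholder logic standing in for an LLM call.
        return "math" if any(ch.isdigit() for ch in question) else "chat"


class Calculator(Module):
    def forward(self, question):
        return "calculated"


class Chat(Module):
    def forward(self, question):
        return "chatted"


class Agent(Module):
    def __init__(self):
        super().__init__("agent")
        self.classify = Classify("classify")
        self.calculator = Calculator("calculator")
        self.chat = Chat("chat")

    def forward(self, question):
        # Plain Python branching written by the user; nothing is declared
        # as a graph up front.
        if self.classify(question) == "math":
            return self.calculator(question)
        return self.chat(question)


def run_traced(program, *args):
    """Run the program once and return (result, edges taken on this run)."""
    _TRACE.clear()
    result = program(*args)
    return result, list(_TRACE)


result, edges = run_traced(Agent(), "what is 2 + 2?")
print(result, edges)
```

In this sketch the branch taken depends on the input, so a different question produces a different trace; an optimizer (not shown) would then have a per-module `prompt` to tune along the paths that were actually executed.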

Sanmitra Pandharpur

Jul 14 (edited)

Excellent post and great concepts on applying differentiable programming to agentic workflows. Does this apply just as well to complex multi-turn conversations? Also, can we configure different LLMs or SLMs for each of the paths or routers? Lastly, can this concept be nested to n levels for more advanced workflows? Thanks

Viksit Gaur

Jul 14

hey @sanmitra, thanks for the feedback and comment, appreciate it!

- yes, it applies to multi-turn conversations too, as long as you define them in python using modules.

- you can! see my previous post on how to do routing through custom models. more in the future too on how they may work in an end to end optimization world

- can you elaborate on what you mean by nested?

Lingzhen Chen

Jul 23

Great article! I'm wondering if you have insights on how data-efficient the optimizers in DSPy are. For example, in a real-world scenario, are we talking about tens of samples or hundreds of samples to see a 20% improvement with a multi-agent workflow?

Viksit Gaur

Jul 24

this is a great question! i'm actually writing my next post on this -- will have an update for you soon!
