About this Talk
As AI tooling becomes more prominent in developer workflows, one of the unsolved problems we face is how to optimally use these tools to understand and transform codebases at scale. Our work focuses on reimagining file sorting within large code changes, particularly within pull requests. One hypothesis was that file sorting might not matter—as long as the context window provided to an LLM is sufficient, the results should be accurate regardless of order. However, we wanted to explore if, just like human developers need a logical order to understand complex code, an LLM would benefit from such ordering as well. We explored whether LLMs could sort files within PR topics based on their relationships and dependencies—a seemingly straightforward task with unexpected challenges. This session will share our learnings from building a deterministic sorting algorithm versus using an LLM-based approach, with insights on when AI might actually be overkill.
Key Takeaways
- Order Matters (in most cases): Just as human developers need logically ordered files to understand complex codebases, LLMs also benefit from well-structured input. Proper file ordering significantly improved LLM accuracy and reduced hallucinations.
- Deterministic vs. AI Approaches: While LLMs show potential in understanding code dependencies, a deterministic, graph-based approach provided comparable results in small batches. Combining these methods helped achieve optimal results in understanding and managing code relationships.
- Leveraging Visualization for Human Insight: Visualizing code dependencies helps developers comprehend relationships that are not immediately obvious, ultimately leading to better-informed code composition choices.
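
To make the deterministic, graph-based idea concrete, here is a minimal sketch in Python. The talk description does not specify the actual algorithm, so this assumes a plain topological sort (Kahn's algorithm) over a file dependency map, with alphabetical tie-breaking so the same pull request always yields the same order. The function name `sort_files_by_dependency`, the shape of the `deps` mapping, and the file paths are all hypothetical and chosen for illustration only.

```python
import heapq


def sort_files_by_dependency(deps):
    """Order files so that each file appears after the files it depends on.

    `deps` is a hypothetical mapping: file path -> set of file paths it
    depends on. Ties are broken alphabetically to keep the result stable.
    """
    # Collect every file mentioned as a key or as a dependency.
    files = set(deps)
    for targets in deps.values():
        files.update(targets)

    # Unresolved dependencies per file (self-references are ignored).
    remaining = {f: set(deps.get(f, ())) - {f} for f in files}

    # Reverse edges: for each file, which files depend on it.
    dependents = {f: [] for f in files}
    for f, targets in remaining.items():
        for t in targets:
            dependents[t].append(f)

    # Min-heap of files with no unresolved dependencies.
    ready = [f for f, targets in remaining.items() if not targets]
    heapq.heapify(ready)

    ordered = []
    while ready:
        current = heapq.heappop(ready)
        ordered.append(current)
        for d in dependents[current]:
            remaining[d].discard(current)
            if not remaining[d]:
                heapq.heappush(ready, d)

    # Assumption: files caught in a dependency cycle cannot be ordered,
    # so they are appended alphabetically at the end.
    placed = set(ordered)
    leftover = sorted(f for f in files if f not in placed)
    return ordered + leftover


if __name__ == "__main__":
    changed = {
        "api/handler.py": {"services/user.py"},
        "services/user.py": {"models/user.py"},
        "models/user.py": set(),
        "tests/test_handler.py": {"api/handler.py"},
    }
    for path in sort_files_by_dependency(changed):
        print(path)
```

In this sketch the model file is listed first and the test file last, so a reader (human or LLM) meets each definition before its uses. How the dependency map is built, for example from import statements, is outside the scope of this example.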