Enhancing engineering workflows with AI: a real-world experience - Building Nubank


Artificial Intelligence (AI) and Large Language Models (LLMs) are revolutionizing the tech industry, and at Nubank, we’re using these technologies to enhance engineering workflows across Brazil, Mexico, and Colombia. In a recent talk at Clojure Conj 2024, Carin Meier, Principal Software Engineer at Nubank, and Marlon Silva, Software Engineer at Nubank, shared how AI-powered tools are transforming how we work.

Clojure Conj, a conference held since 2010, is a key event for the global Clojure community. It brings together developers and thought leaders to discuss the latest trends in Clojure programming. In 2024, it provided the perfect platform for Carin and Marlon to present how Nubank is integrating AI, including LLMs, to streamline our engineering processes.

In this article, we’ll explore the main topics from the lecture, and how these AI tools are optimizing everything from code generation to team collaboration at Nubank—and how they could help your team too.

What are Large Language Models (LLMs)?

Before diving into our experiences, let’s start with a quick overview of what LLMs are and how they work.

At a high level, LLMs like GPT-3 and GPT-4 are machine learning models trained on vast datasets to predict the next word (or token) in a sequence based on the context provided. They are designed to mimic human-like understanding and generation of language.

For example, when you type a prompt like “Clojure is a lovely programming language that allows you to,” an LLM can predict and continue the sentence with something like “code a program in a pure functional style.” The model does this by drawing from patterns it has learned during training, where it encounters large amounts of code and documentation, allowing it to generate meaningful sentences in response.
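The mechanics can be illustrated with a toy model. Real LLMs are neural networks with billions of parameters, but the core idea of "predict the next token from observed context" can be sketched with a simple bigram frequency table (this is purely illustrative, not how GPT-style models work internally):

```python
from collections import Counter, defaultdict

# A tiny illustrative corpus; real models train on trillions of tokens.
corpus = "clojure is a lovely functional language and clojure is a lisp".split()

# Count which word follows which: a bigram frequency table.
model = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    model[prev][nxt] += 1

def predict(word):
    """Return the continuation seen most often after `word`."""
    return model[word].most_common(1)[0][0]

print(predict("clojure"))  # "is" — the most frequent continuation here
```

An LLM does the same thing in spirit, but conditions on the entire preceding context rather than a single word, which is what lets it continue a whole sentence coherently.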

However, LLMs are not perfect. They require experimentation to understand their potential, especially when it comes to generating code in a language like Clojure, which has far less public training data than mainstream languages such as Python or JavaScript.


The power of benchmarking: testing LLMs for Clojure

To understand whether LLMs could truly enhance engineering workflows, we needed to test their capabilities. At Nubank, we selected a few models and used them to generate Clojure code. While many existing benchmarks showed impressive results for languages like Python and JavaScript, we were curious to see how well these models would perform on Clojure, with its own distinctive syntax and concepts.

Initially, we used a tool called the MultiPL-E (Multi-Programming Language Evaluation of Large Language Models of code) Benchmarking Tool. This open-source tool allows us to test the quality of code generated by LLMs based on a set of predefined problems, like those in the HumanEval and MBPP datasets.
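Benchmarks in the HumanEval family are typically scored with the pass@k metric: generate n code samples per problem, count how many (c) pass the unit tests, and estimate the probability that at least one of k randomly drawn samples passes. A minimal sketch of the standard unbiased estimator (function and variable names are ours):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples, drawn from n generations of which c are correct, passes."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable product
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

print(pass_at_k(10, 3, 1))  # ≈ 0.3: with 3 of 10 samples correct, pass@1 is 30%
```

Scores like these are what let us compare the same model's output quality across Python, JavaScript, and Clojure on an equal footing.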

With Clojure support in place, we could put code generation to the test. Thanks to invaluable support from Alex Miller, a prominent figure in the Clojure community and a vital part of Nubank’s operations, we integrated Clojure into MultiPL-E and began comparing its results alongside Python and JavaScript.

At first, we didn’t apply any special fine-tuning or engineering tricks; we simply wanted to observe the raw potential of the latest models (including open-source projects like Llama3 and private GPT variants from OpenAI) in producing production-ready code. Unsurprisingly, Clojure lagged a bit behind Python and JavaScript at first—likely a reflection of the smaller corpus of Clojure code used to train most LLMs—but the surprise was how close the results actually were.

With each new release—GPT-3.5, GPT-4, GPT-4o, o1-preview, o1, and beyond—we’ve observed the gap shrink further. It’s encouraging to see Clojure gain ground so quickly, and it gives us hope for a future where the disparity between languages all but disappears. As more models are trained on increasingly diverse datasets, we expect to see Clojure’s performance match Python’s and JavaScript’s. 

The open-source community and ongoing efforts like MultiPL-E are making strides to improve support and visibility for functional languages, and we’re excited about what this means for developers who rely on Clojure every day.

The lesson here? Don’t be afraid to experiment. Try various models and see how they align with your specific use cases. The performance of these models can vary significantly depending on your needs.

Building flexible tools for Engineering teams

One of the key takeaways from our journey with LLMs is the importance of building flexible and extensible tools. The world of AI is moving so fast that we can’t predict exactly what our engineers will need in the next month, let alone a year.

At Nubank, we’ve embraced this uncertainty. We’ve designed tools that are small, modular, and easy to adapt as new developments emerge. A good example of this is Roxy, a local proxy that facilitates the use of LLMs in a regulated environment.

Roxy is designed to ensure that any interaction with LLMs adheres to compliance and security regulations. Rather than building a complex solution tailored to a specific use case, we created a thin, flexible interface that engineers can use in a variety of ways. This approach allowed us to quickly adapt as new requirements or opportunities arose.
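Roxy’s internals aren’t public, but the thin-interface idea can be sketched: a small wrapper that applies compliance checks (redaction, auditing) before any prompt leaves the environment, while staying agnostic about which model sits behind it. Everything below — the patterns, the helper names, the `send_fn` callback — is our illustrative assumption, not Nubank’s actual implementation:

```python
import re
from typing import Callable

# Illustrative patterns only; a real deployment would cover far more cases.
SECRET_PATTERNS = [
    re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b"),  # CPF-like identifiers
    re.compile(r"\b\d{16}\b"),                      # card-number-like digit runs
]

def redact(text: str) -> str:
    """Replace anything matching a sensitive pattern before it leaves."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def proxy_request(prompt: str, send_fn: Callable[[str], str]) -> tuple[str, dict]:
    """Redact the prompt, record an audit entry, then forward to any model."""
    safe = redact(prompt)
    audit = {"prompt_chars": len(prompt), "was_redacted": safe != prompt}
    return send_fn(safe), audit

# The model behind the proxy is swappable: any send_fn works.
reply, audit = proxy_request("Customer 123.456.789-09 reports slow login",
                             send_fn=lambda p: f"echo: {p}")
```

Because the interface is just "string in, string out, plus an audit trail," swapping the underlying model or adding a new policy check doesn’t require rewriting anything downstream — which is the flexibility the thin-proxy design is after.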

The key takeaway here is that teams shouldn’t over-engineer their tools. They should create something simple that can grow and evolve alongside technology.

Fostering a community for sharing AI insights

In any fast-moving field, collaboration is key. At Nubank, we’ve found that creating a community of practice—what we call guilds—has been invaluable. These are internal user groups where we share experiences, discuss challenges, and brainstorm ways to leverage new tools like LLMs effectively.

By gathering on a regular basis, we ensure that everyone stays up-to-date on the latest AI advancements and gets a chance to provide feedback. This has helped us continually improve our tools and techniques for integrating LLMs into engineering workflows.

If you’re working with AI or any new technology, consider fostering your own community. It’s a great way to keep learning and stay ahead of the curve.

Can LLMs help us think?

While many people worry that AI will replace human thinking, we believe that LLMs can actually enhance our thinking—if used correctly. For example, LLMs can help engineers and product managers think critically, ask better questions, and approach problems from new angles.

Something that we’ve found useful is using AI to guide us in identifying the root cause of a problem, rather than just providing the answer. For instance, if we’re faced with a performance issue in a microservice, we might prompt the LLM with a question like, “How can I best frame a solution for a microservice that runs slow on an IO operation?”

The idea isn’t to ask for an answer right away but to use the LLM to help us structure our thinking. By using LLMs this way, we can dig deeper into the problem and come up with better solutions.
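In practice this can be as simple as a prompt template that explicitly withholds the answer. The function below is a hypothetical helper of our own, not a Nubank tool, sketching how a "help me frame it first" prompt might be built:

```python
def framing_prompt(problem: str) -> str:
    """Build a prompt that asks the LLM to structure the problem,
    not solve it outright."""
    return (
        "Do not propose a solution yet. "
        f"I am investigating this problem: {problem}. "
        "First, list the clarifying questions I should answer. "
        "Then list candidate root causes, ordered by likelihood, "
        "and what evidence would confirm or rule out each one."
    )

print(framing_prompt("a microservice that runs slow on an IO operation"))
```

The useful part is the constraint: by forbidding an immediate answer, the model’s output becomes a checklist for your own investigation rather than a shortcut around it.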

In another example, Marlon used this method to craft a product report. He asked the LLM to assume the role of a product manager and help him structure a report for upper management on the benchmark of LLM models for Clojure. The result was a report that exceeded expectations and impressed the product manager.

A look into the future: the power of autonomous AI agents

As AI evolves, the idea of autonomous agents that can write code and solve problems on their own is becoming more of a reality. We’ve explored some early-stage tools, like Open Hands, which use LLMs to assist with tasks like data analysis.

In a recent demo, we tasked Open Hands with performing a data analysis on the Iris dataset using Clojure. The agent autonomously planned, wrote, and executed the code, demonstrating how LLMs can assist engineers in tasks that would typically require more time and effort. While the technology is still in its early stages, we’re excited by the possibilities it presents.

Devin, an autonomous AI software engineer developed by Cognition Labs, is another example of how AI is transforming software development. Devin has been instrumental in helping us migrate our massive ETL system with over 6 million lines of code. 

By automating repetitive tasks like refactoring and code migration, Devin enabled Nubank to complete, in a matter of weeks, a project initially estimated to take a thousand engineers over 18 months, achieving a 12-fold increase in efficiency and significant cost savings.

Looking ahead

As AI continues to evolve, it’s clear that Large Language Models are not just tools for automating tasks—they are essential for enhancing developer workflows. By integrating LLMs into Nubank’s engineering processes, we’ve seen firsthand how they can boost productivity, foster creativity, and bridge gaps between technical and business teams. 

And, as we continue to explore and refine our AI solutions, we encourage other organizations to experiment and build flexible, extensible tools that adapt to the fast-moving world of AI. The future of engineering is here, and with LLMs, the possibilities are endless.

Learn more about what we shared on this topic in the video below: