Our CEO used to personally white-glove onboard every single customer at Logic. It helped ensure everyone had a great first experience, but of course it didn’t scale. As we worked on making Logic self-serve, we naturally thought “Can we automate this?”
We’re a tiny six-person startup, but we believe that in the age of LLMs, every customer should still get a personalized experience.
The original process involved performing deep research on the company, discovering their unique take on things (their “corporate soul”), researching the specific employee’s role within that company, and brainstorming ways our platform could help them. We often wrapped up by hopping on a Zoom call with the customer, running through everything, and refining as needed. And as a last step, our CEO would email them login instructions and all the context they needed to get started. It took hours per customer.
When a user first signs in to Logic, we want them to have a set of pre-loaded workflows and tasks that are very specific to the person who signed up. Not to the point of being creepy, but right up to the point of them asking “How did they know?”.
For example, if you work in operations at an e-commerce shop and you screen your vendors for specific sustainability policies, we want a Logic doc – a self-contained agent that runs a process you define – ready to help you enforce that. If you’re an engineering manager who owns receipt categorization at a fintech company, we want a Logic doc waiting to do it for you. If you work in medtech and need to do medical coding for certain procedures, we want a doc ready for that too.
Importantly, these shouldn’t be generic. They should naturally incorporate policies and philosophies we’ve discovered from your website, corporate blog, published interviews with your founders, or anything else that’s online.
The first step in automating such a bespoke process is to precisely define what the steps are.
When a new organization is created in Logic:
Research the company
Research the employee
Generate personalized use cases that fit within our capabilities
Create Logic documents for them based on those use cases
We figured it might take a few minutes to do this research, so we’d need to perform it in the background while they were first trying out the product.
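At a high level, the pipeline looks something like the sketch below. None of the names here are our actual code, and the fire-and-forget asyncio task is just a stand-in for whatever background mechanism fits (a job queue works just as well); it’s only meant to show the shape of the flow.

```python
import asyncio

# Placeholder step functions; each is described in the sections below.
async def research_company(website: str) -> str: ...
async def research_employee(name: str, email: str, company_notes: str) -> str: ...
async def generate_use_cases(company_notes: str, employee_notes: str) -> list[str]: ...
async def create_logic_docs(org_id: str, use_cases: list[str]) -> None: ...

async def run_onboarding(org_id: str, name: str, email: str, website: str) -> None:
    company = await research_company(website)                 # step 1
    employee = await research_employee(name, email, company)  # step 2
    use_cases = await generate_use_cases(company, employee)   # step 3
    await create_logic_docs(org_id, use_cases)                 # step 4

async def on_organization_created(org_id: str, name: str, email: str, website: str) -> None:
    # Fire and forget: the new user keeps exploring the product while the
    # research runs in the background.
    asyncio.create_task(run_onboarding(org_id, name, email, website))
```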
Our first task was to figure out which model would be best for the company research. The first two candidates were Perplexity and OpenAI’s o3 deep research model. We already had an integration with Perplexity for performing web searches, and our CEO had success doing manual onboarding using results from OpenAI’s o3.
We wrote a detailed prompt that defined what our platform could do, and the information that we knew about our new customer: name, email address, and company website. We then asked the model to infer details about the company and the individual, and identify some key use cases that they might benefit from automating with Logic.
We then compared the outputs from Perplexity and OpenAI’s o3-deep-research, and added o4-mini-deep-research to the mix to see which would perform best. o3 gave us the best output, but it was too slow and too expensive. The output from o4-mini-deep-research was nearly as good, finished in 4 minutes, and the price was right.
One thing we discovered was that in order to get 3 really good examples it worked better to ask for 10, and then tell the model to select the 3 best ones from that list. If we just requested the 3 best examples directly, the results tended to be very monotonous. In many cases 2 of the 3 use cases would be the same across many different users.
Our hypothesis: variety generation and quality selection are separate cognitive tasks. When asked for “the 3 best,” the model tries to be creative and critical at the same time, which muddies both. When asked broadly and then told to narrow the results, it can explore a bit first, then evaluate. This is the same principle behind chain-of-thought prompting (and the internal reasoning that newer models do automatically): forcing the model to “think” rather than immediately selecting the most likely completion produces a better result. It costs some tokens and some latency, but for a process like this it’s clearly worth it.
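In practice this is a small change to the prompt: brainstorm wide, then narrow. Something along these lines (the wording and constant name are illustrative, not our production prompt):

```python
# Illustrative two-step instruction appended to the research prompt.
USE_CASE_INSTRUCTIONS = """
First, brainstorm 10 distinct ways this person could use Logic, grounded
in what you learned about their company and their role.

Then review your own list and select the 3 strongest candidates, favoring
use cases that are specific to this company over ones that would apply to
almost any business.

Present only the 3 selected use cases, each with a short rationale.
"""
```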
Unfortunately, we ran into a small problem when attempting to integrate the research into our application: OpenAI’s deep research models don’t support strict structured output, and we found that they would often produce unparseable JSON or deviate from the schema we provided.
To solve that, we split the problem into two phases: research first, then structure.
We kept the existing prompt, but stopped asking it to structure the output. Instead we just requested text, allowing the model to decide how it wanted to deliver the results. There was some variance, but it consistently followed the prompt and provided good data even if the structure was inconsistent.
We then took that raw research and sent it to a second, faster model with our JSON schema. This model extracts the key fields: company summary, employee role, and the top three use cases.
The structuring phase doesn’t require much reasoning, so our internal model router picks the fastest model that can handle strict JSON schema output.
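Here’s a sketch of the two-phase split, assuming the OpenAI Responses API for the research call and chat completions with a strict JSON schema for the structuring call. The model names, schema fields, and exact call shapes are assumptions for illustration, not our production code.

```python
import json
from openai import OpenAI

client = OpenAI()

RESEARCH_PROMPT = "..."  # the detailed research prompt described above (elided)

# Phase 1: free-form research. We only ask for text, so the deep research
# model can deliver its findings however it likes.
research = client.responses.create(
    model="o4-mini-deep-research",
    input=RESEARCH_PROMPT,
    tools=[{"type": "web_search_preview"}],
)
raw_notes = research.output_text

# Phase 2: structure. A fast model that supports strict structured output
# extracts only the fields we store.
schema = {
    "type": "object",
    "properties": {
        "company_summary": {"type": "string"},
        "employee_role": {"type": "string"},
        "use_cases": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["company_summary", "employee_role", "use_cases"],
    "additionalProperties": False,
}

structured = client.chat.completions.create(
    model="gpt-4o-mini",  # stand-in for whatever the model router picks
    messages=[
        {"role": "system", "content": "Extract the requested fields from the research notes."},
        {"role": "user", "content": raw_notes},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "onboarding_research", "schema": schema, "strict": True},
    },
)
profile = json.loads(structured.choices[0].message.content)
```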
Occasionally, when left to its own devices, the model would come up with fanciful use cases that were simply beyond the capabilities of our platform (and sometimes, LLMs as a whole!).
We found that by explicitly grounding the model in the capabilities Logic supports, we were able to steer deep research toward workable use cases at a much higher success rate. The grounding (sketched after this list) included:
File formats (pdf, png, wav, etc.)
Data expectations (structured data, JSON)
Capabilities (reasoning, image analysis, audio processing)
Tools (web search, image generation)
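Concretely, the grounding is just a block of text dropped into the research prompt. A rough sketch (the wording is illustrative, not our actual prompt):

```python
# Capabilities block interpolated into the research prompt.
LOGIC_CAPABILITIES = """
Logic documents can:
- accept files in common formats (PDF, PNG, WAV, CSV, ...)
- work with structured data and return strict JSON
- reason over inputs, analyze images, and process audio
- use tools such as web search and image generation

Only propose use cases that can be built from these capabilities.
"""
```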
Some companies are stealth startups. Some employees have minimal online presence. What happens when the research phase comes back empty?
We handle this explicitly in the prompt: if information is insufficient, return what’s available and infer the rest based on industry patterns or general business needs. Something personalized is better than nothing. Something generic but grounded in our actual capabilities is better than something hallucinated and impossible. It’s important to offer the model a fallback case when no information is available rather than insisting that it come up with something.
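In the prompt, that fallback is one more explicit instruction, along these lines (wording illustrative):

```python
FALLBACK_INSTRUCTION = """
If you cannot find enough information about the company or this person,
return whatever you did find and infer the rest from their industry and
role. Do not invent specific facts about them; prefer use cases that any
company in this industry would plausibly need.
"""
```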
Of course, identifying 3 use cases isn’t the entire story. Next we need to actually turn those into automated workflows. To do that we rely on our existing autodoc feature, which takes a simple request like “categorize a receipt” and constructs a full document from it.
It works much like the org research step, grounded in our platform’s capabilities, but it adds one more ingredient: a library of known-good automated workflows that we search with embeddings, attaching the best 3 matches to the LLM as few-shot examples.
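The retrieval step itself is simple. A minimal sketch of the idea, assuming OpenAI embeddings and precomputed vectors for the known-good workflows (names and models here are illustrative, not our actual autodoc code):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def top_k_examples(request: str, known_good_docs: list[dict], k: int = 3) -> list[dict]:
    """Return the k known-good workflows most similar to the request."""
    query = embed(request)
    scored = []
    for doc in known_good_docs:
        vec = np.array(doc["embedding"])  # precomputed when the doc was indexed
        score = float(query @ vec / (np.linalg.norm(query) * np.linalg.norm(vec)))
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

# The selected documents get attached to the doc-generation prompt as
# few-shot examples alongside the request ("categorize a receipt", etc.).
```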
In the spirit of automating the entire process, we also send the customer an email with login details and offer them a Zoom link to meet with our CEO, just like before.
The feedback has been consistently positive. People don’t expect a B2B SaaS product to know anything about them when they sign up, and it’s fun to see what a system comes up with about you. We were a little worried that it might feel invasive, but instead it feels tailored. Our users immediately pick up on how it must work since there wasn’t enough time for a human being to have done the research.
The pattern here works beyond onboarding. Any workflow where a human researches and takes action is a candidate for this kind of automation.
We automated our CEO’s two-hour research process not by making it less personal, but by making it scale. The research is just as thorough and the suggestions just as tailored. But it doesn’t take two hours anymore, and Steve is freed up to work on other things, like ranting about fine tuning.
