Transformer Lab - NFHN Reader

§1

Research

Transformer Lab is dedicated to exploring the frontier of artificial intelligence. We conduct research across diverse domains in machine learning and publish our findings in the open.

The defining property of the lab is velocity and versatility. We pursue challenges across many distinct areas of machine learning, with a bias toward novelty and a deep love for the technically intriguing.

§2

Publications

Selected results from our lab's recent work.

Asaria, Salomone, Gandhi.

Judging to Improve: A De-biased VLM-as-3D-Judge Protocol for Single-Image 3D Generation. [3D] arXiv:2606.20364 June 18, 2026
A trainable, de-biased VLM-as-judge for single-image 3D generation. One VLM family labels training pairs, a different family scores them, and a verdict counts only if it survives an order swap. We use the judge to test cheap, label-free adaptation of a strong base model: six methods reach only parity (0.50 win-rate) and none reaches the 0.65 bar. The durable artifact is the judge protocol, not a model.
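The order-swap rule can be illustrated with a minimal sketch. It assumes a hypothetical `judge(a, b)` callable that returns "A" or "B" for whichever slot it prefers; the function names and the choice to compute win-rate only over surviving verdicts are assumptions of this sketch, not the paper's code.

```python
from typing import Callable, Literal, Optional

Slot = Literal["A", "B"]

def swap_consistent_verdict(
    judge: Callable[[str, str], Slot], x: str, y: str
) -> Optional[str]:
    """Ask the (hypothetical) judge twice with the presentation order swapped.

    Returns "x" or "y" if the same candidate wins in both orders, otherwise
    None, meaning the verdict depended on position and does not count.
    """
    first = judge(x, y)   # x shown in slot A, y in slot B
    second = judge(y, x)  # order swapped: y in slot A, x in slot B
    if first == "A" and second == "B":
        return "x"
    if first == "B" and second == "A":
        return "y"
    return None

def win_rate(verdicts: list[Optional[str]]) -> float:
    """Win-rate of x over y, computed only over swap-consistent verdicts."""
    counted = [v for v in verdicts if v is not None]
    if not counted:
        raise ValueError("no verdict survived the order swap")
    return sum(v == "x" for v in counted) / len(counted)
```

A judge that always prefers slot A never produces a counted verdict, which is exactly the position bias the swap is meant to filter out.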
Asaria, Salomone, Gandhi.

Train, Retrieve, or Both? A Four-Arm Head-to-Head for Correct Statutory Citation on the Ontario Residential Tenancies Act. [LLM] arXiv:2606.20359 June 18, 2026
A four-arm head-to-head (base, LoRA SFT, RAG, SFT+RAG) for correct statutory citation on Ontario tenancy law. The base model hallucinates 81% of its citations; retrieval is the decisive lever, driving hallucinations to zero by construction and lifting citation exact-match to 0.44, with the SFT+RAG hybrid best at 0.481.
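The two headline metrics can be sketched as follows, under stated assumptions: citations are extracted with a hypothetical regex for forms like "s. 37" or "s. 83(3)", a citation counts as hallucinated if it names no section in a given set of valid section identifiers, and exact-match compares each answer's set of citations with a gold set. The paper's actual normalization and matching rules may differ.

```python
import re

# Hypothetical citation format: "s. 37", "s. 48.1" or "s. 83(3)".
# The paper's actual citation normalization is not described here.
CITE_RE = re.compile(r"\bs\.\s*(\d+(?:\.\d+)?(?:\(\d+\))*)")

def extract_citations(answer: str) -> list[str]:
    """Return the section identifiers cited in a model answer."""
    return CITE_RE.findall(answer)

def hallucination_rate(answers: list[str], valid_sections: set[str]) -> float:
    """Fraction of all emitted citations that name no real section of the Act."""
    cites = [c for a in answers for c in extract_citations(a)]
    if not cites:
        return 0.0
    return sum(c not in valid_sections for c in cites) / len(cites)

def citation_exact_match(answers: list[str], gold: list[set[str]]) -> float:
    """Fraction of answers whose set of citations equals the gold set."""
    if not answers or len(answers) != len(gold):
        raise ValueError("need one gold set per answer")
    hits = sum(set(extract_citations(a)) == g for a, g in zip(answers, gold))
    return hits / len(answers)
```

One way retrieval can drive hallucinations to zero by construction is to allow citations only from retrieved sections, all of which exist; this sketch only measures the rates and does not model that constraint.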
Asaria, Salomone, Gandhi.

A Cross-Model VLM-Judge Protocol for Single-Image 3D Mesh Quality (and Why Cheap Proxies Fall Short). [3D] arXiv:2606.18451 June 16, 2026
A standardized evaluation protocol for single-image-to-3D mesh generators that uses 24-view rendering and position-bias correction, and shows that common proxies such as CLIP similarity and geometry-validity metrics are no substitute for a VLM judge.
Asaria, Salomone, Gandhi.

Reliable Neural-Codec Text-to-Speech by ASR Self-Verification and Distillation: Near-Zero Catastrophic Failures Across Models and Codecs. [AUDIO] arXiv:2606.18323 June 16, 2026
ASR-based self-verification drives catastrophic failures (silence, early termination, repetition) to near zero in autoregressive neural-codec TTS. The behavior is then distilled for inference-time efficiency, and the approach generalizes across four TTS systems and three codecs.
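A minimal sketch of the self-verification idea, assuming hypothetical `synthesize(text, seed)` and `transcribe(audio)` callables and an illustrative word-error-rate threshold. Silence and early termination show up as deletions and repetition as insertions, so a WER check catches all three. The distillation step is not shown, and real systems would also normalize punctuation before scoring.

```python
from typing import Callable, Tuple

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0  # convention for an empty reference
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            )
        prev = cur
    return prev[-1] / len(ref)

def synthesize_verified(
    text: str,
    synthesize: Callable[[str, int], object],  # hypothetical TTS: (text, seed) -> audio
    transcribe: Callable[[object], str],       # hypothetical ASR: audio -> transcript
    max_attempts: int = 4,
    max_wer: float = 0.2,                      # illustrative threshold, not from the paper
) -> Tuple[object, float]:
    """Resample until the ASR transcript matches the prompt closely enough.

    If no attempt passes, the lowest-WER candidate is returned.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    best = None
    for seed in range(max_attempts):
        audio = synthesize(text, seed)
        wer = word_error_rate(text, transcribe(audio))
        if wer <= max_wer:
            return audio, wer
        if best is None or wer < best[1]:
            best = (audio, wer)
    return best
```

The retry loop is what makes verification expensive at inference time, which is the cost that distilling the behavior is meant to avoid.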
Asaria, Salomone, Gandhi.

Neither Parallel Nor Sequential: How DiffusionGemma Actually Commits Tokens. [LLM] arXiv:2606.14620 June 12, 2026
A close look at token-commitment patterns in DiffusionGemma 26B. Contrary to parallel-decoding marketing, the behavior is neither parallel nor block-autoregressive: it shows a weak left-to-right bias and substantial within-batch ordering ambiguity.
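To make "left-to-right bias" concrete, here is one simple, hypothetical way to summarize commitment order; it is not necessarily the paper's metric. Given the decoding step at which each position was committed, every pair of positions i < j is classified as committed left-to-right, right-to-left, or in the same step (where the order within the batch is ambiguous).

```python
from itertools import combinations

def commitment_order_stats(commit_step: list[int]) -> dict[str, float]:
    """Pairwise commitment-order fractions for a single decoded sequence.

    commit_step[i] is the (hypothetical) decoding step at which position i
    was committed. Strictly autoregressive decoding gives left_to_right == 1.0;
    committing everything in one step gives same_step == 1.0.
    """
    counts = {"left_to_right": 0, "right_to_left": 0, "same_step": 0}
    pairs = 0
    for i, j in combinations(range(len(commit_step)), 2):
        pairs += 1
        if commit_step[i] < commit_step[j]:
            counts["left_to_right"] += 1
        elif commit_step[i] > commit_step[j]:
            counts["right_to_left"] += 1
        else:
            counts["same_step"] += 1
    if pairs == 0:
        raise ValueError("need at least two positions")
    return {k: v / pairs for k, v in counts.items()}
```

Under this summary, a weak left-to-right bias would appear as `left_to_right` only modestly above `right_to_left`, with a large `same_step` share.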
Asaria, Salomone, Gandhi.

Realizing Native INT8 Compute for Diffusion Transformers on Consumer GPUs: A Fused INT8 GEMM Kernel for Ideogram 4.0. [SYSTEMS] arXiv:2606.14598 June 12, 2026
A fused Triton kernel that properly drives the INT8 tensor cores on consumer Ampere GPUs, delivering a ~1.1× end-to-end speedup and making 1024px generation feasible on a single RTX 3090.
Gandhi, Asaria, Salomone.

Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs. [VISION] arXiv:2606.12280 June 10, 2026
Post-training quantization of Ideogram 4.0 in which INT8 W8A8 is statistically indistinguishable from FP8 on key quality metrics, while both INT8 and GGUF Q4_K cut compute for consumer-GPU deployment.
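The arithmetic behind W8A8 can be shown with a NumPy reference sketch: symmetric int8 quantization of activations (one scale per token) and weights (one scale per output channel), an integer matrix multiply with int32 accumulation, and a rescale back to float. The scale granularity here is an assumption for illustration; the paper's calibration scheme, the GGUF Q4_K path, and any kernel fusion are not modeled.

```python
import numpy as np

def quantize_symmetric(x: np.ndarray, axis: int):
    """Symmetric int8 quantization with one scale per slice along `axis`."""
    absmax = np.max(np.abs(x), axis=axis, keepdims=True)
    scale = np.where(absmax > 0, absmax / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.rint(x / scale), -127, 127).astype(np.int8)
    return q, scale

def w8a8_linear(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Approximate x @ w.T using int8 operands and int32 accumulation.

    x: (tokens, in_features) activations, one scale per token (row).
    w: (out_features, in_features) weights, one scale per output channel (row).
    """
    qx, sx = quantize_symmetric(x, axis=1)  # sx: (tokens, 1)
    qw, sw = quantize_symmetric(w, axis=1)  # sw: (out_features, 1)
    acc = qx.astype(np.int32) @ qw.astype(np.int32).T  # integer GEMM
    return acc.astype(np.float32) * sx * sw.T  # rescale: (tokens, 1) * (1, out)
```

Because both scales factor out of the dot product, the rescale can be applied once per output element after accumulation, which is why it can be fused into the epilogue of an INT8 GEMM kernel.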


§3

Research Tooling

Our lab doesn't just release papers and code; we also partner with the world's best labs, across academia and industry, to unlock velocity for their researchers (and their researchers' agents). The tools we build are designed to accelerate the entire research loop, from planning to publication.

§4

Research at Maximum Velocity

Science is, fundamentally, a search algorithm over the infinite space of possible truths. Our goal is to transform research from a highly manual, sequential bottleneck into a massively parallel utility you can dial up, empowering scientists to act as conductors of an intellectual orchestra that can chart previously unexplored paths. Let's discover the unknown together!