tosijs-schema
npm | github | discord | examples
A schema-first Typescript / Javascript library for generating a single, standards-compliant source of truth for data types.
- Define data types once using standard json-schema
- Infer Typescript types automatically
- Validate efficiently, using "prime-jump" sampling for O(1) performance on massive arrays and dictionaries
- Diff schemas to detect breaking changes or structural drift
Smaller, faster, smarter, and safer.
Oh and it cheats when validating large datasetsβ¦
π¦ GENERATING DATA...
βοΈ PHASE 1: COLD START (Simulating Serverless / CLI) βοΈ
π Cold Run
[Array 1M] Tosi (Skip): 0.6629 ms
[Array 1M] Tosi (Full): 186.9792 ms
[Array 1M] Zod: 334.3249 ms
----------------------------------
π vs Zod: 504.4x faster (Optimized)
π vs Zod: 1.8x faster (Raw Speed)
[Dict 100k] Tosi (Skip): 5.4868 ms
[Dict 100k] Tosi (Full): 25.3275 ms
[Dict 100k] Zod: 54.1105 ms
----------------------------------
π vs Zod: 9.9x faster (Optimized)
π vs Zod: 2.1x faster (Raw Speed)
π WARMING UP JIT...
(Engine is hot)
π₯ PHASE 2: HOT JIT (Simulating Long-Running Server) π₯
π Hot Run
[Array 1M] Tosi (Skip): 0.3265 ms
[Array 1M] Tosi (Full): 182.9595 ms
[Array 1M] Zod: 367.2505 ms
----------------------------------
π vs Zod: 1124.8x faster (Optimized)
π vs Zod: 2.0x faster (Raw Speed)
[Dict 100k] Tosi (Skip): 3.1230 ms
[Dict 100k] Tosi (Full): 18.3557 ms
[Dict 100k] Zod: 59.0150 ms
----------------------------------
π vs Zod: 18.9x faster (Optimized)
π vs Zod: 3.2x faster (Raw Speed)
The reason for the two benchmarks is that
tosijs-schemawhen cheating is so efficient that it never gets JITed if you just do one run. It was winning without getting JITed! So now there are two scenarios which let you see just how big an advantage you get at scale.
Installation
npm install tosijs-schema
# or
bun add tosijs-schemaDefining a schema
tosijs-schema uses a clean, property-based syntax. Properties like string, email, or optional are getters, keeping definitions concise.
You can also attach Metadata (title, describe, default) to any node. This doesn't affect validation but is crucial for generating rich API documentation (Swagger/OpenAPI).
import { s, type Infer } from 'tosijs-schema' // 1. Define the Runtime Schema // This builds a standard JSON Schema object under the hood export const UserSchema = s .object({ id: s.string.uuid, // Property access (no parentheses) // First-class email support email: s.email .describe("User's primary contact") .default('anon@example.com'), // Metadata role: s.enum(['admin', 'editor', 'viewer']), // First-class Integer support score: s.integer.min(0).max(100), tags: s.array(s.string).optional, // .optional is a property // Dictionaries (Record<string, number>) with constraints // .min(1) = minProperties: 1 meta: s.record(s.number).min(1), }) .meta({ $id: '[https://api.mysite.com/schemas/user](https://api.mysite.com/schemas/user)', title: 'UserProfile', }) // 2. Infer the Compile-time Type export type User = Infer<typeof UserSchema> // Result: // { // id: string; // email: string; // role: "admin" | "editor" | "viewer"; // score: number; // tags?: string[] | undefined; // meta: Record<string, number>; // }
Runtime validation
tosijs-schema separates Type Safety (Is this valid?) from Debugging (Why is it invalid?).
1. The "Fast" Path (Default)
By default, validation is optimized for speed and returns a boolean.
Performance Note: For large arrays (>97 items) and large dictionaries, tosijs-schema uses a "prime-jump" strategy. It checks a fixed sample of items (roughly 100, including the first and last) regardless of size. This provides O(1) performance while maintaining a high statistical probability of catching errors.
import { validate } from 'tosijs-schema' const data = await fetchUsers() // Returns true/false. Blazing fast. if (!validate(data, UserSchema)) { console.error('Invalid data received') }
2. The "Strict" Path (Full Scan)
If you need 100% guarantee (e.g., critical financial data), you can disable the optimization.
// Forces checking of EVERY item in arrays and keys in objects validate(data, UserSchema, { fullScan: true })
3. Error Reporting & Debugging
To get detailed error messages, provide an onError callback.
Note: The validator "Fails Fast". It stops at the first error it finds to save CPU cycles.
// Option A: Log Errors validate(data, UserSchema, (path, msg) => { console.error(`Error at ${path}: ${msg}`) }) // Option B: Throw on Error validate(data, UserSchema, (path, msg) => { throw new Error(`Validation failed at ${path}: ${msg}`) }) // Option C: Options Object Syntax validate(data, UserSchema, { fullScan: true, // You can combine fullScan with logging onError: (path, msg) => console.error(path, msg), })
4. Validating Raw JSON Schemas
The validator is versatile: it accepts tosijs builder objects OR standard JSON Schema objects. This is useful if you receive schemas from an external API or file.
const rawSchema = { type: 'object', properties: { count: { type: 'integer', minimum: 0 }, }, } // Works perfectly validate({ count: 5 }, rawSchema)
Monadic Pipelines (M)
tosijs-schema includes a "Railway Oriented Programming" module for building type-safe tool chains. This is especially useful for AI Agents, ensuring that hallucinations or bad data are caught immediately at the source (Input vs Output) rather than cascading.
1. Guarded Functions (M.func)
Create functions that enforce schemas on both input and output, with built-in timeout protection.
import { M, s } from 'tosijs-schema' // M.func(InputSchema, OutputSchema, Implementation, TimeoutMs=5000) const getSize = M.func(s.string, s.number, (str) => { return str.length }, 1000) // Optional timeout // Usage: const len = await getSize('hello') // Returns 5 // await getSize(123) // Throws SchemaError (Input mismatch)
2. Execution Contexts (new M)
Chain multiple functions together. The execution context handles error propagation automatically.
const pipeline = new M({ getSize, isEven: M.func(s.number, s.boolean, (n) => n % 2 === 0), }) const result = await pipeline .getSize('hello') // Output: 5 .isEven() // Input: 5 -> Output: false .result() // Returns false | Error // If any step fails schema validation, .result() returns the specific error.
Object Constraints & "Ghost" Properties
You can use .min(n) and .max(n) on Objects and Records to set minProperties and maxProperties.
The "Ghost" Constraint:
-
minProperties: Is validated strictly. (e.g., A dictionary must have at least 1 item). -
maxProperties: Is added to the JSON Schema output (for documentation/Swagger) but is IGNORED by the validator.
Why? Validating maxProperties requires counting every key in an object, which is an O(N) operation. If a malicious user sends a payload with 100,000 keys, validating the count wastes CPU cycles after the memory has already been allocated by JSON.parse. We assume your business logic or database will handle quota limits, keeping the schema validator focused on structure.
Schema Diffing
Check the difference between two schemas to spot API changes or version drifts.
import { diff } from 'tosijs-schema' const V1 = s.object({ id: s.number }) const V2 = s.object({ id: s.integer }) // Changed type console.log(diff(V1.schema, V2.schema)) // Output: { id: { error: "Type mismatch: number vs integer" } }
API Reference
Primitives & Properties (No Arguments)
Use these as static properties.
-
Base:
s.string,s.number,s.integer,s.boolean,.optional -
String Formats:
.email,.uuid,.ipv4,.url,.datetime,.emoji -
Number Formats:
.int(casts number to integer schema)
Note that these formats are all treated as "first class". You can write
s.string.emailif you really want to, but justs.emailwill do, and you can still further constrain it with.pattern(β¦)etc.
Constraints (Arguments Required)
Use these as chainable methods.
-
String:
.pattern(/regex/),.min(length),.max(length) -
Number/Integer:
.min(value),.max(value),.step(value) -
Array:
.min(count),.max(count) -
Object/Record:
.min(count),.max(count)(Note:.maxis documentation only)
Metadata
Available on all types to enrich the JSON schema output.
.title("...")-
.describe("...")maps todescription .default(value)-
.meta({ ... })merges arbitrary JSON keys (e.g.,$id,$schema,examples)
Complex Types
-
Arrays:
s.array(s.string) -
Objects:
s.object({ key: s.string }) -
Records:
s.record(s.number)β Maps toadditionalProperties. -
Enums:
s.enum(['a', 'b']) -
Unions:
s.union([s.string, s.number])β Maps to TypeScript unions (|) and JSON SchemaanyOf. -
Tuples
s.tuple([s.string, s.number])β Fixed-length arrays.
Limitations
To keep the library tiny and fast, specific JSON Schema features are not implemented:
-
Deep Constraints:
uniqueItemsanddependenciesare omitted for performance. -
maxPropertiesfor objects is added to the schema, but the validator doesn't check. - Recursive Types: TypeScript inference for circular references (e.g., Trees) is not currently supported in the builder API.
In my opinion,
minPropertiesmakes perfect sense when checking for objects treated as hash-mapped arrays being non-empty, butmaxPropertiesmakes no sense and is non-trivial to implement.
LLM & OpenAI Structured Outputs
tosijs-schema is "LLM-Native" by design.
If you are building AI agents using OpenAI's response_format: { type: "json_schema" } or Anthropic's tool use, this library is likely a better fit than Zod.
1. "Strict Mode" Ready
OpenAI's Structured Outputs have strict requirements: all fields must be required, and additionalProperties must be false.
-
Zod: Defaults to flexible objects. You often have to aggressively chain
.required()or use post-processors to satisfy the API. -
tosijs: Defaults to strict.
s.object(...)automatically marks all keys as required and setsadditionalProperties: false. It works out of the box.
2. Zero-Conversion Token Savings
Zod requires a third-party adapter (zod-to-json-schema) to talk to LLMs. This often introduces verbose artifacts, nested $defs, or bloated schema structures that waste tokens.
tosijs-schema is JSON Schema. The .schema property is the literal object the LLM needs. It is cleaner, flatter, and consumes fewer tokens in the context window.
Example: OpenAI Extraction
import OpenAI from 'openai' import { s } from 'tosijs-schema' const ExtractionSchema = s .object({ sentiment: s.enum(['positive', 'neutral', 'negative']), key_points: s.array(s.string).describe('List of main topics mentioned'), urgency: s.integer.min(1).max(10), }) .meta({ title: 'SentimentAnalysis', description: 'Analyze the user input', }) const completion = await openai.chat.completions.create({ model: 'gpt-4o', messages: [ { role: 'user', content: 'I love this product but shipping was slow.' }, ], response_format: { type: 'json_schema', json_schema: { name: 'analysis', strict: true, // Works instantly because tosijs is strict by default schema: ExtractionSchema.schema, }, }, })
Why not Zod?
Zod is an excellent library, but it is TypeScript-first and relies on a heavy Object-Oriented class hierarchy. Real-world Zod bundles can be heavy because it does not tree-shake well.
tosijs-schema is Schema-first and Functional.
- LLM Ready: Generates strict, token-efficient JSON schemas compatible with OpenAI Structured Outputs without adapters.
- Portable: The output is standard JSON Schema, usable by other tools/languages.
- Typesafe Functions and Railway Oriented monadic execution pipelines.
- Tiny: ~3kB minified.
- Performance: Uses "prime-jump" sampling for O(1) validation of massive arrays and dictionaries (vs Zod's O(N)). Even in full-scan mode, it is ~2x faster due to raw recursion.
Both have zero dependencies. Choose the one that meets your needs.
License
MIT