The VLM Run Node.js SDK is the official Node.js client for the VLM Run API platform, providing a convenient way to interact with our REST APIs.
## 🚀 Getting Started
### Installation
```bash
# Using npm
npm install vlmrun

# Using yarn
yarn add vlmrun

# Using pnpm
pnpm add vlmrun
```
### Basic Usage
#### Image Predictions
```typescript
import { VlmRun } from "vlmrun";
import { z } from "zod";

// Initialize the client
const client = new VlmRun({
  apiKey: "your-api-key",
});

// Process an image (using image url)
const imageUrl =
  "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/invoice_1.jpg";

const response = await client.image.generate({
  images: [imageUrl],
  domain: "document.invoice",
  config: {
    jsonSchema: {
      type: "object",
      properties: {
        invoice_number: { type: "string" },
        total_amount: { type: "number" },
      },
    },
  },
});
console.log(response);

// Process an image passing a zod schema
const schema = z.object({
  invoice_number: z.string(),
  total_amount: z.number(),
});

const zodResponse = await client.image.generate({
  images: [imageUrl],
  domain: "document.invoice",
  config: {
    responseModel: schema,
  },
});
const parsed = zodResponse.response as z.infer<typeof schema>;
console.log(parsed);

// Process an image (using local file path)
const fileResponse = await client.image.generate({
  images: ["tests/integration/assets/invoice.jpg"],
  model: "vlm-1",
  domain: "document.invoice",
});
console.log(fileResponse);
```
#### Document Predictions
```typescript
import { VlmRun } from "vlmrun";
import { z } from "zod";

// Initialize the client
const client = new VlmRun({
  apiKey: "your-api-key",
});

// Upload a document
const file = await client.files.upload({
  filePath: "path/to/invoice.pdf",
});

// Process a document (using file id)
const response = await client.document.generate({
  fileId: file.id,
  model: "vlm-1",
  domain: "document.invoice",
});
console.log(response);

// Process a document (using url)
const documentUrl =
  "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/google_invoice.pdf";

const urlResponse = await client.document.generate({
  url: documentUrl,
  model: "vlm-1",
  domain: "document.invoice",
});
console.log(urlResponse);

// Process a document passing a zod schema
const schema = z.object({
  invoice_id: z.string(),
  total: z.number(),
  sub_total: z.number(),
  tax: z.number(),
  items: z.array(
    z.object({
      name: z.string(),
      quantity: z.number(),
      price: z.number(),
      total: z.number(),
    })
  ),
});

const zodResponse = await client.document.generate({
  url: documentUrl,
  domain: "document.invoice",
  config: { responseModel: schema },
});
const parsed = zodResponse.response as z.infer<typeof schema>;
console.log(parsed);
```
### Using Callback URLs for Async Processing
VLM Run supports callback URLs for asynchronous processing. When you provide a callback URL, the API will send a webhook notification to your endpoint when the prediction is complete.
```typescript
import { VlmRun } from "vlmrun";

// Initialize the client
const client = new VlmRun({
  apiKey: "your-api-key",
});

// Process a document with a callback URL
const url =
  "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/google_invoice.pdf";

const response = await client.document.generate({
  url: url,
  domain: "document.invoice",
  batch: true, // Enable batch processing for async execution
  callbackUrl: "https://your-webhook-endpoint.com/vlm-callback",
});

console.log(response.status); // "pending"
console.log(response.id); // Use this ID to track the prediction
```
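If you also want to poll for the result (for example, as a fallback to the webhook), the returned ID can be used to fetch the prediction. A minimal sketch, assuming the SDK exposes a `client.predictions.get(id)` helper (verify the exact method against the SDK reference):

```typescript
// Hypothetical polling loop; assumes `client.predictions.get(id)`
// returns an object with the same `status`/`response` shape as above.
async function waitForPrediction(id: string, intervalMs = 5000) {
  for (;;) {
    const prediction = await client.predictions.get(id);
    if (prediction.status === "completed") return prediction;
    if (prediction.status === "failed") {
      throw new Error(`Prediction ${id} failed`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

const completed = await waitForPrediction(response.id);
console.log(completed.response);
```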
#### Webhook Payload
When the prediction is complete, VLM Run will send a POST request to your callback URL with the following payload:
```json
{
  "id": "pred_abc123",
  "status": "completed",
  "response": {
    "invoice_id": "INV-001",
    "total": 1250.0,
    "items": []
  },
  "created_at": "2024-01-15T10:30:00Z",
  "completed_at": "2024-01-15T10:30:45Z"
}
```
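On the receiving side, your callback endpoint only needs to accept a JSON POST with this shape. A minimal sketch using Express (the framework choice and endpoint path are illustrative, not part of the SDK):

```typescript
import express from "express";

const app = express();
app.use(express.json());

// Matches the callbackUrl passed to generate() above
app.post("/vlm-callback", (req, res) => {
  const { id, status, response } = req.body;

  if (status === "completed") {
    // `response` carries the structured prediction, e.g. the invoice fields
    console.log(`Prediction ${id} completed:`, response);
  } else {
    console.warn(`Prediction ${id} returned status: ${status}`);
  }

  // Respond quickly so the delivery is treated as successful
  res.sendStatus(200);
});

app.listen(3000);
```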
### Document Predictions with Zod Definitions

```typescript
import { VlmRun } from "vlmrun";
import { z } from "zod";

// Initialize the client
const client = new VlmRun({
  apiKey: "your-api-key",
});

// Define enums and base schemas
enum PaymentStatus {
  PAID = "Paid",
  UNPAID = "Unpaid",
  PARTIAL = "Partial",
  OVERDUE = "Overdue",
}

enum PaymentMethod {
  CREDIT_CARD = "Credit Card",
  BANK_TRANSFER = "Bank Transfer",
  CHECK = "Check",
  CASH = "Cash",
  PAYPAL = "PayPal",
  OTHER = "Other",
}

const currencySchema = z
  .number()
  .min(0, "Currency values must be non-negative");

const dateSchema = z
  .string()
  .regex(/^\d{4}-\d{2}-\d{2}$/, "Date must be in YYYY-MM-DD format");

// Define address schema
const addressSchema = z.object({
  street: z.string().nullable(),
  city: z.string().nullable(),
  state: z.string().nullable(),
  postal_code: z.string().nullable(),
  country: z.string().nullable(),
});

// Define line item schema
const lineItemSchema = z.object({
  description: z.string(),
  quantity: z.number().positive(),
  unit_price: currencySchema,
  total: currencySchema,
});

// Define company schema
const companySchema = z.object({
  name: z.string(),
  address: addressSchema.nullable(),
  tax_id: z.string().nullable(),
  phone: z.string().nullable(),
  email: z.string().nullable(),
  website: z.string().nullable(),
});

// Define invoice schema using the definitions
const invoiceSchema = z.object({
  invoice_id: z.string(),
  invoice_date: dateSchema,
  due_date: dateSchema.nullable(),
  vendor: companySchema,
  customer: companySchema,
  items: z.array(lineItemSchema),
  subtotal: currencySchema,
  tax: currencySchema.nullable(),
  total: currencySchema,
  payment_status: z.nativeEnum(PaymentStatus).nullable(),
  payment_method: z.nativeEnum(PaymentMethod).nullable(),
  notes: z.string().nullable(),
});

const documentUrl =
  "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/google_invoice.pdf";

const result = await client.document.generate({
  url: documentUrl,
  domain: "document.invoice",
  config: {
    responseModel: invoiceSchema,
    zodToJsonParams: {
      definitions: {
        address: addressSchema,
        lineItem: lineItemSchema,
        company: companySchema,
      },
      $refStrategy: "none",
    },
  },
});
```
### OpenAI-Compatible Chat Completions
The VLM Run SDK provides OpenAI-compatible chat completions through the agent endpoint. This allows you to use the familiar OpenAI API with VLM Run's powerful vision-language models.
```typescript
import { VlmRun } from "vlmrun";

// Initialize the client with the agent endpoint
const client = new VlmRun({
  apiKey: "your-api-key",
  baseURL: "https://agent.vlm.run/v1",
});

// Use OpenAI-compatible chat completions
const response = await client.agent.completions.create({
  model: "vlmrun-orion-1",
  messages: [{ role: "user", content: "Hello! How can you help me today?" }],
});

console.log(response.choices[0].message.content);
```
#### Streaming Responses
```typescript
const stream = await client.agent.completions.create({
  model: "vlmrun-orion-1",
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```
> **Note:** The OpenAI SDK is an optional peer dependency. Install it with:

```bash
npm install openai
# or
yarn add openai
```

## 🛠️ Examples
Check out the examples directory for more detailed usage examples (a short combined sketch follows the list):
- Models - List available models
- Files - Upload and manage files
- Predictions - Make predictions with different types of inputs
- Feedback - Submit feedback for predictions
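The list above maps to the SDK's resource namespaces. As a quick tour, here is a hypothetical sketch touching several of them; the method names and the feedback fields are assumptions based on the list, so check the examples directory for the exact signatures:

```typescript
import { VlmRun } from "vlmrun";

const client = new VlmRun({ apiKey: "your-api-key" });

// Models: list available models (assumed `list()` helper)
const models = await client.models.list();
console.log(models);

// Files: list previously uploaded files (assumed `list()` helper)
const files = await client.files.list();
console.log(files);

// Feedback: submit feedback for a prediction
// (field names here are illustrative, not the documented schema)
await client.feedback.submit({
  id: "pred_abc123",
  label: { invoice_id: "INV-001" },
  notes: "Corrected the extracted invoice_id",
});
```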
## 🔑 Authentication
To use the VLM Run API, you'll need an API key. To obtain one:

- Create an account at VLM Run
- Navigate to Settings -> API Keys in the dashboard
Then use it to initialize the client:
```typescript
const client = new VlmRun({
  apiKey: "your-api-key",
});
```
## 📚 Documentation
For detailed documentation and API reference, visit our documentation site.
## 🤝 Contributing
We welcome contributions! Please check out our contributing guidelines for details.
## 📝 License
This project is licensed under the Apache-2.0 License - see the LICENSE file for details.