AI Games — software & AI studio, Sofia

Engineering

Engineering, in principle and in practice.

A small studio earns leverage by deciding the hard questions once and defending those decisions across every product. Six principles set the direction; ten invariants, enforced in code, keep the system honest.

Engineering principles

Six engineering values the studio holds across projects and platforms. They describe how we think — not which database we picked. The platform choices below them are downstream of these.

Simplicity is a feature you defend, not a default you inherit

Systems grow more complex every week unless somebody is paid to push back. The cost of an extra dependency, an extra service, an extra code path is paid every day forever; the benefit is collected once. New complexity earns its keep before it gets tenure — and most of it never does.

The right thing easy, the wrong thing impossible

Invariants belong in code, not in documentation. Schema constraints, generated clients, type systems, append-only ledgers, idempotency keys, lint rules that fail the build. A rule written in prose is a rule that will be forgotten; a rule the compiler enforces is a rule that cannot be violated.

The server is the trust boundary

Clients ship into hostile territory: user-space binaries that can be tampered with, replayed, decompiled or run under a debugger. Pricing, identity, ownership and money therefore live on machines we control. A jailbroken phone, a forged cookie or a replayed request cannot move funds or read someone else's data.

Async-first for unpredictable work

Anything whose latency cannot be promised should not block a user. AI calls, batch jobs, third-party reaches, long renders — return immediately with a job handle, reconcile in the background, refund work that fails. The synchronous-by-default request handler is the most expensive habit a codebase inherits.

Latency is a UX feature

Perceived speed is a product requirement, not a late-stage optimisation. Compute should live close to the user, assets close to the request, payloads sized to the network they cross. An interface that responds in 80 milliseconds is a different product from one that responds in 800 — even when the work behind the response is identical.

Ship the slice, not the scaffold

A feature is done when a person can complete the workflow end-to-end on a real device — sign in, do the thing, see the result, share the result, and recover from a failure. Half-built work behind feature flags compounds into debt; every sprint should end with one fewer flag, not one more.

The default stack — and what we reach for elsewhere

Layer: Default · alternatives we deploy in client engagements
Compute: Cloudflare Workers · AWS Lambda · Azure Functions · GCP Cloud Run · containerised Kubernetes on any cloud or on-prem
Relational state: Cloudflare D1 (SQLite) · Postgres on RDS / Aurora / Azure / GCP · Snowflake for analytics
Object storage: Cloudflare R2 · AWS S3 · Azure Blob · GCP Cloud Storage · on-prem MinIO / Ceph
Cache & rate limiting: Cloudflare KV · Redis / ElastiCache / Azure Cache
Async jobs & streaming: Cloudflare Queues · SQS / SNS · Azure Service Bus · Kafka · Pub/Sub
Data platform: Databricks · Snowflake · Azure Synapse · AWS Lake Formation · BigQuery
Vector & retrieval: Cloudflare Vectorize · pgvector · Pinecone · Azure AI Search · OpenSearch
AI gateway & routing: Cloudflare AI Gateway → OpenRouter · Bedrock · Azure OpenAI · Vertex AI
Workloads we run in-house: Cloudflare Workers AI · LoRA / QLoRA fine-tuning on open-weight models (Llama, Qwen, Gemma, PaliGemma)
Web frameworks: Next.js 16, React 19, TypeScript 6, Tailwind CSS 4 (deployable to the edge or any cloud)
Mobile: Kotlin / Jetpack Compose · Swift / SwiftUI
Infrastructure as code: Terraform · Pulumi · Wrangler · platform-native templates
Observability: OpenTelemetry · Grafana · Datadog · Cloud-native logs / metrics
On-prem & hardware: VMware · Proxmox · KVM · SAN / NAS design · BGP routing · capacity & DR planning
Robotics middleware: ROS 2 · ArduPilot

Ten invariants, enforced in code

The principles above describe what we believe. The ten rules below describe what the build refuses to ship.

  1. Cloud-native, on one cloud per project.

    Studio products run on Cloudflare. Client engagements run where the project demands — AWS, Azure, GCP, hybrid or on-prem. Either way, one platform per project, one console to read at 3 a.m.

  2. OpenAPI is the contract.

    The web client, the iOS app and the Android app consume the same generated client. Break the contract and the mobile clients break with it — which forces honesty at the API layer.

  3. Async-first for AI.

    HTTP returns 202 + jobId; every LLM and image-model call runs in a queue consumer. The user request never times out at 30 seconds because a model took 90, and failed inferences refund credits automatically.

  4. All business logic server-side.

    Mobile and web are thin clients. A jailbroken phone cannot mint credits, edit pricing, or skip an ownership check, because the rules do not live on the client.

  5. Security in middleware.

    HttpOnly cookies, Turnstile, rate limits, CSRF protection and Zod schema validation on every endpoint. Security is not a feature you remember to add to a route; it is the floor every route already stands on.

  6. Funds through credit ledgers only.

    Append-only, never a direct balance UPDATE. Every transaction is a row with a source event and a hash, which is why double-charges and silent balance drift are ruled out by construction rather than by discipline.

  7. Idempotency is mandatory.

    Idempotency-Key on every mutation; a hash of the request is cached together with the response. Retrying a failed network request never does the work twice or charges twice.

  8. One ownership-check helper.

    assertOwnsResource() returns 404 on miss — never 403. An attacker probing for resources of other users gets the same response as if the resource never existed.

  9. Provider keys behind a gateway only.

    Enforced by ESLint and Semgrep. AI model credentials live in a gateway (Cloudflare AI Gateway, AWS Bedrock, Azure OpenAI, or equivalent), never in application code directly. Leaked Worker code therefore carries no key that could spend the AI budget.

  10. Ship when the workflow completes.

    Each sprint ends with a working flow, not a half-built feature behind a flag. If a user cannot complete it end to end on the demo, we do not merge it.
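
Invariants 6 to 8 can be illustrated with a minimal in-memory sketch. It is not the studio's implementation: the class names, the 422 response for a reused key, and the signature of assertOwnsResource() are assumptions made for illustration, the Set stands in for a UNIQUE constraint on the source-event column, and the per-row hash is left out for brevity.

```typescript
// All names below are hypothetical stand-ins; only assertOwnsResource() is
// named on this page, and its signature here is an assumption.

type LedgerEntry = {
  readonly userId: string;
  readonly delta: number; // positive = credit, negative = debit
  readonly sourceEventId: string; // unique per business event
};

class CreditLedger {
  private readonly entries: LedgerEntry[] = [];
  private readonly seenEvents = new Set<string>();

  // Append a row. A repeated sourceEventId is a no-op, standing in for a
  // UNIQUE constraint on the source-event column.
  append(entry: LedgerEntry): boolean {
    if (this.seenEvents.has(entry.sourceEventId)) return false;
    if (this.balance(entry.userId) + entry.delta < 0) {
      throw new Error("insufficient credits");
    }
    this.seenEvents.add(entry.sourceEventId);
    this.entries.push(entry);
    return true;
  }

  // The balance is derived from the rows; it is never stored or updated.
  balance(userId: string): number {
    let total = 0;
    for (const e of this.entries) {
      if (e.userId === userId) total += e.delta;
    }
    return total;
  }
}

type HandlerResult = { status: number; body: string };

class IdempotencyStore {
  private readonly cache = new Map<string, { requestHash: string; result: HandlerResult }>();

  // Run `work` at most once per key; a retry gets the stored result back.
  execute(key: string, requestHash: string, work: () => HandlerResult): HandlerResult {
    const hit = this.cache.get(key);
    if (hit !== undefined) {
      // Same key, different payload: refuse rather than guess (422 is an assumption).
      if (hit.requestHash !== requestHash) {
        return { status: 422, body: "Idempotency-Key reused with a different request" };
      }
      return hit.result;
    }
    const result = work();
    this.cache.set(key, { requestHash, result });
    return result;
  }
}

class NotFoundError extends Error {
  readonly status = 404;
}

// Missing and not-yours are indistinguishable to the caller: both are 404.
function assertOwnsResource(resource: { ownerId: string } | undefined, userId: string): void {
  if (resource === undefined || resource.ownerId !== userId) {
    throw new NotFoundError("Not found");
  }
}

// Usage: a retried request with the same key debits the ledger once.
const ledger = new CreditLedger();
const idempotency = new IdempotencyStore();
ledger.append({ userId: "u1", delta: 10, sourceEventId: "purchase-1" });

const spend = (): HandlerResult => {
  ledger.append({ userId: "u1", delta: -3, sourceEventId: "job-42" });
  return { status: 202, body: JSON.stringify({ jobId: "job-42" }) };
};
const first = idempotency.execute("key-abc", "hash-1", spend);
const retry = idempotency.execute("key-abc", "hash-1", spend);
console.log(first.status, retry.body, ledger.balance("u1"));
```

The two guards overlap on purpose in this sketch: the idempotency store stops the retry from running the handler again, and even if it did run, the ledger would refuse a second row for the same source event.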

The async AI request flow

1.  Client POSTs to /api/v1/<action>
2.  Worker validates input (Zod), checks idempotency key, deducts credits
    in an atomic D1 batch, enqueues a Queue job, returns 202 + jobId.

3.  Queue consumer runs:
      ├─ Pull payload from R2 (if any)
      ├─ Call provider via Cloudflare AI Gateway → OpenRouter
      ├─ Circuit breaker after repeated failures (5 / 60s)
      ├─ Write result to R2, update D1 status row
      └─ Auto-refund credits on non-retryable failure (idempotent)

4.  Client polls /api/v1/<action>/<jobId> with stepped backoff
    (2s → 5s → 10s, 180s hard timeout).
5.  On success, the client fetches the artifact through a Worker proxy
    that streams from R2 behind a KV-cached signed URL.
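
Step 4 of the flow above can be sketched on the client side as follows. The delay steps and the 180-second limit come from the flow; holding at the last step (10s) once the list is exhausted is an assumption, as are the function names and the JobStatus shape.

```typescript
// Delay steps and hard timeout from the flow above. Staying at the last
// step after the list runs out is an assumption of this sketch.
const POLL_DELAYS_MS: readonly number[] = [2000, 5000, 10000];
const HARD_TIMEOUT_MS = 180000;

// Returns the elapsed-time offsets (in ms) at which the client polls.
function pollSchedule(
  delays: readonly number[] = POLL_DELAYS_MS,
  timeoutMs: number = HARD_TIMEOUT_MS,
): number[] {
  if (delays.length === 0 || delays.some((d) => !(d > 0))) {
    throw new Error("delays must be a non-empty list of positive numbers");
  }
  const offsets: number[] = [];
  let elapsed = 0;
  for (let attempt = 0; ; attempt++) {
    const index = Math.min(attempt, delays.length - 1);
    elapsed += delays[index] as number;
    if (elapsed > timeoutMs) break; // never poll past the hard timeout
    offsets.push(elapsed);
  }
  return offsets;
}

// Hypothetical status shape; the real API response is not shown on this page.
type JobStatus = { state: "pending" | "done" | "failed" };

// Polls until the job leaves "pending" or the schedule is exhausted.
// `check` and `sleep` are injected so the loop can be tested without a network.
async function pollJob(
  check: () => Promise<JobStatus>,
  sleep: (ms: number) => Promise<void>,
): Promise<JobStatus> {
  let previous = 0;
  for (const offset of pollSchedule()) {
    await sleep(offset - previous);
    previous = offset;
    const status = await check();
    if (status.state !== "pending") return status;
  }
  throw new Error("job did not finish within the hard timeout");
}
```

Under these assumptions the client polls at 2s, 7s, 17s and then every 10s, so a job that never resolves costs 19 requests before the client gives up.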

What we ship in AI

The studio works across the modern AI stack — model selection and orchestration, fine-tuning, retrieval, evaluation, data engineering and AI-assisted security. We do not train base models from scratch; we adapt, compose and operationalise models we can audit.

LLM orchestration & ensembles

Multi-model routing through Cloudflare AI Gateway, AWS Bedrock or Azure OpenAI — with caching, rate limits and per-model cost telemetry. Where one model is enough, we use one. Where reliability demands it (see Fatumu), we compose Claude Sonnet 4.6, GPT-4.1 and Gemini 2.5 Pro against the same prompt and reconcile their disagreement with an inverse-CI95 Bayesian weighted average.
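
A minimal sketch of one way such a reconciliation could work, not necessarily the weighting Fatumu uses. It assumes each model reports a point estimate with a 95 % interval, treats each interval as roughly normal, and weights each estimate by its precision (the inverse of the variance implied by the interval width). For independent Gaussian estimates and a flat prior this is the posterior mean; models answering the same prompt are not truly independent, so the combined interval is optimistic. The type and function names are hypothetical.

```typescript
// Hypothetical shape: one model's estimate with its 95 % interval.
type ModelEstimate = {
  model: string;
  mean: number;
  ci95Low: number;
  ci95High: number;
};

const Z95 = 1.96; // half-width of a 95 % interval in standard deviations

// Precision-weighted (inverse-variance) average of the estimates.
function combineEstimates(estimates: readonly ModelEstimate[]): {
  mean: number;
  ci95Low: number;
  ci95High: number;
} {
  if (estimates.length === 0) throw new Error("need at least one estimate");
  let precisionSum = 0;
  let weightedSum = 0;
  for (const e of estimates) {
    // Recover the standard deviation from the interval width.
    const sigma = (e.ci95High - e.ci95Low) / (2 * Z95);
    if (!(sigma > 0)) throw new Error("non-positive interval width for " + e.model);
    const precision = 1 / (sigma * sigma);
    precisionSum += precision;
    weightedSum += precision * e.mean;
  }
  const mean = weightedSum / precisionSum;
  const combinedSigma = Math.sqrt(1 / precisionSum);
  return {
    mean,
    ci95Low: mean - Z95 * combinedSigma,
    ci95High: mean + Z95 * combinedSigma,
  };
}

// Usage: the model with the narrower interval pulls the result toward itself.
const combined = combineEstimates([
  { model: "model-a", mean: 0.6, ci95Low: 0.5, ci95High: 0.7 },
  { model: "model-b", mean: 0.8, ci95Low: 0.6, ci95High: 1.0 },
]);
console.log(combined.mean.toFixed(2));
```

In the usage example the second interval is twice as wide, so its weight is a quarter of the first and the combined mean lands at 0.64 instead of the plain average of 0.70.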

Retrieval-augmented generation

Embedding pipelines, chunking strategies tuned per corpus, hybrid sparse-dense retrieval, query rewriting and cross-encoder reranking. Vector storage on Cloudflare Vectorize, pgvector, Pinecone or Azure AI Search — whichever the surrounding architecture already favours. We do not ship chat-with-your-PDF demos; we ship retrieval systems that survive in production.

Fine-tuning & adapter training

LoRA and QLoRA adaptation of open-weight models — Llama, Qwen, Gemma, PaliGemma — for tasks where prompting alone underperforms: domain classification, structured extraction, calibrated probability scoring, schema-strict generation. Datasets curated and versioned; evaluation harnesses run on every checkpoint.

Vision, multimodal & VLA models

Identity-preserving image generation (FLUX.2, Seedream), image moderation (LLaVA-1.5 on Workers AI), document understanding, and Vision-Language-Action policies for robotics. Praemonitus uses SPEAR-1 by INSAIT — a VLA that learns 3D scene structure from monocular RGB — as the on-robot manipulation policy on Franka and WidowX embodiments.

Probabilistic decision systems

Bayesian outcome simulation, calibrated probabilities with explicit 80 % and 95 % confidence intervals, scenario branching, and mission-abort thresholds. Used in Fatumu to ship defensible forecasts; used in Praemonitus to decide when an autonomous action requires operator confirmation.

Data engineering for AI

Pipelines on Databricks, Snowflake, Azure Synapse and AWS Lake Formation. Lakehouses, streaming ingestion (Kafka, Pub/Sub), CDC, dbt models, embedding pipelines that feed RAG systems, and LLM-assisted analytics that let analysts query the warehouse in natural language with a verifiable SQL audit trail.

AI-assisted security automation

Continuous SAST, DAST, dependency and container scanning, with an LLM-augmented triage layer that prioritises real findings and opens patched pull requests instead of writing tickets nobody resolves. SBOM diffing, supply-chain analysis, secret scanning, automated dependency upgrades — measured against mean time to patch.

Evaluation, calibration & safety

Brier score and log-loss tracking against resolved outcomes. Prompt-injection canaries with leak detection. Output schema validation. Jailbreak regression suites. Pre-deployment red-teaming. AI features without measurement are not features — they are vibes. We measure.
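
Both metrics are short enough to state in code. The sketch below uses the standard definitions for binary outcomes; the ResolvedForecast shape and the clamping constant are assumptions for illustration, not the studio's evaluation harness.

```typescript
// Hypothetical record: a forecast probability and how the event resolved.
type ResolvedForecast = { probability: number; outcome: 0 | 1 };

// Brier score: mean squared error between probability and outcome.
// 0 is perfect; always answering 0.5 scores 0.25.
function brierScore(forecasts: readonly ResolvedForecast[]): number {
  if (forecasts.length === 0) throw new Error("no resolved forecasts");
  let sum = 0;
  for (const f of forecasts) {
    sum += (f.probability - f.outcome) ** 2;
  }
  return sum / forecasts.length;
}

// Log-loss: mean negative log-likelihood of the observed outcomes.
// Probabilities are clamped so a confident miss is heavily penalised
// without producing Infinity.
function logLoss(forecasts: readonly ResolvedForecast[], eps: number = 1e-15): number {
  if (forecasts.length === 0) throw new Error("no resolved forecasts");
  let sum = 0;
  for (const f of forecasts) {
    const p = Math.min(1 - eps, Math.max(eps, f.probability));
    sum += f.outcome === 1 ? Math.log(p) : Math.log(1 - p);
  }
  return -sum / forecasts.length;
}

// Usage with three resolved forecasts.
const sample: ResolvedForecast[] = [
  { probability: 0.9, outcome: 1 },
  { probability: 0.2, outcome: 0 },
  { probability: 0.6, outcome: 1 },
];
console.log(brierScore(sample).toFixed(3), logLoss(sample).toFixed(3));
```

Lower is better for both. The Brier score is bounded and easy to explain; log-loss punishes confident wrong answers far more sharply, which is why tracking the two together is informative.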

Spotlight · 3D-aware robotic vision

SPEAR-1, by INSAIT

INSAIT — the Institute for Computer Science, Artificial Intelligence and Technology, hosted by Sofia University in partnership with ETH Zürich and EPFL — released SPEAR-1 in October 2025 as Europe's first open robotic foundation model trained on 3D understanding (arXiv:2511.17411). The model is a Vision-Language-Action policy. Its key idea: rather than learn 3D scene structure from scarce, expensive robot demonstrations, SPEAR-VLM learns it during pretraining from roughly 45 million frames of mostly non-robotic data — by pairing a PaliGemma 3B vision-language backbone with the MoGe monocular geometric encoder and training on 3D visual-question-answering tasks. A Flow-Matching action expert then attends to the VLM's features and emits action chunks: delta end-effector translation, delta rotation, gripper state.

The 2D → 3D step lives inside the VLM, not as a separate perception module. SPEAR-1 reads monocular RGB and reasons about depth, occlusion and scene geometry directly, which is why it matches or outperforms π0-FAST and π0.5 with roughly 20× fewer robot demonstrations and shows its biggest gains on fine-positioning tasks where 3D understanding is the bottleneck. Published checkpoints (huggingface.co/INSAIT-Institute/spear1-franka) target Franka Research 3 and WidowX manipulator arms.

How Praemonitus uses it. SPEAR-1 is a per-robot manipulation policy, not a fleet coordinator — it lives inside one robot and outputs that robot's next action. Praemonitus is the layer above it: task allocation across a mixed fleet, scheduling, Bayesian risk scoring, mission-abort thresholds, operator-in-the-loop confirmation. For compatible embodiments (Franka, WidowX) Praemonitus dispatches manipulation sub-tasks to SPEAR-1 on the robot; for others it uses different per-robot policies through the same interface. We adopted SPEAR-1 because its 3D-from-2D pretraining matches the data regime of civilian operations — many cameras, few demonstrations — and because an open, auditable model is the only kind we are willing to put behind a calibrated decision wrapper.

Attribution. SPEAR-1 weights and code are distributed by INSAIT under the Gemma license inherited from PaliGemma; the SPEAR-1 paper is released CC BY 4.0. We use the published checkpoints under those terms; attribution and integration enquiries belong to INSAIT at contact@insait.ai. Project site: spear.insait.ai.

Security, in the floorboards

  • Memory-hard password hashing (Argon2id) with an HMAC pepper
  • Opaque session tokens in KV — no JWTs in cookies, no client-side trust
  • PKCE-protected OAuth authorization-code flows for Google and Apple sign-in
  • Encrypted token storage on Android (AES-256-GCM) and the iOS Keychain
  • Canary tokens to detect prompt-injection leakage from AI outputs
  • Comprehensive audit log: actor, IP hash, outcome, timestamp
  • CSP, X-Frame-Options: DENY, Permissions-Policy off by default
  • Rockify selfies auto-deleted after 30 days; on-demand erasure within 72 hours