
SIS Architecture

Version: 0.2 — March 2026


1. Overview & Philosophy

SIS is a multi-tenant Student Information System for multi-level schools (kindergarten through high school). The architecture prioritizes developer velocity for a small team (2–4 devs) shipping a beta in 6 months, while preserving a clear migration path toward scale.

Architecture philosophy:

  • Monolith-first: Ship fast, split later. NestJS modules map 1:1 to future services.
  • Single language: TypeScript end-to-end reduces context-switching and maximizes code sharing.
  • Managed infrastructure: Zero DevOps overhead during MVP. Railway and Cloudflare handle operations.
  • Defense in depth: Service-layer tenant filtering at the API, scope-based guards for field access, action-level guards for operation access, scope-level filtering at the response layer. PostgreSQL RLS is the target architecture for an additional database-level safety net (not yet deployed — see §3.2).
  • Multitenancy from day one: Every table carries tenant_id; service-layer filtering enforces isolation. RLS policies are designed as a future safety net (no schema-per-tenant for now).

2. Decision Summary

| Area | Decision | Key Reason |
|---|---|---|
| Backend | NestJS monolith + Prisma + PostgreSQL | Fastest dev speed, single language with frontend, modular |
| Multitenancy | tenant_id column + service-layer filtering (PostgreSQL RLS planned) | Simpler ops than schema-per-tenant, single migration path |
| Authentication | Custom with Passport.js + JWT | Full control over Entity-Scope permission model, no per-user cost |
| Frontend | Micro frontends with React + Tailwind on Cloudflare Workers | Edge-served, independent deployments, global low latency |
| Storage | Cloudflare R2 | S3-compatible, zero egress fees, pairs with Cloudflare edge |
| Queues/Jobs | BullMQ + Redis | Mature Node.js queue, cron/retry/priority support, Redis reusable for caching |
| Deployment | Railway (backend) + Cloudflare Workers (frontend) | Simplest PaaS, managed PostgreSQL, preview environments |

For detailed technology comparisons and decision rationale, see docs/stack-analisys.md.


3. System Components

3.1 Backend — NestJS Monolith + Prisma + PostgreSQL

NestJS 11 with Prisma ORM on PostgreSQL. Modules (StudentsModule, TeachersModule, AttendanceModule) map directly to domain boundaries, so each can later be extracted into a microservice with limited rewriting.

Guards and custom decorators (@RequireScopes(), @RequireAction()) integrate naturally with the Entity-Scope-Action permission model. Prisma provides type-safe database access generated from the schema with excellent migration tooling.

Trade-offs accepted:

  • Prisma has limitations with complex raw queries — mitigated by using $queryRaw for RLS policy setup and complex reporting queries.
  • Node.js memory management needs attention for large data exports — mitigated by streaming responses and offloading to BullMQ jobs.
  • Multitenancy is not built-in — implemented via service-layer tenantId filtering (RLS deferred to Phase 2).

3.2 Multitenancy — tenant_id + Row-Level Security

Current status: Only service-layer tenantId filtering is active. Every service method filters by tenantId in its where clause. No PostgreSQL RLS policies exist in the database yet. RLS is planned as a Phase 2 defense-in-depth safety net. The SQL patterns below show the target architecture, not current state.

Target design: a single shared schema with a tenant_id UUID column on every tenant-scoped table, with isolation additionally enforced by PostgreSQL Row-Level Security (RLS) policies once they are deployed.

RLS setup pattern (target):

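-- Enable RLS on each tenant-scoped table. The table owner still bypasses it
-- unless FORCE ROW LEVEL SECURITY is also set.
ALTER TABLE students ENABLE ROW LEVEL SECURITY;

-- Rows are visible only when tenant_id matches the session's tenant setting.
CREATE POLICY tenant_isolation ON students
  USING (tenant_id = current_setting('app.current_tenant_id')::uuid);

-- Set per request, inside a transaction: SET LOCAL reverts at COMMIT/ROLLBACK.
SET LOCAL app.current_tenant_id = '<tenant-uuid>';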
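
A minimal TypeScript sketch of how the application side might set the tenant variable once RLS is deployed. SET LOCAL cannot take bind parameters, so the sketch calls PostgreSQL's set_config(name, value, true) inside a Prisma interactive transaction, which has the same transaction-local effect. The withTenant helper and the TenantTx / TenantClient interfaces are hypothetical stand-ins for the Prisma client, not existing project code.

```typescript
// Structural stand-ins for the Prisma client (assumption: the real client
// exposes $transaction(callback) and the $executeRaw template tag).
interface TenantTx {
  $executeRaw(strings: TemplateStringsArray, ...values: unknown[]): Promise<number>;
}

interface TenantClient {
  $transaction<T>(fn: (tx: TenantTx) => Promise<T>): Promise<T>;
}

// Hypothetical helper: runs `work` inside a transaction whose
// app.current_tenant_id has been set first, so RLS policies can read it.
async function withTenant<T>(
  client: TenantClient,
  tenantId: string,
  work: (tx: TenantTx) => Promise<T>,
): Promise<T> {
  return client.$transaction(async (tx) => {
    // The third argument `true` makes the setting transaction-local,
    // equivalent to SET LOCAL, while letting tenantId be a bound parameter.
    await tx.$executeRaw`SELECT set_config('app.current_tenant_id', ${tenantId}, true)`;
    return work(tx);
  });
}
```

All queries that should be subject to the policy would need to run on the tx handle passed to work, since the setting does not apply outside that transaction.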

Tenant resolution at runtime:

The login flow uses a password-first two-step approach that eliminates Host header dependency:

POST /api/v1/auth/login { email, password }
  1. Find all active User records matching email across all active tenants
  2. Verify password against each with argon2.verify (in parallel)
  3. Collect matches (user + tenant pairs where password is valid)
| Matches | Response |
|---|---|
| 0 | 401 "Invalid credentials" (generic, no enumeration) |
| 1 | Normal login: cookies set, return { user } |
| 2+ | 200 with { requiresTenantSelection, tenants[], selectionToken } |

If multi-match, the frontend shows a tenant picker and completes login:

POST /api/v1/auth/login/select-tenant { selectionToken, tenantId }
→ cookies set, return { user }

The selectionToken is a short-lived (60s) JWT containing { sub: "tenant-selection", matchedUserIds: [...] }. The tenant list is only returned after password verification, so it does not leak email-tenant mappings.
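
The three login outcomes described above can be sketched as a pure function. This is an illustration under assumptions, not the project's code: LoginCandidate, LoginOutcome, and resolveLogin are hypothetical names, and the password check is injected as verify (in the design above it would be argon2.verify). The dummy hash used for timing consistency on zero matches is omitted.

```typescript
// Hypothetical shape of one active user record matching the email.
interface LoginCandidate {
  userId: string;
  tenantId: string;
  tenantName: string;
  passwordHash: string;
}

type LoginOutcome =
  | { kind: "invalid" } // 0 matches: respond 401
  | { kind: "login"; userId: string; tenantId: string } // 1 match: set cookies
  | {
      kind: "select-tenant"; // 2+ matches: show the tenant picker
      tenants: { id: string; name: string }[];
      matchedUserIds: string[];
    };

// Injected password check; argon2.verify(hash, plain) has this shape.
type VerifyFn = (hash: string, password: string) => Promise<boolean>;

async function resolveLogin(
  candidates: LoginCandidate[],
  password: string,
  verify: VerifyFn,
): Promise<LoginOutcome> {
  // Verify against every candidate in parallel, keeping only valid ones.
  const checked = await Promise.all(
    candidates.map(async (c) => ((await verify(c.passwordHash, password)) ? c : null)),
  );
  const matches = checked.filter((c): c is LoginCandidate => c !== null);

  if (matches.length === 0) return { kind: "invalid" };
  if (matches.length === 1) {
    return { kind: "login", userId: matches[0].userId, tenantId: matches[0].tenantId };
  }
  // The tenant list is built only from password-verified matches.
  return {
    kind: "select-tenant",
    tenants: matches.map((m) => ({ id: m.tenantId, name: m.tenantName })),
    matchedUserIds: matches.map((m) => m.userId),
  };
}
```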

After authentication, tenantId is embedded in the signed JWT — all subsequent requests use the JWT payload, not the Host header. Tenant status (ACTIVE/TRIAL) is validated at login and re-validated on every token refresh.

Reverse proxy (trust proxy): Configured in main.ts via app.set('trust proxy', 1) so that req.ip returns the client's real IP behind Railway/Cloudflare. Required for accurate rate limiting and IP logging.

Trade-offs accepted:

  • Weaker data isolation than schema-per-tenant — acceptable for an EdTech SaaS where tenants are schools (not competing businesses with adversarial threat models).
  • Must be disciplined: every new table needs tenant_id and an RLS policy. Code review checklist item.
  • Cross-tenant analytics queries require bypassing RLS (a superuser, or SET ROLE to a role with the BYPASSRLS attribute). Acceptable, handled by an admin-only reporting service.

3.3 Authentication — Custom with Passport.js + JWT

Self-hosted authentication using Passport.js strategies with JWT access tokens and refresh tokens, stored in the application database.

Auth flow:

Login → find all active users matching email across active tenants
     → verify password against each with argon2 (in parallel)
     → 0 matches: 401 "Invalid credentials" (dummy argon2 for timing consistency)
     → 1 match:   auto-login → issue JWT + refresh token → set HttpOnly cookies
     → 2+ matches: return { requiresTenantSelection, tenants[], selectionToken (60s JWT) }

Select-Tenant → verify selectionToken → find user matching tenantId from pre-validated list
     → issue JWT + refresh token → set HttpOnly cookies

JWT payload: { sub: userId, tenantId, roles: [...], isPlatformAdmin }
On each request: JwtStrategy extracts token from cookie (or Bearer header),
  validates token, ScopeGuard checks permissions (platform admins bypass)
Rate limiting: 10 req/60s global, 5 req/60s on login + select-tenant (@nestjs/throttler)

Refresh → validate refresh token (SHA-256 hash lookup)
       → replay detection: if token already revoked → revoke entire family
       → validate tenant still ACTIVE/TRIAL, user still isActive
       → log IP change warnings → rotate token (revoke old, issue new in same family)
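
The rotation and replay-detection steps of the refresh flow can be sketched with an in-memory store. This is a simplified illustration: RefreshTokenStore and its method names are hypothetical, the hash and token generator are injected to keep the sketch dependency-free (the flow above specifies SHA-256), and the tenant/user status checks and IP logging are left out.

```typescript
interface StoredRefreshToken {
  familyId: string;
  userId: string;
  revoked: boolean;
}

type RotationResult =
  | { ok: true; newToken: string }
  | { ok: false; reason: "unknown" | "replay" };

class RefreshTokenStore {
  // Keyed by token hash, so raw tokens are never stored.
  private readonly byHash = new Map<string, StoredRefreshToken>();
  private readonly hashToken: (raw: string) => string;
  private readonly newToken: () => string;

  constructor(hashToken: (raw: string) => string, newToken: () => string) {
    this.hashToken = hashToken;
    this.newToken = newToken;
  }

  issue(userId: string, familyId: string): string {
    const raw = this.newToken();
    this.byHash.set(this.hashToken(raw), { familyId, userId, revoked: false });
    return raw;
  }

  rotate(raw: string): RotationResult {
    const record = this.byHash.get(this.hashToken(raw));
    if (!record) return { ok: false, reason: "unknown" };

    if (record.revoked) {
      // Replay: an already-rotated token was presented, so every token
      // in the same family is revoked.
      this.byHash.forEach((t) => {
        if (t.familyId === record.familyId) t.revoked = true;
      });
      return { ok: false, reason: "replay" };
    }

    // Normal rotation: revoke the old token, issue a new one in the family.
    record.revoked = true;
    return { ok: true, newToken: this.issue(record.userId, record.familyId) };
  }
}
```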

Trade-offs accepted:

  • Security is our responsibility: must implement password hashing (argon2), token rotation, and CSRF protection correctly. Mitigated by using battle-tested libraries (Passport.js, argon2, helmet).
  • More initial development time compared to integrating a managed auth service. Acceptable given the long-term benefits of full control over the Entity-Scope permission model.

3.4 Frontend — Micro Frontends on Cloudflare Workers

Micro frontend (MFE) architecture with React + Tailwind CSS, deployed on Cloudflare Workers/Pages for edge serving. An orchestrator shell handles shared concerns (auth, routing, layout). Individual MFEs map to domain modules.

Trade-offs accepted:

  • Higher initial complexity compared to a monolithic SPA. Mitigated by starting with 2-3 MFEs (shell + students + teachers) and expanding.
  • Shared state between MFEs requires careful design (event bus or shared store in the shell).
  • Cloudflare Workers run on a V8 isolate runtime, not Node.js, so MFEs are static React builds served from Workers rather than server-side rendered on Workers.

3.5 Storage — Cloudflare R2

Cloudflare R2 for all file storage (documents, photos, uploads). Zero egress fees, S3-compatible API (@aws-sdk/client-s3), and Cloudflare edge pairing. Large file uploads go directly to R2 via presigned URLs, bypassing the NestJS backend.

3.6 Queues & Background Jobs — BullMQ + Redis

BullMQ for job queues and scheduled tasks, backed by Redis. Redis serves a dual purpose: BullMQ backend and caching layer (permission caches, tenant configs).

Key use cases: Substitute teacher access revocation (scheduled), email/SMS dispatch, document processing, attendance report generation, admission workflow reminders.

Note: Redis and BullMQ are planned Phase 2 infrastructure — not yet deployed.

3.7 Deployment — Railway (Backend) + Cloudflare Workers (Frontend)

Railway (backend): Git-push deployment, managed PostgreSQL and Redis, preview environments per PR, EU region available, usage-based pricing.

Cloudflare Workers/Pages (frontend): Global edge serving from 300+ locations, instant deploys from Git, generous free tier, pairs with R2 and CDN.

Trade-offs accepted:

  • Two providers to manage instead of one. Acceptable because each excels at its job — Railway for server workloads, Cloudflare for edge/static content.
  • Railway is a smaller provider than AWS/GCP. The migration path to any Docker-compatible platform is straightforward (NestJS runs in a standard Docker container).

4. Data Access — Direct Prisma, No Repository Layer

Services inject PrismaService directly. No repository pattern. No domain model layer.

Current data access pattern:

Controller → Service → PrismaService → PostgreSQL
          toScopedResponse() → Response DTO

Services own business logic + data access. toScopedResponse() maps flat Prisma records to scope-grouped DTOs. FieldFilterInterceptor handles scope-level response filtering.

Tenant isolation is enforced by manually including tenantId in every Prisma where clause (RLS deferred to Phase 2 as a safety net — see §3.2).
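
A minimal sketch of the manual tenant filter, assuming a hypothetical StudentDelegate interface that stands in for the Prisma student delegate and an illustrative findByLastName method (neither is taken from the codebase). Typing tenantId as a required key of the where clause is one way to make a forgotten filter a compile-time error in the sketch.

```typescript
interface StudentRow {
  id: string;
  tenantId: string;
  lastName: string;
}

// Hypothetical slice of the Prisma delegate used by this sketch.
// tenantId is required in `where`, so omitting it does not type-check.
interface StudentDelegate {
  findMany(args: { where: { tenantId: string; lastName?: string } }): Promise<StudentRow[]>;
}

class StudentsService {
  private readonly students: StudentDelegate;

  constructor(students: StudentDelegate) {
    this.students = students;
  }

  // tenantId comes from the validated JWT payload, never from user input.
  findByLastName(tenantId: string, lastName: string): Promise<StudentRow[]> {
    return this.students.findMany({ where: { tenantId, lastName } });
  }
}
```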

Revisit triggers — introduce domain models and/or repositories per-module when:

  • Business rules don't map 1:1 to CRUD (e.g., enrollment workflow, grade promotion logic)
  • Cross-entity invariants appear (e.g., schedule conflict detection)
  • Aggregate roots coordinate multiple entities in a transaction
  • A module's query logic exceeds what fits cleanly in a service method

These should be adopted per bounded context, not as a codebase-wide mandate.

Trade-offs accepted:

  • Tenant filtering is repeated manually in every service method — accepted because RLS will be the long-term solution, and a utility helper introduces coupling for a temporary pattern.
  • Services mix business logic with data access — acceptable at current complexity level (5–10 methods per service). If a service grows beyond ~15 methods or contains complex orchestration, consider extracting a repository for that specific module.

5. Architecture Diagram

                            ┌─────────────────────────────────────────┐
                            │           CLOUDFLARE EDGE               │
                            │   CDN · WAF · DDoS Protection · DNS     │
                            └──────────────────┬──────────────────────┘
                    ┌──────────────────────────┼──────────────────────────┐
                    │                          │                          │
                    ▼                          ▼                          ▼
    ┌───────────────────────┐  ┌────────────────────────┐  ┌───────────────────────┐
    │   CLOUDFLARE WORKERS  │  │       RAILWAY          │  │   CLOUDFLARE R2       │
    │                       │  │                        │  │                       │
    │  ┌─────────────────┐  │  │  ┌──────────────────┐  │  │  Documents            │
    │  │ Orchestrator    │  │  │  │   NestJS API     │  │  │  Photos               │
    │  │ Shell (Auth,    │  │  │  │                  │  │  │  Uploads              │
    │  │ Routing, Layout)│  │  │  │  Passport.js JWT │  │  │                       │
    │  └────────┬────────┘  │  │  │  Prisma ORM      │  │  │  (S3-compatible)      │
    │           │           │  │  │  BullMQ Workers  │  │  │                       │
    │  ┌────────┴────────┐  │  │  └────────┬─────────┘  │  └───────────────────────┘
    │  │   MFE: Students │  │  │           │            │
    │  │   MFE: Teachers │  │  │  ┌────────┴─────────┐  │
    │  │   MFE: Attend.  │  │  │  │   PostgreSQL     │  │
    │  │   MFE: Admiss.  │  │  │  │   (Managed)      │  │
    │  │   MFE: Comms    │  │  │  │                  │  │
    │  │   ...           │  │  │  │  · tenant_id RLS │  │
    │  └─────────────────┘  │  │  │  · UUID PKs      │  │
    │                       │  │  │  · JSONB fields  │  │
    │  React + Tailwind     │  │  └──────────────────┘  │
    └───────────────────────┘  │                        │
                               │  ┌──────────────────┐  │
                               │  │   Redis          │  │
                               │  │   (Managed)      │  │
                               │  │                  │  │
                               │  │  · BullMQ queues │  │
                               │  │  · Permission    │  │
                               │  │    cache         │  │
                               │  │  · Session store │  │
                               │  └──────────────────┘  │
                               └────────────────────────┘

Note: Redis and BullMQ are planned Phase 2 infrastructure — not yet deployed. The current system uses request-scoped permission memoization and has no background job processing.

Request flow:

  1. User hits app.sis.example → Cloudflare DNS resolves to nearest edge
  2. Static MFE assets served from Workers (cached at edge)
  3. API calls go to api.sis.example → Cloudflare proxy → Railway backend
  4. NestJS extracts JWT from access_token cookie (or Authorization: Bearer header), validates it, extracts tenantId
  5. ScopeGuard checks route-level scope permissions; ActionGuard checks action permissions; services filter by tenantId in every query
  6. File uploads go directly to R2 via presigned URLs (bypass backend)
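
Step 4 of the request flow, sketched as a small function. The access_token cookie name and the Bearer fallback come from the text above; the IncomingAuthSource shape, the extractAccessToken name, and the choice to prefer the cookie when both are present are assumptions for illustration.

```typescript
// Hypothetical minimal view of an incoming request (parsed cookies and
// lower-cased header names, as Node.js HTTP servers provide them).
interface IncomingAuthSource {
  cookies?: Record<string, string | undefined>;
  headers: Record<string, string | undefined>;
}

function extractAccessToken(req: IncomingAuthSource): string | null {
  // Assumption: the HttpOnly cookie wins when both sources are present.
  const fromCookie = req.cookies?.["access_token"];
  if (fromCookie) return fromCookie;

  const header = req.headers["authorization"];
  if (header && header.startsWith("Bearer ")) {
    const token = header.slice("Bearer ".length).trim();
    return token.length > 0 ? token : null;
  }
  return null;
}
```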

6. Role-Permission Model (Summary)

Full specification: docs/rbac-strategy.md — schema, compilation flow, runtime enforcement, write protection, record-level access, caching, custom fields, and frontend patterns.

The permission model controls three orthogonal dimensions:

| Dimension | Question | Mechanism |
|---|---|---|
| Field access | What can a user see/edit? | Entity-Scope permissions (this section) |
| Action access | What operations can a user perform? | Action permissions: binary grants with scope requirements |
| Record access | Which records? | Tenant isolation (service-layer tenantId filtering). Per-user record filtering deferred. |

Core concept

Entity (e.g., "students")
  ├── Scope (e.g., "anagraphic")
  │    └── ScopeAccess: NONE | READ | WRITE (WRITE implies READ)
  │         └── covers fields: [firstName, lastName, dateOfBirth, ...]
  └── Action (e.g., "create")
       └── Binary grant (granted/not granted)
            └── scope requirements: [anagraphic→WRITE, sensitive→WRITE]

Scopes are meaningful business groupings — "anagraphic data", "sensitive data", "financial data" — that map to how schools think about data access. A school admin toggles ~10 scope permissions per role instead of managing 200+ individual field toggles.

Actions are operation-level permissions — "create", "delete", "export" — that are orthogonal to scope-level access. An action has scope requirements: the user needs both the action grant AND the required scope access for the action to be effective.
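
The rule that an action is effective only when it is granted and all of its scope requirements are met can be sketched as follows. The ScopeAccess values mirror the diagram above; ActionRule, meetsAccess, and isActionEffective are hypothetical names, and the real compiled-permission structures may look different.

```typescript
type ScopeAccess = "NONE" | "READ" | "WRITE";

// WRITE implies READ, so access levels can be compared by rank.
const ACCESS_RANK: Record<ScopeAccess, number> = { NONE: 0, READ: 1, WRITE: 2 };

function meetsAccess(granted: ScopeAccess, required: ScopeAccess): boolean {
  return ACCESS_RANK[granted] >= ACCESS_RANK[required];
}

// Hypothetical compiled form of one action on one entity.
interface ActionRule {
  granted: boolean;
  requires: Record<string, ScopeAccess>; // scope name -> minimum access
}

function isActionEffective(
  rule: ActionRule,
  userScopes: Partial<Record<string, ScopeAccess>>,
): boolean {
  if (!rule.granted) return false;
  // A scope the user has no entry for counts as NONE.
  return Object.keys(rule.requires).every((scope) =>
    meetsAccess(userScopes[scope] ?? "NONE", rule.requires[scope]),
  );
}
```

For the "create" example above, a user holding anagraphic WRITE but only sensitive READ would have the grant yet fail the scope requirements, so the action would not be effective.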

Request pipeline

Request
  → JwtAuthGuard              Authenticate, attach { userId, tenantId, roles }
  → ScopeGuard                Check @RequireScopes() entity-level access → 403 INSUFFICIENT_SCOPE if denied
  → ActionGuard               Check @RequireAction() metadata → 403 ACTION_NOT_PERMITTED if denied (opt-in)
  → FieldWriteGuard           Compare body scope keys against writable scopes → 403 FORBIDDEN_FIELDS
  → Controller → Service      Business logic, tenantId filtering in every query
  → FieldFilterInterceptor    Strip unauthorized scope groups from response (keeps id, createdAt, updatedAt)
  → Response

Platform admins (isPlatformAdmin) bypass ScopeGuard, ActionGuard, FieldWriteGuard, and FieldFilterInterceptor.
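
The response-side filtering step can be illustrated with a pure function over a scope-grouped DTO. This is a sketch of the idea only, not the FieldFilterInterceptor itself: filterScopedResponse and its parameters are hypothetical, and the real interceptor runs inside the NestJS response pipeline.

```typescript
// Keys that survive filtering regardless of scope permissions.
const ALWAYS_KEPT = ["id", "createdAt", "updatedAt"];

function filterScopedResponse(
  dto: Record<string, unknown>,
  readableScopes: string[],
  isPlatformAdmin: boolean,
): Record<string, unknown> {
  // Platform admins bypass response filtering entirely.
  if (isPlatformAdmin) return dto;

  const filtered: Record<string, unknown> = {};
  for (const key of Object.keys(dto)) {
    // Whole scope groups are kept or dropped; there is no per-field check.
    if (ALWAYS_KEPT.includes(key) || readableScopes.includes(key)) {
      filtered[key] = dto[key];
    }
  }
  return filtered;
}
```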

Key design points

  • Roles are per-tenant. Preset roles are seeded and immutable; school admins create custom roles via clone-and-modify.
  • Temporal assignments. user_roles.validFrom/validUntil support substitute teacher access windows — expired roles are excluded at query time.
  • Entity-level gate. @RequireScopes('students', 'write') checks if the user has ANY write scope on the entity — no scope enumeration needed. Real field-level enforcement is handled by FieldWriteGuard (writes) and FieldFilterInterceptor (reads). See rbac-strategy.md §3.7 for details.
  • Action permissions. @RequireAction('students', 'create') checks both the role-action grant AND the action's scope requirements. Opt-in per route — routes without the decorator skip the check. Effective action = granted AND all scope requirements satisfied.
  • Scope-grouped DTOs. API responses use scope-grouped shapes: { id, anagraphic: {...}, sensitive: {...}, createdAt, updatedAt }. Guards and interceptors operate at the scope-group level, not per-field. This aligns the API shape with the permission model.
  • Request-scoped memoization. Permissions are compiled once per request and cached on request.permissions. Redis caching is deferred to Phase 2.
  • Record-level access is currently tenant-wide (WHERE tenantId = ?). Per-user record filtering (teacher → their classes, parent → their children) is planned — see rbac-strategy.md §7.
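
The entity-level gate and the write-side scope check from the design points above can be sketched together. The function names and the EntityScopes shape are hypothetical, and treating every body key that is not a writable scope as forbidden is an assumption about how FieldWriteGuard behaves.

```typescript
type Access = "NONE" | "READ" | "WRITE";

// Hypothetical compiled permissions for one entity: scope name -> access.
type EntityScopes = Record<string, Access>;

// Entity-level gate: passes if ANY scope on the entity meets the level.
function hasEntityAccess(scopes: EntityScopes, level: "read" | "write"): boolean {
  return Object.values(scopes).some((access) =>
    level === "write" ? access === "WRITE" : access !== "NONE",
  );
}

// Write check: returns the body's scope keys the user may not write.
// A non-empty result corresponds to a 403 FORBIDDEN_FIELDS response.
function forbiddenScopeKeys(body: Record<string, unknown>, scopes: EntityScopes): string[] {
  return Object.keys(body).filter((key) => scopes[key] !== "WRITE");
}
```

The gate alone is deliberately coarse: a user with one writable scope passes it, and the second function is what rejects a body that touches a read-only scope group.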

7. Setup Wizard — Multi-Group State Machine

The tenant setup wizard drives first-time configuration. It uses a flat global state machine stored as a single setupStep enum on the Tenant model, combined with a group routing layer for frontend navigation.

State Machine

Linear progression through all steps:

SCHOOL → YEAR → DEPARTMENTS → GRADES → STUDENTS → TEACHERS → STAFF → CURRICULUM → TIMETABLE → PERMISSIONS → SERVICES → COMPLETE

Setup completion is derived from setupStep === COMPLETE (no separate timestamp).
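
A minimal sketch of the linear state machine, using a string-literal union in place of the Prisma enum. The step names come from the sequence above; SETUP_STEPS, nextStep, previousStep, and isSetupComplete are illustrative names and may not match the helpers in setup-steps.ts.

```typescript
const SETUP_STEPS = [
  "SCHOOL", "YEAR", "DEPARTMENTS", "GRADES", "STUDENTS", "TEACHERS",
  "STAFF", "CURRICULUM", "TIMETABLE", "PERMISSIONS", "SERVICES", "COMPLETE",
] as const;

type SetupStep = (typeof SETUP_STEPS)[number];

// Moves one step forward; COMPLETE is terminal and stays put.
function nextStep(step: SetupStep): SetupStep {
  const index = SETUP_STEPS.indexOf(step);
  return SETUP_STEPS[Math.min(index + 1, SETUP_STEPS.length - 1)];
}

// Moves one step back; SCHOOL is the first step and stays put.
function previousStep(step: SetupStep): SetupStep {
  const index = SETUP_STEPS.indexOf(step);
  return SETUP_STEPS[Math.max(index - 1, 0)];
}

// Completion is derived from the step itself, with no separate timestamp.
function isSetupComplete(step: SetupStep): boolean {
  return step === "COMPLETE";
}
```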

Groups

Steps are organized into logical groups for the frontend. Groups are constant ranges — the overview API maps each group to its steps:

| Group ID | Label | Required | Steps |
|---|---|---|---|
| school-identity | School Identity | Yes | SCHOOL, YEAR, DEPARTMENTS, GRADES |
| people-import | People Import | Yes | STUDENTS, TEACHERS, STAFF |
| teaching-schedule | Teaching & Schedule | No | CURRICULUM, TIMETABLE |
| permissions-services | Permissions & Services | No | PERMISSIONS, SERVICES |

API Endpoints

All endpoints require JWT authentication only (no scope/action guards — admin is the only user during setup).

| Method | Path | Description |
|---|---|---|
| GET | /configure/setup/overview | High-level overview of all groups with computed status (NOT_STARTED, IN_PROGRESS, DONE) |
| GET | /configure/setup/:groupId | Full wizard state (same response as before); :groupId validated but not used for filtering |
| POST | /configure/setup/:groupId | Submit step data / navigate; :groupId validated but state machine logic is global |

The :groupId param is validated via ParseGroupIdPipe (404 for unknown) but does not affect state machine logic. The frontend uses the overview to determine which group to route to. POST returns 200 (uses @HttpCode(200)).
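
One plausible way the overview could derive a group's status from the global setupStep is sketched below. The group registry matches the table above, but the derivation rule is an assumption (a group is DONE once the pointer has moved past its last step, IN_PROGRESS while the pointer is inside it), and groupStatus / buildOverview are hypothetical names; the real getOverview() may compute status differently.

```typescript
type GroupStatus = "NOT_STARTED" | "IN_PROGRESS" | "DONE";

const STEP_ORDER: string[] = [
  "SCHOOL", "YEAR", "DEPARTMENTS", "GRADES", "STUDENTS", "TEACHERS",
  "STAFF", "CURRICULUM", "TIMETABLE", "PERMISSIONS", "SERVICES", "COMPLETE",
];

const SETUP_GROUPS = [
  { id: "school-identity", label: "School Identity", required: true, steps: ["SCHOOL", "YEAR", "DEPARTMENTS", "GRADES"] },
  { id: "people-import", label: "People Import", required: true, steps: ["STUDENTS", "TEACHERS", "STAFF"] },
  { id: "teaching-schedule", label: "Teaching & Schedule", required: false, steps: ["CURRICULUM", "TIMETABLE"] },
  { id: "permissions-services", label: "Permissions & Services", required: false, steps: ["PERMISSIONS", "SERVICES"] },
];

// Assumed rule: compare the global step pointer with the group's step range.
function groupStatus(groupSteps: string[], currentStep: string): GroupStatus {
  const current = STEP_ORDER.indexOf(currentStep);
  const first = STEP_ORDER.indexOf(groupSteps[0]);
  const last = STEP_ORDER.indexOf(groupSteps[groupSteps.length - 1]);
  if (current > last) return "DONE";
  if (current >= first) return "IN_PROGRESS";
  return "NOT_STARTED";
}

function buildOverview(currentStep: string) {
  return SETUP_GROUPS.map((group) => ({
    id: group.id,
    label: group.label,
    required: group.required,
    status: groupStatus(group.steps, currentStep),
  }));
}
```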

Step Handlers

Steps with data have dedicated handlers that implement load(), save(), and isComplete(). Steps without handlers (COMPLETE, TEACHERS, STAFF, CURRICULUM, TIMETABLE, PERMISSIONS, SERVICES) return null data and advance freely. Handlers for new steps will be added as those features are implemented.

Step Completion Enforcement

Forward navigation requires the current step's handler to confirm completion via isComplete(). Each handler defines its own completion criteria (e.g., school record exists, academic year exists, all departments have grades, at least one student imported). The check runs after save() but before the step pointer advances — so data steps that save form data are automatically validated, while data-less steps (like STUDENTS, which imports via a separate endpoint) are still gated. Steps without handlers pass through freely. Same-step saves (drafts) and back-navigation do not check completion.
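
The ordering described above (save first, then gate forward navigation on isComplete(), with drafts and back-navigation exempt) can be sketched as follows. The handler interface uses the three method names from the text, but it is simplified to synchronous calls; submitStep, Direction, and the thrown error are hypothetical.

```typescript
interface StepHandler {
  load(): unknown;
  save(data: unknown): void;
  isComplete(): boolean;
}

type Direction = "forward" | "back" | "stay";

// Returns the step the pointer should hold after the submission.
function submitStep(
  steps: readonly string[],
  current: string,
  direction: Direction,
  data: unknown,
  handlers: Record<string, StepHandler | undefined>,
): string {
  const handler = handlers[current];
  // Save runs before any completion check, so fresh form data counts.
  if (handler && data !== undefined) handler.save(data);

  const index = steps.indexOf(current);
  if (direction === "stay") return current; // draft save, no check
  if (direction === "back") return steps[Math.max(index - 1, 0)]; // no check

  // Forward: steps without a handler pass freely; others must be complete.
  if (handler && !handler.isComplete()) {
    throw new Error(`Step ${current} is not complete`);
  }
  return steps[Math.min(index + 1, steps.length - 1)];
}
```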

YEAR Step — Academic Year + Periods

The YEAR step uses named top-level keys: { academicYear, terms?, closingPeriods?, extraPeriods? }. Periods are optional arrays typed by period category. Each period type maps to the Period model's type field (TERM, CLOSING, EXTRA).

Validations:

  • Academic year: endDate > startDate
  • Each period: endDate > startDate, dates within academic year bounds
  • Same-type overlap: periods of the same type cannot overlap (CLOSING can overlap with TERM)
  • No duplicate period names across all types within the academic year
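
The YEAR validations could be expressed roughly as below. This is a sketch under assumptions: dates are ISO YYYY-MM-DD strings (which compare correctly as plain strings), periods that merely touch at an endpoint are not treated as overlapping, name comparison is exact, and PeriodInput / validatePeriods are hypothetical names rather than the handler's real types.

```typescript
type PeriodType = "TERM" | "CLOSING" | "EXTRA";

interface PeriodInput {
  name: string;
  type: PeriodType;
  startDate: string; // ISO date, e.g. "2025-09-01"
  endDate: string;
}

function validatePeriods(
  year: { startDate: string; endDate: string },
  periods: PeriodInput[],
): string[] {
  const errors: string[] = [];
  if (year.endDate <= year.startDate) {
    errors.push("academicYear: endDate must be after startDate");
  }

  const seenNames = new Set<string>();
  periods.forEach((p, i) => {
    if (p.endDate <= p.startDate) errors.push(`${p.name}: endDate must be after startDate`);
    if (p.startDate < year.startDate || p.endDate > year.endDate) {
      errors.push(`${p.name}: dates fall outside the academic year`);
    }
    // Names must be unique across all period types.
    if (seenNames.has(p.name)) errors.push(`${p.name}: duplicate period name`);
    seenNames.add(p.name);

    // Only periods of the same type are checked against each other.
    for (const q of periods.slice(i + 1)) {
      if (q.type === p.type && p.startDate < q.endDate && q.startDate < p.endDate) {
        errors.push(`${p.name} overlaps ${q.name}`);
      }
    }
  });
  return errors;
}
```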

DEPARTMENTS Step — Business Rules

Validations:

  • Ordinal positions must be sequential starting from 1
  • Department names must be unique (case-insensitive) within the academic year
  • Prisma P2002 unique-constraint violations are also caught, as a belt-and-suspenders safeguard behind the application-level check
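
The two application-level checks can be sketched as a single validator. DepartmentInput and validateDepartments are hypothetical names, and the sketch assumes that the order of entries in the payload does not matter as long as the ordinal values form the sequence 1..n.

```typescript
interface DepartmentInput {
  name: string;
  ordinalPosition: number;
}

function validateDepartments(departments: DepartmentInput[]): string[] {
  const errors: string[] = [];

  // Sorted ordinals must be exactly 1, 2, ..., n (no gaps, no repeats).
  const ordinals = departments.map((d) => d.ordinalPosition).sort((a, b) => a - b);
  if (!ordinals.every((ordinal, index) => ordinal === index + 1)) {
    errors.push("ordinalPosition values must be sequential starting from 1");
  }

  // Names are compared case-insensitively.
  const seen = new Set<string>();
  for (const d of departments) {
    const key = d.name.toLowerCase();
    if (seen.has(key)) errors.push(`duplicate department name: ${d.name}`);
    seen.add(key);
  }
  return errors;
}
```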

GRADES Step — Nested Department Structure

The GRADES step uses a nested department structure: { departments: [{ id, grades: [...] }] }. Each department entry includes its UUID and a grades array. The GET response includes department metadata (name, ordinalPosition) alongside any persisted grades.

Validations:

  • All tenant departments must be represented (every department needs at least one grade)
  • Per-department: ordinal positions sequential from 1
  • Per-department: grade names unique (case-insensitive)
  • All department IDs must belong to the tenant's academic year

Key Implementation Files

  • Constants: src/setup/constants/setup-steps.ts (enum + navigation helpers), src/setup/constants/setup-groups.ts (group registry)
  • Service: src/setup/setup.service.ts (state machine orchestration, getOverview())
  • Pipe: src/setup/pipes/parse-group-id.pipe.ts (validates :groupId param)
  • Controller: src/setup/setup.controller.ts (3 routes)
  • Step handlers: src/setup/step-handlers/ (per-step load/save logic)
  • DTOs: src/setup/dto/ (request/response shapes)