SIS Architecture

Version: 0.2 — March 2026

1. Overview & Philosophy

SIS is a multi-tenant Student Information System for multi-level schools (kindergarten through high school). The architecture prioritizes developer velocity for a small team (2–4 devs) shipping a beta in 6 months, while preserving a clear migration path toward scale.

Architecture philosophy:

Monolith-first: Ship fast, split later. NestJS modules map 1:1 to future services.
Single language: TypeScript end-to-end reduces context-switching and maximizes code sharing.
Managed infrastructure: Zero DevOps overhead during MVP. Railway and Cloudflare handle operations.
Defense in depth: Service-layer tenant filtering at the API, scope-based guards for field access, action-level guards for operation access, scope-level filtering at the response layer. PostgreSQL RLS is the target architecture for an additional database-level safety net (not yet deployed — see §3.2).
Multitenancy from day one: Every table carries tenant_id; service-layer filtering enforces isolation. RLS policies are designed as a future safety net (no schema-per-tenant for now).

2. Decision Summary

Area	Decision	Key Reason
Backend	NestJS monolith + Prisma + PostgreSQL	Fastest dev speed, single language with frontend, modular
Multitenancy	`tenant_id` column + service-layer filtering (PostgreSQL RLS planned for Phase 2)	Simpler ops than schema-per-tenant, single migration path
Authentication	Custom with Passport.js + JWT	Full control over Entity-Scope permission model, no per-user cost
Frontend	Micro frontends with React + Tailwind on Cloudflare Workers	Edge-served, independent deployments, global low latency
Storage	Cloudflare R2	S3-compatible, zero egress fees, pairs with Cloudflare edge
Queues/Jobs	BullMQ + Redis	Mature Node.js queue, cron/retry/priority support, Redis reusable for caching
Deployment	Railway (backend) + Cloudflare Workers (frontend)	Simplest PaaS, managed PostgreSQL, preview environments

For detailed technology comparisons and decision rationale, see docs/stack-analisys.md.

3. System Components

3.1 Backend — NestJS Monolith + Prisma + PostgreSQL

NestJS 11 with Prisma ORM on PostgreSQL. Modules (StudentsModule, TeachersModule, AttendanceModule) map directly to domain boundaries, so each can later be extracted into a microservice with limited rewriting.

Guards and custom decorators (@RequireScopes(), @RequireAction()) integrate naturally with the Entity-Scope-Action permission model. Prisma generates a type-safe database client from the schema and ships its own migration tooling (Prisma Migrate).

Trade-offs accepted:

Prisma's query API has limitations with complex queries; mitigated by dropping to raw SQL ($queryRaw and $executeRaw) for RLS policy setup and complex reporting queries.
Node.js memory management needs attention for large data exports — mitigated by streaming responses and offloading to BullMQ jobs.
Multitenancy is not built-in — implemented via service-layer tenantId filtering (RLS deferred to Phase 2).

3.2 Multitenancy — `tenant_id` + Row-Level Security

Current status: Only service-layer tenantId filtering is active. Every service method filters by tenantId in its where clause. No PostgreSQL RLS policies exist in the database yet. RLS is planned as a Phase 2 defense-in-depth safety net. The SQL patterns below show the target architecture, not current state.

Single shared schema with a tenant_id UUID column on every tenant-scoped table. In the target architecture, isolation is additionally enforced by PostgreSQL Row-Level Security (RLS) policies.

RLS setup pattern (target):

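-- One-time migration per tenant-scoped table.
ALTER TABLE students ENABLE ROW LEVEL SECURITY;

-- Rows are visible only when tenant_id matches the session setting.
CREATE POLICY tenant_isolation ON students
  USING (tenant_id = current_setting('app.current_tenant_id')::uuid);

-- Per request, inside a transaction (SET LOCAL has no effect outside one).
SET LOCAL app.current_tenant_id = '<tenant-uuid>';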

Tenant resolution at runtime:

The login flow uses a password-first two-step approach that eliminates Host header dependency:

POST /api/v1/auth/login { email, password }

Find all active User records matching email across all active tenants
Verify password against each with argon2.verify (in parallel)
Collect matches (user + tenant pairs where password is valid)

Matches	Response
0	`401` — "Invalid credentials" (generic, no enumeration)
1	Normal login: cookies set, return `{ user }`
2+	`200` with `{ requiresTenantSelection, tenants[], selectionToken }`

If multi-match, the frontend shows a tenant picker and completes login:

POST /api/v1/auth/login/select-tenant { selectionToken, tenantId }
→ cookies set, return { user }

The selectionToken is a short-lived (60s) JWT containing { sub: "tenant-selection", matchedUserIds: [...] }. The tenant list is only returned after password verification, so it does not leak email-tenant mappings.
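The three outcomes in the table above can be sketched as a pure function over the matches that have already passed password verification. This is a minimal illustration under stated assumptions, not the project's code: `Match`, `LoginResult`, `resolveLogin`, and the injected `signSelectionToken` callback are hypothetical names, and argon2 verification, JWT signing, and cookie handling are assumed to happen outside the function.

```typescript
// Hypothetical shapes for illustration; the real service types may differ.
interface Match {
  userId: string;
  tenantId: string;
  tenantName: string;
}

type LoginResult =
  | { status: 401; message: string }
  | { status: 200; kind: "login"; userId: string; tenantId: string }
  | {
      status: 200;
      kind: "select-tenant";
      requiresTenantSelection: true;
      tenants: { id: string; name: string }[];
      selectionToken: string;
    };

// `matches` holds only the user/tenant pairs whose password verified.
// `signSelectionToken` stands in for signing the short-lived (60s) JWT.
function resolveLogin(
  matches: Match[],
  signSelectionToken: (matchedUserIds: string[]) => string,
): LoginResult {
  const [first, ...rest] = matches;
  if (first === undefined) {
    // Generic message: never reveal whether the email exists.
    return { status: 401, message: "Invalid credentials" };
  }
  if (rest.length === 0) {
    // Exactly one match: normal login for that tenant.
    return { status: 200, kind: "login", userId: first.userId, tenantId: first.tenantId };
  }
  // Two or more matches: the tenant list is safe to return because
  // every entry has already passed password verification.
  return {
    status: 200,
    kind: "select-tenant",
    requiresTenantSelection: true,
    tenants: matches.map((m) => ({ id: m.tenantId, name: m.tenantName })),
    selectionToken: signSelectionToken(matches.map((m) => m.userId)),
  };
}
```

Modelling the result as a discriminated union lets the controller switch on `status` and `kind` without optional fields.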

After authentication, tenantId is embedded in the signed JWT — all subsequent requests use the JWT payload, not the Host header. Tenant status (ACTIVE/TRIAL) is validated at login and re-validated on every token refresh.

Reverse proxy (trust proxy): Configured in main.ts via app.set('trust proxy', 1) so that req.ip returns the client's real IP behind Railway/Cloudflare. Required for accurate rate limiting and IP logging.

Trade-offs accepted:

Weaker data isolation than schema-per-tenant — acceptable for an EdTech SaaS where tenants are schools (not competing businesses with adversarial threat models).
Must be disciplined: every new table needs tenant_id and an RLS policy. Code review checklist item.
Once RLS is deployed, cross-tenant analytics queries will need to bypass it (a superuser, or SET ROLE to a role with the BYPASSRLS attribute). This is acceptable and handled by an admin-only reporting service.

3.3 Authentication — Custom with Passport.js + JWT

Self-hosted authentication using Passport.js strategies with JWT access tokens and refresh tokens, stored in the application database.

Auth flow:

Login → find all active users matching email across active tenants
     → verify password against each with argon2 (in parallel)
     → 0 matches: 401 "Invalid credentials" (dummy argon2 for timing consistency)
     → 1 match:   auto-login → issue JWT + refresh token → set HttpOnly cookies
     → 2+ matches: return { requiresTenantSelection, tenants[], selectionToken (60s JWT) }

Select-Tenant → verify selectionToken → find user matching tenantId from pre-validated list
     → issue JWT + refresh token → set HttpOnly cookies

JWT payload: { sub: userId, tenantId, roles: [...], isPlatformAdmin }
On each request: JwtStrategy extracts token from cookie (or Bearer header),
  validates token, ScopeGuard checks permissions (platform admins bypass)
Rate limiting: 10 req/60s global, 5 req/60s on login + select-tenant (@nestjs/throttler)

Refresh → validate refresh token (SHA-256 hash lookup)
       → replay detection: if token already revoked → revoke entire family
       → validate tenant still ACTIVE/TRIAL, user still isActive
       → log IP change warnings → rotate token (revoke old, issue new in same family)
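The rotation and replay-detection steps of the refresh flow can be sketched with an in-memory store. This is an illustration only: `RefreshRecord`, `RefreshStore`, and `rotateRefreshToken` are hypothetical names, the store stands in for the database table, and the caller is assumed to pass SHA-256 hashes that it computed itself. The tenant status, user isActive, and IP-change checks are omitted.

```typescript
interface RefreshRecord {
  familyId: string;
  revoked: boolean;
}

// Keyed by the hash of the opaque refresh token (hashing done by the caller).
type RefreshStore = Record<string, RefreshRecord>;

type RefreshOutcome =
  | { ok: true; familyId: string }
  | { ok: false; reason: "unknown-token" | "replay-detected" };

function rotateRefreshToken(
  store: RefreshStore,
  presentedHash: string,
  newHash: string,
): RefreshOutcome {
  const current = store[presentedHash];
  if (current === undefined) {
    return { ok: false, reason: "unknown-token" };
  }

  if (current.revoked) {
    // A revoked token was presented again: treat it as theft and
    // revoke every token that belongs to the same family.
    for (const hash of Object.keys(store)) {
      const record = store[hash];
      if (record !== undefined && record.familyId === current.familyId) {
        record.revoked = true;
      }
    }
    return { ok: false, reason: "replay-detected" };
  }

  // Normal rotation: revoke the old token, issue a new one in the same family.
  current.revoked = true;
  store[newHash] = { familyId: current.familyId, revoked: false };
  return { ok: true, familyId: current.familyId };
}
```

Keeping a family identifier on every token is what makes the replay response possible: a single reused token invalidates the whole chain, including the token the attacker or the legitimate user currently holds.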

Trade-offs accepted:

Security is our responsibility: must implement password hashing (argon2), token rotation, and CSRF protection correctly. Mitigated by using battle-tested libraries (Passport.js, argon2, helmet).
More initial development time compared to integrating a managed auth service. Acceptable given the long-term benefits of full control over the Entity-Scope permission model.

3.4 Frontend — Micro Frontends on Cloudflare Workers

Micro frontend (MFE) architecture with React + Tailwind CSS, deployed on Cloudflare Workers/Pages for edge serving. An orchestrator shell handles shared concerns (auth, routing, layout). Individual MFEs map to domain modules.

Trade-offs accepted:

Higher initial complexity compared to a monolithic SPA. Mitigated by starting with 2-3 MFEs (shell + students + teachers) and expanding.
Shared state between MFEs requires careful design (event bus or shared store in the shell).
Cloudflare Workers have a V8 runtime (not Node.js) — MFEs are static React builds served from Workers, not SSR on Workers.

3.5 Storage — Cloudflare R2

Cloudflare R2 for all file storage (documents, photos, uploads). Zero egress fees, S3-compatible API (@aws-sdk/client-s3), and Cloudflare edge pairing. Large file uploads go directly to R2 via presigned URLs, bypassing the NestJS backend.

3.6 Queues & Background Jobs — BullMQ + Redis

BullMQ for job queues and scheduled tasks, backed by Redis. Redis serves dual purpose: BullMQ backend and caching layer (permission caches, tenant configs).

Key use cases: Substitute teacher access revocation (scheduled), email/SMS dispatch, document processing, attendance report generation, admission workflow reminders.

Note: Redis and BullMQ are planned Phase 2 infrastructure — not yet deployed.

3.7 Deployment — Railway (Backend) + Cloudflare Workers (Frontend)

Railway (backend): Git-push deployment, managed PostgreSQL and Redis, preview environments per PR, EU region available, usage-based pricing.

Cloudflare Workers/Pages (frontend): Global edge serving from 300+ locations, instant deploys from Git, generous free tier, pairs with R2 and CDN.

Trade-offs accepted:

Two providers to manage instead of one. Acceptable because each excels at its job — Railway for server workloads, Cloudflare for edge/static content.
Railway is a smaller provider than AWS/GCP. The migration path to any Docker-compatible platform is straightforward (NestJS runs in a standard Docker container).

4. Data Access — Direct Prisma, No Repository Layer

Services inject PrismaService directly. No repository pattern. No domain model layer.

Current data access pattern:

Controller → Service → PrismaService → PostgreSQL
                ↓
          toScopedResponse() → Response DTO

Services own business logic + data access. toScopedResponse() maps flat Prisma records to scope-grouped DTOs. FieldFilterInterceptor handles scope-level response filtering.

Tenant isolation is enforced by manually including tenantId in every Prisma where clause (RLS deferred to Phase 2 as a safety net — see §3.2).
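A minimal sketch of the pattern, assuming a flat student record and two scopes. The in-memory `rows` array stands in for a Prisma call filtered by tenantId; `StudentRecord`, `ScopedStudent`, `findStudentsForTenant`, and the `medicalNotes` field are hypothetical and chosen only for illustration.

```typescript
// Hypothetical flat record, shaped like a row returned by Prisma.
interface StudentRecord {
  id: string;
  tenantId: string;
  firstName: string;
  lastName: string;
  dateOfBirth: string;
  medicalNotes: string | null; // assumed "sensitive" field
  createdAt: string;
  updatedAt: string;
}

// Scope-grouped response shape: one nested object per scope.
interface ScopedStudent {
  id: string;
  anagraphic: { firstName: string; lastName: string; dateOfBirth: string };
  sensitive: { medicalNotes: string | null };
  createdAt: string;
  updatedAt: string;
}

// Maps a flat record to the scope-grouped DTO; tenantId is not exposed.
function toScopedResponse(record: StudentRecord): ScopedStudent {
  return {
    id: record.id,
    anagraphic: {
      firstName: record.firstName,
      lastName: record.lastName,
      dateOfBirth: record.dateOfBirth,
    },
    sensitive: { medicalNotes: record.medicalNotes },
    createdAt: record.createdAt,
    updatedAt: record.updatedAt,
  };
}

// Service-method stand-in: the tenantId filter is written out explicitly,
// mirroring the manual `where: { tenantId }` clause in each service method.
function findStudentsForTenant(rows: StudentRecord[], tenantId: string): ScopedStudent[] {
  return rows.filter((row) => row.tenantId === tenantId).map(toScopedResponse);
}
```

Because the DTO is already grouped by scope, the response filter can later drop whole groups without inspecting individual fields.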

Revisit triggers — introduce domain models and/or repositories per-module when:

Business rules don't map 1:1 to CRUD (e.g., enrollment workflow, grade promotion logic)
Cross-entity invariants appear (e.g., schedule conflict detection)
Aggregate roots coordinate multiple entities in a transaction
A module's query logic exceeds what fits cleanly in a service method

These should be adopted per bounded context, not as a codebase-wide mandate.

Trade-offs accepted:

Tenant filtering is repeated manually in every service method — accepted because RLS will be the long-term solution, and a utility helper introduces coupling for a temporary pattern.
Services mix business logic with data access — acceptable at current complexity level (5–10 methods per service). If a service grows beyond ~15 methods or contains complex orchestration, consider extracting a repository for that specific module.

5. Architecture Diagram

                            ┌─────────────────────────────────────────┐
                            │           CLOUDFLARE EDGE               │
                            │   CDN · WAF · DDoS Protection · DNS     │
                            └──────────────────┬──────────────────────┘
                                               │
                    ┌──────────────────────────┼──────────────────────────┐
                    │                          │                          │
                    ▼                          ▼                          ▼
    ┌───────────────────────┐  ┌───────────────────────┐  ┌───────────────────────┐
    │   CLOUDFLARE WORKERS  │  │       RAILWAY          │  │   CLOUDFLARE R2       │
    │                       │  │                        │  │                       │
    │  ┌─────────────────┐  │  │  ┌──────────────────┐  │  │  Documents            │
    │  │ Orchestrator    │  │  │  │   NestJS API     │  │  │  Photos               │
    │  │ Shell (Auth,    │  │  │  │                  │  │  │  Uploads               │
    │  │ Routing, Layout)│  │  │  │  Passport.js JWT │  │  │                       │
    │  └────────┬────────┘  │  │  │  Prisma ORM      │  │  │  (S3-compatible)      │
    │           │           │  │  │  BullMQ Workers   │  │  │                       │
    │  ┌────────┴────────┐  │  │  └────────┬─────────┘  │  └───────────────────────┘
    │  │   MFE: Students │  │  │           │            │
    │  │   MFE: Teachers │  │  │  ┌────────┴─────────┐  │
    │  │   MFE: Attend.  │  │  │  │   PostgreSQL     │  │
    │  │   MFE: Admiss.  │  │  │  │   (Managed)      │  │
    │  │   MFE: Comms    │  │  │  │                  │  │
    │  │   ...           │  │  │  │  · tenant_id RLS │  │
    │  └─────────────────┘  │  │  │  · UUID PKs      │  │
    │                       │  │  │  · JSONB fields   │  │
    │  React + Tailwind     │  │  └──────────────────┘  │
    └───────────────────────┘  │                        │
                               │  ┌──────────────────┐  │
                               │  │   Redis          │  │
                               │  │   (Managed)      │  │
                               │  │                  │  │
                               │  │  · BullMQ queues │  │
                               │  │  · Permission    │  │
                               │  │    cache         │  │
                               │  │  · Session store │  │
                               │  └──────────────────┘  │
                               └────────────────────────┘

Note: Redis and BullMQ are planned Phase 2 infrastructure — not yet deployed. The current system uses request-scoped permission memoization and has no background job processing.

Request flow:

User hits app.sis.example → Cloudflare DNS resolves to nearest edge
Static MFE assets served from Workers (cached at edge)
API calls go to api.sis.example → Cloudflare proxy → Railway backend
NestJS extracts JWT from access_token cookie (or Authorization: Bearer header), validates it, extracts tenantId
ScopeGuard checks route-level scope permissions; ActionGuard checks action permissions; services filter by tenantId in every query
File uploads go directly to R2 via presigned URLs (bypass backend)

6. Role-Permission Model (Summary)

Full specification: docs/rbac-strategy.md — schema, compilation flow, runtime enforcement, write protection, record-level access, caching, custom fields, and frontend patterns.

The permission model controls three orthogonal dimensions:

Dimension	Question	Mechanism
Field access	What can a user see/edit?	Entity-Scope permissions (this section)
Action access	What operations can a user perform?	Action permissions — binary grants with scope requirements
Record access	Which records?	Tenant isolation (service-layer `tenantId` filtering). Per-user record filtering deferred.

Core concept

Entity (e.g., "students")
  ├── Scope (e.g., "anagraphic")
  │    └── ScopeAccess: NONE | READ | WRITE (WRITE implies READ)
  │         └── covers fields: [firstName, lastName, dateOfBirth, ...]
  └── Action (e.g., "create")
       └── Binary grant (granted/not granted)
            └── scope requirements: [anagraphic→WRITE, sensitive→WRITE]

Scopes are meaningful business groupings — "anagraphic data", "sensitive data", "financial data" — that map to how schools think about data access. A school admin toggles ~10 scope permissions per role instead of managing 200+ individual field toggles.

Actions are operation-level permissions — "create", "delete", "export" — that are orthogonal to scope-level access. An action has scope requirements: the user needs both the action grant AND the required scope access for the action to be effective.
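The "grant AND scope requirements" rule can be sketched as follows. The type and function names (`ActionDefinition`, `CompiledPermissions`, `isActionEffective`) are hypothetical; only the NONE/READ/WRITE levels and the rule itself come from the model described above.

```typescript
type ScopeAccess = "NONE" | "READ" | "WRITE";

// WRITE implies READ, so access levels can be compared by rank.
const ACCESS_RANK: Record<ScopeAccess, number> = { NONE: 0, READ: 1, WRITE: 2 };

interface ActionDefinition {
  name: string;
  // Scope name -> minimum access the action requires on that scope.
  scopeRequirements: Record<string, ScopeAccess>;
}

interface CompiledPermissions {
  scopes: Record<string, ScopeAccess>;
  grantedActions: string[];
}

function isActionEffective(
  permissions: CompiledPermissions,
  action: ActionDefinition,
): boolean {
  // 1. The binary grant must be present.
  const granted = permissions.grantedActions.some((name) => name === action.name);
  if (!granted) return false;

  // 2. Every scope requirement must be met; a missing scope counts as NONE.
  return Object.keys(action.scopeRequirements).every((scope) => {
    const required = action.scopeRequirements[scope] ?? "NONE";
    const actual = permissions.scopes[scope] ?? "NONE";
    return ACCESS_RANK[actual] >= ACCESS_RANK[required];
  });
}
```

For example, a role that is granted "create" but holds only READ on the sensitive scope would not be able to create students if the action requires WRITE on that scope.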

Request pipeline

Request
  → JwtAuthGuard              Authenticate, attach { userId, tenantId, roles }
  → ScopeGuard                Check @RequireScopes() entity-level access → 403 INSUFFICIENT_SCOPE if denied
  → ActionGuard               Check @RequireAction() metadata → 403 ACTION_NOT_PERMITTED if denied (opt-in)
  → FieldWriteGuard           Compare body scope keys against writable scopes → 403 FORBIDDEN_FIELDS
  → Controller → Service      Business logic, tenantId filtering in every query
  → FieldFilterInterceptor    Strip unauthorized scope groups from response (keeps id, createdAt, updatedAt)
  → Response

Platform admins (isPlatformAdmin) bypass ScopeGuard, ActionGuard, FieldWriteGuard, and FieldFilterInterceptor.
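The two scope-group checks in the pipeline reduce to simple key comparisons. The sketch below is not the actual guard or interceptor: `filterScopedResponse` and `findForbiddenScopes` are hypothetical helper names, and the NestJS wiring and the platform-admin bypass are left out.

```typescript
// Keys that are always returned, regardless of scope permissions.
const ALWAYS_KEPT = ["id", "createdAt", "updatedAt"];

// Read side: keep the always-visible keys plus the scope groups the user can read.
function filterScopedResponse(
  dto: Record<string, unknown>,
  readableScopes: string[],
): Record<string, unknown> {
  const filtered: Record<string, unknown> = {};
  for (const key of Object.keys(dto)) {
    if (ALWAYS_KEPT.indexOf(key) !== -1 || readableScopes.indexOf(key) !== -1) {
      filtered[key] = dto[key];
    }
  }
  return filtered;
}

// Write side: list the scope keys in the request body that the user may not write.
// A non-empty result would translate to 403 FORBIDDEN_FIELDS.
function findForbiddenScopes(
  body: Record<string, unknown>,
  writableScopes: string[],
): string[] {
  return Object.keys(body).filter((key) => writableScopes.indexOf(key) === -1);
}
```

Both helpers operate on whole scope groups, which is why the response and request shapes are grouped by scope rather than flat.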

Key design points

Roles are per-tenant. Preset roles are seeded and immutable; school admins create custom roles via clone-and-modify.
Temporal assignments. user_roles.validFrom/validUntil support substitute teacher access windows — expired roles are excluded at query time.
Entity-level gate. @RequireScopes('students', 'write') checks if the user has ANY write scope on the entity — no scope enumeration needed. Real field-level enforcement is handled by FieldWriteGuard (writes) and FieldFilterInterceptor (reads). See rbac-strategy.md §3.7 for details.
Action permissions. @RequireAction('students', 'create') checks both the role-action grant AND the action's scope requirements. Opt-in per route — routes without the decorator skip the check. Effective action = granted AND all scope requirements satisfied.
Scope-grouped DTOs. API responses use scope-grouped shapes: { id, anagraphic: {...}, sensitive: {...}, createdAt, updatedAt }. Guards and interceptors operate at the scope-group level, not per-field. This aligns the API shape with the permission model.
Request-scoped memoization. Permissions are compiled once per request and cached on request.permissions. Redis caching is deferred to Phase 2.
Record-level access is currently tenant-wide (WHERE tenantId = ?). Per-user record filtering (teacher → their classes, parent → their children) is planned — see rbac-strategy.md §7.
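The temporal assignment rule from the list above can be illustrated with an in-memory filter. The document states that expired roles are excluded at query time, so the real check presumably lives in the database query; this sketch only shows the equivalent predicate. `UserRoleAssignment` and `activeRoleIds` are hypothetical names, and two details are assumptions: a null bound means "open-ended", and validUntil is treated as exclusive.

```typescript
interface UserRoleAssignment {
  roleId: string;
  validFrom: Date | null; // null: valid from the beginning (assumption)
  validUntil: Date | null; // null: no expiry (assumption)
}

// Returns the role IDs whose validity window contains `now`.
function activeRoleIds(assignments: UserRoleAssignment[], now: Date): string[] {
  const t = now.getTime();
  return assignments
    .filter((a) => {
      const started = a.validFrom === null || a.validFrom.getTime() <= t;
      const notExpired = a.validUntil === null || t < a.validUntil.getTime();
      return started && notExpired;
    })
    .map((a) => a.roleId);
}
```

A substitute teacher's role with a one-week window simply stops contributing permissions once the window closes, with no cleanup job required for correctness.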

7. Setup Wizard — Multi-Group State Machine

The tenant setup wizard drives first-time configuration. It uses a flat global state machine stored as a single setupStep enum on the Tenant model, combined with a group routing layer for frontend navigation.

State Machine

Linear progression through all steps:

SCHOOL → YEAR → DEPARTMENTS → GRADES → STUDENTS → TEACHERS → STAFF → CURRICULUM → TIMETABLE → PERMISSIONS → SERVICES → COMPLETE

Setup completion is derived from setupStep === COMPLETE (no separate timestamp).
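A linear state machine of this kind can be modelled as an ordered constant with clamped navigation helpers. The step names come from the progression above; the helper names (`nextStep`, `previousStep`, `isSetupComplete`) are hypothetical and may not match those in setup-steps.ts.

```typescript
// Ordered list of steps; the array order is the state machine.
const SETUP_STEPS = [
  "SCHOOL",
  "YEAR",
  "DEPARTMENTS",
  "GRADES",
  "STUDENTS",
  "TEACHERS",
  "STAFF",
  "CURRICULUM",
  "TIMETABLE",
  "PERMISSIONS",
  "SERVICES",
  "COMPLETE",
] as const;

type SetupStep = (typeof SETUP_STEPS)[number];

// Moves one step forward, staying at COMPLETE once it is reached.
function nextStep(step: SetupStep): SetupStep {
  const index = SETUP_STEPS.indexOf(step);
  return SETUP_STEPS[Math.min(index + 1, SETUP_STEPS.length - 1)] ?? "COMPLETE";
}

// Moves one step back, staying at SCHOOL at the start.
function previousStep(step: SetupStep): SetupStep {
  const index = SETUP_STEPS.indexOf(step);
  return SETUP_STEPS[Math.max(index - 1, 0)] ?? "SCHOOL";
}

// Completion is derived from the step itself, with no separate timestamp.
function isSetupComplete(step: SetupStep): boolean {
  return step === "COMPLETE";
}
```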

Groups

Steps are organized into logical groups for the frontend. Groups are constant ranges — the overview API maps each group to its steps:

Group ID	Label	Required	Steps
`school-identity`	School Identity	Yes	SCHOOL, YEAR, DEPARTMENTS, GRADES
`people-import`	People Import	Yes	STUDENTS, TEACHERS, STAFF
`teaching-schedule`	Teaching & Schedule	No	CURRICULUM, TIMETABLE
`permissions-services`	Permissions & Services	No	PERMISSIONS, SERVICES
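One way to derive a per-group status from the single global step is sketched below. The document does not spell out the exact rule, so the rule used here is an assumption: a group is DONE when the current step is past its last step, IN_PROGRESS when the current step falls inside it, and NOT_STARTED otherwise. `groupStatus` and this `getOverview` are hypothetical and may differ from the service's implementation.

```typescript
const STEP_ORDER: string[] = [
  "SCHOOL", "YEAR", "DEPARTMENTS", "GRADES", "STUDENTS", "TEACHERS",
  "STAFF", "CURRICULUM", "TIMETABLE", "PERMISSIONS", "SERVICES", "COMPLETE",
];

interface SetupGroup {
  id: string;
  label: string;
  required: boolean;
  steps: string[];
}

const SETUP_GROUPS: SetupGroup[] = [
  { id: "school-identity", label: "School Identity", required: true, steps: ["SCHOOL", "YEAR", "DEPARTMENTS", "GRADES"] },
  { id: "people-import", label: "People Import", required: true, steps: ["STUDENTS", "TEACHERS", "STAFF"] },
  { id: "teaching-schedule", label: "Teaching & Schedule", required: false, steps: ["CURRICULUM", "TIMETABLE"] },
  { id: "permissions-services", label: "Permissions & Services", required: false, steps: ["PERMISSIONS", "SERVICES"] },
];

type GroupStatus = "NOT_STARTED" | "IN_PROGRESS" | "DONE";

// Assumed rule: compare the current step's position with the group's range.
function groupStatus(group: SetupGroup, currentStep: string): GroupStatus {
  const current = STEP_ORDER.indexOf(currentStep);
  const indices = group.steps.map((step) => STEP_ORDER.indexOf(step));
  const first = Math.min(...indices);
  const last = Math.max(...indices);
  if (current > last) return "DONE";
  if (current >= first) return "IN_PROGRESS";
  return "NOT_STARTED";
}

function getOverview(currentStep: string): { id: string; label: string; required: boolean; status: GroupStatus }[] {
  return SETUP_GROUPS.map((group) => ({
    id: group.id,
    label: group.label,
    required: group.required,
    status: groupStatus(group, currentStep),
  }));
}
```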

API Endpoints

All endpoints require JWT authentication only (no scope/action guards — admin is the only user during setup).

Method	Path	Description
`GET`	`/configure/setup/overview`	High-level overview of all groups with computed status (`NOT_STARTED`, `IN_PROGRESS`, `DONE`)
`GET`	`/configure/setup/:groupId`	Full wizard state for the whole state machine; `:groupId` is validated but not used for filtering
`POST`	`/configure/setup/:groupId`	Submit step data / navigate — `:groupId` validated but state machine logic is global

The :groupId param is validated via ParseGroupIdPipe (404 for unknown) but does not affect state machine logic. The frontend uses the overview to determine which group to route to. POST returns 200 (uses @HttpCode(200)).

Step Handlers

Steps with data have dedicated handlers that implement load(), save(), and isComplete(). Steps without handlers (COMPLETE, TEACHERS, STAFF, CURRICULUM, TIMETABLE, PERMISSIONS, SERVICES) return null data and advance freely. Handlers for new steps will be added as those features are implemented.

Step Completion Enforcement

Forward navigation requires the current step's handler to confirm completion via isComplete(). Each handler defines its own completion criteria (e.g., school record exists, academic year exists, all departments have grades, at least one student imported). The check runs after save() but before the step pointer advances — so data steps that save form data are automatically validated, while data-less steps (like STUDENTS, which imports via a separate endpoint) are still gated. Steps without handlers pass through freely. Same-step saves (drafts) and back-navigation do not check completion.
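The ordering described above (save first, then check completion, then advance) can be sketched as a single function. This is a simplified, synchronous illustration: real handlers would likely be asynchronous, and `StepHandler`, `Direction`, `submitStep`, and the `blocked` flag are hypothetical names rather than the service's actual API.

```typescript
interface StepHandler {
  save(data: unknown): void;
  isComplete(): boolean;
}

type Direction = "forward" | "back" | "stay";

interface SubmitResult {
  step: string; // step pointer after the call
  blocked: boolean; // true when forward navigation was refused
}

function submitStep(
  steps: string[],
  current: string,
  direction: Direction,
  handler?: StepHandler,
  data?: unknown,
): SubmitResult {
  const index = steps.indexOf(current);
  if (index === -1) throw new Error(`Unknown step: ${current}`);

  // Save before checking completion, so submitted form data counts.
  if (handler !== undefined && data !== undefined) handler.save(data);

  // Drafts and back-navigation never check completion.
  if (direction === "stay") return { step: current, blocked: false };
  if (direction === "back") {
    return { step: steps[Math.max(index - 1, 0)] ?? current, blocked: false };
  }

  // Forward: steps with a handler must be complete; steps without one pass freely.
  if (handler !== undefined && !handler.isComplete()) {
    return { step: current, blocked: true };
  }
  return { step: steps[Math.min(index + 1, steps.length - 1)] ?? current, blocked: false };
}
```

Running the check after save() is what allows a step such as STUDENTS, which submits no form data of its own, to be gated by the same mechanism as the form-driven steps.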

YEAR Step — Academic Year + Periods

The YEAR step uses named top-level keys: { academicYear, terms?, closingPeriods?, extraPeriods? }. Periods are optional arrays typed by period category. Each period type maps to the Period model's type field (TERM, CLOSING, EXTRA).

Validations:

Academic year: endDate > startDate
Each period: endDate > startDate, dates within academic year bounds
Same-type overlap: periods of the same type cannot overlap (CLOSING can overlap with TERM)
No duplicate period names across all types within the academic year
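The YEAR validations can be sketched as one function that collects error messages. This is an illustration under assumptions, not the handler's code: dates are taken to be ISO `YYYY-MM-DD` strings (which compare correctly as plain strings), periods that share a boundary date are treated as overlapping, name comparison is exact, and `validateYear` and the message texts are hypothetical.

```typescript
type PeriodType = "TERM" | "CLOSING" | "EXTRA";

interface DateRange {
  startDate: string; // ISO YYYY-MM-DD
  endDate: string; // ISO YYYY-MM-DD
}

interface Period extends DateRange {
  name: string;
  type: PeriodType;
}

function validateYear(year: DateRange, periods: Period[]): string[] {
  const errors: string[] = [];
  if (!(year.endDate > year.startDate)) {
    errors.push("academicYear: endDate must be after startDate");
  }

  const seenNames: string[] = [];
  periods.forEach((period, i) => {
    if (!(period.endDate > period.startDate)) {
      errors.push(`${period.name}: endDate must be after startDate`);
    }
    if (period.startDate < year.startDate || period.endDate > year.endDate) {
      errors.push(`${period.name}: dates must fall within the academic year`);
    }
    // Names must be unique across all period types.
    if (seenNames.indexOf(period.name) !== -1) {
      errors.push(`${period.name}: duplicate period name`);
    }
    seenNames.push(period.name);

    // Only periods of the same type are checked against each other.
    periods.slice(i + 1).forEach((other) => {
      const sameType = other.type === period.type;
      const overlaps = period.startDate <= other.endDate && other.startDate <= period.endDate;
      if (sameType && overlaps) {
        errors.push(`${period.name} overlaps ${other.name}`);
      }
    });
  });
  return errors;
}
```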

DEPARTMENTS Step — Business Rules

Validations:

Ordinal positions must be sequential starting from 1
Department names must be unique (case-insensitive) within the academic year
P2002 unique constraint violation caught as belt-and-suspenders

GRADES Step — Nested Department Structure

The GRADES step uses a nested department structure: { departments: [{ id, grades: [...] }] }. Each department entry includes its UUID and a grades array. The GET response includes department metadata (name, ordinalPosition) alongside any persisted grades.

Validations:

All tenant departments must be represented (every department needs at least one grade)
Per-department: ordinal positions sequential from 1
Per-department: grade names unique (case-insensitive)
All department IDs must belong to the tenant's academic year
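The DEPARTMENTS and GRADES steps share the same two checks (sequential ordinals, case-insensitive unique names), so they can be sketched with one reusable helper. `validateOrderedNames`, `validateGrades`, and the message texts are hypothetical. Two details are assumptions: names are trimmed before comparison, and ordinals may arrive in any array order as long as they form the set 1..n.

```typescript
interface Ordered {
  name: string;
  ordinalPosition: number;
}

// Shared rule set for departments and for the grades of one department.
function validateOrderedNames(items: Ordered[], label: string): string[] {
  const errors: string[] = [];

  const sorted = items.map((item) => item.ordinalPosition).sort((a, b) => a - b);
  if (!sorted.every((position, index) => position === index + 1)) {
    errors.push(`${label}: ordinal positions must be sequential starting from 1`);
  }

  const seen: string[] = [];
  for (const item of items) {
    const key = item.name.trim().toLowerCase();
    if (seen.indexOf(key) !== -1) {
      errors.push(`${label}: duplicate name "${item.name}"`);
    }
    seen.push(key);
  }
  return errors;
}

interface DepartmentGrades {
  id: string;
  grades: Ordered[];
}

// `tenantDepartmentIds` are the departments of the tenant's academic year.
function validateGrades(tenantDepartmentIds: string[], payload: DepartmentGrades[]): string[] {
  const errors: string[] = [];
  const payloadIds = payload.map((department) => department.id);

  for (const id of tenantDepartmentIds) {
    if (payloadIds.indexOf(id) === -1) errors.push(`department ${id}: missing from payload`);
  }
  for (const department of payload) {
    if (tenantDepartmentIds.indexOf(department.id) === -1) {
      errors.push(`department ${department.id}: not part of the tenant's academic year`);
      continue;
    }
    if (department.grades.length === 0) {
      errors.push(`department ${department.id}: needs at least one grade`);
    }
    errors.push(...validateOrderedNames(department.grades, `department ${department.id}`));
  }
  return errors;
}
```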

Key Implementation Files

Constants: src/setup/constants/setup-steps.ts (enum + navigation helpers), src/setup/constants/setup-groups.ts (group registry)
Service: src/setup/setup.service.ts (state machine orchestration, getOverview())
Pipe: src/setup/pipes/parse-group-id.pipe.ts (validates :groupId param)
Controller: src/setup/setup.controller.ts (3 routes)
Step handlers: src/setup/step-handlers/ (per-step load/save logic)
DTOs: src/setup/dto/ (request/response shapes)
