
Files Model — Document Storage and Lifecycle

Status: Proposed (Phase B follow-up to missing-fields-design §11.3, non-blocking)
Date: 2026-04-27
Source user stories: US-13 Add New Student Record, US-17 Manage Referent Record, US-18 Manage Guardians, US-19 Add New Teacher or Staff Record (all in ClickUp doc 8cnxf2d-7815).


1. Problem

Five entities — Student, Teacher, Staff, Referent, Guardian — reference uploaded scans of identity documents (passport / identity card). The ClickUp doc "Grace Period — Required Fields by Entity" lists passport_file_id / identity_card_file_id as completeness-required fields and the documents XOR completeness rule depends on a non-empty file reference.

Phase A shipped the FK columns as interim string mocks (passportFileId String?, identityCardFileId String?) so the rest of the data-model lift could land without blocking on storage design. The mocks accept opaque tokens written by some upstream upload mechanism (TBD); the file payloads themselves are not handled by the SIS yet.

This spec designs the real Files subsystem (model, lifecycle, storage backend, signed-URL semantics, security) so that the ten interim string columns (two per entity across five entities) can be retyped to UUID FKs and the upload/download endpoints can land. Curation is not blocked on this spec (Phase D shipped with the file-mock fields treated as opaque tokens). What is blocked: real file uploads, signed downloads, virus scanning, orphan cleanup, and the FK constraint that prevents pointing at non-existent file rows.

2. Scope

In scope

  • New File Prisma model (one row per uploaded file blob) with metadata: filename, MIME type, byte size, content hash, storage backend identifier, storage key, virus-scan status, soft-delete timestamp, uploader / tenant context.
  • Retype of the interim columns: passportFileId String? → String? @db.Uuid + FK constraint. Same for identityCardFileId across Student, Teacher, Staff, Referent, Guardian.
  • New src/files/ module: upload endpoint (POST /files), download endpoint (GET /files/:id returning a redirect to a signed URL), metadata endpoint (GET /files/:id/meta), delete endpoint (soft delete).
  • Storage backend abstraction (FileStoragePort) with at least two transports: LocalFileStorage (dev) and S3FileStorage (prod). Selected by env var.
  • Virus-scan port (VirusScanPort) with at least two transports: NoopVirusScan (dev / tests) and ClamAVVirusScan (prod). Files start as PENDING_SCAN, transition to CLEAN or INFECTED. References to INFECTED files refuse to render in download responses.
  • Orphan-cleanup mechanism: when a referencing column is overwritten or NULLed via PATCH, the previous file row is soft-deleted (or hard-deleted after a grace period — see decisions).
  • Tenant isolation: every file storage key includes the tenantId prefix; cross-tenant download attempts return 404.
  • MIME / size validation at upload: configurable per usage (passport, idCard, …) — defaults are PDF + JPEG + PNG up to 10 MB.
  • RBAC: new files entity with create (= upload) and delete actions. Reading a file requires read access on the referencing scope (e.g. students.documents) — there is no separate files.read scope, because file content is governed by the entity that owns the reference.
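The MIME / size validation bullet above can be illustrated with a minimal sketch. Only the defaults (PDF, JPEG, PNG, 10 MB) come from this spec; the names UploadPolicy, POLICIES, sniffMime and validateUpload are hypothetical, 10 MB is read as 10 MiB by assumption, and a small hand-rolled signature table stands in for a real sniffing library.

```typescript
type FileUsage = "PASSPORT" | "IDENTITY_CARD";

interface UploadPolicy {
  allowedMimeTypes: readonly string[];
  maxBytes: number;
}

// Spec defaults: PDF + JPEG + PNG up to 10 MB (assumed here to mean 10 MiB).
const DEFAULT_POLICY: UploadPolicy = {
  allowedMimeTypes: ["application/pdf", "image/jpeg", "image/png"],
  maxBytes: 10 * 1024 * 1024,
};

// Per-usage overrides would go here; both usages share the defaults in this sketch.
const POLICIES: Record<FileUsage, UploadPolicy> = {
  PASSPORT: DEFAULT_POLICY,
  IDENTITY_CARD: DEFAULT_POLICY,
};

// Leading "magic bytes" of the three default formats.
const SIGNATURES: ReadonlyArray<{ mime: string; magic: readonly number[] }> = [
  { mime: "application/pdf", magic: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
  { mime: "image/jpeg", magic: [0xff, 0xd8, 0xff] },
  { mime: "image/png", magic: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
];

// Returns the MIME type implied by the first bytes, or null when unknown.
function sniffMime(head: Uint8Array): string | null {
  for (const { mime, magic } of SIGNATURES) {
    if (head.length >= magic.length && magic.every((b, i) => head[i] === b)) {
      return mime;
    }
  }
  return null;
}

type ValidationResult = { ok: true } | { ok: false; reason: string };

// Checks size, the per-usage allowlist, and declared-vs-sniffed MIME agreement.
function validateUpload(
  usage: FileUsage,
  declaredMime: string,
  bytes: Uint8Array,
): ValidationResult {
  const policy = POLICIES[usage];
  if (bytes.length > policy.maxBytes) return { ok: false, reason: "too_large" };
  if (policy.allowedMimeTypes.indexOf(declaredMime) === -1) {
    return { ok: false, reason: "mime_not_allowed" };
  }
  if (sniffMime(bytes) !== declaredMime) return { ok: false, reason: "mime_mismatch" };
  return { ok: true };
}
```

In a streaming upload the size check would run on a byte counter and the sniff on the first chunk only; the whole-buffer signature here just keeps the sketch short.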

Out of scope (explicitly deferred)

  • Chunked (S3-style multipart) / resumable uploads. Only a single-request multipart/form-data upload is supported in v1.
  • Image transforms / thumbnails. Files are stored as-is; the frontend renders inline (PDFs via <embed>, images via <img>). Thumbnails are a follow-up.
  • Public files / signed-URL CDN. All downloads go through SIS-signed URLs (short-TTL, single-use). No public buckets.
  • CSV / batch upload. Files are entered through per-entity UI flows. Batch ingestion is a follow-up.
  • Cross-entity file sharing. Each file row is owned by one referencing column on one entity row. No "use this file as both Student passport and Referent passport" — the second use creates a second File row with a separate upload (re-upload by the user).
  • Audit log integration. US-34 View Audit Trail will pick up file events when it ships. This spec defines the event-emission seam but not the audit consumer.
  • Per-tenant storage quota / metering. Add when a real customer hits a limit.

3. Decisions log

  1. Single File table, no per-entity inheritance. A File row is identified by its id; the reference (which entity, which field) lives on the referencing column. Polymorphic ownership is implicit, not modeled as a discriminated FK.
  2. Storage key schema: {tenantId}/{usage}/{fileId}.{ext}. usage is a stable enum drawn from the referencing field name family (passport, id-card, …). Renames are done via copy + delete to keep keys stable per file id.
  3. Hash on upload. Compute SHA-256 during the upload stream; store as contentHash. Used for de-duplication detection in logs (warn on duplicate uploads of the same hash, do not auto-link — see "no cross-entity sharing").
  4. Virus scan: async, status-gated download. Upload returns 201 { id, status: 'PENDING_SCAN' }. A background worker (or the inline scan transport in dev) flips status. GET /files/:id returns 425 Too Early while PENDING_SCAN, a 302 redirect when CLEAN, and 410 Gone when INFECTED (which also triggers the orphan-cleanup hook on the referencing column).
  5. Soft delete with retention window. deletedAt DateTime? plus a daily cleanup job that hard-deletes rows + storage objects past retention (default 30 days). Admin override to hard-delete immediately on legal-hold scenarios is a follow-up.
  6. Reference change → cleanup hook. When a referencing column is PATCHed (passportFileId set to a new id or NULL), the previous file id is soft-deleted in the same transaction. The hook is one function per entity (mirroring invalidateByRecipient from invitations) called by the entity's update path.
  7. No public buckets / no presigned-PUT uploads in v1. All uploads stream through the SIS for hash + size + MIME validation. Presigned-PUT is a future optimization once the proxy bandwidth cost surfaces.
  8. Signed URL TTL: 60 seconds, single-use. Short window, server-side single-use enforcement via a one-time-token table (FileDownloadToken). Avoids long-lived links leaking into screenshots / shares.
  9. MIME enforcement at the server. Magic-byte sniff (via file-type lib) against the declared Content-Type and against the per-usage allowlist. Reject mismatches.
  10. Storage backend selected by env (FILE_STORAGE_TRANSPORT=local|s3). Mirrors MAIL_TRANSPORT. Local backend for dev/tests, S3 (or compatible — MinIO, R2) in stage/prod.
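Decisions 2 and 3 can be sketched together as follows. The key layout and the SHA-256 choice come from the decisions above; the helper names, the enum-to-segment mapping and the MIME-to-extension mapping are assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";

type FileUsage = "PASSPORT" | "IDENTITY_CARD";

// Assumed mapping from enum value to key segment ("passport", "id-card").
const USAGE_SEGMENT: Record<FileUsage, string> = {
  PASSPORT: "passport",
  IDENTITY_CARD: "id-card",
};

// Assumed mapping from validated MIME type to file extension.
const EXTENSION: Record<string, string> = {
  "application/pdf": "pdf",
  "image/jpeg": "jpg",
  "image/png": "png",
};

// Builds {tenantId}/{usage}/{fileId}.{ext}; the tenant prefix is what
// tenant isolation at the storage layer relies on.
function buildStorageKey(
  tenantId: string,
  usage: FileUsage,
  fileId: string,
  mimeType: string,
): string {
  const ext = EXTENSION[mimeType];
  if (!ext) throw new Error(`unsupported MIME type: ${mimeType}`);
  return `${tenantId}/${USAGE_SEGMENT[usage]}/${fileId}.${ext}`;
}

// Incremental SHA-256: each chunk is fed to the hash as it passes through,
// so the payload never has to be buffered in full.
function hashChunks(chunks: Iterable<Uint8Array>): string {
  const hash = createHash("sha256");
  for (const chunk of chunks) hash.update(chunk);
  return hash.digest("hex"); // 64 hex characters
}
```

The hex digest is 64 characters long, which matches a fixed-width CHAR(64) column for contentHash.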

4. Data model

enum FileScanStatus {
  PENDING_SCAN
  CLEAN
  INFECTED
  SCAN_ERROR
}

enum FileUsage {
  PASSPORT
  IDENTITY_CARD
  // future: AVATAR, CV, CERTIFICATION, ...
}

/// Uploaded blob — one row per file. Referenced from Student / Teacher /
/// Staff / Referent / Guardian via {passport,identityCard}FileId. Cross-
/// entity sharing is not allowed: each row is owned by exactly one
/// referencing column.
model File {
  id            String         @id @default(uuid()) @db.Uuid
  tenantId      String         @map("tenant_id") @db.Uuid
  usage         FileUsage
  fileName      String         @map("file_name") @db.VarChar(255)
  mimeType      String         @map("mime_type") @db.VarChar(100)
  byteSize      Int            @map("byte_size")
  contentHash   String         @map("content_hash") @db.Char(64) /// SHA-256 hex
  storageKey    String         @map("storage_key") @db.VarChar(500) /// Backend-relative path
  scanStatus    FileScanStatus @default(PENDING_SCAN) @map("scan_status")
  uploadedById  String?        @map("uploaded_by_id") @db.Uuid
  uploadedAt    DateTime       @default(now()) @map("uploaded_at")
  deletedAt     DateTime?      @map("deleted_at")
  createdAt     DateTime       @default(now()) @map("created_at")
  updatedAt     DateTime       @updatedAt @map("updated_at")

  tenant     Tenant @relation(fields: [tenantId], references: [id], onDelete: Cascade)
  uploadedBy User?  @relation(fields: [uploadedById], references: [id], onDelete: SetNull)

  @@index([tenantId, scanStatus])
  @@index([tenantId, uploadedAt])
  @@index([contentHash])
  @@map("files")
}

/// Single-use token for `GET /files/:id` redirects. Inserted on signed-URL
/// issuance, marked redeemed (redeemedAt set) on redemption. Redeemed and
/// expired rows are reaped by a periodic cleanup job.
model FileDownloadToken {
  id         String    @id @default(uuid()) @db.Uuid
  tenantId   String    @map("tenant_id") @db.Uuid
  fileId     String    @map("file_id") @db.Uuid
  tokenHash  String    @unique @map("token_hash") /// SHA-256 of the URL token
  expiresAt  DateTime  @map("expires_at")
  redeemedAt DateTime? @map("redeemed_at")
  createdAt  DateTime  @default(now()) @map("created_at")

  @@index([tenantId, expiresAt])
  @@map("file_download_tokens")
}
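
The issue / redeem cycle behind FileDownloadToken can be sketched in TypeScript. This is a minimal illustration, not the service implementation: an in-memory Map stands in for the file_download_tokens table, the function names are hypothetical, and the sketch marks redeemedAt instead of deleting the row (either approach enforces single use).

```typescript
import { createHash, randomBytes } from "node:crypto";

interface TokenRow {
  fileId: string;
  expiresAt: number; // epoch milliseconds
  redeemedAt: number | null;
}

const TTL_MS = 60_000; // 60-second TTL from the decisions log

// In-memory stand-in for the token table, keyed by tokenHash.
const tokens = new Map<string, TokenRow>();

const sha256Hex = (value: string): string =>
  createHash("sha256").update(value).digest("hex");

// Issue: hand the raw token to the caller, persist only its hash, so a
// leaked table dump cannot be replayed as download URLs.
function issueToken(fileId: string, now: number): string {
  const raw = randomBytes(32).toString("hex");
  tokens.set(sha256Hex(raw), { fileId, expiresAt: now + TTL_MS, redeemedAt: null });
  return raw;
}

// Redeem: succeeds at most once, only for the matching file, only before expiry.
function redeemToken(fileId: string, raw: string, now: number): boolean {
  const row = tokens.get(sha256Hex(raw));
  if (!row || row.fileId !== fileId) return false;
  if (row.redeemedAt !== null || now >= row.expiresAt) return false;
  row.redeemedAt = now;
  return true;
}
```

Against a real database the redeem step would need to be one atomic conditional write (for example an update filtered on the row still being unredeemed and unexpired), so that two concurrent requests cannot both succeed.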

Retype of existing referencing columns

// Student / Teacher / Staff / Referent / Guardian — each:
passportFileId     String? @map("passport_file_id") @db.Uuid
identityCardFileId String? @map("identity_card_file_id") @db.Uuid

passportFile     File? @relation("<Entity>PassportFile",     fields: [passportFileId],     references: [id], onDelete: SetNull)
identityCardFile File? @relation("<Entity>IdentityCardFile", fields: [identityCardFileId], references: [id], onDelete: SetNull)

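(Each entity gets uniquely named relations because both of its columns target the same File model, and Prisma rejects unnamed relations in that situation as ambiguous. The File model must declare a matching back-relation field for each relation name.)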

5. RBAC

// ENTITIES:
{ key: 'files', label: 'Files', description: 'Uploaded documents and scans', sortOrder: <next> },

// SCOPES:
files: [
  { key: 'metadata', label: 'File Metadata', description: 'Filename, MIME, size, scan status', sortOrder: 1 },
],

// ACTIONS:
{ entityKey: 'files', key: 'create', label: 'Upload File',  description: 'Upload a file blob', sortOrder: 1 },
{ entityKey: 'files', key: 'delete', label: 'Delete File',  description: 'Soft-delete a file', sortOrder: 2 },

// ACTION_REQUIREMENTS:
'files.create': { scopeKeys: ['files.metadata'] },
'files.delete': { scopeKeys: ['files.metadata'] },

Read access is governed by the referencing scope. A caller can GET /files/:id only when they can read at least one referencing column. This is implemented as a service-side check that traverses back-pointers (or, for performance, a denormalized referrerScope column on File populated by the reference-write path). Decision deferred to implementation.
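
A minimal sketch of the read check, under the assumption that the service can enumerate the columns currently pointing at a file (by traversing back-pointers or by reading a denormalized column). FileReference and canReadFile are hypothetical names, and every scope key other than students.documents is invented for the example.

```typescript
// One column that currently points at the file, plus the scope governing it.
interface FileReference {
  entity: string;
  column: "passportFileId" | "identityCardFileId";
  scopeKey: string; // e.g. "students.documents"
}

// The caller may read the file when at least one referencing column is
// readable. A file with no references is readable by nobody.
function canReadFile(
  readableScopes: ReadonlySet<string>,
  references: readonly FileReference[],
): boolean {
  return references.some((ref) => readableScopes.has(ref.scopeKey));
}

// Usage: a caller who can read student documents but not referent documents.
const callerScopes = new Set(["students.documents"]);
const studentPassport: FileReference = {
  entity: "Student",
  column: "passportFileId",
  scopeKey: "students.documents",
};
const referentPassport: FileReference = {
  entity: "Referent",
  column: "passportFileId",
  scopeKey: "referents.documents", // assumed scope key
};
const allowed = canReadFile(callerScopes, [studentPassport]);
const denied = canReadFile(callerScopes, [referentPassport]);
```

Because a file row is owned by exactly one referencing column, the reference list holds at most one entry in practice; the array shape only keeps the check uniform.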

6. Module layout

src/files/
├── files.module.ts
├── files.controller.ts             # POST /files, GET /files/:id, GET /files/:id/meta, DELETE /files/:id
├── files.service.ts                # Orchestrator (storage + scan + DB + tokens)
├── files.queries.ts
├── files.swagger.ts
├── storage/
│   ├── file-storage.port.ts        # FileStoragePort interface (put / get / delete)
│   ├── local-file-storage.ts       # Disk-backed transport
│   └── s3-file-storage.ts          # AWS SDK v3 transport
├── virus-scan/
│   ├── virus-scan.port.ts          # VirusScanPort interface
│   ├── noop-virus-scan.ts          # Dev / tests
│   └── clamav-virus-scan.ts        # Prod
├── dto/
│   ├── upload-file-response.dto.ts
│   ├── file-metadata.dto.ts
│   └── ...
└── index.ts
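
The storage/ folder above implies a port plus env-based transport selection, which could look roughly like this. The put / get / delete method names follow the layout comment; their signatures, the in-memory class, and the choice to fail fast on an unknown FILE_STORAGE_TRANSPORT value are assumptions.

```typescript
// Port shape: put / get / delete, with assumed signatures.
interface FileStoragePort {
  put(key: string, body: Uint8Array): Promise<void>;
  get(key: string): Promise<Uint8Array>;
  delete(key: string): Promise<void>;
}

// In-memory stand-in so the sketch runs without disk or S3 access.
class InMemoryFileStorage implements FileStoragePort {
  private readonly objects = new Map<string, Uint8Array>();

  async put(key: string, body: Uint8Array): Promise<void> {
    this.objects.set(key, body);
  }

  async get(key: string): Promise<Uint8Array> {
    const body = this.objects.get(key);
    if (!body) throw new Error(`not found: ${key}`);
    return body;
  }

  async delete(key: string): Promise<void> {
    this.objects.delete(key);
  }
}

type StorageTransport = "local" | "s3";
type Env = Record<string, string | undefined>;

// Reads FILE_STORAGE_TRANSPORT and rejects anything but "local" or "s3".
function resolveStorageTransport(env: Env): StorageTransport {
  const value = env.FILE_STORAGE_TRANSPORT;
  if (value === "local" || value === "s3") return value;
  throw new Error(`FILE_STORAGE_TRANSPORT must be "local" or "s3", got: ${String(value)}`);
}

// Picks the transport factory named by the environment.
function createFileStorage(
  env: Env,
  factories: Record<StorageTransport, () => FileStoragePort>,
): FileStoragePort {
  return factories[resolveStorageTransport(env)]();
}
```

Passing the factories in keeps the selection logic free of SDK imports, so tests can substitute the in-memory class for either transport.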

7. Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /files | @RequireAction(FILES, 'create') | multipart/form-data: file + usage field. Streams to storage, computes hash, queues virus scan, returns { id, status }. |
| GET | /files/:id/meta | (referencing scope read) | Returns FileMetadataDto: filename, MIME, size, status, uploadedAt. |
| GET | /files/:id | (referencing scope read) | Issues a single-use signed download token, redirects (302) to GET /files/:id/blob?t=<token>. |
| GET | /files/:id/blob?t=... | token-only (no JWT) | Streams the file blob if token is valid, single-use, and unexpired. Returns 425 if PENDING_SCAN, 410 if INFECTED or deletedAt != null. |
| DELETE | /files/:id | @RequireAction(FILES, 'delete') | Soft-delete. Sets deletedAt. Storage object kept until retention expires. |
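
The status gating on GET /files/:id can be written as a small pure function. This is a sketch under stated assumptions: decideDownload and its types are hypothetical, the spec does not say how SCAN_ERROR should respond (it is treated like PENDING_SCAN here), and the cross-tenant 404 and scope checks are assumed to have run earlier.

```typescript
type FileScanStatus = "PENDING_SCAN" | "CLEAN" | "INFECTED" | "SCAN_ERROR";

interface FileState {
  scanStatus: FileScanStatus;
  deletedAt: Date | null;
}

type DownloadDecision =
  | { status: 302; location: string }
  | { status: 410 | 425 };

// Maps file state to the HTTP outcome of GET /files/:id.
function decideDownload(fileId: string, file: FileState, token: string): DownloadDecision {
  // Soft-deleted and infected files are gone for good.
  if (file.deletedAt !== null || file.scanStatus === "INFECTED") {
    return { status: 410 };
  }
  if (file.scanStatus === "CLEAN") {
    return {
      status: 302,
      location: `/files/${fileId}/blob?t=${encodeURIComponent(token)}`,
    };
  }
  // PENDING_SCAN, and by assumption SCAN_ERROR: not available yet.
  return { status: 425 };
}
```

Keeping the mapping free of I/O lets the controller and the blob endpoint share it and makes each status branch trivially unit-testable.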

8. Cleanup hooks

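// In files.service.ts
/** Soft-deletes the file that a referencing column pointed at before the change. */
async invalidateOnReferenceChange(
  tx: Prisma.TransactionClient,
  previousFileId: string | null,
  reason: 'replaced' | 'cleared' | 'entity_deleted', // currently unused by this hook
): Promise<void> {
  if (!previousFileId) return;
  // updateMany with deletedAt: null keeps the call idempotent: a row that is
  // already soft-deleted keeps its original deletedAt, so the retention
  // clock is not reset.
  await tx.file.updateMany({
    where: { id: previousFileId, deletedAt: null },
    data: { deletedAt: new Date() },
  });
}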

Each entity's update / delete service path calls this hook inside its transaction:

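// e.g. ReferentsService.update()
const nextPassportFileId = dto.documents?.passportFileId;
// Skip when the field is absent or re-sent with the same id; otherwise the
// hook would soft-delete a file that is still referenced.
if (nextPassportFileId !== undefined && nextPassportFileId !== existing.passportFileId) {
  await this.filesService.invalidateOnReferenceChange(
    tx,
    existing.passportFileId,
    nextPassportFileId === null ? 'cleared' : 'replaced',
  );
}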

A periodic job (BullMQ when adopted; cron-tab + Nest schedule for v1) scans for deletedAt < now - 30 days and hard-deletes both row and storage object.
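
The selection step of that job is easy to isolate and test. The sketch below assumes hypothetical helper names and operates on plain objects; in the service the same cutoff would be passed to a query on deletedAt.

```typescript
const RETENTION_DAYS = 30; // default retention from the decisions log
const DAY_MS = 24 * 60 * 60 * 1000;

interface SoftDeletable {
  id: string;
  deletedAt: Date | null;
}

// Rows soft-deleted strictly before this instant are eligible for hard delete.
function retentionCutoff(now: Date, retentionDays: number = RETENTION_DAYS): Date {
  return new Date(now.getTime() - retentionDays * DAY_MS);
}

// Filters to rows whose deletedAt is set and older than the cutoff.
function selectForHardDelete<T extends SoftDeletable>(
  rows: readonly T[],
  now: Date,
  retentionDays: number = RETENTION_DAYS,
): T[] {
  const cutoff = retentionCutoff(now, retentionDays).getTime();
  return rows.filter((row) => row.deletedAt !== null && row.deletedAt.getTime() < cutoff);
}

// Usage: on 2026-06-01 only the file deleted on 2026-04-01 is past retention.
const now = new Date("2026-06-01T00:00:00Z");
const expired = selectForHardDelete(
  [
    { id: "a", deletedAt: new Date("2026-04-01T00:00:00Z") },
    { id: "b", deletedAt: new Date("2026-05-15T00:00:00Z") },
    { id: "c", deletedAt: null },
  ],
  now,
);
```

One possible ordering for the destructive part is to delete the storage object first and the row second, so that a failed run leaves a row the next run can retry; that ordering is a suggestion, not something this spec fixes.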

9. Migration

Two migrations, in order:

Migration 1 — Files tables

CREATE TABLE "files" ( ... );
CREATE TABLE "file_download_tokens" ( ... );
CREATE INDEX ...;
ALTER TABLE "files" ADD CONSTRAINT "files_tenant_id_fkey" FOREIGN KEY ...;
ALTER TABLE "files" ADD CONSTRAINT "files_uploaded_by_id_fkey" FOREIGN KEY ...;

Pure additions; safe.

Migration 2 — Retype existing FK columns to UUID

This is the load-bearing migration and must run when the interim columns are empty (or backfilled to all-NULL) — the cast from arbitrary String? to String? @db.Uuid will fail on any non-UUID value. Pre-flight:

-- Check for non-NULL, non-UUID values
SELECT 'students'   AS table, id, passport_file_id   FROM students   WHERE passport_file_id   IS NOT NULL AND passport_file_id   !~ '^[0-9a-f]{8}-...';
-- ... repeat per (entity × column)

If any row matches, NULL it out (data is interim mock anyway) before running the migration. Hazard checklist:

  • ALTER COLUMN ... TYPE UUID USING passport_file_id::uuid — fails on non-UUID values. Pre-flight protects against it. (Hazard #4.)
  • ADD CONSTRAINT FOREIGN KEY — fails on orphan values. The File table is empty when this runs, so any non-NULL referencing value is an orphan. Pre-flight NULL-out is mandatory. (Hazard #7.)

Recommended approach: ship Migration 2 with the SQL pre-flight inlined as an UPDATE ... SET passport_file_id = NULL WHERE passport_file_id IS NOT NULL; for every entity × column pair, placed before the ALTER statements in the same migration. This documents that the change is a clean cutover from mocks to real FKs.
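
To avoid hand-writing one statement per entity × column pair, the NULL-out block could be generated. In this sketch only the students table name and the two column names appear in this spec; the other four table names are assumptions and should be checked against the actual schema before use.

```typescript
// "students" is taken from the pre-flight query; the rest are assumed names.
const TABLES = ["students", "teachers", "staff", "referents", "guardians"] as const;
const COLUMNS = ["passport_file_id", "identity_card_file_id"] as const;

// Emits one UPDATE per entity × column pair, in a stable order.
function buildNullOutStatements(): string[] {
  const statements: string[] = [];
  for (const table of TABLES) {
    for (const column of COLUMNS) {
      statements.push(
        `UPDATE "${table}" SET "${column}" = NULL WHERE "${column}" IS NOT NULL;`,
      );
    }
  }
  return statements;
}

// Usage: join the statements and paste them at the top of Migration 2.
const preflightSql = buildNullOutStatements().join("\n");
```

Each statement is idempotent, so re-running the migration against an already sanitized database changes no rows.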

10. Testing

| Suite | Coverage |
|---|---|
| Unit (storage ports) | Local backend round-trip; S3 backend mocked with aws-sdk-client-mock. |
| Unit (virus-scan ports) | Noop transitions to CLEAN; ClamAV mocked sockets; INFECTED → reference-side cleanup hook fires. |
| Unit (FilesService) | Upload happy path; oversized rejection; MIME mismatch rejection; orphan cleanup on reference change; signed-token issuance / redemption / expiry / single-use. |
| Integration | E2E upload-then-attach-to-Student-then-replace, asserting old File row goes to soft-deleted. |
| Migration | Pre-flight NULL-out idempotence; UUID cast on the empty / sanitized state; FK constraint accepts only valid file ids post-cutover. |

11. Open questions

  • Storage prefix per environment. Do the dev / stage / prod buckets share a single AWS account, or does each environment get its own? This affects the IAM policy shape; defer to the infra spec.
  • Virus-scan SLA. ClamAV inline (synchronous, latency cost on upload) vs queued (async, status flip happens later)? Default to queued; revisit if files commonly sit PENDING_SCAN long enough to surface UX complaints.
  • Hard delete on entity removal. When a Student / Teacher / Staff / Referent / Guardian is hard-deleted (where supported), should referenced files be soft-deleted (consistent with reference-replacement) or hard-deleted immediately (legal compliance)? Default to soft-delete; flag for legal review before prod.
  • Quota / abuse. No per-tenant upload quota in v1. If abusive uploads become a problem, surface a MAX_FILES_PER_DAY env var and reject excess uploads with 429 Too Many Requests.

12. Cross-references

  • docs/superpowers/specs/2026-04-27-missing-fields-design.md §11.2 (file-mock invariant), §11.3 ("Files model — non-blocking").
  • docs/superpowers/specs/2026-04-27-guardian-model-design.md §4 (Guardian carries the same file-mock columns).
  • docs/12-migrations.md for the hazard checklist (Migration 2 hits hazard #4 + #7).
  • docs/04-rbac.md for entity / scope / action conventions.
  • The invitations spec (2026-04-20-invitations-design.md) for the precedent of port-based external integrations (MailerPort → FileStoragePort / VirusScanPort).