Files Model: Document Storage and Lifecycle
Status: Proposed (Phase B follow-up to missing-fields-design §11.3, non-blocking)
Date: 2026-04-27
Source user stories: US-13 Add New Student Record, US-17 Manage Referent Record, US-18 Manage Guardians, US-19 Add New Teacher or Staff Record (all in ClickUp doc 8cnxf2d-7815).
1. Problem
Five entities — Student, Teacher, Staff, Referent, Guardian — reference uploaded scans of identity documents (passport / identity card). The ClickUp doc "Grace Period — Required Fields by Entity" lists passport_file_id / identity_card_file_id as completeness-required fields and the documents XOR completeness rule depends on a non-empty file reference.
Phase A shipped the FK columns as interim string mocks (passportFileId String?, identityCardFileId String?) so the rest of the data-model lift could land without blocking on storage design. The mocks accept opaque tokens written by some upstream upload mechanism (TBD); the file payloads themselves are not handled by the SIS yet.
This spec designs the real Files subsystem (model, lifecycle, storage backend, signed-URL semantics, security) so that the ten interim string columns (two per entity) can be retyped to UUID FKs and the upload/download endpoints can land. Curation is not blocked on this spec (Phase D shipped with the file-mock fields treated as opaque tokens). What is blocked: real file uploads, signed downloads, virus scanning, orphan cleanup, and the FK constraint that prevents pointing at non-existent file rows.
2. Scope

In scope

- New `File` Prisma model (one row per uploaded file blob) with metadata: filename, MIME type, byte size, content hash, storage backend identifier, storage key, virus-scan status, soft-delete timestamp, uploader / tenant context.
- Retype of the interim columns: `passportFileId String? → String? @db.Uuid` + FK constraint. Same for `identityCardFileId` across Student, Teacher, Staff, Referent, Guardian.
- New `src/files/` module: upload endpoint (`POST /files`), download endpoint (`GET /files/:id` returning a redirect to a signed URL), metadata endpoint (`GET /files/:id/meta`), delete endpoint (soft delete).
- Storage backend abstraction (`FileStoragePort`) with at least two transports: `LocalFileStorage` (dev) and `S3FileStorage` (prod). Selected by env var.
- Virus-scan port (`VirusScanPort`) with at least two transports: `NoopVirusScan` (dev / tests) and `ClamAVVirusScan` (prod). Files start as `PENDING_SCAN`, transition to `CLEAN` or `INFECTED`. References to `INFECTED` files refuse to render in download responses.
- Orphan-cleanup mechanism: when a referencing column is overwritten or NULLed via PATCH, the previous file row is soft-deleted (or hard-deleted after a grace period; see decisions).
- Tenant isolation: every file storage key includes the `tenantId` prefix; cross-tenant download attempts return 404.
- MIME / size validation at upload: configurable per `usage` (passport, idCard, …); defaults are PDF + JPEG + PNG up to 10 MB.
- RBAC: new `files` entity with `create` (= upload) and `delete` actions. Reading a file requires read access on the referencing scope (e.g. `students.documents`); there is no separate `files.read` scope, because file content is governed by the entity that owns the reference.
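
The MIME / size rule above could be expressed as a per-usage policy table. The following is a minimal sketch only: `UPLOAD_POLICIES`, `validateUpload`, and the rejection reason names are hypothetical, the 10 MB default is assumed to mean binary megabytes, and the sniffed MIME type is taken as an input rather than computed here.

```typescript
// Hypothetical per-usage upload policy; names are illustrative, not from the spec.
type FileUsage = "PASSPORT" | "IDENTITY_CARD";

interface UploadPolicy {
  allowedMimeTypes: readonly string[];
  maxBytes: number;
}

const DEFAULT_POLICY: UploadPolicy = {
  allowedMimeTypes: ["application/pdf", "image/jpeg", "image/png"],
  maxBytes: 10 * 1024 * 1024, // 10 MB default (binary megabytes assumed)
};

const UPLOAD_POLICIES: Record<FileUsage, UploadPolicy> = {
  PASSPORT: DEFAULT_POLICY,
  IDENTITY_CARD: DEFAULT_POLICY,
};

type ValidationResult =
  | { ok: true }
  | { ok: false; reason: "TOO_LARGE" | "MIME_MISMATCH" | "MIME_NOT_ALLOWED" };

function validateUpload(input: {
  usage: FileUsage;
  declaredMimeType: string; // Content-Type sent with the multipart part
  sniffedMimeType: string; // result of a magic-byte sniff done elsewhere
  byteSize: number;
}): ValidationResult {
  const policy = UPLOAD_POLICIES[input.usage];
  if (input.byteSize > policy.maxBytes) {
    return { ok: false, reason: "TOO_LARGE" };
  }
  if (input.declaredMimeType !== input.sniffedMimeType) {
    return { ok: false, reason: "MIME_MISMATCH" };
  }
  if (!policy.allowedMimeTypes.includes(input.sniffedMimeType)) {
    return { ok: false, reason: "MIME_NOT_ALLOWED" };
  }
  return { ok: true };
}
```

Keeping the policy in one map means a future usage only needs a new entry, not a new code path.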
Out of scope (explicitly deferred)

- Multipart upload / resumable uploads. Single-shot `multipart/form-data` only in v1.
- Image transforms / thumbnails. Files are stored as-is; the frontend renders inline (PDFs via `<embed>`, images via `<img>`). Thumbnails are a follow-up.
- Public files / signed-URL CDN. All downloads go through SIS-signed URLs (short-TTL, single-use). No public buckets.
- CSV / batch upload. Files are entered through per-entity UI flows. Batch ingestion is a follow-up.
- Cross-entity file sharing. Each file row is owned by one referencing column on one entity row. No "use this file as both Student passport and Referent passport": the second use creates a second `File` row with a separate upload (re-upload by the user).
- Audit log integration. US-34 View Audit Trail will pick up file events when it ships. This spec defines the event-emission seam but not the audit consumer.
- Per-tenant storage quota / metering. Add when a real customer hits a limit.
3. Decisions log

- Single `File` table, no per-entity inheritance. A `File` row is identified by its `id`; the reference (which entity, which field) lives on the referencing column. Polymorphic ownership is implicit, not modeled as a discriminated FK.
- Storage key schema: `{tenantId}/{usage}/{fileId}.{ext}`. `usage` is a stable enum drawn from the referencing field name family (`passport`, `id-card`, …). Renames are done via copy + delete to keep keys stable per file id.
- Hash on upload. Compute SHA-256 during the upload stream; store as `contentHash`. Used for de-duplication detection in logs (warn on duplicate uploads of the same hash, do not auto-link; see "no cross-entity sharing").
- Virus scan: async, status-gated download. Upload returns `201 { id, status: 'PENDING_SCAN' }`. A background worker (or the inline scan transport in dev) flips status. `GET /files/:id` returns 425 Too Early while `PENDING_SCAN`, a 302 redirect when `CLEAN`, 410 Gone when `INFECTED` (and triggers the orphan-cleanup hook on the referencing column).
- Soft delete with retention window. `deletedAt DateTime?` plus a daily cleanup job that hard-deletes rows + storage objects past retention (default 30 days). Admin override to hard-delete immediately on legal-hold scenarios is a follow-up.
- Reference change → cleanup hook. When a referencing column is PATCHed (`passportFileId` set to a new id or NULL), the previous file id is soft-deleted in the same transaction. The hook is one function per entity (mirroring `invalidateByRecipient` from invitations) called by the entity's update path.
- No public buckets / no presigned-PUT uploads in v1. All uploads stream through the SIS for hash + size + MIME validation. Presigned-PUT is a future optimization once the proxy bandwidth cost surfaces.
- Signed URL TTL: 60 seconds, single-use. Short window, server-side single-use enforcement via a one-time-token table (`FileDownloadToken`). Avoids long-lived links leaking into screenshots / shares.
- MIME enforcement at the server. Magic-byte sniff (via `file-type` lib) against the declared `Content-Type` and against the per-`usage` allowlist. Reject mismatches.
- Storage backend selected by env (`FILE_STORAGE_TRANSPORT=local|s3`). Mirrors `MAIL_TRANSPORT`. Local backend for dev/tests, S3 (or compatible: MinIO, R2) in stage/prod.
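
The storage-key schema and the hash-on-upload decision can be illustrated together. This is a sketch under stated assumptions: `buildStorageKey`, `UploadDigest`, and the mapping from enum values to the `passport` / `id-card` key segments are hypothetical names, and the hashing uses Node's built-in `node:crypto` module.

```typescript
import { createHash } from "node:crypto";

// Key segments as written in the storage-key schema; mapping them from the
// FileUsage enum values is an assumption of this sketch.
const USAGE_SEGMENT = { PASSPORT: "passport", IDENTITY_CARD: "id-card" } as const;
type FileUsage = keyof typeof USAGE_SEGMENT;

// Builds `{tenantId}/{usage}/{fileId}.{ext}`; a leading dot on `ext` is dropped.
function buildStorageKey(tenantId: string, usage: FileUsage, fileId: string, ext: string): string {
  const cleanExt = ext.replace(/^\./, "").toLowerCase();
  return `${tenantId}/${USAGE_SEGMENT[usage]}/${fileId}.${cleanExt}`;
}

// Accumulates the SHA-256 digest and byte count chunk by chunk, so the
// upload never has to be buffered in full before hashing.
class UploadDigest {
  private readonly hash = createHash("sha256");
  private bytes = 0;

  update(chunk: Uint8Array): void {
    this.hash.update(chunk);
    this.bytes += chunk.byteLength;
  }

  // Call once, after the last chunk.
  finish(): { contentHash: string; byteSize: number } {
    return { contentHash: this.hash.digest("hex"), byteSize: this.bytes };
  }
}
```

In an upload handler, `update` would be called for each chunk as it is forwarded to storage, and `finish` would supply the `contentHash` and `byteSize` columns.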
4. Data model

```prisma
enum FileScanStatus {
  PENDING_SCAN
  CLEAN
  INFECTED
  SCAN_ERROR
}

enum FileUsage {
  PASSPORT
  IDENTITY_CARD
  // future: AVATAR, CV, CERTIFICATION, ...
}

/// Uploaded blob: one row per file. Referenced from Student / Teacher /
/// Staff / Referent / Guardian via {passport,identityCard}FileId.
/// Cross-entity sharing is not allowed: each row is owned by exactly one
/// referencing column.
model File {
  id           String         @id @default(uuid()) @db.Uuid
  tenantId     String         @map("tenant_id") @db.Uuid
  usage        FileUsage
  fileName     String         @map("file_name") @db.VarChar(255)
  mimeType     String         @map("mime_type") @db.VarChar(100)
  byteSize     Int            @map("byte_size")
  contentHash  String         @map("content_hash") @db.Char(64) /// SHA-256 hex
  storageKey   String         @map("storage_key") @db.VarChar(500) /// Backend-relative path
  scanStatus   FileScanStatus @default(PENDING_SCAN) @map("scan_status")
  uploadedById String?        @map("uploaded_by_id") @db.Uuid
  uploadedAt   DateTime       @default(now()) @map("uploaded_at")
  deletedAt    DateTime?      @map("deleted_at")
  createdAt    DateTime       @default(now()) @map("created_at")
  updatedAt    DateTime       @updatedAt @map("updated_at")

  tenant     Tenant @relation(fields: [tenantId], references: [id], onDelete: Cascade)
  uploadedBy User?  @relation(fields: [uploadedById], references: [id], onDelete: SetNull)

  @@index([tenantId, scanStatus])
  @@index([tenantId, uploadedAt])
  @@index([contentHash])
  @@map("files")
}

/// Single-use token for `GET /files/:id` redirects. Inserted on signed-URL
/// issuance, deleted on redemption. Expired rows are reaped by a periodic
/// cleanup job.
model FileDownloadToken {
  id         String    @id @default(uuid()) @db.Uuid
  tenantId   String    @map("tenant_id") @db.Uuid
  fileId     String    @map("file_id") @db.Uuid
  tokenHash  String    @unique @map("token_hash") /// SHA-256 of the URL token
  expiresAt  DateTime  @map("expires_at")
  redeemedAt DateTime? @map("redeemed_at")
  createdAt  DateTime  @default(now()) @map("created_at")

  @@index([tenantId, expiresAt])
  @@map("file_download_tokens")
}
```
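
To make the single-use semantics of `FileDownloadToken` concrete, here is a rough sketch with an in-memory map standing in for the table. `InMemoryTokenStore` and its method names are hypothetical. The sketch marks a row as redeemed via `redeemedAt` instead of deleting it; either approach enforces single use.

```typescript
import { createHash, randomBytes } from "node:crypto";

interface TokenRow {
  fileId: string;
  tokenHash: string;
  expiresAt: Date;
  redeemedAt: Date | null;
}

const TOKEN_TTL_MS = 60_000; // 60-second TTL from the decisions log

const sha256Hex = (value: string): string =>
  createHash("sha256").update(value).digest("hex");

// In-memory stand-in for the file_download_tokens table (illustration only).
class InMemoryTokenStore {
  private readonly rows = new Map<string, TokenRow>();

  // Returns the raw token for the URL; only its hash is stored.
  issue(fileId: string, now: Date): string {
    const token = randomBytes(32).toString("hex");
    const tokenHash = sha256Hex(token);
    this.rows.set(tokenHash, {
      fileId,
      tokenHash,
      expiresAt: new Date(now.getTime() + TOKEN_TTL_MS),
      redeemedAt: null,
    });
    return token;
  }

  // True only for the first redemption of an unexpired token for this file.
  redeem(fileId: string, token: string, now: Date): boolean {
    const row = this.rows.get(sha256Hex(token));
    if (!row || row.fileId !== fileId) return false;
    if (row.redeemedAt !== null) return false;
    if (row.expiresAt.getTime() <= now.getTime()) return false;
    row.redeemedAt = now;
    return true;
  }
}
```

Against a real database the check-and-mark step would need to be atomic (for example a conditional update) so that two concurrent requests cannot both redeem the same token.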
Retype of existing referencing columns

```prisma
// Student / Teacher / Staff / Referent / Guardian, each:
passportFileId     String? @map("passport_file_id") @db.Uuid
identityCardFileId String? @map("identity_card_file_id") @db.Uuid

passportFile     File? @relation("<Entity>PassportFile", fields: [passportFileId], references: [id], onDelete: SetNull)
identityCardFile File? @relation("<Entity>IdentityCardFile", fields: [identityCardFileId], references: [id], onDelete: SetNull)
```

(Each entity gets a uniquely named relation, with `<Entity>` replaced by the model name, to avoid Prisma's "ambiguous relation" error, since both columns target the same `File` model. The name is given once, as the first argument of `@relation`. Prisma also requires a matching back-relation field on `File` for each named relation.)
5. RBAC

```typescript
// ENTITIES:
{ key: 'files', label: 'Files', description: 'Uploaded documents and scans', sortOrder: <next> },

// SCOPES:
files: [
  { key: 'metadata', label: 'File Metadata', description: 'Filename, MIME, size, scan status', sortOrder: 1 },
],

// ACTIONS:
{ entityKey: 'files', key: 'create', label: 'Upload File', description: 'Upload a file blob', sortOrder: 1 },
{ entityKey: 'files', key: 'delete', label: 'Delete File', description: 'Soft-delete a file', sortOrder: 2 },

// ACTION_REQUIREMENTS:
'files.create': { scopeKeys: ['files.metadata'] },
'files.delete': { scopeKeys: ['files.metadata'] },
```

Read access is governed by the referencing scope. A caller can `GET /files/:id` only when they can read at least one referencing column. This can be implemented either as a service-side check that traverses back-pointers or, for performance, as a denormalized `referrerScope` column on `File` populated by the reference-write path. The choice is deferred to implementation.
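
The read rule can be sketched as a pure check over the references that point at a file. The `FileReference` shape and `canReadFile` name are hypothetical, and every scope key other than `students.documents` would be an assumption; how the references are looked up (back-pointer traversal or a denormalized column) is left open, as above.

```typescript
// Hypothetical description of one column that points at a file.
interface FileReference {
  entity: "student" | "teacher" | "staff" | "referent" | "guardian";
  column: "passportFileId" | "identityCardFileId";
  requiredReadScope: string; // e.g. "students.documents"
}

// A caller may read the file when they hold read access on at least one
// scope that guards a column referencing it.
function canReadFile(
  references: readonly FileReference[],
  grantedReadScopes: ReadonlySet<string>,
): boolean {
  return references.some((ref) => grantedReadScopes.has(ref.requiredReadScope));
}
```

One consequence of this formulation: a file with no referencing column is readable by nobody through this path, which matches the absence of a standalone `files.read` scope.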
6. Module layout

```text
src/files/
├── files.module.ts
├── files.controller.ts          # POST /files, GET /files/:id, GET /files/:id/meta, DELETE /files/:id
├── files.service.ts             # Orchestrator (storage + scan + DB + tokens)
├── files.queries.ts
├── files.swagger.ts
├── storage/
│   ├── file-storage.port.ts     # FileStoragePort interface (put / get / delete)
│   ├── local-file-storage.ts    # Disk-backed transport
│   └── s3-file-storage.ts       # AWS SDK v3 transport
├── virus-scan/
│   ├── virus-scan.port.ts       # VirusScanPort interface
│   ├── noop-virus-scan.ts       # Dev / tests
│   └── clamav-virus-scan.ts     # Prod
├── dto/
│   ├── upload-file-response.dto.ts
│   ├── file-metadata.dto.ts
│   └── ...
└── index.ts
```
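
A possible shape for the storage port and its env-based selection is sketched below. The spec names the three operations (put / get / delete) but not their signatures, so the signatures are assumptions, as are `InMemoryFileStorage` (a stand-in for the real transports) and `resolveStorageTransport`.

```typescript
// Assumed port shape: the operation names come from the module layout,
// the parameter and return types do not.
interface FileStoragePort {
  put(key: string, body: Uint8Array): Promise<void>;
  get(key: string): Promise<Uint8Array | null>;
  delete(key: string): Promise<void>;
}

// In-memory transport for illustration; a real LocalFileStorage or
// S3FileStorage would implement the same interface.
class InMemoryFileStorage implements FileStoragePort {
  private readonly objects = new Map<string, Uint8Array>();

  async put(key: string, body: Uint8Array): Promise<void> {
    this.objects.set(key, body.slice()); // copy so later caller mutations do not leak in
  }

  async get(key: string): Promise<Uint8Array | null> {
    return this.objects.get(key) ?? null;
  }

  async delete(key: string): Promise<void> {
    this.objects.delete(key);
  }
}

type StorageTransport = "local" | "s3";

// Parses FILE_STORAGE_TRANSPORT. Failing fast on a missing or unknown
// value is an assumption of this sketch, not a decision from the spec.
function resolveStorageTransport(value: string | undefined): StorageTransport {
  if (value === "local" || value === "s3") return value;
  throw new Error(`Unsupported FILE_STORAGE_TRANSPORT: ${String(value)}`);
}
```

Because the service depends only on `FileStoragePort`, tests can swap in the in-memory transport without touching disk or network.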
7. Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| POST | `/files` | `@RequireAction(FILES, 'create')` | `multipart/form-data`: `file` + `usage` field. Streams to storage, computes hash, queues virus scan, returns `{ id, status }`. |
| GET | `/files/:id/meta` | (referencing scope read) | Returns `FileMetadataDto`: filename, MIME, size, status, uploadedAt. |
| GET | `/files/:id` | (referencing scope read) | Issues a single-use signed download token, redirects (302) to `GET /files/:id/blob?t=<token>`. |
| GET | `/files/:id/blob?t=...` | token-only (no JWT) | Streams the file blob if token is valid, single-use, and unexpired. Returns 425 if `PENDING_SCAN`, 410 if `INFECTED` or `deletedAt != null`. |
| DELETE | `/files/:id` | `@RequireAction(FILES, 'delete')` | Soft-delete. Sets `deletedAt`. Storage object kept until retention expires. |
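
The status gating on the blob endpoint can be written as a small mapping. This is a sketch, not the spec's implementation: `blobResponseStatus` is a hypothetical helper, and the spec does not say how `SCAN_ERROR` is served, so the 503 below is a placeholder assumption.

```typescript
type FileScanStatus = "PENDING_SCAN" | "CLEAN" | "INFECTED" | "SCAN_ERROR";

interface FileState {
  scanStatus: FileScanStatus;
  deletedAt: Date | null;
}

// Maps a file's state to the HTTP status of the blob endpoint, assuming
// the download token has already been validated.
function blobResponseStatus(file: FileState): 200 | 410 | 425 | 503 {
  if (file.deletedAt !== null) return 410; // soft-deleted: Gone
  switch (file.scanStatus) {
    case "CLEAN":
      return 200; // stream the blob
    case "PENDING_SCAN":
      return 425; // Too Early: scan not finished
    case "INFECTED":
      return 410; // Gone
    case "SCAN_ERROR":
      return 503; // ASSUMPTION: behavior is not defined in the spec
  }
}
```

Checking `deletedAt` first means a soft-deleted file answers 410 regardless of its scan status.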
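8. Cleanup hooks

```typescript
// In files.service.ts (method on the files service)
// Soft-deletes the file that a referencing column pointed at before a change.
async invalidateOnReferenceChange(
  tx: Prisma.TransactionClient, // caller's transaction, so both writes commit together
  previousFileId: string | null,
  reason: 'replaced' | 'cleared' | 'entity_deleted', // not consumed by the update below
): Promise<void> {
  if (!previousFileId) return; // nothing was referenced before
  await tx.file.update({
    where: { id: previousFileId },
    data: { deletedAt: new Date() },
  });
}
```

Each entity's update / delete service path calls this hook inside its transaction:

```typescript
// e.g. ReferentsService.update()
const nextPassportFileId = dto.documents?.passportFileId;
// Skip when the field is absent or unchanged, so a PATCH that resends the
// current id does not soft-delete the file that is still in use.
if (nextPassportFileId !== undefined && nextPassportFileId !== existing.passportFileId) {
  await this.filesService.invalidateOnReferenceChange(
    tx,
    existing.passportFileId,
    nextPassportFileId === null ? 'cleared' : 'replaced',
  );
}
```

A periodic job (BullMQ when adopted; crontab + Nest schedule for v1) scans for `deletedAt < now - 30 days` and hard-deletes both the row and the storage object.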
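
The selection step of that periodic job could look like the sketch below. `retentionCutoff` and `selectForHardDelete` are hypothetical helpers; in practice the filter would be a database query, and the in-memory array is only for illustration.

```typescript
const RETENTION_DAYS = 30; // default retention from the decisions log
const DAY_MS = 24 * 60 * 60 * 1000;

interface SoftDeletableFile {
  id: string;
  storageKey: string;
  deletedAt: Date | null;
}

// Files soft-deleted strictly before this instant are eligible for hard delete.
function retentionCutoff(now: Date, retentionDays: number = RETENTION_DAYS): Date {
  return new Date(now.getTime() - retentionDays * DAY_MS);
}

// Picks the files whose retention window has elapsed.
function selectForHardDelete(
  files: readonly SoftDeletableFile[],
  now: Date,
): SoftDeletableFile[] {
  const cutoff = retentionCutoff(now).getTime();
  return files.filter(
    (file) => file.deletedAt !== null && file.deletedAt.getTime() < cutoff,
  );
}
```

For each selected file the job would then remove the storage object and the row; the order of those two steps, and how a partial failure is retried, is not fixed by this spec.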
9. Migration

Two migrations, in order:

Migration 1: Files tables

```sql
CREATE TABLE "files" ( ... );
CREATE TABLE "file_download_tokens" ( ... );
CREATE INDEX ...;
ALTER TABLE "files" ADD CONSTRAINT "files_tenant_id_fkey" FOREIGN KEY ...;
ALTER TABLE "files" ADD CONSTRAINT "files_uploaded_by_id_fkey" FOREIGN KEY ...;
```

Pure additions; safe.

Migration 2: Retype existing FK columns to UUID

This is the load-bearing migration and must run when the interim columns are empty (or backfilled to all-NULL): the cast from arbitrary `String?` to `String? @db.Uuid` will fail on any non-UUID value. Pre-flight:

```sql
-- Check for non-NULL, non-UUID values
SELECT 'students' AS table, id, passport_file_id
FROM students
WHERE passport_file_id IS NOT NULL
  AND passport_file_id !~ '^[0-9a-f]{8}-...';
-- ... repeat per (entity × column)
```

If any row matches, NULL it out (the data is interim mock anyway) before running the migration. Hazard checklist:

- `ALTER COLUMN ... TYPE UUID USING passport_file_id::uuid`: fails on non-UUID values. Pre-flight protects against it. (Hazard #4.)
- `ADD CONSTRAINT FOREIGN KEY`: fails on orphan values. The `File` table is empty when this runs, so any non-NULL referencing value is an orphan. Pre-flight NULL-out is mandatory. (Hazard #7.)

Recommended approach: ship Migration 2 with the SQL pre-flight inlined as an `UPDATE ... SET passport_file_id = NULL WHERE passport_file_id IS NOT NULL;` for every entity × column pair in the same migration. This documents that the change is a clean cutover from mocks to real FKs.
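
Since the same three statements repeat for every entity × column pair, the body of Migration 2 could be generated rather than hand-written. The sketch below makes several assumptions: only the `students` table name appears in this spec, so the other four table names are guesses, and the constraint naming and `ON DELETE SET NULL` clause mirror the Prisma relation definition rather than any reviewed SQL.

```typescript
// ASSUMPTION: only "students" is confirmed by the spec; the rest are guesses.
const TABLES = ["students", "teachers", "staff", "referents", "guardians"] as const;
const COLUMNS = ["passport_file_id", "identity_card_file_id"] as const;

// Emits NULL-out, retype, and FK statements for each table × column pair.
function buildCutoverSql(): string[] {
  const statements: string[] = [];
  for (const table of TABLES) {
    for (const column of COLUMNS) {
      statements.push(
        // 1. Clear interim mock tokens so the cast and FK cannot fail.
        `UPDATE "${table}" SET "${column}" = NULL WHERE "${column}" IS NOT NULL;`,
        // 2. Retype the column to UUID.
        `ALTER TABLE "${table}" ALTER COLUMN "${column}" TYPE UUID USING "${column}"::uuid;`,
        // 3. Add the FK to files(id).
        `ALTER TABLE "${table}" ADD CONSTRAINT "${table}_${column}_fkey" ` +
          `FOREIGN KEY ("${column}") REFERENCES "files"("id") ON DELETE SET NULL;`,
      );
    }
  }
  return statements;
}
```

Printing `buildCutoverSql().join("\n")` would give a draft migration body to review by hand before committing it.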
10. Testing
| Suite | Coverage |
|---|---|
| Unit (storage ports) | Local backend round-trip; S3 backend mocked with aws-sdk-client-mock. |
| Unit (virus-scan ports) | Noop transitions to CLEAN; ClamAV mocked sockets; INFECTED → reference-side cleanup hook fires. |
| Unit (FilesService) | Upload happy path; oversized rejection; MIME mismatch rejection; orphan cleanup on reference change; signed-token issuance / redemption / expiry / single-use. |
| Integration | E2E upload-then-attach-to-Student-then-replace, asserting old File row goes to soft-deleted. |
| Migration | Pre-flight NULL-out idempotence; UUID cast on the empty / sanitized state; FK constraint accepts only valid file ids post-cutover. |
11. Open questions

- Storage prefix per environment. Do dev / stage / prod buckets share a single AWS account, or use separate ones? Affects IAM policy shape; defer to infra spec.
- Virus-scan SLA. ClamAV inline (synchronous, latency cost on upload) vs queued (async, status flip happens later)? Default to queued; revisit if files commonly sit in `PENDING_SCAN` long enough to surface UX complaints.
- Hard delete on entity removal. When a Student / Teacher / Staff / Referent / Guardian is hard-deleted (where supported), should referenced files be soft-deleted (consistent with reference-replacement) or hard-deleted immediately (legal compliance)? Default to soft-delete; flag for legal review before prod.
- Quota / abuse. No per-tenant upload quota in v1. If abusive uploads become a problem, surface a `MAX_FILES_PER_DAY` env var and reject with 429.
12. Cross-references

- `docs/superpowers/specs/2026-04-27-missing-fields-design.md` §11.2 (file-mock invariant), §11.3 ("Files model — non-blocking").
- `docs/superpowers/specs/2026-04-27-guardian-model-design.md` §4 (Guardian carries the same file-mock columns).
- `docs/12-migrations.md` for the hazard checklist (Migration 2 hits hazard #4 + #7).
- `docs/04-rbac.md` for entity / scope / action conventions.
- The invitations spec (`2026-04-20-invitations-design.md`) for the precedent of port-based external integrations (`MailerPort` ↔ `FileStoragePort` / `VirusScanPort`).