Overview
Backend for Patagonia Dreams — a tourism operator with 180k+ passengers/year and 7,000+ five-star Google reviews. Built and led the platform from scratch: transactional reservations and payments (Mercado Pago, Stripe, Pix), multi-tenant backoffice, and bidirectional sync with an external activity panel. The core invariant: a reservation is only 'paid' when the webhook confirms it — never based on client state. Webhooks are HMAC-validated and processed idempotently by event_id. Availability is locked pessimistically (SELECT FOR UPDATE) to serialize concurrent bookings on the same slot. Identity via AWS Cognito with JWKS token verification; all critical config from AWS Secrets Manager. Stack: Django, DRF, PostgreSQL, AWS (SES, Cognito, Secrets Manager, ECR/K8s).
Architecture & Design
Transactional Flow Overview
System Invariants
- A payment intent cannot transition from failed to succeeded.
- Reservation "paid" is set only after a verified webhook; frontend and redirect cannot set it.
- Webhook events are processed idempotently by provider event_id.
- Availability for a slot is updated under pessimistic lock (SELECT FOR UPDATE); no optimistic commit.
- Idempotency keys are scoped per client and stored; a duplicate key returns the original response.
- Payment and reservation state changes for a webhook occur in a single database transaction.
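The invariants around event_id deduplication, the failed-to-succeeded guard, and the single transaction can be sketched together. This is a minimal illustration, not the production code: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the table names, status strings, and the `apply_webhook` function are all hypothetical.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create three illustrative tables (all names are hypothetical)."""
    conn.executescript(
        """
        CREATE TABLE reservations (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE payments (
            id INTEGER PRIMARY KEY,
            reservation_id INTEGER NOT NULL REFERENCES reservations(id),
            status TEXT NOT NULL
        );
        CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
        """
    )


def apply_webhook(conn: sqlite3.Connection, event_id: str,
                  payment_id: int, new_status: str) -> str:
    """Apply one already-verified event.

    Returns 'applied', 'duplicate' or 'rejected'.
    """
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            # The primary key on event_id makes a replayed event fail here.
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
            row = conn.execute(
                "SELECT status, reservation_id FROM payments WHERE id = ?",
                (payment_id,),
            ).fetchone()
            if row is None:
                raise LookupError(f"unknown payment {payment_id}")
            current_status, reservation_id = row
            if current_status == "failed" and new_status == "succeeded":
                # The event is recorded as seen, but no state is changed.
                return "rejected"
            conn.execute(
                "UPDATE payments SET status = ? WHERE id = ?",
                (new_status, payment_id),
            )
            if new_status == "succeeded":
                # Reservation becomes 'paid' in the same transaction.
                conn.execute(
                    "UPDATE reservations SET status = 'paid' WHERE id = ?",
                    (reservation_id,),
                )
            return "applied"
    except sqlite3.IntegrityError:
        # Duplicate event_id: the transaction was rolled back, nothing changed.
        return "duplicate"
```

Because the event record and both state updates share one transaction, a crash between them leaves either all three written or none, and a provider retry of the same event becomes a no-op.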
Architecture Decision Records
- ADR-01: Webhooks as single source of truth for payment status; client redirect cannot set 'paid'
- ADR-02: Pessimistic locking (SELECT FOR UPDATE) on availability slot; concurrent bookings serialize, not race
- ADR-03: Idempotency keys on reservation creation; event_id deduplication on all incoming webhooks
- ADR-04: HMAC validation on every webhook payload before processing
- ADR-05: AWS Cognito as sole identity entry point; ID token verified with JWKS before trusting any user data
- ADR-06: All critical config via AWS Secrets Manager; no secrets in code or repo
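As a rough illustration of ADR-04, the check below verifies an HMAC-SHA256 signature over the raw request body using only the standard library. The scheme shown (a hex digest of the body alone) is an assumption made for the sketch; each provider defines its own header format and signed payload, so a real handler would follow that provider's documentation. The function names are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed scheme, for illustration)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Return True only if the received signature matches the body."""
    expected = sign(secret, body)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature were correct.
    return hmac.compare_digest(expected.encode(), received_signature.encode())
```

The signature must be computed over the exact bytes received, before any JSON parsing or re-serialization, otherwise a valid payload can fail verification.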
Scale & Constraints
- Request volume: Operator with 180k+ passengers/year. Online reservations through the platform, with webhook bursts of up to ~50/min at peak.
- Concurrency: Pessimistic lock on the availability row for each slot; single writer for payment state. No cross-slot locking.
- External dependencies: Mercado Pago, Stripe, Pix (payments); external activity Panel (availability, rates, and bidirectional booking sync); AWS Cognito, SES, Secrets Manager; Google (OAuth, My Business, Merchant Center); Meta. Webhooks are asynchronous; payment status is accepted only via webhook.
- Failure modes: Provider timeout or webhook delay → reservation stays pending until the webhook arrives or manual reconciliation runs. Duplicate webhook → handled idempotently by event_id. Cognito or Panel down → degraded auth or catalog sync.
- Data consistency: Single DB transaction for reservation + payment on webhook. Reservation "paid" only after webhook; frontend cannot set paid. Cognito ↔ Django user sync via get_or_create and ID token verification.
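The per-slot serialization described under Concurrency can be illustrated with an in-process analogy. This sketch replaces the database row lock with one `threading.Lock` per slot, so it only shows the shape of the behaviour: bookings on the same slot queue up, bookings on different slots do not block each other. In Django the equivalent would typically be `select_for_update()` inside `transaction.atomic()`; the `SlotInventory` class and its method names are hypothetical.

```python
import threading


class SlotInventory:
    """In-memory stand-in for availability rows, one lock per slot."""

    def __init__(self, capacity_by_slot: dict):
        self._remaining = dict(capacity_by_slot)
        # One lock per slot: no cross-slot locking.
        self._locks = {slot: threading.Lock() for slot in capacity_by_slot}

    def book(self, slot: str, seats: int) -> bool:
        """Reserve seats if available; concurrent callers on a slot serialize."""
        with self._locks[slot]:
            # Check and decrement happen under the same lock, so two
            # callers can never both see the last seat as free.
            if self._remaining[slot] < seats:
                return False
            self._remaining[slot] -= seats
            return True

    def remaining(self, slot: str) -> int:
        with self._locks[slot]:
            return self._remaining[slot]
```

With 20 concurrent single-seat bookings against a slot of capacity 5, exactly 5 succeed and the rest are refused, with no retries needed on the caller's side.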
What was explicitly rejected
- ✕ Frontend or redirect callback as source of "paid": Redirects and client state are unreliable; provider retries and multiple tabs would allow double-apply or missed updates.
- ✕ Optimistic locking on availability: Conflict rate on hot slots would cause high retry and poor UX; pessimistic lock gave predictable behaviour at observed load.
- ✕ Microservices per domain (payments, reservations, catalog): Operational and consistency cost (distributed transactions, eventual consistency) not justified for current scale; modular monolith with clear boundaries chosen instead.
- ✕ CSV export for operations: Excel/CSV formula injection risk; replaced with JSON response and controlled data only.
- ✕ Secrets or sensitive URLs in code or repo: All critical config (FRONTEND_URL, Cognito, Stripe, Panel, etc.) via env from AWS Secrets Manager.
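For the env-based configuration in the last item, a fail-fast loader might look like the sketch below. It assumes the values from Secrets Manager have already been injected into the process environment. Only FRONTEND_URL is named in this document; the other keys, the `Settings` class, and `load_settings` are hypothetical.

```python
import os
from dataclasses import dataclass

# FRONTEND_URL comes from the text above; the other two names are made up
# for illustration.
REQUIRED_KEYS = ("FRONTEND_URL", "STRIPE_WEBHOOK_SECRET", "COGNITO_USER_POOL_ID")


class MissingConfigError(RuntimeError):
    """Raised at startup when a required setting is absent or empty."""


@dataclass(frozen=True)
class Settings:
    frontend_url: str
    stripe_webhook_secret: str
    cognito_user_pool_id: str


def load_settings(env=None) -> Settings:
    """Build Settings from the environment, refusing to start if incomplete."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise MissingConfigError("missing required config: " + ", ".join(missing))
    return Settings(
        frontend_url=env["FRONTEND_URL"],
        stripe_webhook_secret=env["STRIPE_WEBHOOK_SECRET"],
        cognito_user_pool_id=env["COGNITO_USER_POOL_ID"],
    )
```

Failing at startup rather than at first use means a misconfigured deployment never begins serving traffic with a default or empty secret.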