Case Study

Transactional Booking & Payment Platform

Payments • Webhooks • Concurrency

Overview

Backend for Patagonia Dreams — a tourism operator with 180k+ passengers/year and 7,000+ five-star Google reviews. Built and led the platform from scratch: transactional reservations and payments (Mercado Pago, Stripe, Pix), multi-tenant backoffice, and bidirectional sync with an external activity panel. The core invariant: a reservation is only 'paid' when the webhook confirms it — never based on client state. Webhooks are HMAC-validated and processed idempotently by event_id. Availability is locked pessimistically (SELECT FOR UPDATE) to serialize concurrent bookings on the same slot. Identity via AWS Cognito with JWKS token verification; all critical config from AWS Secrets Manager. Stack: Django, DRF, PostgreSQL, AWS (SES, Cognito, Secrets Manager, ECR/K8s).

Architecture & Design

Transactional Flow Overview

Internal System

System Invariants

  • A payment intent cannot transition from failed to succeeded.
  • Reservation "paid" is set only after a verified webhook; frontend and redirect cannot set it.
  • Webhook events are processed idempotently by provider event_id.
  • Availability for a slot is updated under pessimistic lock (SELECT FOR UPDATE); no optimistic commit.
  • Idempotency keys are scoped per client and stored; a duplicate key returns the original response.
  • Payment and reservation state changes for a webhook occur in a single database transaction.
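
As an illustration of how several of these invariants fit together, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for PostgreSQL. The table names, column names, status values, and the apply_webhook function are all hypothetical and not taken from the actual codebase. It assumes the webhook signature has already been verified.

```python
import sqlite3

# Hypothetical schema, for illustration only (the real platform uses PostgreSQL).
SCHEMA = """
CREATE TABLE reservation (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE payment_intent (
    id INTEGER PRIMARY KEY,
    reservation_id INTEGER NOT NULL,
    status TEXT NOT NULL
);
CREATE TABLE webhook_event (event_id TEXT PRIMARY KEY);
"""

# Allowed payment-intent transitions; 'failed' and 'succeeded' are terminal,
# so failed -> succeeded is impossible by construction.
ALLOWED = {
    "pending": {"succeeded", "failed"},
    "failed": set(),
    "succeeded": set(),
}


def apply_webhook(conn, event_id, intent_id, new_status):
    """Apply one already-verified webhook event.

    Returns 'applied', 'duplicate', or 'rejected'.
    """
    with conn:  # one transaction: commit on normal exit, rollback on exception
        try:
            # The primary key on event_id makes redelivery a detectable no-op.
            conn.execute(
                "INSERT INTO webhook_event (event_id) VALUES (?)", (event_id,)
            )
        except sqlite3.IntegrityError:
            return "duplicate"

        row = conn.execute(
            "SELECT status, reservation_id FROM payment_intent WHERE id = ?",
            (intent_id,),
        ).fetchone()
        if row is None:
            # Raising rolls back the event record so the provider can retry.
            raise LookupError(f"unknown payment intent {intent_id}")

        current, reservation_id = row
        if new_status not in ALLOWED.get(current, set()):
            return "rejected"  # event is recorded, state is left untouched

        conn.execute(
            "UPDATE payment_intent SET status = ? WHERE id = ?",
            (new_status, intent_id),
        )
        if new_status == "succeeded":
            # The only code path in this sketch that marks a reservation paid.
            conn.execute(
                "UPDATE reservation SET status = 'paid' WHERE id = ?",
                (reservation_id,),
            )
        return "applied"
```

In this sketch, deduplication and both state changes share one transaction, so a crash midway leaves neither a recorded event nor a half-updated reservation.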

Architecture Decision Records

  • ADR-01: Webhooks as single source of truth for payment status; client redirect cannot set 'paid'
  • ADR-02: Pessimistic locking (SELECT FOR UPDATE) on availability slot; concurrent bookings serialize, not race
  • ADR-03: Idempotency keys on reservation creation; event_id deduplication on all incoming webhooks
  • ADR-04: HMAC validation on every webhook payload before processing
  • ADR-05: AWS Cognito as sole identity entry point; ID token verified with JWKS before trusting any user data
  • ADR-06: All critical config via AWS Secrets Manager; no secrets in code or repo
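
The HMAC check in ADR-04 can be sketched generically with the Python standard library. This is an assumption-laden illustration, not the platform's code: the function name is hypothetical, SHA-256 with a hex-encoded signature is assumed, and each real provider (Mercado Pago, Stripe) defines its own header format and signed-payload construction, which should be taken from that provider's documentation.

```python
import hashlib
import hmac


def verify_webhook_signature(secret: bytes, payload: bytes, signature_hex: str) -> bool:
    """Return True only if signature_hex is the HMAC-SHA256 of the raw payload.

    Assumes a hex-encoded signature; real providers wrap this in their own
    header formats, often with a timestamp included in the signed data.
    """
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))
```

The signature is computed over the raw request bytes, before any JSON parsing, because re-serializing a parsed body can change whitespace or key order and invalidate the signature.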

Scale & Constraints

Request volume
Operator with 180k+ passengers/year. Reservations arrive through the online platform, with webhook bursts of up to ~50/min at peak.
Concurrency
Pessimistic lock on availability row per slot; single writer for payment state. No cross-slot locking.
External dependencies
Mercado Pago, Stripe, Pix (payments); external activity Panel (availability, rates, and bidirectional booking sync); AWS Cognito, SES, Secrets Manager; Google (OAuth, My Business, Merchant Center); Meta. Webhooks are async; payment status only via webhook.
Failure modes
Provider timeout or webhook delay → reservation stays pending until webhook or manual reconciliation. Duplicate webhook → idempotent by event_id. Cognito/Panel down → degraded auth or catalog sync.
Data consistency
Single DB transaction for reservation + payment on webhook. Reservation "paid" only after webhook; frontend cannot set paid. Cognito ↔ Django user sync via get_or_create and ID token verification.
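
The per-slot pessimistic lock described under Concurrency might look like the sketch below. It is written against a generic DB-API cursor on PostgreSQL with psycopg-style %s placeholders. The availability_slot table, its remaining column, and the reserve_seats function are hypothetical names chosen for illustration. The caller is assumed to have an open transaction, since the row lock is held until that transaction ends.

```python
class SlotUnavailable(Exception):
    """Raised when the slot does not exist or has too few seats left."""


def reserve_seats(cur, slot_id: int, seats: int) -> int:
    """Decrement availability for one slot under a row lock.

    `cur` is a DB-API cursor inside an open PostgreSQL transaction.
    Returns the remaining seat count after the reservation.
    """
    if seats <= 0:
        raise ValueError("seats must be positive")

    # FOR UPDATE locks only this slot's row. A concurrent booking on the same
    # slot blocks here until this transaction commits or rolls back; bookings
    # on other slots are unaffected.
    cur.execute(
        "SELECT remaining FROM availability_slot WHERE id = %s FOR UPDATE",
        (slot_id,),
    )
    row = cur.fetchone()
    if row is None or row[0] < seats:
        raise SlotUnavailable(f"slot {slot_id} cannot fit {seats} seat(s)")

    remaining = row[0] - seats
    cur.execute(
        "UPDATE availability_slot SET remaining = %s WHERE id = %s",
        (remaining, slot_id),
    )
    return remaining
```

Because the check and the decrement happen while the row is locked, two concurrent requests for the last seat serialize: the second one reads the updated count and fails cleanly instead of overbooking.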

What was explicitly rejected

  • Frontend or redirect callback as source of "paid"

    Redirects and client state are unreliable; provider retries and multiple tabs would allow double-apply or missed updates.

  • Optimistic locking on availability

    The conflict rate on hot slots would cause frequent retries and poor UX; pessimistic locking gave predictable behaviour at observed load.

  • Microservices per domain (payments, reservations, catalog)

    Operational and consistency cost (distributed transactions, eventual consistency) not justified for current scale; modular monolith with clear boundaries chosen instead.

  • CSV export for operations

    Excel/CSV formula injection risk; replaced with JSON response and controlled data only.

  • Secrets or sensitive URLs in code or repo

    All critical config (FRONTEND_URL, Cognito, Stripe, Panel, etc.) via env from AWS Secrets Manager.