Serverless Webhooks

Rocky Warren @therockstorm
Principal Software Engineer @Dwolla


@snap[north] Outline @snapend @title[Outline]

@snap[west list-content span-100] @ul

  • Original architecture and its limitations
  • New architecture with code walkthrough
  • Rollout strategy
  • Lessons learned
  • Results @ulend @snapend

@snap[north] Background @snapend @title[Background]

@snap[west list-content span-100] @ul

  • Dwolla provides a payment platform API
    • Bank transfers, user management, instant bank account verification
  • Certain actions trigger a webhook (also called web callback or push API)
    • HTTP POST to partner API providing real-time event details
    • Eliminates polling for updates @ulend @snapend
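
To make the webhook idea concrete, here is a minimal sketch of the HTTP POST a sender might build for a partner. The event fields (`id`, `topic`, `resourceId`, `timestamp`), the example URL, and the `buildWebhookRequest` helper are assumptions for illustration, not Dwolla's actual schema or code.

```typescript
// Hypothetical event shape; field names are illustrative assumptions.
interface WebhookEvent {
  id: string;
  topic: string;
  resourceId: string;
  timestamp: string;
}

interface WebhookRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) the request that would be POSTed to the partner's URL.
function buildWebhookRequest(partnerUrl: string, event: WebhookEvent): WebhookRequest {
  return {
    method: "POST",
    url: partnerUrl,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  };
}

const exampleRequest = buildWebhookRequest("https://partner.example.com/webhooks", {
  id: "evt-1",
  topic: "transfer_completed",
  resourceId: "transfer-42",
  timestamp: "2019-01-01T00:00:00.000Z",
});
```

The partner receives the event as soon as it happens, so it never has to poll the API for status changes.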

@snap[north] Original Architecture @snapend @title[Original Architecture]

@snap[south span-85] https://mermaidjs.github.io/mermaid-live-editor/#/edit/eyJjb2RlIjoiZ3JhcGggTFJcbiAgICBCW1NydmMgQV0gLS4tPiBEXG4gICAgQ1tTcnZjIEJdIC0uLT4gRFxuICAgIERbU3Vic10gLS0-IEUoZmE6ZmEtZGF0YWJhc2UpXG4gICAgQVtEd29sbGEgZmE6ZmEtc2VydmVyXSAtLT4gRFxuICAgIFxuICAgIEZbSGFuZGxlcnNdLS4tPkRcbiAgICBELS4tPiBGXG4gICAgRi0tPkdbQSBmYTpmYS1zZXJ2ZXJdXG4gICAgRi0tPkhbQiBmYTpmYS1zZXJ2ZXJdXG4gICAgRi0uLT58MTV8RlxuICAgIEYtLi0-fDQ1fEZcbiAgICBGLS4tPnwuLi58RlxuXG4gICAgY2xhc3NEZWYgZGIgZmlsbDojQUREOEU2XG4gICAgY2xhc3MgRSBkYjsiLCJtZXJtYWlkIjp7InRoZW1lIjoiZGVmYXVsdCJ9fQ @snapend

Note:

  • Partners create webhook subscriptions indicating URL for us to call
  • Subscriptions receives events from services and publishes to single, shared queue
  • Handlers read off queue, call partner API, and publish result
  • Subscriptions receives and stores result

@snap[north] Limitations @snapend @title[Limitations]

@snap[west list-content span-100] @ul

  • At peak load, webhooks were delayed ~60 minutes, defeating their purpose
    • Partner processes (notifications, etc.) are then delayed
  • One slow-to-respond or high-volume partner affects everyone
  • Scaling handlers causes parallel API calls for everyone
  • Non-trivial per-partner configuration @ulend @snapend

@snap[north] New Architecture @snapend @title[New Architecture]

@snap[south span-75] https://mermaidjs.github.io/mermaid-live-editor/#/edit/eyJjb2RlIjoiZ3JhcGggTFJcbiAgICBCW1NydmMgQV0gLS4tPiBEXG4gICAgQ1tTcnZjIEJdIC0uLT4gRFxuICAgIERbU3Vic10gLS0-IEUoZmE6ZmEtZGF0YWJhc2UpXG4gICAgQVtEd29sbGEgZmE6ZmEtc2VydmVyXSAtLT4gRFxuICAgIFxuICAgIEZbSGFuZGxlcnNdLS4tPkRcbiAgICBGLS4tPnxSZXRyeXxGXG4gICAgRC0uLT4gRlxuICAgIEYtLT5HW0EgZmE6ZmEtc2VydmVyXVxuXG4gICAgSFtIYW5kbGVyc10tLi0-RFxuICAgIEgtLi0-fFJldHJ5fEhcbiAgICBELS4tPiBIXG4gICAgSC0tPklbQiBmYTpmYS1zZXJ2ZXJdXG5cbiAgICBjbGFzc0RlZiBkYiBmaWxsOiNBREQ4RTZcbiAgICBjbGFzcyBFIGRiOyIsIm1lcm1haWQiOnsidGhlbWUiOiJkZWZhdWx0In19 @snapend

Note:

  • One queue per partner, dynamically provisioned on subscription creation
  • Individually configurable depending on scalability of partner APIs
  • Slow or high-volume partners only impact themselves
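
The isolation argument above can be illustrated with a toy model. This sketch assumes one worker per queue and made-up job durations. It only compares when each partner's last job finishes under a shared queue versus per-partner queues, and does not model SQS or Lambda behavior.

```typescript
interface Job {
  partner: string;
  durationMs: number;
}

// Single shared queue drained in order by one worker:
// every job waits for all jobs ahead of it, regardless of partner.
function sharedQueueFinishTimes(jobs: Job[]): Map<string, number> {
  const finish = new Map<string, number>();
  let clock = 0;
  for (const job of jobs) {
    clock += job.durationMs;
    finish.set(job.partner, clock); // time the partner's latest job completed
  }
  return finish;
}

// One queue (and worker) per partner:
// a job only waits for earlier jobs belonging to the same partner.
function perPartnerFinishTimes(jobs: Job[]): Map<string, number> {
  const finish = new Map<string, number>();
  for (const job of jobs) {
    finish.set(job.partner, (finish.get(job.partner) ?? 0) + job.durationMs);
  }
  return finish;
}

// Made-up workload: two slow deliveries queued ahead of one fast delivery.
const exampleJobs: Job[] = [
  { partner: "slow", durationMs: 10000 },
  { partner: "slow", durationMs: 10000 },
  { partner: "fast", durationMs: 100 },
];
```

With `exampleJobs`, the fast partner finishes at 20100 ms on the shared queue but at 100 ms with its own queue, while the slow partner finishes at 20000 ms either way.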

@snap[north] Why SQS and Lambda? @snapend @title[Why SQS and Lambda?]

@snap[west list-content span-100] @ul

  • Creating queue/handler with AWS SDKs simpler than custom code
  • Spiky workload perfect for pay-per-use pricing, auto-scaling
  • No server management maximizes time spent adding value and decreases attack surface
  • ~1 minute deployments reduce development cycle @ulend @snapend

Code Walkthrough

Note:

  • webhook-provisioner: Create, delete, disable
  • webhook-handler: postHook, publishResult, requeue, error, update-all
  • cloudwatch-alarm-to-slack
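
As a rough sketch of how the handler steps named above might fit together: the names `postHook`, `publishResult`, and `requeue` come from the list, but their signatures, the message shape, and the retry delays are assumptions, and dependencies are injected so the flow runs without AWS. The one AWS fact relied on is that SQS caps per-message delay at 900 seconds.

```typescript
// Hypothetical message shape for one webhook delivery attempt.
interface WebhookMessage {
  webhookId: string;
  url: string;
  body: string;
  attempt: number; // 0 for the first delivery attempt
}

// Injected dependencies; signatures are illustrative assumptions.
interface Deps {
  postHook: (url: string, body: string) => Promise<number>; // resolves to HTTP status
  publishResult: (webhookId: string, statusCode: number, attempt: number) => Promise<void>;
  requeue: (msg: WebhookMessage, delaySeconds: number) => Promise<void>;
}

type Action =
  | { kind: "done" }
  | { kind: "requeue"; delaySeconds: number }
  | { kind: "give-up" };

const SQS_MAX_DELAY_SECONDS = 900; // SQS limit on per-message delay
const RETRY_DELAYS_SECONDS = [60, 300, 900]; // assumed schedule, not the real one

// Pure decision: what to do after an attempt that ended with statusCode.
function nextAction(statusCode: number, attempt: number, delays: number[] = RETRY_DELAYS_SECONDS): Action {
  if (statusCode >= 200 && statusCode < 300) return { kind: "done" };
  const delay = delays[attempt];
  if (delay === undefined) return { kind: "give-up" };
  return { kind: "requeue", delaySeconds: Math.min(delay, SQS_MAX_DELAY_SECONDS) };
}

async function handle(msg: WebhookMessage, deps: Deps): Promise<Action> {
  let statusCode = 0; // 0 stands in for "no HTTP response" (e.g. a network error)
  try {
    statusCode = await deps.postHook(msg.url, msg.body);
  } catch {
    // keep statusCode at 0 so the attempt is treated as failed
  }
  await deps.publishResult(msg.webhookId, statusCode, msg.attempt);
  const action = nextAction(statusCode, msg.attempt);
  if (action.kind === "requeue") {
    await deps.requeue({ ...msg, attempt: msg.attempt + 1 }, action.delaySeconds);
  }
  return action;
}
```

Keeping the retry decision in a pure function (`nextAction`) makes it testable without mocking HTTP calls or queues.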

@snap[north] Rollout @snapend @title[Rollout]

@snap[west list-content span-100] @ul

  • Whitelist test partners in Sandbox via Feature Flags
  • Enable globally in Sandbox
  • Whitelist beta partners in Prod
  • Monitor, gather feedback
  • Migrate in batches based on webhook volume @ulend @snapend
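
The rollout steps above could be modeled roughly as follows. This is a sketch under assumptions: the flag shape, the `dailyWebhooks` field, and migrating lowest-volume partners first are all hypothetical choices, not details from the rollout itself.

```typescript
interface Partner {
  id: string;
  dailyWebhooks: number; // hypothetical volume metric
}

// Hypothetical flag state: enabled for everyone, or only for whitelisted partner ids.
interface RolloutFlag {
  enabledGlobally: boolean;
  whitelist: Set<string>;
}

// A partner is routed to the new architecture if the flag is global or they are whitelisted.
function usesNewArchitecture(flag: RolloutFlag, partnerId: string): boolean {
  return flag.enabledGlobally || flag.whitelist.has(partnerId);
}

// Sort by volume (ascending, an assumed ordering) and split into fixed-size batches.
function migrationBatches(partners: Partner[], batchSize: number): Partner[][] {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const sorted = [...partners].sort((a, b) => a.dailyWebhooks - b.dailyWebhooks);
  const batches: Partner[][] = [];
  for (let i = 0; i < sorted.length; i += batchSize) {
    batches.push(sorted.slice(i, i + batchSize));
  }
  return batches;
}
```

Migrating low-volume partners first limits the blast radius of early problems; a team could just as reasonably order the batches differently.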

@snap[north] Lessons Learned @snapend @title[Lessons Learned]

@snap[west list-content span-100] @ul

  • Audit dependencies to keep bundle size and memory usage low (e.g. HTTP libs)
  • CloudWatch Logs can get expensive; log retention defaults to never expiring
  • Follow AWS best practices: avoid throttling, use dead-letter queues, make handlers idempotent, tune batch size
  • Lambda errors can be elusive; CloudWatch Logs Insights helps
  • Include high cardinality values in log messages, take charge of monitors/alerts @ulend @snapend
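
Idempotency matters because SQS standard queues deliver at least once, so a handler can see the same message twice. A minimal sketch of the idea, assuming an in-memory `ProcessedIds` store as a stand-in for whatever durable, shared storage a real handler would need:

```typescript
// In-memory stand-in for a durable store (an assumption for illustration only).
class ProcessedIds {
  private seen = new Set<string>();

  // Returns true the first time an id is claimed, false for duplicates.
  claim(id: string): boolean {
    if (this.seen.has(id)) return false;
    this.seen.add(id);
    return true;
  }
}

// Run work only if this message id has not been processed before.
function processOnce<T>(store: ProcessedIds, messageId: string, work: () => T): T | undefined {
  return store.claim(messageId) ? work() : undefined;
}
```

A duplicate delivery of the same `messageId` becomes a no-op instead of a second call to the partner.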

Note:

  • TypeScript: Painter's Tape for JavaScript
  • 404 from customer, logs contained id, url, status with no issues
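
A sketch of what logging high-cardinality values might look like, using the `id`, `url`, and `status` fields from the note above. The `formatDeliveryLog` helper, the `level` rule, and the message text are hypothetical. Emitting one JSON object per line keeps the fields easy to query in a log tool.

```typescript
// Fields mirror the note above; the helper itself is hypothetical.
interface DeliveryLog {
  id: string;
  url: string;
  status: number;
}

// Serialize one delivery attempt as a single JSON log line.
function formatDeliveryLog(entry: DeliveryLog): string {
  return JSON.stringify({
    level: entry.status >= 400 ? "warn" : "info",
    message: "webhook delivery attempt",
    id: entry.id,
    url: entry.url,
    status: entry.status,
  });
}
```

With lines like this, a question such as "why did this partner see a 404?" can be answered by filtering on a specific `id` or `url`.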

@snap[north] Lessons Learned @snapend @title[Lessons Learned]

@snap[west list-content span-100] @ul

  • One Lambda serving multiple queues limits configuration options
  • TypeScript, Serverless Framework, aws-cdk are great
  • Think twice before dynamically provisioning resources: plan for concurrency and be prepared to retry
  • Understand AWS Account Limits (IAM, Lambda, SQS, CloudFormation Stacks, etc.)
  • Utilize tagging to manage lots of resources @ulend @snapend

@snap[north] Results @snapend @title[Results]

@snap[west list-content span-100] @ul

  • Highly scalable: delay at peak load dropped from ~60 minutes to under one minute
  • Configurable to individual partner's needs
  • Low costs and maintenance, free when not in use @ulend @snapend

@snap[north] Free Code! @snapend @title[Free Code!]

@snap[west list-content span-100] @ul


Questions?