Rocky Warren @therockstorm
Principal Software Engineer @Dwolla
@snap[north] Outline @snapend @title[Outline]
@snap[west list-content span-100] @ul
- Original architecture and its limitations
- New architecture with code walkthrough
- Rollout strategy
- Lessons learned
- Results @ulend @snapend
@snap[north] Background @snapend @title[Background]
@snap[west list-content span-100] @ul
- Dwolla provides a payment platform API
- Bank transfers, user management, instant bank account verification
- Certain actions trigger a webhook (also called web callback or push API)
- HTTP POST to partner API providing real-time event details
- Eliminates polling for updates @ulend @snapend
@snap[north] Original Architecture @snapend @title[Original Architecture]
Note:
- Partners create webhook subscriptions indicating URL for us to call
- `Subscriptions` receives events from services and publishes to single, shared queue
- `Handler`s read off queue, call partner API, and publish result
- `Subscriptions` receives and stores result
@snap[north] Limitations @snapend @title[Limitations]
@snap[west list-content span-100] @ul
- At peak load, webhooks delayed ~60 mins, defeating their purpose
- Partner processes (notifications, etc.) are then delayed
- One slow-to-respond or high-volume partner affects everyone
- Scaling handlers causes parallel API calls for everyone
- Non-trivial per-partner configuration @ulend @snapend
@snap[north] New Architecture @snapend @title[New Architecture]
Note:
- One queue per partner, dynamically provisioned on subscription creation
- Individually configurable depending on scalability of partner APIs
- Slow or high-volume partners only impact themselves
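
A minimal sketch of how per-partner queue settings might be derived at subscription creation. `PartnerConfig`, `QueuePlan`, `planQueue`, and the `webhooks-<env>-<id>` naming scheme are hypothetical and not taken from the Dwolla code; the 6x visibility timeout follows AWS guidance for SQS queues that feed Lambda.

```typescript
// Hypothetical per-partner settings; field names are illustrative only.
interface PartnerConfig {
  partnerId: string;
  maxConcurrency: number; // cap on parallel calls to this partner's API
  timeoutSeconds: number; // how long one handler invocation may run
}

interface QueuePlan {
  queueName: string;
  attributes: Record<string, string>; // SQS attribute values are strings
  reservedConcurrency: number;
}

function planQueue(config: PartnerConfig, env: string): QueuePlan {
  // SQS queue names allow alphanumerics, hyphens, and underscores, up to 80 characters.
  const safeId = config.partnerId.replace(/[^A-Za-z0-9_-]/g, "-");
  const queueName = `webhooks-${env}-${safeId}`.slice(0, 80);
  return {
    queueName,
    attributes: {
      // AWS suggests a visibility timeout of at least 6x the function timeout
      // so in-flight messages are not redelivered while a handler is still running.
      VisibilityTimeout: String(config.timeoutSeconds * 6),
    },
    // Limits how many handler instances hit this one partner at a time.
    reservedConcurrency: config.maxConcurrency,
  };
}
```

The resulting plan could be passed to whatever provisioning call is in use; a slow partner gets a low `maxConcurrency` without affecting anyone else's queue.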
@snap[north] Why SQS and Lambda? @snapend @title[Why SQS and Lambda?]
@snap[west list-content span-100] @ul
- Creating queue/handler with AWS SDKs simpler than custom code
- Spiky workload perfect for pay-per-use pricing, auto-scaling
- No server management maximizes time spent adding value, decreases attack surface
- ~1 minute deployments reduce development cycle @ulend @snapend
Note:
- `webhook-provisioner`: Create, delete, disable
- `webhook-handler`: postHook, publishResult, requeue, error, update-all
- `cloudwatch-alarm-to-slack`
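
A rough sketch of how the postHook, publishResult, and requeue steps could fit together. Only those three step names come from the slides; the types, `MAX_ATTEMPTS`, and the backoff schedule are assumptions made for illustration and may differ from the real `webhook-handler`.

```typescript
interface WebhookEvent { id: string; url: string; body: string; attempt: number; }
interface Result { id: string; status: number; }

// Dependencies are injected so the flow can be exercised without AWS or a network.
interface Deps {
  postHook: (url: string, body: string) => Promise<number>; // resolves to HTTP status
  publishResult: (result: Result) => Promise<void>;
  requeue: (event: WebhookEvent, delaySeconds: number) => Promise<void>;
}

const MAX_ATTEMPTS = 8; // assumed retry budget, not from the source

// Exponential backoff, capped at 900 seconds (the SQS maximum for DelaySeconds).
function retryDelaySeconds(attempt: number): number {
  return Math.min(900, 15 * 2 ** attempt);
}

async function handle(
  event: WebhookEvent,
  deps: Deps
): Promise<"delivered" | "requeued" | "gaveUp"> {
  let status = 0; // 0 stands in for "no response" (timeout, DNS failure, etc.)
  try {
    status = await deps.postHook(event.url, event.body);
  } catch {
    status = 0;
  }
  // Every attempt is reported, successful or not.
  await deps.publishResult({ id: event.id, status });
  if (status >= 200 && status < 300) return "delivered";
  if (event.attempt + 1 >= MAX_ATTEMPTS) return "gaveUp";
  await deps.requeue(
    { ...event, attempt: event.attempt + 1 },
    retryDelaySeconds(event.attempt)
  );
  return "requeued";
}
```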
@snap[north] Rollout @snapend @title[Rollout]
@snap[west list-content span-100] @ul
- Whitelist test partners in Sandbox via Feature Flags
- Enable globally in Sandbox
- Whitelist beta partners in Prod
- Monitor, gather feedback
- Migrate in batches based on webhook volume @ulend @snapend
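
Note:
One way the flag check behind this rollout might look. `RolloutFlags` and `usesNewPipeline` are hypothetical names; the actual feature flag system is not shown in these slides.

```typescript
type Env = "sandbox" | "prod";

// Hypothetical flag shape: a global switch plus an allow list per environment.
interface RolloutFlags {
  enabledGlobally: Record<Env, boolean>;
  allowList: Record<Env, string[]>;
}

// A partner is routed to the new pipeline if its environment is fully enabled
// or the partner has been explicitly allow-listed there.
function usesNewPipeline(partnerId: string, env: Env, flags: RolloutFlags): boolean {
  return flags.enabledGlobally[env] || flags.allowList[env].indexOf(partnerId) !== -1;
}
```

Migrating in batches then amounts to growing the `prod` allow list until the global switch can be flipped.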
@snap[north] Lessons Learned @snapend @title[Lessons Learned]
@snap[west list-content span-100] @ul
- Audit dependencies to keep bundle size and memory usage low (e.g. HTTP libs)
- CloudWatch Logs can get expensive, retention defaults to never expire
- Follow Best Practices for avoiding throttling, dead-letter queues, idempotency, batch size
- Lambda errors can be elusive, CloudWatch Logs Insights helps
- Include high cardinality values in log messages, take charge of monitors/alerts @ulend @snapend
Note:
- TypeScript: Painter's Tape for JavaScript
- 404 from customer, logs contained id, url, status with no issues
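
A small sketch of the kind of log line that makes a report like the 404 above easy to trace. The `id`, `url`, and `status` fields mirror the ones mentioned in the note; `logLine` and `durationMs` are illustrative assumptions.

```typescript
// High-cardinality fields: unique per webhook, so one delivery can be found directly.
interface DeliveryFields {
  id: string;
  url: string;
  status: number;
  durationMs: number; // assumed extra field, not from the source
}

// Emit one JSON object per line so log tooling can treat each key as a field.
function logLine(level: "info" | "error", message: string, fields: DeliveryFields): string {
  return JSON.stringify({ level, message, ...fields });
}

console.log(
  logLine("info", "webhook delivered", {
    id: "hypothetical-webhook-id",
    url: "https://partner.example.com/webhooks",
    status: 404,
    durationMs: 120,
  })
);
```

With JSON logs, a CloudWatch Logs Insights query along the lines of `fields @timestamp, id, url, status | filter status = 404` should be able to pull up the matching deliveries.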
@snap[north] Lessons Learned @snapend @title[Lessons Learned]
@snap[west list-content span-100] @ul
- One Lambda serving multiple queues limits configuration options
- TypeScript, Serverless Framework, `aws-cdk` are great
- Think twice before dynamically provisioning resources, concurrency, prepare to retry
- Understand AWS Account Limits (IAM, Lambda, SQS, CloudFormation Stacks, etc.)
- Utilize tagging to manage lots of resources @ulend @snapend
@snap[north] Results @snapend @title[Results]
@snap[west list-content span-100] @ul
- Infinitely scalable, from 60 min delay at peak load to under one minute
- Configurable to individual partner's needs
- Low costs and maintenance, free when not in use @ulend @snapend
@snap[north] Free Code! @snapend @title[Free Code!]
@snap[west list-content span-100] @ul
- webhook-provisioner: Provision AWS resources
- webhook-handler: POST webhooks to APIs
- webhook-receiver: Receive and verify webhooks
- cloudwatch-alarm-to-slack: Forward CloudWatch Alarms to Slack
- sqs-mv: Move SQS messages from one queue to another
- generator-serverless: Serverless Yeoman generator @ulend @snapend
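
Note:
A sketch of the "verify" half of receiving a webhook, assuming the sender signs the raw request body with HMAC-SHA256 using the subscription secret and sends the hex digest in a header. `sign` and `verify` are illustrative helpers, not the `webhook-receiver` API.

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// Hex-encoded HMAC-SHA256 of the raw body, keyed by the shared secret.
function sign(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Recompute the signature and compare in constant time to avoid timing leaks.
function verify(body: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(sign(body, secret), "hex");
  const actual = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

The comparison has to run against the raw body exactly as received; re-serializing parsed JSON can change the bytes and break the match.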