SQS between services, not just for long jobs.
Synchronous calls are the default reflex. Half of them shouldn't be. A queue between producer and consumer changes the failure shape of your whole system.
Most teams reach for SQS only when a job is long-running: image processing, video transcoding, background reports. It works for that, but it leaves on the table the bigger reason queues exist: they change the failure mode of every coupling between services.
A synchronous HTTP call from service A to service B has one healthy state and a long list of broken ones. B is up with good latency, fine. B is slow, A's threads pile up. B is down, A starts returning 500s. B is restarting, A's connection pool spikes. A queue replaces all of those with one steady state: the message sits there until B can take it.
When a queue earns its place
Three signals tell you a sync call should probably be a queue.
The receiver doesn't need to be alive when the producer publishes. A user uploads a file, you want to process it, the processing takes time. A sync call means the upload waits. A queue means the upload returns immediately, the processing happens whenever capacity is available, and the user sees a state machine: queued → processing → done.
The traffic is bursty. Sales-time spikes. Cron jobs that fire across the fleet at the same minute. Webhooks from a vendor that batches retries. Without a queue, your bursty input becomes your service's load profile. With a queue, the queue absorbs the burst and the consumer drains it at its own pace.
The work can fail and you want to retry. A sync call's retry policy is whatever the caller decides, usually nothing. A queue's retry policy is built in: visibility timeout, redrive policy, dead-letter queue. The infrastructure carries the retry logic so the producer doesn't have to.
If none of those apply, a queue is overhead you don't need.
The shape
Producer publishes to a queue. Consumer polls the queue, processes one message, deletes it on success. On failure, the message returns to the queue after the visibility timeout. On repeated failures, it ends up in a dead-letter queue. That's the whole interface.
The producer doesn't know how many consumers exist or how fast they're processing. The consumer doesn't know how many producers are out there or what their burst profile looks like. They share only the message shape. That decoupling is the point.
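A minimal sketch of that loop in Python with boto3. The queue URL, task shape, and handler are placeholders, not fixed conventions:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/work-queue"  # placeholder

def publish(task: dict) -> None:
    # Producer: publish and walk away. No knowledge of consumers.
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(task))

def consume(handle_task) -> None:
    # Consumer: poll, process, delete on success. A failure skips the delete,
    # so the message reappears after the visibility timeout.
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,  # long polling, covered below
        )
        for msg in resp.get("Messages", []):
            try:
                handle_task(json.loads(msg["Body"]))
            except Exception:
                continue  # don't delete: the visibility timeout will requeue it
            sqs.delete_message(
                QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"]
            )
```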
What you have to configure right
Idempotency
SQS standard queues are at-least-once delivery. A message can be delivered to your consumer more than once. Even successful processing followed by a network failure on the delete call results in the message reappearing.
Your consumer has to be idempotent. Processing the same message twice has to produce the same result as processing it once. If your work is "create row in DB," include the message ID as an idempotency key. If your work is "send email," check whether you already sent for that message ID. If your work is "transfer money," do not deploy without idempotency keys.
The default mental model is "the queue won't deliver duplicates." That model is wrong. Build for "the queue might deliver duplicates."
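One way to make "create row in DB" idempotent, sketched against a Postgres-style table with a unique constraint on the SQS MessageId. The schema and `do_the_work` helper are hypothetical:

```python
# Hypothetical schema: processed_messages(message_id TEXT PRIMARY KEY).
# The SQS MessageId arrives with each delivery as msg["MessageId"].
def handle_task(conn, message_id: str, task: dict) -> None:
    with conn:  # one transaction: claim the ID and do the work atomically
        cur = conn.cursor()
        cur.execute(
            "INSERT INTO processed_messages (message_id) VALUES (%s) "
            "ON CONFLICT (message_id) DO NOTHING",
            (message_id,),
        )
        if cur.rowcount == 0:
            return  # duplicate delivery: the first run already did the work
        do_the_work(conn, task)  # hypothetical: your actual side effect
```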
Visibility timeout
When a consumer pulls a message, the queue hides it for the visibility timeout window. If the consumer doesn't delete the message within that window, it reappears for another consumer to pick up. The window covers "consumer crashed mid-processing."
The trap: if your processing takes longer than the visibility timeout, your message reappears while you're still working on it. Now two consumers process the same message in parallel. With idempotency that's wasteful but harmless; without it, it's a bug.
Set the visibility timeout to a generous overestimate of how long the work takes. Or use the long-running pattern: extend the visibility on a heartbeat from the consumer (ChangeMessageVisibility). The first is simpler; the second handles the case where some messages take 10× the typical duration.
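A sketch of the heartbeat variant, assuming a 120-second visibility timeout on the queue; the interval and names are placeholders:

```python
import threading
import boto3

sqs = boto3.client("sqs")

def process_with_heartbeat(queue_url: str, msg: dict, handle_task) -> None:
    stop = threading.Event()

    def heartbeat():
        # Every 60s, push the visibility window out another 120s.
        while not stop.wait(60):
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=msg["ReceiptHandle"],
                VisibilityTimeout=120,
            )

    t = threading.Thread(target=heartbeat, daemon=True)
    t.start()
    try:
        handle_task(msg)
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    finally:
        stop.set()
        t.join()
```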
Dead-letter queue
A poison message, one that consistently fails to process, will retry forever without a DLQ. The visibility timeout cycles it back into the active queue, where it gets picked up, fails, and repeats until you intervene manually. Meanwhile the queue depth alarm is going off, your consumer's error rate looks bad, and you can't tell whether one bad message is dragging the whole system down or everything is broken.
A DLQ with a sensible maxReceiveCount (3-5 is the common range) gets the bad message out of the way. The active queue drains, metrics tell you which message specifically is poisoned, and you can investigate at your pace.
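Wiring that up is one queue attribute. The ARN and URL below are placeholders:

```python
import json
import boto3

sqs = boto3.client("sqs")

sqs.set_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/work-queue",
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:work-queue-dlq",
            "maxReceiveCount": "5",  # after 5 failed receives, move it to the DLQ
        })
    },
)
```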
The DLQ is also where retry meets observability. A message in the DLQ is an alert. The number of messages in the DLQ over time is a health metric. Without the DLQ, neither signal exists.
Long polling
Short polling, the default, has the consumer hit the queue continuously, even when empty, paying a request charge per poll. Long polling tells the queue "if there's no message, wait up to 20 seconds before responding." The consumer makes far fewer requests.
Set WaitTimeSeconds to 20 (the maximum) on every consumer call. The latency penalty for the first message in an empty queue is at most one round trip. In steady state with traffic, the queue responds immediately.
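You can also set it once at the queue level, so a consumer that forgets the parameter still long-polls. A sketch, with a placeholder queue URL:

```python
import boto3

sqs = boto3.client("sqs")

sqs.set_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/work-queue",
    Attributes={"ReceiveMessageWaitTimeSeconds": "20"},  # queue-wide default
)
```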
Standard vs FIFO
Standard queues are at-least-once delivery and best-effort ordering. They scale to nearly unlimited throughput. FIFO queues are exactly-once (with deduplication) and strictly ordered within a message group, capped at much lower throughput per queue.
Most workloads are Standard. Order doesn't matter for image processing, email sending, log aggregation. Build for at-least-once and you get massive scale.
FIFO matters when "process B happens after process A on the same entity" is a correctness requirement. State machine transitions, ordered ledger entries, cross-system reconciliation. The throughput cap rarely bites at typical product scale.
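A FIFO publish sketch; the .fifo queue name and IDs are illustrative:

```python
import json
import boto3

sqs = boto3.client("sqs")

sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/ledger.fifo",
    MessageBody=json.dumps({"entry": "debit", "account": "1234", "amount": 42}),
    MessageGroupId="account-1234",        # strict ordering holds within this group
    MessageDeduplicationId="txn-9f8e7d",  # duplicate publishes within 5 min are dropped
)
```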
What it gives you when done right
Your producer's reliability stops depending on the consumer's. The consumer can deploy, restart, scale, fail over, none of it shows up to the producer. The producer publishes and walks away.
Your bursts don't take down your fleet. The queue absorbs the spike, the workers drain at their natural rate. You pay for steady-state capacity, not peak capacity.
Your retries are infrastructure, not application code. The consumer that fails on a transient error gets the message again automatically. The consumer that fails on a poison message stops getting it after a few tries.
Your consumers can be different things. Within one SQS queue, consumers compete for messages; put SNS in front and fan out to a queue per consumer, and workers process one copy while an analytics pipeline drains another and a debugging tool samples a third. The message shape is the shared interface.
What it doesn't give you
Lower latency. A queue adds a round trip. If the work is short and the producer needs the result, sync is faster.
Stronger consistency. Queues are eventually consistent. If two messages affect the same entity and order matters, Standard won't preserve it. FIFO does, with throughput limits. Sometimes the right answer is a queue per entity.
Free observability. Queue depth (ApproximateNumberOfMessagesVisible), age of oldest message (ApproximateAgeOfOldestMessage), and DLQ depth are your dashboards. Set them up before you ship, not after the first incident teaches you that you needed them.
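A sketch of the DLQ alarm, assuming a queue named work-queue-dlq and an existing SNS topic for paging; both names are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="work-queue-dlq-not-empty",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "work-queue-dlq"}],
    Statistic="Maximum",
    Period=300,               # evaluate over 5-minute windows
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",  # any message in the DLQ pages
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:oncall"],  # placeholder
)
```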
The rule of thumb
Does the producer need the answer before it can respond? Keep it synchronous. If not, and any of the three signals apply, put a queue in the middle. That's most of the decision. The rest is idempotency, visibility timeout, DLQ, long polling, applied without exception every time you stand up a new queue.
The reflex to start with a sync call exists because HTTP gives it to you for free. Once the queue is in the toolbox, the question stops being "should this be async?" and becomes "what's stopping this from being async?".