CrazyFifth

How do you solve for peak concurrency, say more than 10,000 concurrent users at one point?

Just how do you scale? Do you use the AWS auto-scaler for that?

5mo ago · 14K views
DualWannabe

Seems like you are early in your DevOps/BE journey. The way you do it is:

  1. Build performant APIs. <300ms is good and <100ms is great.
  2. Cache all db calls wherever you can.
  3. Cache all recurring calls to internal objects.
  4. Profile your code and reduce time complexity.
  5. Spread API calls across several workers by multithreading.
  6. Use a load balancer and dynamically allocate more or fewer servers behind it based on CPU utilization or requests per second.
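Point 2 can be sketched as a small TTL cache in front of a db call. This is a toy in-process sketch, assuming a `get_user` query and a 60-second TTL; in production you would more likely put Redis or memcached in front of the db.

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds=60):
    """Memoize results for ttl_seconds (hypothetical helper for illustration)."""
    def decorator(fn):
        store = {}  # args -> (expiry_timestamp, value)
        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit and hit[0] > now:
                return hit[1]          # served from cache, no db round trip
            value = fn(*args)
            store[args] = (now + ttl_seconds, value)
            return value
        return wrapper
    return decorator

call_count = 0

@ttl_cache(ttl_seconds=60)
def get_user(user_id):
    global call_count
    call_count += 1                    # stands in for a slow db query
    return {"id": user_id, "name": f"user-{user_id}"}

get_user(1); get_user(1); get_user(2)
# call_count ends at 2: the repeated lookup for user 1 hit the cache
```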
debugging

How about Lambda?

ClearOyster

I solved it using Lambda and API Gateway
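A minimal sketch of that setup, assuming API Gateway's Lambda proxy integration (the `user_id` query parameter is made up for illustration). Lambda handles concurrency by running one execution per in-flight request, so scaling is largely automatic.

```python
import json

def handler(event, context):
    """AWS Lambda handler behind API Gateway (proxy integration).
    API Gateway passes the HTTP request as the `event` dict."""
    user_id = (event.get("queryStringParameters") or {}).get("user_id", "anonymous")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"hello": user_id}),
    }

# Local smoke test with a fake API Gateway event
resp = handler({"queryStringParameters": {"user_id": "42"}}, None)
```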

SpryJunker

From what I remember, Hotstar used in-house tooling that scales infra based on request rate and concurrent users per unit time, rather than default metrics like CPU or network usage.

That makes things a little simpler, because you increase the number of servers in proportion to demand.

An open-source alternative is also available: KEDA, for k8s autoscaling
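The "proportional to demand" idea boils down to a one-line capacity calculation. A sketch, assuming a per-replica capacity of 500 req/s established by load testing (the numbers are made up, not Hotstar's):

```python
import math

def desired_replicas(current_rps, rps_per_replica=500,
                     min_replicas=2, max_replicas=100):
    """Request-rate-driven scaling in the spirit of KEDA's rate triggers:
    replica count grows in proportion to demand rather than CPU."""
    needed = math.ceil(current_rps / rps_per_replica)
    return max(min_replicas, min(max_replicas, needed))

desired_replicas(12_000)   # 12k req/s at 500 req/s each -> 24 replicas
```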

Deadpool93

@SpryJunker just being curious.. Don't request rate and concurrent users per unit time ultimately boil down to CPU usage and memory metrics? How is scaling based on request rate justified here?

HandyYahoo

https://www.youtube.com/watch?v=9b7HNzBB3OQ

This is the best video on an Indian company handling insane concurrency, while 75% of all bandwidth available in India is being consumed at the same time.

inr

I was recommended this just today.

Rakz

There is a lot of information missing here, but based on what you have given, let's take a crack at this.

  1. Identify where the problem lies. If you're seeing a resource crunch, you need to benchmark the service properly before it goes live in production. Give performance the same due diligence as functional testing.

  2. From a DevOps point of view, if you're seeing a gradual increase in load you can employ reactive scaling: autoscaling groups (if deployed on EC2) or the Horizontal Pod Autoscaler (if using k8s).

  3. For sudden spikes, reactive scaling won't work: by the time new capacity spins up, the service will already be erroring out. For that scenario you need proactive scaling. Study the traffic pattern and build automation around it to predictively scale your infra.

Please note that this answer assumes there are no design flaws in your application and that it has been optimally designed to handle that many requests coming in.
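The proactive-scaling idea in point 3 can be sketched as: learn the traffic pattern from history, then provision ahead of it. Everything here is an assumption for illustration, including the hourly granularity, the 500 req/s per-replica capacity, and the 1.5x headroom factor.

```python
import math
from collections import defaultdict
from statistics import mean

def build_hourly_profile(history):
    """history: list of (hour_of_day, observed_rps) samples from past traffic.
    Returns the mean rps seen at each hour -- the 'pattern' to scale on."""
    by_hour = defaultdict(list)
    for hour, rps in history:
        by_hour[hour].append(rps)
    return {hour: mean(v) for hour, v in by_hour.items()}

def prewarm_replicas(profile, hour, rps_per_replica=500, headroom=1.5):
    """Scale out *before* the spike: provision for the historical rate at
    this hour plus headroom, instead of reacting after errors start."""
    expected = profile.get(hour, 0)
    return max(1, math.ceil(expected * headroom / rps_per_replica))

history = [(20, 9_000), (20, 11_000), (21, 2_000)]  # evening spike at 8pm
profile = build_hourly_profile(history)
prewarm_replicas(profile, 20)   # provision for ~10k rps plus headroom
```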

tatiana_xoxo

@Rakz Very solid. The problem is that most engineers have never faced problems at scale, so there are very few engineers with that kind of experience going around.

I was about to answer the same. Seems like a consumer internet startup problem. SaaS is usually more chill.

Rakz

I agree, hence I mentioned that performance has to be given the same due diligence as functional testing. Lots of new startups tend to ignore it, but if you're consumer facing, customers won't come back if you're giving them a laggy experience.

PlayfulFob6
Zeta · 5mo

I would analyze why you are seeing a spike: is there some sale going on, did your marketing team send notifications, or is it just a transient spike?

How you can scale

  1. You can scale based on concurrent request count, CPU, or disk usage
  2. If this spike is part of your regular traffic pattern, set up scheduled autoscaling for that particular time frame
  3. If it is part of some sale or marketing campaign, you need to proactively scale the fleet.

Thanks
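The concurrent-request-count approach in point 1 above can be sketched as a simple threshold policy. The per-replica limit of 200 in-flight requests and the 80%/30% thresholds are assumed figures you would get from benchmarking, not universal defaults.

```python
def reactive_scale(current_replicas, concurrent_requests, per_replica_limit=200,
                   scale_out_at=0.8, scale_in_at=0.3, min_replicas=2):
    """Reactive policy on concurrent request count: add a replica when
    utilization crosses 80%, remove one when it drops below 30%."""
    utilization = concurrent_requests / (current_replicas * per_replica_limit)
    if utilization > scale_out_at:
        return current_replicas + 1
    if utilization < scale_in_at and current_replicas > min_replicas:
        return current_replicas - 1
    return current_replicas

reactive_scale(10, 1_800)   # 90% utilized -> scale out to 11
reactive_scale(10, 400)     # 20% utilized -> scale in to 9
```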

tatiana_xoxo

Point 1 is probably the best and most generalizable.
