How do you solve for peak concurrency? >10,000 users at one point?
Just how do you scale? Do you use AWS auto-scaler for that?
Seems like you are early in your DevOps/BE journey. The way you do it is:
1. Build performant APIs. <300ms is good and <100ms is great.
2. Cache all db calls wherever you can.
3. Cache all recurring calls to internal objects.
4. Profile your code and reduce time complexity.
5. Spread out API calls across several workers via multithreading.
6. Use a load balancer and dynamically add or remove servers behind it based on CPU utilization or requests per second.
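Steps 2 and 5 above can be sketched in a few lines (function names and data here are made up for illustration): cache a recurring DB lookup with `lru_cache` and fan incoming calls out across a pool of worker threads.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

@lru_cache(maxsize=1024)
def get_product(product_id: int) -> dict:
    # Stand-in for a real DB query; repeat lookups for hot products
    # are served from the in-process cache instead of the database.
    return {"id": product_id, "name": f"product-{product_id}"}

def handle_request(product_id: int) -> dict:
    return get_product(product_id)

# Fan a burst of requests out across 8 worker threads.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(handle_request, [1, 2, 1, 3, 2, 1]))

print(len(results))  # 6
```

In a real service the cache would usually be external (Redis/Memcached) so all workers share it, but the principle is the same: hot reads never hit the database twice.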
From what I remember, Hotstar used in-house tooling that scales infra based on request rate and concurrent users per unit time, rather than default metrics like CPU or network usage.
That makes it a little simpler, because you increase the number of servers proportionally to demand.
An open-source alternative is also available: KEDA, for k8s autoscaling.
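For context, a KEDA ScaledObject lets you scale a deployment on request rate instead of CPU. A rough sketch (the deployment name, Prometheus address, metric, and threshold below are all made up for illustration):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-request-scaler
spec:
  scaleTargetRef:
    name: api-deployment          # hypothetical deployment
  minReplicaCount: 3
  maxReplicaCount: 50
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rate(http_requests_total[1m]))   # requests/sec
        threshold: "100"          # target req/s per replica
```

KEDA then drives the k8s HPA for you, adding replicas whenever the observed request rate per replica exceeds the threshold.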
@SpryJunker Just being curious: don't request rate and concurrent users per unit time ultimately boil down to CPU usage and memory metrics?
How is scaling based on request rate justified here?
https://www.youtube.com/watch?v=9b7HNzBB3OQ
This is the best video on an Indian company handling insane concurrency, while 75% of all bandwidth available in India is being consumed at the same time.
Rakz
Stealth
4 months ago
There is a lot of information missing here but based on what you have given, let's take a crack at this.
1. Identify where the problem lies. If you're seeing a resource crunch, you need to benchmark the service properly before it goes live in production. Give performance the same due diligence as functional testing.
2. From a DevOps point of view, if you're seeing a gradual increase in load you can employ reactive scaling: auto scaling groups (if deployed on EC2) or the Horizontal Pod Autoscaler (if using k8s).
3. For sudden spikes, reactive scaling won't work: by the time new capacity comes up, requests will already be erroring out. For this scenario you need proactive scaling. Study the traffic pattern and build automation around it to scale your infra predictively.
Please note that this answer assumes there are no design flaws in your application and that it has been optimally designed to handle that many requests.
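The proactive-scaling idea in point 3 can be sketched like this (the capacity, headroom, and traffic numbers are all illustrative assumptions, not from the thread): forecast the request rate for a time window from the same window on previous days, then pre-scale with headroom before the spike hits.

```python
import math

CAPACITY_PER_SERVER = 500  # assumed req/s a single server can handle
HEADROOM = 1.3             # provision 30% above the forecast

def forecast_rps(samples):
    """Naive forecast: average of the same window on past days."""
    return sum(samples) / len(samples)

def servers_needed(samples):
    rps = forecast_rps(samples)
    return max(2, math.ceil(rps * HEADROOM / CAPACITY_PER_SERVER))

# Request rate at 6 PM over the last three days: 4000, 5000, 6000 req/s
print(servers_needed([4000, 5000, 6000]))  # → 13
```

Real systems use fancier forecasting, but even a moving average over "same time yesterday" beats reacting after the spike has already landed.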
@Rakz Very solid. The problem is that most engineers have never faced problems at scale, so there are very few engineers with that kind of experience going around.
I was about to answer the same. Seems like a Consumer Internet startup problem. SaaS usually is more chill.
Rakz
Stealth
4 months ago
I agree, hence I mentioned that performance has to be given due diligence along with functional testing. Lots of new startups tend to ignore it, but if you're consumer facing, customers won't come back if you give them a laggy experience.
I would analyze why you are seeing a spike: is there a sale going on, did your marketing team send notifications, or is it just a transient spike?
How you can scale
1. You can scale based on concurrent request count, CPU, or disk usage.
2. If the spike is part of your regular traffic pattern, set up scheduled auto scaling for that particular time frame.
3. If it is part of a sale or marketing campaign, you need to proactively scale the fleet.
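For point 2, on AWS the scheduled scale-up is a scheduled action on the auto scaling group. A small sketch that just builds the parameters (the group name, cron, and capacity are made up); they would be passed to the boto3 autoscaling client's `put_scheduled_update_group_action`:

```python
def scheduled_action_params(group: str, cron_utc: str, desired: int) -> dict:
    # Shape of a recurring scheduled action for an AWS auto scaling group;
    # pass these kwargs to boto3's put_scheduled_update_group_action.
    return {
        "AutoScalingGroupName": group,
        "ScheduledActionName": f"prescale-{group}",
        "Recurrence": cron_utc,      # cron expression, evaluated in UTC
        "MinSize": desired,
        "DesiredCapacity": desired,
    }

# Pre-scale to 20 servers every day before a 6 PM IST sale (12:30 UTC).
params = scheduled_action_params("api-asg", "30 12 * * *", 20)
print(params["DesiredCapacity"])  # 20
```

Pairing this with a normal reactive policy gives you a floor for the known spike plus headroom for anything unexpected on top of it.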
Thanks