GoofyBagel

How do you solve for peak concurrency? >10,000 users at one point?

Just how do you scale? Do you use AWS auto-scaler for that?

6mo ago
14K views
FuzzyPretzel

Seems like you are early in your DevOps/BE journey. The way you do it is:

  1. Build performant APIs. <300ms is good and <100ms is great.
  2. Cache all DB calls wherever you can.
  3. Cache all recurring calls to internal objects.
  4. Profile your code and reduce time complexity.
  5. Spread out API calls across several workers by multithreading (see the sketch below for points 2 and 5).
  6. Use a load balancer and dynamically allocate more or fewer servers behind it based on CPU utilization or requests per second.
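
A minimal sketch of points 2 and 5, assuming a Python backend. The names (`db_fetch_user`, `call_downstream`) and the TTL value are made up for illustration, not taken from this thread:

```python
# Sketch: a TTL cache in front of a DB call (point 2) and a thread pool to
# fan out independent downstream calls (point 5). Stand-in functions simulate
# the real DB/API work so the example runs on its own.
import time
from concurrent.futures import ThreadPoolExecutor

_cache = {}                 # key -> (expires_at, value)
CACHE_TTL_SECONDS = 30      # assumed TTL, tune to your data's staleness tolerance

def db_fetch_user(user_id):
    # stand-in for an actual DB query
    time.sleep(0.05)
    return {"id": user_id}

def cached_fetch_user(user_id):
    """Serve repeated reads from memory instead of hitting the DB every time."""
    now = time.time()
    hit = _cache.get(user_id)
    if hit and hit[0] > now:
        return hit[1]
    value = db_fetch_user(user_id)
    _cache[user_id] = (now + CACHE_TTL_SECONDS, value)
    return value

def call_downstream(endpoint):
    # stand-in for an outbound API call
    time.sleep(0.05)
    return endpoint

def fan_out(endpoints):
    """Spread independent downstream calls across worker threads."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(call_downstream, endpoints))

if __name__ == "__main__":
    print(cached_fetch_user(1))   # first call hits the "DB"
    print(cached_fetch_user(1))   # second call is served from the cache
    print(fan_out(["/a", "/b", "/c"]))
```

In production you'd typically put the cache in Redis/Memcached so it's shared across workers, but the shape of the logic stays the same.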
SqueakyBiscuit

How about Lambda?

DizzyBoba

I solved it using Lambda and API Gateway
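
Not OP's actual code, but a minimal sketch of what that setup looks like, assuming Python and the API Gateway proxy integration: API Gateway invokes the handler per request and Lambda scales the number of concurrent executions for you, up to your account's concurrency limit.

```python
# Sketch of a Lambda handler behind API Gateway (proxy integration).
# The event carries the request details; the returned dict becomes the HTTP response.
import json

def lambda_handler(event, context):
    # queryStringParameters is None when there is no query string
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello {name}"}),
    }
```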

FloatingWalrus

From what I remember, Hotstar used in-house tooling that scales infra based on request rate and concurrent users per unit time, rather than default metrics like CPU or network usage.

Scaling then becomes a little simpler because you increase the number of servers proportionally to demand.
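
A rough sketch of that idea (not Hotstar's actual tooling): pick a sustainable requests-per-second figure per server from load testing, then provision proportionally to the observed request rate. `REQUESTS_PER_SERVER` and the limits below are assumed numbers:

```python
# Scale on request rate instead of CPU: desired fleet size is proportional
# to current demand, clamped to sane bounds, with some headroom for spikes.
import math

REQUESTS_PER_SERVER = 500        # sustainable RPS per instance, from benchmarks
MIN_SERVERS, MAX_SERVERS = 4, 200

def desired_servers(current_rps: float, headroom: float = 1.3) -> int:
    needed = math.ceil((current_rps * headroom) / REQUESTS_PER_SERVER)
    return max(MIN_SERVERS, min(MAX_SERVERS, needed))

print(desired_servers(42_000))   # e.g. 42k RPS -> 110 servers
```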

DizzyPotato

An open-source alternative is also available: KEDA, for k8s autoscaling

SleepyPanda

@SpryJunker just being curious.. Don't request rate and concurrent users per unit time ultimately boil down to CPU usage and memory metrics? How is scaling based on request rate justified here?

JumpyUnicorn

https://www.youtube.com/watch?v=9b7HNzBB3OQ

This is the best video on an Indian company handling insane concurrency, while 75% of all bandwidth available in India is being consumed at the same time.

GoofyCupcake

I was recommended this just today..

MagicalLlama

There is a lot of information missing here but based on what you have given, let's take a crack at this.

  1. Identify where the problem lies. If you're seeing a resource crunch, you need to benchmark the service properly before it goes live in production. Give performance due diligence along with functional testing.

  2. From a DevOps point of view, if you're seeing a gradual increase in load, you can employ reactive scaling. You can use auto scaling groups (if deployed on EC2) or the Horizontal Pod Autoscaler (if using k8s).

  3. For sudden spikes, reactive scaling won't work: by the time it has scaled, the service will already be erroring out. For this scenario you need to go with proactive scaling. Study the traffic pattern and build automation around it to predictively scale your infra based on that pattern (see the sketch after this note).

Please note that this answer assumes there are no design flaws in your application and that it has been designed to handle that many incoming requests.
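
To make point 3 concrete, here is a hedged sketch of proactive scaling driven by a known hourly traffic pattern. The pattern numbers and `set_desired_capacity()` are hypothetical stand-ins for whatever your provider exposes (an ASG desired-capacity update, an HPA min-replica bump, etc.):

```python
# Sketch: pre-scale ahead of an expected spike based on historical traffic,
# instead of waiting for a CPU/RPS threshold to be breached.
from datetime import datetime, timezone

# expected peak RPS per hour of day, learned from historical traffic (made-up values)
HOURLY_PATTERN = {19: 30_000, 20: 60_000, 21: 45_000}
DEFAULT_RPS = 5_000
RPS_PER_INSTANCE = 500

def capacity_for_hour(hour: int) -> int:
    expected_rps = HOURLY_PATTERN.get(hour, DEFAULT_RPS)
    return -(-expected_rps // RPS_PER_INSTANCE)      # ceiling division

def set_desired_capacity(n: int):
    # hypothetical provider call; replace with your ASG/HPA API of choice
    print(f"scaling fleet to {n} instances ahead of time")

def prescale(now=None):
    """Scale for the coming hour a little before it starts."""
    now = now or datetime.now(timezone.utc)
    next_hour = (now.hour + 1) % 24
    set_desired_capacity(capacity_for_hour(next_hour))

if __name__ == "__main__":
    prescale()
```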

DizzySushi

@Rakz Very solid. The problem is that most engineers have never faced problems at scale, so there are very few engineers with that kind of experience going around.

I was about to answer the same. Seems like a consumer internet startup problem. SaaS is usually more chill.

MagicalLlama

I agree, hence I mentioned that performance has to be given due diligence along with functional testing. Lots of new startups tend to ignore it, but if you're consumer facing, customers won't come back if you give them a laggy experience.

PeppyPotato
Zeta · 6mo ago

I would analyze why you are seeing a spike: is there some sale going on, did your marketing team send notifications, or is it just a transient spike?

How you can scale:

  1. You can scale based on concurrent request count, CPU, or disk usage.
  2. If the spike is part of your regular traffic pattern, set up scheduled auto scaling for that particular time frame (see the sketch below).
  3. If it is part of some sale or marketing campaign, you need to proactively scale the fleet.
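
For point 2, a sketch using a scheduled scaling action on an EC2 Auto Scaling group via boto3; the group name, sizes, and cron expression are made-up values for illustration:

```python
# Sketch: bump ASG capacity on a recurring schedule just before a known spike.
import boto3

autoscaling = boto3.client("autoscaling")

def schedule_evening_scale_up():
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName="my-app-asg",            # hypothetical ASG name
        ScheduledActionName="evening-spike-scale-up",
        Recurrence="30 18 * * *",                     # 18:30 UTC daily, before the spike
        MinSize=10,
        MaxSize=100,
        DesiredCapacity=40,
    )
```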

Thanks

DizzySushi

Point 1 is probably the best and most generalizable.
