SnoozyWalrus

Suggestions for tackling scaling issues

Hi all, currently working at startup and we have started to face scaling issues. Example Database tables have started increasing rows. So how do you handle this kind of data? How to think in the direction of improving and writing queries on billions of records to perform optimal? Can people with experience help me with any resources or personal experience they have gone through when going from 1 to 100? Minor of the minor suggestions would help. Grateful and thanks to all in advance.🙌

11mo ago

Find out if you are being paid fairly.Download Grapevine

FuzzyPickle

Amazon11mo

Check all indices (indexes)

Find queries from your monitoring tool which are consuming maxiumum CPU. Run Explain/Analyze on those.

Move read queries to Read Replica and write queries to master.

Look for caching popular queries results.

Creare appropriate shards if needed.

Run batch jobs to archive stale/unused data from DB to s3

Try to move tables to NoSql (tables which doesn’t need any joins to start off with, ignore tables where you need to have transactional properties)

SnoozyWalrus

Stealth11mo

Thanks for the detailed answers We have applied indexes as well, but if suppose we apple where query on indexed attribute and that attribute is given array of 50k IDs, then indexes are also not use ful. So can you suggest any better way for this?

FuzzyPickle

Amazon11mo

An alternative approach is to reconsider your data model. Depending on your use case, you might create a separate document for each ID or group IDs in a way that aligns with your query patterns. This can lead to more efficient queries and better scalability.

ZestyBagel

Swiggy11mo

Move to dynamodb from rds

FuzzyPickle

Amazon11mo

Why choose dynamoDB as default NoSQL- it totally depends on the use case

ZestyBagel

Swiggy11mo

Yeah i mean nosql only dynamodb offers much better scalability imo

SnoozyPotato

Stealth11mo

Which databases? What the current ingestion rate? What it will be in a year? Whats the current resource allocation. You expect ppl to answer without any of these. I can suggest a fancy system design which can support millions of read write but that doesn’t mean it can solve your problem. Each use case has its own solutions.

Ppl jumping in the thread without asking the constraints, if I was an interviewer would have rejected solely based on jumping to conclusions without asking finer details

SnoozyWalrus

Stealth11mo

Ok, so the database is MySQL. Ingestion rate is not known to me. It is more read heavy system as we have a no SQL db where we store computed data. So if specific table is updated and suppose this table impact 5 to 6 tables of no SQL then too much read will happen to compute data. And also tables in MySQL have started growing due to more clients coming. So on some tables basic where query is stating to take time. So how can I grow this system?

SnoozyPotato

Stealth11mo

Get slow queries logging enabled, once you have that use explain analyse to see if its doing seq/table scan. If thats the case the fields which are most queried on and the where conditions column create indexes on those columns, re-run the explain thing validate of its doing a index scan now. This is the starting point. You can further denormalise the tables if its possible

ZippyWalrus

Stealth11mo

Shard/read replica, index, partition Note: indexing will slow down writes Some types of indexes are faster than others

-I worked on a very large scale data warehouse at a financial institution

WigglyMuffin

Stealth11mo

I am sure you will be able to find it on Google. Just deep dive before asking here. It will help you in the long run.

Discover more

Curated from across

Ask a question on Grapevine.

Get the app on Android or iOS.

Privacy Terms

Guidelines Help