System Design: Metrics and Alerting System
Please review this design. I did not list the functional (FR) and non-functional (NFR) requirements, so let's assume the standard ones. Please review, ask questions, and correct me too. PS: Grill me, but with respect please. A humble request not to lower my confidence.
The design looks good, buddy. A few points I would like to add:
1) Timestamps alone are not a good enough reason to choose a time-series DB. You could also opt for Cassandra, which suits a high write-to-read ratio and scales well. If you are selecting a particular DB, have root-level reasons to back the choice.
2) Alerts are high priority, so add the alerts service before pushing to Kafka. It matches the incoming metrics against the rules and pushes the matches into a dedicated topic. Those can be stored in a separate DB, and another service subscribes to that topic and sends the alerts directly to clients. Processing the alerts at a much later stage doesn't look good to me.
3) Instead of storing in S3, you can have another DB for metrics history, plus an archival service with jobs running at a set frequency (say weekly or every 14 days) that transfer data in batches from the main DB to your stale-data storage DB, keeping your write DB lean and performant.
4) You can add a master-slave architecture to the DBs the metrics service reads from: writes go to the master and reads go to the slaves. Also talk more about eventual vs. strong consistency. In this case, eventual consistency, with slightly inconsistent data among the slaves, would give better performance and availability and make the system more robust, since metrics rarely need highly consistent data the way transaction systems do.
5) Try adding a Redis caching layer for the metrics service where the key is "timestamp"; it would improve query performance.
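To make point 2 a bit more concrete, here is a rough sketch of matching metrics against rules before anything is published. Everything in it is an assumption for illustration: the `Rule` and `Metric` shapes, the topic names `metrics` and `alerts`, and the idea that the caller publishes to whatever topics `route` returns. It does not talk to Kafka at all; it only shows the routing decision.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical topic names, not taken from the design.
METRICS_TOPIC = "metrics"
ALERTS_TOPIC = "alerts"


@dataclass(frozen=True)
class Rule:
    metric: str                         # metric name the rule applies to
    op: Callable[[float, float], bool]  # comparison, e.g. operator.gt
    threshold: float


@dataclass(frozen=True)
class Metric:
    name: str
    value: float
    timestamp: int                      # epoch seconds


def route(metric: Metric, rules: List[Rule]) -> List[str]:
    """Return the topics this metric should be published to.

    Every metric goes to the metrics topic; a metric that breaches
    at least one rule additionally goes to the alerts topic.
    """
    topics = [METRICS_TOPIC]
    breached = any(
        r.metric == metric.name and r.op(metric.value, r.threshold)
        for r in rules
    )
    if breached:
        topics.append(ALERTS_TOPIC)
    return topics


rules = [Rule("cpu_usage", operator.gt, 90.0)]
hot = route(Metric("cpu_usage", 97.5, 1700000000), rules)
ok = route(Metric("cpu_usage", 42.0, 1700000060), rules)
```

The point of doing this ahead of the main pipeline is that the alert path no longer waits on storage or batch processing; the trade-off is that rule evaluation now sits on the ingest path, so the rules need to stay cheap.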
Overall the design looks promising to me, with a neat and crisp flow. The only part that needs work is storing and retrieving data from the time-series DB that the metrics service consumes, and I have shared my views on how it can be improved. Good initiative, it felt really good reviewing your design doc!
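On the retrieval side, the caching suggestion from the list could look roughly like the cache-aside sketch below. It is only an illustration under assumptions: a plain dict stands in for Redis, `fake_db` stands in for the time-series DB, and the key combines the metric name with the timestamp rounded down to a 60-second bucket (a raw timestamp alone would almost never repeat, so it would rarely hit). The bucket size and all names are hypothetical.

```python
from typing import Callable, Dict, List

BUCKET_SECONDS = 60  # assumed bucket width; tune to the dashboard resolution


def cache_key(metric: str, ts: int) -> str:
    """Build a key from the metric name and the start of its time bucket."""
    bucket_start = ts - ts % BUCKET_SECONDS
    return f"{metric}:{bucket_start}"


class MetricsCache:
    """Cache-aside wrapper: check the cache first, fall back to the DB."""

    def __init__(self, load_from_db: Callable[[str, int], List[float]]):
        self._store: Dict[str, List[float]] = {}  # stand-in for Redis
        self._load = load_from_db
        self.db_reads = 0                          # counts cache misses

    def get(self, metric: str, ts: int) -> List[float]:
        key = cache_key(metric, ts)
        if key not in self._store:
            self.db_reads += 1
            self._store[key] = self._load(metric, ts - ts % BUCKET_SECONDS)
        return self._store[key]


def fake_db(metric: str, bucket_start: int) -> List[float]:
    # Placeholder for a real time-series query.
    return [1.0, 2.0, 3.0]


cache = MetricsCache(fake_db)
first = cache.get("cpu_usage", 125)   # miss, bucket starts at 120
second = cache.get("cpu_usage", 170)  # hit, same bucket
third = cache.get("cpu_usage", 185)   # miss, bucket starts at 180
```

With a real Redis you would also set a TTL on each key so that the most recent, still-changing bucket does not serve stale data for long.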
Here is the Excalidraw link
(because the diagram can't be viewed in landscape unless downloaded):
https://excalidraw.com/#json=nf5PIgLKfaB97ZUDeqheM,808OGvuvhnQm8tSAFj7Hcw
Also, it would be great if you mentioned your YOE as well; I have almost 3 YOE.