Understanding Multiple Requests to the Azure OpenAI GPT-4 Vision Model API

Hi everyone, I hope you're all doing well.

I have a question about how the Azure OpenAI GPT-4 Vision model API handles multiple requests. Specifically, I'm trying to understand the "requests per minute" limit and how it affects calling the API sequentially versus in parallel.

Key points I'm curious about:

1. Requests per minute (RPM): What does "requests per minute" mean for the Azure OpenAI GPT-4 Vision model API? How does the API calculate and enforce it?
2. Sequential requests: If I send requests one after another, how does that interact with the RPM limit? Is there a recommended delay between requests to stay within it?
3. Parallel requests: How does the API handle multiple concurrent requests? Is the RPM limit applied differently to parallel requests than to sequential ones?
4. Best practices: What are the best practices for managing and optimizing the number of requests? Are there specific strategies or tools recommended for handling high request volumes efficiently?

Example scenario: I have an application that needs to process a large number of images with the GPT-4 Vision model. I want to maximize throughput without hitting the rate limits the API imposes.

Any insights, documentation references, or personal experiences would be greatly appreciated. Thanks in advance for your help!

Best regards,
Sparty_Aanvikshiki
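To make the sequential-versus-parallel question concrete, here is a minimal sketch of client-side pacing, assuming the only constraint is a fixed number of request starts per time window. The names `SlidingWindowLimiter`, `call_vision_model`, `run_sequential`, `run_parallel`, and `demo` are hypothetical, and `call_vision_model` only simulates latency instead of calling Azure. The real service also limits tokens per minute and enforces its limits on its own side (possibly over intervals shorter than a full minute), so treat this as an illustration of the two calling patterns, not as a recommended limit.

```python
import asyncio
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` request starts in any `window` seconds.

    Client-side approximation only (hypothetical): it counts requests, not
    tokens, and the service enforces its own limits independently of it.
    """

    def __init__(self, max_requests: int, window: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window = window
        self._starts = deque()  # monotonic timestamps of recent request starts
        self._lock = asyncio.Lock()  # serializes waiters for the next free slot

    async def acquire(self) -> None:
        async with self._lock:
            while True:
                now = time.monotonic()
                # Forget starts that have slid out of the window.
                while self._starts and now - self._starts[0] >= self.window:
                    self._starts.popleft()
                if len(self._starts) < self.max_requests:
                    self._starts.append(now)
                    return
                # Wait until the oldest start leaves the window, then re-check.
                await asyncio.sleep(self.window - (now - self._starts[0]))


async def call_vision_model(image_id: int) -> str:
    """Hypothetical stand-in for one GPT-4 Vision request; only simulates latency."""
    await asyncio.sleep(0.01)
    return f"caption-{image_id}"


async def run_sequential(image_ids, limiter: SlidingWindowLimiter) -> list:
    # One request at a time; the limiter inserts delays only when the window is full.
    results = []
    for image_id in image_ids:
        await limiter.acquire()
        results.append(await call_vision_model(image_id))
    return results


async def run_parallel(image_ids, limiter: SlidingWindowLimiter, max_in_flight: int = 4) -> list:
    # Up to `max_in_flight` concurrent requests, all drawing from the same limiter.
    in_flight = asyncio.Semaphore(max_in_flight)

    async def one(image_id: int) -> str:
        async with in_flight:
            await limiter.acquire()
            return await call_vision_model(image_id)

    # gather() returns results in the same order as image_ids.
    return await asyncio.gather(*(one(i) for i in image_ids))


async def demo(runner, n_images: int = 10, max_requests: int = 5, window: float = 0.5):
    # Limiter is created inside the running event loop.
    limiter = SlidingWindowLimiter(max_requests, window)
    t0 = time.monotonic()
    results = await runner(range(n_images), limiter)
    return results, time.monotonic() - t0


if __name__ == "__main__":
    for runner in (run_sequential, run_parallel):
        results, elapsed = asyncio.run(demo(runner))
        print(runner.__name__, len(results), f"{elapsed:.2f}s")
```

In this sketch both patterns share one limiter, so running requests in parallel only helps by overlapping per-request latency; it does not let more requests start within a window than the limiter allows.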