How do you write performant code for large data sources?
Lately, I have been trying to get into some AI workflows. It appears that conventional Python code is usually less performant than Cython/.pyx implementations.
Some learnings from my explorations:
When working with DataFrames in Python, there are different methods to perform operations on data. Usually you would use iterrows, apply, or vectorized operations. Each method has different performance characteristics, especially as the size of the DataFrame grows.
To investigate this, I went to GPT and asked it for some synthetic simulation code to test out this thesis.
Generate DataFrames of sizes 10, 100, 1000, 10000, and 100000.
Then profile the following functions:
1. iterrows: Iterate through each row, adding a constant to a column's value.
2. apply: Use the apply function with a lambda to add a constant to each item in a column.
3. Vectorized: Directly add a constant to the column using vectorized operations.
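The benchmark described above can be sketched roughly like this (the column name `a` and the constant are my own placeholders, not from the original prompt):

```python
import time

import numpy as np
import pandas as pd

CONST = 5  # the constant added to the column in every method


def use_iterrows(df):
    # Row-by-row loop: each row is materialized as a Series, which is slow.
    out = []
    for _, row in df.iterrows():
        out.append(row["a"] + CONST)
    return pd.Series(out)


def use_apply(df):
    # apply with a lambda: still a Python-level loop under the hood.
    return df["a"].apply(lambda x: x + CONST)


def use_vectorized(df):
    # A single NumPy-backed operation over the whole column.
    return df["a"] + CONST


if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000, 100_000):
        df = pd.DataFrame({"a": np.random.rand(n)})
        for fn in (use_iterrows, use_apply, use_vectorized):
            start = time.perf_counter()
            fn(df)
            elapsed = time.perf_counter() - start
            print(f"n={n:>7}  {fn.__name__:<15} {elapsed:.6f}s")
```

On my understanding of typical results, the gap between `iterrows` and the vectorized version widens dramatically as `n` grows, often by two to three orders of magnitude at 100,000 rows.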
So for large DataFrames, avoid iterrows due to its poor performance. Use vectorized operations for the best efficiency, and resort to apply only if vectorized operations are not feasible. Most of your use cases should ideally be solved via vectorized ops, tbh.
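Even logic that looks like it needs apply can often be vectorized. A hypothetical example (the condition and column name are mine, just for illustration) using `np.where`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [-2.0, -1.0, 0.0, 1.0, 2.0]})

# Conditional logic that seems to call for apply...
via_apply = df["a"].apply(lambda x: x * 10 if x > 0 else 0.0)

# ...can usually be expressed as a vectorized np.where instead.
via_where = pd.Series(np.where(df["a"] > 0, df["a"] * 10, 0.0))

# Both produce the same values; np.where avoids the Python-level loop.
assert via_apply.tolist() == via_where.tolist()
```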
You can get further gains with NVIDIA cuDF, Dask, or Polars, depending on where the real bottleneck is for your use case.