JazzyPotato

Big Data: How does data parsing happen in your company?

Hi community,

Context: At my current company, we have a data pipeline that, in short, works like this: raw events from Kafka are dumped into S3. A batch job (Airflow) picks up the raw JSON files in S3 and applies our parser logic (a service written in Python where we explicitly define which attributes we want from the raw JSON, and those attributes are parsed accordingly). The parsed data is then converted to CSV/Parquet, written to another folder in S3, and later loaded into tables that are used for analytics, etc.

Problem: Today, for every new event we generate, we have to write parser logic from scratch if the event structure is different. For small changes, we can update the attributes we want to parse in the code itself, but after that we have to deploy the changes, which takes time. Is there a smarter way of doing this? For example, a UI where we select the attributes from the JSON (including nested attributes), which are then parsed and dumped in S3, with loading happening later. And if we want to update a parser, we could do so from the UI itself rather than going into the code, updating things, deploying, etc.
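To make the idea concrete, here is a minimal sketch of what I mean by a parser driven by configuration instead of code. Everything in it is hypothetical: the spec layout, the field names, the dotted-path syntax, and the helpers `get_path` and `parse_event` are made up for illustration and are not our actual service. The assumption is that the spec would live somewhere editable (S3, a database behind a UI) and be read at job run time, so changing it would not need a deploy.

```python
import json
from typing import Any

# Hypothetical parser spec: output column -> dotted path into the raw event.
# Assumption: in a real setup this JSON would be loaded from S3 or a database
# at run time, not embedded in the code.
SPEC_JSON = """
{
  "event_name": "order_created",
  "fields": {
    "order_id": "order.id",
    "user_email": "user.contact.email",
    "first_item_sku": "order.items.0.sku"
  }
}
"""


def get_path(event: Any, path: str, default: Any = None) -> Any:
    """Walk a dotted path through nested dicts and lists.

    Numeric segments index into lists. Returns `default` if any
    segment is missing.
    """
    current = event
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdigit() and int(key) < len(current):
            current = current[int(key)]
        else:
            return default
    return current


def parse_event(raw_json: str, spec: dict) -> dict:
    """Apply the spec to one raw JSON event and return a flat row."""
    event = json.loads(raw_json)
    return {column: get_path(event, path) for column, path in spec["fields"].items()}


spec = json.loads(SPEC_JSON)
raw = (
    '{"order": {"id": 42, "items": [{"sku": "A-1"}]},'
    ' "user": {"contact": {"email": "a@example.com"}}}'
)
row = parse_event(raw, spec)
print(row)
```

With something like this, adding a new event type would mean adding a new spec document rather than new parser code, and the flat rows could go to the existing CSV/Parquet step unchanged.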

Are there any open-source alternatives here? Or any good engineering blogs that have covered this or a similar scenario?

11mo ago


PeppyBanana

Athenahealth, 11mo

Try asking in the data engineering subreddit: https://www.reddit.com/r/dataengineering/s/mcFo9ng1t0

JazzyPotato

Amazon, 11mo

Noted. Thanks buddy.

JazzyPotato

Amazon, 11mo

In simple words, I want an abstraction over the raw data I want to parse, and I want to make things language-agnostic.
