Generative AI Meets Data Engineering: Automating Code, Query Generation, and Data Insights in Large-Scale Enterprises
Abstract
Generative AI for Large-scale Enterprise Data Engineering: Automating Code and Query Generation, Data Insights, and Analytics. In a data-driven economy, organizations across verticals increasingly need to ingest terabytes or even petabytes of data on a frequent basis and run rapid analytics on it. However, transforming that data into a standardized, query-optimized format requires considerable effort. Data scientists have long wished to automate code and query generation for their projects, and the advent of generative AI promises to enable such automation. Major technology companies have incorporated generative AI modules into their products. For instance, publicly available large language models (LLMs) can take natural language text as input and generate code in Python, Java, and SQL. OpenAI publicly hosts a service for invoking these LLMs, and the open-source community has built similar services. These tools have gained immense popularity, with over 100 million users in just one year. This paper discusses the integration of generative AI tools into large-scale enterprise data engineering workflows for code and query generation, data insights, and analytics.