Organizations are facing mounting challenges in moving operational data into data warehouses and data lakes, despite growing reliance on data streaming to support operational and artificial intelligence (AI) systems. These findings emerge from a new research study released by Conduktor, the intelligent data hub for streaming data and AI, based on a survey of 200 senior IT and data executives at large enterprises with annual revenues exceeding USD 50 million.
The study reveals that while enterprises increasingly depend on streaming data, the diversity of tools and platforms used for data ingestion is creating complexity, inefficiencies, and governance gaps. The top challenges cited by respondents include maintaining reliable infrastructure, protecting sensitive data as it moves into storage systems, and integrating and synchronizing multiple data sources across lakes and warehouses. Governance and internal skills shortages were also identified as major barriers, particularly as organizations struggle to control, validate, and track data at scale.
“Fragmented data pipelines are slowing down decision-making at a time when organizations need real-time insights more than ever,” said Nicolas Orban, CEO, Conduktor.
Respondents reported using a wide array of data lake platforms, including Amazon S3 and AWS Lake Formation, Databricks Delta Lake, and Google Cloud Platform. On the warehouse side, Google BigQuery, Amazon Redshift, Azure Synapse Analytics, and IBM Db2 Warehouse were among the most commonly used solutions. To move data from streaming systems into these environments, organizations rely on a mix of approaches, ranging from custom-built pipelines using Apache Spark or Flink to tools such as Kafka Connect, managed services like Amazon Kinesis Data Firehose or Snowflake's Snowpipe, micro-batching techniques, and ELT or ETL platforms.
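To make the custom-pipeline pattern concrete, the sketch below shows a minimal Spark Structured Streaming job that micro-batches events from a Kafka topic into a Delta Lake table. The broker address, topic name, and storage paths are hypothetical placeholders, and the job assumes the Kafka and Delta Lake connector packages are available on the cluster; it is an illustrative sketch of the approach the survey describes, not an implementation from the study.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("kafka-to-delta-ingest")
    .getOrCreate()
)

# Read the raw event stream from Kafka (broker and topic are placeholders).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
)

# Kafka delivers key and value as binary; cast them to strings before storage.
parsed = events.selectExpr(
    "CAST(key AS STRING) AS key",
    "CAST(value AS STRING) AS value",
    "timestamp",
)

# Write micro-batches to a Delta table, recording progress in a checkpoint
# so the pipeline can restart after failure without duplicating records.
query = (
    parsed.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/orders")  # placeholder path
    .trigger(processingTime="1 minute")  # micro-batch interval
    .start("/tmp/delta/orders")          # placeholder table path
)

query.awaitTermination()
```

The checkpoint location is what allows the job to recover exactly where it left off, which speaks directly to the reliability and synchronization concerns respondents raised about home-grown pipelines.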
However, this fragmented ecosystem is creating significant operational pain points. According to the research, the three most pressing issues are the time lost collecting and analyzing data, growing complexity caused by frequent schema changes, and the burden of maintaining parallel architectures that demand additional resources and expertise.
Commenting further on the findings, Orban said that as streaming data adoption accelerates, particularly for AI use cases, organizations must place greater emphasis on governance. He noted that fragmented data environments often result in missed insights, duplicated effort, and poor business decisions, undermining the value of real-time data.
The study aligns with broader market trends. Dataintelo valued the global streaming data processing software market at USD 9.5 billion in 2023 and projects it to reach nearly USD 23.8 billion by 2032, driven by the surge in real-time data from IoT devices, social platforms, and enterprise systems.
Conduktor believes that unifying access, governance, and observability across streaming data operations is critical for enterprises seeking to reduce complexity, improve productivity, and make their operational data trusted and AI-ready.
