Unlocking the Code: AI’s Data Dilemma
With the capabilities of artificial intelligence (AI) developing at a rapid pace, one of the biggest challenges facing data teams and engineers is how to process the huge volumes of data coming from unstructured and heterogeneous sources.
Unlike structured data, which fits neatly into tables and databases, unstructured data comes in a variety of formats, including video, text, and images. Each of these formats carries its own complexity, and the heterogeneity of the sources adds yet another layer.
With this in mind, can teams find a way to optimize the collection and analysis of their data to maximize the impact of AI on their business? Given how quickly the field is evolving, agent-based systems and agent-to-agent communication look like the ideas most likely to take the AI movement to the next level.
The historical challenge of unstructured data
Historically, unstructured data such as audio, video, and social media interactions has posed a significant challenge for businesses attempting to interpret and transform it into formats that are properly structured for analytics and AI applications. For many organizations, the sheer complexity and expense of processing this unstructured data has meant that it has remained severely underutilized until recently.
As a result, organizations have tended to rely on structured data, such as Excel files and search engine optimization (SEO) tags, even though unstructured data makes up the majority of available data and holds significant untapped potential.
In recent years, however, technological developments in AI, and generative AI in particular, have changed the way unstructured data can be interpreted and the way value can be extracted from it.
For example, major cloud companies including Microsoft and Google have expanded their cloud services to support the creation of “data lakes” of unstructured data. Microsoft’s Azure AI now uses a combination of text analytics, optical character recognition, speech recognition, and machine vision to interpret an unstructured data set that can include text or images. With these advancements, businesses can now access this richer source of data and finally unlock its value.
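The specifics differ by vendor, but the underlying pattern is broadly the same: detect each file's format, route it to the appropriate extraction step (OCR for images, transcription for audio), and store the normalized output with enough metadata to land in a data lake. The Python sketch below illustrates only that routing pattern; the extractor functions are hypothetical stubs standing in for whichever OCR or speech-to-text service an organization actually uses, not any vendor's real API.

```python
from pathlib import Path

# Hypothetical extractor stubs: in a real pipeline each of these would call a
# managed service (OCR, speech-to-text) from the organization's cloud provider;
# here they return placeholders so the routing logic itself stays visible.
def extract_text_from_image(path: Path) -> str:
    return f"[OCR output for {path.name}]"

def transcribe_audio(path: Path) -> str:
    return f"[transcript of {path.name}]"

# Map each file format to the extraction step it needs.
EXTRACTORS = {
    ".png": extract_text_from_image,
    ".jpg": extract_text_from_image,
    ".wav": transcribe_audio,
    ".mp3": transcribe_audio,
}

def normalize(files: list[Path]) -> list[dict]:
    """Turn a mixed batch of unstructured files into uniform text records."""
    records = []
    for path in files:
        extractor = EXTRACTORS.get(path.suffix.lower())
        if extractor is None:
            continue  # unsupported format: skip rather than fail the whole batch
        records.append({
            "source": str(path),
            "format": path.suffix.lower(),
            "text": extractor(path),
        })
    return records

if __name__ == "__main__":
    batch = [Path("invoice.png"), Path("support_call.wav"), Path("photo.jpg")]
    for record in normalize(batch):
        print(record)
```

In practice, each record would also carry quality and provenance metadata, which matters for the noise and consistency issues discussed below.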
What are the current problems with unstructured data?
Organizations can now tap into a wealth of information that was previously inaccessible.
However, this still comes with its own challenges. Navigating the varying levels of content quality, scope, and detail in unstructured data can be a significant hurdle, and unstructured sources typically carry far more irrelevant noise. If there is too much of it, even AI struggles to identify accurate answers when sifting through the information.
Additionally, the lack of standards governing how unstructured data is created can limit its usability. Even where larger datasets offer a reasonable degree of consistency, adapting them for use by AI, so that organizations can leverage them more effectively, remains a challenge.
To use unstructured data effectively, it typically needs to be incorporated into an organization's existing data framework, and that integration requires a comprehensive understanding of the data's properties, connections, and potential applications. A major challenge for many unstructured data projects is simply defining a clear goal so that models can be trained accurately.
Many organizations still struggle to leverage these existing data assets to generate business value.
While the problem of unlocking and obtaining the data has largely been solved, being able to predict its potential value and applications remains a major hurdle.
What else is expected from the GenAI movement?
In the future, we should expect to see a decline in human involvement in data sourcing and interpretation. Instead, we will likely see a rise in agent-based systems and agent-to-agent communication, minimizing the need for human intervention in data processing. The boom in generative AI has paved the way for specialized agents, including:
- “Engineering agents” for code generation
- “Data generation agents” for creating synthetic data for testing
- “Code test agents” for validating and testing code
- “Documentation agents” for generating documentation for code, use cases, and processes
There is no doubt that a system in which specialized AI agents communicate with each other can accelerate development and make it more accurate and consistent.
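To make that idea concrete, the sketch below shows one minimal way two such agents might hand work to each other: a hypothetical engineering agent drafts code from a specification, and a code test agent executes it and reports a verdict. The agents are deliberately simple stand-ins (in a real system each would wrap a generative model), and none of the names refer to an actual framework or product.

```python
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    content: str

def engineering_agent(spec: str) -> Message:
    # Draft code from a specification; stubbed as a fixed template here, where
    # a real agent would call a code-generation model.
    code = f"def solve():\n    # implements: {spec}\n    return 42\n"
    return Message(sender="engineering_agent", content=code)

def code_test_agent(msg: Message) -> Message:
    # Validate the drafted code by compiling and executing it in a fresh scope,
    # then report a pass/fail verdict back to the sender.
    scope: dict = {}
    try:
        exec(compile(msg.content, "<generated>", "exec"), scope)
        result = scope["solve"]()
        verdict = f"PASS (solve() returned {result})"
    except Exception as exc:
        verdict = f"FAIL ({exc})"
    return Message(sender="code_test_agent", content=verdict)

if __name__ == "__main__":
    draft = engineering_agent("return the answer to the test question")
    report = code_test_agent(draft)
    print(report.sender, "->", report.content)
```

The value of the pattern is the feedback loop: the test agent's verdict can be routed back to the engineering agent and the cycle repeated until the specification is met, with no human in the middle.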
Organizations can then focus more of their resources on using data rather than preparing it. It is very likely that, in the near future, we will see these AI agents offered as a product by service providers, who could take a company's requirements and return fully tested, spec-compliant code produced by AI agents.
By outsourcing these technical tasks, companies could significantly reduce both the time they take and the need for large in-house development teams. Now is the time for companies to consider the specific roles generative AI can play in maximizing the value of their data programs, and ultimately in achieving a much better return on their investment in these newly expanded areas.
It has been known for some time that generative AI has the potential to revolutionize the way organizations operate. Implementing it effectively, however, still means addressing its weaknesses before it can reach its full potential.
Organizations have yet to fully embrace AI-friendly data acquisition and integration. Those that adapt can maximize investment value and change their fortunes for the better.