The Essentials of an AI Estate

Sanjiv Singh | 24 Jul 2024


In 2024, there has been a confluence of industry, academia, and government emphasising the importance of artificial intelligence (AI) as the next frontier in innovation.

AI is touted as a way to automate business tasks, streamline complex processes, and serve as a “co-worker” in our everyday lives, much like the personal computer revolution of the 1980s and the smartphone revolution of the mid-2000s.

High-quality data is essential for the business application of AI. This article, intended for the business or IT reader working on AI initiatives, touches on AI and the capabilities it demands from a modern platform that underpins effective data management and the AI solutions it powers.

What is Artificial Intelligence?

AI, put broadly, is the application of intelligence – ways of perceiving, analysing, and learning – in a machine.

AI, insofar as computing is concerned, is not a new technology or process. AI has existed in various forms for as long as integrated circuits and central processing units have. Work began in the 1940s, and the field was formally established as an academic discipline in 1956 by the late Professor John McCarthy, one of the founding fathers of AI. Alan Turing, an English computer scientist who formalised the concepts of algorithm and computation with the Turing machine, posited in 1950 that a computer could become powerful enough to “imitate” the thinking of a human being; any such computer or application would then pass the “Turing test.”

One of the first attempts at conversational AI was made in 1966 with the ELIZA natural language processing program. Of course, we now have access to machines with several orders of magnitude more processing power than those that ran ELIZA.

Since then, AI has witnessed many “winters”, periods of reduced funding and interest, followed by periods of renewed interest.

Today, machine learning (ML), a branch of AI that enables computers to learn patterns from data without human intervention, produces predictive or heuristic models that can mimic human intelligence, be trained to seek and analyse patterns in data, and operate independently of user input. During the rise of big data and analytics, ML applications were working behind the scenes to speed up manual data analysis tasks.

In recent years (2022 to present), generative AI, which produces text, images, organised data, or code based on human input, has been helped along by faster and more capable models trained on unstructured data.

Generative AI systems are being engaged as intelligent agents capable of communicating in human language, freeing up staff time from mundane tasks.

There is a shift from experimentation and prototyping to practical aspects of securely deploying AI-infused business automation tasks.

AI is being increasingly seen in customer journeys, recognising and mining documents to read information and derive insights, comparing products and proposals, and creating new content for personalised product offers and documents.

What does this mean for a business looking to harness the power of AI?

Applications and Scope of AI in Modern Business

AI can be applied to a growing number of business tasks, hitherto exclusively achieved through human expertise and judgement. Models are versatile in generating predictions on text, images, videos, and human and computer languages.

The main aim of AI-infused automation is productivity.

Not all business applications are suited to AI, though. Some automation solutions are more cost-effective with traditional application software engineering and apps. In some scenarios, AI may bring unacceptable risks beyond the business appetite. For executives, these are important considerations before green-lighting an AI initiative as a worthy investment.

There is a common theme we hear from business leaders: AI must perform extremely well in its designated task and not be cost-prohibitive. Where AI is a good fit, and there are several use cases where it is, it should not require years to implement or to see value from it. They expect that the AI solution is practical, aligned, and robust within their business domain.

Here are a few practical examples of business AI adoption. These examples, and more, rely on high-quality data.

Identifying documents, extracting information, and deriving insights, freeing up staff from performing basic and mundane tasks.
Increasing the confidence of credit providers in the cash reserves they set aside for expected credit loss of impaired loans.
Making marketing campaigns more relevant with micro-segmentation of customers, creating targeted recommendations and personalised offers without human intervention.
Speeding up the process and improving the quality of drafting proposals and grant applications against tender requests.
Assisting procurement teams and technical buyers to evaluate tender submissions, proposals, and product specifications, saving time and sparing expensive staff from low-productivity work.
Identifying customers most likely to churn and taking pre-emptive steps to improve retention.
Making it easier to search, summarise, and retrieve insights from knowledge repositories residing deep within the enterprise using human language queries.
Virtual assistants reducing call centre volume and costs because they understand human commands and complete tasks without human intervention.
Staff rostering linked with labour demand forecasting.
Reduced factory machine downtime with failure predictions.


Quality Data is Critical for Business AI

The algorithms of AI models rely on data for learning; data that is high-quality, accurate, representative, of sufficient volume, digitised, and stored.

The right data is crucial for AI. Not getting the data right significantly impacts how the models behave. The well-worn principle from computer science, garbage in, garbage out, applies equally to AI. If models are trained on non-representative data, their predictions will be biased. Incomplete or insufficient training data will make predictions unreliable and not meaningful. Conversely, more data does not necessarily mean more or better insights.
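To make “representative” and “complete” a little more concrete, here is a minimal Python sketch that compares the mix of categories in a training sample against a reference population and measures missing values. The field names, the sample records, the population shares, and the 10-percentage-point tolerance are all illustrative assumptions, not a standard.

```python
from collections import Counter

def category_shares(records, field):
    """Return the share of each non-missing value of `field` in `records`."""
    values = [r[field] for r in records if r.get(field) is not None]
    counts = Counter(values)
    total = len(values)
    return {k: v / total for k, v in counts.items()} if total else {}

def missing_rate(records, field):
    """Fraction of records where `field` is absent or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)

def under_represented(sample_shares, population_shares, tolerance=0.10):
    """Categories whose sample share trails the population share by more than `tolerance`."""
    return sorted(
        cat for cat, pop_share in population_shares.items()
        if pop_share - sample_shares.get(cat, 0.0) > tolerance
    )

# Hypothetical training sample and reference population (assumed figures)
training = [
    {"region": "metro", "income": 82000},
    {"region": "metro", "income": 64000},
    {"region": "metro", "income": None},
    {"region": "regional", "income": 51000},
]
population = {"metro": 0.6, "regional": 0.4}

shares = category_shares(training, "region")
gaps = under_represented(shares, population)
income_missing = missing_rate(training, "income")
```

In this toy sample, regional customers make up a quarter of the training data against an assumed 40% of the population, so a model trained on it would see too few of them. Checks of this kind are cheap to run before any training starts.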

A properly designed technology infrastructure is necessary for reliable, trusted, and transparent data supply chains. It comprises collecting, storing, transforming, moving, disposing, securing, and accessing data.

Once the AI models are developed, trained, and tested they are packaged into solutions, deployed securely, integrated with the data supply chain, and monitored in operations.

We consider this combined infrastructure for the data supply chain and AI solutions as the AI Estate.

No matter where you are on the AI journey, whether improving insights for decision-making, evaluating business AI, or deploying intelligent agents for automation, a well-functioning AI Estate is essential.

Essential Capabilities of a Modern AI Estate

The application of an AI Estate to a financial or data-driven business is crucial to the next stage of business development. As such, the capabilities of this AI Estate need to address key issues facing business right now.

According to Forrester, a data architecture needs to accommodate novel and edge-related data repositories that are consistent, protected against nefarious actors, work in real-time, and integrate the ever-growing ecosystem of external and third-party data.

A complete data platform will have AI as the cornerstone for much of its efficiency, security, transparency, automation, and collaborative improvements, dispensing with much of the need for ground-up engineered solutions to specific, niche tasks. This is all achievable with AI in its supervised, unsupervised, and generative forms.

The AI Estate has capabilities for collecting, securing, organising, storing, preparing, exploring, and analysing data from multiple streams, and these capabilities are themselves facilitated by AI. It supports different types of workloads for data engineering, data science, data warehousing, analytics, and reporting. It also includes modules, inherent or bolted-on, for security, data governance, and information control. It integrates with development tools and ML production systems.

Data Ingestion

Without good data, an AI is useless, like an automobile without any fuel in the tank. A comprehensive data ingestion capability is required for any data-driven business application. Standard out-of-the-box connectors bootstrap the wiring of the Estate to multiple enterprise systems to source and ingest internal operational data and data from external third parties. The connectors move data of varying volumes, formats, and types at different frequencies (batch, real-time) into the Estate stores.

Data ingestion used to demand high-level programming knowledge. It has now become more inclusive, owing much to generative AI. It is now possible to create queries, perform data transformation steps, and generate new queries based on existing ones using natural language inputs. For example, if you require sales data compared across two years, you need only type that request (or even speak it) into a generative AI model to run the query.
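As an illustration of the two-year sales example, the sketch below shows the kind of SQL a generative assistant might produce from a plain-language request, run against an in-memory SQLite database. No model is actually called here: the SQL is hand-written to stand in for generated output, and the `sales` table, its columns, and the figures are hypothetical.

```python
import sqlite3

# Natural-language request a user might type (illustrative)
request = "Compare total sales by product between 2022 and 2023"

# A query of the kind an assistant might generate for that request.
# Hand-written here; a real tool's output would need review before use.
generated_sql = """
SELECT product,
       SUM(CASE WHEN year = 2022 THEN amount ELSE 0 END) AS sales_2022,
       SUM(CASE WHEN year = 2023 THEN amount ELSE 0 END) AS sales_2023
FROM sales
WHERE year IN (2022, 2023)
GROUP BY product
ORDER BY product
"""

# Hypothetical sales data loaded into an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("widget", 2022, 100.0),
        ("widget", 2023, 150.0),
        ("gadget", 2022, 80.0),
        ("gadget", 2023, 60.0),
        ("widget", 2021, 999.0),  # outside the requested years, so excluded
    ],
)
rows = conn.execute(generated_sql).fetchall()
conn.close()
```

Each row pairs a product with its 2022 and 2023 totals. Because the user never sees the SQL unless they ask, it is prudent for the platform to show or log the generated query so that a technical reviewer can confirm it matches the request.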

Unified Storage

Once the data is ingested it lands in a “data lakehouse” – an architecture pattern combining data lake and relational data warehouse (RDW). The Lakehouse (portmanteau of lake and warehouse) simplifies data management and reduces data duplication.

The data lake part accommodates semi-structured and unstructured data (documents, images, videos, JSON, XML). Typically, data scientists and power users with higher technical skills make best use of it, owing to its complex folder-file structure and separation of metadata. They use compute sandboxes of varying sizes to explore, refine, and clean the data before promoting it to downstream processes.

RDW style capabilities are added to the data lake by adding “delta lake” features. This adds support for data manipulation language (DML) with commands that simplify complex data management tasks, make data handling more flexible, and add atomicity, consistency, isolation, durability (ACID) properties to the data lake. It provides a “time travel” feature to query data at a specified point in time.
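The ideas of atomic updates and “time travel” can be sketched with a toy versioned table in plain Python. This is not the API of any delta lake product; the `VersionedTable` class and its methods are hypothetical and exist only to show the behaviour: each successful commit publishes a new version, a failed update publishes nothing, and earlier versions remain readable.

```python
import copy

class VersionedTable:
    """Toy append-only table: every commit stores a full snapshot."""

    def __init__(self):
        self._versions = [[]]  # version 0 is the empty table

    @property
    def latest_version(self):
        return len(self._versions) - 1

    def commit(self, transform):
        """Apply `transform` to a copy of the latest rows; publish only on success."""
        working = copy.deepcopy(self._versions[-1])
        new_rows = transform(working)  # an exception here leaves the table untouched
        self._versions.append(new_rows)
        return self.latest_version

    def read(self, version=None):
        """Read the latest snapshot, or an earlier one ("time travel")."""
        index = self.latest_version if version is None else version
        return copy.deepcopy(self._versions[index])

table = VersionedTable()
v1 = table.commit(lambda rows: rows + [{"id": 1, "status": "open"}])
v2 = table.commit(lambda rows: [dict(r, status="closed") for r in rows])

def failing_update(rows):
    rows.append({"id": 2, "status": "open"})
    raise ValueError("simulated failure part-way through an update")

try:
    table.commit(failing_update)
except ValueError:
    pass  # the failed update was never published
```

After these steps the table is still at version 2, and reading version 1 returns the record as it was before it was closed. Production systems typically keep a transaction log rather than full snapshots, but the guarantees a reader relies on are the same in spirit.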

The RDW part has been around longer, and non-technical users are accustomed to accessing data from it. The RDW holds structured data, offers low latency, and is much faster at querying. It supports high-performance operational reporting and dashboards, and makes self-service BI much easier with a metadata layer on top.

The lakehouse allows organisations to virtualise data across clouds, accounts, and domains without overwriting or changing ownership.

Shortcuts and AI-powered queries assist every department of an organisation to use data insights at their command.

The lakehouse should foster a lasting and lucrative data-oriented culture.

Analytics, Data Science and Engineering

A modern AI Estate powered by machine learning enables data scientists and engineers to manipulate and view data as they wish using SQL, familiar programming languages such as Python, R, Java, and Scala, and natural language queries. Harnessing data from the lakehouse, data scientists and engineers can visualise data while taking advantage of intelligent workload management, workload isolation, and limitless concurrency. In the modern AI Estate, specialist or generalist data scientists are empowered to create analytics dashboards or proofs of concept quickly without extensive coding, improving business efficiencies.

Governance and Information Control

Data security is paramount at any organisation. According to the Office of the Australian Information Commissioner, notifiable data breach reports in the second half of 2023 jumped by 19% compared with the previous six months, totalling 483. The Australian Signals Directorate estimates that a data breach or cybersecurity intrusion costs a business an average of AUD $71,600 per incident.

A data platform should be able to govern system-wide information security through domains and data loss prevention techniques powered by AI labels and classification systems. Sensitivity labels can be automatically applied to new data, and data inputted or retrieved can be audited at the user level. AI can detect sensitive data uploads or downloads and alert administrators or data owners, preventing breaches or inappropriate use. A modern AI Estate of this calibre will also have capabilities for organising data using AI generated metadata, powering a master data management system.
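A rough sketch of automatic sensitivity labelling with a user-level audit trail follows. It uses two simple regular expressions in place of an AI classifier, which is a deliberate simplification; the label names, pattern set, and the `classify` and `ingest` functions are hypothetical.

```python
import re

# Illustrative patterns only; real classifiers use far richer rules and models.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_number": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}

def classify(text):
    """Return a sensitivity label and the sorted names of the patterns that matched."""
    found = sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))
    label = "Confidential" if found else "General"
    return label, found

def ingest(document, audit_log, user):
    """Label a document on arrival and record who uploaded what."""
    label, found = classify(document)
    audit_log.append({"user": user, "label": label, "matched": found})
    return label

log = []
first = ingest("Quarterly roadmap draft", log, user="asha")
second = ingest("Refund to jo@example.com, card 4111 1111 1111 1111", log, user="ben")
```

The second document is labelled Confidential because it contains an email address and a card-like number, and the audit log records which user uploaded it. An alert to the data owner could be raised from the same log entry.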

Security control and assurance capabilities are woven into the architecture. They allow access rights and privileges to data and environments to be specified by authorising who can log in (users and systems) and what they can access based on their roles. Controls over data and records can be applied at a more granular level with row-level security and data masking for privacy. Active monitoring and alerting help prevent data-related issues from occurring. Immutable logs of access and user behaviour help in auditing and assurance.

Information security management is multidimensional. It is not only the realm of technology architecture and its operating controls but extends to data management and loss prevention controls and a culture with staff knowledgeable on safe data handling procedures.
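Row-level security and data masking can be pictured with a small Python sketch. In a real platform these rules are enforced by the database or warehouse engine rather than in application code; the roles, regions, customer records, and function names below are hypothetical.

```python
# Hypothetical role-to-region entitlements
ROLE_REGIONS = {
    "analyst_nsw": {"NSW"},
    "analyst_vic": {"VIC"},
    "auditor": {"NSW", "VIC"},
}
ROLES_SEEING_FULL_EMAIL = {"auditor"}

def mask_email(email):
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain

def query_customers(rows, role):
    """Return only the rows a role may see, masking email unless the role is privileged."""
    allowed = ROLE_REGIONS.get(role, set())  # unknown roles see nothing
    visible = []
    for row in rows:
        if row["region"] not in allowed:
            continue  # row-level security filter
        out = dict(row)
        if role not in ROLES_SEEING_FULL_EMAIL:
            out["email"] = mask_email(row["email"])  # data masking
        visible.append(out)
    return visible

customers = [
    {"id": 1, "region": "NSW", "email": "priya@example.com"},
    {"id": 2, "region": "VIC", "email": "tom@example.com"},
]
nsw_view = query_customers(customers, "analyst_nsw")
```

The NSW analyst sees one row with a masked email, the auditor sees both rows unmasked, and a role with no entitlements sees nothing. Defaulting unknown roles to an empty result is the safer design choice.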

Generative AI Automation

The AI Estate weaves together all the capabilities of the data estate and enables users of all stripes to create reports, analyses, insights, and virtually any type of visualisation or data transformation using semantic (language) prompts, powered by a pre-trained generative large language model (LLM). Generative AI can complete code, automate routine tasks, summarise complex data sets or reports, and do nearly anything else a layperson or data engineer would need from their data lakehouse and analysis platform. It can be programmed to take on different personas and view data from those perspectives. It reduces costs in time and labour by executing instructions in natural language instead of waiting for specific code to be written, debugged, and deployed. The AI should unify, automate, and simplify all the functions and results of the data architecture.

Reporting, Dashboards, and Insights

A modern AI Estate is only as useful as the insights it brings, and this requires a robust, simple-to-use reporting dashboard and interface. The data lakehouse and data engineering side of the equation has usually required familiarity with high-level programming languages to produce reports with an acceptable level of insight for the business. The AI Estate requires a dashboard that can be configured using LLMs interpreting natural language. A request encapsulated in a few words (e.g. “Show me the sales figures cross-referenced with inquiries for every Tuesday over the last twelve months”) should harness all facets of the modern AI Estate. The dashboard should be able to ingest data, capture snapshots, and output insights in real time, without making significant changes for other users of the platform. The resulting data should also be capable of being routed to other apps and dashboards to unlock greater insights and create further in-depth analyses. AI should also be able to detect significant changes in in-stream data, alerting users to emerging or anomalous patterns.
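Detecting anomalous patterns in in-stream data can be as simple, in its most basic form, as comparing each new value with the recent past. The sketch below uses a rolling z-score, which is one common statistical technique swapped in here for illustration and not necessarily what any given platform uses. The window size, threshold, and order figures are assumptions.

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(stream, window=5, threshold=3.0):
    """Yield (index, value) for points far from the mean of the previous `window` values.

    "Far" means more than `threshold` sample standard deviations away.
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(stream):
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        recent.append(value)  # the oldest value drops out automatically

# Hypothetical daily order counts with one unusual spike
daily_orders = [100, 102, 98, 101, 99, 100, 103, 250, 101, 97]
alerts = list(detect_anomalies(daily_orders))
```

Here the spike of 250 on the eighth day is flagged. Note that the spike then sits in the window and inflates the spread for the next few readings, so more robust variants exclude flagged points or use medians.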

DevOps and MLOps

Developing AI models and implementing solutions that use AI require DevOps and MLOps components. DevOps, a practice that combines development and operations, streamlines and automates the software development lifecycle. It enhances collaboration between development and IT operations teams, ensuring rapid, reliable, and consistent software delivery. MLOps extends DevOps principles to machine learning operations for managing model training, versioning, monitoring, and continuous improvement. Together, DevOps and MLOps form a crucial part of the AI Estate, enabling efficient deployment, scaling, and maintenance of AI-driven solutions.

AI Monitoring

AI monitoring is the ongoing process of overseeing and evaluating the performance, accuracy, and reliability of AI systems. This involves tracking key metrics (e.g. error rate), identifying potential issues (e.g. errors trending out of band), and ensuring that the AI models operate as intended (e.g. correcting model drift). Monitoring is an important part of AI success because it helps maintain the integrity and effectiveness of AI applications. An AI Estate that can monitor and audit AI operations, so that problems are quickly detected and addressed, performance is optimised, and compliance is ensured, safeguards AI investments and enhances trust and transparency in their outcomes.
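The three monitoring activities named above (tracking an error rate, checking whether it is out of band, and watching for drift) can be sketched in a few lines of Python. The expected error rate, the tolerance band, the drift measure (shift in a feature's mean, in baseline standard deviations), and all figures are illustrative assumptions.

```python
from statistics import mean, stdev

def error_rate(predictions, actuals):
    """Share of predictions that differ from the observed outcome."""
    wrong = sum(1 for p, a in zip(predictions, actuals) if p != a)
    return wrong / len(predictions)

def out_of_band(rate, expected=0.05, tolerance=0.03):
    """True when the error rate leaves the band expected +/- tolerance."""
    return abs(rate - expected) > tolerance

def drift_score(baseline, live):
    """Shift of the live feature mean, measured in baseline standard deviations."""
    return abs(mean(live) - mean(baseline)) / stdev(baseline)

# Hypothetical weekly batch of predictions versus observed outcomes
preds = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
actuals = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
rate = error_rate(preds, actuals)
alert = out_of_band(rate)

# Hypothetical feature values at training time versus in production
baseline_income = [50, 52, 48, 51, 49]
live_income = [60, 62, 58, 61, 59]
score = drift_score(baseline_income, live_income)
```

In this example two of ten predictions are wrong, which falls outside the assumed band and would raise an alert, while the income feature has shifted by several standard deviations, a sign that the model may need retraining on more recent data.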


Adoption and Success

There are several data cloud platforms offered by the mainstream platform providers.

Similarly, several AI platforms (predictive and generative) are available from mainstream and niche providers.

A plethora of lifecycle management tools complement the AI Estate for data management, AI modelling, and software engineering.

The platform and tools for data and AI must ultimately align to business strategy, fit the use cases, and continue to develop on a well-established roadmap.

With the platform and tools selected, the next action is to acquire, architect, design, configure, integrate, test, and deploy the AI Estate within the enterprise. Short and well-conceived proofs of concept and prototypes help teams understand the risks and clarify the intended impact of the technology.

The AI Estate powers the technology infrastructure. It alone cannot assure success. It must be accompanied by changes to management processes, roles and responsibilities, and the organisational culture.

Policies are needed to guide the safe and responsible use of data and AI. Operational risk frameworks must include managing risks relating to data, its quality, and AI. Planning cycles must actively consider budget allocations for AI initiatives.

As staff become adept at data and AI usage, their job descriptions must be adjusted. Training technical and business users helps increase their data and AI literacy relevant to their jobs. Supporting them in operations bolsters adoption.

The mix of talent profile changes over time with an emphasis on staff with a combined competency in business and technology. AI experts become necessary for those organisations building an AI development capability in-house.

Business leaders play a critical role in fostering a positive and supportive culture for data and AI. Feedback from operations reinforces successes, addresses issues, and expands AI utility in a continuous cycle.

The Costs of Standing Still

An AI Estate with a supportive culture and data architecture is proving to be a transformative, not merely additive, foundation for AI-infused business task automation and for generating value from data. Just as we cannot imagine a modern business running without computers, it is equally unthinkable for a business to forgo the multiplicative improvements in productivity, security, and customer outcomes that a sound foundation for AI makes possible.

In any “S-curve” innovation diffusion model, the late adopters or laggards (as researcher Everett Rogers called them) will be at a disadvantage to businesses that seize on the opportunity that innovations afford them. Using Rogers’ model, AI satisfies all five criteria for adoption: it provides an immediate relative advantage over what came before; is compatible with current systems; is already familiar, and so not overly complex to use; can be tried in isolation before implementation; and produces tangible and observable results.

How we can help

Irada helps growth-minded organisations apply practical, aligned, and robust AI solutions in business operations.

References, Resources, Readings

Yuhanna, N. et al., ‘Data Fabric 2.0 For Connected Intelligence’, Forrester. (16 February 2023)
Office of the Australian Information Commissioner, ‘Notifiable data breaches report July to December 2023’. (22 February 2024)
Australian Signals Directorate, ‘ASD Cyber Threat Report 2022-2023’. (14 November 2023)
Boston University School of Public Health, ‘Diffusion of Innovation Theory’. (3 November 2022)

Links to external websites were correct at the time of publishing. Irada is not responsible for the content of external websites.

The information in this article is general in nature. Your circumstances and needs may vary.

This work is licensed under CC BY-NC-SA 4.0