How do databases support AI algorithms?
Databases have always been able to handle simple clerical work, such as finding records that match certain criteria, for example all users between the ages of 20 and 30. Lately, database companies have added artificial intelligence routines to their databases so that users can apply these smarter, more sophisticated algorithms to the data they already store there.
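For reference, the conventional query mentioned above is a one-line `WHERE ... BETWEEN` filter in SQL. The sketch below uses Python's built-in `sqlite3` module and an invented `users` table purely for illustration.

```python
import sqlite3

# Small in-memory table; the names and ages are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Ana", 19), ("Ben", 24), ("Chen", 30), ("Dee", 31), ("Eli", 27)],
)

# Classic record matching: BETWEEN is inclusive on both ends.
rows = conn.execute(
    "SELECT name, age FROM users WHERE age BETWEEN 20 AND 30 ORDER BY age"
).fetchall()
print(rows)  # [('Ben', 24), ('Eli', 27), ('Chen', 30)]
```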
AI algorithms also work below the surface, where they help optimize internal tasks such as reindexing or query scheduling. These features are often billed as improved automation because they relieve users of housekeeping chores. Developers are encouraged to let them do their job and forget about them.
However, there is much more interest in AI routines exposed directly to users. These machine learning algorithms can categorize data and make smarter decisions that evolve and adapt over time. They can unlock new use cases and improve the flexibility of existing algorithms.
In many cases, the integration is largely pragmatic and essentially cosmetic. The calculations are no different from those that would run if the data were exported and sent to a separate AI program. Inside the database, the AI routines remain separate and simply take advantage of direct internal access to the data. Sometimes this faster access speeds up the process considerably: when the data set is large, moving it can take a significant amount of time.
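To make the point concrete, the arithmetic does not depend on where it runs. This minimal sketch computes a per-group average once inside SQLite and once after exporting the rows to CSV (an in-memory buffer here); the `readings` table and its values are invented. What the database saves is the export step, not any of the math.

```python
import csv
import io
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.5), ("a", 2.5), ("b", 4.0), ("b", 6.0), ("b", 8.0)],
)

# Option 1: compute inside the database.
in_db = dict(
    conn.execute("SELECT sensor, AVG(value) FROM readings GROUP BY sensor").fetchall()
)

# Option 2: export the rows to CSV, then compute the same statistic outside.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(conn.execute("SELECT sensor, value FROM readings"))
buf.seek(0)
groups = {}
for sensor, value in csv.reader(buf):
    groups.setdefault(sensor, []).append(float(value))
exported = {sensor: statistics.mean(values) for sensor, values in groups.items()}

print(in_db == exported)  # True
```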
Integration can also limit analysis to algorithms that are officially part of the database. If users want to deploy a different algorithm, they have to go back to the old process of exporting the data in the correct format and importing it into a separate AI program.
Integration can also take advantage of newer distributed, in-memory databases that spread both the computational load and the data storage across multiple machines. These systems can handle large volumes of data, and if a complex analysis is required, it is often easy to increase the CPU capacity and RAM allocated to each machine.
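Distribution scales because many calculations split cleanly across machines. A minimal sketch, assuming round-robin placement of rows on three hypothetical nodes: each node reduces its own slice to partial (sum, count) pairs, and a coordinator merges them into final averages. Real distributed databases use their own sharding schemes and execution engines; this only illustrates the idea.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def partition(rows, num_nodes=NUM_NODES):
    # Round-robin placement: row i goes to node i % num_nodes.
    nodes = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        nodes[i % num_nodes].append(row)
    return nodes


def local_aggregate(rows):
    # Each node computes (sum, count) per key over only its own rows.
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return partial


def merge(partials):
    # The coordinator adds up partial sums and counts, then divides once.
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            total[key][0] += s
            total[key][1] += c
    return {key: s / c for key, (s, c) in total.items()}


rows = [("eu", 10.0), ("us", 4.0), ("eu", 20.0), ("apac", 7.0), ("us", 6.0)]
means = merge(local_aggregate(node_rows) for node_rows in partition(rows))
print(means)
```

Shipping (sum, count) pairs rather than per-node averages matters: averaging the node averages would give the wrong answer whenever nodes hold different numbers of rows for a key.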
Some AI-powered databases can also take advantage of GPUs, whose highly parallel architecture many AI algorithms use to train machine learning models and run other computations. There are also custom chips designed specifically for AI that can dramatically speed up analysis.
One of the main advantages, however, can be the standard interface, which is often SQL, a language already familiar to many programmers. Many software packages already interact easily with SQL databases. Anyone who wants to add AI analysis needs only to learn a few new SQL statements.
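The appeal is that a model becomes just another function inside a query. A minimal sketch of that pattern uses SQLite user-defined functions through Python's `sqlite3` `create_function`. The logistic model, its weights, the `buy_probability` name, and the `customers` table are all hypothetical; commercial AI databases expose their own, different SQL syntax.

```python
import math
import sqlite3

# Hypothetical pre-trained logistic model; the weights are invented for illustration.
BIAS, W_VISITS, W_DAYS = -4.0, 0.5, -0.1


def buy_probability(visits, days_since_last):
    """Logistic score: estimated probability that a customer buys again."""
    z = BIAS + W_VISITS * visits + W_DAYS * days_since_last
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the name buy_probability, taking 2 arguments.
conn.create_function("buy_probability", 2, buy_probability)

conn.execute("CREATE TABLE customers (name TEXT, visits INTEGER, days_since_last INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ana", 12, 3), ("Ben", 2, 40), ("Chen", 8, 10)],
)

# The model is now called from an ordinary SELECT, like any built-in function.
rows = conn.execute(
    "SELECT name, ROUND(buy_probability(visits, days_since_last), 3) AS p "
    "FROM customers ORDER BY p DESC"
).fetchall()
print(rows)
```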
What are established businesses doing?
Artificial intelligence is a very competitive field now. All the major database companies are exploring the integration of algorithms into their tools. In many cases, companies offer so many options that it is impossible to summarize them here.
Oracle has integrated AI routines into its databases in a number of ways, and the company offers a wide array of options in almost every corner of its stack. At the lower levels, some developers, for example, run machine learning algorithms in the Python interpreter built into Oracle's database. There are also more integrated options such as Oracle Machine Learning for R, which uses R to analyze data stored in Oracle databases. Many more services are integrated at higher levels, for example as features of its data science and analytics tools.
IBM also has a number of AI tools built into its various databases, and the company sometimes calls Db2 "the AI database." At the lowest level, the database includes functions in its version of SQL that handle common steps in building AI models, such as linear regression. These can be assembled into custom stored procedures for training. Many IBM AI tools, such as Watson Studio, are designed to connect directly to the database to speed up model building.
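To show what fitting a model "inside the database" can mean, here is a least-squares line computed entirely with SQL aggregates. This is not Db2's API: databases that implement the SQL-standard regression aggregates (such as `REGR_SLOPE` and `REGR_INTERCEPT`) can do this directly, but SQLite does not, so the sketch spells out the closed-form formulas with `SUM` and `COUNT`. The `points` table is invented and lies exactly on y = 2x + 1 so the result is easy to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
# Points lie exactly on y = 2x + 1, so the fit should recover slope 2, intercept 1.
conn.executemany("INSERT INTO points VALUES (?, ?)", [(0, 1), (1, 3), (2, 5), (3, 7)])

# slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2); intercept = (Sy - slope*Sx) / n
slope, intercept = conn.execute(
    """
    WITH s AS (
        SELECT COUNT(*) AS n, SUM(x) AS sx, SUM(y) AS sy,
               SUM(x * y) AS sxy, SUM(x * x) AS sxx
        FROM points
    ),
    fit AS (
        SELECT (n * sxy - sx * sy) * 1.0 / (n * sxx - sx * sx) AS slope, n, sx, sy
        FROM s
    )
    SELECT slope, (sy - slope * sx) / n FROM fit
    """
).fetchone()
print(slope, intercept)  # 2.0 1.0
```

Only two numbers leave the database; the raw rows never do, which is the practical benefit the article describes.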
Hadoop and its ecosystem of tools are commonly used to analyze large data sets. While they are often thought of more as data processing pipelines than as databases, there is often a database such as HBase buried inside. Some people use the Hadoop Distributed File System to store data, sometimes in CSV format. A variety of AI tools are already integrated into the Hadoop pipeline through projects such as Submarine, effectively making it a database with built-in AI.
All the big cloud companies offer both databases and artificial intelligence products. The degree of integration between a particular database and a particular AI product varies widely, but it's often quite easy to connect the two. Amazon Comprehend, a natural language text analysis tool, accepts data from S3 buckets and stores its results in many locations, including some AWS databases. Amazon SageMaker can access data in S3 buckets or Redshift data warehouses, sometimes using SQL through Amazon Athena. While it is legitimate to wonder whether these count as true integration, there is no doubt that they simplify the workflow.
In Google's cloud, the AutoML tool for automated machine learning can retrieve data from BigQuery databases. Firebase ML offers a number of tools that address common challenges for mobile developers, such as image classification. It can also deploy custom TensorFlow Lite models trained on your own data.
Microsoft Azure also offers a collection of databases and AI tools. Azure Databricks, for example, is built on Apache Spark and comes with connectors to Azure's Cosmos DB, its Data Lake Storage, and other databases such as Neo4j or Elasticsearch that can run in Azure. Azure Data Factory, meanwhile, is designed to gather data from across the cloud, both from databases and from generic storage.
What are the upstarts doing?
A number of database startups are also emphasizing their direct support for machine learning and other AI routines. SingleStore, for example, offers fast scans for tracking incoming telemetry in real time, and that data can also be evaluated by various AI models as it is ingested.
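Evaluating data as it is ingested means the model scores each row on arrival rather than in a later batch. A minimal sketch, not SingleStore's API: a running z-score anomaly detector built on Welford's online mean/variance algorithm, which checks each reading against what it has seen so far and then updates itself. The `threshold` and `warmup` values and the sample readings are arbitrary assumptions.

```python
import math


class RunningStats:
    """Welford's online algorithm: mean and variance without storing history."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Sample standard deviation; zero until there are at least two points.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def ingest(stream, threshold=3.0, warmup=5):
    """Score each reading as it arrives, then fold it into the running model."""
    stats = RunningStats()
    flagged = []
    for i, x in enumerate(stream):
        if stats.n >= warmup and stats.std > 0:
            z = abs(x - stats.mean) / stats.std
            if z > threshold:
                flagged.append((i, x))
        stats.update(x)
    return flagged


readings = [20.1, 20.3, 19.8, 20.0, 20.2, 20.1, 35.0, 20.0]
print(ingest(readings))  # [(6, 35.0)]
```

This sketch folds anomalous readings into the statistics too; a production pipeline might exclude them so a single spike does not widen the baseline.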
MindsDB adds machine learning routines to standard databases such as MariaDB, PostgreSQL or Microsoft SQL Server. It extends SQL with functions that learn from data already present in the database and use it to make predictions and classify objects. These features are also easily accessible in more than a dozen business intelligence applications, such as Salesforce's Tableau or Microsoft's Power BI, which work closely with SQL databases.
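MindsDB's actual SQL syntax differs, and this sketch does not reproduce it. It only illustrates the general idea of learning from rows already stored in a table and then calling the result from SQL: the database computes per-class centroids, and a nearest-centroid classifier is registered as a SQLite function. The `flowers` table, its columns, its values, and the `predict_species` name are hypothetical.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flowers (petal_len REAL, petal_wid REAL, species TEXT)")
conn.executemany(
    "INSERT INTO flowers VALUES (?, ?, ?)",
    [
        (1.4, 0.2, "setosa"), (1.3, 0.2, "setosa"), (1.5, 0.3, "setosa"),
        (4.7, 1.4, "versicolor"), (4.5, 1.5, "versicolor"), (4.9, 1.5, "versicolor"),
        (1.6, 0.4, None), (4.6, 1.3, None),  # unlabeled rows to classify
    ],
)

# "Training": the database itself computes one centroid per labeled class.
centroids = conn.execute(
    "SELECT species, AVG(petal_len), AVG(petal_wid) FROM flowers "
    "WHERE species IS NOT NULL GROUP BY species"
).fetchall()


def predict_species(length, width):
    # Nearest-centroid classifier: pick the class whose mean point is closest.
    return min(centroids, key=lambda c: math.dist((length, width), (c[1], c[2])))[0]


conn.create_function("predict_species", 2, predict_species)

# Prediction happens inside an ordinary SQL query.
rows = conn.execute(
    "SELECT petal_len, petal_wid, predict_species(petal_len, petal_wid) "
    "FROM flowers WHERE species IS NULL ORDER BY petal_len"
).fetchall()
print(rows)  # [(1.6, 0.4, 'setosa'), (4.6, 1.3, 'versicolor')]
```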
Many companies effectively bury the database in the product and sell only the service itself. Riskified, for example, tracks financial transactions using artificial intelligence models and offers merchant protection through "chargeback guarantees." The tool ingests transactions and keeps historical data, but there is little discussion of the database layer.
In many cases, companies that present themselves as pure AI companies are also database providers. After all, the data has to be stored somewhere. H2O.ai, for example, is just one of the AI cloud providers that offer integrated data preparation and artificial intelligence analysis. Its data storage is less visible, and many people think of software like H2O.ai's primarily for its analytical power, yet it can both store and analyze data.
Is there something that built-in AI databases can’t do?
Adding AI routines directly to a database's feature set can make life easier for developers and database administrators. It can also make analysis a bit faster in some cases. But beyond the convenience and speed of working with a single copy of the data, it offers no significant, lasting benefit over exporting the data and importing it into a separate program.
The approach can also limit developers, who may end up exploring only the algorithms implemented directly in the database. If an algorithm is not part of the database, it is not an option.
Of course, there are many problems that cannot be solved with machine learning or artificial intelligence at all. Integrating AI algorithms into the database doesn't change the power of the algorithms; it just speeds them up.