2023 has been one of the most exciting years for AI, and for Generative AI in particular, with the surging popularity of ChatGPT (built on GPT, the Generative Pre-trained Transformer) and Large Language Models (LLMs). Their appeal rests on an impressive ability to comprehend human language and make decisions that remarkably mimic human intelligence.
ChatGPT reached the unprecedented milestone of one million users within five days of launch. Since then, the tech giants have rushed into the race, releasing dozens of LLMs, both open source and proprietary, such as LaMDA (Google AI), Megatron-Turing NLG (NVIDIA), PaLM (Google AI), Llama 2 (Meta AI), BLOOM (Hugging Face), Wu Dao 2.0 (Beijing Academy of Artificial Intelligence), Jurassic-1 Jumbo (AI21 Labs), and Bard (Google AI).
Alongside this race among the tech giants, business adoption of ChatGPT and LLMs is growing rapidly. According to the Master of Code Global report “Statistics of ChatGPT & Generative AI in business: 2023 Report”, 49 per cent of companies currently use ChatGPT, while another 30 per cent intend to use it in the future.
Another report by Forbes indicates that 70 per cent of organisations are currently exploring generative AI, including LLMs. LLMs are clearly gaining traction in the enterprise world, and more and more companies see the technology’s potential to revolutionise their businesses.
Multimodal Generative AI
Although ChatGPT and most other LLMs demonstrate superior performance in understanding human language, text is just one modality that human beings perceive every day. Multimodal data is ubiquitous in the real world: humans constantly communicate and interact through images, audio, and video as well.
Multimodal data also poses significant challenges for AI systems, including data heterogeneity, alignment, fusion, and representation, as well as model complexity, computational cost, and evaluation metrics. The AI community has therefore tended to master unimodal data first before tackling the harder multimodal setting.
Inspired by the tremendous success of LLMs, the AI community has been creating Large Multimodal Models (LMMs) that can achieve similar levels of generality and expressiveness in the multimodal domain. LMMs can leverage massive amounts of multimodal data and perform diverse tasks with minimal supervision.
Incorporating these other modalities into LLMs yields LMMs, which can solve many challenging tasks spanning text, images, audio, and video, such as image captioning, visual question answering, and editing images through natural language commands.
OpenAI has been pioneering this direction with GPT-4V, the upgraded multimodal version of the GPT-4 model that accepts both text and image inputs. GPT-4V can perform various tasks, such as answering questions about images, describing and analysing visual content, and following natural language instructions that refer to an image.
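GPT-4V is exposed through OpenAI’s standard chat completions API, which accepts images alongside text in a single message. Below is a minimal sketch of visual question answering; the model name, image URL, and question are illustrative assumptions, not a prescribed recipe:

```python
# Minimal sketch: visual question answering with a multimodal chat model.
# Assumes the OpenAI Python SDK (`pip install openai`) and an API key in
# the OPENAI_API_KEY environment variable; model name and URL are illustrative.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # GPT-4V model identifier at launch
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is happening in this image?"},
            # The image is passed by URL; base64 data URLs also work.
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
    max_tokens=200,
)
print(response.choices[0].message.content)
```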
The open-source community is moving quickly as well:
- LLaVA-1.5: an open-source model that understands both text and images, handling tasks such as answering questions about images and generating image captions.
- Alpaca-LoRA: a lightweight, LoRA-fine-tuned language model that follows natural language instructions or prompts to perform a variety of language tasks.
Adept, on the other hand, is aiming at a bigger ambition: building an AI model that can interact with everything on your computer. In the company’s words, “Adept is building an entirely new way to get things done. It takes your goals in plain language and turns them into actions on the software you use every day.” Adept believes that AI models that read and write text are still valuable, but models that use computers the way humans do will be even more valuable to enterprise businesses.
This ambition is fuelling the race among big tech companies to deliver Large Multimodal Models, though it will likely take a few years for LMMs to reach the level of maturity that LLMs enjoy today.
Generating vs leveraging Large Foundation Models
Producing AI applications for diverse tasks has never been easier or more efficient. Just a few years ago, building a sentiment analysis application, for example, could take months of proof-of-concept (POC) work with both in-house and public datasets.
Deploying the resulting models into production took a few more months on top of that. Now, LLMs make it possible to build such an application in days: one simply formulates a prompt asking the model to evaluate a text as positive, neutral, or negative.
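As a concrete illustration, here is a minimal sketch of such a prompt-based sentiment classifier using OpenAI’s Python SDK; the model name, prompt wording, and one-word label format are assumptions for illustration:

```python
# Minimal sketch: zero-shot sentiment classification via a prompt.
# Assumes the OpenAI Python SDK (`pip install openai`) and an API key
# in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

def classify_sentiment(text: str) -> str:
    """Ask the LLM to label a text as positive, neutral, or negative."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # any capable chat model works here
        messages=[
            {"role": "system",
             "content": "Classify the sentiment of the user's text. "
                        "Reply with exactly one word: positive, neutral, or negative."},
            {"role": "user", "content": text},
        ],
        temperature=0,  # deterministic labels for classification
    )
    return response.choices[0].message.content.strip().lower()

print(classify_sentiment("The onboarding was smooth and support replied in minutes."))
# expected output: "positive"
```

The months of dataset collection and model training collapse into a single prompt; the trade-off is that quality now depends on prompt design and the underlying foundation model.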
In the field of computer vision, visual prompting techniques, introduced by Landing AI, similarly leverage the power of Large Vision Models (LVMs) to solve a variety of vision tasks, such as object detection, object recognition, and semantic segmentation.
Visual Prompting uses visual cues, such as images, icons, or patterns, to reprogram a pre-trained Large Vision Model for a new downstream task. Visual prompting can reduce the need for extensive data labelling and model training and enable faster and easier deployment of computer vision applications.
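Landing AI’s visual prompting tooling itself is proprietary, but Meta’s open-source Segment Anything Model (SAM) illustrates the same idea: a pre-trained vision model is steered to a new task by a visual cue (here, a single foreground click) rather than by retraining. A minimal sketch, assuming the `segment-anything` package and its published `vit_b` checkpoint; the image path and click coordinates are illustrative:

```python
# Minimal sketch: prompting a pre-trained vision model with a point click.
# Assumes `pip install segment-anything opencv-python` and the vit_b
# checkpoint downloaded from Meta's segment-anything repository.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# Load an image (path is illustrative) and precompute its embedding.
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# The visual prompt: one (x, y) click marked as foreground (label 1).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks
)
best_mask = masks[np.argmax(scores)]  # keep the highest-scoring mask
```

The single click plays the role a text prompt plays for an LLM: it specifies the task with no labelled dataset and no gradient updates to the model.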
Building pre-trained Large Foundation Models (LFMs), including LLMs and LVMs, requires not only AI expertise but also huge investments in infrastructure, i.e., data lakes and computing servers. Hence, the race among big tech companies to create pre-trained LFMs will continue into 2024 and the years to come.
Some of these models are proprietary, but many others are open source, giving enterprises diverse alternatives. Meanwhile, small and medium enterprises (SMEs) and AI startups will be the main force commercialising LFMs, focusing primarily on building applications on top of them.
Agent concept in Generative AI
The agent concept is a new trend in Generative AI that has the potential to revolutionise the way we interact with computers. Agents are software modules that can autonomously or semi-autonomously spin up sessions (in this case, language models and other workflow-related sessions) as needed to pursue a goal.
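To make the idea concrete, here is a minimal sketch of an agent loop in which the model chooses between invoking a tool and giving a final answer; the prompt format, toy tool, and model name are illustrative assumptions, not any particular framework’s API:

```python
# Minimal sketch of an agent loop: the model either calls a tool or
# answers, and the loop feeds tool results back in as observations.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

TOOLS = {
    "word_count": lambda text: str(len(text.split())),  # a toy tool
}

SYSTEM = (
    "You pursue the user's goal step by step. To use a tool, reply with\n"
    "ACTION: <tool_name> | <input>\n"
    "Available tools: word_count. When done, reply with FINAL: <answer>."
)

def run_agent(goal: str, max_steps: int = 5) -> str:
    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": goal}]
    for _ in range(max_steps):
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo", messages=messages, temperature=0
        ).choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        if reply.startswith("ACTION:"):
            name, _, arg = reply[len("ACTION:"):].partition("|")
            result = TOOLS.get(name.strip(), lambda _: "unknown tool")(arg.strip())
            # Feed the tool output back so the model can plan the next step.
            messages.append({"role": "user", "content": f"OBSERVATION: {result}"})
    return "Stopped: step limit reached."

print(run_agent("How many words are in: 'agents automate workflows'?"))
```

Projects such as Auto-GPT and BabyAGI, discussed below, elaborate essentially this loop with memory, planning, and richer tool sets.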
One of the key benefits of using agents is that they can automate many of the tasks that are currently performed by humans. This can free up humans to focus on more strategic and creative tasks. Agents can be designed to be more user-friendly and easier to use than traditional Generative AI tools, making Generative AI more accessible to a wider range of users.
Here are some of the trends of agent concept in Generative AI:
- Increased use of agents to automate tasks: As Generative AI grows more powerful and sophisticated, we can expect agents to take over more tasks currently performed by humans, for example, automating the process of creating and deploying AI models.
- Increased use of agents to make Generative AI more accessible: As agents become more user-friendly, they will open Generative AI to a wider range of users. This could spark a new wave of innovation as more people are able to use Generative AI to create new products and services.
- Development of new agent-based tools and platforms: As the agent concept gains popularity, new agent-based Generative AI tools and platforms will emerge, making it easier for developers to build and deploy agent-based applications.
Here are some specific examples of how the agent concept is being used in Generative AI today:
- Agent-based Generative AI tools: Several agent-based tools are already available. Auto-GPT and BabyAGI, for example, are open-source projects that chain LLM calls together to pursue user-defined goals with minimal human intervention.
- Agent-based Generative AI platforms: General-purpose ML platforms, such as Google’s AI Platform and Amazon Web Services’ SageMaker, can be used to deploy and manage agent-based Generative AI applications.
- Agent-based Generative AI applications: Agent-based applications are already in use to create new products and services, automate tasks, and bring Generative AI to a wider range of users.
Overall, the agent concept is a new and promising trend in Generative AI. It is being used to develop new tools, platforms, and applications that are having a significant impact on a variety of industries.