Complete guide to building a Multimodal Chatbot with Gemini API using the Vertex AI for Firebase SDK
Watch the video or follow along with the post below.
Previously we’ve seen how to use the Gemini Generative AI Swift SDK to chat with the LLM using text-based requests. We then explored how to use the Firebase extension to interface with Gemini. In this post we’ll look at yet another way to interact with the Gemini API: the Vertex AI for Firebase SDK. The Firebase setup is similar to using the extension, so if you’ve been following my posts, the next few steps should feel familiar.
Head on over to https://console.firebase.google.com & sign in with your Google account.
- Select Go to Console.
- Select an existing iOS+ project. If you don’t have one, select Add Project or Create a project.
- Give your project a name & hit Continue.
- The next step will recommend that you enable Google Analytics for your project. It is not required for this guide, so you can turn off the toggle & select Create project. Wait a while as the project is created. If you do choose to enable Google Analytics, you will be asked to select a Google Analytics account in the next step before you create the project.
- You will now see the project overview panel. Select the iOS+ option to create an Apple app in Firebase.
- Add a bundle ID & note it down somewhere; you will create an Xcode project with the same bundle ID later. Select Register app.
- Download the GoogleService-Info.plist file on your system. Select Next.
- Open Xcode & create a new SwiftUI project with the bundle identifier you entered earlier. Drag & drop the GoogleService-Info.plist you downloaded in the previous step into your Xcode project.
- While still in Xcode, go to File > Add Package Dependencies... to add the Firebase SDK.
- Enter the following URL in the package search bar: https://github.com/firebase/firebase-ios-sdk & select Add Package.
- Go back to the Firebase console to Add initialisation code.
- Copy the code snippet as shown & add it to your Xcode project’s App.swift file (a sketch of what this file ends up looking like is included after this list).
- Go back to the Firebase console, click Next & then click Continue to the console.
- In the console, you will see your Project Overview. Click the Build with Gemini option in the sidebar.
- Once the pane loads, scroll to the bottom & click Get started in the Build AI-powered apps with the Gemini API pane.
- Select your billing account in the popup window that opens & click Continue. Proceed to select a budget to complete upgrading your project. (Refer to my previous post if you need help with setting up billing for your project from scratch.)
- Close the Upgrade Complete dialog & then click Enable APIs in the Build with the Gemini API pane.
- Then click Continue in step 3 to Add Vertex AI SDK for Firebase.
- Select + Register for App Check in the next pane. This is an important auth step to prevent unauthorized access to backend resources. On iOS you can do so using DeviceCheck or App Attest, the latter being recommended. However, for the sake of this tutorial we will set up a debug App Check provider for the iOS Simulator. Click Get started in the App Check pane.
- Click the three vertical dots next to your app to open its menu. Then click Manage debug tokens.
- Click Add debug token.
- Enter a name in the My debug token field, say, Simulator. For the Value, we will get this from our iOS app.
- Go back to your Xcode project & ensure that the FirebaseAppCheck framework is linked.
- Switch to your App.swift file & add the code to set up AppCheckDebugProviderFactory (see the sketch after this list).
- Add -FIRDebugEnabled as a launch argument in the project scheme. This will enable verbose Firebase logging in the console & allow us to see the App Check debug token.
- Run the app & search for App Check debug token in the console. Copy the value that you see next to it.
- Paste this value into the Value field back in the browser. Click Save, then click Done.
- Next, open https://console.cloud.google.com & select your Firebase project from the dropdown menu.
- Select the hamburger menu on the left > Solutions > All products. Then search for Vertex AI API & Firebase ML API respectively & ensure that both are enabled for your project.
This concludes the Firebase Vertex AI setup. Let’s build the client iOS app now. There are two parts to this: the chat UI, which is pretty much the same as we have built in the past, & the ChatService that coordinates sending & receiving messages. We’ll focus only on the ChatService. As in previous implementations, we need to initialise an instance of the model:
private var model = VertexAI.vertexAI().generativeModel(modelName: "gemini-1.5-pro-preview-0409")
Next, in order to send messages, the method is essentially the same as working with the Gemini Generative AI Swift SDK. We need to send a multipart request with the actual message & any media that was selected.
We define our Media model as follows in order to capture the mimeType, the actual media data & a thumbnail created for use in the chat UI.
struct Media {
let mimeType: String
let data: Data
let thumbnail: UIImage
}
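As an illustration, a Media value for a picked photo could be built along these lines. The helper name, compression quality & thumbnail size below are my own choices for the sketch, not part of the post’s code:

```swift
import UIKit

// Hypothetical helper: build a Media value from a UIImage the user picked.
func makeImageMedia(from image: UIImage) -> Media? {
    // Compress the full-size image for the request payload.
    guard let jpegData = image.jpegData(compressionQuality: 0.8) else { return nil }

    // Render a small copy for display in the chat UI.
    let thumbnailSize = CGSize(width: 200, height: 200)
    let thumbnail = UIGraphicsImageRenderer(size: thumbnailSize).image { _ in
        image.draw(in: CGRect(origin: .zero, size: thumbnailSize))
    }

    return Media(mimeType: "image/jpeg", data: jpegData, thumbnail: thumbnail)
}
```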
We then send images &/or video, make an async request to the model & extract the response text as follows:
var chatMedia = [any ThrowingPartsRepresentable]()
// Append image data
chatMedia.append(ModelContent.Part.jpeg(mediaItem.data))
// Append video data
chatMedia.append(ModelContent.Part.data(mimetype: mediaItem.mimeType, mediaItem.data))
let response = try await model.generateContent(message, chatMedia)
guard let text = response.text else { return }
print(text)
We use the received response text to construct our custom chat messages & display them in the UI. If you’d like, you can also stream the response to provide a more real-time experience.
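For reference, a streamed variant might look roughly like the sketch below, assuming the same model, message & chatMedia values as above and the SDK’s generateContentStream API:

```swift
// Inside an async throwing context, e.g. a Task or an `async throws` method.
var streamedText = ""
for try await chunk in model.generateContentStream(message, chatMedia) {
    if let text = chunk.text {
        streamedText += text
        // Update the chat UI incrementally with the partial text here.
    }
}
```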
Coming back to working with the media, be it image or video, one needs to ensure that the file size is within acceptable limits. As per the documentation, the maximum request size is 20 MB & you get an error if you exceed it. When sending media data inline, it is encoded to base64, which further increases the size of the request. If you’d like to send larger media, consider including a Cloud Storage for Firebase URL in the request instead. For chat display, as well as to reduce the media data size in the request, we resize & compress the image thumbnail. For video, we use AVFoundation’s AVAssetImageGenerator class to extract a frame of the video. Check out the ThumbnailService code in the source code linked below for more info.
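If you don’t want to dig through the ThumbnailService just yet, here is a rough sketch of the video-frame extraction it performs. The function name, frame time & maximum size are assumptions for illustration:

```swift
import AVFoundation
import UIKit

// Hypothetical helper: grab a single frame from a video to use as a chat thumbnail.
func videoThumbnail(for url: URL) throws -> UIImage {
    let asset = AVURLAsset(url: url)
    let generator = AVAssetImageGenerator(asset: asset)
    generator.appliesPreferredTrackTransform = true          // respect the video's orientation
    generator.maximumSize = CGSize(width: 400, height: 400)  // keep the thumbnail small

    // Extract the frame at the very start of the video.
    let time = CMTime(seconds: 0, preferredTimescale: 600)
    let cgImage = try generator.copyCGImage(at: time, actualTime: nil)
    return UIImage(cgImage: cgImage)
}
```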
Run the app & chat on!
And that’s it for this post! The complete code can be found here.
Leave a comment if you have any questions!
Resources:
- https://firebase.google.com/docs/vertex-ai/get-started?platform=ios
- https://firebase.google.com/docs/vertex-ai/text-gen-from-multimodal?platform=ios#text-&-multi-images
- https://firebase.google.com/docs/app-check/ios/debug-provider?authuser=0
- https://github.com/firebase/firebase-ios-sdk/tree/main/FirebaseVertexAI