- Overview
- Installation
- Dependencies
- Streaming ARFoundation Camera to browser
- Sending Web browser Input to Unity
- Applying RayCasting in Unity
- Object detection on the browser client
- Web Server
This project aims to provide remote guidance using Augmented Reality annotations on a video stream between a web browser and an Android phone. The Android application streams an ARFoundation scene (which performs the plane detection) to the browser, and the browser can then annotate that stream with AR objects, e.g. arrows.
- The Unity application can be built for Android (version 7.0 or above).
- The web application can be used on desktop devices and is compatible with most modern browsers that support the required WebRTC API; see the WebRTC documentation below.
- Clone this repository: https://github.com/MohammedShetaya/AR_Video_Streaming.git
- Open the project from disk using Unity Hub.
- After the project is open, check that the Build Settings target the Android platform.
- In Player Settings, check that Auto Graphics API is enabled, build the project for ARM64, and make sure the minimum Android version is 7.0 or above.
- On the mobile phone, enable developer mode, allow USB debugging, and connect the device to the laptop.
- In Build Settings, click Build and Run.
- Navigate to the project directory using PowerShell, run `cd WebApp`, then run the command `npm run dev -- -w`.
- The console will show a link to the local web application.
There are two modes. The first one receives a video stream from Unity and shows it in the browser:
- Open the application on mobile and allow access to the camera.
- Then go back to the local web app on the browser and click receiver sample.
- Click the play video button, and the stream should start.
The second mode is video receiver with AR annotation:
- Open the application on mobile and allow access to the camera.
- Then go back to the local web app on the browser and click video player.
- Click the play video button, and the stream should start.
- By now, the mobile application should have started detecting planes, and the video stream is sent to the browser.
- Click on any of the detected planes in the browser, and an arrow should point to the clicked position on both clients.
The project is based on:
- The WebRTC package for the browser and Unity (version `2.4.0-exp.6`).
- The UnityRenderStreaming package (version `3.1.0-exp.3`).
This package is an API for the WebRTC protocol in Unity that matches the browser implementation, which is a great benefit when using the protocol in Unity AR and VR applications. Since the package is compatible with the browser API, it can be used for real-time, peer-to-peer media exchange between Unity-Unity or Unity-browser applications. A connection is established through a discovery and negotiation process called signaling. Signaling itself is not specified by the WebRTC protocol: each peer usually connects to the Internet behind a NAT, so it has no information about its public IP address and therefore cannot give its address to the other peer directly. The solution is to use a signaling server.
The signaling server acts as an intermediary between the Unity Android application and the browser clients so they can start sending signaling messages to each other. A plain HTTP server would not help in the case of WebRTC because signaling messages are generated asynchronously, so it is better to use a WebSocket server. A WebSocket connection is stateful and full-duplex: unlike HTTP, where the server cannot send a response unless the client sends a request, a WebSocket server can send and receive messages at any moment in the connection's lifetime. In the case of WebRTC, the server never knows when a client will send a signaling message that must be forwarded to the other client.
- The browser client sends an Offer message to the WebSocket server.
- The signaling server receives the message from the browser and forwards it to the Unity client.
- The Unity client receives the Offer and sets it as its RemoteDescription.
- The Unity client creates an Answer and sets it as its LocalDescription.
- The Unity client sends the Answer to the browser client.
- The signaling server receives the message from the Unity client and forwards it to the browser client.
- The browser client receives the Answer and sets it as its RemoteDescription.
- The two clients register to the `onIceCandidate` event; once the event handler is called, each peer sends the collected ICE candidate to the other peer.
- Once the other peer receives an ICE candidate, it calls `AddIceCandidate` in order to update the SDP.
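The forwarding steps above can be sketched as a minimal relay (a hypothetical illustration, not the project's actual `WebSocket.ts`); here a peer is any object with a `send` method, such as a WebSocket connection:

```javascript
// Minimal signaling relay sketch: forwards offer/answer/candidate
// messages from one peer to every other connected peer.
function createRelay() {
  const peers = [];
  return {
    join(peer) {
      peers.push(peer);
    },
    // Forward a signaling message from `sender` to the other peers.
    forward(sender, message) {
      if (!["offer", "answer", "candidate"].includes(message.type)) return;
      for (const peer of peers) {
        if (peer !== sender) peer.send(JSON.stringify(message));
      }
    },
  };
}
```

A real signaling server additionally tracks connection ids so that each message reaches only its intended peer.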
Once the two peers have set their `LocalDescription` and `RemoteDescription`, they can start exchanging real-time data (video, audio, etc.). A `MediaStream` object can be sent over the `RTCPeerConnection` using the `AddTrack` method. The other peer can register to the `OnTrack` event, which will be called once a track is received.
- The `MediaStream` object should be added using the `AddTrack` method before sending an Offer/Answer to the other peer. Adding a track should be followed by a new Offer/Answer so that the other peer has an updated SDP.
- The `IceCandidate` should be handled on the remote peer only after the `RemoteDescription` is set.
- The sending peer needs to register to the `OnNegotiationNeeded` event, which is called once `AddTrack` finishes execution. The handler of this event should send a new Offer with the updated SDP to the remote peer.
Unity Render Streaming is based on the WebRTC protocol. It provides a high-level implementation of the signaling, sending, and receiving process, and allows streaming real-time data over a peer-to-peer connection using WebRTC. The package also allows sending input data from the browser to Unity by mapping browser events to Unity Actions. With this package, it is possible to build streaming applications in Unity for both Windows and Android.
In order to stream the Unity camera, the following components must be added to the scene:
- AR Session Origin: the origin of the scene when the player starts the application. It is responsible for managing all the trackables that are added at runtime, e.g. 3D cubes.
- AR Session: controls the lifecycle and configuration options of an AR session.
The following scripts are added to the arCamera component, which is a sub-component of the AR Session Origin component:
This is the base class of the Unity Render Streaming package. It is responsible for connecting to the signaling server and streaming the provided real-time data. It supports two types of signaling (HTTP/WebSocket); in this project, WebSocket signaling is used, as discussed in the WebRTC section above. This script expects inputs of type `SignalingHandlerBase`, which is a parent class of the `BroadCast` class used in this project.
This script is responsible for handling the signaling messages (offer, answer, ice-candidate) and for passing the input streams used by the `RenderStreaming` script. This class expects inputs of type `StreamSenderBase`, which is the parent class of the `ARCameraSender` used in this project.
This script extends the `Unity.RenderStreaming.VideoStreamSender` class. It is responsible for sending the video stream as a `RenderTexture`: it changes the `TargetTexture` of the camera to a `RenderTexture` instead of rendering to the screen. It should be attached to the arCamera object.
This script is responsible for creating a copy of the rendered image from the camera and writing it to a `RenderTexture`, which is then used by the `ARCameraSender` script. The script should be attached to the arCamera object and should be used only if no other arCamera is used for screen rendering.
This is a sample script that holds all the logic for sending and receiving messages between the Unity client and the web server; it was used only in the early stages of the project.
In this project the only browser input used is the mouse click, although this can be extended to any browser event. Once the video received from Unity is shown in the browser's video element, the user can click anywhere on that element. The click coordinates are sent to Unity as a buffer of bytes. The following files are used in this part:
In the figure, the red point represents the place where the browser client clicks. The browser will call the onClick event handler of the video element and attach the coordinates to the mouse event. These coordinates are not the Unity coordinates onto which the rendered image was originally projected, so we need to calculate the x and y portions colored in blue: `xPortion = X - X'` and `yPortion = Y - Y'`. We then divide by the video scale, where the video scale is the ratio between the size displayed on the screen and the original video size. To calculate the video scale, we first decide whether the video is in landscape or portrait mode by checking whether `W/H` is greater than `originalVideo.width/originalVideo.height`. If the video is in landscape mode, then `Video Scale = browser video element width / original video width`; otherwise, `Video Scale = browser video element height / original video height`. The coordinates in the Unity 2D world are then:
X = xPortion / Scale
Y = original video height - yPortion / Scale
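Putting the formulas together, the mapping from a browser click to Unity 2D coordinates might look like the following sketch (names are invented for illustration; `rect` stands for the video element's bounding box, and `videoW`/`videoH` for the original stream size):

```javascript
// Maps a click on the scaled <video> element back to coordinates in
// the original video frame (y measured from the bottom, as in Unity 2D).
function clickToUnityCoords(clickX, clickY, rect, videoW, videoH) {
  const xPortion = clickX - rect.left; // X - X'
  const yPortion = clickY - rect.top;  // Y - Y'
  // Landscape if the element is relatively wider than the source video.
  const landscape = rect.width / rect.height > videoW / videoH;
  const scale = landscape ? rect.width / videoW : rect.height / videoH;
  return { x: xPortion / scale, y: videoH - yPortion / scale };
}
```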
The implementation of the coordinate calculation on the browser client side can be found in the function `registerMouseEvents` in the file `WebApp/client/public/js/register-events.js`.
Once the coordinates are calculated, they are sent to Unity through an `RTCDataChannel` defined in the file `Peer.js`. The channel can send buffer arrays, so it is better to send both the x and y coordinates in one chunk of data instead of sending each one in a separate message. The following code is responsible for sending the coordinates:

```javascript
data.setFloat32(0, x, true);
data.setFloat32(4, y, true);
_videoPlayer && _videoPlayer.sendMsg(data.buffer);
```
Then, the byte array is parsed at the Unity client with the following code:

```csharp
float x = BitConverter.ToSingle(bytes, 0);
float y = BitConverter.ToSingle(bytes, 4);
```
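The byte layout can be checked with a small round-trip sketch (function names are invented for this illustration): two little-endian float32 values packed into one 8-byte buffer, matching what the Unity side reads with `BitConverter.ToSingle` on little-endian platforms.

```javascript
// Pack two little-endian float32 values into one 8-byte buffer,
// mirroring the browser-side sending code above.
function packCoords(x, y) {
  const data = new DataView(new ArrayBuffer(8));
  data.setFloat32(0, x, true); // little-endian, offset 0
  data.setFloat32(4, y, true); // little-endian, offset 4
  return data.buffer;
}

// JavaScript mirror of the Unity-side parsing, for illustration.
function unpackCoords(buffer) {
  const view = new DataView(buffer);
  return { x: view.getFloat32(0, true), y: view.getFloat32(4, true) };
}
```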
First, the `BrowserInput` script must be added to the camera that holds the `RenderStreaming` and `BroadCast` scripts. `BrowserInput` inherits from the Render Streaming class `InputChannelReceiverBase` and contains an implementation of the `setChannel` function, which is called once an `RTCDataChannel` is created by the `RenderStreaming` object. It is then possible to register to the `onMessage` Action of the data channel. Once a message with the coordinates is received, it is passed to the `ARRayCasting` object by calling the `shootArrow` method.
Ray casting is a technique in graphics where a ray is shot along a 3D vector to test whether it hits an object; the ray hits the nearest object along its direction. In this project, the browser client should be able to point at a real-world object in the stream it receives. For example, the browser client can click on a book, and that book should then be pointed at by an arrow in the Unity scene. In the Prefabs folder of this project there is an arrow prefab that is placed via ray casting to point at a certain object. The implementation of ray casting is as follows:
- Adding an `ARRaycastManager` to the AR Session Origin object in the game scene.
- Adding the `ARRayCasting` script to the AR Session Origin object, which contains the implementation for shooting a ray. In the `Awake` method the script searches for the `ARRaycastManager` component, which is used in the `shootArrow` method that is called once a message is received from the data channel in the `BrowserInput` script.
- Providing the arrow prefab to the `ARRayCasting` script.
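Conceptually, the math behind a ray cast against a detected plane can be illustrated as a ray-plane intersection (this is only an illustration of the underlying geometry, not the `ARRaycastManager` implementation):

```javascript
// Intersects a ray (origin o, direction d) with the plane dot(n, p) = k,
// where n is the plane normal. Returns the hit point as [x, y, z],
// or null if the ray is parallel to the plane or points away from it.
function rayPlaneHit(o, d, n, k) {
  const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
  const denom = dot(n, d);
  if (Math.abs(denom) < 1e-9) return null; // ray parallel to plane
  const t = (k - dot(n, o)) / denom;
  if (t < 0) return null; // plane is behind the ray origin
  return [o[0] + t * d[0], o[1] + t * d[1], o[2] + t * d[2]];
}
```

An AR ray cast does the same test against each detected plane and keeps the closest hit, which is where the arrow prefab is placed.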
The web server follows the same implementation as the web server provided with the Unity Render Streaming package. The server consists of two parts. The first is the signaling server, which is responsible for exchanging the signaling messages between the Unity client and the browser client. The second part is the browser client.
The implementation of the signaling server can be found in the folder `WebApp/src`. It consists of two components:
- `WebSocket.ts`: responsible for creating a WebSocket connection between the server and the clients and forwarding the messages depending on the message type.
- `Server.ts`: a web server implementation based on express.js. It contains the routes for the signaling messages (/offer, /answer, /candidate).
The implementation of the browser client can be found in the folder `WebApp/public`. It contains the implementation of the two receiving modes: receiving the video stream without annotation, or receiving the video and annotating that stream with ray casting.
The components representing the browser client:
- `Peer.js`: the implementation of the `RTCPeerConnection` that handles the main functionality of creating the peer and setting the local and remote descriptions.
- `register-events.js`: the component that registers to the mouse events and sends the coordinates to the remote peer through the data channel.