Visionaire

Streamlined, ergonomic APIs around Apple's Vision framework

Swift · iOS · macOS · Mac Catalyst · tvOS · Swift Package Manager

The main goal of Visionaire is to reduce ceremony and provide a concise set of APIs for Vision tasks.

Some of its features include:

  • Centralized list of all tasks, available via the VisionTaskType enum (with platform availability checks).
  • Automatic image handling for all supported image sources.
  • Convenience APIs for all tasks, along with all available parameters for each task (with platform availability checks).
  • Support for custom CoreML models (Classification, Image-To-Image, Object Recognition, Generic VNCoreMLFeatureValueObservations).
  • Support for multiple task execution, maintaining task type information in the results.
  • Support for raw VNRequests.
  • All calls are synchronous (just like the original calls) - no extra 'magic', assumptions or hidden juggling.
  • SwiftUI extensions for helping you rapidly visualize results (great for evaluation).

Installation

Visionaire is provided as a Swift Package. You can add it to your project via this repository's address.
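If you are adding it as a dependency of another package, the Package.swift entry looks roughly like this (a minimal sketch; the URL is inferred from the repository name and the branch requirement is only a placeholder):

// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: "MyApp",
    dependencies: [
        // URL inferred from the repository name; adjust the version/branch requirement as needed
        .package(url: "https://github.com/alladinian/Visionaire.git", branch: "main")
    ],
    targets: [
        .target(name: "MyApp", dependencies: ["Visionaire"])
    ]
)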

Supported Vision Tasks

All Vision tasks are supported, up to and including those introduced in iOS 17 and macOS 14.

Detailed list of all available tasks:
| Task | Vision API | Visionaire Task | iOS | macOS | Mac Catalyst | tvOS |
|---|---|---|---|---|---|---|
| Generate Feature Print | VNGenerateImageFeaturePrintRequest | .featurePrintGeneration | 13.0 | 10.15 | 13.1 | 13.0 |
| Person Segmentation | VNGeneratePersonSegmentationRequest | .personSegmentation | 15.0 | 12.0 | 15.0 | 15.0 |
| Document Segmentation | VNDetectDocumentSegmentationRequest | .documentSegmentation | 15.0 | 12.0 | 15.0 | 15.0 |
| Attention Based Saliency | VNGenerateAttentionBasedSaliencyImageRequest | .attentionSaliency | 13.0 | 10.15 | 13.1 | 13.0 |
| Objectness Based Saliency | VNGenerateObjectnessBasedSaliencyImageRequest | .objectnessSaliency | 13.0 | 10.15 | 13.1 | 13.0 |
| Track Rectangle | VNTrackRectangleRequest | .rectangleTracking | 11.0 | 10.13 | 13.1 | 11.0 |
| Track Object | VNTrackObjectRequest | .objectTracking | 11.0 | 10.13 | 13.1 | 11.0 |
| Detect Rectangles | VNDetectRectanglesRequest | .rectanglesDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| Detect Face Capture Quality | VNDetectFaceCaptureQualityRequest | .faceCaptureQuality | 13.0 | 10.15 | 13.1 | 13.0 |
| Detect Face Landmarks | VNDetectFaceLandmarksRequest | .faceLandmarkDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| Detect Face Rectangles | VNDetectFaceRectanglesRequest | .faceDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| Detect Human Rectangles | VNDetectHumanRectanglesRequest | .humanRectanglesDetection | 13.0 | 10.15 | 13.1 | 13.0 |
| Detect Human Body Pose | VNDetectHumanBodyPoseRequest | .humanBodyPoseDetection | 14.0 | 11.0 | 14.0 | 14.0 |
| Detect Human Hand Pose | VNDetectHumanHandPoseRequest | .humanHandPoseDetection | 14.0 | 11.0 | 14.0 | 14.0 |
| Recognize Animals | VNRecognizeAnimalsRequest | .animalDetection | 13.0 | 10.15 | 13.1 | 13.0 |
| Detect Trajectories | VNDetectTrajectoriesRequest | .trajectoriesDetection | 14.0 | 11.0 | 14.0 | 14.0 |
| Detect Contours | VNDetectContoursRequest | .contoursDetection | 14.0 | 11.0 | 14.0 | 14.0 |
| Generate Optical Flow | VNGenerateOpticalFlowRequest | .opticalFlowGeneration | 14.0 | 11.0 | 14.0 | 14.0 |
| Detect Barcodes | VNDetectBarcodesRequest | .barcodeDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| Detect Text Rectangles | VNDetectTextRectanglesRequest | .textRectanglesDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| Recognize Text | VNRecognizeTextRequest | .textRecognition | 13.0 | 10.15 | 13.1 | 13.0 |
| Detect Horizon | VNDetectHorizonRequest | .horizonDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| Classify Image | VNClassifyImageRequest | .imageClassification | 13.0 | 10.15 | 13.1 | 13.0 |
| Translational Image Registration | VNTranslationalImageRegistrationRequest | .translationalImageRegistration | 11.0 | 10.13 | 13.1 | 11.0 |
| Homographic Image Registration | VNHomographicImageRegistrationRequest | .homographicImageRegistration | 11.0 | 10.13 | 13.1 | 11.0 |
| Detect Human Body Pose (3D) | VNDetectHumanBodyPose3DRequest | .humanBodyPoseDetection3D | 17.0 | 14.0 | 17.0 | 17.0 |
| Detect Animal Body Pose | VNDetectAnimalBodyPoseRequest | .animalBodyPoseDetection | 17.0 | 14.0 | 17.0 | 17.0 |
| Track Optical Flow | VNTrackOpticalFlowRequest | .opticalFlowTracking | 17.0 | 14.0 | 17.0 | 17.0 |
| Track Translational Image Registration | VNTrackTranslationalImageRegistrationRequest | .translationalImageRegistrationTracking | 17.0 | 14.0 | 17.0 | 17.0 |
| Track Homographic Image Registration | VNTrackHomographicImageRegistrationRequest | .homographicImageRegistrationTracking | 17.0 | 14.0 | 17.0 | 17.0 |
| Generate Foreground Instance Mask | VNGenerateForegroundInstanceMaskRequest | .foregroundInstanceMaskGeneration | 17.0 | 14.0 | 17.0 | 17.0 |

Supported Image Sources

  • CGImage
  • CIImage
  • CVPixelBuffer
  • CMSampleBuffer
  • Data
  • URL
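
Any of the sources above can be handed straight to the same calls, with no manual conversion on your side. A minimal sketch (the file path is a placeholder and the import names are assumed):

import CoreImage
import Visionaire

do {
    // A file URL can be passed directly...
    let urlResult = try Visionaire.shared.perform(.horizonDetection, on: URL(fileURLWithPath: "/path/to/photo.jpg"))

    // ...and so can a CIImage (or any of the other supported sources).
    if let ciImage = CIImage(contentsOf: URL(fileURLWithPath: "/path/to/photo.jpg")) {
        let ciResult = try Visionaire.shared.perform(.horizonDetection, on: ciImage)
        print(urlResult.observations.count, ciResult.observations.count)
    }
} catch {
    print(error)
}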

Examples

The main class you interact with is Visionaire.

It's an ObservableObject and reports processing through a published property called isProcessing.

You can execute tasks on the shared Visionaire singleton or on your own instance (useful if you want to have separate processors reporting on their own).
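
Since it is an ObservableObject, a SwiftUI view can hold its own instance and react to isProcessing. A minimal sketch (the empty Visionaire() initializer is assumed here):

import SwiftUI
import Visionaire

struct AnalysisView: View {
    // A dedicated instance, so this view only reacts to its own processing state
    @StateObject private var visionaire = Visionaire()

    var body: some View {
        VStack {
            if visionaire.isProcessing {
                ProgressView("Analyzing…")
            } else {
                Text("Idle")
            }
        }
    }
}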

There are two sets of APIs: convenience methods and task-based methods.

Convenience methods have the benefit of returning typed results, while tasks can be submitted en masse.

Single task execution (convenience APIs):

DispatchQueue.global(qos: .userInitiated).async {
    do {
        let image   = /* any supported image source, such as CGImage, CIImage, CVPixelBuffer, CMSampleBuffer, Data or URL */
        let horizon = try Visionaire.shared.horizonDetection(imageSource: image) // The result is a `VNHorizonObservation`
        let angle   = horizon.angle
        // Do something with the horizon angle
    } catch {
        print(error)
    }
}

Custom CoreML model (convenience APIs):

// Create an instance of your model
// (`YOLO` below stands in for the Swift class that Core ML generates for your .mlmodel file)
let yolo: MLModel = {
    // Tell Core ML to use the Neural Engine if available.
    let config = MLModelConfiguration()
    config.computeUnits = .all
    // Load your custom model
    let model = try! YOLO(configuration: config)
    return model.model
}()
    
// Optionally create a feature provider to setup custom model attributes
class YoloFeatureProvider: MLFeatureProvider {
    var values: [String : MLFeatureValue] {
        [
            "iouThreshold": MLFeatureValue(double: 0.45),
            "confidenceThreshold": MLFeatureValue(double: 0.25)
        ]
    }

    var featureNames: Set<String> {
        Set(values.keys)
    }

    func featureValue(for featureName: String) -> MLFeatureValue? {
        values[featureName]
    }
}

// Perform the task
let detectedObjectObservations = try Visionaire.shared.customRecognition(
    imageSource: image,
    model: try! VNCoreMLModel(for: yolo),
    inputImageFeatureName: "image",
    featureProvider: YoloFeatureProvider(),
    imageCropAndScaleOption: .scaleFill
)

Single task execution (task-based APIs):

DispatchQueue.global(qos: .userInitiated).async {
    do {
        let image       = /* any supported image source, such as CGImage, CIImage, CVPixelBuffer, CMSampleBuffer, Data or URL */
        let result      = try Visionaire.shared.perform(.horizonDetection, on: image) // The result is a `VisionTaskResult`
        let observation = result.observations.first as? VNHorizonObservation
        let angle       = observation?.angle
        // Do something with the horizon angle
    } catch {
        print(error)
    }
}

Multiple task execution (task-based APIs):

DispatchQueue.global(qos: .userInitiated).async {
    do {
        let image   = /* any supported image source, such as CGImage, CIImage, CVPixelBuffer, CMSampleBuffer, Data or URL */
        let results = try Visionaire.shared.perform([.horizonDetection, .personSegmentation(qualityLevel: .accurate)], on: image)
        for result in results {
            switch result.taskType {
            case .horizonDetection:
                let horizon = result.observations.first as? VNHorizonObservation
                // Do something with the observation
            case .personSegmentation:
                let segmentationObservations = result.observations as? [VNPixelBufferObservation]
                // Do something with the observations
            default:
                break
            }
        }   
    } catch {
        print(error)
    }
}

Task configuration

All tasks can be configured with "modifier" style calls for common options.

An example using all the available options:

let segmentation = VisionTask.personSegmentation(qualityLevel: .accurate)
    .preferBackgroundProcessing(true)
    .usesCPUOnly(false)
    .regionOfInterest(CGRect(x: 0, y: 0, width: 0.5, height: 0.5))
    .latestRevision() // You can also use .revision(n)

let results = try Visionaire.shared.perform([.horizonDetection, segmentation], on: image) // An array of `VisionTaskResult`, one per task

SwiftUI Extensions

There are also some SwiftUI extensions to help you visualize results for quick evaluation.

Detected Object Observations

Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .drawObservations(detectedObjectObservations) {
        Rectangle()
            .stroke(Color.blue, lineWidth: 2)
    }


Rectangle Observations

Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .drawQuad(rectangleObservations) { shape in
        shape
            .stroke(Color.green, lineWidth: 2)
    }


Face Landmarks

Note: For Face Landmarks you can specify individual characteristics or groups for visualization. The options are exposed through the FaceLandmarks OptionSet and they are:

constellation, contour, leftEye, rightEye, leftEyebrow, rightEyebrow, nose, noseCrest, medianLine, outerLips, innerLips, leftPupil, rightPupil, eyes, pupils, eyeBrows, lips and all.

Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .drawFaceLandmarks(faceObservations, landmarks: .all) { shape in
        shape
            .stroke(.red, style: .init(lineWidth: 2, lineJoin: .round))
    }


Person Segmentation Mask

Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .visualizePersonSegmentationMask(pixelBufferObservations)


Human Body Pose

Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .visualizeHumanBodyPose(humanBodyPoseObservations) { shape in
        shape
            .fill(.red)
    }


Contours

Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .visualizeContours(contoursObservations) { shape in
        shape
            .stroke(.red, style: .init(lineWidth: 2, lineJoin: .round))
    }

