\$\begingroup\$

I want to show an interactive audio waveform like this.

I've extracted the sample data using AVAssetReader. From this data, I draw a UIBezierPath in a UIScrollView's content view. Currently, whenever the user pinches to zoom the scroll view in or out, I downsample the sample data again to determine how many samples to show.

import UIKit

class WaveformView: UIView {
    var amplitudes: [CGFloat] = [] {
        didSet {
            setNeedsDisplay()
        }
    }

    override func draw(_ rect: CGRect) {
        guard let context = UIGraphicsGetCurrentContext(), !amplitudes.isEmpty else { return }

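        // Set up drawing parameters
        context.setStrokeColor(UIColor.black.cgColor)

        let midY = rect.height / 2
        let widthPerSample = rect.width / CGFloat(amplitudes.count)

        // Draw waveform. UIBezierPath.stroke() applies the path's own lineWidth and
        // lineCapStyle, so these are set on the path rather than on the context.
        let path = UIBezierPath()
        path.lineWidth = 1.0
        path.lineCapStyle = .round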

        for (index, amplitude) in amplitudes.enumerated() {
            let x = CGFloat(index) * widthPerSample
            let height = amplitude * rect.height * 0.8

            // Draw vertical line for each sample
            path.move(to: CGPoint(x: x, y: midY - height))
            path.addLine(to: CGPoint(x: x, y: midY + height))
        }

        path.stroke()
    }
}

I added a pinch gesture handler:

@objc private func handlePinch(_ gesture: UIPinchGestureRecognizer) {
    switch gesture.state {
    case .began:
        initialPinchDistance = gesture.scale

    case .changed:
        let scaleFactor = gesture.scale / initialPinchDistance
        var newScale = currentScale * scaleFactor
        newScale = min(max(newScale, minScale), maxScale)

        // Update displayed samples with new scale
        updateDisplayedSamples(scale: newScale)
        print(newScale)
        // Maintain zoom center point
        let pinchCenter = gesture.location(in: scrollView)
        let offsetX = (pinchCenter.x - scrollView.bounds.origin.x) / scrollView.bounds.width
        let newOffsetX = (totalWidth * offsetX) - (pinchCenter.x - scrollView.bounds.origin.x)
        scrollView.contentOffset.x = max(0, min(newOffsetX, totalWidth - scrollView.bounds.width))

        view.layoutIfNeeded()

    case .ended, .cancelled:
        currentScale = scrollView.contentSize.width / (baseWidth * widthPerSample)

    default:
        break
    }
}

private func updateDisplayedSamples(scale: CGFloat) {
    let targetSampleCount = Int(baseWidth * scale)
    displayedSamples = downsampleWaveform(samples: rawSamples, targetCount: targetSampleCount)
    waveformView.amplitudes = displayedSamples

    totalWidth = CGFloat(displayedSamples.count) * widthPerSample
    contentWidthConstraint?.constant = totalWidth
    scrollView.contentSize = CGSize(width: totalWidth, height: 300)
}

private func downsampleWaveform(samples: [CGFloat], targetCount: Int) -> [CGFloat] {
    guard samples.count > 0, targetCount > 0 else { return [] }

    if samples.count <= targetCount {
        return samples
    }

    var downsampled: [CGFloat] = []
    let sampleSize = samples.count / targetCount

    for i in 0..<targetCount {
        let startIndex = i * sampleSize
        let endIndex = min(startIndex + sampleSize, samples.count)
        let slice = samples[startIndex..<endIndex]

        // For each window, take the maximum value to preserve peaks
        if let maxValue = slice.max() {
            downsampled.append(maxValue)
        }
    }

    return downsampled
}

This approach is very inefficient: on every .changed event of the pinch gesture, I recompute the downsampled data and update the UI from it. How can I implement this functionality more efficiently so that the interaction stays smooth?

\$\endgroup\$
  • \$\begingroup\$ Use a sparse table \$\endgroup\$ Commented Feb 6 at 2:23

1 Answer

\$\begingroup\$

How can I implement this functionality more efficiently for smooth interaction?

Pre-compute at different resolutions.

maxValue = slice.max()

Side note: it's not clear that .max() is ideal for this. Maybe use the median of each window instead, or its 80th or 90th percentile value?
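To make the percentile idea concrete, here is a minimal sketch of a per-window reducer. The function name `percentile(of:p:)` and the nearest-rank rule are assumptions chosen for this illustration; they do not come from the question's code.

```swift
import Foundation

/// Nearest-rank percentile of one window of amplitudes.
/// `p` is a fraction in 0...1 (0.5 is the median, 1.0 is the maximum).
/// Hypothetical helper for illustration; sorting costs O(n log n) per window.
func percentile(of window: ArraySlice<CGFloat>, p: Double) -> CGFloat? {
    guard !window.isEmpty else { return nil }
    let sorted = window.sorted()
    let clamped = min(max(p, 0), 1)
    // Nearest rank: the smallest value with at least a fraction `p`
    // of the window at or below it.
    let rank = Int((clamped * Double(sorted.count)).rounded(.up))
    return sorted[max(rank - 1, 0)]
}
```

Passing `p: 1.0` reproduces the `.max()` behaviour, so the same reducer can be tuned without touching the windowing loop. Sorting every window is slower than a plain maximum, which matters less once the values are pre-computed rather than recomputed per gesture event.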

Upon initial loading of the waveform we're going to be displaying everything, so slice it into windows, compute each window value as max or median or whatever, hang onto those values, and display them.

Now pretend the user asked to see half of the timespan. There's been no gesture, no user interaction, so we do not yet know the starting point, but that's OK. We'll just compute window values for everything at that resolution, and hang onto the values.

Repeat for quarter, eighth, and so on. At some point we bottom out -- RAM to store the values becomes annoyingly large, and time to recompute exact values on the fly for a "small" timeslice is conveniently small.
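The pre-computation described so far could look roughly like the sketch below. The names `peaks(of:count:)` and `buildLevels(samples:baseCount:maxLevels:)`, the cap of 8 levels, and the choice of max as the window value are all assumptions for illustration; `baseCount` stands for the number of values shown when the whole waveform is visible.

```swift
import Foundation

/// Reduce `samples` to `count` values by taking the peak of each window.
/// Window edges are computed proportionally, so no tail samples are dropped.
func peaks(of samples: [CGFloat], count: Int) -> [CGFloat] {
    guard !samples.isEmpty, count > 0 else { return [] }
    if samples.count <= count { return samples }
    return (0..<count).map { (i: Int) -> CGFloat in
        let start = i * samples.count / count
        let end = max((i + 1) * samples.count / count, start + 1)
        return samples[start..<end].max() ?? 0
    }
}

/// Pre-compute resolution levels, coarsest first. Level 0 has `baseCount`
/// values for the whole recording and each further level has twice as many.
/// Stops when a level would reach the raw sample count or at `maxLevels`.
func buildLevels(samples: [CGFloat], baseCount: Int, maxLevels: Int = 8) -> [[CGFloat]] {
    var levels: [[CGFloat]] = []
    var count = baseCount
    while count > 0, count < samples.count, levels.count < maxLevels {
        levels.append(peaks(of: samples, count: count))
        count *= 2
    }
    return levels
}
```

Each level costs one pass over the raw samples, so the whole build is a small fixed number of passes done once per loaded file. If the recording has no more samples than `baseCount`, the result is empty and the raw samples can be drawn directly.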

Now we start accepting gestures. As the user pinches and pinches, we will dive down into using the "half timespan" or the "quarter timespan" data. Of course the user's requested {start, stop} timestamps won't match the precomputed data exactly. But we can go to the slightly higher resolution data, generate appropriate indexes, and display a subset of the stored data, skipping values occasionally.
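A minimal sketch of that lookup follows, assuming the pre-computed data is an array of levels ordered coarsest first, each covering the whole recording, and that the visible range is expressed as fractions of the timeline. The function name `visibleValues(levels:visible:targetCount:)` and its parameters are hypothetical.

```swift
import Foundation

/// Pick the values to draw for the visible part of the waveform.
/// - levels: pre-computed arrays, coarsest first, each spanning the whole recording.
/// - visible: visible part of the timeline as fractions in 0...1.
/// - targetCount: how many values the view wants to draw (e.g. its width in points).
func visibleValues(levels: [[CGFloat]], visible: ClosedRange<Double>, targetCount: Int) -> [CGFloat] {
    guard targetCount > 0, let finest = levels.last else { return [] }
    let span = visible.upperBound - visible.lowerBound
    // Coarsest level that still has at least `targetCount` values in view;
    // fall back to the finest level when zoomed in past everything stored.
    let level = levels.first { Double($0.count) * span >= Double(targetCount) } ?? finest
    let lower = min(max(Int(visible.lowerBound * Double(level.count)), 0), level.count)
    let upper = min(max(Int((visible.upperBound * Double(level.count)).rounded(.up)), lower), level.count)
    let slice = level[lower..<upper]
    guard slice.count > targetCount else { return Array(slice) }
    // More stored values than needed: skip some at even intervals.
    return (0..<targetCount).map { slice[lower + $0 * slice.count / targetCount] }
}
```

When fewer than `targetCount` values come back, the user has zoomed in past the finest stored level, which is the point where recomputing from the raw samples is cheap. Skipping values can drop an individual peak, which is the approximation this scheme accepts in exchange for speed.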

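Why is this effective? Because the number of pre-computed values in the visible range approximately matches the display size, exceeding it by at most a factor of two. Each gesture event therefore touches about a screenful of stored values instead of every raw sample.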

If you're a stickler for accuracy, have a background thread run the OP's calculation unchanged, and use double buffering to replace the "approximate view" with the "exact view" if it turns out the user went idle for a moment. On the other hand, if gesture events keep arriving, the background computational effort is wasted and its result is discarded, while the foreground thread keeps quickly displaying pre-computed values.
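One way to sketch the "discard stale background work" part is a request token that is bumped on every gesture event. Both `LatestRequestGate` and `ExactRefiner` are hypothetical names, and the sketch assumes `refine` is always called from the main thread; `compute` stands for the exact downsampling and `apply` for swapping the exact values into the view.

```swift
import Foundation

/// Tracks the most recent request so results of older ones can be dropped.
/// Not thread-safe on its own: use it from the main thread only.
final class LatestRequestGate {
    private var current = 0

    /// Call on every gesture event; returns a token for the work about to start.
    func begin() -> Int {
        current += 1
        return current
    }

    /// True only if no newer request has started since `token` was issued.
    func isCurrent(_ token: Int) -> Bool { token == current }
}

/// Runs the exact computation off the main thread and applies the result
/// only if no newer gesture event arrived in the meantime.
final class ExactRefiner {
    private let gate = LatestRequestGate()
    private let queue = DispatchQueue(label: "waveform.exact", qos: .userInitiated)

    func refine(compute: @escaping () -> [CGFloat], apply: @escaping ([CGFloat]) -> Void) {
        let token = gate.begin()
        queue.async {
            let exact = compute()
            DispatchQueue.main.async {
                // Stale result: the user kept pinching, so drop it.
                guard self.gate.isCurrent(token) else { return }
                apply(exact)
            }
        }
    }
}
```

The approximate values stay on screen until `apply` runs, which gives the double-buffering effect: the view never shows a half-computed state. Stale computations still run to completion here; checking for cancellation inside `compute` would save that work at the cost of extra synchronization.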

A background thread can also help with the "time to become interactive" startup latency upon loading a new waveform.

\$\endgroup\$
