As you can see here, the return type of esp_camera_fb_get() is defined as
typedef struct {
    uint8_t * buf;              /*!< Pointer to the pixel data */
    size_t len;                 /*!< Length of the buffer in bytes */
    size_t width;               /*!< Width of the buffer in pixels */
    size_t height;              /*!< Height of the buffer in pixels */
    pixformat_t format;         /*!< Format of the pixel data */
} camera_fb_t;
..
camera_fb_t* esp_camera_fb_get();
..
typedef enum {
    PIXFORMAT_RGB565,    // 2BPP/RGB565
    PIXFORMAT_YUV422,    // 2BPP/YUV422
    PIXFORMAT_GRAYSCALE, // 1BPP/GRAYSCALE
    PIXFORMAT_JPEG,      // JPEG/COMPRESSED
    PIXFORMAT_RGB888,    // 3BPP/RGB888
    PIXFORMAT_RAW,       // RAW
    PIXFORMAT_RGB444,    // 3BP2P/RGB444
    PIXFORMAT_RGB555,    // 3BP2P/RGB555
} pixformat_t;
Meaning that fb->buf and fb->len hold the raw data in the format specified by fb->format (a pixformat_t). In other words, this is not a "string"; these are raw bytes, which you can still base64-encode perfectly fine.
So for the base64 library
    static String encode(const uint8_t * data, size_t length);
it already accepts the right data type and you can do
  fb = esp_camera_fb_get();  
  ..
  //will be allocated on the heap. Takes about 4/3 of the input size on top of the framebuffer itself, so it more than doubles your memory requirements
  String imgDataB64 = base64::encode(fb->buf, fb->len);
  //add to a JSON object with the metadata width, height and format so that it can be decoded
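The JSON step from the last comment could look roughly like the sketch below. The helper make_image_json and the field names width, height, format and data are made up for illustration; they are not part of esp32-camera or the base64 library. It uses std::string so that it also compiles off-device; on the ESP32 the same idea should work with the Arduino String class or a JSON library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: wraps an already base64-encoded image and its
// metadata in a JSON object. Field names are illustrative only.
// `format` is the numeric pixformat_t value taken from fb->format.
inline std::string make_image_json(const std::string &b64, size_t width,
                                   size_t height, int format) {
    std::string json;
    json.reserve(b64.size() + 64); // avoid repeated reallocation while appending
    json += "{\"width\":";
    json += std::to_string(width);
    json += ",\"height\":";
    json += std::to_string(height);
    json += ",\"format\":";
    json += std::to_string(format);
    json += ",\"data\":\"";
    json += b64; // the base64 alphabet (A-Z a-z 0-9 + / =) needs no JSON escaping
    json += "\"}";
    return json;
}
```

With the snippet above it would be called as make_image_json(imgDataB64.c_str(), fb->width, fb->height, (int)fb->format). Note that this creates yet another copy of the encoded image in memory.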
You should output the fb->format value to check what format the data is in, and add it together with the width and height so that the image can be reconstructed on the other side. Beware of the high memory requirements, since the base64 encoding creates a new buffer to store the base64 representation. That could be optimized by writing the framebuffer data into an initially bigger buffer which is then transformed in-place, but that would have to be changed at the image driver level.