#include <iostream>
using namespace std;

class Base1 {
public:
    virtual void foo() { cout << "Base1::foo\n"; }
    int b1_data = 1;
};

class Base2 {
public:
    virtual void bar() { cout << "Base2::bar\n"; }
    int b2_data = 2;
};

class Derived : public Base1, public Base2 {
public:
    int derivedData = 999;

    void foo() override { cout << "Derived::foo, derivedData=" << derivedData << "\n"; }
    void bar() override { cout << "Derived::bar, derivedData=" << derivedData << "\n"; }
};

int main() {
    Derived* d = new Derived();
    Base1* b1 = d;
    Base2* b2 = d;

    b1->foo();  // Calls Derived::foo()
    b2->bar();  // Calls Derived::bar()

    delete d;
}

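Is b2->bar() safe?

Derived::bar() accesses derivedData, which is a member of Derived. Under the hood, the member function behaves like a plain C function Derived::bar(this) that takes the object pointer as a hidden first argument.

Which this pointer is passed to this function?

Is it the address of the Base2 component?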

  • Usage note: Pop the floating code at the end into a main and make yourself a real minimal reproducible example. Remember: every change the readers of your code have to make to get a runnable example is a chance to add a new mistake and answer based on that mistake, or to accidentally fix the problem and not answer at all. Commented Oct 1 at 20:53
  • "Is b2->bar() safe?" -- Why wouldn't it be? You should describe the scenario in words so that 1) other people with the same question can find this, and 2) people see what you intend for them to see. (If you show a dozen people the same piece of code, you could end up with a score of interpretations.) Commented Oct 1 at 21:04
  • Side note: Remember that the current crop of AI tools are Large Language Models (LLMs), and all they can do is produce strings of tokens that are statistically likely given your question and their training data. Since LLMs don't actually know C++, they count on the correct answer also being the statistically likely one. For the weird stuff, the original stuff, or the stuff rarely written about, you'll only get the correct answer by dumb luck. If you want to solve a common problem, AI probably has your back. If you want to know how to defrobulate a flux capacitor with an oscillation overthruster... Commented Oct 1 at 21:50
  • Somewhat related: stackoverflow.com/q/1321062/179910 Commented Oct 1 at 22:45
  • I wouldn't focus so much on the this pointer (including in how this example is framed); it is not important to this question. You have a pointer to a class; if the class has virtual methods and you call one through that pointer, the most derived override is called. That's the whole point of virtual functions. And even though an implicit first "this" argument and vtables are a common implementation of member and virtual functions, they are not important for the observable behavior (calling the most derived virtual function), which is in the end what really matters. Commented Oct 2 at 3:08

3 Answers

It's fine.

The actual function that appears in the Base2 subobject v-table is a trampoline. Something like

void Derived__Base2_bar(Base2* _Base2_this)
{
    Derived* _Derived_this = static_cast<Derived*>(_Base2_this); // intptr_t(_Base2_this) - offsetofbase(Derived, Base2)
    return Derived__bar(_Derived_this);
}

If you had virtual inheritance, it would get more complicated, but this example doesn't use virtual inheritance.

You could also get a compiler-generated trampoline like this when using return type covariance, and then converting the returned pointer (from Derived* back to Base2*) would also involve computing an offset.


9 Comments

So the this pointer for Derived::bar() is the start of the Base2 subobject here, because bar() overrides Base2::bar()?
No. When you defined the overriding member function Derived::bar() in C++, you got two "ordinary" functions, which I called void Derived__Base2_bar(Base2*) and void Derived__bar(Derived*) for exposition. Derived__bar is the one that gets the body you wrote, it has the address of the whole Derived object and can find the derivedData member. Derived__Base2_bar is as shown, just a forwarder often called thunk or trampoline.
The call b2->bar() doesn't directly call Derived::bar(), which needs the address of the Derived object. Instead the v-table lookup finds the thunk, which receives the address b2 (which is a Base2 subobject of a Derived), and finds the full object with a simple pointer subtraction before calling the real Derived::bar().
Worth considering migrating these two comments, especially this one, into the answer for guaranteed preservation.
"Derived* _Derived_this = static_cast<Derived*>(_Base2_this); // intptr_t(_Base2_this) - offsetofbase(Derived, Base2)" The whole memory layout is: Base1 (vptr + fields) + Base2 (vptr + fields) + Derived (fields). Does _Base2_this point to Base2? What does _Base2_this point to: the start address of Base1, or the start address of Derived in the memory layout? In other words, is offsetofbase(Derived, Base2) == the size of the Base1 subobject?
That's roughly correct. (With typical implementations Derived does not get a third vtable pointer of its own: it shares the one at the start of its primary base, Base1.) Despite user4581301's appreciation for the comments, they didn't say anything that isn't already in the answer. "Does _Base2_this point to Base2 [the subobject]?" Yes, you know that from its type, Base2*. "What does _Base2_this point to: the start address of Base1, or the start address of Derived in the memory layout?" Neither of those (which are almost certainly the same as each other). It points to the start address of the Base2 subobject of Derived. The type says this.
And yes, offsetofbase(Derived, Base2) is very likely to be equal to sizeof(Base1), but it could be greater for alignment reasons, or if either of the bases has virtual bases, etc.
Note: this is not the only way this could be implemented. The other way is to have the vtable contain this-pointer offsets in addition to function pointers.
That approach is generally reserved for virtual inheritance (where it is needed).
  1. Yes, b2->bar() is safe.

  2. The this pointer inside Derived::bar() points to the start of the complete Derived object, not to the Base2 subobject: the pointer b2 is adjusted (typically by a compiler-generated thunk) before the body of Derived::bar() runs.

  3. This is called this-pointer adjustment in multiple inheritance, and every major C++ compiler (GCC, Clang, MSVC) handles it correctly.


This is a supplement to Ben Voigt's answer that shows the assembly code GCC actually generates for this scenario. I'll start by showing the generated code for Base1 and Base2, which is fairly easy to understand:

Base1* new_base1() { return new Base1(); }
Base2* new_base2() { return new Base2(); }

The constructions of Base1 and Base2 are nearly identical (the code for stack manipulation, calling new, etc. is omitted):

new_base1():
        ...
        mov     QWORD PTR [rax], OFFSET FLAT:vtable for Base1+16
        mov     DWORD PTR [rax+8], 1
        ...
new_base2():
        ...
        mov     QWORD PTR [rax], OFFSET FLAT:vtable for Base2+16
        mov     DWORD PTR [rax+8], 2
        ...

We store a pointer to the class's vtable at offset 0 of the object, and the data member at offset 8. Here are the vtables:

vtable for Base1:
        .quad   0
        .quad   typeinfo for Base1
        .quad   Base1::foo()
vtable for Base2:
        .quad   0
        .quad   typeinfo for Base2
        .quad   Base2::bar()

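The vtable pointer stored in the object (OFFSET FLAT:vtable for Base1+16) skips the first two 8-byte entries (the offset-to-top value and the typeinfo pointer), so it points directly at the method slots. When we call a method, that pointer is loaded from the object and the slot it points to is used as the jump target: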

void Base1_call_foo(Base1 * b1) { b1->foo(); }
void Base2_call_bar(Base2 * b2) { b2->bar(); }
Base1_call_foo(Base1*):
        mov     rax, QWORD PTR [rdi]
        jmp     [QWORD PTR [rax]]
Base2_call_bar(Base2*):
        mov     rax, QWORD PTR [rdi]
        jmp     [QWORD PTR [rax]]

At first this looks like a problem: if we pass a pointer to a Derived object, then however its vtable is organized, it appears that both functions would call the same function pointer! Obviously that doesn't happen, so let's see what Derived looks like:

Derived* new_derived() {
    return new Derived();
}
new_derived():
        ...
        mov     QWORD PTR [rdx], OFFSET FLAT:vtable for Derived+16
        mov     DWORD PTR [rdx+8], 1
        mov     QWORD PTR [rdx+16], OFFSET FLAT:vtable for Derived+48
        mov     DWORD PTR [rdx+24], 2
        mov     DWORD PTR [rdx+28], 999
        ...
vtable for Derived:
        .quad   0
        .quad   typeinfo for Derived
        .quad   Derived::foo()
        .quad   Derived::bar()
        .quad   -16
        .quad   typeinfo for Derived
        .quad   non-virtual thunk to Derived::bar()

We can see that Derived actually contains two vtable pointers: one pointing at the slots for Derived::foo and Derived::bar, and one pointing at the "non-virtual thunk to Derived::bar" (this is the "trampoline" that Ben Voigt's answer mentions). These pointers are interleaved with the class data. To see how they are used, let's first look at how d->foo() and d->bar() are called through a Derived*:

void Derived_call_foo(Derived * d) { d->foo(); }
void Derived_call_bar(Derived * d) { d->bar(); }
Derived_call_foo(Derived*):
        mov     rax, QWORD PTR [rdi]
        jmp     [QWORD PTR [rax]]
Derived_call_bar(Derived*):
        mov     rax, QWORD PTR [rdi]
        jmp     [QWORD PTR [rax+8]]

They both call their respective function pointers from the first vtable. So what is the second vtable for? Finally we can look at what happens when we cast Derived to Base1 or Base2:

Base1* Derived_cast_to_Base1(Derived * d) { return d; }
Base2* Derived_cast_to_Base2(Derived * d) { return d; }
Derived_cast_to_Base1(Derived*):
        mov     rax, rdi
        ret
Derived_cast_to_Base2(Derived*):
        mov     rax, rdi
        test    rdi, rdi
        je      .L19
        add     rax, 16
.L19:
        ret

Casting to Base1 is a no-op (we just copy the pointer). This works because the beginning of the Derived object's layout, and of its vtable, matches the layout of Base1: foo is the first function in the vtable, and b1_data immediately follows the vtable pointer.

However, casting to Base2 returns a pointer 16 bytes into the object, where the second vtable pointer is located. (The test leaves a null pointer unchanged.) The vtable that this second pointer refers to starts with the Derived::bar trampoline, and the pointer itself is followed by b2_data, so this part of the object is laid out exactly like a Base2! Indeed this is the "Base2 component" inside Derived. Visually:

         +--------+---------+
Base1:   | vtable | b1_data |
         +---|----+---------+
             V 
         +------------+
         | Base1::foo |
         +------------+
                            +--------+---------+
Base2:                      | vtable | b2_data |
                            +---|----+---------+
                                V 
                            +------------+
                            | Base2::bar |
                            +------------+

         +--------+---------+--------+---------+-------------+
Derived: | vtable | b1_data | vtable | b2_data | derivedData |
         +---|----+---------+---|----+---------+-------------+
             |                  V
             |              +--------------------+
             |              | Derived::bar thunk |
             V              +--------------------+
         +--------------+--------------+
         | Derived::foo | Derived::bar |
         +--------------+--------------+

The one remaining function to look at is the trampoline:

        .set    .LTHUNK0,Derived::bar()
non-virtual thunk to Derived::bar():
        sub     rdi, 16
        jmp     .LTHUNK0

Here we subtract the 16 bytes that were added when converting from Derived, to get back a pointer to the complete object, then jump to the actual Derived::bar implementation. This is safe because this secondary vtable is only ever installed in Base2 subobjects that are part of a Derived object, so the subtraction always lands on the start of a Derived.

See the full code here: https://godbolt.org/z/fdd1K4665
