Skip to main content
Notice removed Draw attention by CommunityBot
Bounty Ended with G. Sliepen's answer chosen by CommunityBot
Tweeted twitter.com/StackCodeReview/status/1206227580807974913
Notice added Draw attention by trent
Bounty Started worth 100 reputation by trent
added 109 characters in body
Source Link
trent
  • 1.1k
  • 7
  • 15

My best languages are Python and Rust, but I want to improve my C++ skills, so I decided to try my hand at Advent of Code 2019 in C++. So far I have not encountered too manyany major issues, but I want to know what I am doing that is subtly (or not so subtly) wrong.

It's not my intention to summarize the entire Intcode specification, but hereHere are some salient points aboutproperties of the Intcode language and computer:

  • Intcode programs may assume shared instruction and data memory, and most or all of the example programs make use of this.
  • An Intcode computer has two registers: the instruction pointer, which I called the program counter (_pc), and the relative base, which I called the stack pointer (_sp). Both are defined to initialize to 0.
  • Intcode instructions are of variable size, and the program counter has to be incremented by the width of the current instruction each cycle.
  • Intcode instructions support arguments with three different addressing modes: immediate, "positional" (that is, direct addressing) and "relative" (that is, stack-relative addressing). The addressing modes of all the arguments of an instruction are encoded into the high digits of its opcode.
  • To solve certain problems, it's necessary to start multiple Intcode computers, connect their inputs and outputs and run them concurrently (although not literally at the same time).
  • I created mnemonics for each of the Intcode instruction opcodes as I encountered them and invented an assembly language for debugging purposes; these are not part of the specification but you can observe them by uncommenting the appropriate lines.
  • The memory size is unspecified, but all memory locations are initially zero. In my code, the memory starts at the size of the input program and increases whenever an out-of-bounds write occurs.

My best languages are Python and Rust, but I want to improve my C++ skills, so I decided to try my hand at Advent of Code 2019 in C++. So far I have not encountered too many issues, but I want to know what I am doing that is subtly (or not so subtly) wrong.

It's not my intention to summarize the entire Intcode specification, but here are some salient points about the Intcode language and computer:

  • Intcode programs assume shared instruction and data memory, and most or all of the example programs make use of this.
  • An Intcode computer has two registers: the instruction pointer, which I called the program counter (_pc), and the relative base, which I called the stack pointer (_sp). Both are defined to initialize to 0.
  • Intcode instructions are of variable size, and the program counter has to be incremented by the width of the current instruction each cycle.
  • Intcode instructions support arguments with three different addressing modes: immediate, "positional" (that is, direct addressing) and "relative" (that is, stack-relative addressing). The addressing modes of all the arguments of an instruction are encoded into the high digits of its opcode.
  • To solve certain problems, it's necessary to start multiple Intcode computers, connect their inputs and outputs and run them concurrently (although not literally at the same time).
  • I created mnemonics for each of the Intcode instruction opcodes as I encountered them and invented an assembly language for debugging purposes; these are not part of the specification but you can observe them by uncommenting the appropriate lines.

My best languages are Python and Rust, but I want to improve my C++ skills, so I decided to try my hand at Advent of Code 2019 in C++. So far I have not encountered any major issues, but I want to know what I am doing that is subtly (or not so subtly) wrong.

Here are some properties of the Intcode language:

  • Intcode programs may assume shared instruction and data memory, and most or all of the example programs make use of this.
  • An Intcode computer has two registers: the instruction pointer, which I called the program counter (_pc), and the relative base, which I called the stack pointer (_sp). Both are defined to initialize to 0.
  • Intcode instructions are of variable size, and the program counter has to be incremented by the width of the current instruction each cycle.
  • Intcode instructions support arguments with three different addressing modes: immediate, "positional" (that is, direct addressing) and "relative" (that is, stack-relative addressing). The addressing modes of all the arguments of an instruction are encoded into the high digits of its opcode.
  • To solve certain problems, it's necessary to start multiple Intcode computers, connect their inputs and outputs and run them concurrently (although not literally at the same time).
  • I created mnemonics for each of the Intcode instruction opcodes as I encountered them and invented an assembly language for debugging purposes; these are not part of the specification but you can observe them by uncommenting the appropriate lines.
  • The memory size is unspecified, but all memory locations are initially zero. In my code, the memory starts at the size of the input program and increases whenever an out-of-bounds write occurs.
Source Link
trent
  • 1.1k
  • 7
  • 15

Complete Intcode Computer (Advent of Code 2019: Day 9)

My best languages are Python and Rust, but I want to improve my C++ skills, so I decided to try my hand at Advent of Code 2019 in C++. So far I have not encountered too many issues, but I want to know what I am doing that is subtly (or not so subtly) wrong.

The following program is my solution to the Day 9 problem. It's essentially an interpreter for a contrived machine language called Intcode. The Intcode specification is spread among Day 2, Day 5, and Day 9. The Day 7 problems had additional constraints that may explain some of my architecture decisions.

It's not my intention to summarize the entire Intcode specification, but here are some salient points about the Intcode language and computer:

  • Intcode programs assume shared instruction and data memory, and most or all of the example programs make use of this.
  • An Intcode computer has two registers: the instruction pointer, which I called the program counter (_pc), and the relative base, which I called the stack pointer (_sp). Both are defined to initialize to 0.
  • Intcode instructions are of variable size, and the program counter has to be incremented by the width of the current instruction each cycle.
  • Intcode instructions support arguments with three different addressing modes: immediate, "positional" (that is, direct addressing) and "relative" (that is, stack-relative addressing). The addressing modes of all the arguments of an instruction are encoded into the high digits of its opcode.
  • To solve certain problems, it's necessary to start multiple Intcode computers, connect their inputs and outputs and run them concurrently (although not literally at the same time).
  • I created mnemonics for each of the Intcode instruction opcodes as I encountered them and invented an assembly language for debugging purposes; these are not part of the specification but you can observe them by uncommenting the appropriate lines.

My code passes all the tests on the site. I'm more concerned about eradicating bad habits and avoiding pitfalls than whether the Intcode logic itself is right.

#include <fstream>
#include <iomanip>
#include <iostream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <vector>

using Int = long;

enum Opcode {
    ADD = 1,
    MUL = 2,
    INPUT = 3,
    OUTPUT = 4,
    JNZ = 5,
    JZ = 6,
    TESTLT = 7,
    TESTEQ = 8,
    ADDSP = 9,
    HALT = 99,
};

enum class Mode {
    Pos/*ition*/ = 0,
    Imm/*ediate*/ = 1,
    Rel/*ative*/ = 2,
};

class Argument {
public:
    Argument(Mode mode, Int value) : _mode(mode), _value(value) {}

    Mode mode() const {
        return _mode;
    }

    Int value() const {
        return _value;
    }

private:
    Mode _mode;
    Int _value;
    friend std::ostream& operator<<(std::ostream& os, const Argument& self);
};

std::ostream& operator<<(std::ostream& os, const Argument& self) {
    switch (self._mode) {
        case Mode::Imm:
            return os << self._value;
        default:
            std::cerr << "Warning: unknown addressing mode" << std::endl;
            [[fallthrough]];
        case Mode::Pos:
            return os << "[" << self._value << "]";
        case Mode::Rel:
            if (self._value < 0) {
                return os << "[sp - " << -self._value << "]";
            } else if (self._value > 0) {
                return os << "[sp + " << self._value << "]";
            } else {
                return os << "[sp]";
            }
    }
}

class Instruction {
public:
    Instruction(const Int* mem) {
        if (mem[0] < 0) {
            throw std::runtime_error("negative opcode");
        }
        _opcode = (Opcode)(mem[0] % 100); // the last two digits are the opcode
        Int addr_modes = mem[0] / 100;
        // parse arguments:
        for (size_t a = 1; a < length(); ++a) {
            auto mode = static_cast<Mode>(addr_modes % 10);
            _args.emplace_back(mode, mem[a]);
            addr_modes /= 10;
        }
        if (addr_modes > 0) {
            std::cerr << "Warning: unused addressing flags";
        }
    }

    Opcode opcode() const {
        return _opcode;
    }

    const Argument& arg(size_t n) const {
        return _args[n];
    }

    /* The code size of this instruction. The amount by which the program
     * counter should be incremented after executing it. */
    size_t length() const {
        switch (_opcode) {
            case ADD:
            case MUL:
            case TESTLT:
            case TESTEQ:
                return 4;
            case JNZ:
            case JZ:
                return 3;
            case INPUT:
            case OUTPUT:
            case ADDSP:
                return 2;
            case HALT:
                return 1;
            default:
                throw std::logic_error("bad opcode in Instruction::length");
        }
    }

private:
    std::vector<Argument> _args;
    Opcode _opcode;
    friend std::ostream& operator<<(std::ostream& os, const Instruction& self);

    /* The number of input arguments this instruction has. Input arguments are
     * distinguished from output arguments because they may be immediate. In
     * the Intcode machine language, input arguments always precede output
     * arguments. */
    size_t inputs() const {
        switch (_opcode) {
            case ADD:
            case MUL:
            case JNZ:
            case JZ:
            case TESTLT:
            case TESTEQ:
                return 2;
            case OUTPUT:
            case ADDSP:
                return 1;
            case INPUT:
            case HALT:
                return 0;
            default:
                throw std::logic_error("bad opcode in Instruction::inputs");
        }
    }
};

std::ostream& operator<<(std::ostream& os, const Instruction& self) {
    constexpr const char *opcode_names[] = {
        NULL, "ADD", "MUL", "INPUT", "OUTPUT", "JNZ", "JZ", "TESTLT", "TESTEQ",
        "ADDSP", NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
        NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
        NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
        NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
        NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
        NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
        NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
        NULL, NULL, NULL, NULL, NULL, NULL, NULL, "HALT"};
    os << opcode_names[self._opcode];
    for (auto& arg: self._args) {
        os << " " << arg;
    }
    return os;
}

class Cpu {
public:
    Cpu(std::ostream& out = std::cout, std::istream& in = std::cin)
        : _pc(0), _sp(0), _out(out), _in(in) {}

    /* for debugging purposes
    void set_name(std::string name) {
        _name = name;
    }
    */

    /* Populates this cpu's memory with data read from a file.
     * Also resets the program counter and stack pointer to 0. */
    void load(const std::string& file_name)
    {
        _mem.clear();
        _pc = 0;
        _sp = 0;
        auto fp = std::ifstream(file_name);
        if (!fp.is_open()) {
            throw std::runtime_error("failed to open input file");
        }
        std::string line;
        Int arg;
        while (fp >> arg) {
            _mem.emplace_back(arg);
            char comma;
            if (!(fp >> comma)) {
                break;
            } else if (comma != ',') {
                throw std::runtime_error("expected ','");
            }
        }
        fp.close();
    }

    /* Runs this Cpu until it halts or stops to wait for input. */
    void run() {
        while (true) {
            Instruction inst(&_mem[_pc]);

            //debug() << std::setw(7) << inst << std::endl;
            auto pc_nxt = execute(inst);
            if (pc_nxt) {
                _pc = *pc_nxt;
            } else {
                break;
            }
        }
    }

    /* Returns true if the program is halted, false if it is blocked waiting
     * for input. */
    bool is_halted() {
        return _mem[_pc]%100 == HALT;
    }

private:
    std::vector<Int> _mem;
    size_t _pc;
    size_t _sp;
    std::ostream& _out = std::cout;
    std::istream& _in = std::cin;
    //std::optional<std::string> _name;

    /* Returns a (reference to a) value in memory. */
    Int& lvalue(const Argument& arg) {
        size_t i;
        switch (arg.mode()) {
            case Mode::Imm:
                throw std::logic_error("immediate mode is not available for lvalues");
            default:
                std::cerr << "Warning: unknown addressing mode" << std::endl;
                [[fallthrough]];
            case Mode::Pos:
                i = arg.value();
                break;
            case Mode::Rel:
                i = _sp + arg.value();
                break;
        }
        // resize _mem so i is in range
        if (i >= _mem.size()) {
            _mem.resize(i + 1);
        }
        return _mem[i];
    }

    /* Evaluates an argument to an instruction, either immediate or from memory
     * according to the addressing mode of the argument. */
    Int rvalue(const Argument& arg) const {
        size_t i;
        switch (arg.mode()) {
            case Mode::Imm:
                return arg.value();
            default:
                std::cerr << "Warning: unknown addressing mode" << std::endl;
                [[fallthrough]];
            case Mode::Pos:
                i = arg.value();
                break;
            case Mode::Rel:
                i = _sp + arg.value();
                break;
        }
        // return 0 for out of range
        if (i >= _mem.size()) {
            return 0;
        }
        return _mem[i];
    }

    /* Executes a single instruction, modifying memory and the stack pointer
     * but not the program counter. */
    std::optional<size_t> execute(const Instruction& inst) {
        auto arg = [&](size_t n) { return rvalue(inst.arg(n)); };
        switch (inst.opcode()) {
            case ADD:
                lvalue(inst.arg(2)) = arg(1) + arg(0);
                break;
            case MUL:
                lvalue(inst.arg(2)) = arg(1) * arg(0);
                break;
            case INPUT:
                if (_in >> lvalue(inst.arg(0))) {
                    //debug() << "input = " << arg(0) << std::endl;
                    break;
                } else {
                    _in.clear(); // clear the eof state
                    //debug() << "waiting for input" << std::endl;
                    return std::nullopt;
                }
            case OUTPUT:
                _out << arg(0) << std::endl;
                //debug() << "output = " << arg(0) << std::endl;
                break;
            case JNZ:
                if (0 != arg(0)) {
                    return arg(1);
                }
                break;
            case JZ:
                if (0 == arg(0)) {
                    return arg(1);
                }
                break;
            case TESTLT:
                lvalue(inst.arg(2)) = arg(0) < arg(1);
                break;
            case TESTEQ:
                lvalue(inst.arg(2)) = arg(0) == arg(1);
                break;
            case ADDSP:
                _sp += arg(0);
                break;
            case HALT:
                return std::nullopt;
            default:
                throw std::logic_error("bad opcode in execute");
        }
        return _pc + inst.length();
    }

    /*
    std::ostream& debug() {
        if (_name) {
            std::cerr << std::setw(10) << *_name << ": ";
        }
        return std::cerr;
    }
    */
};

int main() {
    Cpu cpu;
    cpu.load("day09.in");
    cpu.run();
}

I am most interested in knowing what about my code is error-prone, buggy, or just kind of "weird" to an experienced C++ programmer. I've seen several C++ styles in use and I picked what made sense to me, but that doesn't mean they make sense collectively.

Some specific things that seem a little awkward are:

  • Cpu::run() returns either when the program halts or when it is waiting for input. (The client can call Cpu::is_halted to distinguish these cases.) When the program stops to wait for input, I had to call _in.clear() inside execute to clear the EOF flag from the stream. This seemed a little risky to me, because I might be clearing an error flag instead. I suppose I could test for EOF or error separately, but it just seems a bit awkward. Is there some better way of structuring the input that I haven't thought of?

  • I tried to find a way to write a value function that would contextually interpret an argument as input or output, so instead of

      lvalue(inst.arg(2)) = arg(1) + arg(0);
    

    I would be able to write

      value(2) = value(1) + value(0);
    

    and argument 2 would be parsed as an output parameter (for which immediate addressing mode is not allowed) while arguments 1 and 0 would be parsed as input parameters (for which any addressing mode is allowed). Possibly this was a silly idea; anyway, I thought it would be possible by dispatching on the constness of this, but I could not find a way to do it. Still, if it's possible, I'd like to know how.

  • There seems to be a difference between enum and enum class when it comes to how integer values are assigned. I used enum for opcodes and enum class for addressing modes, not really with any rhyme or reason. Should I have used enum class or enum for both? With enum class it seems you have to give them all explicit values if you want static_cast to work consistently. I did not find much information about this online.

  • friend ostream& operator<<: is this a good way to create a debug-ready output format?

  • Is const std::string& a good way to accept a string argument when you don't need to mutate or copy it? I tried std::string_view but it seems you can't make one from a char *.

  • One of the mistakes I made was as follows: in lvalue, when testing whether the memory needed to be resized, I used > instead of >=. This caused a subtle bug that only triggered when the address being written was exactly one beyond the end of the vector. Valgrind could not detect this bug (at least, not with default options). Is there some compiler switch I can turn on, or another tool I could use to easily detect index-out-of-range errors while debugging?

  • The vectors for Arguments in Instruction seem a bit extra. I would like to store them in-line in Instruction, and save on allocation, since the maximum size is only 3. Is there a typical solution to this kind of problem? The only thing I could think of was to have an array of 3 optional<Argument>s and that seemed tricky.

Here is a simple Intcode quine to test the code with (day09.in):

109,1,204,-1,1001,100,1,100,1008,100,16,101,1006,101,0,99

It should print itself to stdout (but with newlines instead of commas).