I see some things that may help you improve your code. I'll start with the stylistic pieces and then move on to the basic algorithm and the speed-up you're seeking.
Fix your formatting
This code has peculiar indentation that makes it difficult to tell when a function begins and ends. Fixing that would hlep.
Use <cmath> instead of <math.h>
The difference between the two forms is that the former defines things within the std:: namespace versus into the global namespace. Language lawyers have lots of fun with this, but for daily use I'd recommend using <cmath>. See this SO questionthis SO question for details.
Use all of the required #includes
The function std::scanf is used but its declaration is in #include <cstdlib> which is not actually in the list of includes. Similarly std::fill needs #include <algorithm>.
Avoid non-portable type assumptions
The code currently contains this line
std::scanf("%llu %llu", &m, &n);
That might be just fine with your compiler on your machine, but on mine, the compiler complains:
warning: format ‘%llu’ expects argument of type ‘long long unsigned int*’, but argument 2 has type ‘uint64_t* {aka long unsigned int*}’
The simplest way to avoid that problem is to use the C++ stream extractors instead:
std::cin >> m >> n;
There is a parallel problem (and fix) with std::printf.
Use the right constructor
The code currently contains these two lines:
std::vector<bool> flags(n + 1);
std::fill(flags.begin(), flags.end(), true);
Why not simply use the constructor version that does both of these?
std::vector<bool> flags(n + 1, true);
Use your knowledge of the mathematical domain
The only even prime number is 2, so you could optimize for both space and performance by only storing bits that correspond with odd numbers. Further, it's clear that if we eliminate even numbers, the inner sieve loop can look like this instead:
for (uint64_t j = i * i; j <= n; j += 2 * i) {
Avoid doing the same work twice
In the case that there are multiple trials, one simple optimization would be to compute the sieve only once (based on the largest n) and then simply use it multiple times rather than re-calculating the sieve for every trial.
Do work at compile time instead of runtime
Another optimization would be to compute the sieve at compile time rather than runtime. If you're using C++14 or better, one can compute the sieve at compile time.