Glad to learn that you got the correct result, but 3 hours is still a long time for this, I think. Luckily, there are lots of opportunities for optimizing this program.
A lot of copying is going on
Currently your program uses something similar to a deck of cards (your stack) from which you physically remove one card and place it flat on the table (your copying_the_current_attempt_from_the_stack). Then you process this one card and potentially add new cards to the deck (your copying_the_current_attempt_onto_stack).
The optimal solution does not remove that one card, but will edit that card for the first of the new cards, and will (partially) copy this same card for any other new cards.
An awful lot of debugging instructions sit in the way
Even if you use NDEBUG in order to disable the printing of several of these messages, you will still suffer the presence of the 3 instructions load s0, NDEBUG test s0, s0 jump nz, dont_print, especially because they appear in hot loops. Once you know for a fact that the program works fine, you should remove all of these, and then measure the time the program takes.
Many loops are inefficient
In my review to your previous question I explained why a repeat ... until <cond> loop is better than a while <cond> ... wend loop. Do apply this and see the difference!
A quick search for N-Queens will reveal that there are many algoritmic optimizations that you could apply. But staying close to your solution, I see 2 big optimizations:
- once you get the solution [a,b,c,d,e,f,g,h], you immediately also have the symmetrical solution [7-a,7-b,7-c,7-d,7-e,7-f,7-g,7-h]
- there can never be a queen in a corner
In your 8x8 program this translates to a different way of setting up the stack. Instead of initially creating one empty stack record:
0,?,?,?,?,?,?,?,?
^ ^
| top_of_the_stack
bottom_of_the_stack
and have the first pass through adding_a_new_queen_loop replace it by 8 new stack records:
1,7,?,?,?,?,?,?,?,1,6,?,?,?,?,?,?,?, ... 1,0,?,?,?,?,?,?,?
^ ^
| top_of_the_stack
bottom_of_the_stack
you initially create just 3 non-empty stack records:
1,6,?,?,?,?,?,?,?,1,5,?,?,?,?,?,?,?,1,4,?,?,?,?,?,?,?
^ ^
| top_of_the_stack
bottom_of_the_stack
The fact that these records are non-empty is very important because it is the reason why from now on you can safely turn all of your loops into repeat ... until <cond> loops.
Redundancies clutter the code
eg. In 'found a solution' you have this snippet:
compare s1, s2
jump c, not_a_new_digit
load s2, s1
not_a_new_digit:
load s1, s2
Coming from regbank b the S2 register holds address 10 (digits_of_the_ordinal_number), and S1 holds an address of 10 or more. The jump is never going to happen. You can remove these 5 lines.