richardleach
Contributor

The extend-an-existing-array code in Perl_av_extend_guts contains the following:

		if (av == PL_curstack) {	/* Oops, grew stack (via av_store()?) */
		    PL_stack_sp = *allocp + (PL_stack_sp - PL_stack_base);
		    PL_stack_base = *allocp;
		    PL_stack_max = PL_stack_base + newmax;
		}

However, growing the stack probably accounts for only a small proportion of calls to Perl_av_extend_guts.

At least nowadays, the "via av_store()?" comment seems to be a red herring, as all in-core instances where (av == PL_curstack) seem to originate from Perl_stack_grow.

This commit therefore moves the PL_stack* pointer updates to Perl_stack_grow.
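
The pointer fix-up in question can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ToyStack` and `toy_stack_grow` are hypothetical names, and plain `realloc` stands in for perl's allocator. It is not the code in this commit; it only shows why the three pointers must be rebased whenever the buffer may move.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the PL_stack_base / PL_stack_sp /
 * PL_stack_max trio. */
typedef struct {
    void **base; /* first slot of the buffer */
    void **sp;   /* most recently pushed slot */
    void **max;  /* last usable slot */
} ToyStack;

/* Grow the buffer to hold newmax + 1 slots and rebase the pointers.
 * Returns 0 on success, -1 if realloc fails (stack left untouched). */
static int toy_stack_grow(ToyStack *s, size_t newmax)
{
    /* Record the depth first: the old pointers are invalid after realloc. */
    ptrdiff_t depth = s->sp - s->base;
    void **fresh;

    assert(depth >= 0 && newmax >= (size_t)depth);
    fresh = realloc(s->base, (newmax + 1) * sizeof *fresh);
    if (!fresh)
        return -1;

    s->sp   = fresh + depth;   /* same offset, possibly a new address */
    s->base = fresh;
    s->max  = fresh + newmax;
    return 0;
}
```

In this model the rebasing lives in the one function that knows it is growing a stack, so a generic array-extension routine would not need an `av == PL_curstack` style test.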

Local test builds of this commit were successful, but this commit should be:

  • thoroughly smoked
  • reviewed by someone very familiar with the stacks.
Member

@atoomic atoomic left a comment

LGTM

@xenu
Member

xenu commented Jul 31, 2020

At least nowadays, the "via av_store()?" comment seems to be a red herring, as all in-core instances where (av == PL_curstack) seem to originate from Perl_stack_grow.

I have no idea if anything actually calls av_store with PL_curstack but it definitely has code to handle that case.

If the stack handling code is removed from Perl_av_extend_guts, then it will have to be removed from av_store too.

@richardleach richardleach added the do not merge Don't merge this PR, at least for now label Jul 31, 2020
@richardleach
Contributor Author

richardleach commented Jul 31, 2020

I have no idea if anything actually calls av_store with PL_curstack but it definitely has code to handle that case.

Thanks for pointing that out. I'd seen it but had a brain fart and ignored its implications. Will try to dig into it over the weekend.

"XPUSH in disguise" goes all the way back to Perl 5.0.0: https://perl5.git.perl.org/perl5.git/blob/a0d0e21ea6ea90a22318550944fe6cb09ae10cda:/av.c

@Leont Leont requested a review from iabyn August 1, 2020 13:07
@Leont
Contributor

Leont commented Aug 1, 2020

I am not sure about the implications of this. It may break important stuff, it may not.

@richardleach
Contributor Author

FWIW

I put a conditional Perl_croak at the start of av_store, and it wasn't triggered by make test.

Advanced Perl Programming, 1st Ed mentions that ST() could have used av_store() to automatically extend the stack, but that would be considerably slower.

@richardleach
Contributor Author

richardleach commented Aug 1, 2020

Looking more at when "XPUSH in disguise" was introduced:

  • the av_extend & av_store code (or predecessors) in perl-5a9 and older did not have code to handle (av == PL_curstack)
  • this code was introduced by the time of perl-5.000
  • perl-5.000 itself did not use av_store with the stack AFAICT
  • perl-5.000 introduced the EXTEND, ST, XPUSH etc. macros, so it's not obvious to me why any non-core code would have used av_store for stack management either

A wild supposition:

  • PL_curstack, via the (av == PL_curstack) test, is the only stack considered in av_fetch and av_store, both in perl-5.000 and today.
  • If a non-core user ignored the macros and used av_store to push values to the stack, couldn't they also try to use av_store to e.g. push things to the Temp stack? In which case, why isn't there code to handle that possibility as well?
  • The reference from Advanced Perl Programming, 1st Ed above states: ST() could have used av_store() to automatically extend the stack, but that would be considerably slower. - sooooo:
  • Is it possible that during the evolution of the ST() macro it actually did use av_store() and that's why the (av == PL_curstack) code was introduced, but when ST() stopped using av_store(), that code became redundant and was accidentally left in place?
@iabyn
Contributor

iabyn commented Aug 3, 2020 via email

@xenu
Member

xenu commented Aug 3, 2020

So not only are there no comments explaining that, but it also isn't being tested by our tests.

@Leont
Contributor

Leont commented Aug 3, 2020

So not only are there no comments explaining that, but it also isn't being tested by our tests.

Well volunteered! ;-)

@richardleach
Contributor Author

The av == PL_curstack checking is there because in some places, such as the @ary = split(...) optimisation, perl cheats and takes an existing array (@ary in that example) and tells the core to temporarily use that as the stack.

Thanks, Dave!

The problem is that if something grows @ary while this is happening, e.g. @ary = split(/(?{ push @ary, 1 })/, ....);

Bleugh, I see.

If you have time for questions:

  1. I'll try to walk through pp_split, but what is the split optimisation in a nutshell? Saving the need to copy the resulting elements from the stack into the array? Or is there more to it than that? (Other than not potentially making the stack massive, if that can be avoided.)

  2. Is that the only sane way to achieve the split optimisation? It seems to make others pay for split's cleverness. :-( e.g. perl -e "push @gg, 1 for (1..50); print @gg" incurs the cost of checking whether (@gg == PL_curstack) multiple times as @gg gets expanded.

  3. Do you know of anything that uses PL_curstack with av_store in https://github.com/Perl/perl5/blob/blead/av.c#L365? As noted above, nothing in the test suite does.

     if (av == PL_curstack && key > PL_stack_sp - PL_stack_base)
         PL_stack_sp = PL_stack_base + key;	/* XPUSH in disguise */
    
  4. Is using the stack like this supported for XS modules or embedded use cases, or is it core-only fun?

Also, while I'm asking likely-dumb questions about av.c:

  5. Is there a compelling reason nowadays why array elements are initialised in a loop, rather than potentially more efficiently with the Zero macro?
	    if (av && AvREAL(av)) {
		while (tmp)
		    ary[--tmp] = NULL;
	    }
  6. Related to (5): OO applications, e.g. a Mojolicious app, surely create far more non-stack arrays than stacks, so can arrays not just be created with Newxz rather than doing the following? Or are there also some creates-lots-of-stacks use cases?
		Newx(*allocp, newmax+1, SV*);
		ary = *allocp + 1;
		tmp = newmax;
		*allocp[0] = NULL;	/* For the stacks */
	    }
	    if (av && AvREAL(av)) {
		while (tmp)
		    ary[--tmp] = NULL;
	    }
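
The two allocation strategies being compared in the last two questions can be modelled outside perl. This is a minimal sketch, assuming `calloc` as a rough analogue of `Newxz` and `malloc` plus a loop as an analogue of the existing `Newx` path; `alloc_slots_loop` and `alloc_slots_zeroed` are hypothetical names. The zeroing variant relies on a null pointer being all-bits-zero, which holds on mainstream platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Analogue of the existing path: allocate newmax + 1 slots, reserve
 * slot 0, then clear the remaining slots one by one.
 * Returns a pointer to slot 1, or NULL on allocation failure. */
static void **alloc_slots_loop(size_t newmax, void ***allocp)
{
    void **ary;
    size_t tmp = newmax;

    assert(allocp != NULL);
    *allocp = malloc((newmax + 1) * sizeof **allocp);
    if (!*allocp)
        return NULL;
    ary = *allocp + 1;
    (*allocp)[0] = NULL;      /* the reserved slot */
    while (tmp)
        ary[--tmp] = NULL;
    return ary;
}

/* Analogue of a zeroing allocation: every slot, including the
 * reserved one, is cleared by the allocator. */
static void **alloc_slots_zeroed(size_t newmax, void ***allocp)
{
    assert(allocp != NULL);
    *allocp = calloc(newmax + 1, sizeof **allocp);
    return *allocp ? *allocp + 1 : NULL;
}
```

Both functions leave the buffer in the same state; whether one is measurably faster would depend on the allocator and platform, so it would need benchmarking rather than assuming.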
@richardleach
Contributor Author

So not only are there no comments explaining that, but it also isn't being tested by our tests.

Well volunteered! ;-)

Happy to help with this....once I understand it more.

@richardleach
Contributor Author

First draft at a comment block for av_extend_guts:

/* (av == PL_curstack) in the following circumstances:
   1. Perl_stack_grow is explicitly growing the stack
      (as used by the EXTEND, XPUSH, etc. macros)
   2. An AV is currently acting as the stack following
      a SWITCHSTACK operation. (e.g. for an optimisation)
   3. See "XPUSH in disguise" in av_store.
      <TODO do we know when this happens?>
      
   Note: This list may not be exhaustive or have full
   test coverage! */

Comments/suggestions welcome.

@richardleach
Contributor Author

Minor tweak.

/* (av == PL_curstack) in the following circumstances:
   1. Perl_stack_grow is explicitly growing the stack
      (as used by the EXTEND, XPUSH, etc. macros)
   2. An AV is currently acting as the stack following
      a SWITCHSTACK operation and the AV has to be
      extended before the stack is switched back.
      (SWITCHSTACK is usually used in this manner
      for in-place optimizations.)
   3. See "XPUSH in disguise" in av_store.
      <TODO do we know when this happens?>
      
   Note: This list may not be exhaustive or have full
   test coverage! */
@richardleach
Contributor Author

Started working on a test, but my patch doesn't break this (even under ASan):
./miniperl -e 'my @ary = split(/(?{ push @ary, 1 })/, "123467890");'

Devel::Peek shows that, at the end, @ary:

  • has AvMAX == 132
  • has AvFILL == 8
  • contains SVs for the digits 2,3,4,5,6,7,8,9,0

realarray is true at the end of that split, so the optimization did kick in.

Has PL_stack_sp somehow survived, or is a better/bigger test case needed?

@iabyn
Contributor

iabyn commented Aug 10, 2020 via email

@richardleach
Contributor Author

The check in av_store is cheap: it is guarded by if (!AvREAL(av)), and all normal AVs are AvREAL(), so the extra PL_curstack check will only be done occasionally.

Yeah, but when the array has to be extended, av_extend_guts still checks for PL_curstack. (Sorry for not being clearer above.)

  3. Do you know of anything that uses PL_curstack with av_store in https://github.com/Perl/perl5/blob/blead/av.c#L365? As noted above, nothing in the test suite does.

     if (av == PL_curstack && key > PL_stack_sp - PL_stack_base)
         PL_stack_sp = PL_stack_base + key; /* XPUSH in disguise */

There appears to be a bug here. That check in av_store() only applies to !AvREAL() arrays, but in the case of split, the array being temporarily used as a stack is still AvREAL(), so the check isn't done. Indeed on a vanilla debugging perl:

$ perl5320 -e'@ary = split(/\w(?{ @ary[1000] = 1 })/, "abc")'
Split loop at -e line 1.
panic: POPSTACK
$

Huh. So the check in av_store() should apply to all arrays if it is to be effective?
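
The gating problem described above can be reproduced in a toy model. This is a sketch under assumptions, not perl's av_store: `ToyAv`, `ToyInterp` and `toy_store` are hypothetical names, `real` plays the role of AvREAL(), and `sp_off` models `PL_stack_sp - PL_stack_base`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy array; `real` plays the role of AvREAL(). */
typedef struct {
    bool real;
    ptrdiff_t fill;   /* highest index in use */
} ToyAv;

/* Hypothetical interpreter state: which array is acting as the
 * stack, and how deep the stack pointer currently is. */
typedef struct {
    ToyAv *curstack;
    ptrdiff_t sp_off; /* models PL_stack_sp - PL_stack_base */
} ToyInterp;

/* Models the shape described above: the stack-pointer bump only
 * happens on the !real branch. Storage handling is omitted. */
static void toy_store(ToyInterp *in, ToyAv *av, ptrdiff_t key)
{
    assert(in && av && key >= 0);
    if (av->fill < key) {
        if (!av->real) {
            if (av == in->curstack && key > in->sp_off)
                in->sp_off = key;   /* "XPUSH in disguise" */
        }
        av->fill = key;
    }
}
```

With a `real` array installed as `curstack`, a store past the current depth moves `fill` but leaves `sp_off` where it was, which is the mismatch the quoted report points at.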

Or, in the wee, small, heat-insomnia hours I wondered if the following could work:

  • When the optimization kicks in, pp_split could create a new, temporary AV
  • SWITCHSTACK that temp AV
  • Sometime after the SWITCHSTACK has been undone, possibly in the if (realarray) section, the AvARRAY/AvALLOC/AvMAX details of @ary and the temp AV could be swapped over
  • the temp AV is either then explicitly destroyed or left in a state that the next FREETMPS will clear it up

Still tricksy, but then only pp_split has to care?

Is there a compelling reason nowadays why array elements are initialised in a loop, rather than potentially more efficiently with the Zero macro?

    if (av && AvREAL(av)) {
        while (tmp)
            ary[--tmp] = NULL;
    }

Probably not.

Thanks, I'll add it to the ideas-to-try pile.

@demerphq
Collaborator

demerphq commented Aug 11, 2020 via email

@richardleach
Contributor Author

Is there a compelling reason nowadays why array elements are initialised in a loop, rather than potentially more efficiently with the Zero macro?

    if (av && AvREAL(av)) {
        while (tmp)
            ary[--tmp] = NULL;
    }

Could it be because of overflow on some systems?

I don't know. I can't imagine that, but perhaps my imagination - and exposure to multiple platforms - isn't good enough. ;)

Any change would definitely have a dependence on the underlying malloc/calloc implementations, as well as the ability of a compiler to recognize what's being done and convert it anyway during optimization.

Is there a bzero equivalent that takes the size of an element independently from the number of elements? Would that ever even matter? Also, could it simply be faster on some systems? It would be zeroing 8 bytes at a time on a 64-bit box. I don't know enough about the portability of C to say myself, but it seems conceivable that on some systems it could be a faster way to deal with aligned data than a byte-oriented operation like bzero().

It seems very conceivable that the existing code used to be faster on some platforms, don't know if it still is or if perl still supports those platforms.

Modern memset()/memzero(), apart from perhaps fixing up a handful of unaligned bytes, should do at least a pointer-width of bytes at a time. On x86-64, it's very likely that even with a non-targeted build, it will do more than that (e.g. using SSE2 instructions). Glibc can memset 64 bytes at a time using AVX instructions - but I suppose that code would have to be compiled for the native/targeted architecture to get that?

I should check whether, at -O2, the likes of Clang and GCC already convert if (av && AvREAL(av)) { while (tmp) ary[--tmp] = NULL; } loops to memset anyway. If they don't, then it seems like the performance of modern, mainstream architectures may suffer - depending upon CPU magic - with the status quo.
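
A minimal sketch of the two idioms side by side, for anyone who wants to run that comparison. `clear_slots_loop` and `clear_slots_memset` are hypothetical names, and the memset version assumes a null pointer is all-bits-zero (true on mainstream platforms, not guaranteed by the C standard).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The existing idiom: clear `count` pointer slots, back to front. */
static void clear_slots_loop(void **ary, size_t count)
{
    assert(ary != NULL || count == 0);
    while (count)
        ary[--count] = NULL;
}

/* memset-based alternative; assumes NULL is all-bits-zero. */
static void clear_slots_memset(void **ary, size_t count)
{
    assert(ary != NULL || count == 0);
    if (count)
        memset(ary, 0, count * sizeof *ary);
}
```

Compiling a file containing these two functions with `-O2 -S` under GCC and Clang and comparing the generated assembly would show whether the loop is already turned into a memset call on a given toolchain.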

@richardleach
Contributor Author

richardleach commented Aug 20, 2020

Indeed on a vanilla debugging perl:

$ perl5320 -e'@ary = split(/\w(?{ @ary[1000] = 1 })/, "abc")'
Split loop at -e line 1.
panic: POPSTACK

@iabyn - also anything that equates to this, which tweaking the av_store code wouldn't fix:

# perl -e '@ary = split(/\w(?{ undef @ary })/, "abc")'
Segmentation fault

Feels like a game of whack-a-mole. ;) Presumably there are some scenarios that will always end badly, e.g. if the reference count on @ary is dropped to zero in the ?{} block.

But if you think the following - or some such - might be worth trying, I'm happy to find out and prepare a PR if it pans out:

  • When the optimization kicks in, create a new AV
  • SWITCHSTACK that temp AV
  • Sometime after the SWITCHSTACK has been undone, probably in the if (realarray) section, the AvARRAY/AvALLOC/AvMAX details of @ary and the temp AV could be swapped over. (More likely, I'd swap the stash pointers over in the xpvav bodies, then swap the xpvav pointers in the SV heads.)
  • mortalize the temp AV so that the next FREETMPS will clear it up
@richardleach
Contributor Author

  • When the optimization kicks in, create a new AV
  • SWITCHSTACK that temp AV
  • Sometime after the SWITCHSTACK has been undone, probably in the if (realarray) section, the AvARRAY/AvALLOC/AvMAX/AvFILLp details of @ary and the temp AV could be swapped over.
  • mortalize the temp AV so that the next FREETMPS will clear it up

Couldn't sleep, so I did this. No test failures, but needs tidying up before any PR.
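
The swap step in the list above can be pictured with a toy structure. This is a hedged sketch and not the patch mentioned in this comment: `ToyAvBody` and `toy_swap_storage` are hypothetical names, and the four fields only mirror the roles of AvALLOC, AvARRAY, AvFILLp and AvMAX.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy array body holding the four fields named above. */
typedef struct {
    void **alloc;     /* start of the allocation, like AvALLOC */
    void **array;     /* first live element, like AvARRAY */
    ptrdiff_t fill;   /* highest index in use, like AvFILLp */
    ptrdiff_t max;    /* highest index allocated, like AvMAX */
} ToyAvBody;

/* Exchange the storage of two arrays without copying any elements. */
static void toy_swap_storage(ToyAvBody *a, ToyAvBody *b)
{
    ToyAvBody tmp;

    assert(a && b);
    tmp = *a;
    *a = *b;
    *b = tmp;
}
```

After the swap, the target array owns the buffer that was filled while the temporary array acted as the stack, and the temporary array owns the target's old buffer, ready to be freed with it.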

@iabyn
Contributor

iabyn commented Aug 27, 2020 via email

@richardleach
Contributor Author

I don't like this idea. For a start, pp_split() is already overly-complex; adding more complexity isn't good.

That's fair enough. I wasn't sure if partially removing the optimisation was an option, hence #18090.

I think the best approach is just to partially remove the optimisation. I.e. split onto the stack as normal, then at the end, if OPpSPLIT_ASSIGN, then empty the array, extend it, and Copy() the stack to AvARRAY. This means the code is still fast due to not needing to execute padav and aassign ops.

Ok, happy to start work on that at the weekend.
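
The "empty, extend, Copy()" sequence suggested above can be sketched in plain C. This is an illustration under assumptions rather than a draft of the pp_split change: `ToyArray` and `toy_assign_from_stack` are hypothetical names, `realloc` and `memcpy` stand in for perl's array extension and Copy(), and reference counting is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy array: a buffer, its live length and its capacity. */
typedef struct {
    void **array;
    size_t len;
    size_t cap;
} ToyArray;

/* Empty `dst`, make room for `n` items, then bulk-copy them from the
 * stack region starting at `items` (which must not overlap dst).
 * Returns 0 on success, -1 on allocation failure, in which case dst
 * is left empty but still valid. */
static int toy_assign_from_stack(ToyArray *dst, void *const *items, size_t n)
{
    assert(dst != NULL);
    assert(items != NULL || n == 0);

    dst->len = 0;                               /* empty the array */
    if (n > dst->cap) {                         /* extend it */
        void **fresh = realloc(dst->array, n * sizeof *fresh);
        if (!fresh)
            return -1;
        dst->array = fresh;
        dst->cap = n;
    }
    if (n)
        memcpy(dst->array, items, n * sizeof *items);  /* one bulk copy */
    dst->len = n;
    return 0;
}
```

The point of the model is that the destination array is only touched once, at the end, so nothing that runs during the split can invalidate a buffer the stack pointers depend on.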

The only downside is that the stack's high water mark for the rest of the program's execution might be excessive.

Do you think it's worth comparing the stack size before/after and potentially shrinking the stack if it grows excessively? (For some value of "excessively".)

Although I'm rejecting your PR, I'll make some quick comments for your future benefit.

Thanks, I appreciate that.

  1. There were about 400 lines separating tmp4array being created and it being mortalised. That's lots of scope for something to die in between and for the array to be leaked.

Would mortalising it early have been the correct approach?
Would it have to be pushed to the tmps stack as well - like below - and then taken off at the end?

        EXTEND_MORTAL(1);
        PL_tmps_stack[++PL_tmps_ix] = SvREFCNT_inc_simple_NN(av);
        orig_ix = PL_tmps_ix;

  2. I didn't like your choice of variable names: tmp4ary, tmp1, tmp2, tmp3, tmp4. Not very meaningful.

Noted. :)

@iabyn
Contributor

iabyn commented Aug 27, 2020 via email

@iabyn
Contributor

iabyn commented Aug 27, 2020 via email

@richardleach
Contributor Author

#18232 has removed the use of SWITCHSTACK by the @ary = split(...) optimisation, so like a dog with a bone, I'd like to return to the original discussion. i.e. Two (av == PL_curstack) code paths are not covered by the test suite and lack documentation/comments stating any use cases:

  • In av_extend
            if (av == PL_curstack) { /* Oops, grew stack (via av_store()?) */
                PL_stack_sp = *allocp + (PL_stack_sp - PL_stack_base);
                PL_stack_base = *allocp;
                PL_stack_max = PL_stack_base + newmax;
            }
  • In av_store
	    if (av == PL_curstack && key > PL_stack_sp - PL_stack_base)
		PL_stack_sp = PL_stack_base + key;	/* XPUSH in disguise */

For av_store, @iabyn noted earlier that "There appears to be a bug here. That check in av_store() only applies to !AvREAL() arrays, but in the case of split, the array being temporarily used as a stack is still AvREAL(), so the check isn't done."

The remaining users of SWITCHSTACK in core (PUSHSTACKi, POPSTACK, SAVESWITCHSTACK) and on CPAN seem to be of the type SWITCHSTACK(PL_curstack,next->si_stack), rather than swapping in a regular AV for optimisation purposes.

Can anyone point to known usage of those two code paths now? Or should I try smoking CPAN to see if anything breaks when they are removed?

@khwilliamson
Contributor

@richardleach what do you want to do with this P.R.?

@richardleach
Contributor Author

I'd like to smoke CPAN with the (av == PL_curstack) code removed to see if anything breaks, but don't have the tuits to do it.

@jkeenan
Contributor

jkeenan commented Sep 16, 2022

This pull request has had a "do not merge" label on it since July 2020. If it's not under active development, I recommend that we close it and open up a new ticket when and if needed.

Thank you very much.
Jim Keenan

@jkeenan jkeenan added the Closable? We might be able to close this ticket, but we need to check with the reporter label Sep 16, 2022
@richardleach
Contributor Author

Guess I should have pushed for merging this at the start of the 5.37 cycle to see what broke. I'll close it for now and make a note to revisit it later.

Labels

  • awaiting review
  • Closable? (We might be able to close this ticket, but we need to check with the reporter)
  • do not merge (Don't merge this PR, at least for now)

8 participants