Skip to content

Conversation

@richardleach
Copy link
Contributor

This allows the existing undef OP to act on a pad SV. The following
two cases are optimized:

undef my $x, currently implemented as:

    4     <1> undef vK/1 ->5
    3        <0> padsv[$x:1,2] sRM/LVINTRO ->4

my $a = undef, currently implemented as:

    5     <2> sassign vKS/2 ->6
    3        <0> undef s ->4
    4        <0> padsv[$x:1,2] sRM*/LVINTRO ->5

These are now just represented as:
3 <1> undef[$x:1,2] vK/LVINTRO,TARGMY ->4

The undef $x case gets a slight performance boost, as shown in this toy example:
my $x; for (0..10_000_000) { undef $x; undef $x; undef $x; undef $x; undef $x; undef $x; undef $x; undef $x; undef $x; undef $x }
Blead:

            714.52 msec task-clock                #    0.956 CPUs utilized
                 1      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               198      page-faults               #    0.277 K/sec
     3,170,725,221      cycles                    #    4.438 GHz
         5,948,953      stalled-cycles-frontend   #    0.19% frontend cycles idle
           231,911      stalled-cycles-backend    #    0.01% backend cycles idle
    10,843,683,798      instructions              #    3.42  insn per cycle
                                                  #    0.00  stalled cycles per insn
     2,260,727,959      branches                  # 3163.998 M/sec

Patched:

            602.19 msec task-clock                #    0.974 CPUs utilized
                 1      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               199      page-faults               #    0.330 K/sec
     2,724,503,940      cycles                    #    4.524 GHz
           313,679      stalled-cycles-frontend   #    0.01% frontend cycles idle
           191,871      stalled-cycles-backend    #    0.01% backend cycles idle
     8,943,649,796      instructions              #    3.28  insn per cycle
                                                  #    0.00  stalled cycles per insn
     1,760,721,729      branches                  # 2923.875 M/sec

The $x = undef case, in which more optimization is achieved, performs much better, as shown in this toy example:
my $x; for (0..10_000_000) { $x = undef; $x = undef; $x = undef; $x = undef; $x = undef; $x = undef; $x = undef; $x = undef; $x = undef; $x = undef }

Blead:
          1,224.48 msec task-clock                #    0.990 CPUs utilized
                 2      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               205      page-faults               #    0.167 K/sec
     5,663,539,065      cycles                    #    4.625 GHz
         4,336,701      stalled-cycles-frontend   #    0.08% frontend cycles idle
             4,385      stalled-cycles-backend    #    0.00% backend cycles idle
    19,644,168,889      instructions              #    3.47  insn per cycle
                                                  #    0.00  stalled cycles per insn
     3,760,824,142      branches                  # 3071.356 M/sec

Patched:

            628.51 msec task-clock                #    0.974 CPUs utilized
                 1      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               200      page-faults               #    0.318 K/sec
     2,716,312,069      cycles                    #    4.322 GHz
           432,725      stalled-cycles-frontend   #    0.02% frontend cycles idle
           224,782      stalled-cycles-backend    #    0.01% backend cycles idle
     8,943,691,044      instructions              #    3.29  insn per cycle
                                                  #    0.00  stalled cycles per insn
     1,760,729,375      branches                  # 2801.443 M/sec

Also some bench.pl comparisons:

expr::sassign::undef_lex
$x = undef

        blead  undef
       ------ ------
    Ir 100.00 318.00
    Dr 100.00 317.65
    Dw 100.00 412.50
  COND 100.00 300.00
   IND 100.00 150.00

COND_m 100.00 100.00
 IND_m 100.00 200.00

 Ir_m1 100.00 100.00
 Dr_m1 100.00 100.00
 Dw_m1 100.00 100.00

 Ir_mm 100.00 100.00
 Dr_mm 100.00 100.00
 Dw_mm 100.00 100.00

expr::sassign::undef_lex_direc
undef $x

        blead  undef
       ------ ------
    Ir 100.00 142.00
    Dr 100.00 158.82
    Dw 100.00 162.50
  COND 100.00 142.86
   IND 100.00 150.00

COND_m 100.00 100.00
 IND_m 100.00 150.00

 Ir_m1 100.00 100.00
 Dr_m1 100.00 100.00
 Dw_m1 100.00 100.00

 Ir_mm 100.00 100.00
 Dr_mm 100.00 100.00
 Dw_mm 100.00 100.00

expr::sassign::undef_my_lex
my $x = undef

        blead  undef
       ------ ------
    Ir 100.00 164.50
    Dr 100.00 180.85
    Dw 100.00 196.15
  COND 100.00 173.68
   IND 100.00 133.33

COND_m 100.00 100.00
 IND_m 100.00 200.00

 Ir_m1 100.00 100.00
 Dr_m1 100.00 100.00
 Dw_m1 100.00 100.00

 Ir_mm 100.00 100.00
 Dr_mm 100.00 100.00
 Dw_mm 100.00 100.00

expr::sassign::undef_my_lex_direc
undef my $x

        blead  undef
       ------ ------
    Ir 100.00 112.43
    Dr 100.00 123.40
    Dw 100.00 119.23
  COND 100.00 115.79
   IND 100.00 133.33

COND_m 100.00 100.00
 IND_m 100.00 150.00

 Ir_m1 100.00 100.00
 Dr_m1 100.00 100.00
 Dw_m1 100.00 100.00

 Ir_mm 100.00 100.00
 Dr_mm 100.00 100.00
 Dw_mm 100.00 100.00
@richardleach
Copy link
Contributor Author

Rebased following OP_PADSV_STORE merge. Awaiting some reviewers.

@haarg
Copy link
Contributor

haarg commented Aug 23, 2022

This isn't just an optree optimization. undef $x and $x = undef have different behavior. $x = undef leaves a PV allocated, but clears its flags. This means re-using the variable for a string of the same size will not allocate any new memory.

Without this patch:

perl -e'use Devel::Peek; my $f; Dump $f; $f = "x" x 40; Dump($f); $f = undef; Dump($f); undef $f; Dump($f);'
SV = NULL(0x0) at 0x1370226d0
  REFCNT = 1
  FLAGS = ()
SV = PV(0x13700b670) at 0x1370226d0
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x600000a28210 "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"\0
  CUR = 40
  LEN = 42
SV = PV(0x13700b670) at 0x1370226d0
  REFCNT = 1
  FLAGS = ()
  PV = 0x600000a28210 "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"\0
  CUR = 40
  LEN = 42
SV = PV(0x13700b670) at 0x1370226d0
  REFCNT = 1
  FLAGS = ()
  PV = 0

With this patch:

./perl -Ilib -e'use Devel::Peek; my $f; Dump $f; $f = "x" x 40; Dump($f); $f = undef; Dump($f); undef $f; Dump($f);'
SV = NULL(0x0) at 0x13880df80
  REFCNT = 1
  FLAGS = ()
SV = PV(0x13800b670) at 0x13880df80
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x600001172c70 "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"\0
  CUR = 40
  LEN = 42
SV = PV(0x13800b670) at 0x13880df80
  REFCNT = 1
  FLAGS = ()
  PV = 0
SV = PV(0x13800b670) at 0x13880df80
  REFCNT = 1
  FLAGS = ()
  PV = 0

This may be a reasonable change, but would at least need to be noted in the commit message.

@richardleach
Copy link
Contributor Author

This isn't just an optree optimization. undef $x and $x = undef have different behavior. $x = undef leaves a PV allocated, but clears its flags. This means re-using the variable for a string of the same size will not allocate any new memory.

Good point. I think the existing behaviour should be preserved. If there's a flag spare, it looks straightforward to implement.

@demerphq
Copy link
Collaborator

demerphq commented Aug 23, 2022 via email

@demerphq
Copy link
Collaborator

demerphq commented Aug 23, 2022 via email

@richardleach richardleach force-pushed the hydahy/undef-padsv-sassign branch from 3594208 to b06b35c Compare August 23, 2022 15:05
@richardleach
Copy link
Contributor Author

On Tue, 23 Aug 2022 at 15:25, demerphq @.***> wrote:
I vote that the existing behavior be preserved, it can have some interesting performance implications.

That's okay, I understood it as you meant it.

The current branch should now preserve the pre-existing behaviour w.r.t. PVs:

perl5.37.4 -e'use Devel::Peek; my $f; Dump $f; $f = "x" x 40; Dump($f); $f = undef; Dump($f); undef $f; Dump($f);'
SV = NULL(0x0) at 0x5636bb8f1720
  REFCNT = 1
  FLAGS = ()
SV = PV(0x5636bb8cdff0) at 0x5636bb8f1720
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x5636bb8f8ed0 "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"\0
  CUR = 40
  LEN = 42
SV = PV(0x5636bb8cdff0) at 0x5636bb8f1720
  REFCNT = 1
  FLAGS = ()
  PV = 0x5636bb8f8ed0 "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"\0
  CUR = 40
  LEN = 42
SV = PV(0x5636bb8cdff0) at 0x5636bb8f1720
  REFCNT = 1
  FLAGS = ()
  PV = 0

I've added the tests you suggested to t/perf/opcount.t, will just tidy them slightly so they don't break unnecessarily in the next opcode update.

@richardleach richardleach force-pushed the hydahy/undef-padsv-sassign branch from b06b35c to a286f49 Compare August 23, 2022 20:29
@richardleach
Copy link
Contributor Author

Ok. tests tidied. I think all review comments to date have been addressed.

This allows the existing `undef` OP to act on a pad SV. The following
two cases are optimized:

`undef my $x`, currently implemented as:
    4     <1> undef vK/1 ->5
    3        <0> padsv[$x:1,2] sRM/LVINTRO ->4

`my $a = undef`, currently implemented as:

    5     <2> sassign vKS/2 ->6
    3        <0> undef s ->4
    4        <0> padsv[$x:1,2] sRM*/LVINTRO ->5

These are now just represented as:

   3     <1> undef[$x:1,2] vK/SOMEFLAGS ->4

Note: The two cases are not quite functionally identical, as `$x = undef`
clears the SV flags but preserves any PV allocation for later reuse,
whereas `undef $x` does free any PV allocation. This behaviour difference
is preserved through use of the OPpUNDEF_KEEP_PV flag.
@richardleach richardleach force-pushed the hydahy/undef-padsv-sassign branch from a286f49 to 2e69278 Compare August 24, 2022 21:18
@demerphq
Copy link
Collaborator

Looks good to me again today,. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants