Create unique ids

Question

You are to write a program that takes a list of strings as input. For every string in the list you are to determine the smallest N such that no other string in the list begins with the same N characters as the string in question. Now compose a new list with all these strings. Here is an example, on the left we have the input and on the right we have the output

aba                   aba
aababba               aa
abbaaabbb          -> abba
bbbbbbabababaaab      b
abbbabbaa             abbb

If a there is a string that is also the beginning of another string you should append a $ to the end to represent the end of the string. Here is an example where that is the case

ababababbabab         abababa
ababbbaabababa        ababbb
ababb                 ababb$
bababba            -> bababb
bbbabababab           bb
bababababbabab        bababa
abababbababa          abababb

Rules

Input will only contain alphabetic characters.
You will not receive a list with the same entry twice.
You take input in any ordered container.
This is code-golf, you should aim to minimize the byte size of your program.

Also related. This may have been the one you guys were thinking of, though it's not really a dupe (the "ID"s only have to be unique up to that point in the list) — ETHproductions
– ETHproductions, Commented Jul 11, 2017 at 0:12
Is something like ["ba,"baa","bab"] a valid input? It does not have the same entry twice, but there is no N defined for the first string. If it is a valid input should it return ["ba$","baa","bab"], ["b$","baa","bab"], ["ba$$","baa","bab"], ["b$$","baa","bab"], any of the above, or something else? — Jonathan Allan
– Jonathan Allan, Commented Jul 11, 2017 at 11:03

Community · Accepted Answer · 2020-06-17 09:04:33Z

Husk, 14 13 12 bytes

Ṡzo↑▲‡Ṫ≠TT'$

Try it online!

Less than half the length of the previous best solution!

Explanation

Ṡzo↑▲‡Ṫ≠TT'$
        TT'$     Pad all strings to equal length by appending dollar signs

     ‡Ṫ≠         Create a 2D matrix with all pairwise differences between
                 strings. ≠ returns the smallest index at wich two lists differ
                 (or 0, if they are equal)

Ṡzo↑▲            For each string, take as many characters as the maximum number
                 in its row of the table.

Since my explanation may not be that clear, here's a small worked example:

Starting list: ["abc","abd","b","bcd"]

Table of differences:

     abc abd  b  bcd     max
 abc  0   3   1   1       3
 abd  3   0   1   1       3
 b    1   1   0   2       2
 bcd  1   1   2   0       2

So we take from each string (with added trailing '$') as many characters as stated in the "max" column.

Final result: ["abc","abd","b$","bc"]

Community · Accepted Answer · 2020-06-17 09:04:33Z

Jelly, 28 26 16 bytes

-10 bytes by implementing the algorithm found by Leo in their brilliant Husk answer.

i€0Ṁḣ@
;€”$µ=þç"

or...

;€”$µ=i0µþ⁸Ṁ€⁸ḣ"

A monadic link taking and returning lists of lists of characters.

Try it online! (the footer makes a full program which prints the result split by newlines.)

How?

i€0Ṁḣ@ - helper link, create an entry of the output: equality row, string with trailing '$'
i€0    - first index of zero in €ach entry of the row
   Ṁ   - maximum
    ḣ@ - head to index with swapped @rguments (the prefix of the string)

;€”$µ=þç" - link: list of lists of characters (list of "strings")
  ”$      - literal '$'
;€        - concatenate for €ach
    µ     - monadic chain separation (call that x)
      þ   - table of (with x on the left and, implicitly, on the right):
     =    -   equals? (vectorises)
        " - zip with (with the table on the left and, implicitly, x on the right)
       ç  - call the last link as a dyad

My original:

-2 bytes thanks to Erik the Outgolfer (replace ḣJ$ with ;\ and ẎċÐ€Ð€$ with ċ@€€Ẏ$)

NMḢ⁹ḣ;⁸Ṃẋ@”$¤Ṗ
;\€ċ@€€Ẏ$ç"

A monadic link taking and returning lists of lists of characters.

Try it online! (the footer makes a full program which prints the result split by newlines.)

_{I'm almost certain this is beatable, and probably by a decent margin! (although I have attempted to golf the method.)}

The same byte count may be achieved without a helper link too, with:

ḣJ$€ẎċÐ€Ð€$ðNMḢ⁹ḣ;⁸Ṃẋ@”$¤Ṗð"

How?

Note: the reusable link is the second line of code, so start there.

NMḢ⁹ḣ;⁸Ṃẋ@”$¤Ṗ - helper link, create an entry of the output: prefix counts, string
               -   ("prefix counts" should be counts of the prefixes of the "string" in
                    the totality of prefixes of *all* the strings)
N              - negate the counts
 M             - maximal indexes (lengths of prefixes appearing least often, ascending)
  Ḣ            - head (finds the minimal length required), call that ml
   ⁹           - chain's right argument (prefixes)
    ḣ          - head (string) to index ml (gets the minimal length prefix)
            ¤  - nilad followed by links as a nilad:
      ⁸        -   chain's left argument (prefix counts)
       Ṃ       -   minimum (this will either be 1 or 2)
          ”$   -   literal '$'
        ẋ@     -   repeat with swapped @rguments (either "$" or "$$")
     ;         - concatenate
             Ṗ - pop (remove the last "$" - leaving one where the prefix occurs in another
               -      string's prefixes, and none otherwise)

;\€ċ@€€Ẏ$ç" - link: list of lists of characters (list of "strings")
  €         - for €ach string
 \          -   cumulative reduce with:
;           -     concatenation
            - (gets a list of lists of prefixes)
        $   - last two links as a monad:
       Ẏ    -   tighten (flatten by one to make a single list of all prefixes)
    @       -   swap arguments
     €€     -     for each prefix in each list of prefixes
   ċ        -       count occurrences in the tightened list (>=1 since it counts itself)
          " - zip with the dyad (right argument is this link's argument):
         ç  -   last link (helper) as a dyad

You can replace ẎċÐ€Ð€$ with ċ@€€Ẏ$ for -1. — Erik the Outgolfer
– Erik the Outgolfer, Commented Jul 11, 2017 at 9:21

Laikoni · Accepted Answer · 2017-07-11 22:18:04Z

3

Haskell, 83 79 bytes

f l|let h=map(++"$")l;w x|y<-init x,2>sum[1|z<-h,y<=z,y++"{">z]=w y|1<3=x=w<$>h

Try it online! Example usage: f ["abcd","abh","ab","bacd"] yields ["abc","abh","ab$","b"].

Given a list of strings, the function f appends a trailing $ and applies w to each string, where w iteratively drops the last character of the string (with init) until the next application of init would lead to the string now longer being a unique prefix.

Edit: Three bytes off thanks to Ørjan Johansen's shorter prefix test!

edited Jul 11, 2017 at 22:18

answered Jul 10, 2017 at 23:04

Laikoni

26.4k7 gold badges54 silver badges116 bronze badges

\$\begingroup\$ Good point, looks like I can't count. \$\endgroup\$

Wheat Wizard
– Wheat Wizard ♦

2017-07-10 23:30:29 +00:00
Commented Jul 10, 2017 at 23:30
1

\$\begingroup\$ y<=z,y++"{">z is a shorter prefix test. \$\endgroup\$

Ørjan Johansen
– Ørjan Johansen

2017-07-11 01:16:31 +00:00
Commented Jul 11, 2017 at 1:16

Add a comment |

Wheat Wizard · Accepted Answer · 2017-07-10 23:29:33Z

2

Haskell, 109 104 98 97 90 84 83 bytes

Probably not very good but Hopefully this will kick things off a bit.

f s=[[x|x<-map(`take`(a++"$"))[0..],[1]==[1|d<-s,and$zipWith(==)x$d++"$"]]!!0|a<-s]

Try it online!

edited Jul 10, 2017 at 23:29

answered Jul 10, 2017 at 21:56

Wheat Wizard♦

103k23 gold badges299 silver badges697 bronze badges

\$\begingroup\$ I think x<=d++"$",x++"{">d is a shorter prefix test. \$\endgroup\$

Ørjan Johansen
– Ørjan Johansen

2017-07-11 01:32:05 +00:00
Commented Jul 11, 2017 at 1:32

Add a comment |

Neil · Accepted Answer · 2017-07-13 12:41:45Z

1

JavaScript (ES6), 89 86 bytes

a=>a.map(g=(s,i,a,t=s+'$')=>a.some((t,j)=>j-i&&!t.search(s))?t:g(s.slice(0,-1),i,a,s))

Edit: Saved 3 bytes thanks to @CraigAyre. (Supporting arbitrary characters would still have saved two bytes using !t.indexOf(s).)

edited Jul 13, 2017 at 12:41

answered Jul 11, 2017 at 0:27

Neil

184k12 gold badges76 silver badges290 bronze badges

\$\begingroup\$ Can you use search instead of startsWith?: !t.search(s) \$\endgroup\$

Craig Ayre
– Craig Ayre

2017-07-12 14:58:00 +00:00
Commented Jul 12, 2017 at 14:58

Add a comment |

Keyu Gan · Accepted Answer · 2017-07-11 07:48:27Z

0

Mathematica, 141 142 bytes

g[x_:0,___]:=x;f=#<>""&/@(Thread@h[x=#~PadRight~99&/@(Characters[#<>"$"]&/@#),Max/@Outer[g@@Join@@Position[#-#2,y_/;y!=0]&,x,x,1]]/.h->Take)&

I think I can golf it a bit. The code doesn't work on Mathics so you may use Wolfram Sandbox to test.

edited Jul 11, 2017 at 7:48

answered Jul 11, 2017 at 7:41

Keyu Gan

2,16312 silver badges25 bronze badges

Add a comment |

Community · Accepted Answer · 2020-06-17 09:04:33Z

PHP, 155 129 102 bytes

for(;++$k<$argc;$t=!print"$t
")for($i=0;count(preg_grep("_^".($t.=$argv[$k][$i++]?:"$")._,$argv))>1;);

expects input to not contain regex special chars and that no argument is -.
Run with php -nr <word> <word> ... or try it online.

breakdown

for(;++$k<$argc;            # loop through arguments
    $t=!print"$t\n")            # 2. print $t, reset $t
    for($i=0;                   # 1. loop through string:
    1<count(preg_grep("_^".(        # 2. continue while more than one argument begins with $t
        $t.=$argv[$k][$i++]?:"$"    # 1. append current character or "$" to $t
    )._,$argv)););

Stack Exchange Network

Create unique ids

Rules

7 Answers 7

Husk, 14 13 12 bytes

Explanation

Jelly, 28 26 16 bytes

How?

How?

Haskell, 83 79 bytes

Haskell, 109 104 98 97 90 84 83 bytes

JavaScript (ES6), 89 86 bytes

Mathematica, 141 142 bytes

PHP, 155 129 102 bytes

Linked

Hot Network Questions

Create unique ids

Rules

7 Answers 7

Husk, 14 13 12 bytes

Explanation

Jelly, 28 26 16 bytes

How?

How?

Haskell, 83 79 bytes

Haskell, 109 104 98 97 90 84 83 bytes

JavaScript (ES6), 89 86 bytes

Mathematica, 141 142 bytes

PHP, 155 129 102 bytes

Linked

Related

Hot Network Questions