
I have an array and need to remove the items whose [from][id] value is duplicated. I tried:

$comments_new = array_map("unserialize",
          array_unique(array_map("serialize", $comments)));

But nothing changes. This is the structure of my array:

Array
(
    [0] => Array
        (
            [created_time] => 2018-10-28T17:35:58+0000
            [from] => Array
                (
                    [name] => Usuario
                    [id] => 111111
                )

            [message] =>test  as das
            [id] => 4234234214123412341234124
        )

    [1] => Array
        (
            [created_time] => 2018-10-28T17:35:24+0000
            [from] => Array
                (
                    [name] => Usuario2
                    [id] => 22222222
                )

            [message] => test
            [id] => 12341241234134444343
        )

    [2] => Array
        (
            [created_time] => 2018-10-28T18:44:08+0000
            [from] => Array
                (
                    [name] => Usuario3
                    [id] => 33333333
                )

            [message] => ccccc
            [id] => 223423421243123412341234123
        )

    [3] => Array
        (
            [created_time] => 2018-10-28T18:43:44+0000
            [from] => Array
                (
                    [name] => Usuario2
                    [id] => 22222222
                )

            [message] => test other
            [id] => 23424123412341234
        )


)

Note that each item has a [from][id] key. That is the value I need to check for duplicates, and if it is duplicated, the record must be removed from the array.
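The attempt above changes nothing because array_unique() compares each whole serialized record. Two comments by the same author still differ in message and id, so neither is treated as a duplicate. The sketch below uses two records trimmed from the question's data to show this. The ids are written as strings here only to avoid integer overflow.

```php
<?php
// Two comments by the same author (from.id 22222222), trimmed from the question's data.
$comments = [
    ['from' => ['name' => 'Usuario2', 'id' => 22222222], 'message' => 'test',       'id' => '12341241234134444343'],
    ['from' => ['name' => 'Usuario2', 'id' => 22222222], 'message' => 'test other', 'id' => '23424123412341234'],
];

// array_unique() compares the whole serialized records. These two differ
// (different message and id), so both survive.
$comments_new = array_map('unserialize', array_unique(array_map('serialize', $comments)));
echo count($comments_new), PHP_EOL; // 2

// Looking only at the from.id values shows the duplicate the question is about.
$authorIds = array_column(array_column($comments, 'from'), 'id');
echo count(array_unique($authorIds)), PHP_EOL; // 1
```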

  • Can you show briefly what you would expect as the output (only for the duplicated items) (i.e. do you want the first or last of the duplicate items) Commented Oct 28, 2018 at 19:28
  • @sNniffer May I ask why you prefer an overcomplicated answer which is O(n²) instead of O(1)? An example on my local machine with 20,000 comments having random IDs between 0 and 20,000 on PHP 5.4 showed 34 milliseconds (my four lines) vs. 11 minutes (accepted answer), and the accepted answer takes more memory. See example here: sandbox.onlinephpfunctions.com/code/… Commented Oct 29, 2018 at 11:51
  • @steffen your solution is O(n), while the accepted solution is O(n²). The accepted answer can also handle multiple criteria which is something that indexing by a single value cannot achieve in a reliable way. The accepted answer is the general solution for de-duplication by any criteria. Commented Oct 30, 2018 at 6:24
  • @RalphRitoch Yes, sorry I meant O(n). I however think that your solution might be practical in some cases, but certainly not here where the question is very specific on the criteria. Removing duplicates which share the same ID - that's implemented in 5 seconds and extremely fast. I also disagree on you saying it's the "general solution for deduplication". Actually most implementations in modern languages implement a set of objects without duplicates by mapping these objects to integers (=hashes). I've been well aware of all the things you've commented when I wrote my answer and read yours. Commented Oct 30, 2018 at 7:20
  • @steffen then you also know hashes are prone to collisions. Yes, an O(n) solution is possible by generating hashes, but then you run the risk of collisions. Commented Oct 30, 2018 at 7:25
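Here is a rough sketch for reproducing the O(n²) vs. O(n) difference discussed above on your own machine. It is not the commenter's benchmark. It compares a list searched with in_array() against an array keyed by id and checked with isset(), and both versions keep the first comment per author. The data is random, and hrtime() assumes PHP 7.3+. No timings are claimed here because they depend on the machine and PHP version.

```php
<?php
// Random test data (not the asker's); author ids may repeat.
$n = 5000;
$comments = [];
for ($i = 0; $i < $n; $i++) {
    $comments[] = ['from' => ['id' => random_int(0, $n)], 'message' => "m$i"];
}

// O(n²): in_array() scans the list of seen ids for every comment.
$t = hrtime(true);
$seenList = [];
$a = array_values(array_filter($comments, function (array $c) use (&$seenList): bool {
    if (in_array($c['from']['id'], $seenList, true)) {
        return false;
    }
    $seenList[] = $c['from']['id'];
    return true;
}));
$slowMs = (hrtime(true) - $t) / 1e6;

// O(n): isset() on an array keyed by author id.
$t = hrtime(true);
$seenKeys = [];
$b = array_values(array_filter($comments, function (array $c) use (&$seenKeys): bool {
    if (isset($seenKeys[$c['from']['id']])) {
        return false;
    }
    $seenKeys[$c['from']['id']] = true;
    return true;
}));
$fastMs = (hrtime(true) - $t) / 1e6;

printf("in_array: %.1f ms, isset: %.1f ms\n", $slowMs, $fastMs);
```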

3 Answers


Here you go, use array_filter:

<?php
$comment = [
    [
        "created_time" => new DateTime(),
        "from" => [
            "name" => "test1",
            "id" => 1
        ],
        "message" => "test1",
        "id" => 4234234214123412341234124
    ],
    [
        "created_time" => new DateTime(),
        "from" => [
            "name" => "test2",
            "id" => 1
        ],
        "message" => "test2",
        "id" => 17481419471248
    ]
];
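// Author ids seen so far; the first comment from each author is kept.
$temp = array();
$comment = array_filter($comment, function ($v) use (&$temp) {
    if (in_array($v['from']['id'], $temp)) {
        return false; // author already seen: drop this comment
    } else {
        array_push($temp, $v['from']['id']);
        return true;
    }
});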

var_dump($comment);

3 Comments

Don't use in_array, that'll result in bad performance. Just set $temp[$v['from']['id']] = true and then check isset($temp[$v['from']['id']]). Or in short: return isset($temp[$v['from']['id']]) ? false : ($temp[$v['from']['id']] = true);
@steffen this solution is actually better than yours because it can handle object keys.
This solution can't easily handle multiple keys, but using count(array_filter()) > 0 in place of in_array(), as in my solution, can resolve that small issue. This solution is also O(n²), since in_array() is O(n) and it runs once for each of the n elements passed through array_filter().
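Here is a sketch of the isset() variant suggested in the first comment. Seen author ids are stored as array keys, so each lookup is a hash lookup rather than a scan of a list. Like the answer, it keeps the first comment from each author. The data is trimmed from the question.

```php
<?php
$comments = [
    ['from' => ['id' => 111111],   'message' => 'test  as das'],
    ['from' => ['id' => 22222222], 'message' => 'test'],
    ['from' => ['id' => 33333333], 'message' => 'ccccc'],
    ['from' => ['id' => 22222222], 'message' => 'test other'],
];

// from.id values already seen, stored as keys so the check is isset()
// instead of an in_array() scan.
$seen = [];
$unique = array_filter($comments, function (array $c) use (&$seen): bool {
    $key = $c['from']['id'];
    if (isset($seen[$key])) {
        return false;   // duplicate author: drop this comment
    }
    $seen[$key] = true; // first comment from this author: keep it
    return true;
});

// array_filter() preserves the original keys; reindex to 0..n-1.
$unique = array_values($unique);
```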

Here is a functional solution using array_reduce and array_filter which has no side effects.

Replace $arr_in with the variable containing your data. You can replace the comparison after // Check if keys match with any rule you need for deciding that two records are duplicates.

$arr_out = array_reduce ( 
    $arr_in, // Input Array
    function($out,$item) { 
        return 
            count(
                array_filter(
                    $out,
                    function($e) use(&$item) {
                        // Check if keys match
                        return $e["from"]["id"] == $item["from"]["id"]; 
                    }
                )
            ) > 0 ? 
            $out : 
            array_merge($out,array($item)); 
    }, 
    array() 
);
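The reducer above can be wrapped into a reusable function that takes the duplicate rule as a callback, as a later comment suggests. The sketch below is one way to do that. The name unique_by is hypothetical, and the early return inside foreach replaces count(array_filter()) > 0 with the same meaning. It is still O(n²), but it shows both the question's single-key rule and a two-field rule.

```php
<?php
/**
 * Hypothetical helper: keeps an item only if no previously kept item
 * matches it according to $matches. O(n²) comparisons, any rule works.
 */
function unique_by(array $items, callable $matches): array
{
    return array_reduce(
        $items,
        function (array $out, $item) use ($matches): array {
            foreach ($out as $kept) {
                if ($matches($kept, $item)) {
                    return $out; // an equivalent item is already kept
                }
            }
            $out[] = $item;
            return $out;
        },
        []
    );
}

// Trimmed-down version of the question's data.
$comments = [
    ['from' => ['id' => 111111],   'message' => 'test  as das'],
    ['from' => ['id' => 22222222], 'message' => 'test'],
    ['from' => ['id' => 33333333], 'message' => 'ccccc'],
    ['from' => ['id' => 22222222], 'message' => 'test other'],
];

// Same author id means duplicate (the question's rule).
$byAuthor = unique_by($comments, function (array $a, array $b): bool {
    return $a['from']['id'] === $b['from']['id'];
});

// Several criteria at once: same author AND same message.
$byAuthorAndMessage = unique_by($comments, function (array $a, array $b): bool {
    return $a['from']['id'] === $b['from']['id']
        && $a['message'] === $b['message'];
});
```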


Just index it:

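$temp = [];
foreach ($comments as $el) {
    // A later comment with the same from.id overwrites the earlier one, so the
    // last comment per author is kept (at the position where that author first appeared).
    $temp[$el['from']['id']] = $el;
}
$comments = array_values($temp);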

3 Comments

This solution cannot handle multiple criteria. Suppose two records count as duplicates only when two fields match and they have the same number of child elements, such as $a["k1"] == $b["k1"] && $a["k2"] == $b["k2"] && count($a["children"]) == count($b["children"]). In that case, an associative array isn't going to solve the problem. Your solution only works for a single key, and only if that key isn't an object.
@RalphRitoch I'm sorry, but you are wrong here. This solution is capable of handling multiple criteria. Calculating a key from more data, even with serialize() or json_encode(), is still O(n) and so the preferable solution. Try the benchmark if you don't believe it.
Your solution doesn't use serialize() though, so it will crash on object keys. As for benchmarking, I know the difference between O(n) and O(n²). The serialize() may be a good optimization, but it isn't clear code. With my solution, it's easy to write a function that accepts the array and a matcher function, making it general enough for any case.
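Here is a sketch of the O(n) approach mentioned above, where one key is calculated from several fields. The helper name unique_by_key is hypothetical. It uses json_encode() to turn the chosen fields into a single string key, with JSON_THROW_ON_ERROR (PHP 7.3+) so that an encoding failure cannot silently produce a bogus key. PHP compares array keys exactly, so two different key strings are never merged even when their internal hashes collide. Values of different types still give different keys, though: int 1 encodes as 1, while string "1" encodes as "1".

```php
<?php
/**
 * Hypothetical helper: O(n) de-duplication by a key computed from each item.
 * $keyOf may return several fields; json_encode() joins them into one string
 * usable as an array key. The first occurrence of each key is kept.
 */
function unique_by_key(array $items, callable $keyOf): array
{
    $seen = [];
    $out = [];
    foreach ($items as $item) {
        $key = json_encode($keyOf($item), JSON_THROW_ON_ERROR);
        if (!isset($seen[$key])) {
            $seen[$key] = true;
            $out[] = $item;
        }
    }
    return $out;
}

$comments = [
    ['from' => ['id' => 111111],   'message' => 'test  as das'],
    ['from' => ['id' => 22222222], 'message' => 'test'],
    ['from' => ['id' => 33333333], 'message' => 'ccccc'],
    ['from' => ['id' => 22222222], 'message' => 'test other'],
];

// Single criterion: author id.
$byAuthor = unique_by_key($comments, function (array $c): array {
    return [$c['from']['id']];
});

// Multiple criteria: author id and message together.
$byAuthorAndMessage = unique_by_key($comments, function (array $c): array {
    return [$c['from']['id'], $c['message']];
});
```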
