I have a dataframe containing the sales figures for different items across different sales outlets. The dataframe shown below includes only a few of the items and a few of the outlets. There is a benchmark of 100 sales per day for each item. Each item that sold more than 100 is marked "Yes", and each item that sold fewer than 100 is marked "No".
val df1 = Seq(
("Mumbai", 90, 109, 101, 78, ............., "No", "Yes", "Yes", "No", .....),
("Singapore", 149, 129, 201, 107, ............., "Yes", "Yes", "Yes", "Yes", .....),
("Hawaii", 127, 101, 98, 109, ............., "Yes", "Yes", "No", "Yes", .....),
("New York", 146, 130, 173, 117, ............., "Yes", "Yes", "Yes", "Yes", .....),
("Los Angeles", 94, 99, 95, 113, ............., "No", "No", "No", "Yes", .....),
("Dubai", 201, 229, 265, 317, ............., "Yes", "Yes", "Yes", "Yes", .....),
("Bangalore", 56, 89, 61, 77, ............., "No", "No", "No", "No", .....))
.toDF("Outlet","Boys_Toys","Girls_Toys","Men_Shoes","Ladies_shoes", ............., "BT>100", "GT>100", "MS>100", "LS>100", .....)
Now, I want to add a column "Count_of_Yes" whose value for each sales outlet (each row) is the total number of "Yes" values in that row. How do I iterate over each row to get the count of "Yes"?
My expected dataframe is:
val output_df = Seq(
("Mumbai", 90, 109, 101, 78, ............., "No", "Yes", "Yes", "No", ....., 2),
("Singapore", 149, 129, 201, 107, ............., "Yes", "Yes", "Yes", "Yes", ....., 4),
("Hawaii", 127, 101, 98, 109, ............., "Yes", "Yes", "No", "Yes", ....., 3),
("New York", 146, 130, 173, 117, ............., "Yes", "Yes", "Yes", "Yes", ....., 4),
("Los Angeles", 94, 99, 95, 113, ............., "No", "No", "No", "Yes", ....., 1),
("Dubai", 201, 229, 265, 317, ............., "Yes", "Yes", "Yes", "Yes", ....., 4),
("Bangalore", 56, 89, 61, 77, ............., "No", "No", "No", "No", ....., 0))
.toDF("Outlet","Boys_Toys","Girls_Toys","Men_Shoes","Ladies_shoes", ............., "BT>100", "GT>100", "MS>100", "LS>100", ....., "Count_of_Yes")
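For reference, here is a minimal sketch of one way this might be done without iterating over rows at all: turn each flag column into a 1/0 expression and sum those expressions into a single column. It uses only the four item columns and four flag columns visible above (the elided ones are left out), and it assumes that every flag column name ends in ">100". The names `CountYes` and `addYesCount` are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lit, when}

object CountYes {
  // Hypothetical helper: adds "Count_of_Yes" by summing a 1/0 expression per flag column.
  // Assumption: flag columns are exactly the columns whose names end in ">100".
  def addYesCount(df: DataFrame): DataFrame = {
    val flagCols = df.columns.filter(_.endsWith(">100"))
    // Each flag contributes 1 when it equals "Yes", otherwise 0 (nulls also count as 0).
    val countOfYes = flagCols
      .map(c => when(df(c) === "Yes", 1).otherwise(0))
      .foldLeft(lit(0))(_ + _)
    df.withColumn("Count_of_Yes", countOfYes)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("CountYes")
      .getOrCreate()
    import spark.implicits._

    // Reduced sample: only the columns shown in the question, elided ones omitted.
    val df1 = Seq(
      ("Mumbai", 90, 109, 101, 78, "No", "Yes", "Yes", "No"),
      ("Singapore", 149, 129, 201, 107, "Yes", "Yes", "Yes", "Yes"),
      ("Hawaii", 127, 101, 98, 109, "Yes", "Yes", "No", "Yes"),
      ("Los Angeles", 94, 99, 95, 113, "No", "No", "No", "Yes"),
      ("Bangalore", 56, 89, 61, 77, "No", "No", "No", "No")
    ).toDF("Outlet", "Boys_Toys", "Girls_Toys", "Men_Shoes", "Ladies_shoes",
      "BT>100", "GT>100", "MS>100", "LS>100")

    addYesCount(df1).show(false)
    spark.stop()
  }
}
```

Because the count is built as a single column expression, Spark evaluates it for every row in one pass. If the flag columns do not share a common suffix in the real data, `flagCols` could instead be an explicit list of column names.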