
Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

8 votes
1 answer
258 views

TF-IDF Implementation in Java

I have tried to follow the formulas for term frequency–inverse document frequency (TF-IDF) and cosine similarity, and translated them into code. The results I get seem to be ...
Malde
  • 81
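
For reference, here is a minimal TF-IDF and cosine-similarity sketch in Python (not the poster's Java code). It assumes one common weighting choice: term count divided by document length for TF, and log(N / df) for IDF. Other variants (smoothed IDF, log-scaled TF) produce different numbers, so two implementations are only comparable if they use the same formulas. The names tf_idf and cosine are illustrative.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes TF = count / document length and IDF = log(N / df).
    A term that occurs in every document gets IDF 0, hence weight 0.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (c / len(doc)) * math.log(n / df[term])
            for term, c in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
vecs = tf_idf(docs)
print(round(cosine(vecs[0], vecs[0]), 6))  # 1.0
print(cosine(vecs[0], vecs[2]))            # 0.0, no shared terms
```
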
7 votes
4 answers
801 views

JSON data format for MCQ data bank

I'm creating a data bank of multiple-choice questions (MCQ) and their answers so that an app can be built around it. Regarding the actual storage format, I have two ideas: An array of objects with keys (...
Prahlad Yeri
4 votes
1 answer
200 views

FPGrowth Algorithm Implementation

This is my implementation of the FPGrowth algorithm where, as an optimisation, I avoid re-creating the tree at each extension of the prefix, and instead use a view representation that I think would be ...
jackb
  • 113
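
For comparison with the view-based optimisation described above, here is a minimal sketch of textbook FP-growth in Python that does re-create a conditional FP-tree for each prefix extension. The names Node, build_tree and fp_growth are hypothetical. Transactions are assumed to be (set of items, count) pairs with comparable items such as strings.

```python
from collections import defaultdict

class Node:
    """FP-tree node: an item, a count, a parent link and children keyed by item."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return (header table, supports)."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in items:
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, count in transactions:
        # Keep frequent items only, ordered by descending support (ties by item).
        path = sorted((i for i in items if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent

def fp_growth(transactions, min_support, suffix=()):
    """Yield (frozenset itemset, support) for every frequent itemset."""
    header, frequent = build_tree(transactions, min_support)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        itemset = suffix + (item,)
        yield frozenset(itemset), frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Recurse on a freshly built conditional tree (the step the question avoids).
        yield from fp_growth(base, min_support, itemset)

transactions = [({"bread", "milk"}, 1), ({"bread", "butter"}, 1),
                ({"bread", "milk", "butter"}, 1)]
for itemset, support in fp_growth(transactions, min_support=2):
    print(sorted(itemset), support)
```

A baseline like this is also handy for checking an optimised variant: on a small dataset, both should return exactly the itemsets that a brute-force count finds.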
5 votes
1 answer
234 views

A simple probabilistic AI for generating random sentences in Java

Motivation: I have this repository. It contains a program that analyzes an input text file and builds a word graph: in the graph, each node represents a word in the analyzed text. Now, if there are two ...
coderodde
  • 31.9k
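
The excerpt is cut off before the edge definition, so the sketch below assumes a common choice. An edge runs from word a to word b, weighted by how often b immediately follows a, and each next word is drawn in proportion to those weights, which makes the walk a first-order Markov chain. The function names are hypothetical, and this is not the code from the linked repository.

```python
import random
from collections import Counter, defaultdict

def build_word_graph(text):
    """Map each word to a Counter of the words that immediately follow it."""
    words = text.split()
    graph = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        graph[current][following] += 1
    return graph

def random_sentence(graph, start, max_words=20, rng=random):
    """Random walk: choose each next word with probability proportional to edge weight."""
    sentence = [start]
    word = start
    # Stop at max_words or at a word with no outgoing edges.
    while len(sentence) < max_words and graph.get(word):
        successors = graph[word]
        word = rng.choices(list(successors), weights=list(successors.values()))[0]
        sentence.append(word)
    return " ".join(sentence)

graph = build_word_graph("the cat sat on the mat and the cat ran")
print(random_sentence(graph, "the", rng=random.Random(42)))
```

Passing a seeded random.Random makes the output reproducible, which is convenient when testing the generator.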
1 vote
1 answer
323 views

Better way to create a contingency table with pandas for film genres from a Film DataFrame

From a public film-rating dataset, I created a contingency table as follows. Honestly, I don't like all these for-loops; I think the quality of the code can definitely be improved ...
Andrea Ciufo
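
One loop-free way to build such a table with pandas is to expand the genres into indicator columns with Series.str.get_dummies and multiply the indicator matrix by its transpose. This assumes, hypothetically, that each film's genres are stored as a single pipe-separated string, as in the MovieLens datasets. The column names and rows below are invented for illustration, and the result is a genre-by-genre co-occurrence table, which may not be exactly the table the question builds.

```python
import pandas as pd

# Hypothetical film table: genres stored as one pipe-separated string per film.
films = pd.DataFrame({
    "title": ["Heat", "Toy Story", "Jumanji", "Casino"],
    "genres": ["Action|Crime|Thriller", "Animation|Children|Comedy",
               "Adventure|Children|Fantasy", "Crime|Drama"],
})

# One 0/1 indicator column per genre, built without Python loops.
indicators = films["genres"].str.get_dummies(sep="|").astype(int)
indicators.index = films["title"]

# Genre-by-genre table: cell (g1, g2) counts films tagged with both genres.
co_occurrence = indicators.T @ indicators
print(co_occurrence.loc["Crime", "Crime"])      # 2
print(co_occurrence.loc["Children", "Comedy"])  # 1
```

The indicators frame on its own is already a film-by-genre table. For other pairings, pd.crosstab on an exploded genre column is an alternative.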
1 vote
0 answers
50 views

Relational join of two datasets

Front matter: I'm learning Scala and have not yet gotten used to functional programming or the language. I'm hoping a review of my naively implemented code can help me bridge my object-oriented ways to ...
Zhao Li
  • 111
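
The classic way to join two datasets on a key without a nested loop is a hash join. You index one side by the join key, then probe that index with each row of the other side. The sketch below is written in Python rather than Scala and uses invented employee and department rows. hash_join is a hypothetical helper that implements an inner equi-join.

```python
from collections import defaultdict

def hash_join(left, right, left_key, right_key):
    """Inner equi-join of two lists of dicts, hashing the `right` side.

    On a column-name clash, the value from the right row wins.
    """
    index = defaultdict(list)
    for row in right:                       # build phase
        index[row[right_key]].append(row)
    joined = []
    for row in left:                        # probe phase
        for match in index.get(row[left_key], []):
            joined.append({**row, **match})
    return joined

employees = [{"name": "Ann", "dept_id": 1}, {"name": "Bo", "dept_id": 2},
             {"name": "Cy", "dept_id": 3}]
departments = [{"id": 1, "dept": "Sales"}, {"id": 2, "dept": "R&D"}]
rows = hash_join(employees, departments, "dept_id", "id")
print(rows)  # Cy has no matching department, so only Ann and Bo appear
```

In Scala, the same build-and-probe idea can be expressed functionally with groupBy on one side and flatMap over the other.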
2 votes
0 answers
197 views

Producer-consumer pipeline problem implementation in asyncio

I wrote this code to build a non-blocking manager with pipeline operations using asyncio. My main concern is catching the items the producer receives, and knowing when the receive operation is complete. I want ...
etyzz
  • 21
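
A common asyncio pattern for a producer-consumer pipeline combines a bounded asyncio.Queue with an end-of-stream sentinel, so the consumer knows when the producer has finished. The sketch below assumes a single producer and a single consumer. The doubling step is a placeholder for real processing, and the function names are hypothetical.

```python
import asyncio

SENTINEL = object()  # marks the end of the stream

async def producer(queue, items):
    for item in items:
        await queue.put(item)          # waits if the queue is full
    await queue.put(SENTINEL)          # tell the consumer nothing more is coming

async def consumer(queue, results):
    while True:
        item = await queue.get()
        if item is SENTINEL:
            break
        results.append(item * 2)       # placeholder for real processing

async def run_pipeline(items):
    queue = asyncio.Queue(maxsize=10)  # bounded queue applies back-pressure
    results = []
    await asyncio.gather(producer(queue, items), consumer(queue, results))
    return results

print(asyncio.run(run_pipeline(range(5))))  # [0, 2, 4, 6, 8]
```

With several consumers, the producer would need to put one sentinel per consumer. Alternatively, the code could use queue.join() together with task_done().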
1 vote
1 answer
140 views

Data mining in Java: finding undrawn lottery rows - follow-up

(See the previous (initial) iteration.) This time, I have substantially reduced the usage of the final and this keywords. Also, ...
coderodde
  • 31.9k
2 votes
2 answers
154 views

Data mining in Java: finding undrawn lottery rows

(See the next iteration.) Introduction: Suppose Evil Lottery Inc. is interested in not paying millions of dollars back to players. They first gather the drawn lottery rows, after which they mine rows ...
coderodde
  • 31.9k
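
Assume an undrawn row is simply a k-number combination from 1..n that does not appear in a given collection of rows. Under that assumption, the rows can be streamed with itertools.combinations and a set lookup. The function name and the small parameters in the example are hypothetical. For a real game, streaming matters: even a 7-of-39 format has C(39, 7) = 15,380,937 possible rows.

```python
from itertools import combinations

def undrawn_rows(drawn_rows, n, k):
    """Yield every k-number row from 1..n that is absent from drawn_rows."""
    # Number order within a row is irrelevant, so compare rows as frozensets.
    drawn = {frozenset(row) for row in drawn_rows}
    for row in combinations(range(1, n + 1), k):
        if frozenset(row) not in drawn:
            yield row

print(sum(1 for _ in undrawn_rows([(1, 2), (5, 3)], n=5, k=2)))  # 8
```
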
2 votes
0 answers
753 views

How to avoid bottlenecks in JSON processing with Apache Beam?

I have an input containing some transaction data in JSON (in this case, a file) ...
Lin
  • 357
2 votes
1 answer
510 views

Analyzing patient treatment data using Pandas

I work in the population health industry and get contracts from commercial companies to conduct research on their products. This is the general code to identify target patient groups from a provincial ...
KubiK888
  • 225
3 votes
1 answer
3k views

Finding word association strengths from an input text

I have written the following (crude) code to find the association strengths among the words in a given piece of text. ...
Kristada673
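
The excerpt does not say which measure of association the code uses. One standard choice is pointwise mutual information (PMI) over sentence co-occurrence, sketched below in Python under that assumption. Positive values mean two words appear together more often than chance would predict; negative values mean they appear together less often. The function name and the whitespace tokenisation are illustrative simplifications.

```python
import math
from collections import Counter
from itertools import combinations

def association_strengths(sentences):
    """PMI for every word pair that co-occurs in at least one sentence.

    PMI(a, b) = log(P(a, b) / (P(a) * P(b))), where each probability is the
    fraction of sentences containing the word (or both words).
    """
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))  # keys are alphabetically ordered pairs
    return {
        (a, b): math.log((c / n) / ((word_counts[a] / n) * (word_counts[b] / n)))
        for (a, b), c in pair_counts.items()
    }

strengths = association_strengths(["the cat sat", "the cat ran", "a dog ran"])
print(round(strengths[("a", "dog")], 4))  # 1.0986
```
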
2 votes
2 answers
318 views

Python program to rank based on the frequency of names that appear in text files

I've written a Python program that ranks the names appearing in one or more files by their frequency. In other words, there are multiple files, and I want to rank the frequency of the names that appear ...
nsivakr
  • 163
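
Here is a minimal sketch that ranks known names across several files with collections.Counter. It assumes the list of names is given up front and that names are matched as whole, case-sensitive tokens. Each source can be an open file object or any iterable of lines, which keeps the function easy to test. rank_names is a hypothetical name.

```python
from collections import Counter

def rank_names(sources, names):
    """Count each known name across all sources, most frequent first."""
    wanted = set(names)
    counts = Counter()
    for lines in sources:            # an open file or any iterable of lines
        for line in lines:
            for token in line.split():
                token = token.strip(".,;:!?\"'()")  # drop surrounding punctuation
                if token in wanted:
                    counts[token] += 1
    return counts.most_common()

print(rank_names([["Alice met Bob.", "Bob left."]], ["Alice", "Bob"]))  # [('Bob', 2), ('Alice', 1)]

# With real files (the file names are hypothetical):
# with open("a.txt", encoding="utf-8") as f1, open("b.txt", encoding="utf-8") as f2:
#     print(rank_names([f1, f2], ["Alice", "Bob"]))
```
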
1 vote
1 answer
1k views

Data scraping from the Internet with Excel-VBA

I have built an Excel application that looks up on Amazon.com all the book titles it is asked to find and scrapes the following data from each: Book Title, Author, Price. What do you need to ...
Vityata
  • 329
5 votes
1 answer
6k views

k-means using numpy

This is a k-means implementation in Python (NumPy). I believe there is room for improvement in computing distances (given that I'm using a list comprehension, maybe I could also pack it into a ...
Adel Redjimi
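
On the distance question: with NumPy broadcasting, the distance from every point to every centroid can be computed in a single expression, without a list comprehension. The sketch below is a minimal Lloyd's-algorithm k-means built on that approach. The function names, random initialisation, and convergence check are assumptions, not the poster's code.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid for each row of X."""
    # (n, 1, d) - (1, k, d) broadcasts to (n, k, d); sum squares over d.
    distances = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return distances.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign(X, centroids)  # labels consistent with returned centroids

points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans(points, k=2)
print(labels)
```

The broadcast creates an n × k × d intermediate array. For large inputs, the expansion ||x − c||² = ||x||² − 2x·c + ||c||² avoids that array, at the cost of some floating-point cancellation.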
