
Ertugrul

Using Wavelets and Clustering to Predict Odd or Even Numbers: An Overengineered Approach with Pretty (But Confusing) Plots

Spoiler alert: If you just want to check parity, you only need to look at the last bit of a number's binary representation. But what if, just for fun, you tried to guess whether a number is odd or even by analyzing wavelet features extracted from its binary signal and clustering those features? Surprisingly, this quirky method achieves almost 70% accuracy — not bad for something completely unnecessary.

Motivation

Parity (odd/even) is one of the simplest integer properties and can be computed directly with:

y = n % 2
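
Equivalently, the "just look at the last bit" check from the spoiler is a single bitwise operation:

y = n & 1  # 1 when the lowest bit is set (odd), 0 when it is clear (even)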

But what if we approached this simple problem using advanced signal processing techniques, just for the sake of exploration and learning?

Idea: Convert numbers into binary bit signals, extract features at multiple resolutions using wavelets, cluster those features using unsupervised methods, and try to infer parity from the cluster structure.

This is a fun way to explore signal representation, unsupervised learning, feature engineering, and multi-scale analysis.

Step 1: Binary Signal Representation

Each number n is converted into a binary signal:

x = (x₁, x₂, ..., x_L), where xᵢ ∈ {0,1}

The signal is zero-padded to the nearest power-of-two length to make it compatible with wavelet decomposition.
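
As a minimal sketch (a condensed version of the helper used in the full code below), here is the conversion for n = 5:

import numpy as np

def number_to_bit_signal(num):
    bits = np.array([int(b) for b in bin(num)[2:]])  # MSB first: 5 -> [1, 0, 1]
    pad_len = 2**int(np.ceil(np.log2(len(bits)))) - len(bits)  # pad to next power of two
    return np.pad(bits, (0, pad_len), 'constant')

print(number_to_bit_signal(5))  # [1 0 1 0] -- padded from length 3 to 4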

Step 2: Wavelet Decomposition

We apply multi-level discrete wavelet transform (DWT) using the Haar wavelet. The result is a set of coefficient vectors, each representing signal information at a different resolution level.
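
As a small illustration, here is the padded bit signal for n = 5 run through a 2-level Haar DWT (pywt returns the coefficient arrays ordered [cA2, cD2, cD1]):

import pywt

sig = [1, 0, 1, 0]  # bit signal for n = 5 after padding
coeffs = pywt.wavedec(sig, 'haar', level=2)
for name, c in zip(['cA2', 'cD2', 'cD1'], coeffs):
    print(name, c)  # one coefficient array per resolution level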

Step 3: Feature Extraction per Level

At each wavelet level, we extract 3 statistical features from the coefficients:

  • Energy: sum of squares of the coefficients
  • L2 Norm: Euclidean norm
  • Mean Absolute Value: average of absolute values

These features describe the strength and variation of the signal at each resolution.
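
Concretely, for a coefficient vector c = (c₁, ..., c_N) at one level:

Energy: E = Σᵢ cᵢ²
L2 Norm: ‖c‖₂ = √(Σᵢ cᵢ²)
Mean Absolute Value: (1/N) · Σᵢ |cᵢ|

Note that the L2 norm is just the square root of the energy, so these two features carry essentially the same information.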

Step 4: Clustering Each Feature Separately

For each of the 3 features at each wavelet level, we apply KMeans clustering with 2 clusters. Since the ground truth labels (odd/even) are not used during training, we later match clusters to parity classes based on which cluster contains more odd numbers.

This allows us to interpret each feature cluster as representing "more likely odd" or "more likely even".
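
A minimal sketch of this step on one feature column, with placeholder data (feature_column here is a stand-in for a real feature vector, not a variable from the final code):

import numpy as np
from sklearn.cluster import KMeans

feature_column = np.random.rand(999)  # placeholder for one feature at one level
X = feature_column.reshape(-1, 1)     # k-means expects a 2-D array
clusters = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X).labels_

labels = np.arange(1, 1000) % 2       # ground-truth parity, used only to interpret clusters
# The cluster holding the larger fraction of odd numbers is read as "more likely odd"
odd_cluster = int(np.mean(labels[clusters == 1]) > np.mean(labels[clusters == 0]))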

Step 5: Estimating Probability of Oddness

For each number and each feature, we compute the probability of being odd as:

the fraction of odd numbers in the cluster it belongs to.

This gives us a 3D array of probabilities: (number, level, feature).
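
In symbols, if number n falls into cluster C(n):

P(odd | n) = (number of odd members of C(n)) / (size of C(n))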

Step 6: Weighted Score and Final Prediction

We combine the probabilities into a single score using weighted averaging:

  • Each wavelet level is weighted (later, finer-detail levels receive more weight).
  • Each feature is equally weighted.

The final oddness score Sₙ is the weighted average of the per-level, per-feature probabilities (a numpy sketch follows the decision rule below):

Sₙ = ( Σₗ Σ_f wₗ · v_f · P[n, l, f] ) / ( Σₗ wₗ · Σ_f v_f )

where wₗ are the level weights and v_f the feature weights.

Then:

  • Predict odd if Sₙ > 0.5
  • Predict even otherwise
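
In numpy terms, the whole Step 6 reduction is a few lines. This is a minimal sketch with placeholder probabilities (the probabilities array and max_level here are stand-ins for the real ones built in the full code below):

import numpy as np

max_level = 4
probabilities = np.random.rand(999, max_level, 3)  # placeholder (number, level, feature) array

w_level = np.linspace(0.5, 1.5, max_level)  # later (finer detail) levels weighted more
w_feat = np.ones(3)                         # all three features weighted equally

S = np.einsum('nlf,l,f->n', probabilities, w_level, w_feat) / (w_level.sum() * w_feat.sum())
predicted = (S > 0.5).astype(int)           # predict odd when the score exceeds 0.5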

Results: Surprisingly Good

On numbers from 1 to 999:

  • Final Accuracy: 69.67%

This is significantly better than random guessing (50%) and surprisingly high given the simplicity of the true rule (just check the last bit). A plausible reason: the finest-scale Haar detail coefficients respond directly to the low-order bits of the signal, so parity information leaks into the features.

Visualization

A scatter plot shows final scores for each number, colored by actual parity:

plt.figure(figsize=(14, 6))
odd = labels == 1
plt.scatter(numbers[odd], final_scores[odd], c='red', s=10, label='True Odd')
plt.scatter(numbers[~odd], final_scores[~odd], c='blue', s=10, label='True Even')
plt.axhline(0.5, color='green', linestyle='--', label='Decision Threshold (0.5)')
plt.title("Wavelet Features + KMeans: Predicted Probability of Being Odd")
plt.xlabel("Number")
plt.ylabel("Score")
plt.legend()
plt.show()

Final Python Code

import numpy as np
import pywt
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

def number_to_bit_signal(num):
    # Binary digits of num (MSB first), zero-padded to the next power-of-two length
    bit_str = bin(num)[2:]
    bit_arr = np.array([int(b) for b in bit_str])
    pad_len = 2**int(np.ceil(np.log2(len(bit_arr)))) - len(bit_arr)
    return np.pad(bit_arr, (0, pad_len), 'constant')

def extract_features(num, wavelet='haar', max_level=3):
    # Multi-level DWT of the bit signal; per level: [energy, L2 norm, mean |coeff|]
    sig = number_to_bit_signal(num)
    max_possible_level = pywt.dwt_max_level(len(sig), pywt.Wavelet(wavelet).dec_len)
    level = min(max_level, max_possible_level) if max_possible_level > 0 else 1
    coeffs = pywt.wavedec(sig, wavelet, level=level)
    features = []
    for c in coeffs:
        energy = np.sum(c**2)
        norm = np.linalg.norm(c)
        mean_abs = np.mean(np.abs(c))
        features.append([energy, norm, mean_abs])
    return features

numbers = np.arange(1, 1000)
labels = numbers % 2

all_features = [extract_features(n) for n in numbers]
max_level = max(len(f) for f in all_features)

# Group features by level, zero-filling numbers whose decomposition has fewer levels
features_by_level = []
for lvl in range(max_level):
    lvl_feats = []
    for f in all_features:
        if lvl < len(f):
            lvl_feats.append(f[lvl])
        else:
            lvl_feats.append([0,0,0])
    features_by_level.append(np.array(lvl_feats))

probabilities = np.zeros((len(numbers), max_level, 3))

for lvl in range(max_level):
    for feat_idx in range(3):
        X = features_by_level[lvl][:, feat_idx].reshape(-1, 1)
        kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
        clusters = kmeans.labels_

        # Fraction of odd numbers in each cluster, i.e. P(odd | cluster)
        odd_fraction = {cl: np.mean(labels[clusters == cl]) for cl in (0, 1)}

        # Each number inherits the odd-fraction of the cluster it falls into
        probabilities[:, lvl, feat_idx] = [odd_fraction[cl] for cl in clusters]

# pywt orders coefficients [cAn, cDn, ..., cD1], so later indices (finer details)
# receive the larger weights
level_weights = np.linspace(0.5, 1.5, max_level)
feature_weights = np.array([1, 1, 1])

weighted_probs = probabilities * level_weights.reshape(1, max_level, 1) * feature_weights.reshape(1, 1, 3)
final_scores = np.sum(weighted_probs, axis=(1, 2)) / (np.sum(level_weights) * np.sum(feature_weights))

predicted_labels = (final_scores > 0.5).astype(int)
accuracy = np.mean(predicted_labels == labels)

print(f"Final Accuracy: {accuracy*100:.2f}%")

plt.figure(figsize=(14, 6))
odd = labels == 1
plt.scatter(numbers[odd], final_scores[odd], c='red', s=10, label='True Odd')
plt.scatter(numbers[~odd], final_scores[~odd], c='blue', s=10, label='True Even')
plt.axhline(0.5, color='green', linestyle='--', label='Decision Threshold (0.5)')
plt.title("Wavelet Features + KMeans: Predicted Probability of Being Odd")
plt.xlabel("Number")
plt.ylabel("Score")
plt.legend()
plt.show()

What Did We Learn?

  • You can classify parity with nearly 70% accuracy using wavelet features and KMeans — which is weirdly impressive.
  • Wavelets can extract structured signals even from binary data.
  • This is a playful but educational exercise in signal processing and clustering.

Next Steps

  • Try other features (skewness, entropy, etc.)
  • Use supervised models (e.g., logistic regression)
  • Apply to less trivial binary classification problems

TL;DR

A fancy, multi-level wavelet + clustering approach to guess odd/even numbers yields ~70% accuracy. Useless? Absolutely. Fun? Totally.
