I am working on a project that uses a CNN for text classification on tweet data, but I am unsure which pre-processing steps need to be applied before I code the model itself. I can't seem to find any resources on what pre-processing the dataset needs.
So far I have removed special characters, converted each tweet to lower case, and removed stop words. I understand that vectorization will also be needed, but I am unsure what other steps are required or what order they should be done in.
This is being coded in Python.
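
To make the question concrete, here is a minimal sketch of the steps described above in the order I currently apply them. It is only an illustration under stated assumptions: the helper names (`clean_tweet`, `build_vocab`, `encode`), the tiny stop-word list, and the URL/@mention stripping are all hypothetical, and the integer encoding with padding is just one possible form of vectorization, not something I have settled on.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real project would
# probably use a fuller list from a library such as NLTK or spaCy).
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "for", "on"}


def clean_tweet(text):
    """Lowercase, strip URLs, @mentions and special characters, drop stop words."""
    text = text.lower()                         # lowercase first so the regex below can assume a-z
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs before punctuation is stripped
    text = re.sub(r"@\w+", " ", text)           # remove @mentions
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # remove remaining special characters
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def build_vocab(token_lists, min_count=1):
    """Map each token to an integer id; 0 is reserved for padding, 1 for unknown words."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, n in counts.most_common():
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Turn a token list into a fixed-length list of ids (truncate, then pad with 0)."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


if __name__ == "__main__":
    tweets = [
        "Loving the new phone! https://t.co/xyz",
        "@bob The battery is terrible :(",
    ]
    token_lists = [clean_tweet(t) for t in tweets]
    vocab = build_vocab(token_lists)
    sequences = [encode(toks, vocab, max_len=5) for toks in token_lists]
    print(token_lists)
    print(sequences)
```

The fixed-length integer sequences are what I would then pass to an embedding layer in front of the convolutional layers, if that is the right approach.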