Focused crawls are collections of frequently-updated webcrawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain.
Tools and recipes for training deep learning models and building services for NLP tasks such as text classification, semantic search ranking and recall fetching, cross-lingual information retrieval, and question answering.
Given a pair of questions from Quora, the task is to classify whether they are duplicates. The underlying idea is to identify whether the two questions have the same intent even though they may be phrased differently.
Code that exemplifies neural network solutions for classification tasks with DyNet. On top of that, the code demonstrates how to implement a custom classifier that is compatible with scikit-learn's API.
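
As an illustration of what "compatible with scikit-learn's API" usually involves, here is a minimal sketch of a custom classifier. It is not the repository's code and it does not use DyNet: the class name `SoftmaxClassifier`, its hyperparameters, and the plain NumPy softmax regression inside it are assumptions chosen to keep the example self-contained. The conventions it follows are the ones that matter: the constructor only stores hyperparameters, `fit` returns `self`, and learned attributes end in an underscore.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


class SoftmaxClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical softmax-regression classifier that follows scikit-learn conventions."""

    def __init__(self, learning_rate=0.1, n_epochs=300):
        # Store hyperparameters unchanged so get_params/set_params/clone work.
        self.learning_rate = learning_rate
        self.n_epochs = n_epochs

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        # classes_ keeps the original labels; y_idx holds integer class indices.
        self.classes_, y_idx = np.unique(y, return_inverse=True)
        n_samples, n_features = X.shape
        n_classes = len(self.classes_)
        self.coef_ = np.zeros((n_features, n_classes))
        self.intercept_ = np.zeros(n_classes)
        one_hot = np.eye(n_classes)[y_idx]
        # Full-batch gradient descent on the mean cross-entropy loss.
        for _ in range(self.n_epochs):
            grad = (self.predict_proba(X) - one_hot) / n_samples
            self.coef_ -= self.learning_rate * (X.T @ grad)
            self.intercept_ -= self.learning_rate * grad.sum(axis=0)
        return self

    def predict_proba(self, X):
        check_is_fitted(self, ["coef_", "intercept_"])
        X = check_array(X)
        scores = X @ self.coef_ + self.intercept_
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        exp_scores = np.exp(scores)
        return exp_scores / exp_scores.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Toy data: two well-separated clusters with string labels.
X_demo = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                   [3.0, 3.0], [3.2, 2.9], [2.8, 3.1]])
y_demo = np.array(["a", "a", "a", "b", "b", "b"])

clf = SoftmaxClassifier().fit(X_demo, y_demo)
predictions = clf.predict(X_demo)
unfitted_copy = clone(clf)  # works because __init__ only stores its arguments
```

Because the class inherits from `BaseEstimator` and `ClassifierMixin`, it also gets `get_params`, `set_params`, and `score`, so it should be usable in utilities such as `Pipeline` or `cross_val_score`. A DyNet-backed classifier would replace the body of `fit` and `predict_proba` while keeping the same interface.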
This dataset contains Twitter handles extracted from Wikidata. For the entity type classification task, the handles are grouped into five types: person, location, organization, product, and character.
This solution ranked 1st out of 43 teams in the in-class competition on Kaggle. The data challenge is the project that concludes the kernel methods in machine learning course at AMMI-2020; its aim is to implement machine learning algorithms in order to understand them better and adapt them to structured data (DNA sequence data). In this report, we (Aissatou & I) present our approach to the challenge, which was hosted on Kaggle with the goal of predicting whether a DNA sequence region is a binding site for a specific transcription factor. Our best result ranked 1st on the private leaderboard with a score of 71.20%.
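
To make the idea of a kernel on DNA sequences concrete, here is a minimal sketch of a k-mer spectrum kernel, one common choice for sequence data. The description above does not say which kernels the solution used, so this is an assumption for illustration only; the function names and the default `k=3` are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=3):
    """Inner product of the k-mer count vectors of two sequences."""
    counts_s, counts_t = kmer_counts(s, k), kmer_counts(t, k)
    # A Counter returns 0 for k-mers it has not seen.
    return sum(count * counts_t[kmer] for kmer, count in counts_s.items())


def gram_matrix(seqs, k=3):
    """Symmetric matrix of pairwise kernel values, as nested lists."""
    return [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]


# Toy sequences, not taken from the challenge data.
toy_seqs = ["ACGTAC", "ACGTTT", "GGGGGG"]
K = gram_matrix(toy_seqs, k=3)
```

A Gram matrix like `K` can be passed to any kernel method that accepts precomputed kernels, such as a kernel SVM or kernel ridge regression.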