multi-modal

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

vqa awesome-list multi-modal multi-modal-learning attention-networks

Updated Feb 9, 2023

microsoft / farmvibes-ai

FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability

weather sustainability ai agriculture geospatial remote-sensing multi-modal geospatial-analytics stac

Updated May 18, 2023
Jupyter Notebook

tangxyw / RecSysPapers

A collection of classic and cutting-edge industry papers in the fields of recommendation, advertising, and search.

Updated May 24, 2023
Python

boschresearch / OASIS

Official implementation of the paper "You Only Need Adversarial Supervision for Semantic Image Synthesis" (ICLR 2021)

machine-learning computer-vision deep-learning pytorch gan image-generation multi-modal generative-adversarial-networks oasis image-to-image-translation bcai semantic-image-synthesis iclr2021 label-to-image-translation

Updated Nov 8, 2022
Python

EndlessSora / TSIT

[ECCV 2020 Spotlight] A Simple and Versatile Framework for Image-to-Image Translation

generative-adversarial-network gan style-transfer image-manipulation image-generation versatile multi-modal feature-transformation image-to-image-translation multi-scale two-stream-networks semantic-image-synthesis eccv2020

Updated Nov 28, 2021
Python

v-iashin / SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)

audio video pytorch transformer gan multi-modal evaluation-metrics video-understanding vas video-features vqvae bmvc melgan audio-generation vggsound

Updated Apr 1, 2023
Jupyter Notebook

IntelLabs / fastRAG

Efficient Retrieval Augmentation and Generation Framework

nlp benchmark information-retrieval transformers knowledge-graph question-answering summarization multi-modal semantic-search diffusion sentence-transformers colbert

Updated May 21, 2023
Python

junchen14 / Multi-Modal-Transformer

This repository collects a variety of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects useful tutorials and tools in these related domains.

language multi-modal image-transformer vision-transformer video-language efficiency-transformer video-transformer mlp-mixer transformer-readling-list multi-modal-cvpr2021