OpenGVLab
@opengvlab
Shanghai AI Lab, General Purpose Vision Team. We created InternImage, BEVFormer, VideoMAEv2, LLaMA-Adapter-V2, Ask-Anything, and many more!
OpenGVLab’s Tweets
#DragGAN demo is now live! We have also open-sourced our implementation of DragGAN. Check out github.com/Zeqiang-Lai/Dr
Quote Tweet
Our team member (Zeqiang Lai) has reproduced #DragGAN and integrated it into InternGPT! Have a try!
Demo: github.com/OpenGVLab/Inte
Code: github.com/Zeqiang-Lai/Dr
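For context, DragGAN's editing loop alternates motion supervision (optimizing the latent code so generator features at a handle point move a small step toward a target point) with point tracking (relocating the handle by feature matching). Below is a heavily simplified sketch of that loop; `generator` is a hypothetical StyleGAN2 wrapper returning an image plus an intermediate feature map, the step sizes are illustrative, and this is not the code from the paper or the repo above.

```python
import torch
import torch.nn.functional as F

def feat_at(feat, p):
    """Bilinearly sample a (1, C, H, W) feature map at float point p = (x, y)."""
    _, _, H, W = feat.shape
    x = 2.0 * float(p[0]) / (W - 1) - 1.0   # normalize to [-1, 1] for grid_sample
    y = 2.0 * float(p[1]) / (H - 1) - 1.0
    grid = torch.tensor([[[[x, y]]]], dtype=feat.dtype, device=feat.device)
    return F.grid_sample(feat, grid, align_corners=True).view(-1)  # (C,)

def drag_step(generator, w, handle, target, feat0, step=2.0, lr=2e-3):
    """One iteration: motion supervision on w, then point tracking of handle (CPU sketch)."""
    w = w.detach().requires_grad_(True)
    _, feat = generator(w)
    # Motion supervision: make the feature one small step ahead of the handle
    # match the (detached) feature at the handle, dragging content forward.
    d = (target - handle) / (target - handle).norm().clamp(min=1e-8)
    loss = F.l1_loss(feat_at(feat, handle + step * d),
                     feat_at(feat, handle).detach())
    loss.backward()
    with torch.no_grad():
        w = w - lr * w.grad
    # Point tracking: relocate the handle by nearest-neighbor search in feature
    # space against its original feature feat0, within a small window.
    _, feat_new = generator(w)
    candidates = [handle + torch.tensor([dx, dy], dtype=handle.dtype)
                  for dx in range(-3, 4) for dy in range(-3, 4)]
    handle = min(candidates,
                 key=lambda p: (feat_at(feat_new, p) - feat0).norm().item())
    return w.detach(), handle
```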
After 15 months of review, our extension of UniFormer has been accepted by TPAMI🎉. In the major revision, we added our earlier exploration of building lightweight models. These simple models run very fast on CPU/GPU. Fast Demo: huggingface.co/spaces/Andy162
Project: github.com/Sense-X/UniFor
InternGPT can now generate images based on audio input by incorporating Meta AI's ImageBind into our pipeline.
Learn more: github.com/OpenGVLab/Inte
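How does audio drive image generation here? ImageBind embeds six modalities into one joint space, so an audio clip's embedding can stand in for an image embedding when conditioning a generator. A minimal sketch of the embedding step, assuming the packaged layout of the facebookresearch/ImageBind repo; the final `decode_image` call is a hypothetical stand-in for whichever embedding-conditioned decoder the pipeline uses.

```python
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

model = imagebind_model.imagebind_huge(pretrained=True).eval()

# Embed an audio clip into ImageBind's joint embedding space.
inputs = {ModalityType.AUDIO: data.load_and_transform_audio_data(["bird_song.wav"], "cpu")}
with torch.no_grad():
    audio_emb = model(inputs)[ModalityType.AUDIO]   # (1, 1024) joint-space vector

# Because the space is shared across modalities, this vector can condition an
# embedding-to-image decoder exactly as an image embedding would:
# image = decode_image(audio_emb)   # hypothetical decoder call
```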
Quote Tweet
Very excited to announce our work, InternChat. We can now interact with #ChatGPT using the cursor, bringing human interaction to a whole new level.
Demo built with @Gradio
No OpenAI API key needed for a limited time!
For the demo and more samples:
github.com/OpenGVLab/Inte
Exciting progress for FocalNet! Wonder if the new method can be applied to other players on the chart too.
Our InternImage at 65.4mAP is feeling the heat🔥
Quote Tweet
Our FocalNet is shining again! Combining the <700M focalnet-huge and Stable-DINO, we achieved 64.8 mAP with only ImageNet-22K and Object365, without any test-time augmentation! It beats EVA and only lags behind the 3B InternImage, giving you the strongest reproducible object detector! twitter.com/jw2yang4ai/sta…
Chatbot Arena meets multi-modality!
Multi-Modality Arena allows you to benchmark #LLMs side-by-side while providing images as inputs.
Which model would you like to see supported next?
Demo at vlarena.opengvlab.com
Code: github.com/OpenGVLab/Mult
Quote Tweet
Announcing the Week 2 update for the Chatbot Arena leaderboard!
We've added some new models that are showcasing strong performance. Currently, @OpenAI's GPT-4 and @AnthropicAI's Claude lead the pack, with open-source models in hot pursuit.
More findings: lmsys.org/blog/2023-05-1
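For reference, arena-style leaderboards like this one rank models with Elo ratings computed from pairwise human votes. A minimal sketch of the update rule; K=32 and the 1000 starting rating are illustrative defaults, not LMSYS's exact configuration.

```python
def elo_update(ra, rb, winner, k=32):
    """Update ratings for models A and B after one head-to-head vote.
    winner: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    ea = 1.0 / (1.0 + 10 ** ((rb - ra) / 400))   # expected score of A
    ra_new = ra + k * (winner - ea)
    rb_new = rb + k * ((1.0 - winner) - (1.0 - ea))
    return ra_new, rb_new

# Example: two models at 1000 each; A wins the vote.
print(elo_update(1000, 1000, winner=1.0))  # -> (1016.0, 984.0)
```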
Demo now live!
Our latest VideoChat connects video foundation models with #LLMs via a learnable neural interface in an end-to-end manner, exhibiting deep video understanding such as answering "Why is this video funny?"
Demo & more: github.com/OpenGVLab/Ask-
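A rough picture of what a "learnable neural interface" between a frozen video encoder and a frozen LLM can look like: a handful of learnable query tokens cross-attend to video patch features and are projected into the LLM's token-embedding space. The sketch below is illustrative (dimensions, a single attention block); it is not VideoChat's exact architecture.

```python
import torch
import torch.nn as nn

class VideoLLMInterface(nn.Module):
    """Learnable queries cross-attend to frozen video features, then get
    projected into the LLM's embedding space."""
    def __init__(self, vid_dim=1024, llm_dim=4096, n_queries=32):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, n_queries, vid_dim) * 0.02)
        self.xattn = nn.MultiheadAttention(vid_dim, num_heads=8, batch_first=True)
        self.proj = nn.Linear(vid_dim, llm_dim)

    def forward(self, video_feats):            # (B, T*P, vid_dim) patch tokens
        q = self.queries.expand(video_feats.size(0), -1, -1)
        out, _ = self.xattn(q, video_feats, video_feats)
        return self.proj(out)                  # (B, n_queries, llm_dim)

# Usage: prepend these tokens to the LLM's text embeddings and train only the
# interface end-to-end on video-text dialogue data, keeping both towers frozen.
```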
A quick view of our featured samples. Click the chimney and turn it into the Eiffel Tower? No problem. More features to be discovered on our GitHub page!
github.com/OpenGVLab/Inte
Very excited to announce our work, InternChat. We can now interact with #ChatGPT using the cursor, bringing human interaction to a whole new level.
Demo built with @Gradio
No OpenAI API key needed for a limited time!
For the demo and more samples:
github.com/OpenGVLab/Inte
VideoMAE V2: We scale the already successful VideoMAE to 1 billion parameters and bring the dataset to million-scale, setting a new #SOTA on Something-Something and Kinetics while overcoming VRAM consumption, overfitting, and many more hurdles.
github.com/OpenGVLab/Vide
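The masking scheme that makes this scale tractable is tube masking: the same spatial patches are dropped in every temporal slice at a very high ratio, so the encoder sees only ~10% of tokens and masked content cannot be copied from a neighboring frame. (VideoMAE V2 additionally masks the decoder, which this sketch omits.) A minimal version:

```python
import torch

def tube_mask(batch, t_patches, s_patches, ratio=0.9):
    """Returns a boolean mask of shape (B, t_patches * s_patches), True = masked.
    The same spatial patches are masked in every temporal slice ("tubes")."""
    n_mask = int(s_patches * ratio)
    masks = []
    for _ in range(batch):
        perm = torch.randperm(s_patches)
        spatial = torch.zeros(s_patches, dtype=torch.bool)
        spatial[perm[:n_mask]] = True
        masks.append(spatial.repeat(t_patches))   # same tube in every slice
    return torch.stack(masks)

# e.g. 16 frames with tubelet size 2 -> 8 temporal slices of 14x14 patches
mask = tube_mask(batch=4, t_patches=8, s_patches=196)
print(mask.shape, mask.float().mean())  # torch.Size([4, 1568]), ~0.9 masked
```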
Quote Tweet
It was a great month for open source: So many LLMs came out that it's become quite overwhelming to keep track of it all.
So, in this month's Ahead of AI issue, I am sharing resources and research insights on the latest open-source LLMs & datasets!
magazine.sebastianraschka.com/p/ahead-of-ai-
HumanBench is a foundation model centered around, you guessed it, humans! It can generalize across tasks such as ReID, pose, pedestrian detection, and many more, obtaining SOTA on 17 relevant datasets.
project led by Prof.
code: github.com/OpenGVLab/Huma
LLaMA-Adapter V2: This update allows the 65B model to surpass #ChatGPT on some questions in terms of response quality, while also outperforming #Vicuna
Congratulations to our team!
Code at github.com/ZrrSkywalker/L
Follow us for more🫰
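The trick that keeps LLaMA-Adapter training stable is zero-initialized gated attention: a few learnable prompt tokens are injected into the frozen model's attention, scaled by a gate that starts at zero, so training begins exactly at frozen-LLaMA behavior. A simplified single-head sketch (no RoPE, reduced shapes); not the repo's exact code:

```python
import torch
import torch.nn as nn

class ZeroInitAdapter(nn.Module):
    """Learnable prompt tokens attended to through the frozen q/k/v projections,
    with a tanh gate initialized at zero (identity at the start of training)."""
    def __init__(self, dim=4096, n_prompts=10):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        self.gate = nn.Parameter(torch.zeros(1))   # zero-init gating factor

    def forward(self, h, wq, wk, wv):
        # h: (B, L, dim); wq/wk/wv are the frozen attention projections.
        q = wq(h)                                    # queries from hidden states
        pk, pv = wk(self.prompts), wv(self.prompts)  # prompt keys/values
        attn = torch.softmax(q @ pk.t() / q.size(-1) ** 0.5, dim=-1)
        return h + torch.tanh(self.gate) * (attn @ pv)  # tanh(0)=0 at init
```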
You thought image-text tools could not get better?
VIDEO-TEXT is here 🤯
Ask-Anything is a simple yet interesting tool for chatting about video with ChatGPT, MiniGPT4 and StableLM.
Ask-Anything, a tool for chatting about videos with ChatGPT, MiniGPT4, and StableLM
github: github.com/OpenGVLab/Ask-
demo: http://106.14.223.212:7860/
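The underlying pattern is simple: describe sampled frames with vision models, then let a chat LLM answer questions over the timestamped descriptions. A minimal sketch; `sample_frames` and `caption` are hypothetical stand-ins for a frame extractor and an image captioner/tagger, and the chat step uses the 2023-era openai client.

```python
import openai

def chat_about_video(video_path, question, sample_frames, caption, fps=0.5):
    """Caption sampled frames, then ask a chat LLM about the video."""
    frames = sample_frames(video_path, fps=fps)          # [(t_sec, image), ...]
    context = "\n".join(f"[{t:.1f}s] {caption(img)}" for t, img in frames)
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer questions about a video "
             "described by these timestamped frame captions:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return resp["choices"][0]["message"]["content"]
```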
UniHCP is a human-centric foundation model that can surpass expert models on many metrics!
github.com/OpenGVLab/Huma
Thanks! Yep, here I also attach some interesting video projects, which might bring some inspiration:
Socratic Models: arxiv.org/abs/2204.00598
Ask-Anything: github.com/OpenGVLab/Ask-
LaViLa: github.com/facebookresear
Vid2Seq: ai.googleblog.com/2023/03/vid2se
ChatGPT + video! 👉 Ask-Anything
We have combined #ChatGPT with our video understanding models, come check it out!
github.com/OpenGVLab/Ask-
Check out our new End-to-End Autonomous Driving Challenge, featuring 4 new tracks and a total of $100K in awards! 🔥
Baseline results will be based on the general vision model, InternImage (github.com/OpenGVLab/Inte).
Learn more: opendrivelab.com/AD23Challenge.
#CVPR2023 #SelfDrivingCars
🏆New record in COCO object detection, and SOTA in 18 vision tasks, using only ONE model!
Code: github.com/OpenGVLab/Inte
Model, code, TensorRT deployment, and inference API coming soon🔥
Quote Tweet
Transformers v4.22 is out, and includes the first VIDEO models!
VideoMAE: masked auto-encoders for video
X-CLIP: CLIP for video-language
Other nice goodies:
Swin Transformer v2
Pegasus-X
Donut
MobileViT
... and MacOS support (device="mps")!
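A short sketch of running one of these new video models end to end. The checkpoint name is the public VideoMAE Kinetics fine-tune; note that v4.22 shipped the processor as `VideoMAEFeatureExtractor` (newer releases rename it `VideoMAEImageProcessor`).

```python
import numpy as np
import torch
from transformers import VideoMAEFeatureExtractor, VideoMAEForVideoClassification

ckpt = "MCG-NJU/videomae-base-finetuned-kinetics"   # public Kinetics fine-tune
extractor = VideoMAEFeatureExtractor.from_pretrained(ckpt)
model = VideoMAEForVideoClassification.from_pretrained(ckpt).eval()

# 16 random frames stand in for a real clip (frame extraction omitted).
video = list(np.random.randint(0, 256, (16, 224, 224, 3), dtype=np.uint8))
inputs = extractor(video, return_tensors="pt")      # pixel_values: (1, 16, 3, 224, 224)
with torch.no_grad():
    logits = model(**inputs).logits                 # (1, 400) Kinetics-400 classes
print(model.config.id2label[int(logits.argmax(-1))])
```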