pix2seq - わかめtube

Pix2Seq: A Language Modeling Framework for Object Detection

3 years ago - 7:33

Ting Chen | Pix2Seq: A New Language Interface for Object Detection and Beyond

London Machine Learning Meetup

3 years ago - 58:39

2022/06/05 - (paper) Pix2Seq: A New Language Interface for Object Detection

Machine Learning Group

3 years ago - 47:22

Abhinav Rao - Pix2Seq: A Language Modeling Framework for Object Detection (PRS 1.1)

3 years ago - 31:40

[Paper Pick] 295 Pix2seq: A Language Modeling Framework for Object Detection

ThinkNotClearzh

2 years ago - 16:51

Object Detection Part 7: Detection Transformers (DETR), Object Queries

1 year ago - 4:28

PR-348: Pix2seq: A Language Modeling Framework for Object Detection

만끽 MaanGeek

3 years ago - 31:13

[2022 ICLR] Pix2Seq

딥러닝논문읽기모임

2 years ago - 21:41

DETR: End-to-End Object Detection with Transformers | Paper Explained

Aleksa Gordić - The AI Epiphany

4 years ago - 31:19

nanogpt for Speaker Diarization

1 year ago - 24:59

Machine Learning Group

In this channel we present all the virtual meetings recorded. Each video is about one session of the explanation of one topic of ...

@machinelearninggroup3450 subscribers

MDETR: Modulated Detection for End-to-End Multi-Modal Understanding

Microsoft Research

3 years ago - 1:13:28

SDtCS

We are a bunch of undergrads united by one mission, to help push innovation toward a better future. And we hold a firm belief in ...

@sdtcs subscribers

만끽 MaanGeek

@maangeek subscribers

Towards Sheet Music Information Retrieval: A Unified Approach Using Multitask Transformers @ WoRMS24

Optical Music Recognition Research

7 months ago - 14:19

ICLR 2023 Workshop on Sparsity in Neural Networks - Introduction

Cerebras Systems

2 years ago - 4:44

ICLR23 SR4AD-01 Introduction and opening remarks(Li Chen)

2 years ago - 7:35

Poppy drawing a lion 01

5 years ago - 2:51

Poppy drawing a lion 02

5 years ago - 2:43

Poppy drawing a bike 02

5 years ago - 2:33

Lucas Beyer - Computer Vision in the Age of LLMs | ML in PL 2024

6 months ago - 49:53

Unified-IO: A Unified Model for Vision, Language and Multi-Modal Tasks

USC Information Sciences Institute

2 years ago - 49:12

12-in-1: Multi-Task Vision and Language Representation Learning

ComputerVisionFoundation Videos

5 years ago - 1:02

Local Ai Qwen3 Coder 480B 1MILLION CTX Runs Like ????

Digital Spaceport

2 days ago - 0:24

Multimodal Object Detection via Probabilistic Ensembling

3 years ago - 1:28

Poppy drawing a bike 01

5 years ago - 2:24

Tightly Connecting Vision and Language

Microsoft Research

3 years ago - 1:07:38

[CVPR 2021] "Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval", CVPR 2021.

4 years ago - 4:57

ICLR 2023 Spotlight: Planning Goals for Exploration

2 years ago - 10:52

ACM-MM'21 (Oral) TSA-Net: Tube Self-Attention Network for Action Quality Assessment

3 years ago - 11:57

GeoAI: Applying Deep Learning to Geospatial Data

MS GRSS IEEE Student Chapter

Streamed 4 years ago - 1:39:26

Open-Vocabulary Visual Perception upon Frozen Vision and Language Models (Yin Cui, Google)

Computer Vision in the Wild (CVinW)

2 years ago - 32:24

What Makes Good In Context Examples for GPT 3

Namrata Shivagunde

3 years ago - 7:43

Paper Review: Selfless Sequential Learning (ICLR 2019)

Deep Learning Simplified

3 years ago - 15:54

ICLR 2022 Oral: Language modeling via stochastic processes

3 years ago - 12:43

Intuitions on Lifelong Machine Learning and Open World Object Detection

4 years ago - 23:28

CLVision Poster: "Towards Open World Object Detection"

4 years ago - 4:51

The Machine Learning Experience - Classifying Flowers with Scikit-Learn

2 days ago - 54:54

Lucas Beyer (Google DeepMind) - Convergence of Vision & Language

Aleksa Gordić - The AI Epiphany

1 year ago - 55:08

[VLP Tutorial @ CVPR 2022] Image-Text Pre-training Part II

Microsoft Research

3 years ago - 40:58

Trends in Machine Learning at ICLR 2022 - Brief Overview

RelationalAI - AI for Relational Data

3 years ago - 56:38

What is KOSMOS-2?

2 years ago - 14:47

This Embodied LLM is...

2 years ago - 32:50

Ting Chen - Mesenchymal Niche Regulated Regional Epithelial Regeneration

3 years ago - 28:06

Tips and Tricks from Computer Vision Experts: Object Detection

Artificial Intelligence Association of Lithuania

Streamed 3 years ago - 58:49

[2D Perception]Demo - Pick and Place system with 6-DOF Pose Estimation using DOPE

2 years ago - 1:49

Quadbox: Quadrilateral Bounding Box Based Scene Text Detection Using Vector Regression

Prateek Keserwani

3 years ago - 1:28

Medical Imaging - my approach how to tackle an object detection task

Intro ...

3 years ago - 15:46

Autoregressive Conditional Neural Processes (ICLR 2023)

Wessel Bruinsma

2 years ago - 5:05

PR-354: Data-driven Interior Plan Generation for Residential Buildings & Graph2Plan

3 years ago - 29:58

[ICLR 2023 Oral] QuAnt: Quantum Annealing with Learnt Couplings

Vlad Golyanik (4DQV)

2 years ago - 9:39

CVPR #18528 - Workshop on Foundation Models: 1st Foundation Model Challenge

ComputerVisionFoundation Videos

1 year ago - 2:24:25

PR-342: Playable Video Generation (CVPR 2021 (Oral), Korean review)

3 years ago - 20:31

PR-350: Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation (CVPR 2021, Korean review)

3 years ago - 18:59

reComputer Object Detection and Tracking: The Role of YOLOv8 in Shaping Results

2 days ago - 1:38

[Paper Reading] PaliGemma

Streamed 11 months ago - 1:06:10

[ECCV 2022 Oral] PressureVision: Estimating Hand Pressure from a Single RGB Image

2 years ago - 11:10

PR-335: Self-supervised Learning for Large-scale Item Recommendations

3 years ago - 32:32

AI-Powered Safety Gear Detection System | Deep Learning Helmet, Mask & Vest Detection 2025

1 day ago - 2:26

PR-351: Adaptive Aggregation Networks for Class-Incremental Learning

3 years ago - 30:26

[2022 ICLR] MobileViT : Light-weight, general-purpose, and Mobile-friendly Vision Transformer

딥러닝논문읽기모임

3 years ago - 16:42

RunwayML and Google Colab: Week 6 (Next Frame Prediction and Style Transfer)

Artificial Images

4 years ago - 1:01:40

[Lect 7-1] Object Localization and Classification

Dr. Khaled Mostafa Elsayed

4 years ago - 17:59

PR-352: ImageBART: Bidirectional Context with Multinomial Diffusion for AR Image Synthesis

3 years ago - 45:43

[DeepReader] R-CNN

4 years ago - 4:50

A Quantitative Analysis of Statistical and Graph-Based Term Weighting Schemes for Keyword Extraction

딥러닝논문읽기모임

2 years ago - 16:00

Contrastive Self-Supervised Learning and Potential Limitations - Dr Ting Chen from Google Brain

JumpTrading ELLIS UCL CSML Seminar Series

3 years ago - 55:12

PR-349: Adversarial Generation of Continuous Images

3 years ago - 42:23

Open-Source AI Camera System – Pose, Segmentation, Detection & Camera Management

Nicolai Nielsen

3 days ago - 1:00

Training YOLOv12 for Detecting Pothole Severity Using a Custom Dataset - Video Input Demo

6 days ago - 0:46

Training YOLOv12 for Detecting Pothole Severity Using a Custom Dataset - Image Input Demo

6 days ago - 0:37

ICLR 2023 Impressions from Kigali

2 years ago - 3:24

Self-Supervised Learning based on Heat Equation

딥러닝논문읽기모임

2 years ago - 8:51

PR-346: Super Tickets in Pre-Trained Language Models

3 years ago - 32:38

PR-339: Maintaining discrimination and fairness in class incremental learning

3 years ago - 29:47

Build a real-time multi camera tracking system | with Python

11 days ago - 17:42

Robotics & AI Talk with Dr. Cordelia Schmid

MIRMI - Robotics and Machine Intelligence

10 months ago - 49:53

NLP 7.4.1 Tree Neural Network, Part One

3 years ago - 10:17

PR-341: Involution: Inverting the Inherence of Convolution for Visual Recognition

3 years ago - 33:33

[Google Research] Minerva - Solving Quantitative Reasoning Problems with Language Models

딥러닝논문읽기모임

2 years ago - 14:43

[CVPR2023] ImageBind One Embedding Space To Bind Them All

딥러닝논문읽기모임

1 year ago - 15:19

PR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision

3 years ago - 28:06

Localizing piglets in pig farm with oriented bounding box

4 years ago - 0:14

PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP

JinWon Lee (DeepTube)

3 years ago - 29:34

[YouTube Deep Learning Paper Reading Group] NeurIPS Hard Negative Mixing for Contrastive learning

2 years ago - 9:22

19.12.23 Sequential Modeling Enables Scalable Learning for Large Vision Models

DS Talks Siberia

1 year ago - 46:05

Multimodal LLM - Object Detection Using the PaliGemma Model

9 months ago - 15:46

[Paper Review] DALL-E : Zero-Shot Text-to-Image Generation

서울대학교 산업공학과 DSBA 연구실

3 years ago - 46:00

What is one pixel attack? #neuralnetworks #ai #ml

8 days ago - 2:20
