LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

openbmb/minicpm-v • 18 Mar 2024

To address the challenges, we present LLaVA-UHD, a large multimodal model that can efficiently perceive images in any aspect ratio and high resolution.

3,570

6.68 stars / hour

LightAutoML: AutoML Solution for a Large Financial Services Ecosystem

sb-ai-lab/lightautoml • 3 Sep 2021

We present an AutoML system called LightAutoML, developed for a large European financial services company and its ecosystem, that satisfies the idiosyncratic requirements this ecosystem has for AutoML solutions.

AutoML

1,029

1.52 stars / hour

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

kongds/mora • 20 May 2024

Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models.

Continual Pretraining Mathematical Reasoning

100

1.26 stars / hour

Diffusion for World Modeling: Visual Details Matter in Atari

eloialonso/diamond • 20 May 2024

Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model.

Image Generation reinforcement-learning

1.22 stars / hour

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

opengvlab/internvl • 25 Apr 2024

Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.

Ranked #6 on Visual Question Answering on MM-Vet

4k Language Modelling

2,750

1.18 stars / hour

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

tencent/hunyuandit • 14 May 2024

For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images.

Image Generation Language Modelling

1,875

1.17 stars / hour

Retrieval-Augmented Generation for AI-Generated Content: A Survey

pku-dair/rag-survey • 29 Feb 2024

We first classify RAG foundations according to how the retriever augments the generator, distilling the fundamental abstractions of the augmentation methodologies for various retrievers and generators.

Information Retrieval Large Language Model

126

1.08 stars / hour

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

idea-research/grounding-dino-1.5-api • 16 May 2024

Empirical results demonstrate the effectiveness of Grounding DINO 1.5, with the Grounding DINO 1.5 Pro model attaining 54.3 AP on the COCO detection benchmark and 55.7 AP on the LVIS-minival zero-shot transfer benchmark, setting new records for open-set object detection.

Ranked #1 on Zero-Shot Object Detection on LVIS v1.0 val (using extra training data)

Edge-computing object-detection

367

1.05 stars / hour

A decoder-only foundation model for time-series forecasting

google-research/timesfm • 14 Oct 2023

Motivated by recent advances in large language models for Natural Language Processing (NLP), we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset.

Decoder Time Series

2,521

1.01 stars / hour

EasySpider: A No-Code Visual System for Crawling the Web

NaiboWang/EasySpider • ACM The Web Conference 2023

As such, web crawling is an essential research tool for both computational and non-computational scientists.

Data Integration Marketing

26,051

0.94 stars / hour

Trending Research