Hella New AI Papers – Aug 24, 2024

Read/listen to the Substack newsletter:

Support my learning journey by clicking the Join button above, becoming a Patreon member, or sending a one-time Venmo!

Discuss this stuff with other Tunadorks on Discord

All my other links

Timestamps:
00:00 Intro
01:09 Tree Attention – Topology-aware Decoding for Long-Context Attention on GPU Clusters
02:46 MoFO – Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
03:31 Multi-Meta-RAG – Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata
04:03 xGen-MM (BLIP-3) – A Family of Open Large Multimodal Models
04:39 Automated Design of Agentic Systems
06:15 KAN 2.0 – Kolmogorov-Arnold Networks Meet Science
07:11 Solving a Rubik’s Cube Using its Local Graph Structure
07:49 Transfusion – Predict the Next Token and Diffuse Images with One Multi-Modal Model
08:55 Scaling Law with Learning Rate Annealing
09:49 Recurrent NNs Learn to Store and Generate Sequences using Non-Linear Representations
11:17 Learning Randomized Algorithms with Transformers
12:48 Beyond English-Centric LLMs – What Language Do Multilingual LMs Think in?
15:16 HMoE – Heterogeneous Mixture of Experts for Language Modeling
16:13 Strategist – Learning Strategic Skills by LLMs via Bi-Level Tree Search
17:00 Demystifying the Communication Characteristics for Distributed Transformers
18:05 The Exploration-Exploitation Dilemma Revisited – An Entropy Perspective
18:53 Performance Law of LLMs
19:47 Importance Weighting Can Help LLMs Self-Improve
20:46 ML with Physics Knowledge for Prediction – A Survey
21:12 Faster Adaptive Decentralized Learning Algorithms
21:38 AdapMoE – Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference
22:45 Acquiring Bidirectionality via Large and Small LMs
24:31 Attention is a smoothed cubic spline
25:28 Latent Causal Probing – A Formal Perspective on Probing with Causal Models of Data
26:40 From pixels to planning – scale-free active inference
28:14 Critique-out-Loud Reward Models
29:20 FocusLLM – Scaling LLM’s Context by Parallel Decoding
30:55 Memorization in In-Context Learning
32:32 First Activations Matter – Training-Free Methods for Dynamic Activation in LLMs
33:01 Empirical Equilibria in Agent-Based Economic Systems with Learning Agents
34:23 Matmul or No Matmul in the Era of 1-bit LLMs
35:13 Scaling Laws with Vocabulary – Larger Models Deserve Larger Vocabularies
35:50 LLM Pruning and Distillation in Practice – The Minitron Approach
36:20 Mission: Impossible LMs
38:39 Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of LLMs
39:42 Controllable Text Generation for LLMs – A Survey
40:25 Jamba-1.5 – Hybrid Transformer-Mamba Models at Scale
41:59 Not All Samples Should Be Utilized Equally – Towards Understanding and Improving Dataset Distillation
43:04 Search-Based LLMs for Code Optimization
43:35 Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution
44:17 Loss of plasticity in deep continual learning
45:50 What’s Really Going On in ML? Some Minimal Models
46:18 The graphical brain – Belief propagation and active inference
46:59 Outro
