Hella New AI Papers – Aug 24, 2024
Read/listen to the Substack newsletter:
Support my learning journey by clicking the Join button above, becoming a Patreon member, or sending a one-time Venmo!
Discuss this stuff with other Tunadorks on Discord
All my other links
Timestamps:
00:00 Intro
01:09 Tree Attention – Topology-aware Decoding for Long-Context Attention on GPU clusters
02:46 MoFO – Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
03:31 Multi-Meta-RAG – Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata
04:03 xGen-MM (BLIP-3) – A Family of Open Large Multimodal Models
04:39 Automated Design of Agentic Systems
06:15 KAN 2.0 – Kolmogorov-Arnold Networks Meet Science
07:11 Solving a Rubik’s Cube Using its Local Graph Structure
07:49 Transfusion – Predict the Next Token and Diffuse Images with One Multi-Modal Model
08:55 Scaling Law with Learning Rate Annealing
09:49 Recurrent NNs Learn to Store and Generate Sequences using Non-Linear Representations
11:17 Learning Randomized Algorithms with Transformers
12:48 Beyond English-Centric LLMs – What Language Do Multilingual LMs Think in?
15:16 HMoE – Heterogeneous MoE for LMing
16:13 Strategist – Learning Strategic Skills by LLMs via Bi-Level Tree Search
17:00 Demystifying the Communication Characteristics for Distributed Transformers
18:05 The Exploration-Exploitation Dilemma Revisited – An Entropy Perspective
18:53 Performance Law of LLMs
19:47 Importance Weighting Can Help LLMs Self-Improve
20:46 ML with Physics Knowledge for Prediction – A Survey
21:12 Faster Adaptive Decentralized Learning Algorithms
21:38 AdapMoE – Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference
22:45 Acquiring Bidirectionality via Large and Small LMs
24:31 Attention is a smoothed cubic spline
25:28 Latent Causal Probing – A Formal Perspective on Probing with Causal Models of Data
26:40 From pixels to planning – scale-free active inference
28:14 Critique-out-Loud Reward Models
29:20 FocusLLM – Scaling LLM’s Context by Parallel Decoding
30:55 Memorization In In-Context Learning
32:32 First Activations Matter – Training-Free Methods for Dynamic Activation in LLMs
33:01 Empirical Equilibria in Agent-Based Economic Systems with Learning Agents
34:23 Matmul or No Matmul in the Era of 1-bit LLMs
35:13 Scaling Laws with Vocabulary – Larger Models Deserve Larger Vocabularies
35:50 LLM Pruning and Distillation in Practice – The Minitron Approach
36:20 Mission: Impossible LMs
38:39 Let Me Speak Freely – A Study on the Impact of Format Restrictions on Performance of LLMs
39:42 Controllable Text Generation for LLMs – A Survey
40:25 Jamba-1.5 – Hybrid Transformer-Mamba Models at Scale
41:59 Not All Samples Should Be Utilized Equally – Towards Understanding and Improving Dataset Distillation
43:04 Search-Based LLMs for Code Optimization
43:35 Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution
44:17 Loss of plasticity in deep continual learning
45:50 What’s Really Going On in ML? Some Minimal Models
46:18 The graphical brain – Belief propagation and active inference
46:59 Outro
Not too many crazy banger papers here this time. More like loads of small progressions.
Small request: if anything interesting in Bayesian program learning comes out, I'd kindly ask you to consider covering it.
I strongly believe the theories of BPL could fundamentally enhance LLMs more than any novel solution.
Another nothingburger of a video 🔥
I like the lack of face tracking indication
"Although LLMs generate one token at a time, the entire sequence of past tokens must still be stored in memory"….."compute attention scores". Still gets me every time that this is what goes on in the background of massive LLMs when they're busy inferencing.
Thanks, very good content with excellent coverage. Saves me a lot of time.
Let's go!