arXiv:2410.16703

PLDR-LLM: Large Language Model from Power Law Decoder Representations

Published on Oct 22, 2024

Abstract

AI-generated summary

The PLDR-LLM utilizes a Power Law Graph Attention mechanism for generating deductive and inductive outputs, achieving competitive performance in zero-shot and few-shot settings compared to scaled dot-product LLMs.

We present the Large Language Model from Power Law Decoder Representations (PLDR-LLM), a language model that leverages non-linear and linear transformations through the Power Law Graph Attention mechanism to generate well-defined deductive and inductive outputs. We pretrain PLDR-LLMs of varying layer sizes with a small batch size of 32 and ~8B tokens from the RefinedWeb dataset, and show that they achieve competitive performance in zero-shot and few-shot settings compared to scaled dot-product LLMs of similar model size reported in the literature. We show that the deductive outputs of PLDR-LLMs can be used to compare model characteristics or improve performance by introducing the Directed Acyclic Graph (DAG) loss as a metric and regularizer. Our results indicate that the initial maximum learning rate and warm-up steps have a lasting impact on the deductive outputs throughout pretraining. We provide a detailed description of the PLDR-LLM architecture, its implementation, and the pretraining procedure.
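
The abstract attributes part of the improvement to using a DAG loss both as a metric and as a regularizer on the deductive outputs, but the paper's exact definition is not given on this page. As a rough, hypothetical illustration of how a single DAG-ness score can play both roles during pretraining, the sketch below uses the well-known NOTEARS acyclicity penalty h(A) = tr(exp(A ∘ A)) − d on a square matrix standing in for a deductive output. The function names, the stand-in tensors, and the 1e-3 regularization weight are assumptions for illustration, not the paper's definitions.

# Minimal sketch (not the paper's code): a NOTEARS-style acyclicity penalty
# used as a "DAG loss", serving both as a logged metric and as a regularizer.
import torch

def dag_loss(A: torch.Tensor) -> torch.Tensor:
    # h(A) = tr(exp(A ∘ A)) - d; zero exactly when the weighted graph A ∘ A is acyclic.
    d = A.shape[-1]
    return torch.linalg.matrix_exp(A * A).diagonal(dim1=-2, dim2=-1).sum(-1) - d

# Hypothetical usage: add the penalty to the language-modeling loss.
deductive_output = torch.randn(8, 8, requires_grad=True)  # stand-in for a learned graph
lm_loss = torch.tensor(2.3)                               # stand-in cross-entropy value
total_loss = lm_loss + 1e-3 * dag_loss(deductive_output)  # 1e-3 is an assumed weight
total_loss.backward()
print("DAG metric:", float(dag_loss(deductive_output).detach()))

Used this way, the same scalar can be logged to compare model characteristics across runs and, with a small coefficient, nudge the deductive outputs toward acyclicity during pretraining.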
