BayJarvis: Forging Ahead at the Nexus of Predictive and Generative AI - Crafting J.A.R.V.I.S, from AI Model to AI System.
At BayJarvis, we're deeply involved in the ever-changing world of Artificial Intelligence. We're working hard to contribute to the future of this field. Our goal is to go beyond standard AI solutions by developing J.A.R.V.I.S, an autonomous agent that we hope will broaden what's possible in AI.
Our team's background includes experience in time-series forecasting and strategies aimed at maximizing returns. We focus on utilizing AI to support autonomous decision-making processes. We have experience applying AI to data-rich domains such as financial and property management, complemented by the use of Large Language Models (LLMs). Our knowledge extends to areas like deep learning, ensemble tree methods, and reinforcement learning, all contributing to the autonomy of our AI systems.
In the realm of advertising recommendation systems, BayJarvis uses its expertise to delve into user behavior profiling, aiming to uncover the intricate actions and patterns of users. With the integration of advanced Language Models, J.A.R.V.I.S is beginning to venture into code generation, content creation, and proactive debugging. We view these advancements as important progress, enhancing development productivity and moving toward fully autonomous digital platforms.
Our understanding of search algorithms guides our autonomous agent in finding optimal solutions, striving for efficiency and precision in everything we do. At BayJarvis, our vision isn't just about imagining the future; it's about actively creating it.
Our ecosystem of 3 packages and 5 repos is dedicated to exploring and extending autonomous agents and to applying the capabilities of LLMs to bring J.A.R.V.I.S. as close to reality as possible.
Latest Blogs
Tags: [paper] 66 · [llm] 57 · [finetuning] 15 · [prompt] 14 · [autonomous-agent] 12 · [reinforcement-learning] 12 · [optimization] 9 · [rlhf] 8 · [mistral] 7 · [multi-agent] 7 · [openai] 7 · [peft] 7 · [quantization] 7 · [transformer] 7 · [llama2] 6 · [lora] 6 · [multistep-reasoning] 6 · [network-architecture] 6 · [rag] 6 · [rlaif] 6 · [chatgpt] 5 · [google] 5 · [huggingface] 5 · [mixture-of-experts] 5 · [zephyr] 5 · [agent] 4 · [deepmind] 4 · [recommender] 4 · [socratic] 4 · [survey] 4 · [system2] 4 · [trl] 4 · [advertising] 3 · [anthropic] 3 · [autogen] 3 · [continual-learning] 3 · [dpo] 3 · [foundation-model] 3 · [inference] 3 · [interpretability] 3 · [ppo] 3 · [routing] 3 · [safety] 3 · [scaling] 3 · [tenyx] 3 · [transformers] 3 · [vision] 3 · [alignment] 2 · [alphazero] 2 · [cnn] 2 · [cognitive-architecture] 2 · [compiler] 2 · [ddpo] 2 · [diffusion] 2 · [discretization] 2 · [dspy] 2 · [forecasting] 2 · [gptq] 2 · [langchain] 2 · [mamba] 2 · [meta] 2 · [mllm] 2 · [mlm] 2 · [multi-modal] 2 · [polydisciplinary] 2 · [ranking] 2 · [react] 2 · [reflexion] 2 · [reinforced-self-training] 2 · [rnn] 2 · [s4] 2 · [search] 2 · [sequence-modeling] 2 · [state-space-model] 2 · [structured-state-spaces] 2 · [theory] 2 · [time-series] 2 · [adaptive-agent] 1 · [apple] 1 · [autogpt] 1 · [autotrain] 1 · [benchmark] 1 · [bradley-terry] 1 · [chainlit] 1 · [cicero] 1 · [clip] 1 · [coala] 1 · [code-generation] 1 · [compositionality] 1 · [compression] 1 · [ctransformers] 1 · [diffusers] 1 · [diplomacy] 1 · [em-algorithm] 1 · [evolutionary-optimization] 1 · [fair] 1 · [feature-interactions-learning] 1 · [galore] 1 · [gan] 1 · [gating-network] 1 · [gemma] 1 · [geometry] 1 · [hiformer] 1 · [in-context-rl] 1 · [jax] 1 · [lamarckian-mutation] 1 · [langevin-dynamics] 1 · [learning-rate-schedule] 1 · [legendre-polynomials] 1 · [llama] 1 · [llava] 1 · [low-rank] 1 · [ludwig] 1 · [mcts] 1 · [memgpt] 1 · [meta-learning] 1 · [meta-rl] 1 · [metric-learning] 1 · [microsoft] 1 · [mm1] 1 · 
[model-merging] 1 · [moe] 1 · [multi-task] 1 · [multihop-retrieval] 1 · [nework-architecture-search] 1 · [nvidia] 1 · [orca] 1 · [os] 1 · [paged-attention] 1 · [patch] 1 · [plm] 1 · [preferece-learning] 1 · [preference-learning] 1 · [representation-engineering] 1 · [rest] 1 · [rest-em] 1 · [sakana] 1 · [sampling] 1 · [signal-propogation-theory] 1 · [softmax-loss] 1 · [spline-theory] 1 · [superposition] 1 · [toxicity-detection] 1 · [transparency] 1 · [unlearning] 1 · [vllm] 1 · [wgan] 1 · [withmartian] 1 · [world-model] 1
[paper] 16th April 2024 Faith and Fate: Limits of Transformers on Compositionality
Transformer language models like GPT-4 and ChatGPT have demonstrated remarkable capabilities across a wide range of tasks, sparking both admiration and concern about their potential impact. However, a recent paper titled "Faith and Fate: Limits of Transformers on Compositionality" by researchers from the Allen Institute for AI, the University of Washington, the University of Southern California, and the University of Chicago takes a critical look at the limitations of these models on tasks requiring multi-step compositional reasoning.
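The paper treats compositional tasks such as multi-digit multiplication as computation graphs whose size grows with the problem. As a rough illustration of that growth (our own step-counting convention, not the paper's exact graph definition), the sketch below performs schoolbook multiplication and counts one step per digit-level multiply-with-carry and one per partial-product accumulation.

```python
def schoolbook_multiply(a, b):
    """Return (product, steps) for non-negative integers a and b.

    steps counts one unit per digit-by-digit multiply-with-carry and one unit
    per partial-product accumulation; an illustrative proxy for graph size.
    """
    a_digits = [int(d) for d in str(a)][::-1]  # least-significant digit first
    b_digits = [int(d) for d in str(b)][::-1]
    product, steps = 0, 0
    for j, bd in enumerate(b_digits):
        carry, partial = 0, 0
        for i, ad in enumerate(a_digits):
            value = ad * bd + carry            # one elementary step
            partial += (value % 10) * 10 ** i
            carry = value // 10
            steps += 1
        partial += carry * 10 ** len(a_digits)  # final carry of this row
        product += partial * 10 ** j            # shift and accumulate the row
        steps += 1
    return product, steps


for n in range(1, 6):
    operand = int("9" * n)
    print(n, schoolbook_multiply(operand, operand)[1])
```

For n-digit by n-digit operands this count is n² + n, so doubling the number of digits roughly quadruples the number of intermediate steps a model must compose correctly.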
[paper] 13th April 2024 Voyager: An Open-Ended Embodied Agent with Large Language Models
Voyager is the first Large Language Model (LLM)-powered embodied lifelong learning agent: it continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. The agent operates in Minecraft, a popular open-ended game that offers a rich set of tasks and interactions.
[paper] 13th April 2024 Reflexion: Language Agents with Verbal Reinforcement Learning
Reflexion is a novel framework proposed by Shinn et al. for reinforcing language agents through linguistic feedback rather than traditional weight updates. The key idea is to have agents verbally reflect on feedback signals, maintain the reflective text in an episodic memory buffer, and use this to guide better decision making in subsequent trials.
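A minimal sketch of that trial loop, assuming the actor, evaluator, and self-reflection model are plain Python callables (hypothetical stand-ins for LLM calls and an environment). The cap of three stored reflections is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ReflexionAgent:
    # Hypothetical callables standing in for LLM calls and task feedback.
    act: Callable[[str, list[str]], str]               # (task, reflections) -> attempt
    evaluate: Callable[[str, str], tuple[bool, str]]   # (task, attempt) -> (success, feedback)
    reflect: Callable[[str, str, str], str]            # (task, attempt, feedback) -> lesson text
    memory_size: int = 3
    memory: list[str] = field(default_factory=list)    # episodic buffer of reflections

    def run(self, task: str, max_trials: int = 5) -> tuple[str, bool, int]:
        attempt = ""
        for trial in range(1, max_trials + 1):
            # The actor conditions on past reflections instead of updated weights.
            attempt = self.act(task, self.memory)
            success, feedback = self.evaluate(task, attempt)
            if success:
                return attempt, True, trial
            # Verbal reinforcement: turn the feedback into a textual lesson.
            self.memory.append(self.reflect(task, attempt, feedback))
            self.memory = self.memory[-self.memory_size:]  # keep the buffer bounded
        return attempt, False, max_trials
```

No weights change between trials; the only thing that improves a later attempt is the reflection text passed back into `act`.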
[paper] 6th April 2024 Scaling Laws for Fine-Grained Mixture of Experts
Mixture of Experts (MoE) models have emerged as a primary solution for reducing the computational cost of Large Language Models (LLMs). In "Scaling Laws for Fine-Grained Mixture of Experts", Jakub Krajewski, Jan Ludziejewski, and their colleagues from the University of Warsaw and IDEAS NCBR analyze the scaling properties of MoE models, incorporating an expanded range of variables.
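The compute saving comes from activating only a few experts per token. The arithmetic below, a hypothetical helper that counts feed-forward and router weights only (no biases, attention, or embeddings), compares stored and per-token active parameters. The `granularity` argument follows our reading of the paper's fine-grained setting, where each expert is split into `granularity` smaller experts and proportionally more of them are activated; treat that mapping as an assumption.

```python
def moe_ffn_params(d_model, d_ff, n_experts, top_k, granularity=1):
    """Return (total, active_per_token) FFN + router weight counts for one MoE layer."""
    if d_ff % granularity != 0:
        raise ValueError("d_ff must be divisible by granularity")
    expert_hidden = d_ff // granularity      # each fine-grained expert is smaller
    num_experts = n_experts * granularity    # ...but there are more of them
    active_experts = top_k * granularity     # ...and more are activated per token
    per_expert = 2 * d_model * expert_hidden # up-projection + down-projection
    router = d_model * num_experts           # linear gate producing one logit per expert
    total = num_experts * per_expert + router
    active = active_experts * per_expert + router
    return total, active


print(moe_ffn_params(1024, 4096, n_experts=8, top_k=1))
print(moe_ffn_params(1024, 4096, n_experts=8, top_k=1, granularity=4))
```

With `d_model=1024`, `d_ff=4096`, 8 experts, and top-1 routing, about 67.1M expert weights are stored but only about 8.4M are used per token; raising granularity to 4 keeps both numbers nearly unchanged while giving the router 32 finer-grained experts to choose among.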
[paper] 4th April 2024 FrugalGPT: Making Large Language Models Affordable and Efficient
Large Language Models (LLMs) like GPT-4, ChatGPT, and J1-Jumbo have revolutionized natural language processing, enabling unprecedented performance on a wide range of tasks. However, the high cost of querying these LLM APIs is a major barrier to their widespread adoption, especially for high-throughput applications.
[paper] 4th April 2024 ROUTERBENCH: A Benchmark for Multi-LLM Routing System
Large language models (LLMs) have demonstrated impressive capabilities across a wide range of applications. However, no single model can optimally address all tasks, especially when considering the trade-off between performance and cost. This has led to the development of LLM routing systems that leverage the strengths of various models.
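One simple routing policy, sketched below under stated assumptions, is to pick the cheapest model whose predicted quality for the query clears a threshold, falling back to the highest-quality model otherwise. The model names, costs, and quality scores are hypothetical; a real router would have to predict quality per query, and this is not RouterBench's own method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_query: float     # illustrative cost units per query
    predicted_quality: float  # router's estimate in [0, 1] for this query


def route(options, min_quality):
    """Cheapest model meeting min_quality; otherwise the best available model."""
    eligible = [m for m in options if m.predicted_quality >= min_quality]
    if eligible:
        return min(eligible, key=lambda m: m.cost_per_query)
    return max(options, key=lambda m: m.predicted_quality)


options = [
    ModelOption("small-model", 0.001, 0.70),
    ModelOption("medium-model", 0.01, 0.82),
    ModelOption("large-model", 0.06, 0.90),
]
print(route(options, min_quality=0.8).name)
```

Sweeping `min_quality` traces out different points on the cost-quality trade-off, which is the kind of curve a routing benchmark compares routers on.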
[paper] 3rd April 2024 Toy Models of Superposition
Neural networks often exhibit a puzzling phenomenon called "polysemanticity" where many unrelated concepts are packed into a single neuron, making interpretability challenging. This paper provides toy models to understand polysemanticity as a result of models storing additional sparse features in "superposition". Key findings include:
[paper] 1st April 2024 Cognitive Architectures for Language Agents
Cognitive Architectures for Language Agents: A Framework for Building Intelligent Language Models. Large language models (LLMs) have achieved impressive results on many natural language tasks. However, to build truly intelligent agents, we need to equip LLMs with additional capabilities like memory, reasoning, learning, and interacting with the environment. A new paper titled "Cognitive Architectures for Language Agents" proposes a framework called CoALA to guide the development of such language agents.
[paper] 31st March 2024 Retrieval-Augmented Generation for Large Language Models: A Survey
Retrieval-Augmented Generation (RAG) has emerged as a promising solution to enhance Large Language Models (LLMs) by incorporating knowledge from external databases. This survey paper provides a comprehensive examination of the progression of RAG paradigms, including Naive RAG, Advanced RAG, and Modular RAG.
[paper] 26th March 2024 LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
Large Language Models (LLMs) like ChatGPT have transformed numerous fields by leveraging their extensive reasoning and generalization capabilities. However, as the complexity of prompts increases, with techniques like chain-of-thought (CoT) and in-context learning (ICL) becoming more prevalent, the computational demands skyrocket. This paper introduces LLMLingua, a sophisticated prompt compression method designed to mitigate these challenges. By compressing prompts into a more compact form without significant loss of semantic integrity, LLMLingua enables faster inference and reduced computational costs, promising up to 20x compression rates with minimal performance degradation.
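LLMLingua scores tokens with a small language model and drops low-information ones. The sketch below assumes those per-token probabilities are already available and performs only a single-pass, token-level filter under a compression budget, a deliberate simplification of the paper's method; `compress_prompt` is a hypothetical helper.

```python
import math


def compress_prompt(tokens, token_probs, target_ratio):
    """Keep the most informative tokens so that about len(tokens) / target_ratio survive."""
    if target_ratio < 1:
        raise ValueError("target_ratio must be >= 1")
    budget = max(1, math.ceil(len(tokens) / target_ratio))
    # Self-information: tokens the small LM finds surprising carry more content.
    info = [-math.log(p) for p in token_probs]
    keep = sorted(range(len(tokens)), key=lambda i: info[i], reverse=True)[:budget]
    return [tokens[i] for i in sorted(keep)]  # restore original order


tokens = ["the", "cat", "sat", "on", "the", "mat"]
probs = [0.9, 0.1, 0.2, 0.8, 0.9, 0.05]  # hypothetical small-LM probabilities
print(compress_prompt(tokens, probs, target_ratio=2))
```

Tokens the scoring model finds predictable (high probability, low self-information) are dropped first, and the survivors keep their original order.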
[paper] 25th March 2024 Efficient Memory Management for Large Language Model Serving with PagedAttention
The paper introduces a novel approach to optimize memory usage in serving Large Language Models (LLMs) through a method called PagedAttention, inspired by virtual memory and paging techniques in operating systems. This method addresses the significant memory waste in existing systems due to inefficient handling of key-value (KV) cache memory, which is crucial for the performance of LLMs.
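The core idea can be sketched as a block table: each sequence's KV cache is stored in fixed-size blocks that need not be contiguous in memory, and a new block is allocated only when the previous one fills up. The toy allocator below tracks only block bookkeeping (no tensors, sharing, or copy-on-write); its class and method names are hypothetical and it is not vLLM's implementation.

```python
class PagedKVCache:
    """Toy block-table allocator in the spirit of PagedAttention."""

    def __init__(self, num_physical_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_physical_blocks))[::-1]  # pop() yields 0, 1, 2, ...
        self.block_tables = {}  # seq_id -> list of physical block ids (logical order)
        self.seq_lens = {}      # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a KV slot for one new token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % self.block_size == 0:  # no block yet, or the last one is full
            if not self.free_blocks:
                raise MemoryError("out of KV-cache blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def free(self, seq_id):
        """Return all of a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(reversed(self.block_tables.pop(seq_id, [])))
        self.seq_lens.pop(seq_id, None)
```

Because sequences grow block by block, at most one partially filled block per sequence is wasted, instead of reserving space for the maximum sequence length up front.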
[paper] 24th March 2024 Evolutionary Optimization of Model Merging Recipes
The field of large language models (LLMs) has witnessed a paradigm shift with the advent of model merging, a novel approach that combines multiple LLMs into a unified architecture without additional training, offering a cost-effective strategy for new model development. This technique has sparked a surge in experimentation due to its potential to democratize the development of foundational models. However, the reliance on human intuition and domain knowledge in model merging has been a limiting factor, calling for a more systematic method to explore new model combinations.
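The paper's search is far richer than this, but the underlying idea of treating merge coefficients as something an evolutionary loop can optimize can be sketched with plain linear merging and a (1+1) evolution strategy. Both techniques are stand-ins chosen for brevity, and `fitness` is any hypothetical score on held-out tasks (higher is better).

```python
import numpy as np


def merge(params_a, params_b, w):
    """Linear merge: w * A + (1 - w) * B for every shared parameter tensor."""
    return {k: w * params_a[k] + (1.0 - w) * params_b[k] for k in params_a}


def evolve_merge_weight(params_a, params_b, fitness, generations=200, sigma=0.1, seed=0):
    """(1+1) evolution strategy over a single mixing weight in [0, 1]."""
    rng = np.random.default_rng(seed)
    best_w = 0.5
    best_score = fitness(merge(params_a, params_b, best_w))
    for _ in range(generations):
        # Mutate the current best weight and clip it back into [0, 1].
        cand = float(np.clip(best_w + sigma * rng.standard_normal(), 0.0, 1.0))
        score = fitness(merge(params_a, params_b, cand))
        if score > best_score:  # keep the mutation only if it improves fitness
            best_w, best_score = cand, score
    return best_w, best_score
```

No gradient training happens here: the search only evaluates merged models, which is what makes this family of methods cheap compared with fine-tuning.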
[paper] 21st March 2024 GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Training Large Language Models (LLMs) presents significant memory challenges predominantly due to the growing size of weights and optimizer states. While common memory-reduction approaches, such as Low-Rank Adaptation (LoRA), have been employed to mitigate these challenges, they typically underperform training with full-rank weights in both pre-training and fine-tuning stages. This limitation arises because these approaches restrict the parameter search to a low-rank subspace, altering training dynamics and potentially requiring a full-rank warm start.
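GaLore keeps the weights full-rank and instead projects the gradient into a low-rank subspace, where the optimizer state lives. A minimal sketch of one step, assuming plain SGD rather than an adaptive optimizer and recomputing the projection from the current gradient's SVD on every call (the method refreshes it only periodically); `galore_step` is a hypothetical helper.

```python
import numpy as np


def galore_step(W, grad, lr, rank):
    """One SGD step with the gradient projected onto its top-`rank` left singular vectors."""
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]               # (m, r) projection matrix
    low_rank_grad = P.T @ grad    # (r, n): optimizer state would be kept at this size
    update = P @ low_rank_grad    # project back to the full (m, n) weight shape
    return W - lr * update, low_rank_grad.shape
```

With `rank` equal to the full rank of the gradient, the step reduces to ordinary SGD; with a smaller rank each update is low-rank, but successive updates can use different subspaces, so the weights themselves are never constrained to a low-rank form as in LoRA.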
[paper] 20th March 2024 OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
A team of researchers has released OpenMoE, a series of open-source Mixture-of-Experts (MoE) based large language models ranging from 650M to 34B parameters. Their work provides valuable insights into training MoE models and analyzing their behavior. Here are some key takeaways:
[paper] 19th March 2024 Training Language Model Agents without Modifying Language Models
Reframing Large Language Models (LLMs) as agents has ushered in a new paradigm of automation. Researchers and practitioners have increasingly been using these models as agents to automate complex tasks using specialized functions. However, integrating useful functions into LLM agents often requires manual effort and extensive iterations, which is time-consuming and inefficient. Inspired by the analogy of humans continuously forging tools to adapt to tasks, this paper introduces a novel approach to train LLM agents by forging their functions, treating them as learnable 'agent parameters', without modifying the LLM weights. This paradigm, termed 'Agent Training', involves updating the agent's functions to maximize task-solving ability, offering a promising avenue for developing specialized LLM agents efficiently.
Latest News
9th March 2024
BitNet Transformer: Scaling 1-bit Transformers for Large Language Models
BitNet Transformer is an architecture that scales 1-bit Transformers to large language models. It achieves competitive performance while substantially reducing memory footprint and energy consumption compared to state-of-the-art 8-bit quantization methods and FP16 Transformer baselines.
Key Features:
- BitLinear: A drop-in replacement for the nn.Linear layer in PyTorch, enabling the training of 1-bit weights from scratch.
- Scalable and Stable: BitNet Transformer is designed to be scalable and stable, capable of handling large language models efficiently.
- Competitive Performance: Achieves competitive results in terms of perplexity and downstream task accuracy compared to baselines.
- Significant Energy Savings: Provides substantial energy cost reductions, especially as the model size scales up.
- Scaling Law: Exhibits a scaling law akin to full-precision Transformers, suggesting its potential for effective scaling to even larger language models.
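A simplified sketch of a BitLinear-style layer in PyTorch, assuming per-tensor scaling, 8-bit absmax activation quantization, a LayerNorm before quantization, and a straight-through estimator for gradients. It simulates quantization in floating point rather than using 1-bit kernels, so it illustrates the arithmetic but gives no memory or energy savings by itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinear(nn.Linear):
    """Sketch of a BitNet-style linear layer using simulated (fake) quantization."""

    def __init__(self, in_features, out_features, bias=False, act_bits=8):
        super().__init__(in_features, out_features, bias=bias)
        self.norm = nn.LayerNorm(in_features)  # normalize before quantizing activations
        self.q_b = 2 ** (act_bits - 1)         # 128 for 8-bit activations

    def forward(self, x):
        x = self.norm(x)
        # Activations: absmax quantization to the signed act_bits range.
        gamma = x.abs().max().clamp(min=1e-5)
        x_q = (x * self.q_b / gamma).round().clamp(-self.q_b, self.q_b - 1)
        # Straight-through estimator: quantized forward, identity backward.
        x_ste = x + (x_q * gamma / self.q_b - x).detach()

        # Weights: binarize around their mean to {-1, +1}, rescale by mean |W|.
        w = self.weight
        beta = w.abs().mean()
        w_bin = (w >= w.mean()).to(w.dtype) * 2 - 1
        w_ste = w + (w_bin * beta - w).detach()
        return F.linear(x_ste, w_ste, self.bias)
```

Because it subclasses `nn.Linear`, it can stand in for a linear layer in an existing model; for example, `BitLinear(16, 4)(torch.randn(3, 16))` returns a tensor of shape `(3, 4)`, while the latent full-precision weights still receive gradients for training.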
Availability:
5th December 2023
Launch of EcoAssistant: Advancing AutoGen for Superior Code-driven Q&A
EcoAssistant uses AutoGen for code-driven question answering, combining iterative code refinement with an assistant hierarchy that handles queries of varying complexity.
Project Highlights:
- Iterative Code Refinement: Refines generated code over multiple rounds to increase answer accuracy.
- Assistant Hierarchy: Structured system to handle queries at different complexity levels, ensuring precise and relevant answers.
- Use of Past Queries: Incorporates successful past queries to improve response generation and efficiency.
Availability: The documentation and code are available on GitHub. For further details, refer to the project's blog post: Implementing EcoAssistant: Leveraging AutoGen for Enhanced Code-driven Question Answering.
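The escalation idea can be sketched as follows, assuming each assistant is a callable that returns an answer or `None` when it fails (a hypothetical stand-in for a success signal such as user feedback), and using the most recent solved queries as demonstrations rather than retrieving the most similar ones. `answer_with_hierarchy` and the tier names are hypothetical.

```python
from typing import Callable, Optional

# (query, demonstrations) -> answer, or None if this assistant could not solve it
Assistant = Callable[[str, list[str]], Optional[str]]


def answer_with_hierarchy(query, assistants, solved_queries, max_demos=2):
    """Try assistants from cheapest to most capable; remember successful queries."""
    demos = solved_queries[-max_demos:]  # simplified: most recent, not most similar
    for name, assistant in assistants:
        answer = assistant(query, demos)
        if answer is not None:
            # Store the solved query so later queries can use it as a demonstration.
            solved_queries.append(f"Q: {query}\nA: {answer}")
            return name, answer
    return None, None
```

Ordering the tiers from cheapest to most capable means the expensive model is only paid for when the cheaper one fails.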
1st December 2023
Release of Zephyr's Mistral DPO Training Framework
Zephyr's Mistral DPO training framework, based on distilled direct preference optimization (dDPO) for language model alignment, has been released. It introduces an efficient method to fine-tune language models with Direct Preference Optimization, focusing on alignment with human values. The framework features robust configuration options, specialized dataset handling, and a tailored training process, all designed to make model responses more helpful and relevant. Mistral DPO is a significant step toward models that not only understand language but also grasp human intentions.
Details on GitHub: Zephyr dDPO Training and Blog: Harnessing Zephyr's Breeze: DPO Training on Mistral-7B-GPTQ for Language Model Alignment.
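For reference, the DPO objective (Rafailov et al., 2023) is a logistic loss on the difference between how much the policy raises the log-probability of the chosen response and of the rejected response, each measured relative to a frozen reference model. A minimal PyTorch sketch, with β = 0.1 assumed as a common default; inputs are summed per-response log-probabilities, and this is not the framework's own code.

```python
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """-log sigmoid(beta * (chosen log-ratio - rejected log-ratio)), averaged over the batch."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```

When the policy equals the reference model the loss is log 2 ≈ 0.693, and it decreases as the policy shifts probability toward the chosen responses, with β controlling how far it may drift from the reference.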
28th November 2023
Release of Zephyr 7B GPTQ Model Fine-tuning Framework with 4-Bit Quantization
Zephyr's new framework enhances GPTQ model performance through fine-tuning and 4-bit quantization, tailored for efficient chatbot interactions.
Availability: The framework is open for exploration and contribution in the BayJarvis llm GitHub repo and on the BayJarvis Blog, offering a new avenue for enhancing chatbot solutions.
Framework Highlights:
- Fine-tuning Workflow: Utilizes zephyr_trainer.py for data preparation and model training, incorporating LoRA modules and quantization for optimized performance.
- Efficiency & Adaptability: Implements gradient checkpointing and precise training arguments, ensuring responsive and effective model behavior.
- Inference Capability: Demonstrated by finetuned_inference.py, the model delivers real-time, context-aware responses, ideal for support scenarios.
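The LoRA modules in this workflow add a trainable low-rank update to frozen pretrained weights. A from-scratch sketch of the idea follows; libraries such as Hugging Face PEFT provide production implementations, the class name is hypothetical, and the rank and scaling values are illustrative assumptions that may differ from the framework's configuration.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen path plus scaled low-rank correction x @ A^T @ B^T.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```

Only `r * (in_features + out_features)` parameters are trained per wrapped layer, which is what makes fine-tuning a quantized 7B model feasible on modest hardware.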
18th November 2023
nanoDPO v0.1 Release: a pioneering implementation of Direct Preference Optimization (DPO) for time series data, inspired by "Direct Preference Optimization: Your Language Model is Secretly a Reward Model," the cutting-edge DPO approach to language model fine-tuning.
Key Features:
- Causal Transformer and LSTM Integration: Incorporating Causal Transformer and LSTM models to handle time series data effectively.
- DPO Algorithm Implementation: Direct Preference Optimization for nuanced understanding and prediction of time series trends.
- DPO and Multi-Class Trainers: Two distinct training models catering to different time series analysis requirements.
- Customizable Training Configurations: Enhanced flexibility with adjustable learning rates, batch sizes, and model specifications.
- Robust performance metrics including accuracy and loss visualizations.
- Compatibility with popular machine learning tools like PyTorch and wandb.
Documentation:
For more information, visit the GitHub README and the detailed Documentation.
6th November 2023
nanoPPO v0.15 Release, bringing significant enhancements to the Proximal Policy Optimization (PPO) algorithm tailored for reinforcement learning tasks.
What's New in v0.15?
- Actor/Critic Causal Attention Policy: A new policy framework to enhance decision-making processes.
- Custom Learning Rate Scheduler: Introducing a version number and a custom scheduler for fine-tuning the learning rate during agent training.
- Gradient and Weight Inf/Nan Checks: Added safeguards against infinite and NaN values in gradients and weights to improve stability.
- Enhanced Training Mechanism: The training script now utilizes average rewards and includes a new cosine learning rate scheduler for iterative adjustment.
Additional Improvements:
- Debug flag for NaN detection in model parameters.
- Use of torch.nn.utils.clip_grad_norm_ for gradient clipping.
Documentation:
For a full overview of the new features and improvements, please refer to the GitHub README and the detailed Changelog.
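A sketch of the PPO clipped surrogate loss together with the kind of safeguards listed in this release: finite-value checks on gradients and weights and `torch.nn.utils.clip_grad_norm_`. The function names, `clip_eps=0.2`, and `max_grad_norm=0.5` are assumptions, not nanoPPO's code.

```python
import torch
import torch.nn as nn


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO, returned as a loss to minimize."""
    ratio = torch.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()


def checked_update(model, optimizer, loss, max_grad_norm=0.5):
    """Backprop, reject non-finite gradients, clip, step, then verify the weights."""
    optimizer.zero_grad()
    loss.backward()
    for name, p in model.named_parameters():
        if p.grad is not None and not torch.isfinite(p.grad).all():
            raise FloatingPointError(f"non-finite gradient in {name}")
    total_norm = nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    for name, p in model.named_parameters():
        if not torch.isfinite(p).all():
            raise FloatingPointError(f"non-finite weight in {name}")
    return total_norm
```

A cosine schedule such as `torch.optim.lr_scheduler.CosineAnnealingLR` could be stepped after each `checked_update` call; it is left out to keep the sketch short.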
7th October 2023
nChain 0.12 Release unveils a Python package specifically crafted for creating LLM bots over a flexible and extensible dataset.
Features & Enhancements:
- Sentence Transformers Embedding: By harnessing sentence_transformers, nChain produces high-quality text embeddings, turning your textual data into accurate vector representations.
- Annoy Index for Embedding Search: nChain integrates the Annoy index for fast approximate nearest-neighbor search over embeddings, streamlining the retrieval process.
- ArXiv Paper Search Example: To show the practical potential of nChain, the release includes an example that searches through arXiv papers.
For an in-depth exploration of this release, we recommend visiting the GitHub README and the GitHub release notes.
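The search step can be illustrated with exact cosine-similarity search in NumPy, which is what an Annoy index approximates with random-projection trees at much larger scale. The embeddings here are toy vectors standing in for `sentence_transformers` output, and `build_index` and `search` are hypothetical helpers.

```python
import numpy as np


def build_index(embeddings):
    """L2-normalize rows so that a dot product equals cosine similarity."""
    emb = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.clip(norms, 1e-12, None)


def search(index, query, k=3):
    """Return (ids, scores) of the k most cosine-similar rows, best first."""
    q = np.asarray(query, dtype=float)
    q = q / max(np.linalg.norm(q), 1e-12)
    scores = index @ q
    top = np.argsort(-scores)[:k]
    return top.tolist(), scores[top].tolist()
```

With Annoy itself, the corresponding steps are `AnnoyIndex(dim, "angular")`, `add_item`, `build`, and `get_nns_by_vector`, trading a small loss in recall for much faster queries on large collections.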
19th September 2023
nanoPPO 0.13 Release: the Proximal Policy Optimization (PPO) algorithm for reinforcement learning is now available. Initially supporting discrete action spaces in v0.1, the latest v0.13 expands support to continuous action spaces, catering to a broader spectrum of applications. To help users understand the training process, the release includes examples that demonstrate how agents can be trained across different environments. Besides MountainCarContinuous, two custom environments, PointMass1D and PointMass2D, have been introduced, specifically designed for convenient testing of PPO agent training. An initial test suite is included to maintain code quality and ensure consistent functionality. For a comprehensive overview, please refer to the GitHub README and the GitHub release notes.
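Supporting continuous action spaces typically means replacing a categorical policy head with a Gaussian one. A minimal PyTorch sketch, assuming a state-independent learned log standard deviation and a small MLP for the mean; the class name and layer sizes are arbitrary and not taken from nanoPPO or its PointMass environments.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal


class GaussianPolicy(nn.Module):
    """Diagonal Gaussian policy for continuous actions."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim)
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # shared across states

    def forward(self, obs):
        return Normal(self.mean_net(obs), self.log_std.exp())

    def act(self, obs):
        dist = self(obs)
        action = dist.sample()
        # Sum over action dimensions: independent Gaussians multiply.
        return action, dist.log_prob(action).sum(-1)
```

The summed log-probability returned by `act` is what feeds the PPO probability ratio, exactly as a categorical log-probability would in the discrete case.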