aiming-lab/MetaClaw - JCJC错别字

<div align="center"> <img src="assets/new_logo.png" alt="MetaClaw" width="600"> <br/> # Just talk to your agent — it learns and *EVOLVES*. <p> <a href="https://github.com/aiming-lab/MetaClaw"><img src="https://img.shields.io/badge/github-MetaClaw-181717?style=flat&labelColor=555&logo=github&logoColor=white" alt="GitHub"></a> <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-green?style=flat&labelColor=555" alt="License MIT"></a> <img src="https://img.shields.io/badge/⚡_Fully_Async-yellow?style=flat&labelColor=555" alt="Fully Async" /> <img src="https://img.shields.io/badge/☁️_No_GPU_Cluster-blue?style=flat&labelColor=555" alt="No GPU Cluster" /> <img src="https://img.shields.io/badge/🛠️_Skill_Evolution-orange?style=flat&labelColor=555" alt="Skill Evolution" /> <img src="https://img.shields.io/badge/🚀_One--Click_Deploy-green?style=flat&labelColor=555" alt="One-Click Deploy" /> </p> <br/> [🇨🇳 中文](./assets/README_ZH.md) • [🇯🇵 日本語](./assets/README_JA.md) • [🇰🇷 한국어](./assets/README_KO.md) • [🇫🇷 Français](./assets/README_FR.md) • [🇩🇪 Deutsch](./assets/README_DE.md) • [🇪🇸 Español](./assets/README_ES.md) <br/> </div> --- <div align="center"> ### Two commands. That's it. </div> ```bash metaclaw setup # one-time config wizard metaclaw start # skills on, OpenClaw wired — ready to chat metaclaw start --mode rl # optional: + live RL training via Tinker ``` <div align="center"> <img src="assets/metaclaw.gif" alt="MetaClaw demo" width="700"> </div> --- ## 🔥 News - **[03/11/2026]** **v0.2** — One-click deployment via `metaclaw` CLI. Skills enabled by default, RL is now opt-in. - **[03/09/2026]** We release **MetaClaw** — Just talk to your agent and let it evolve automatically. **NO** GPU deployment required; just plug into the **API**. --- ## 🎥 Demo https://github.com/user-attachments/assets/d86a41a8-4181-4e3a-af0e-dc453a6b8594 --- ## 📖 Overview **MetaClaw turns live conversations into continuous training data — automatically.** Just talk to your agent as usual, and MetaClaw handles the learning loop behind the scenes. It places your model behind an OpenAI-compatible proxy that intercepts interactions from OpenClaw, injects relevant skills at each step, and can optionally perform continuous fine-tuning through Tinker Cloud RL. Updated weights are hot-swapped seamlessly without interrupting the service. There is no need to maintain a dedicated GPU cluster. MetaClaw works with any OpenAI-compatible LLM API out of the box, and optionally integrates **Kimi-K2.5** (1T MoE) via [Tinker](https://www.thinkingmachines.ai/tinker/) for cloud-based LoRA training. ## 🤖 Key Features ### **One-click deployment** Configure once with `metaclaw setup`, then `metaclaw start` brings up the proxy, injects skills, and wires OpenClaw automatically. No manual shell scripts needed. ### **Two operating modes** | Mode | Default | What it does | |------|---------|--------------| | `skills_only` | ✅ | Proxy → your LLM API. Skills injected, auto-summarized after each session. No GPU/Tinker required. | | `rl` | off | Proxy → Tinker cloud RL. Full training loop with PRM scoring and skill evolution from failures. | ### **Skill injection** At every turn, MetaClaw retrieves the most relevant skill instructions and injects them into the agent's system prompt. Immediate behavior improvement without retraining. ### **Automatic skill summarization** After each conversation, the same LLM you're already using analyzes the session and distills new skills automatically. With RL enabled, a dedicated judge model extracts skills from failed episodes. ### **No GPU cluster required** In `skills_only` mode, only a network connection is needed. RL training is offloaded to Tinker cloud. ### **Two learning modes** MetaClaw supports both: - **RL (GRPO)** for learning from implicit feedback signals - **On-Policy Distillation (OPD)** for distilling a larger teacher model into the student on-policy In OPD mode, the student generates responses as usual, and a teacher model provides per-token log-probabilities on those same responses. The teacher logprobs are passed to the loss function (e.g., `cispo`) so the student learns to match the teacher's distribution. The teacher must be served behind an OpenAI-compatible `/v1/completions` endpoint (e.g., vLLM, SGLang). ### **Asynchronous by design** Serving, reward modeling, and training are fully decoupled. The agent continues responding while scoring and optimization run in parallel. --- ## 🚀 Quick Start ### 1. Install ```bash pip install -e . # skills_only mode (lightweight) pip install -e ".[rl]" # + RL training support (torch, transformers, tinker) pip install -e ".[evolve]" # + skill evolution via OpenAI-compatible LLM ``` ### 2. Configure ```bash metaclaw setup ``` The interactive wizard will ask you to choose your LLM provider (Kimi, Qwen, MiniMax, or custom), enter your API key, and optionally enable RL training. ### 3. Start ```bash metaclaw start ``` That's it. MetaClaw starts the proxy, automatically configures OpenClaw to use it, and restarts the gateway. Open OpenClaw and start chatting — skills are injected at every turn, and the session is automatically summarized into new skills when you're done. --- ## 🛠️ CLI Reference ``` metaclaw setup # Interactive first-time configuration wizard metaclaw start # Start MetaClaw (proxy + optional RL) metaclaw start --mode rl # Force RL mode for this session metaclaw stop # Stop a running MetaClaw instance metaclaw status # Check proxy health and running mode metaclaw config show # View current configuration metaclaw config KEY VALUE # Set a config value ``` **Common config keys:** ```bash metaclaw config rl.enabled true # Enable RL training metaclaw config rl.tinker_api_key sk-... # Set Tinker key metaclaw config skills.auto_evolve false # Disable auto skill summarization metaclaw config proxy.port 31000 # Change proxy port ``` --- ## ⚙️ Configuration Configuration lives in `~/.metaclaw/config.yaml`, created by `metaclaw setup`. ```yaml mode: skills_only # "skills_only" | "rl" llm: provider: kimi # kimi | qwen | openai | minimax | custom model_id: moonshotai/Kimi-K2.5 api_base: https://api.moonshot.cn/v1 api_key: sk-... proxy: port: 30000 skills: enabled: true dir: ~/.metaclaw/skills # your skill library retrieval_mode: template # template | embedding top_k: 6 task_specific_top_k: 10 # cap task-specific skills (default 10) auto_evolve: true # auto-summarize skills after each session rl: enabled: false # set to true to enable RL training model: moonshotai/Kimi-K2.5 tinker_api_key: "" prm_url: https://api.openai.com/v1 prm_model: gpt-5.2 prm_api_key: "" lora_rank: 32 batch_size: 4 resume_from_ckpt: "" # optional checkpoint path to resume training evolver_api_base: "" # leave empty to reuse llm.api_base evolver_api_key: "" evolver_model: gpt-5.2 opd: enabled: false # set to true to enable OPD (teacher distillation) teacher_url: "" # teacher model base URL (OpenAI-compatible /v1/completions) teacher_model: "" # teacher model name (e.g., Qwen/Qwen3-32B) teacher_api_key: "" # teacher model API key kl_penalty_coef: 1.0 # KL penalty coefficient for OPD max_context_tokens: 20000 # prompt token cap before truncation ``` --- ## 💪 Skills Skills are short Markdown instructions injected into the agent's system prompt at each turn. They live in your skills directory (`~/.metaclaw/skills/` by default), organized as individual `SKILL.md` files. **Skill auto-summarization** runs after each conversation. The LLM you configured analyzes what happened and generates new skills automatically. No manual curation needed — the library grows with your usage. To pre-load the built-in skill bank (40+ skills across coding, security, agentic tasks, etc.): ```bash cp -r memory_data/skills/* ~/.metaclaw/skills/ ``` --- ## 🔬 Advanced: RL Mode Enable RL training to continuously fine-tune the model from live conversations: ```bash metaclaw config rl.enabled true metaclaw config rl.tinker_api_key sk-... metaclaw config rl.prm_url https://api.openai.com/v1 metaclaw config rl.prm_api_key sk-... metaclaw start ``` In RL mode: - Each conversation turn is tokenized and submitted as a training sample - A judge LLM (PRM) scores responses asynchronously - Tinker cloud runs LoRA fine-tuning; updated weights are hot-swapped every `batch_size` samples - A dedicated evolver LLM extracts new skills from failed episodes **Programmatic rollout** (no OpenClaw TUI needed): set `openclaw_env_data_dir` to a directory of JSONL task files: ```json {"task_id": "task_1", "instruction": "Register the webhook at https://example.com/hook"} ``` --- ## 🔬 Advanced: OPD Mode On-Policy Distillation (OPD) lets you distill a larger teacher model into the student while it trains on-policy. The student generates responses as usual; the teacher provides per-token log-probabilities on those same responses. A KL penalty steers the student toward the teacher's distribution. ```bash metaclaw config opd.enabled true metaclaw config opd.teacher_url http://localhost:8082/v1 metaclaw config opd.teacher_model Qwen/Qwen3-32B metaclaw config opd.kl_penalty_coef 1.0 metaclaw start --mode rl ``` The teacher must be served behind an OpenAI-compatible `/v1/completions` endpoint (e.g., vLLM, SGLang). OPD can be combined with PRM scoring — both run asynchronously. See `examples/run_conversation_opd.py` for a programmatic example and `scripts/run_openclaw_tinker_opd.sh` for a ready-made launch script. --- ## 📚 Citation ```bibtex @misc{xia2026metaclaw, author = {Xia, Peng and Chen, Jianwen and Yang, Xinyu and Tu, Haoqin and Han, Siwei and Qiu, Shi and Zheng, Zeyu and Xie, Cihang and Yao, Huaxiu}, title = {MetaClaw}, year = {2026}, organization = {GitHub}, url = {https://github.com/aiming-lab/MetaClaw}, } ``` --- ## 🙏 Acknowledgements MetaClaw builds on top of the following open-source projects: - [OpenClaw](https://openclaw.ai) – the core agent framework. - [SkillRL](https://github.com/aiming-lab/SkillRL) – our skill-augmented RL framework. - [Tinker](https://www.thinkingmachines.ai/tinker/) – used for online RL training. - [OpenClaw-RL](https://github.com/Gen-Verse/OpenClaw-RL) – inspiration for our RL design. - [awesome-openclaw-skills](https://github.com/VoltAgent/awesome-openclaw-skills) – provides the foundation for our skill bank. --- ## 📄 License This project is licensed under the [MIT License](LICENSE).

GitHub 链接	https://github.com/aiming-lab/MetaClaw
项目简介	Just talk to your agent — it learns and EVOLVES.
创建时间	2026-03-09
更新时间	2026-03-13

微信客服