I am currently a Master's student in Data Science and Machine Learning at the National University of Singapore (NUS). I completed my undergraduate studies at Shanghai Jiao Tong University (SJTU), majoring in Information Security (IEEE Honor Class). My work spans LLM agents, agentic RAG, MCP/tool-use systems, recommender systems, computer vision, and multimodal understanding, with a focus on building deployable AI systems from research prototypes.
🔥 News
- 2026.05 – Built a campus-oriented LLM Agent and MCP tool ecosystem, including SJTU Agent, Shuiyuan MCP, and Treehole MCP.
- 2025.08 – Started the Master's program in Data Science and Machine Learning at NUS.
- 2024.07 – One paper accepted to ECCV 2024: HIMO benchmark for human-object interaction.
- 2024.06 – Third Place in the CVPR 2024 Ego-Exo4D Challenge (Body Pose Track).
- 2024.02 – One paper accepted to CVPR 2024: Inter-X dataset for human-human interaction analysis.
Publications
Inter-X: Towards Versatile Human-Human Interaction Analysis
Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu, Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, Yunhui Liu, Wenjun Zeng, Xiaokang Yang.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[paper] [project] [doi]
The largest dual-human interaction dataset to date (~11K sequences, 8.1M+ frames), featuring SMPL-X parameters, skeleton sequences, body-part-level textual descriptions, interaction order, relationship, and personality annotations. Supports 8 downstream tasks including text-to-motion, reaction generation, and motion captioning.
HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects
Xintao Lv, Liang Xu, Yichao Yan, Xin Jin, Congsheng Xu, Shuwen Wu, Yizhou Zhou, Yifan Liu, Lincheng Li, Mengxiao Bi, Wenjun Zeng, Xiaokang Yang.
European Conference on Computer Vision (ECCV), 2024.
[paper] [project] [doi]
The first large-scale full-body human-multi-object interaction benchmark (3.3K sequences, 4.08M frames, 53 object types). Proposes a dual-branch diffusion model with a Mutual Interaction Module and an autoregressive generation pipeline for fine-grained temporal control in HOI synthesis.
💻 Projects
Campus Agent and MCP Tool Ecosystem
Personal / Open-source Agent Engineering Project | 2026.05
Built a campus-oriented LLM agent tool ecosystem around sjtu-agent, shuiyuan-mcp, and ykst-treehole-mcp. Upgraded SJTU Agent from a fixed built-in tool list into an extensible local agent runtime with dynamic MCP tool discovery, stdio/SSE/streamable HTTP transports, short-lived MCP sessions, OpenAI/Anthropic streaming tool loops, and SKILL.md prompt injection shared by CLI, Web SSE, Telegram, Feishu, WeChat, and reminder daemons. Implemented Shuiyuan MCP for Discourse/SSO-cookie workflows with 25 tools and 10 resources, and reverse-engineered Treehole's gRPC-Web/protobuf protocol to expose 51 MCP tools. Added pinned install flows, local-session secret safeguards, and explicit write gates such as confirm: true.
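As a rough illustration of the `confirm: true` write gate mentioned above, a handler can refuse state-changing calls unless the caller opts in explicitly. This is a minimal sketch, not the project's actual code: `guarded_write`, the argument layout, and the returned dictionary shape are all hypothetical, and no MCP SDK is used.

```python
from typing import Any, Callable, Dict


def guarded_write(
    tool_name: str,
    arguments: Dict[str, Any],
    execute: Callable[..., Any],
) -> Dict[str, Any]:
    """Run a state-changing tool only when `confirm` is literally True.

    `tool_name`, `arguments`, and `execute` are hypothetical names used
    for illustration; they do not come from any MCP library.
    """
    # Reject truthy stand-ins such as "true" or 1: the gate must be explicit.
    if arguments.get("confirm") is not True:
        return {
            "ok": False,
            "error": f"{tool_name} modifies remote state; "
                     "re-run with confirm: true",
        }
    # Strip the gate flag before forwarding the real payload.
    payload = {k: v for k, v in arguments.items() if k != "confirm"}
    return {"ok": True, "result": execute(**payload)}
```

Keeping the gate in one wrapper means every write tool shares the same refusal path, so a model that omits the flag gets a uniform, machine-readable error instead of a silent side effect.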
MemeSense: Structured Textual Explanations for Meme Interpretation
NUS CS4248 – Multimodal Understanding | 2026.03 - 2026.04
Built an interpretable meme-understanding pipeline using LLaVA-1.5-7B + QLoRA: EasyOCR adaptive preprocessing → GPT-4o structured silver labels → cultural-context need classification → BM25 RAG retrieval → caption generation. Designed selective knowledge injection so the model retrieves external cultural context only when needed, outperforming always-retrieve and never-retrieve baselines. In a 30-sample human evaluation, 83.3% of generated explanations were preferred over original dataset captions; token F1, ROUGE-L, and BERTScore were used for multi-input ablations.
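For reference, token F1 is commonly computed as the harmonic mean of token-level precision and recall between a generated caption and a reference. The sketch below assumes that standard definition with lowercasing and whitespace tokenization; the project's exact normalization is not stated here, so treat those choices as assumptions.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Bag-of-tokens F1 between two strings.

    Lowercasing and whitespace splitting are assumptions made for this
    sketch, not necessarily the tokenization used in the project.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Two empty strings match perfectly; one empty string never matches.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```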
CoLLM: Collaborative LLM for Recommendation – Reproduction & Extension
NUS DSA5106 – Scalable Distributed Computing | 2026.04 - 2026.05
Reproduced and extended the CoLLM framework for injecting collaborative filtering signals into LLM-based recommendation. Migrated the codebase from Vicuna/LLaMA to Qwen2-7B + QLoRA (4-bit NF4, double quantization), fixed AMP NaN loss, missing <unk> token handling, and collaborative token embedding alignment. Two-stage training first taught the LLM to recommend from user-history text, then froze LLM + LoRA and trained a GELU MLP to project MF user/item embeddings into the LLM hidden space as soft collaborative tokens. On MovieLens-1M OOD, AUC improved from 0.678 to 0.691, above the MF-only baseline of 0.674.
3D Human Body Motion Estimation – Ego-Exo4D Challenge (3rd Place)
CVPR 2024 Workshop | 2024.04 - 2024.05
Designed a level-wise Transformer network to predict 17 body keypoint positions in 3D from egocentric and exocentric camera footage, placing third in the CVPR 2024 Ego-Exo4D Challenge body pose track as team SJTU-SEIEE. Used a dual-branch architecture: a shallow encoder captures local motion patterns while a deep 32-layer encoder models long-range temporal context. Implemented training, evaluation, sliding-window inference, WandB tracking, and coordinate transformation between global and Aria camera systems; evaluated across 9 real-world activity scenarios using MPJPE and MPJVE.
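The two metrics named above can be sketched as follows. MPJPE averages the Euclidean distance between predicted and ground-truth joints, and MPJVE applies the same average to frame-to-frame joint displacements. This is a simplified illustration under stated assumptions: velocities are plain finite differences, every joint is treated as visible, and the challenge's official evaluation script (units, masking) is not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]
Frame = Sequence[Joint]


def mpjpe(pred: Sequence[Frame], gt: Sequence[Frame]) -> float:
    """Mean per-joint position error over all frames and joints."""
    dists = [
        math.dist(p, g)
        for pred_frame, gt_frame in zip(pred, gt)
        for p, g in zip(pred_frame, gt_frame)
    ]
    return sum(dists) / len(dists)


def velocities(seq: Sequence[Frame]) -> List[List[Joint]]:
    """Finite-difference joint displacement between consecutive frames."""
    return [
        [tuple(b - a for a, b in zip(j0, j1)) for j0, j1 in zip(f0, f1)]
        for f0, f1 in zip(seq, seq[1:])
    ]


def mpjve(pred: Sequence[Frame], gt: Sequence[Frame]) -> float:
    """Mean per-joint velocity error; needs at least two frames."""
    return mpjpe(velocities(pred), velocities(gt))
```

A constant offset between prediction and ground truth raises MPJPE but leaves MPJVE at zero, which is why the two are reported together: one measures placement, the other temporal smoothness.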

SVD-InST: Image Style Transfer
SJTU AI3603 Final Project | 2023.11 - 2023.12
Fine-tuned Stable Diffusion 1.4 to translate real photos into the Nine-Colored Mural artistic style. Combined Textual Inversion for learning a style token with Singular Value Decomposition fine-tuning (SVDiff), freezing most model weights and training only 3.7M parameters (0.25% of the full model). Achieved the lowest (best) FID among the compared methods at 125.1, ahead of InST (127.5), CycleGAN (178.3), StyTR-2 (171.3), and fast-style-transfer (172.7), with an LPIPS of 0.54, on par with InST.
Recommendation System & LLM Instruction Tuning
SJTU NIS4301 | 2024.03 - 2024.05
Built a recommendation system that integrates graph neural network and tabular modeling approaches, including GraphConv, GCN, GAT, FT-Transformer, and TabNet. Improved collaborative filtering performance by combining GraphSAGE with SkipGNN, and explored instruction tuning with GPT-J and GPT-3 on movie recommendation tasks to evaluate the potential of LLM-based recommenders.
Spoken Natural Language Understanding
SJTU CS3602 NLP Final Project | 2024.11 - 2024.12
Extracted structured semantic triples (act, slot type, value) from noisy Chinese ASR transcripts for in-car navigation commands. Built a supervised BERT-RNN sequence-labeling pipeline with BIO tagging, POI-based data augmentation, and Jieba word-chunk fusion, achieving an F1 of 82.75. Also evaluated zero-shot, few-shot, and chain-of-thought prompting with DeepSeek Chat as a generative semantic parsing baseline, focusing on output controllability and hallucination analysis.
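To show how BIO tags turn into (act, slot type, value) triples, here is a small decoding sketch. The tag layout `B-<act>-<slot>` / `I-<act>-<slot>` and the example labels are assumptions for illustration; the project's real label set is not given here.

```python
from typing import List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]


def decode_bio(chars: Sequence[str], tags: Sequence[str]) -> List[Triple]:
    """Collect (act, slot, value) triples from character-level BIO tags.

    Assumes well-formed tags of the form "B-<act>-<slot>", "I-<act>-<slot>",
    or "O". An I- tag that does not continue the open span is treated as O.
    """
    triples: List[Triple] = []
    current: Optional[list] = None  # [act, slot, collected characters]

    def flush() -> None:
        nonlocal current
        if current is not None:
            act, slot, value = current
            triples.append((act, slot, "".join(value)))
            current = None

    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            flush()  # a new span closes any open one
            act, slot = tag[2:].split("-", 1)
            current = [act, slot, [ch]]
        elif (
            tag.startswith("I-")
            and current is not None
            and tag[2:] == f"{current[0]}-{current[1]}"
        ):
            current[2].append(ch)
        else:
            flush()
    flush()
    return triples


# Hypothetical example: "navigate to Shanghai" with a destination slot.
example = decode_bio(
    list("导航到上海"),
    ["O", "O", "O", "B-inform-destination", "I-inform-destination"],
)
# example == [("inform", "destination", "上海")]
```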

SJTU CS3324
Designed an architecture combining CLIP and GAN to predict visual saliency maps from text cues. Trained on an eye-tracking dataset where subjects received text prompts during visual experiments.
Evo – Paleontological Evolution Visualization
Personal Project | 2026
Built an interactive web application for exploring the history of life on Earth through three synchronized views: a paleogeographic map, a phylogenetic tree of life, and a draggable geological timeline. The app renders continental reconstructions across Phanerozoic periods, filters the tree of life by time, and displays 13,600+ fossil occurrence records from the Paleobiology Database. Implemented with React + TypeScript, Leaflet/react-leaflet, D3.js, custom SVG timeline interactions, and Zustand state management.
SG Rent – Singapore Rental Housing Recommendation System
Personal Project | 2026
Developed a browser-based rental recommendation tool for Singapore housing search. Users enter up to five commuting destinations and receive ranked HDB and condo properties based on MRT commute time, price, and amenities. The system includes Dijkstra shortest-path routing across MRT/LRT lines, Leaflet map visualization with property and station overlays, weighted ranking, property filters, and a static dataset of 1,535 properties and 163 MRT/LRT stations. Built as a pure static frontend with React + TypeScript, requiring no backend at runtime.
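The routing step can be sketched with a standard heap-based Dijkstra. The app itself is a TypeScript frontend, so this Python version only illustrates the algorithm; the graph shape (station mapped to a list of neighbor and travel-minutes pairs) and the station names `A`, `B`, `C` are placeholders, not data from the project.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Shortest travel time from `source` to every reachable station.

    Edge weights must be non-negative (e.g. minutes between stations).
    """
    dist: Dict[str, float] = {source: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, station = heapq.heappop(heap)
        # Skip stale heap entries that were superseded by a shorter path.
        if d > dist.get(station, float("inf")):
            continue
        for neighbor, minutes in graph.get(station, []):
            candidate = d + minutes
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Placeholder network: going A -> B -> C (3 min) beats the direct A -> C (5 min).
toy_network: Graph = {
    "A": [("B", 2.0), ("C", 5.0)],
    "B": [("A", 2.0), ("C", 1.0)],
    "C": [("A", 5.0), ("B", 1.0)],
}
times_from_a = dijkstra(toy_network, "A")
# times_from_a == {"A": 0.0, "B": 2.0, "C": 3.0}
```

With a network of only a few hundred stations, running this once per destination is cheap enough to do in the browser, which fits the no-backend design.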
Claude Code Skill – OpenClaw AgentSkill
Personal Tooling Project | 2026
Created an OpenClaw-compatible AgentSkill that routes coding tasks through the Claude Code CLI, covering implementation, refactoring, debugging, code review, and project scaffolding workflows. The skill defines Windows-first installation guidance, one-shot and interactive invocation patterns, fallback behavior, subagent parallelism for independent tasks, and prompt-writing conventions for reliable code execution.
Solar – Interactive Solar System Trajectory Visualizer
Personal Project | 2025
An interactive 3D solar system simulation built with React + Three.js + Vite + TypeScript. Features real-time planetary trajectory rendering in both 2D and 3D views, asteroid catalog browsing with chunked lazy loading, conjunction event detection, split-screen reference frame comparison, URL-based state persistence, and JSON/CSV data export. Supports NEO distance heatmaps and custom celestial body group management.
Experiences
AI Institute, School of Computer Science, Shanghai Jiao Tong University
Undergraduate Research Intern | July 2023 - May 2024
Supervisors: Prof. Xiaokang Yang, Prof. Yichao Yan
Research Focus: Computer Vision, Embodied Intelligence, Human Motion Modeling
- MoCap Data & Tooling: Managed motion-capture data collection and built processing, visualization, slicing, annotation, and calibration tools for the Inter-X and HIMO benchmarks. Resolved technical inconsistencies between Noitom PNS hand-motion-capture gloves and OptiTrack body MoCap systems through manual offset correction and calibration.
- Model Training & Validation: Trained and evaluated diffusion-based text-to-motion and human-object interaction generation models for dataset validation, covering Inter-X text-to-motion pipelines and HIMO 2-object / 3-object HOI generation workflows. Monitored FID, R-Precision, Matching Score, and Diversity metrics with customized preprocessing pipelines.
- Virtual Scene Development: Independently built an Unreal Engine C++/Blueprint + Blender virtual rendering pipeline and generated high-fidelity motion visualizations across 4 scenes × 6 viewpoints.
- Research Contribution: Contributed to two papers published at CVPR 2024 (Inter-X) and ECCV 2024 (HIMO), and placed third in the CVPR 2024 Ego-Exo4D body pose challenge.
Digital Intelligence Institute, China Pacific Insurance (CPIC)
Algorithm Intern | Feb 2025 - July 2025
Supervisor: Hui Wang
Research Focus: LLM Agent, Agentic RAG, Enterprise Knowledge Base, Database Troubleshooting Assistant
- OceanBase Agentic RAG: Helped deliver an internal OceanBase documentation QA and troubleshooting workbench for database administrators and data-management users, built with FastAPI, Milvus dense/sparse vectors, scalar metadata filters, reranking, and DeepSeek's OpenAI-compatible API.
- Agent Architecture: Implemented and integrated QAAgent and TroubleshootAgent capabilities including query routing, version resolution, specialized retrieval, multi-hop retrieval, lightweight doc graph expansion, evidence verification, clarifying questions, session memory, and observable Agent Trace metadata.
- Safety & Evaluation: Designed a read-only diagnostic-tool contract for Phase 1 to avoid production SQL execution, parameter changes, or service restarts. Added agentic evaluation coverage beyond Recall/MRR, including version-resolution accuracy, citation coverage, troubleshooting acceptability, and high-risk suggestion checks; unit test suite covered 37 cases.
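For context on the retrieval baselines that the agentic evaluation extends, Recall@k and MRR can be computed as below. This is a generic sketch of the standard definitions, not the internal evaluation code; the function names and the input layout (a ranked list of document IDs plus a set of relevant IDs per query) are assumptions.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = set(ranked_ids[:k]) & relevant_ids
    return len(hits) / len(relevant_ids)


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average of 1/rank of the first relevant hit; 0 when a query has none."""
    scores: List[float] = []
    for ranked_ids, relevant_ids in runs:
        score = 0.0
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id in relevant_ids:
                score = 1.0 / rank
                break
        scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0
```

Both metrics only ask whether the right document was retrieved, which is why checks such as version-resolution accuracy and citation coverage are needed to judge whether the final answer actually used it correctly.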
Honors and Awards
- Third Place, CVPR 2024 Ego-Exo4D Challenge, Body Pose Track (2024)
- First-Class Cyber-Security Scholarship, Shanghai Jiao Tong University (2023)
- Excellent League Member of SJTU (2022)
- Outstanding Freshman Award, Shanghai Jiao Tong University (2021)
Education
- 2025.08 - Present, Master of Science, Data Science and Machine Learning, National University of Singapore, Singapore.
  - Selected Courses: Machine Learning, Scalable Distributed Computing, Data Management and Retrieval
- 2021.09 - 2025.06, Bachelor of Engineering, Information Security (IEEE Honor Class), Shanghai Jiao Tong University, Shanghai, China. GPA: 85.5/100
  - Selected Courses: Algorithm Design and Analysis (A), C++ Program Design Practice (A+), Unreal Engine Program Design (A+), Artificial Intelligence Principles (A), Natural Language Processing (A)
- 2018.09 - 2021.06, Shanghai High School, Shanghai, China.
Skills
- Agent / LLM: Agentic RAG, MCP / Tool Calling, Multi-Agent Workflow, OpenAI / Anthropic APIs, Prompt Engineering, LLM Evaluation, LoRA / QLoRA.
- Engineering: Python, FastAPI, PyTorch, Hugging Face Transformers, Milvus / Vector DB, Docker / Kubernetes, SQL, TypeScript / Node.js, Linux, Shell, Git.
- Research & Systems: Multimodal Understanding, Computer Vision, Human Motion Modeling, Recommendation Systems, Retrieval, Evidence Verification, Session Memory, Agent Trace.
Links
- Evo – Paleontological Evolution Visualization: Explore fossils, continental reconstructions, phylogenetic relationships, and geological time through synchronized interactive views.
- SG Rent – Singapore Rental Housing Recommendation System: Rank Singapore HDB and condo rentals by MRT commute time, price, amenities, and search filters.
- Solar – Interactive Solar System Trajectory Visualizer: A 3D solar system simulation with React, Three.js, and TypeScript. Explore planetary orbits, browse asteroid catalogs, and detect conjunction events.
- Campus Agent and MCP Tool Ecosystem: Local-first campus agent runtime with dynamic MCP tools, Skills, and community automation integrations.