Articles
300
Tags
46
Categories
8
Yuchen You

1. Boot and System Management Daemons
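Updated 2025-05-26 | operating_system • unix
Boot Process Overview: In recent years, the boot procedure has been gradually simplified as the older, more complex BIOSs gave way to UEFI. Recent systems use a system manager daemon, systemd, instead of the traditional UNIX init. systemd streamlines the boot process by adding dependency management, because dependency tracking allows startup requests to be handled concurrently. During bootstrapping (that is, boot), the kernel is loaded into memory and begins executing. Before the system is fully booted, the file systems are checked and the system daemons are started. These steps (shell scripts) are collectively called init scripts. System Firmware (hardware basics): When the machine powers on, the CPU executes firmware-level boot code stored in ROM. In virtual environments such as virtual machines, this ROM is also virtual, but the concept is the same. System firmware (system ...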
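The link between dependency management and concurrent startup can be sketched in a few lines. This is a minimal illustration, not systemd's actual algorithm: the unit names and the `startup_waves` helper are hypothetical, and Python's standard-library `graphlib` stands in for a real dependency resolver. Units that land in the same wave have no unmet dependencies, so they could be started in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical units mapped to the units they depend on (not real systemd unit files).
UNITS = {
    "network": {"udev"},
    "syslog": set(),
    "udev": set(),
    "sshd": {"network", "syslog"},
}


def startup_waves(units):
    """Group units into waves; every unit in a wave can start in parallel."""
    sorter = TopologicalSorter(units)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # units whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as started
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(startup_waves(UNITS), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

With a strictly sequential init, these four units would start one after another; here `syslog` and `udev` share the first wave, which is the saving that dependency information makes possible.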
3. Metastable Failures in Distributed Systems
Updated 2025-05-28 | operating_system • distributed_sys • chaos_system • system_failure
Introduction: Metastable failures are a failure pattern in distributed systems. Currently, metastable failures manifest themselves as black swan events: they are outliers because nothing in the past points to their possibility, they have a severe impact, and they are much easier to explain in hindsight than to predict. Although instances of metastable failures can look different at the surface, deeper analysis shows that they can be understood within the same framework. By reviewing experiences fr ...
4. Metastable Failures in the Wild
Updated 2025-05-28 | operating_system • distributed_sys • chaos_system • system_failure
Introduction: This paper builds on Metastable Failures in Distributed Systems; "in the wild" refers to the uncontrolled real world. In this work, we study the prevalence of such failures in the wild by scouring publicly available incident reports from many organizations, ranging from hyperscalers to small companies. In this paper, we make four contributions that extend the work of Bronson et al. and increase our understanding of metastable failures: A study of metastable failures in the wild that confirms metastable fai ...
0. Kubernetes (multipass + k3s + helm) + ChaosMesh
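Updated 2025-05-27
This post draws on the notes from GeekHours-Kubernetes (极客网). The environment is macOS Sequoia on an M3 (Apple Silicon) machine; if your machine differs, follow the site above for download and setup. Environment setup and basic principles. Single-node k8s deployment: brew install minikube; minikube start. Multi-node k8s deployment: to run several nodes on one physical machine, you can use either Docker containers or virtual machines. Because Kubernetes itself is not a Docker derivative, this post takes the virtual machine route (for a Docker-based setup, see the kind project). The virtual machines only need to be lightweight, container-like, and accessible from the command line, so we use the multipass and k3s projects. multipass, a lightweight virtual machine manager: developed by Canonical (the company behind Ubuntu), it lets you configure VMs and query the state of a VM cluster from the command line. # Download this command: b ...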
5. ZooKeeper: Wait-free Coordination for Internet-scale Systems
Updated 2025-06-28
Overview: ZooKeeper is a service for coordinating processes of distributed applications. It aims to provide a simple and high-performance kernel for building more complex coordination primitives at the client. The interface exposed by ZooKeeper has the wait-free aspects of shared registers with an event-driven mechanism similar to cache invalidations of distributed file systems to provide a simple, yet powerful coordination service. Configuration: Configuration is one of the most basic forms of coordina ...
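The event-driven mechanism mentioned in the overview can be illustrated with a toy in-memory store. This is only a sketch under stated assumptions: `ToyCoordinator`, its `read` and `write` methods, and the `/config/timeout` path are hypothetical names and are not the ZooKeeper client API. The one-shot behaviour of the watch follows the way the ZooKeeper paper describes watches as one-time triggers.

```python
from collections import defaultdict


class ToyCoordinator:
    """In-memory stand-in for a coordination service (hypothetical, not the ZooKeeper API)."""

    def __init__(self):
        self._data = {}
        self._watches = defaultdict(list)

    def read(self, path, watch=None):
        """Return the value at path; optionally register a one-shot watch callback."""
        if watch is not None:
            self._watches[path].append(watch)
        return self._data.get(path)

    def write(self, path, value):
        """Store a value, then notify and clear every watch registered on path."""
        self._data[path] = value
        callbacks = self._watches.pop(path, [])
        for callback in callbacks:
            callback(path)  # the notification carries no data, like a cache invalidation


def demo():
    store = ToyCoordinator()
    store.write("/config/timeout", "30s")
    events = []
    first = store.read("/config/timeout", watch=events.append)
    store.write("/config/timeout", "60s")  # fires the watch once
    store.write("/config/timeout", "90s")  # no watch left, so no second event
    return first, events, store.read("/config/timeout")


if __name__ == "__main__":
    print(demo())
```

A client that wants to keep following the value would re-read it with a new watch after each notification, which is why the callback receives only the path and not the new data.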
2. Predictive and Adaptive Failure Mitigation to Avert Production Cloud VM Interruptions
Updated 2025-05-16 | distributed_sys
When a failure occurs in production systems, the highest priority is to quickly mitigate it. Failure Mitigation (FM) is done in a reactive and ad-hoc way, namely taking some fixed actions only after a severe symptom is observed. The paper proposes a preventive and adaptive failure mitigation service, Narya, that is integrated in a production cloud, Microsoft Azure's compute platform. Narya predicts imminent host failures based on multi-layer system signals, then decides smart mitigation actions go ...
1. AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
Updated 2025-05-16 | distributed_sys
AI for IT Operations (AIOps) aims to automate complex operational tasks, such as fault localization and root cause analysis. Traditional approaches: addressing isolated operational tasks. LLMs and AI agents: enabling end-to-end and multitask automation. Target: self-healing cloud systems, a paradigm we term AgentOps. AIOpsLab is a framework that not only deploys micro-service cloud environments, injects faults, generates workloads, and exports telemetry data but also orchestrates these components and provid ...
0. Hadoop Distributed File System
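Updated 2025-06-23 | distributed_sys
Consistency. CAP consistency: all nodes have the same view of the same data at the same moment. Transaction consistency: the database must be in a valid state before a transaction begins and after it ends. The consistency models used in data replication are listed in the table below.

| Consistency type | Definition | Characteristics |
| --- | --- | --- |
| Strong consistency | Every read returns the most recently written data | Behaves like a single machine; simple from the user's point of view, but costly in performance |
| Linearizability | Results appear as if operations were ordered by global time | A stricter form of strong consistency |
| Sequential consistency | All nodes agree on one order of operations, but global real-time order is not guaranteed | Slightly weaker: the agreed order may differ from real-time order, as long as every reader sees the same one |
| Causal consistency | If one operation is caused by another, the two must take effect in causal order | Unrelated operations may be reordered, which improves concurrency |
| Session consistency | All operations of one client within one session are sequentially consistent | Better user experience; suited to mobile and other transiently connected systems |

Eventual ...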
2. The Design of a Practical System for Fault-Tolerant Virtual Machines
Updated 2025-05-11 | distributed_sys
1. Kafka
Updated 2025-05-12 | distributed_sys
Introduction: Event streaming is the practice of capturing data in real time from event sources like databases, sensors, mobile devices, cloud services, and software applications in the form of streams of events, and routing the event streams to different destination technologies as needed. It ensures a continuous flow and interpretation of data so that the right information is at the right place, at the right time. Kafka's event stream purpose: To publish (write) and subscribe to (read) streams of events, in ...
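The publish and subscribe idea can be sketched with an in-memory append-only log. This is a toy model under stated assumptions, not the Kafka client API: `ToyTopic`, `publish`, `poll`, and the subscriber and sensor names are all hypothetical. It shows only that events are appended in order and that each subscriber reads from its own position.

```python
class ToyTopic:
    """Append-only event log with per-subscriber read offsets (hypothetical, not the Kafka client API)."""

    def __init__(self):
        self._events = []
        self._offsets = {}

    def publish(self, event):
        """Append an event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, subscriber):
        """Return events the subscriber has not seen yet and advance its offset."""
        start = self._offsets.get(subscriber, 0)
        batch = self._events[start:]
        self._offsets[subscriber] = len(self._events)
        return batch


def demo():
    topic = ToyTopic()
    topic.publish({"sensor": "a", "temp": 21})
    topic.publish({"sensor": "b", "temp": 19})
    first = topic.poll("dashboard")   # sees both events
    topic.publish({"sensor": "a", "temp": 22})
    second = topic.poll("dashboard")  # sees only the new event
    late = topic.poll("archiver")     # a new subscriber replays the full log
    return len(first), len(second), len(late)


if __name__ == "__main__":
    print(demo())
```

Keeping the offset per subscriber rather than deleting events on delivery is what lets a late reader replay the whole stream.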
©2020 - 2025 By Yuchen You (Wesley)