Articles
280
Tags
47
Categories
7

Home
Archives
Tags
Categories
About
Yuchen You

4. Metastable Failures in the Wild
Updated 2025-05-28 | operating_system • distributed_sys • chaos_system • system_failure
Introduction: This paper extends Metastable Failures in Distributed Systems; "in the wild" refers to the uncontrolled real world. "In this work, we study the prevalence of such failures in the wild by scouring over publicly available incident reports from many organizations, ranging from hyperscalers to small companies." "In this paper, we make four contributions that extend the work of Bronson et al. and increase our understanding of metastable failures:" A study of metastable failures in the wild that confirms metastable fai ...
0. Kubernetes (multipass + k3s + helm) + ChaosMesh
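Updated 2025-05-27
This post draws on the 极客网 (GeekHours-Kubernetes) notes. The environment is macOS Sequoia on an M3 (Apple Silicon) machine; if your machine differs, follow the site above for download and setup instructions. Environment setup and basic principles. Single-node k8s deployment: `brew install minikube`, then `minikube start`. Multi-node k8s deployment: running several nodes on one physical machine means using either Docker containers or virtual machines. Because Kubernetes itself is not a Docker derivative, this post takes the virtual-machine route (for a Docker-based setup, see the kind project). The VMs only need to be about as lightweight as containers and reachable from the command line, so we use the multipass and k3s projects. multipass, a lightweight VM manager: developed by Canonical (the company behind Ubuntu), it lets you configure VMs and query the state of a VM cluster from the command line. # Download this command b ...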
5. ZooKeeper: Wait-free Coordination for Internet-scale Systems
Updated 2025-06-28
Overview: ZooKeeper is a service for coordinating processes of distributed applications. It aims to provide a simple and high-performance kernel for building more complex coordination primitives at the client. The interface exposed by ZooKeeper has the wait-free aspects of shared registers with an event-driven mechanism similar to cache invalidations of distributed file systems to provide a simple, yet powerful coordination service. Configuration: Configuration is one of the most basic forms of coordina ...
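The event-driven mechanism mentioned in the excerpt can be pictured with a toy in-memory store. This is only a sketch under stated assumptions: `ToyZNodeStore`, `set_data`, and `get_data` are hypothetical names for illustration, not the ZooKeeper client API, and the only behavior modeled is that a watch fires once on the next change and is then cleared, much like a cache invalidation.

```python
class ToyZNodeStore:
    """Hypothetical in-memory stand-in for a znode tree (not the real ZooKeeper API)."""

    def __init__(self):
        self._data = {}      # path -> bytes
        self._watches = {}   # path -> list of one-shot callbacks

    def set_data(self, path, value):
        self._data[path] = value
        # Fire and clear every watch on this path: watches are one-shot.
        for callback in self._watches.pop(path, []):
            callback(path)

    def get_data(self, path, watch=None):
        # Reading never blocks; optionally register a watch for the next change.
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self._data.get(path)


store = ToyZNodeStore()
store.set_data("/app/config", b"v1")

notifications = []
cached = store.get_data("/app/config", watch=notifications.append)

store.set_data("/app/config", b"v2")  # fires the watch once
store.set_data("/app/config", b"v3")  # no watch is registered any more
```

A client that wants to keep following the configuration would re-read the node, passing a new watch, each time it is notified.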
2. Predictive and Adaptive Failure Mitigation to Avert Production Cloud VM Interruptions
Updated 2025-05-16 | distributed_sys
When a failure occurs in production systems, the highest priority is to quickly mitigate it. Failure Mitigation (FM) is done in a reactive and ad-hoc way, namely taking some fixed actions only after a severe symptom is observed. The paper proposes a preventive and adaptive failure mitigation service, Narya, that is integrated in a production cloud, Microsoft Azure’s compute platform. Narya predicts imminent host failures based on multi-layer system signals, then decides smart mitigation actions go ...
1. AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
Updated 2025-05-16 | distributed_sys
AI for IT Operations (AIOps) aims to automate complex operational tasks, such as fault localization and root cause analysis. Traditional approaches: addressing isolated operational tasks. LLMs and AI agents: enabling end-to-end and multitask automation. Target: self-healing cloud systems, a paradigm we term AgentOps. AIOpsLab: a framework that not only deploys micro-service cloud environments, injects faults, generates workloads, and exports telemetry data but also orchestrates these components and provid ...
0. Hadoop Distributed File System
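Updated 2025-06-23 | distributed_sys
Consistency. CAP consistency: all nodes have the same view of the same data at the same moment. Transaction consistency: the database must be in a valid state before a transaction starts and after it ends. Consistency models in data replication are listed in the table below.
Consistency type | Definition | Characteristics
Strong Consistency | Every read always returns the most recently written data | Behaves like a single machine; simple from the user's point of view but costly in performance
Linearizability | Operation results appear to be ordered by global time | A stricter form of strong consistency
Sequential Consistency | All nodes see operations in the same order, but global real-time order is not guaranteed | Slightly weaker; the order readers observe may differ from the real-time write order, as long as it is consistent
Causal Consistency | If one operation is caused by another, they must be applied in causal order | Unrelated operations may be reordered, which improves concurrency
Session Consistency | All operations by one client within one session are sequentially consistent | Better user experience; suits systems with transient connections, such as mobile clients
Eventual ...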
2. The Design of a Practical System for Fault-Tolerant Virtual Machines
Updated 2025-05-11 | distributed_sys
1. Kafka
Updated 2025-05-12 | distributed_sys
Introduction. Event streaming: the practice of capturing data in real time from event sources like databases, sensors, mobile devices, cloud services, and software applications in the form of streams of events, and routing the event streams to different destination technologies as needed. It ensures a continuous flow and interpretation of data so that the right information is at the right place, at the right time. Kafka’s event stream purpose: To publish (write) and subscribe to (read) streams of events, in ...
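To make "publish (write)" and "subscribe to (read)" concrete, here is a toy append-only log with per-consumer read positions. It is a minimal sketch, not the Kafka client API: `Topic`, `Consumer`, `publish`, and `poll` are hypothetical names, and everything lives in one process's memory.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy append-only event log (hypothetical, in-memory)."""
    events: list = field(default_factory=list)

    def publish(self, event) -> int:
        # Append the event and return its position (offset) in the log.
        self.events.append(event)
        return len(self.events) - 1

    def read(self, offset: int, max_events: int = 10) -> list:
        # Reading does not remove events; any reader can start from any offset.
        return self.events[offset:offset + max_events]


@dataclass
class Consumer:
    """Each consumer tracks its own offset, so readers are independent."""
    topic: Topic
    offset: int = 0

    def poll(self) -> list:
        batch = self.topic.read(self.offset)
        self.offset += len(batch)
        return batch


payments = Topic()
payments.publish({"user": "alice", "amount": 30})
payments.publish({"user": "bob", "amount": 12})

billing = Consumer(payments)
audit = Consumer(payments)

first_batch = billing.poll()                      # both events so far
payments.publish({"user": "carol", "amount": 7})
second_batch = billing.poll()                     # only the new event
audit_batch = audit.poll()                        # audit starts from offset 0
```

Because reads never consume events, the late-starting `audit` consumer still sees the full stream, while `billing` only receives what arrived since its last poll.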
1. Map Reduce, Simplified Data Processing on Large Clusters
Updated 2025-05-09 | distributed_sys
Link for this paper: mapreduce. Link for the MIT 6.824 lecture: lecture 1. "MapReduce is a programming model and an associated implementation for processing and generating large data sets." Contributions: runs on commodity machines; scalable, processing large amounts of data (many terabytes) on thousands of machines; programmers do not need much knowledge of parallel programming, so it is easy to use; hides the details of parallelization, fault tolerance, locality optimization, and load balancing; many problems are easily ...
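As an illustration of the programming model, the sketch below runs the classic word-count example in a single process. It is an assumption-laden toy: `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical names, and the parallelization, fault tolerance, and locality handling that the real system hides are not modeled at all.

```python
from collections import defaultdict


def map_words(doc_name, text):
    """Map: emit (word, 1) for every word in one input document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce: combine all values emitted for one key."""
    return sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Single-process driver: map, group by key (the 'shuffle'), then reduce."""
    groups = defaultdict(list)
    for name, text in inputs.items():
        for key, value in map_fn(name, text):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


docs = {"a.txt": "the quick fox", "b.txt": "the lazy dog the end"}
word_counts = run_mapreduce(docs, map_words, reduce_counts)
```

The user writes only the two small functions; the driver owns the grouping step, which is the part a distributed implementation spreads across machines.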
9. Digital Forensics
Updated 2025-04-03 | cybersecurity | cyber_security
Digital Forensics Process. Four core stages. Identification: determine the physical/digital objects that hold key data (computers, hard drives, mobile devices, external media, etc.). Collection: protect evidence integrity, establish a chain of custody, and record the hash of the image to verify provenance. Analysis: examine file systems, logs, memory, etc., and recover deleted files or residual data in slack space. Reporting: produce an expert report and provide legal evidence. Data Collection & Preservation. Data sources: computer, other harddrive, monitor, keyboard and mouse, media (DVD, CD, USB), printer. Digital forensics did not replace traditional (physical) ...
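The "hash of image" step in the collection stage can be sketched as follows. This is a minimal illustration, assuming SHA-256 as the digest (the post does not name an algorithm); `sha256_of_file` and `verify_image` are hypothetical helper names, and a temporary file stands in for a real disk image.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path, recorded_hash):
    """Return True if the image still matches the hash recorded at collection."""
    return sha256_of_file(path) == recorded_hash


# Stand-in for an acquired disk image (assumption: real images are far larger).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example disk image bytes")
    image_path = tmp.name

recorded = sha256_of_file(image_path)             # recorded at collection time
unchanged = verify_image(image_path, recorded)    # later integrity check

with open(image_path, "ab") as f:                 # simulate tampering
    f.write(b"!")
tampered_detected = not verify_image(image_path, recorded)

os.remove(image_path)
```

Any change to the image, even one byte, yields a different digest, which is why the recorded hash travels with the chain-of-custody record.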
Yuchen You (Wesley)
Announcement
This is my Blog
Recent Post
1. Fluid Pressure (2026-01-15)
0. Basic Search Engine Concepts (2026-01-15)
0. Basic GPU Architecture (2026-01-14)
0. Introduction to Fluid Dynamics (2026-01-13)
1. transformer (2026-01-11)
Categories
  • ai (21)
  • cs_basic (26)
  • cybersecurity (15)
  • eecs281 (8)
  • math (68)
  • mlsys (1)
  • physics (21)
Tags
thermal distributed_sys p_np mlsys deep_learning structure schedule fluid cuda operating_system information_theory Consensus algorithm computer_composition sql complex_analysis gpu machine_learning attention cv container multi-variable_function network dynamics ODE cpp_basic linear_algebra statistics mse database logic field discrete_math computability golang system_failure virtual_machine transformer dynamic kernel
Archives
  • January 2026 (5)
  • December 2025 (4)
  • November 2025 (3)
  • October 2025 (5)
  • September 2025 (23)
  • August 2025 (3)
  • July 2025 (9)
  • June 2025 (3)
©2020 - 2026 By Yuchen You (Wesley)
Framework Hexo|Theme Butterfly
welcome to my blog!