Architecture analysis of Taobao (group)
The system architecture of a mature, large-scale website such as Taobao or JD.com has to account for many factors. A site like Taobao handles far more data than an ordinary website, so its architecture is correspondingly more complex and has to balance cost against access speed and security. Here I give a brief analysis of Taobao's system architecture.
As a large shopping site, Taobao handles a very large volume of data, so unlike an ordinary website it relies on several techniques to keep its servers running reliably and to give shoppers a good experience. The main ones are:
1. Separating applications, data, and files
2. Using caching to improve website performance
3. Using a CDN and a reverse proxy to improve access speed
4. Using a distributed file system
5. Splitting application servers by business function
First, for a large shopping site a single server can no longer meet the performance requirements, so the application, the database, and the files are each deployed on separate servers, and each server's hardware is configured for its role to get the best performance. Alongside hardware tuning, performance is also improved in software. Most websites use caching for this, and caching pays off because of hot data: most website traffic follows the 80/20 rule (80% of requests end up hitting 20% of the data), so caching the hot data shortens the access path for most requests and improves the user experience.
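To make the 80/20 argument concrete, here is a minimal sketch of caching only the hot data. Everything in it is an assumption made for illustration: the `DATABASE` dictionary stands in for a real database, the `CacheAside` class and `build_requests` helper are hypothetical names, and the workload is constructed so that exactly 80% of requests target 20% of the items. It is not Taobao's actual implementation.

```python
from collections import Counter

# Hypothetical backing store standing in for the database (100 items).
DATABASE = {f"item-{i}": f"details of item {i}" for i in range(100)}


class CacheAside:
    """Check the cache first and fall back to the database on a miss."""

    def __init__(self, hot_keys):
        # Assumption: only keys known to be hot are admitted to the cache.
        self.hot_keys = set(hot_keys)
        self.cache = {}
        self.stats = Counter()

    def get(self, key):
        if key in self.cache:
            self.stats["cache_hit"] += 1
            return self.cache[key]
        # Cache miss: take the longer path to the database.
        self.stats["db_read"] += 1
        value = DATABASE[key]
        if key in self.hot_keys:
            self.cache[key] = value
        return value


def build_requests(total=1000):
    """Deterministic workload: 4 of every 5 requests go to the 20 hot items."""
    requests = []
    hot_i = cold_i = 0
    for i in range(total):
        if i % 5 != 4:
            requests.append(f"item-{hot_i % 20}")        # hot 20% of the items
            hot_i += 1
        else:
            requests.append(f"item-{20 + cold_i % 80}")  # remaining 80% of the items
            cold_i += 1
    return requests


if __name__ == "__main__":
    store = CacheAside(hot_keys=[f"item-{i}" for i in range(20)])
    for key in build_requests():
        store.get(key)
    print(dict(store.stats))
```

With this constructed workload, 780 of the 1,000 requests are answered from the cache, and the database is read only 220 times (20 first-time loads of hot items plus 200 cold requests), even though the cache holds just 20 of the 100 items.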
Because Taobao's functionality is complex and user access paths are long, Taobao caches this hot data to speed up access. The two common ways to implement a cache are local cache and distributed cache. A local cache, as the name suggests, stores data on the application server itself, either in memory or in files. It is fast, but local space is limited, so the amount of data it can hold is limited too. A distributed cache can hold massive amounts of data and is easy to scale out. It is widely used on portal sites, although in principle it is slower than a local cache because each lookup crosses the network. In addition, load-balancing servers are deployed to spread requests and relieve the pressure on the main servers.
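The difference between the two kinds of cache can be sketched as a two-level lookup. This is only an illustration under stated assumptions: `LocalCache`, `DistributedCache`, and `lookup` are hypothetical names, each dictionary inside `DistributedCache` stands in for a remote cache server, and keys are assigned to nodes with simple modulo hashing.

```python
import hashlib
from collections import OrderedDict


class LocalCache:
    """Small in-process LRU cache: fast, but bounded by local memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


class DistributedCache:
    """Keys are spread over several nodes; each dict stands in for a remote server."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash, so the same key always maps to the same node.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def get(self, key):
        return self._node_for(key).get(key)

    def put(self, key, value):
        self._node_for(key)[key] = value


def lookup(key, local, remote, load_from_db):
    """Try the local cache, then the distributed cache, then the database."""
    value = local.get(key)
    if value is not None:
        return value, "local"
    value = remote.get(key)
    if value is not None:
        local.put(key, value)
        return value, "distributed"
    value = load_from_db(key)
    remote.put(key, value)
    local.put(key, value)
    return value, "database"
```

The local cache is deliberately tiny, so an item evicted from it can still be found in the distributed cache without going back to the database. Modulo hashing is the simplest way to pick a node, but it remaps most keys when a node is added or removed, which is why consistent hashing is a common alternative.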
A CDN and a reverse proxy are used to improve website performance. Taobao's servers cannot be placed in every part of China, so requests from users in different regions travel through interconnected routers over paths of different lengths to reach the servers, and the responses travel the same way back, which makes data transfer slow. A CDN is the usual solution to this problem: it caches content in the ISPs' data centers, and users fetch data from the nearest one first, which greatly shortens the network path.
Here is a brief introduction to how a CDN works. CDN stands for Content Delivery Network. A CDN is a strategically deployed system that addresses the root causes of slow website response: limited network bandwidth, heavy user traffic, and unevenly distributed sites. It adds a new layer of network architecture on top of the existing Internet and publishes a website's content to the network "edge" closest to users, so that users can fetch what they need from a nearby location. This relieves Internet congestion and improves response times. A CDN combines several technologies, the most important being the origin server, the cache servers, and smart DNS.
Origin Server
The origin server is the original site where content is published. Files are added, deleted, and modified on the origin server, and every object that the cache servers fetch also comes from it.
Cache Server
Cache servers are the nodes that users actually access, and each cache site consists of one or more servers. When a user makes a request, smart DNS directs it to a cache server close to that user. If the requested content is already in the cache, it is returned to the user directly. If it is not, the cache server fetches it from a neighboring cache server or directly from the origin server, and then returns it to the user.
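The hit and miss behaviour described above can be sketched as follows. The `Origin` and `EdgeCache` classes, the node names, and the file path are all hypothetical, and details a real CDN needs (expiry, invalidation, network calls) are left out.

```python
class Origin:
    """Stands in for the origin server, where all content is published."""

    def __init__(self, files):
        self.files = dict(files)
        self.fetch_count = 0  # how many times caches had to come back here

    def fetch(self, path):
        self.fetch_count += 1
        return self.files[path]


class EdgeCache:
    """A cache server that users are directed to."""

    def __init__(self, name, origin, neighbors=None):
        self.name = name
        self.origin = origin
        self.neighbors = neighbors or []
        self.store = {}

    def serve(self, path):
        """Return (content, source) for a user request."""
        # Hit: the content is already cached here.
        if path in self.store:
            return self.store[path], "hit"
        # Miss: try neighboring cache servers before the origin.
        for neighbor in self.neighbors:
            if path in neighbor.store:
                self.store[path] = neighbor.store[path]
                return self.store[path], f"neighbor:{neighbor.name}"
        # Last resort: fetch from the origin server and keep a copy.
        content = self.origin.fetch(path)
        self.store[path] = content
        return content, "origin"


if __name__ == "__main__":
    origin = Origin({"/logo.png": b"PNG-bytes"})
    guangzhou = EdgeCache("guangzhou", origin)
    shenzhen = EdgeCache("shenzhen", origin, neighbors=[guangzhou])
    print(guangzhou.serve("/logo.png")[1])  # first request goes to the origin
    print(guangzhou.serve("/logo.png")[1])  # second request is a cache hit
    print(shenzhen.serve("/logo.png")[1])   # served from the neighboring cache
```

In this run the origin server is contacted only once, even though three requests are served from two different cache servers.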
Smart DNS
Smart DNS is the core of CDN technology. Based on where a user's request comes from, it directs the request to a cache server close to that user. For example, requests from Guangzhou Telecom users are directed to the cache servers in Guangzhou Telecom's IDC data center. Because smart DNS resolution sends users to servers on their own carrier's network, it avoids the slow access between China's northern and southern networks and so speeds up access. Smart DNS has overturned the traditional practice of mapping one domain name to one mirror, and it makes websites more convenient for users to reach.
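The routing decision made by smart DNS can be sketched as a table lookup. This is a simplified illustration: the `EDGE_SERVERS` table, the `resolve` function, and the fallback order are assumptions, the addresses come from IP ranges reserved for documentation, and the user's region and carrier are assumed to be already known (a real system would have to infer them, typically from the address of the resolver making the query).

```python
# Hypothetical routing table: (region, carrier) -> cache server address.
EDGE_SERVERS = {
    ("guangzhou", "telecom"): "203.0.113.10",
    ("beijing", "telecom"): "203.0.113.20",
    ("beijing", "unicom"): "198.51.100.20",
}

# Hypothetical fallback used when no suitable cache server is known.
DEFAULT_SERVER = "192.0.2.1"


def resolve(region, carrier):
    """Pick the cache server a user from this region and carrier should use."""
    # 1. Best case: a cache server in the same region on the same carrier.
    server = EDGE_SERVERS.get((region, carrier))
    if server is not None:
        return server
    # 2. Otherwise stay on the same carrier to avoid cross-carrier traffic.
    for (_, candidate_carrier), address in sorted(EDGE_SERVERS.items()):
        if candidate_carrier == carrier:
            return address
    # 3. Nothing suitable: fall back to the default server.
    return DEFAULT_SERVER


if __name__ == "__main__":
    print(resolve("guangzhou", "telecom"))  # same region, same carrier
    print(resolve("shanghai", "telecom"))   # same carrier, different region
    print(resolve("shanghai", "mobile"))    # falls back to the default
```

The point of the sketch is that the same domain name resolves to different addresses for different users, which is what distinguishes this from a fixed one-domain-to-one-mirror mapping.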
As the business grows further, the application needs to be split by business function. Each business application is responsible for a relatively independent part of the business (so separate services are set up for the different business areas), and the applications cooperate either by exchanging messages or by sharing a database. In this arrangement, business servers are attached behind the load-balancing server, with each group handling one area such as users, orders, or payments. A server for one business (B) communicates in real time with a server for another business (A) through a message queue server in order to share data. There are many such A and B servers, and any of them can share data with the others in this way.
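Communication between split business services through a message queue can be sketched like this. The `OrderService` and `PaymentService` classes and the message fields are hypothetical, and Python's in-process `queue.Queue` stands in for a real message queue server, so this shows only the shape of the interaction.

```python
import queue


class OrderService:
    """Owns order data and announces new orders on the message queue."""

    def __init__(self, bus):
        self.bus = bus
        self.orders = {}

    def place_order(self, order_id, user, amount):
        self.orders[order_id] = {"user": user, "amount": amount, "status": "created"}
        # Publish a message instead of calling the payment service directly.
        self.bus.put({"type": "order_created", "order_id": order_id, "amount": amount})


class PaymentService:
    """Owns payment data and reacts to messages from other services."""

    def __init__(self, bus):
        self.bus = bus
        self.charged = {}

    def drain(self):
        """Process every message currently waiting on the queue."""
        while True:
            try:
                message = self.bus.get_nowait()
            except queue.Empty:
                return
            if message["type"] == "order_created":
                self.charged[message["order_id"]] = message["amount"]


if __name__ == "__main__":
    bus = queue.Queue()  # stand-in for the message queue server
    orders = OrderService(bus)
    payments = PaymentService(bus)
    orders.place_order("o-1", "alice", 99)
    orders.place_order("o-2", "bob", 25)
    payments.drain()
    print(payments.charged)
```

Neither service holds a reference to the other; they share only the queue. That is what lets each business area be deployed and scaled on its own servers.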
After the adjustment of the organizational structure of Taobao and Tmall, three new centers were established to support the development and integration of Taobao and Tmall. Specifically, the three new centers are: Consumer Experience Center, Merchant Service Center and Digital Technology Center. Among them, the Consumer Experience Center is mainly responsible for improving consumers' shopping experience; the Merchant Service Center is mainly responsible for improving the service quality of merchants; and the Digital Technology Center is mainly responsible for promoting digital transformation.