Category: Internet / Computers

https://www.vox.com/2014/6/16/18076282/the-internet

https://web.stanford.edu/class/msande91si/www-spr04/readings/week1/InternetWhitepaper.htm

What is the internet?

How does the internet work?

The internet is the world’s most popular computer network. It began as an academic research project in 1969, and became a global commercial network in the 1990s. Today it is used by more than 2 billion people around the world.

The internet is notable for its decentralization. No one owns the internet or controls who can connect to it. Instead, thousands of different organizations operate their own networks and negotiate voluntary interconnection agreements.

Most people access internet content using a web browser. Indeed, the web has become so popular that many people incorrectly treat the internet and the web as synonymous. But in reality, the web is just one of many internet applications. Other popular Internet applications include email and BitTorrent.

Where is the internet? The internet has three basic parts:

The last mile is the part of the internet that connects homes and small businesses to the internet. Currently, about 60 percent of residential internet connections in the United States are provided by cable TV companies such as Comcast and Time Warner. Of the remaining 40 percent, a growing fraction use new fiber optic cables, most of which are part of Verizon’s FiOS program or AT&T’s U-Verse. Finally, a significant but shrinking number use outdated DSL service provided over telephone cables.

The last mile also includes the towers that allow people to access the internet with their cell phones. Wireless internet service accounts for a large and growing share of all internet usage.

Data centers are rooms full of servers that store user data and host online apps and content. Some are owned by large companies such as Google and Facebook. Others are commercial facilities that provide service to many smaller websites. Data centers have very fast internet connections, allowing them to serve many users simultaneously. Data centers can be located anywhere in the world, but they are often located in remote areas where land and electricity are cheap. For example, Google, Facebook, and Microsoft have all constructed vast data centers in Iowa.

The backbone consists of long-distance networks — mostly on fiber optic cables — that carry data between data centers and consumers. The backbone market is highly competitive. Backbone providers frequently connect their networks together at internet exchange points, usually located in major cities. Establishing a presence at these exchange points makes it much easier for backbone providers to improve their connections to others.

Who created the internet? The internet began as ARPANET, an academic research network that was funded by the military’s Advanced Research Projects Agency (ARPA, now DARPA). The project was led by Bob Taylor, an ARPA administrator, and the network was built by the consulting firm of Bolt, Beranek and Newman. It began operations in 1969.

In 1973, software engineers Vint Cerf and Bob Kahn began work on the next generation of networking standards for the ARPANET. These standards, known as TCP/IP, became the foundation of the modern internet. ARPANET switched to using TCP/IP on January 1, 1983.

During the 1980s, funding for the internet shifted from the military to the National Science Foundation. The NSF funded the long-distance networks that served as the internet’s backbone from 1981 until 1994. In 1994, the Clinton Administration turned control of the internet backbone over to the private sector. It has been privately operated and funded ever since.

Who runs the internet? No one runs the internet. It’s organized as a decentralized network of networks. Thousands of companies, universities, governments, and other entities operate their own networks and exchange traffic with each other based on voluntary interconnection agreements.

The shared technical standards that make the internet work are managed by an organization called the Internet Engineering Task Force. The IETF is an open organization; anyone is free to attend meetings, propose new standards, and recommend changes to existing standards. No one is required to adopt standards endorsed by the IETF, but the IETF’s consensus-based decision-making process helps to ensure that its recommendations are generally adopted by the internet community.

The Internet Corporation for Assigned Names and Numbers (ICANN) is sometimes described as being responsible for internet governance. As its name implies, ICANN is in charge of distributing domain names (like vox.com) and IP addresses. But ICANN doesn’t control who can connect to the internet or what kind of information can be sent over it.

What’s an IP address? Internet Protocol addresses are numbers that computers use to identify each other on the internet. For example, an IP address for vox.com is 216.146.46.10.

An ICANN department known as the Internet Assigned Numbers Authority is responsible for distributing IP addresses to ensure that two different organizations don’t use the same address.

https://royal.pingdom.com/2009/05/26/the-number-of-possible-ipv6-addresses-read-out-loud/

What is IPv6? The current internet standard, known as IPv4, only allows for about 4 billion IP addresses. This was considered a very big number in the 1970s, but today the supply of IPv4 addresses is nearly exhausted.

So internet engineers have developed a new standard called IPv6. IPv6 allows for a mind-boggling number of unique addresses — the exact figure is 39 digits long — ensuring that the world will never again run out.
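The difference in scale is easy to verify with a quick calculation; a sketch in Python:

```python
# Compare the IPv4 and IPv6 address spaces.
ipv4_total = 2 ** 32    # IPv4 addresses are 32 bits long
ipv6_total = 2 ** 128   # IPv6 addresses are 128 bits long

print(ipv4_total)            # 4294967296 (about 4 billion)
print(len(str(ipv6_total)))  # 39 -- the exact figure is 39 digits long
```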

At first, the transition to IPv6 was slow. Technical work on the standard was completed in the 1990s, but the internet community faced a serious chicken-and-egg problem: as long as most people were using IPv4, there was little incentive for anyone to switch to IPv6.

But as IPv4 addresses became scarce, IPv6 adoption accelerated. The fraction of users who connected to Google via IPv6 grew from 1 percent at the beginning of 2013 to 6 percent in mid-2015.

How does wireless internet work? In its early years, internet access was carried over physical cables. But more recently, wireless internet access has become increasingly common.

There are two basic types of wireless internet access: wifi and cellular. Wifi networks are relatively simple. Anyone can purchase wifi networking equipment in order to provide internet access in a home or business. Wifi networks use unlicensed spectrum: electromagnetic frequencies that are available for anyone to use without charge. To prevent neighbors’ networks from interfering with each other, there are strict limits on the power (and therefore the range) of wifi networks.

Cellular networks are more centralized. They work by breaking up the service territory into cells. In the densest areas, cells can be as small as a single city block; in rural areas a cell can be miles across. Each cell has a tower at its center providing services to devices there. When a device moves from one cell to another, the network automatically hands off the device from one tower to another, allowing the user to continue communicating without interruption.

Cells are too large to use the unlicensed, low-power spectrum used by wifi networks. Instead, cellular networks use spectrum licensed for their exclusive use. Because this spectrum is scarce, it is usually awarded by auction. Wireless auctions have generated tens of billions of dollars in revenue for the US treasury since the first one was held in 1994.

What is the cloud? The cloud describes an approach to computing that became popular in the early 2000s. By storing files on servers and delivering software over the internet, cloud computing provides users with a simpler, more reliable computing experience. Cloud computing allows consumers and businesses to treat computing as a utility, leaving the technical details to technology companies.

For example, in the 1990s, many people used Microsoft Office to edit documents and spreadsheets. They stored documents on their hard drives. And when a new version of Microsoft Office was released, customers had to purchase it and manually install it on their PCs.

In contrast, Google Docs is a cloud office suite. When a user visits docs.google.com, she automatically gets the latest version of Google Docs. Because her files are stored on Google’s servers, they’re available from any computer. Even better, she doesn’t have to worry about losing her files in a hard drive crash. (Microsoft now has its own cloud office suite called Office 365.)

There are many other examples. Gmail and Hotmail are cloud email services that have largely replaced desktop email clients such as Outlook. Dropbox is a cloud computing service that automatically synchronizes data between devices, saving people from having to carry files around on floppy disks. Apple’s iCloud automatically copies users’ music and other files from their desktop computer to their mobile devices, saving users the hassle of synchronizing via a USB connection.

Cloud computing is having a big impact on businesses too. In the 1990s, companies wanting to create a website needed to purchase and operate their own servers. But in 2006, Amazon.com launched Amazon Web Services, which allows customers to rent servers by the hour. That has lowered the barrier to entry for creating websites and made it much easier for sites to quickly expand capacity as they grow more popular.

What is a packet? A packet is the basic unit of information transmitted over the internet. Splitting information up into small, digestible pieces allows the network’s capacity to be used more efficiently.

A packet has two parts. The header contains information that helps the packet get to its destination, including the length of the packet, its source and destination, and a checksum value that helps the recipient detect if a packet was damaged in transit. After the header comes the actual data. A packet can contain up to 64 kilobytes of data, which is roughly 20 pages of plain text.
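A minimal sketch of this header-plus-data layout, using a toy format invented for illustration (not the real IP header, and a simplistic checksum rather than the real Internet checksum):

```python
import struct

# Toy packet layout: 2-byte payload length, 4-byte source address,
# 4-byte destination address, 2-byte checksum, then the payload.
HEADER = struct.Struct("!H4s4sH")

def make_packet(src: bytes, dst: bytes, payload: bytes) -> bytes:
    checksum = sum(payload) % 65536  # toy checksum for illustration
    return HEADER.pack(len(payload), src, dst, checksum) + payload

def parse_packet(packet: bytes):
    length, src, dst, checksum = HEADER.unpack(packet[:HEADER.size])
    payload = packet[HEADER.size:HEADER.size + length]
    if sum(payload) % 65536 != checksum:
        raise ValueError("packet damaged in transit")
    return src, dst, payload

pkt = make_packet(b"\x01\x02\x03\x04", b"\x05\x06\x07\x08", b"hello")
src, dst, payload = parse_packet(pkt)
print(payload)  # b'hello'
```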

If internet routers experience congestion or other technical problems, they are allowed to deal with it by simply discarding packets. It’s the sending computer’s responsibility to detect that a packet didn’t reach its destination and send another copy. This approach might seem counterintuitive, but it simplifies the internet’s core infrastructure, leading to higher performance at lower cost.

What is the World Wide Web? The World Wide Web is a popular way to publish information on the internet. The web was created by Timothy Berners-Lee, a computer programmer at the European scientific research organization CERN, in 1991. It offered a more powerful and user-friendly interface than other internet applications. The web supported hyperlinks, allowing users to browse from one document to another with a single click.

Over time, the web became increasingly sophisticated, supporting images, audio, video, and interactive content. In the mid-1990s, companies such as Yahoo and Amazon.com began building profitable businesses based on the web. In the 2000s, full-featured web-based applications such as Yahoo Maps and Google Docs were created.

In 1994, Berners-Lee created the World Wide Web Consortium (W3C) to be the web’s official standards organization. He is still the W3C’s director and continues to oversee the development of web standards. However, the web is an open platform, and the W3C can’t compel anyone to adopt its recommendations. In practice, the organizations with the most influence over the web are Microsoft, Google, Apple, and Mozilla, the companies that produce the leading web browsers. Any technologies adopted by these four become de facto web standards.

The web has become so popular that many people now regard it as synonymous with the internet itself. But technically, the web is just one of many internet applications. Other applications include email and BitTorrent.

What’s a web browser? A web browser is a computer program that allows users to download and view websites. Web browsers are available for desktop computers, tablets, and mobile phones.

The first widely used browser was Mosaic, created by researchers at the University of Illinois. The Mosaic team moved to California to found Netscape, which built the first commercially successful web browser in 1994.

Netscape’s popularity was soon eclipsed by Microsoft’s Internet Explorer, but an open source version of Netscape’s browser became the modern Firefox browser. Apple released its Safari browser in 2003, and Google released a browser called Chrome in 2008. By 2015, Chrome had grown to be the most popular web browser, with a market share around 50 percent. Internet Explorer, Firefox, and Safari also had significant market share.

What is SSL? SSL, short for Secure Sockets Layer, is a family of encryption technologies that allows web users to protect the privacy of information they transmit over the internet.

When you visit a secure website such as Gmail.com, you’ll see a lock next to the URL in your browser, indicating that your communications with the site are encrypted.

That lock is supposed to signal that third parties won’t be able to read any information you send or receive. Under the hood, SSL accomplishes that by transforming your data into a coded message that only the recipient knows how to decipher. If a malicious party is listening to the conversation, it will only see a seemingly random string of characters, not the contents of your emails, Facebook posts, credit card numbers, or other private information.

SSL was introduced by Netscape in 1994. In its early years, it was only used on a few types of websites, such as online banking sites. By the early 2010s, Google, Yahoo, and Facebook all used SSL encryption for their websites and online services. More recently, there has been a movement toward making the use of SSL universal. In 2015, Mozilla announced that future versions of the Firefox browser would treat the lack of SSL encryption as a security flaw, as a way to encourage all websites to upgrade. Google is considering taking the same step with Chrome.

What is the Domain Name System? The Domain Name System (DNS) is the reason you can access Vox by typing vox.com into your browser rather than a hard-to-remember numeric address such as 216.146.46.10.

The system is hierarchical. For example, the .com domain is administered by a company called Verisign. Verisign assigns sub-domains like google.com and vox.com. Owners of these second-level domains, in turn, can create sub-domains such as mail.google.com and maps.google.com.

Because popular websites use domain names to identify themselves to the public, the security of DNS has become an increasing concern. Criminals and government spies alike have sought to compromise DNS in order to impersonate popular websites such as facebook.com and gmail.com and intercept their private communications. A standard called DNSSEC seeks to beef up DNS security with encryption, but few people have adopted it.

Who decides what domain names exist and who gets them? The domain name system is administered by the Internet Corporation for Assigned Names and Numbers (ICANN), a non-profit organization based in California. ICANN was founded in 1998. It was granted authority over DNS by the US Commerce Department, though it has increasingly asserted its independence from the US government.

There are two types of domain names. The first is generic top-level domains (gTLDs) such as .com, .edu, .org, and .gov. Because the internet originated in the United States, these domains tend to be most popular there. Authority over these domains is usually delegated to private organizations.

There are also country-code top-level domains (ccTLDs). Each country in the world has its own 2-letter code. For example, the ccTLD for the United States is .us, Great Britain’s is .uk, and China’s is .cn. These domains are administered by authorities in each country. Some ccTLDs, such as .tv (for the island nation of Tuvalu) and .io (the British Indian Ocean Territory), have become popular for use outside of their home countries.

In 2011, ICANN voted to make it easier to create new gTLDs. As a result, there may be dozens or even hundreds of new domains in the next few years.

https://www.ietf.org/

https://www.icann.org/

https://en.wikipedia.org/wiki/Internet_protocol_suite

How does the Internet Work?

The Internet works through a packet routing network in accordance with the Internet Protocol (IP), the Transmission Control Protocol (TCP), and other protocols.

What’s a protocol?

A protocol is a set of rules specifying how computers should communicate with each other over a network. For example, the Transmission Control Protocol has a rule that if one computer sends data to another computer, the destination computer should let the source computer know if any data was missing so the source computer can re-send it. Another example is the Internet Protocol, which specifies how computers should route information to other computers by attaching addresses to the data they send.

What’s a packet?

Data sent across the Internet is called a message. Before a message is sent, it is first split into many fragments called packets. These packets are sent independently of each other. The typical maximum packet size is between 1,000 and 3,000 characters. The Internet Protocol specifies how messages should be packetized.
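The splitting step can be sketched like this; the 1,500-byte limit below is a typical value chosen for illustration:

```python
# Sketch of packetization: split a message into numbered fragments.
MAX_PACKET_SIZE = 1500  # a typical maximum, in bytes

def packetize(message: bytes):
    """Return a list of (sequence_number, data) packets."""
    return [
        (seq, message[i:i + MAX_PACKET_SIZE])
        for seq, i in enumerate(range(0, len(message), MAX_PACKET_SIZE))
    ]

packets = packetize(b"A" * 4000)
print([(seq, len(data)) for seq, data in packets])  # [(0, 1500), (1, 1500), (2, 1000)]
```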

What’s a packet routing network?

It is a network that routes packets from a source computer to a destination computer. The Internet is made up of a massive network of specialized computers called routers. Each router’s job is to know how to move packets along from their source to their destination. A packet will have moved through multiple routers during its journey.

When a packet moves from one router to the next, it’s called a hop. You can use the command-line tool traceroute to see the list of hops packets take between you and a host.

Command-line utility traceroute showing all the hops between my computer and Google’s servers

The Internet Protocol specifies how network addresses should be attached to the packet’s headers, a designated space in the packet containing its meta-data. The Internet Protocol also specifies how the routers should forward the packets based on the address in the header.

Where did these Internet routers come from? Who owns them?

These routers originated in the 1960s with ARPANET, a military project whose goal was a decentralized computer network, so that the government could still access and distribute information in the event of a catastrophe. Since then, Internet Service Provider (ISP) corporations have added their own routers to the network.

There is no single owner of these Internet routers, but rather many owners: the government agencies and universities associated with ARPANET in the early days, and ISP corporations like AT&T and Verizon later on.

Asking who owns the Internet is like asking who owns all the telephone lines. No one entity owns them all; many different entities own parts of them.

Do the packets always arrive in order? If not, how is the message re-assembled?

The packets may arrive at their destination out of order. This happens when a later packet finds a quicker path to the destination than an earlier one. But each packet’s header contains information about the packet’s order relative to the entire message. The Transmission Control Protocol uses this information to reconstruct the message at the destination.
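A sketch of that reconstruction, using hypothetical (sequence number, data) pairs:

```python
def reassemble(packets):
    """Rebuild the original message from (sequence_number, data) packets
    that may have arrived out of order."""
    return b"".join(data for _seq, data in sorted(packets))

# Packets arriving out of order:
arrived = [(2, b" world"), (0, b"hello"), (1, b",")]
print(reassemble(arrived))  # b'hello, world'
```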

Do packets always make it to their destination?

The Internet Protocol makes no guarantee that packets will always arrive at their destinations. When they don’t, it’s called packet loss. This typically happens when a router receives more packets than it can process and has no option other than to drop some of them.

However, the Transmission Control Protocol handles packet loss by performing re-transmissions. It does this by having the destination computer periodically send acknowledgement packets back to the source computer, indicating how much of the message it has received and reconstructed. If the destination computer finds there are missing packets, it sends a request to the source computer asking it to resend them.
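The acknowledge-and-resend loop can be sketched as a toy simulation; this is a simplified stop-and-wait model over a randomly lossy link, not TCP's actual algorithm:

```python
import random

def send_with_retransmission(message_packets, loss_rate=0.3, seed=42):
    """Keep re-sending unacknowledged packets over a lossy link until the
    receiver has acknowledged every one (a toy stop-and-wait sketch)."""
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    received = {}
    attempts = 0
    while len(received) < len(message_packets):
        for seq, data in message_packets.items():
            if seq in received:
                continue                   # already acknowledged, skip
            attempts += 1
            if rng.random() > loss_rate:   # packet survived the trip
                received[seq] = data       # receiver ACKs this packet
    message = b"".join(received[seq] for seq in sorted(received))
    return message, attempts

msg, attempts = send_with_retransmission({0: b"he", 1: b"ll", 2: b"o"})
print(msg)  # b'hello'
```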

When two computers are communicating through the Transmission Control Protocol, we say there is a TCP connection between them.

What do these Internet addresses look like?

These addresses are called IP addresses and there are two standards.

The first address standard is called IPv4, and it looks like 212.78.1.25. But because IPv4 supports only 2³² (about 4 billion) possible addresses, the Internet Engineering Task Force proposed a new address standard called IPv6, which looks like 3ffe:1893:3452:4:345:f345:f345:42fc. IPv6 supports 2¹²⁸ possible addresses, enough for many more networked devices than the 8+ billion that were on the Internet as of 2017.

Because the two standards are so different in size, there is no one-to-one mapping between IPv4 and IPv6 addresses. Note that the switch from IPv4 to IPv6 is still in progress and will take a long time. As of 2014, Google revealed that its IPv6 traffic was only at 3%.

How can there be over 8 billion networked devices on the Internet if there are only about 4 billion IPv4 addresses?

It’s because there are public and private IP addresses. Multiple devices on a local network connected to the Internet will share the same public IP address. Within the local network, these devices are differentiated from each other by private IP addresses, typically of the form 192.168.x.x, 172.16.x.x, or 10.x.x.x, where x is a number between 0 and 255. These private IP addresses are assigned by the Dynamic Host Configuration Protocol (DHCP).
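Python's standard ipaddress module knows these reserved private ranges, which makes the distinction easy to check:

```python
import ipaddress

# Check which of these sample addresses fall in the private ranges.
checks = {
    addr: ipaddress.ip_address(addr).is_private
    for addr in ["192.168.1.5", "10.0.0.7", "172.16.4.1", "8.8.8.8"]
}
print(checks["192.168.1.5"], checks["8.8.8.8"])  # True False
```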

For example, if a laptop and a smartphone on the same local network both make a request to www.google.com, the modem modifies the packet headers before the packets leave the network, assigning one of its ports to each request. When the Google server responds, it sends data back to the modem at that specific port, so the modem knows whether to route the packets to the laptop or the smartphone.

In this sense, an IP address isn’t specific to a computer, but to the connection through which the computer reaches the Internet. The address that is unique to your computer is the MAC address, which is assigned to its network hardware and normally never changes.

This protocol of mapping private IP addresses to public IP addresses is called the Network Address Translation (NAT) protocol. It’s what makes it possible to support 8+ billion networked devices with only 4 billion possible IPv4 addresses.

How does the router know where to send a packet? Does it need to know where all the IP addresses are on the Internet?

No router needs to know where every IP address is. It only needs to know which of its neighbors, called outbound links, to route each packet to. Note that IP addresses can be broken down into two parts: a network prefix and a host identifier. For example, 129.42.13.69 can be broken down into

129.42 (the network prefix) and 13.69 (the host identifier)

All networked devices that connect to the Internet through a single connection (e.g. a college campus, a business, or an ISP in a metro area) will share the same network prefix.

Routers will send all packets of the form 129.42.*.* to the same location. So instead of keeping track of billions of IP addresses, routers only need to keep track of fewer than a million network prefixes.
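A toy routing table makes the idea concrete. The prefixes and link names below are made up for illustration; real routers also prefer the most specific matching prefix, which is sketched here as well:

```python
import ipaddress

# Toy routing table: network prefix -> outbound link (hypothetical names).
routes = {
    ipaddress.ip_network("129.42.0.0/16"): "link-A",
    ipaddress.ip_network("129.42.13.0/24"): "link-B",  # more specific prefix
    ipaddress.ip_network("0.0.0.0/0"): "default",      # matches everything
}

def route(address: str) -> str:
    """Pick the matching prefix with the longest prefix length."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in routes if ip in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]

print(route("129.42.13.69"))  # link-B (most specific match wins)
print(route("129.42.99.1"))   # link-A
print(route("8.8.8.8"))       # default
```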

But a router still needs to know a lot of network prefixes. If a new router is added to the Internet, how does it know how to handle packets for all these network prefixes?

A new router may come with a few preconfigured routes. But if it encounters a packet it does not know how to route, it queries one of its neighboring routers. If the neighbor knows how to route the packet, it sends that info back to the requesting router. The requesting router will save this info for future use. In this way, a new router builds up its own routing table, a database of network prefixes to outbound links. If the neighboring router does not know, it queries its neighbors and so on.

How do networked computers figure out IP addresses based on domain names?

We call looking up the IP address of a human-readable domain name like www.google.com “resolving the IP address.” Computers resolve IP addresses through the Domain Name System (DNS), a decentralized database of mappings from domain names to IP addresses.

To resolve an IP address, the computer first checks its local DNS cache, which stores the IP addresses of websites it has visited recently. If it can’t find the IP address there, or the record has expired, it queries the ISP’s DNS servers, which are dedicated to resolving IP addresses. If the ISP’s DNS servers can’t resolve the IP address, they query the root name servers, which can point them to the name servers responsible for any given top-level domain. Top-level domains are the words to the right of the right-most period in a domain name; .com, .net, and .org are some examples.
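The local-cache step can be sketched as a tiny class; the domain and address below are illustrative, and a real stub resolver would fall back to querying the ISP's DNS servers on a miss:

```python
import time

class DnsCache:
    """Toy local DNS cache mapping domain -> (ip, expiry time)."""

    def __init__(self):
        self._cache = {}

    def get(self, domain):
        entry = self._cache.get(domain)
        if entry is None:
            return None                  # never resolved: must query upstream
        ip, expires = entry
        if time.time() >= expires:
            del self._cache[domain]      # record expired: must re-query
            return None
        return ip

    def put(self, domain, ip, ttl=300):
        self._cache[domain] = (ip, time.time() + ttl)

cache = DnsCache()
cache.put("www.example.com", "93.184.216.34", ttl=300)
print(cache.get("www.example.com"))  # 93.184.216.34
print(cache.get("unknown.example"))  # None
```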

How do applications communicate over the Internet?

Like many other complex engineering projects, the Internet is broken down into smaller independent components, which work together through well-defined interfaces. These components are called the Internet network layers, and they consist of the Link Layer, the Internet Layer, the Transport Layer, and the Application Layer. These are called layers because they are built on top of each other; each layer uses the capabilities of the layers beneath it without worrying about their implementation details.

Internet applications work at the Application Layer and don’t need to worry about the details in the underlying layers. For example, an application connects to another application on the network via TCP using a construct called a socket, which abstracts away the gritty details of routing packets and re-assembling packets into messages.
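A minimal sketch of the socket abstraction, assuming Python's standard socket module: a client sends a message to a server over localhost and reads the echoed reply, with no packet-level details in sight.

```python
import socket
import threading

def echo_once(server_sock):
    """Accept one client connection and echo its first message back."""
    conn, _addr = server_sock.accept()
    conn.sendall(conn.recv(1024))
    conn.close()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
t = threading.Thread(target=echo_once, args=(server,))
t.start()

client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"ping")
reply = client.recv(1024)       # TCP reassembly happens under the hood
client.close()
t.join()
server.close()
print(reply)  # b'ping'
```

Note that neither side ever touches a packet: the socket hands complete byte streams to the application, which is exactly the abstraction described above.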

What do each of these Internet layers do?

At the lowest level is the Link Layer, the “physical layer” of the Internet. The Link Layer is concerned with transmitting data bits through some physical medium, like fiber-optic cables or wifi radio signals.

On top of the Link Layer is the Internet Layer. The Internet Layer is concerned with routing packets to their destinations. The Internet Protocol mentioned earlier lives in this layer (hence the name). The Internet Protocol dynamically adjusts and reroutes packets based on network load or outages. Note that it does not guarantee packets always make it to their destination; it just tries the best it can.

On top of the Internet Layer is the Transport Layer. This layer compensates for the fact that data can be lost in the Internet and Link Layers below. The Transmission Control Protocol mentioned earlier lives at this layer, and it works primarily to re-assemble packets into their original messages and to re-transmit packets that were lost.

The Application Layer sits on top. This layer uses all the layers below to handle the complex details of moving packets across the Internet. It lets applications easily make connections with other applications on the Internet through simple abstractions like sockets. The HTTP protocol, which specifies how web browsers and web servers should interact, lives in the Application Layer. So does the IMAP protocol, which specifies how email clients should retrieve email, and the FTP protocol, which specifies how files are transferred between file-downloading clients and file-hosting servers.
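As a sketch of what an Application Layer protocol looks like on the wire, here is the text a browser might send for a minimal HTTP/1.1 GET request (the host name is a placeholder; header lines follow the HTTP convention of CRLF line endings and a blank line ending the header section):

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Compose the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

request = build_get_request("www.example.com")
print(request.decode().splitlines()[0])  # GET / HTTP/1.1
```

These bytes would be written into a TCP socket; everything below the Application Layer (packetizing, routing, retransmission) is handled by the lower layers.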

What’s a client versus a server?

While clients and servers are both applications that communicate over the Internet, clients are closer to the user, in that they are the user-facing applications: web browsers, email clients, or smartphone apps. Servers are applications running on a remote computer, which the client communicates with over the Internet when it needs to.

A more formal definition is that the application that initiates a TCP connection is the client, while the application that receives the TCP connection is the server.

How can sensitive data like credit cards be transmitted securely over the Internet?

In the early days of the Internet, it was enough to ensure that the network routers and links are in physically secure locations. But as the Internet grew in size, more routers meant more points of vulnerability. Furthermore, with the advent of wireless technologies like WiFi, hackers could intercept packets in the air; it was not enough to just ensure the network hardware was physically safe. The solution to this was encryption and authentication through SSL/TLS.

What is SSL/TLS?

SSL stands for Secure Sockets Layer. TLS stands for Transport Layer Security. SSL was first developed by Netscape in 1994; a later, more secure version was devised and renamed TLS. We will refer to them together as SSL/TLS.

SSL/TLS is an optional layer that sits between the Transport Layer and the Application Layer. It allows secure Internet communication of sensitive information through encryption and authentication.

Encryption means the client can request that the TCP connection to the server be encrypted. All messages sent between client and server are then encrypted before being broken into packets. Even if hackers intercept these packets, they cannot reconstruct the original message.

Authentication means the client can trust that the server is who it claims to be. This protects against man-in-the-middle attacks, which is when a malicious party intercepts the connection between client and server to eavesdrop and tamper with their communication.

We see SSL in action whenever we visit SSL-enabled websites on modern browsers. When the browser requests a web site using the https protocol instead of http, it’s telling the web server it wants an SSL encrypted connection. If the web server supports SSL, a secure encrypted connection is made and we would see a lock icon next to the address bar on the browser.

The medium.com web server is SSL-enabled. The browser can connect to it over https to ensure that communication is encrypted. The browser is also confident it is communicating with a real medium.com server, and not a man-in-the-middle.

How does SSL authenticate the identity of a server and encrypt their communication?

It uses asymmetric encryption and SSL certificates.

Asymmetric encryption is an encryption scheme that uses a public key and a private key. These keys are essentially very large numbers derived from large primes. The private key is used to decrypt data and sign documents. The public key is used to encrypt data and verify signed documents. Unlike symmetric encryption, asymmetric encryption means the ability to encrypt does not automatically confer the ability to decrypt. This property rests on results from a branch of mathematics called number theory.
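A textbook RSA sketch illustrates the idea, using tiny primes. Real keys use primes hundreds of digits long, so this is an illustration of the mathematics only, not a secure implementation:

```python
# Textbook RSA with toy primes (illustration only, not secure).
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)    # Euler's totient of n
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent: modular inverse of e mod phi

def encrypt(m: int) -> int:
    """Anyone who knows the public key (n, e) can encrypt."""
    return pow(m, e, n)

def decrypt(c: int) -> int:
    """Only the holder of the private key (n, d) can decrypt."""
    return pow(c, d, n)

message = 65
ciphertext = encrypt(message)
print(ciphertext != message)  # True: the ciphertext looks unrelated to 65
print(decrypt(ciphertext))    # 65: round-trips back to the original
```

Knowing (n, e) lets anyone encrypt, but recovering d requires factoring n into p and q, which is infeasible for real key sizes; that asymmetry is the number-theoretic heart of the scheme.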

An SSL certificate is a digital document that contains a public key assigned to a web server. These SSL certificates are issued to the server by certificate authorities. Operating systems, mobile devices, and browsers come with a database of trusted certificate authorities so they can verify SSL certificates.

When a client requests an SSL-encrypted connection with a server, the server sends back its SSL certificate. The client checks that the SSL certificate

  • is issued to this server
  • is signed by a trusted certificate authority
  • has not expired.

The client then uses the SSL certificate’s public key to encrypt a randomly generated temporary secret key and send it back to the server. Because the server has the corresponding private key, it can decrypt the client’s temporary secret key. Now both client and server know this temporary secret key, so they can both use it to symmetrically encrypt the messages they send to each other. They will discard this temporary secret key after their session is over.
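The whole exchange can be sketched with toy primitives: textbook RSA with tiny primes stands in for the certificate's real key pair, and a hash-derived XOR stream stands in for a real symmetric cipher (illustration only, not secure):

```python
import hashlib
import secrets

# Server's toy RSA key pair (tiny primes, illustration only).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))   # server's private exponent

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a hash-derived keystream."""
    stream = hashlib.sha256(key).digest()
    while len(stream) < len(data):
        stream += hashlib.sha256(stream).digest()
    return bytes(a ^ b for a, b in zip(data, stream))

# Client: pick a temporary secret key, encrypt it with the server's public key.
session_key = secrets.randbelow(n - 2) + 2
encrypted_key = pow(session_key, e, n)

# Server: recover the session key using its private key.
recovered = pow(encrypted_key, d, n)

# Both sides now share the secret and can encrypt symmetrically.
ciphertext = xor_stream(str(session_key).encode(), b"my credit card number")
plaintext = xor_stream(str(recovered).encode(), ciphertext)
print(plaintext)  # b'my credit card number'
```

An eavesdropper sees only encrypted_key and ciphertext; without the private exponent d, it cannot recover the session key, so the symmetric traffic stays opaque.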

What happens if a hacker intercepts an SSL-encrypted session?

Suppose a hacker intercepted every message sent between the client and the server. The hacker sees the SSL certificate the server sends, as well as the client’s encrypted temporary secret key. But because the hacker doesn’t have the private key, it can’t decrypt the temporary secret key. And because it doesn’t have the temporary secret key, it can’t decrypt any of the messages between the client and server.

Summary

  • The Internet started as ARPANET in the 1960s with the goal of a decentralized computer network.
  • Physically, the Internet is a collection of computers moving bits to each other over wires, cables, and radio signals.
  • Like many complex engineering projects, the Internet is broken up into various layers, each concerned with solving only a smaller problem. These layers connect to each other in well-defined interfaces.
  • There are many protocols that define how the Internet and its applications should work at the different layers: HTTP, IMAP, SSH, TCP, UDP, IP, etc. In this sense, the Internet is as much a collection of rules for how computers and programs should behave as it is a physical network of computers.
  • With the growth of the Internet, the advent of WiFi, and the needs of e-commerce, SSL/TLS was developed to address security concerns.

Tags
internet

Date
February 9, 2024