A web server is a system that processes requests over the HTTP protocol: you request a file from the server, and it responds with that file. This might suggest that web servers are used only for the web, but they are also embedded in devices such as printers and routers. When you open your router or printer configuration page, a web server behind it is serving your requests, so web servers play an important role today.
In this post, we will talk about the Linux web server, how to install it, and how to configure it to serve your content to others.
Table of Contents
- 1 How Webserver Works
- 2 Linux Webserver Implementations
- 3 Understanding HTTP
- 4 Install Apache Webserver
- 5 Configuring Apache Webserver
- 6 Apache Process Ownership
- 7 suEXEC Support
- 8 Apache Authentication
- 9 Troubleshooting Apache Webserver
How Webserver Works
Your browser establishes a connection with the web server and requests a document on the server.
The server decodes the request and maps the virtual path to a real one, which should match an existing file on the server. The server sends the file to the browser with some information such as its MIME type, the length of the content and some other useful information.
The requested file may be a static page, such as an HTML page, or a dynamic page generated by PHP, Java, Perl, or any other server-side language.
For example, when you type www.google.com, the browser asks DNS for the IP address of the host www.google.com. Once the browser gets the answer, it establishes a TCP connection to port 80 (or port 443 when the site uses HTTPS) and requests the default web page, and the Google server sends the page back to the browser in HTML format.
I recommend reviewing the Linux DNS server post to learn how DNS servers work.
Linux Webserver Implementations
There are many Linux web servers available for you to use:
- Apache server
- Nginx
- Apache Tomcat
- Monkey HTTP Daemon (used especially for embedded systems)
There are more Linux web servers, but these are among the most used, and Apache and Nginx are the most popular of them.
Nginx doesn’t support the ability to execute scripts in the same way that Apache does, but it can be used in conjunction with application servers that can execute scripts.
In this post, we will use the Apache server for several reasons:
- It is stable.
- It is extremely flexible.
- It has proved to be secure.
We’ll walk through the process of installing and configuring the Apache server on Linux, but before we get into the steps, let’s review some of the basics of HTTP as well as some of the internals of Apache.
Understanding HTTP
When a web client connects to a web server, it contacts the server’s TCP port 80 by default. Once connected, the web server says nothing; it is up to the client to issue HTTP commands (also called methods) for its requests to the server. Along with each command comes a request header that includes information about the client.
To view these request headers in Chrome, open Chrome DevTools, open the Network panel, visit google.com, and check the request headers.
The request starts with the HTTP GET command, which asks the server to fetch a file. The other information tells the server about the client, the kinds of file formats the client will accept, and so forth.
Additional headers may be sent with the request header. For example, when a client follows a hyperlink to get to the site, an entry showing the client’s originating site (the Referer header) also appears in the header.
When it receives a blank line, the server knows that the request header is complete. It then responds with the requested content, prefixed by a server header.
The server header provides information about the server like the amount of data the client is about to receive, the type of data, and other information.
You can check the response headers from the browser network panel.
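To make the exchange above concrete, here is a minimal shell sketch that builds the kind of request header a client sends. The function name build_request and the host www.example.com are assumptions made for illustration, and a real browser sends many more header lines than these.

```shell
# Build a minimal HTTP/1.1 GET request for a host ($1) and an optional path ($2).
# Each header line ends with CRLF; the final empty line tells the server
# that the request header is complete.
build_request() {
    host="$1"
    path="${2:-/}"
    printf 'GET %s HTTP/1.1\r\n' "$path"
    printf 'Host: %s\r\n' "$host"
    printf 'Accept: text/html\r\n'
    printf 'Connection: close\r\n'
    printf '\r\n'
}

# Print the request with the carriage returns stripped so it is easy to read.
build_request www.example.com / | tr -d '\r'
```

Assuming the nc (netcat) tool is installed, piping the raw output of build_request into nc www.example.com 80 sends the request and prints the server header followed by the content. Running curl -v against a URL is another way to see both the request and the response headers.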
Install Apache Webserver
You can install the Apache server on Red Hat based distros using the following command:
$ dnf -y install httpd
Or if you are using a Debian based distro you can install it like this:
$ apt-get -y install apache2
The Apache web server service is called httpd on Red Hat based distros like CentOS, while it is called apache2 in Debian based distros.
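If you write scripts that must work on both families, a small helper can pick the right name. This is only a sketch: the function name apache_name_for is hypothetical, and the list of distro IDs is deliberately short, so derivatives with their own ID value in /etc/os-release would need to be added.

```shell
# Map a distro ID (as found in /etc/os-release) to the Apache package/service name.
apache_name_for() {
    case "$1" in
        rhel|centos|fedora) echo httpd ;;
        debian|ubuntu)      echo apache2 ;;
        *)                  echo "unknown distro: $1" >&2; return 1 ;;
    esac
}

# Read ID in a subshell so the variables from os-release do not leak into this shell.
if [ -r /etc/os-release ]; then
    ( . /etc/os-release; apache_name_for "${ID:-unknown}" ) || true
fi
```

The result can then be passed to the package manager or to systemctl instead of hard-coding httpd or apache2.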
As the firewall will probably filter this traffic, we have to add a rule that permits HTTP traffic on port 80:
$ iptables -I INPUT 1 -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
Or if you are using firewalld, you can use the following command:
$ firewall-cmd --add-port=80/tcp
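You can make the firewall changes permanent, as discussed in the Linux iptables firewall post. With firewalld, running the command again with the --permanent flag stores the rule so that it survives a reload or reboot.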
To start the service and enable it on boot (on Debian based distros, replace httpd with apache2):
$ systemctl start httpd
$ systemctl enable httpd
You can check if your service is running or not, using the following command:
$ systemctl status httpd
Now open your browser and visit http://localhost, or http://[::1]/ if you are using IPv6. If the installation went well, you should see the default HTML homepage.
Configuring Apache Webserver
You can start adding files to Apache right away in the /var/www/html directory for top-level pages.
Just remember to make sure that any files or directories placed in that directory are world-readable.
The Apache default web page is index.html. Try changing the content of this page and check the result in your browser.
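As a rough illustration of what world-readable means in practice, the sketch below sets the permissions on a scratch directory that stands in for /var/www/html. The directory and file names are assumptions made for the example; run the same find commands against your real document root once you are happy with them.

```shell
# Scratch directory standing in for /var/www/html (an assumption for this sketch).
docroot="$(mktemp -d)"
mkdir -p "$docroot/docs"
echo '<h1>It works</h1>' > "$docroot/index.html"

# Directories need read and execute permission for everyone so they can be entered.
find "$docroot" -type d -exec chmod 755 {} +
# Files only need read permission for everyone.
find "$docroot" -type f -exec chmod 644 {} +

ls -lR "$docroot"
```

The owner keeps write access, while the user that Apache runs as can read the files without being able to change them.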
The configuration files for Apache are located in the /etc/httpd/conf/ directory on Red Hat based distros.
On Debian based systems, the main configuration file for Apache is /etc/apache2/apache2.conf.
We can’t discuss every Apache option in a single post, but we will discuss the most important ones.
You can call them options or directives; it’s up to you.
ServerRoot
This specifies the configuration directory for the web server. On Red Hat based distros, ServerRoot is the /etc/httpd/ directory. On Debian based distros, it is /etc/apache2/.
Listen
This is the port on which the server listens for connection requests.
The default value for this option is 80 for non-secure (plain HTTP) connections.
The Listen option can also specify the particular IP addresses on which the web server accepts connections.
You can specify a port other than 80, as long as it is not in use by another service.
This is one of the techniques for running multiple web servers or sites on the same server.
A site that runs on a non-standard port such as 8080 requires the port number to be stated explicitly in the URL, for example http://www.example.com:8080/.
ServerName
This option defines the hostname and port that the server uses to identify itself.
It lets you specify what Apache returns to visitors as the hostname of the web server.
DocumentRoot
This defines the directory on the web server from which HTML files will be served to requesting clients.
It is /var/www/html in our file.
MaxRequestWorkers
This sets a limit on the number of simultaneous requests that the web server will serve. Before Apache 2.4, this directive was called MaxClients.
LoadModule
This is used for adding other modules to Apache’s running configuration.
Apache comes with many modules and automatically includes them in the default installation.
There are many Apache modules, such as:
mod_cgid: Allows the execution of CGI scripts on the web server.
mod_ssl: Provides strong cryptography for the Apache web server via the SSL and TLS protocols.
mod_userdir: Allows user content to be served from user-specific directories.
If you want to disable loading a specific module, you can comment out the LoadModule line that names that module.
Or if you use a Debian based distro like Ubuntu, you can use the following commands.
To enable a module:
$ a2enmod modulename
To disable a module:
$ a2dismod modulename
All a2enmod does is create a symlink under the /etc/apache2/mods-enabled directory that points to the file loading the module you want to enable. Apache includes every file in this directory in its configuration by default, so whatever is linked there gets loaded.
If you use a2dismod, the symlink is removed.
After you enable or disable a module, you have to reload or restart the Apache web server.
Include
This option allows you to include other configuration files.
You can store the configuration for different virtual domains in properly named files, and Apache will include them automatically when it reads its configuration.
UserDir
This option defines the subdirectory within each user’s home directory where users can place content that they want to make accessible via the web server. This directory is usually named public_html.
For example, suppose the user adam wants to make web content available via the web server.
First, we make a public_html folder under his home directory.
Then set the permissions for the public_html folder. A directory needs the execute bit to be entered, so use 755 rather than 644, and make sure the home directory itself can be traversed by the Apache user:
$ chmod 755 public_html
Now if you put an index.html file there, it will be accessible via the browser at a URL such as http://localhost/~adam/.
Alias
The Alias option allows web content to be stored in a location on the file system that is different from the one specified by the DocumentRoot directive.
This is useful when you have files outside DocumentRoot and you want them to be available to visitors.
Alias URL_Path Actual_Path
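As a sketch of how this could look, the snippet below writes an Alias fragment to a scratch file so you can review it before copying it into your Apache configuration. The URL path /docs and the directory /srv/docs are hypothetical, and the Require all granted line assumes Apache 2.4, where a directory outside DocumentRoot usually has to be opened explicitly.

```shell
# Scratch file standing in for a real Apache configuration file (an assumption).
alias_conf="$(mktemp)"

# /docs and /srv/docs are placeholders; use your own URL path and directory.
cat > "$alias_conf" <<'EOF'
Alias "/docs" "/srv/docs"
<Directory "/srv/docs">
    Require all granted
</Directory>
EOF

cat "$alias_conf"
```

After adding the fragment to the real configuration, running apachectl configtest checks the syntax before you reload the server.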
ErrorLog
This option specifies the location where errors from the web server will be logged.
VirtualHost
This option makes it possible for a single web server to host multiple websites.
It works by allowing the web server to provide different content based on the hostname, port number, or IP address that the client requests.
For example, suppose you want to set up a virtual host for a host named www.example.com.
First, create a VirtualHost container in the /etc/httpd/conf/httpd.conf file.
Then specify the DocumentRoot and ServerName inside that container.
Keep in mind that the value of the ServerName option in the VirtualHost container must be resolvable via DNS.
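A minimal sketch of such a container is shown below, written to a scratch file first so it can be reviewed. The document root /var/www/example is an assumption made for illustration; only the host name www.example.com comes from the example above.

```shell
# Scratch file; the real target would be /etc/httpd/conf/httpd.conf.
vhost_conf="$(mktemp)"

# /var/www/example is a hypothetical document root for this site.
cat > "$vhost_conf" <<'EOF'
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot "/var/www/example"
</VirtualHost>
EOF

cat "$vhost_conf"
```

Once the container is in the real configuration file, apachectl configtest reports syntax problems before you restart Apache.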
These are the most used Apache options.
There are two types of virtual hosts you can define in Apache web server:
- Name-based virtual hosts
- IP-based virtual hosts
In Apache 2.2, the NameVirtualHost directive defines which addresses can serve name-based virtual hosts; the asterisk (*) means any name or address on this server. Apache 2.4 no longer needs this directive. You can write them like this:
DocumentRoot "/home/user2/public_html/"
If you have more than one IP address, you can give a website a dedicated IP address. This used to be required for sites with an SSL certificate, although clients that support SNI no longer need it. You can write IP-based virtual hosts like this:
DocumentRoot "/home/user2/public_html/"
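To compare the two styles side by side, the sketch below generates both kinds of containers with a small helper. The helper name vhost_block, the host www.example.org, the user1 directory, and the 192.0.2.x addresses (a range reserved for documentation) are all assumptions made for illustration.

```shell
# Print one VirtualHost container for an address ($1), a server name ($2),
# and a document root ($3).
vhost_block() {
    printf '<VirtualHost %s>\n' "$1"
    printf '    ServerName %s\n' "$2"
    printf '    DocumentRoot "%s"\n' "$3"
    printf '</VirtualHost>\n'
}

# Name-based: both hosts share the same address and differ only in ServerName.
vhost_block '*:80' www.example.com /home/user1/public_html/
vhost_block '*:80' www.example.org /home/user2/public_html/

# IP-based: each host is bound to its own address.
vhost_block '192.0.2.10:80' www.example.com /home/user1/public_html/
vhost_block '192.0.2.20:80' www.example.org /home/user2/public_html/
```

In the name-based case, Apache picks the container whose ServerName matches the Host header of the request. In the IP-based case, the address the client connected to decides.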
Apache Process Ownership
We know from the Linux process management post that each process has an owner, and that owner has specific rights on the system. We also know that every process inherits its permissions from its parent process.
That post also showed that Linux has an exception to this rule.
The exception is that programs configured with the SetUID bit do not inherit rights from their parent process, but start with the rights of the owner of the file itself.
An example is the /bin/su program, which is owned by root and has the SetUID bit set. If the user adam runs this program, it does not inherit adam’s permissions; it runs as if root had started it.
The Apache web server must start with root permissions because it needs to bind to port 80, and by default only root can bind to ports below 1024.
Once it has done this, Apache can give up its rights and run as a non-root user.
Based on the Linux distro you use, the user could be one of the following:
nobody, www, apache, www-data, or daemon.
Apache can read only the files that the user has permissions to read.
As user nobody, the scripts and processes don’t have access to the same key files that root can access.
I delayed introducing two more Apache options until this point.
User
This specifies the user ID with which the web server will answer requests.
The user should have enough privileges to access the files and directories that are supposed to be available to visitors.
Group
This specifies the group name of the Apache web server process.
suEXEC Support
Security is very important for sites that use executable scripts such as CGI or PHP scripts.
The user that Apache runs as has permission to read and write the content of all sites on the server. But we want to ensure that the members of a particular site can read only their own site.
This is very important because if one of the sites gets compromised, or one of the sites has a malicious user, that user will be able to read all files, since the Apache user has permission to do so.
So how do we solve this problem?
A popular method is to use suEXEC. suEXEC is a program that runs with root permissions and makes CGI programs run with the user and group IDs of a specific user, not the user and group running the Apache server.
You can specify the user on each virtual host like this:
SuExecUserGroup adam adamGroup
Just that simple.
Apache Authentication
You may want to restrict some parts of a site to specific visitors, like a password-protected directory.
In Apache, you can store authentication information in a file called .htpasswd.
You can use the htpasswd command to do that.
First, create the .htpasswd file using the htpasswd command:
$ htpasswd -c /home/adam/.htpasswd myuser
The -c option is needed the first time you run htpasswd, but when you need to add more users you shouldn’t use -c because it will overwrite the file.
Then create a .htaccess file in the public_html folder containing the following directives:
AuthName is required; you can use any string you want.
AuthType Basic says that you are using HTTP Basic authentication, which works with an htpasswd style user file.
AuthUserFile points to the file that contains the passwords generated by the htpasswd command.
The Order line indicates that Apache must deny access by default, and only allow access to users listed in the htpasswd file (in Apache 2.4, this older directive is provided by mod_access_compat).
The Require directive means that any user in the .htpasswd file is allowed.
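Putting the directives above together, a minimal sketch of the .htaccess file could look like the one below. It is written to a scratch file here, the realm string "Restricted Area" is an arbitrary choice, and the Order line is left out on the assumption that you are running Apache 2.4, where Require valid-user alone is enough.

```shell
# Scratch file standing in for /home/adam/public_html/.htaccess.
htaccess="$(mktemp)"

# "Restricted Area" is an arbitrary realm name shown in the browser's login prompt.
cat > "$htaccess" <<'EOF'
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /home/adam/.htpasswd
Require valid-user
EOF

cat "$htaccess"
```

Keep in mind that Apache only honors these directives in a .htaccess file if the configuration for that directory allows it, for example with AllowOverride AuthConfig.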
Troubleshooting Apache Webserver
If you modify the httpd.conf file and Apache fails to restart or reload, you have most likely typed an invalid configuration. This is not the only case where you need to troubleshoot Apache, though: the Apache logs show how the service is behaving, so you can diagnose a problem and solve it.
The two main log files for Apache are the error_log and access_log files.
You can find those files in the /var/log/httpd/ directory on Red Hat based distros, or in the /var/log/apache2/ directory on Debian based distros, where they are named error.log and access.log.
The access_log file contains every request to the Apache web server, with details about the client that requested the resource.
The error_log file contains all of the errors that occur in Apache.
You can use tail command to watch the log file:
$ tail -f /var/log/httpd/error_log
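The access log is also easy to summarize with standard tools. The sketch below counts requests per HTTP status code; it assumes the log uses the common log format, where the status is the ninth whitespace-separated field, and it runs on three made-up sample lines rather than a real log. The function name count_status is hypothetical.

```shell
# Made-up sample lines in the common log format, used only for illustration.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
127.0.0.1 - - [10/Oct/2023:13:55:40 +0000] "GET /missing.html HTTP/1.1" 404 196
127.0.0.1 - - [10/Oct/2023:13:56:02 +0000] "GET /old.html HTTP/1.1" 404 196
EOF

# Count how many requests ended with each status code (field 9).
count_status() {
    awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' "$1" | sort
}

count_status "$sample_log"
```

Pointing count_status at the real access log gives a quick view of how many requests failed, which is often a good starting point before reading the error log in detail.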
I recommend reviewing the Linux syslog server post to learn more about logging.
In upcoming posts, we will talk about Apache security tricks and tweaking.
I hope you find working with Linux web server easy and interesting.