version 2.6
by Marliza Ramly, Zurina Saaya, Wahidah Md Shah, Mohammad Radzi Motsidi, Haniza Nahar
Faculty of Information and Communication Technology
Universiti Teknikal Malaysia Melaka (UTeM)
May 2007
Copyright © 2007 Fakulti Teknologi Maklumat dan Komunikasi, UTeM
TABLE OF CONTENTS

1. PROXY SERVERS ............................................... 1
   1.2 Key Features of Proxy Servers ........................... 2
       Proxy Servers and Caching ............................... 2
2. INTERNET CACHING ............................................ 4
   2.1 Hierarchical Caching .................................... 4
   2.2 Terminology for Hierarchical Caching .................... 5
   2.3 Internet Cache Protocol ................................. 7
   2.4 Basic Neighbour Selection Process ....................... 7
3. INTRODUCTION TO SQUID ....................................... 9
   3.1 Hardware and Software Requirement ...................... 10
   3.2 Directory Structure .................................... 11
   3.3 Getting and Installing Squid ........................... 11
       Custom Configuration for Network ....................... 11
   3.4 Install Squid .......................................... 16
   3.5 Basic Squid Configuration .............................. 17
       Configure Squid ........................................ 17
       Basic Configuration .................................... 17
       Starting Squid Daemon .................................. 19
   3.6 Basic Client Software Configuration .................... 22
       Configuring Internet Browser ........................... 22
       Using proxy.pac File ................................... 23
4. ACL CONFIGURATION .......................................... 25
   4.1 Access Controls ........................................ 25
       List of ACL Types ...................................... 26
       src .................................................... 27
       srcdomain .............................................. 28
       dst .................................................... 29
       dstdomain .............................................. 29
       srcdom_regex ........................................... 30
       dstdom_regex ........................................... 30
       time ................................................... 31
       url_regex .............................................. 32
       urlpath_regex .......................................... 33
       port ................................................... 34
       proto .................................................. 35
       method ................................................. 36
       browser ................................................ 36
       proxy_auth ............................................. 37
       maxconn ................................................ 38
       Create Custom Error Page ............................... 39
   4.2 Exercises .............................................. 40
5. CACHING .................................................... 42
   5.1 Concepts ............................................... 42
   5.2 Configuring a Cache for Proxy Server ................... 42
6. SQUID AND WEBMIN ........................................... 47
   6.1 About Webmin ........................................... 47
   6.2 Obtaining and Installing Webmin ........................ 47
       Installing from a tar.gz ............................... 48
       Installing from an RPM ................................. 48
       After Installation ..................................... 49
   6.3 Using Squid in Webmin .................................. 49
   6.4 Ports and Networking ................................... 50
       Proxy Port ............................................. 51
       ICP Port ............................................... 51
       Incoming TCP Address ................................... 51
       Outgoing TCP Address ................................... 52
       Incoming UDP Address ................................... 52
       Outgoing UDP Address ................................... 52
       Multicast Groups ....................................... 52
       TCP Receive Buffer ..................................... 53
   6.5 Other Caches ........................................... 53
       Internet Cache Protocol ................................ 53
       Parent and Sibling Relationships ....................... 54
       When to Use ICP? ....................................... 54
   6.6 Other Proxy Cache Servers .............................. 55
       Edit Cache Host ........................................ 56
       Hostname ............................................... 56
       Type ................................................... 57
       Proxy Port ............................................. 57
       ICP Port ............................................... 57
       Proxy Only? ............................................ 58
       Send ICP Queries? ...................................... 58
       Default Cache .......................................... 58
       Round-Robin Cache? ..................................... 58
       ICP Time-to-Live ....................................... 59
       Cache Weighting ........................................ 59
       Closest Only ........................................... 59
       No Digest? ............................................. 59
       No Delay? .............................................. 60
       Login to Proxy ......................................... 60
       Multicast Responder .................................... 60
       Query Host for Domains, Don't Query for Domains ....... 60
       Cache Selection Options ................................ 61
       Directly Fetch URLs Containing ......................... 61
       ICP Query Timeout ...................................... 62
       Multicast ICP Timeout .................................. 62
       Dead Peer Timeout ...................................... 62
       Memory Usage ........................................... 63
       Memory Usage Limit ..................................... 63
       Maximum Cached Object Size ............................. 64
   6.7 Logging ................................................ 64
       Cache Metadata File .................................... 65
       Use HTTPD Log Format ................................... 65
       Log Full Hostnames ..................................... 66
       Logging Netmask ........................................ 66
   6.8 Cache Options .......................................... 67
   6.9 Access Control ......................................... 68
       Access Control Lists ................................... 69
       Edit an ACL ............................................ 69
       Creating New ACL ....................................... 70
       Available ACL Types .................................... 71
   6.10 Administrative Options ................................ 75
7. ANALYZER ................................................... 78
   7.1 Structure of Log File .................................. 78
       Access Log ............................................. 78
       Cache Log .............................................. 90
       Store Log .............................................. 93
   7.2 Methods ................................................ 96
       Log Analysis Using Grep Command ........................ 96
       Log Analysis Using Sarg-2.2.3.1 ........................ 96
   7.3 Setup Sarg-2.2.3.1 ..................................... 97
   7.4 Report Management Using Webmin ......................... 98
   7.5 Log Analysis and Statistic ............................ 105
ABBREVIATIONS

Abbreviation   Details
ACL            Access Control List
CARP           Cache Array Routing Protocol
CD             Compact Disc
DNS            Domain Name System
FTP            File Transfer Protocol
GB             Gigabyte
HTCP           Hypertext Caching Protocol
HTTP           Hypertext Transfer Protocol
I/O            Input/Output
ICP            Internet Cache Protocol
IP             Internet Protocol
LAN            Local Area Network
MAC            Media Access Control
MB             Megabyte
RAM            Random Access Memory
RPM            Red Hat Package Manager
RTT            Round Trip Time
SNMP           Simple Network Management Protocol
SSL            Secure Sockets Layer
UDP            User Datagram Protocol
URL            Uniform Resource Locator
UTeM           Universiti Teknikal Malaysia Melaka
WCCP           Web Cache Communication Protocol
Chapter 1

1. Proxy Servers

A proxy server is an intermediary between the Internet browser and a remote server. It acts as a "middleman" between the two ends of a client/server network connection, working with browsers, servers, and other applications by supporting underlying network protocols such as HTTP. Furthermore, it stores downloaded documents in its local cache, so that subsequent downloads are faster because the documents are served from a local server. For example, imagine a user requesting a document through the Internet browser with a specific URL such as http://www.yahoo.com; the document is then transferred to the workstation (e.g. from the UTeM proxy server to a local workstation). In this situation, the Internet browser communicates with the proxy server at UTeM to get the document. In addition, a cache combined with a proxy server makes transfers quicker and more reliable: the Internet browser no longer contacts the remote server directly, but requests documents from the proxy server.
1.2 Key Features of Proxy Servers

Four main functions are provided:

- Firewalling and filtering (security)
- Connection sharing
- Administrative control
- Caching service
Proxy Servers and Caching

A proxy server combined with caching of Web pages can improve quality of service (QoS) in a network, as in Figure 1-1, in three ways:

- Caching conserves network bandwidth, improving scalability.
- Caching improves response time (e.g. an HTTP proxy cache can load Web pages into the browser more quickly).
- Proxy server caches improve availability: Web pages or other files in the cache remain accessible even if the original source or an intermediate network link goes offline.
Figure 1-1: Generic Diagram for a Proxy Server (several clients connect through a single proxy server to the Internet)
Chapter 2

2. Internet Caching

2.1 Hierarchical Caching

Cache hierarchies are a logical extension of the caching concept. Sharing can benefit a group of Web caches serving a group of Web clients. Figure 2-1 shows how it works. However, there are some disadvantages as well; whether the advantages outweigh them depends on the specific situation, discussed below.
Figure 2-1: Proxy Server Caching Process (the client browser initiates a request to the proxy server for a URL; if the requested page is in the proxy server cache, it is returned to the client immediately, otherwise the proxy server requests the page from the web server, caches the returned page, and then returns the requested page to the client)
The major advantages are:

- Additional cache hits. Requests that miss in the local cache may hit in a neighbour cache.
- Request routing. Routing requests to specific caches makes it possible to direct HTTP traffic along a certain path. For example, if there are two paths to the Internet, one cheap and the other expensive, request routing lets you send HTTP traffic over the cheaper link.

The disadvantages of the concept:

- Configuration hassles. Configuring neighbour caches requires coordination from both parties, which adds administrative burden as membership changes.
- Additional delay for cache misses. Many factors contribute to the delay, for example delays between peers, link congestion, and whether or not ICP is used.
2.2 Terminology for Hierarchical Caching

Cache
An HTTP proxy that stores the responses to some requests.

Objects
A generic term for any document, image, or other type of data available on the Web. Nowadays, Uniform Resource Locators (URLs) identify not only documents or pages but also objects such as images, audio, video, and binary files, from data available on HTTP, FTP, Gopher, and other types of servers.
Hits and misses
A cache hit occurs when the requested object exists in the cache as a valid copy. If the object does not exist, or is no longer valid, it is a cache miss. In that situation, the cache must forward the request toward the origin server.

Origin Server
The authoritative source for an object; it is identified by the hostname in the URL.

Hierarchy vs. Mesh
Caches are arranged hierarchically when the topology is like a tree structure, or in a mesh when the structure is flat. In either case, these terms simply refer to the fact that caches can be "connected" to each other. In Squid this can be seen in the cache directory after creating it.

Neighbours, Peers, Parents, Siblings
In general, the terms neighbour and peer are interchangeable for caches in a hierarchy or mesh, while parent and sibling refer to the relationship between a pair of caches.

Fresh, Stale, Refresh
The status of a cached object can be:

- Fresh: a cache hit that can be returned directly.
- Stale: an object that Squid must refresh by including an If-Modified-Since (IMS) request header and forwarding the request toward the origin server.
2.3 Internet Cache Protocol

The Internet Cache Protocol (ICP) offers a quick and efficient method of inter-cache communication and a mechanism to establish complex cache hierarchies. Its advantages are:

- ICP can be utilized by Squid to provide an indication of network conditions.
- ICP messages are transmitted as UDP packets, which makes the protocol easy to implement because each cache needs to maintain only a single UDP socket.

ICP also has disadvantages. One failure mode is that when links are highly congested, ICP becomes useless exactly where caching is needed most. Furthermore, the transmission time of the UDP packets adds extra delay to request processing. As a result, ICP is not appropriate in some situations because of this delay.
2.4 Basic Neighbour Selection Process

Before describing Squid's features for hierarchical caching, let us briefly explain the neighbour selection process. When Squid is unable to satisfy a request from its cache, it must decide where to forward the request. There are basically three choices:

- a parent cache
- a sibling cache
- the origin server
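Parent and sibling relationships can be declared in squid.conf with the cache_peer directive; Squid falls back to the origin server when no peer is suitable. A minimal sketch, assuming hypothetical peer hostnames and the default HTTP/ICP port pair:

```
# declare a parent cache (HTTP port 3128, ICP port 3130)
cache_peer parent.example.com  parent  3128 3130

# declare a sibling cache
cache_peer sibling.example.com sibling 3128 3130
```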
How does ICP help Squid make this decision?

- For parent and sibling caches, Squid sends an ICP query message containing the requested URL to its neighbours, usually as UDP packets, and remembers how many queries it sent for a given request.
- Each neighbour receiving the ICP query searches its own cache for the URL. If a valid copy of the URL exists, the cache sends an ICP_HIT message; otherwise it sends an ICP_MISS message.
- The querying cache then collects the ICP replies from its peers.
- If the cache receives an ICP_HIT reply from a peer, it immediately forwards the HTTP request to that peer.
- If the cache does not receive an ICP_HIT reply, then all replies will be ICP_MISS.
- Squid waits until it receives all replies, for up to two seconds.
- If one of the ICP_MISS replies comes from a parent, Squid forwards the request to the parent whose reply was the first to arrive. We call this reply the FIRST_PARENT_MISS. If there is no ICP_MISS from a parent cache, Squid forwards the request to the origin server.
We have described the basic algorithm. Squid offers numerous possible modifications to it, including mechanisms to:
Send ICP queries to some neighbours and not to others.
Include the origin server in the ICP “pinging” so that if the origin server reply arrives before any ICP_HITs, the request is forwarded there directly.
Disallow or require the use of some peers for certain requests.
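These modifications map onto squid.conf options. A hedged sketch (peer hostname and domain are hypothetical): the no-query option on cache_peer suppresses ICP queries to that neighbour, and cache_peer_access restricts which requests may use a peer:

```
# never send ICP queries to this parent
cache_peer parent.example.com parent 3128 3130 no-query

# only forward requests for .example.org through this parent
acl exampledomain dstdomain .example.org
cache_peer_access parent.example.com allow exampledomain
cache_peer_access parent.example.com deny all
```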
Chapter 3

3. Introduction to Squid

Squid is a high-performance proxy caching server for Web clients, supporting FTP, Gopher, and HTTP data objects. It has two basic purposes:
to provide proxy service for machines that must pass Internet traffic through some form of masquerading firewall
caching
Unlike traditional caching software, Squid handles all requests in a single, non-blocking, I/O-driven process. Squid keeps metadata and especially hot objects cached in RAM, caches DNS lookups, supports non-blocking DNS lookups, and implements negative caching of failed requests. Squid consists of a main server program, a Domain Name System lookup program (dnsserver), a program for retrieving FTP data (ftpget), and some management and client tools. In other words, Squid is:

1. a full-featured Web proxy cache
2. free, open-source software
3. the result of many contributions by unpaid (and paid) volunteers
Squid supports:

- proxying and caching of Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), and other Uniform Resource Locators (URLs)
- proxying for Secure Sockets Layer (SSL)
- cache hierarchies
- Internet Cache Protocol (ICP), Hypertext Caching Protocol (HTCP), Cache Array Routing Protocol (CARP), and Cache Digests
- transparent caching
- Web Cache Communication Protocol (WCCP) (Squid v2.3 and above)
- extensive access controls
- HTTP server acceleration
- Simple Network Management Protocol (SNMP)
- caching of DNS lookups
3.1 Hardware and Software Requirement

RAM: minimum recommended 128 MB (scales with user count and the size of the disk cache)

Disk: small user count, 512 MB to 1 GB; large user count, 16 GB to 24 GB

Operating systems: most versions of UNIX, including AIX, Digital UNIX, FreeBSD, HP-UX, IRIX, Linux, NetBSD, NeXTStep, SCO, Solaris, and SunOS
3.2 Directory Structure

Squid normally creates a few directories, shown in Table 3-1.

Directory      Explanation
/var/cache     Stores the actual cached data
/etc/squid     Contains squid.conf, the only Squid configuration file
/var/log       Holds the logs of each connection (watch this directory, as it can grow large)

Table 3-1: Squid Directories
3.3 Getting and Installing Squid

Custom Configuration for Network

There are three ways to configure a proxy server in a network, and the configuration file should follow the intended usage in your network. They are: transparent proxy, reverse proxy, and web cache proxy.
Configuring Squid for Transparency

Figure 3-1: Transparent Proxy (clients on two LANs reach the Internet through a transparent proxy server at 10.1.1.1, which intercepts traffic destined for port 80)
A transparent proxy (Figure 3-1) is configured when you want to grab a certain type of traffic at your gateway or router and send it through a proxy without the knowledge of the user or client. In other words, the router forwards all traffic destined for port 80 to the proxy machine using a routing policy. Using Squid as a transparent proxy involves two parts:

1. Squid must be configured properly to accept non-proxy requests
2. web traffic must be redirected to the Squid port
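Both parts can be sketched as follows (the interface name eth0 is an assumption; Squid 2.6 accepts intercepted, non-proxy requests via the transparent option on http_port):

```
# squid.conf: accept intercepted (non-proxy) requests
http_port 3128 transparent

# on the gateway: redirect outbound web traffic to the Squid port
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 \
         -j REDIRECT --to-port 3128
```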
This type of transparent proxy is suitable for:

- Interception – network traffic is intercepted transparently to the browser
- Simplified administration – the browser does not need to be configured to talk to a cache
- Central control – the user cannot change the browser to bypass the cache

The disadvantages of using this type of proxy are:

- Browser dependency – transparent proxying does not work very well with certain web browsers
- Loss of user control – transparent caching takes control away from the user; some users will even change ISPs to avoid it
Configuring Squid for Reverse Proxy

Figure 3-2: Reverse Proxy (clients on the Internet reach a web server cluster through a reverse proxy server)
A reverse proxy, also known as web server acceleration (Figure 3-2), is a method of reducing the load on a busy web server by placing a web cache between the server and the Internet. In this case, when a client browser makes a request, DNS routes the request to the reverse proxy (not the actual web server). The reverse proxy then checks its cache to find out whether the requested content is available to fulfil the client request. If not, it contacts the real web server and downloads the requested content to its disk cache. The benefits that can be gained are:

1. improved security
2. improved scalability, without increasing the complexity of maintenance too much
3. a lighter burden on a web server that serves both static and dynamic content: the static content can be cached on the reverse proxy, freeing the web server to better handle the dynamic content.

To run Squid as an accelerator, you probably want it to listen on port 80, and you have to define the machine you are accelerating for (not covered in this chapter).
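Although not covered in this chapter, the accelerator setup in Squid 2.6 looks roughly like this sketch (the site name and backend server address are hypothetical):

```
# listen on port 80 as an accelerator for one site
http_port 80 accel defaultsite=www.example.com

# the real web server, treated as an origin-server parent
cache_peer 192.168.1.10 parent 80 0 no-query originserver
```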
Configuring Squid for Web Cache Proxy

Figure 3-3: Web Cache Proxy (clients reach the Internet through a web cache proxy server located between routers)
By default, Squid is configured as a direct proxy (Figure 3-3). To cache web traffic with Squid, the browser must be configured to use the Squid proxy. This requires the following information:

- the proxy server's IP address
- the port number on which the proxy server accepts connections
3.4 Install Squid

The Squid proxy caching server software package comes with Fedora Core 6, so we do not have to install it; we only need to manage the configuration file to make it work. If Squid is not installed on your server, you can install it from the Squid RPM file. To do so, download the RPM file from the Internet or copy it from the installation CD, then run this command:

# rpm -i squid-2.6.STABLE4-1.fc6.i386.rpm

NOTE: The RPM file name may differ depending on the version of Squid you have downloaded.
Alternatively, you can install Squid from source, which can be downloaded from the official Squid proxy server web site, http://www.squid-cache.org. To do so, copy the extracted installation folder onto your local drive and run the following commands:

# ./configure
# make
# make install
NOTE: Make sure all the dependency files are already installed in your machine before starting to install Squid
3.5 Basic Squid Configuration Configure SQUID All Squid configuration files are kept in the directory /etc/squid.
The following paragraphs of this chapter work through the options that may need changes to get Squid running. Most people will not need to change all of these settings, but at least one part of the configuration file usually must change: the default rules in squid.conf deny access to all browsers. If you don't change this, Squid will not be very useful.

Basic Configuration

All of Squid's configuration goes in one file: squid.conf. This section details the configuration of Squid as a caching proxy only, not as an HTTP accelerator. Some basic configuration needs to be implemented. First, uncomment and edit the following lines in the default configuration file, /etc/squid/squid.conf. To configure the Squid server, do the following tasks:

1. log in as root to the machine
2. type the following command:

# vi /etc/squid/squid.conf

The above command opens the Squid configuration file for editing.
Then, set the port on which Squid listens. Normally, Squid listens on port 3128. While it may be convenient to listen on this port, network administrators often configure the proxy to listen on port 8080 as well. This is a non-well-known port (ports below 1024 are well-known ports and are restricted from being used by ordinary user processes), so it will not conflict with well-known ports such as 80, 443, 22, or 23. Squid need not be restricted to one port; it can easily listen on two or more ports. In the squid.conf file, find the following line and change it, or leave it as-is if the default port 3128 is acceptable:

http_port 3128 (the default)

or

http_port 8080 3128 (for multiple ports)
Additionally, if you have multiple network cards in your proxy server and would like the proxy to listen on port 8080 on the first network card and port 3128 on the second, you can use the following line:

http_port 10.1.5.49:8080 10.0.5.50:3128

http_access
By default, http_access is denied. The Access Control List (ACL) rules should be modified to allow access only to trusted clients. This is important because it prevents people from stealing your network resources. ACLs will be discussed in Chapter 4.

cache_dir
This directive specifies the cache directory, its storage format, and its size, as given below:

cache_dir ufs /var/spool/squid 100 16 256

The value 100 denotes a 100 MB cache size, which can be adjusted to the required size (the cache will be discussed later in Chapter 5).

cache_effective_user, cache_effective_group
These directives set the user and group that Squid runs as after starting.
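As a minimal sketch of the http_access change (the network range 10.0.5.0/24 is an assumption; ACLs are covered in detail in Chapter 4):

```
# trusted clients on the local network
acl localnet src 10.0.5.0/255.255.255.0

http_access allow localnet
http_access deny all
```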
NOTE: You can edit the squid.conf file using gedit instead of the command line.
Starting Squid Daemon

In this section, we will learn how to start Squid. Make sure you have finished editing the configuration file; then you can start Squid for the first time. First, check the configuration file for errors by typing this command at your terminal:

# squid -k parse

If an error is detected, it looks like this:

# squid -k parse
FATAL: Could not determine fully qualified hostname. Please set 'visible_hostname'
Squid Cache (Version 2.6.STABLE4): Terminated abnormally.
CPU Usage: 0.0004 seconds = 0.0004 user + 0.000 sys
Maximum Resident Size: 0 KB
Page faults with physical i/o: 0
Aborted.

Solution: add the following line to the squid.conf file:

visible_hostname localhost

If no error is detected, continue with the following command to start Squid (this starts Squid temporarily, until the next reboot):
# service squid start
If everything is working fine, your console displays:

Starting squid: .                                    [ OK ]

If you want to stop the service:

# service squid stop

Then your console will display:

Stopping squid: .                                    [ OK ]
You must be a privileged user to start or stop Squid. To make Squid start permanently at boot, try these commands:

# chkconfig --list
# chkconfig --level 5 squid on

You can restart the Squid service by typing:

# /etc/init.d/squid restart

While the daemon is running, there are several ways you can run the squid command to change how the daemon works, using these options:

# squid -k reconfigure — causes Squid to re-read its configuration file
# squid -k shutdown — causes Squid to exit after waiting briefly for current connections to finish
# squid -k interrupt — shuts down Squid immediately, without waiting for connections to close
# squid -k kill — kills Squid immediately, without closing connections or log files (use this option only if other methods don't work)
3.6 Basic Client Software Configuration

Basic Configuration

To configure any browser, you need at least two pieces of information:

- the proxy server's IP address
- the port number on which the proxy server accepts requests
Configuring Internet Browser
The following sections explain the steps to configure the proxy server in Internet Explorer, Mozilla Firefox, and Opera.

Internet Explorer 7.0
1. Select the Tools menu option
2. Select Internet Options
3. Click on the Connections tab
4. Select LAN settings
5. Check the "Use a proxy server for your LAN" box
6. Type the proxy IP address in the Address field and the port number in the Port field. Example:

Address: 10.0.5.10    Port: 3128
Mozilla Firefox
1. Click Tools → Options → Advanced
2. Click Network → go to Connection → Settings
3. At "Configure Proxies to Access the Internet":
4. Choose "Manual proxy configuration"
5. At HTTP Proxy, enter 10.0.5.10, Port 3128
6. Check the box to use the proxy server for all protocols
7. Then click OK
8. Now the client can access the Internet.

Opera 9.1
1. Click Tools → Preferences → Advanced
2. Choose Network
3. Click Proxy Servers and check:

HTTP   : 10.0.5.10    Port: 3128
HTTPS  : 10.0.5.10    Port: 3128
FTP    : 10.0.5.10    Port: 3128
Gopher : 10.0.5.10    Port: 3128

4. Then click OK
Using proxy.pac File

This setting is for clients who want their browsers to pick up proxy settings automatically. The browser can be configured with a simple proxy.pac file, as shown in the example below:

function FindProxyForURL(url, host) {
    if (isInNet(myIpAddress(), "10.0.5.0", "255.255.255.0"))
        return "PROXY 10.0.5.10:3128";
    else
        return "DIRECT";
}
proxy.pac needs to be hosted on a web server such as Apache, and the client can then configure the proxy server using the automatic configuration script. This script is useful when the proxy server may change its IP address. To use the script, the client adds the URL of proxy.pac to its automatic proxy configuration setting (Figure 3-4).
Figure 3-4: Using automatic configuration script
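Browsers expect the PAC file to be served with the MIME type application/x-ns-proxy-autoconfig. As a sketch, assuming a stock Apache setup, the type can be mapped in the server configuration or an .htaccess file:

```
# Hypothetical Apache fragment: serve *.pac files with the PAC MIME type
AddType application/x-ns-proxy-autoconfig .pac
```

The browser is then pointed at a URL such as http://computername/proxy.pac in its automatic configuration field.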
4. ACL Configuration

4.1 Access Controls
Access control lists (ACLs) are the most important part of configuring Squid. The main use of ACLs is to implement simple access control, restricting people from using the cache infrastructure without permission. Rules can be written for almost any type of requirement, from very complex setups for large organisations to simple configurations for home users. ACLs are written in the squid.conf file using the following format:

acl name type (string|"filename") [string2] ["filename2"]

name is a descriptive variable defined by the user, while type is one of the ACL types described in the next section.
There are two elements in access control: classes and operators. Classes are defined by acl lines, while the operators vary by name. The most common operators are http_access and icp_access. The actions for these operators are allow and deny: allow enables access for the matching ACL, while deny restricts it.

General format for the http_access operator:

http_access allow|deny [!]aclname [!]aclname2 ...
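Putting the two elements together, a minimal access-control stanza in squid.conf could look like the following sketch (the 10.0.5.0/24 network is an assumption for illustration):

```
# Hypothetical example: define a class, then apply an operator to it.
# http_access rules are evaluated top to bottom; the first match wins.
acl lan src 10.0.5.0/255.255.255.0
http_access allow lan
http_access deny all
```

After editing squid.conf, the syntax can be checked with squid -k parse before reloading the server.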
List of ACL types

ACL Type        Details
src             client IP address
srcdomain       client domain name
dst             destination's IP address
dstdomain       destination's domain name
srcdom_regex    regular expression describing the client domain name
dstdom_regex    regular expression describing the destination domain name
time            specify the time
url_regex       regular expression describing the whole URL of the destination (web server)
urlpath_regex   regular expression describing the URL path of the destination (not including its domain name)
port            specify port number
proto           specify protocol
method          specify request method
browser         specify browser
proxy_auth      user authentication via external processes
maxconn         specify number of connections
src
Description
This ACL lets the server recognize a client (the computer that uses the server as a proxy to access the Internet) by its IP address. The addresses can be listed as a single IP address, a range of IP addresses, or IP addresses defined in an external file.

Syntax
acl aclname src ip-address/netmask ...   (client's IP address)
acl aclname src addr1-addr2/netmask ...  (range of addresses)
acl aclname src "filename" ...           (client IP addresses in an external file)

Example 1
acl fullaccess src "/etc/squid/fullaccess.txt"
http_access allow fullaccess

This ACL uses an external file named fullaccess.txt, which consists of a list of client IP addresses.

Example of fullaccess.txt:
198.123.56.12
198.123.56.13
198.123.56.34

Example 2
acl office.net src 192.123.56.0/255.255.255.0
http_access allow office.net

This ACL sets the source address range for office.net to 192.123.56.x and allows it to access the Internet using the http_access allow operator.
srcdomain
Description
This ACL lets the server recognize a client by its computer name. To do so, Squid needs to perform a reverse DNS lookup (from client IP address to client domain name) before this ACL is interpreted, which can cause processing delays.

Syntax
acl aclname srcdomain domain-name ...  (reverse lookup of client IP)

Example 1
acl staff.net srcdomain staff20 staff21
http_access allow staff.net

This ACL is for clients with the computer names staff20 and staff21. The http_access operator allows the ACL named staff.net to access the Internet. This option is not very efficient, since the server must do a reverse name lookup to determine the source name.

NOTE: Please ensure that the DNS server is running in order to use the DNS lookup service.
dst
Description
This is the same as src, except that it refers to the server's (destination's) IP address. Squid first does a DNS lookup for the IP address of the domain name in the request header, and then interprets the ACL.

Syntax
acl aclname dst ip-address/netmask ...  (the URL host's or site's IP address)

Example 1
acl tunnel dst 209.8.233.0/24
http_access deny tunnel

This ACL denies any destination with IP 209.8.233.x.

Example 2
acl allow_ip dst 209.8.233.0-209.8.233.100/255.255.255.255
http_access allow allow_ip

This ACL allows destinations with IP addresses in the range 209.8.233.0 to 209.8.233.100.
dstdomain
Description
This ACL recognizes a destination by its domain. This is an effective method to control access to a specific domain.

Syntax
acl aclname dstdomain domain.com ...  (domain name from the site's URL)
Example 1
acl banned_domain dstdomain www.terrorist.com
http_access deny banned_domain

This ACL denies destinations with the domain www.terrorist.com.
srcdom_regex
Description
This ACL is similar to srcdomain, in that the server needs to perform a reverse DNS lookup (from client IP address to client domain name) before the ACL is interpreted. The difference is that this ACL allows the use of a regular expression to define the client's domain.

Syntax
acl aclname srcdom_regex -i source_domain_regex

Example 1
acl staff.net srcdom_regex -i staff
http_access allow staff.net

This ACL allows all nodes whose domain contains the word staff to access the Internet. The -i option makes the expression case-insensitive.
dstdom_regex
Description
This ACL lets the server recognize a destination by a regular expression on its domain.

Syntax
acl aclname dstdom_regex -i dst_domain_regex

Example 1
acl banned_domain dstdom_regex -i terror porn
http_access deny banned_domain

This ACL denies clients access to destinations that contain the word terror or porn in their domain name. For example, access to the domains www.terrorist.com and www.pornography.net will be denied by the proxy server.
time
Description
This ACL lets the server control the service by time. Accessibility to the network can be set according to the schedule defined in the ACL.

Syntax
acl aclname time day-abbrevs h1:m1-h2:m2

where h1:m1 must be less than h2:m2, and days are represented by the abbreviations in Table 4-1.
abbreviation   day
S              Sunday
M              Monday
T              Tuesday
W              Wednesday
H              Thursday
F              Friday
A              Saturday

Table 4-1: Abbreviations for days
Example 1
acl SABTU time A 9:00-17:00

The ACL SABTU refers to Saturday from 9:00 to 17:00.

Example 2
acl pagi time 9:00-11:00
acl office.net src 10.2.3.0/24
http_access deny pagi office.net

pagi refers to the time from 9:00 to 11:00, while office.net refers to the clients' IP addresses. This combination of ACLs denies access for office.net between 9:00 am and 11:00 am.
url_regex
Description
url_regex searches the entire URL for the regular expression you specify. Note that these regular expressions are case-sensitive; to make them case-insensitive, use the -i option.

Syntax
acl aclname url_regex -i url_regex ...

Example 1
acl banned_url url_regex -i terror porn
http_access deny banned_url

This ACL denies URLs that contain the word terror or porn. For example, the following destinations will be denied by the proxy server:
http://www.google.com/pornography
http://www.news.com/terrorist.html
http://www.terror.com/
urlpath_regex
Description
urlpath_regex matches a regular expression against the URL, excluding the protocol and hostname. If the URL is http://www.free.com/latest/games/tetris.exe, then this ACL type only looks at the part after http://www.free.com/; it leaves out the http protocol and the www.free.com hostname.

Syntax
acl aclname urlpath_regex pattern

Example 1
acl blocked_free urlpath_regex free
http_access deny blocked_free

This ACL blocks any URL path containing "free" (but not "Free"), without looking at the protocol and hostname. These regular expressions are case-sensitive; to make them case-insensitive, add the -i option.

Example 2
acl blocked_games urlpath_regex -i games
http_access deny blocked_games

blocked_games refers to URLs containing the word "games", whether spelled in upper or lower case.

Example 3
To block several URLs:
acl block_site urlpath_regex -i "/etc/squid/acl/block_site"
http_access deny block_site
To block several URLs, it is recommended to put the list in one file. As in Example 3, the block_site list is in the file /etc/squid/acl/block_site. The file block_site may contain, for example:
\.exe$
\.mp3$
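The $-anchored patterns above match only at the end of the URL path. As a rough illustration, using grep -E (which accepts the same style of regular expression) on some made-up paths:

```shell
# '\.exe$' matches only paths that end in .exe
echo "/downloads/setup.exe" | grep -cE '\.exe$'
# prints 1 (one matching line)

# a path merely containing .mp3 in the middle does not match '\.mp3$'
echo "/music/song.mp3.html" | grep -cE '\.mp3$' || true
# prints 0
```

Dropping the $ anchor would make the pattern match anywhere in the path, which is usually broader than intended.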
port
Description
Access can be controlled by the destination (server) port number.

Syntax
acl aclname port port-number

Example 1
Deny requests to unknown ports:
acl Safe_ports port 80        # http
acl Safe_ports port 21        # ftp
acl Safe_ports port 443 563   # https, snews
http_access deny !Safe_ports

Example 2
Deny several untrusted ports:
acl safeport port "/etc/squid/acl/safeport"
http_access deny safeport
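The Safe_ports example above is abbreviated; the stock squid.conf ships with a longer list along these lines (the exact set varies by Squid version, so treat this as a sketch):

```
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443 563     # https, snews
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
http_access deny !Safe_ports
```

Repeated acl lines with the same name are merged, so Safe_ports ends up matching any of the listed ports.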
proto
Description
This specifies the transfer protocol.

Syntax
acl aclname proto protocol

Example 1
acl protocol proto HTTP FTP

This refers to the protocols HTTP and FTP.

Example 2
acl manager proto cache_object
http_access allow manager localhost
http_access deny manager

Only allow cachemgr access from the localhost.

Example 3
acl ftp proto FTP
http_access deny ftp
http_access allow all

This blocks every FTP request through the proxy.
method
Description
This specifies the request method type.

Syntax
acl aclname method method-type

Example 1
acl connect method CONNECT
http_access allow localhost
http_access allow allowed_clients
http_access deny connect

This denies the CONNECT method, to prevent outside people from trying to tunnel through the proxy server. Because rules are matched top to bottom, the deny only applies to requests not already matched by the allow lines above it.
browser
Description
This matches a regular expression against the request's User-Agent header. To capture the User-Agent header information, add this line to squid.conf:

useragent_log /var/log/squid/useragent.log

Then run the Mozilla browser; its User-Agent header should look like the one in the example.

Syntax
acl aclname browser pattern

Example 1
acl mozilla browser ^Mozilla/5\.0
http_access deny mozilla

This denies Mozilla browsers, or any other browser whose User-Agent starts with Mozilla/5.0.
proxy_auth
Description
User authentication via external processes. proxy_auth requires an EXTERNAL authentication program to check username/password combinations. In this configuration, we use the NCSA authentication method because it is the easiest to implement.

Syntax
acl aclname proxy_auth username ...

Example 1
To validate a list of users, do the following steps.

Creating the passwd file:
# touch /etc/squid/passwd
# chown root.squid /etc/squid/passwd
# chmod 640 /etc/squid/passwd

Adding users:
# htpasswd /etc/squid/passwd shah

You will be prompted to enter a password for that user; in this example, the password for user shah.

Setting rules:
auth_param basic program /usr/lib/squid/ncsa_auth /etc/squid/passwd
auth_param basic children 5
auth_param basic realm Squid proxy-caching web server
auth_param basic credentialsttl 2 hours

These lines are already in the configuration file but need to be adjusted to suit your environment.
Authentication configuration:
acl LOGIN proxy_auth REQUIRED
http_access allow LOGIN

This will only allow users that have been authenticated to access the network connection.

CAUTION !! proxy_auth can't be used with a transparent proxy.
maxconn
Description
A limit on the maximum number of connections from a single client IP address. This ACL is true if the client has more than the given number of connections open.

Syntax
acl aclname maxconn number_of_connections

Example 1
acl someuser src 10.0.5.0/24
acl 5conn maxconn 5
http_access deny someuser 5conn

This restricts clients in the 10.0.5.0/24 subnet to a maximum of five (5) connections at once; if the limit is exceeded, an error page appears. Clients outside that subnet are not affected by this rule.
CAUTION !! The maxconn ACL requires the client_db feature. If client_db is disabled (for example with client_db off), then maxconn ACLs will not work.
Create a custom error page
# vi /etc/squid/error/ERROR_MESSAGE
Append the following:

ERROR : ACCESS DENIED FROM PROXY SERVER
The site is blocked due to IT policy.
Please contact the helpdesk for more information:
Phone: 06-2333333 (ext 33)
Email: [email protected]

CAUTION !! Do not include HTML close tags.

Displaying the custom error message:
acl blocked_port port 80
deny_info ERROR_MESSAGE blocked_port
http_access deny blocked_port
4.2 Exercises

1. Why can users still download files with the following configuration?

acl download urlpath_regex -i \.exe$
acl office_hours time 09:00-17:00
acl GET method GET
acl it_user1 src 192.168.1.88
acl it_user2 src 192.168.1.89
acl nodownload1 src 192.168.1.10
acl nodownload2 src 192.168.1.11

http_access allow it_user1
http_access allow it_user2
http_access allow nodownload1
http_access allow nodownload2
http_access deny GET office_hours nodownload1 nodownload2
http_access deny all

The configuration should deny nodownload1 and nodownload2; the allow lines for them should be deleted.
2. Why does this configuration still allow access to game.free.com?

acl ban dstdomain free.com
http_access deny ban
3. The following access control configuration will never work. Why?

acl ME src 10.0.0.1
acl YOU src 10.0.0.2
http_access allow ME YOU
5. Caching

5.1 Concepts
Caching, as performed by a proxy server, is the process of storing data on an intermediate system between the web server and the client. The proxy server can then serve content requested by a client directly from its copy in the cache. The assumption is that later requests for the same data can be serviced more quickly by not having to go all the way back to the original server. Caching can also reduce demands on network resources and on the information servers.
5.2 Configuring a cache for proxy server There are a lot of parameters related to caching in Squid and these parameters can be divided into three main groups as below: A. Cache size B. Cache directories and log file path name C. Peer cache servers and Squid hierarchy
However, in the following subsections, only the first two groups will be covered.

A. Cache Size
The following are the common parameters used for cache size.

i. cache_mem
Syntax
cache_mem size(MB)

This parameter specifies the amount of memory (RAM) used to store in-transit objects (ones that are currently being used), hot objects (ones that are used often) and negative-cached objects (recently failed requests). The default value is 8 MB.

Example: cache_mem 16 MB
ii. maximum_object_size
Syntax
maximum_object_size size(MB)

This parameter is used if you do not want to cache files larger than or equal to the set size. The default value is 4 MB.

Example: maximum_object_size 8 MB
iii. ipcache_size
Syntax
ipcache_size entries

This parameter sets how many IP address resolution results Squid stores. It takes a number of entries, not a size in MB; the default is 1024 entries.

Example: ipcache_size 2048

iv. ipcache_high
Syntax
ipcache_high percentage

This parameter specifies the high-water mark (as a percentage of ipcache_size) at which Squid starts clearing out the least-used IP address resolutions. The default value is usually used.

Example: ipcache_high 95

v. ipcache_low
Syntax
ipcache_low percentage

This parameter specifies the low-water mark at which Squid stops clearing out the least-used IP address resolutions. The default value is usually used.
Example: ipcache_low 90

B. Cache Directories
i. cache_dir
Syntax
cache_dir type dir size(MB) L1 L2

This parameter specifies the directory (or directories) in which cache swap files are stored. The default dir is /var/spool/squid. You can specify how much disk space to use for the cache in megabytes (100 is the default); the default numbers of first-level directories (L1) and second-level directories (L2) are 16 and 256 respectively. After adding or changing a cache_dir, run squid -z to create the swap directories.

Example: cache_dir aufs /var/cache01 7000 16 256

NOTE: /var/cache01 is a partition that was created during the Linux Fedora installation.
Formula to calculate the number of first-level directories (L1):

Given:
x = size of cache dir in KB (e.g., 6 GB)
y = average object size (e.g., 13 KB)
z = objects per L2 directory (assume 256)

Calculate L1 and L2 such that:
L1 x L2 = x / y / z

Example:
x = 6 GB = 6 * 1024 * 1024 = 6291456 KB
x / y / z = 6291456 / 13 / 256 = 1890
L1 * L2 = x / y / z
L1 * 256 = 1890
L1 = 7
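The arithmetic above can be checked with shell integer arithmetic (integer division truncates, which is why L1 comes out to 7):

```shell
x=6291456   # cache size in KB (6 GB)
y=13        # average object size in KB
z=256       # objects per second-level (L2) directory
product=$(( x / y / z ))    # L1 * L2 must equal this
echo $product               # prints 1890
echo $(( product / 256 ))   # L1, prints 7
```

With a different disk size or average object size, only the three variables need to change.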
ii. access_log
Syntax
access_log dir

This parameter specifies the location where HTTP and ICP accesses are logged. The default, /var/log/squid/access.log, is usually used.

Example: access_log /var/log/squid/access.log
6. SQUID and Webmin

6.1 About Webmin
Webmin is a graphical user interface for Unix system administration. It is web based and can be installed on most Unix systems. Webmin is free software, and the installation package can be downloaded from the Net. Webmin is largely written in Perl and runs as its own process and web server. It usually uses TCP port 10000 for communication, and can be configured to use SSL if OpenSSL is installed.
6.2 Obtaining and Installing Webmin
The Webmin installation package is available at the official Webmin site, http://www.webmin.com/download.html. Download the latest package and place it on the local machine.
Installation of Webmin differs slightly depending on which type of package you choose to install. Note that Webmin requires a relatively recent Perl for any of these installation methods to work. Nearly all, if not all, modern UNIX and UNIX-like OS variants now include Perl as a standard component of the OS, so this should not be an issue.
Installing from a tar.gz
First, untar and unzip the archive in the directory where you would like Webmin to be installed. The most common location for installation from tarballs is /usr/local; some sites prefer /opt. If you are using GNU tar, you can do this all on one command line:
# tar zxvf webmin-1.340.tar.gz
If you have a less capable version of tar, you must unzip the file first and then untar it:
# gunzip webmin-1.340.tar.gz
# tar xvf webmin-1.340.tar
Next, change to the directory that was created when you untarred the archive, and execute the setup.sh script, as shown in the following example. The script will ask several questions about your system and your preferences for the installation; generally, accepting the default values will work. The command for installation is:
# ./setup.sh
Installing from an RPM Installing from an RPM is even easier. You only need to run one command: # rpm -Uvh webmin-1.340-1.noarch.rpm
This will copy all of the Webmin files to the appropriate locations and run the install script with appropriate default values. For example, the Webmin perl files will be installed in /usr/libexec/webmin while the configuration files will end up in /etc/webmin. Webmin will then be started on port 10000. You may log in using root as the login name and your system root password as the password. It's unlikely you will need to change any of these items from the command line, because they can all be modified using Webmin. If you do need to make any changes, you can do so in miniserv.conf in /etc/webmin.
After Installation
After installation, your Webmin installation will behave nearly identically regardless of operating system vendor or version, location of installation, or method of installation. The only apparent differences between systems will be that some have more or fewer modules, because some modules are specific to one OS. Others will feature slightly different versions of modules to take into account different behaviour of the underlying system. For example, the package manager module may behave differently, or be missing from the available options entirely, depending on your OS.
6.3 Using Squid in Webmin
To launch Webmin, open a web browser, such as Netscape or Mozilla Firefox, on any machine that has network access to the server you wish to log in to. Browse to port 10000 on the IP address or host name of the server using http://computername:10000/. Go to the Squid Proxy Server menu (in the Servers submenu) to open the main panel (Figure 6-1).
Figure 6-1: Squid Proxy Main Page
6.4 Ports and Networking The Ports and Networking page provides you with the ability to configure most of the network level options of Squid. Squid has a number of options to define what ports Squid operates on, what IP addresses it uses for client traffic and intercache traffic, and multicast options. Usually, on dedicated caching systems these options will not be useful. But in some cases you may need to adjust these to prevent the Squid daemon from interfering with other services on the system or on your network.
Proxy port
Sets the network port on which Squid operates. This is usually 3128 by default and can almost always be left at that value, except when multiple Squid instances are running on the same system, which is usually ill-advised. This option corresponds to the http_port option in squid.conf.
ICP port
This is the port on which Squid listens for Internet Cache Protocol, or ICP, messages. ICP is a protocol used by web caches to communicate and share data. Using ICP it is possible for multiple web caches to share cached entries, so that if any one local cache has an object, the distant origin server will not have to be queried for it. Further, cache hierarchies can be constructed of multiple caches at multiple privately interconnected sites to provide improved hit rates and higher-quality web response for all sites. More on this in later sections. This option correlates to the icp_port directive.
Incoming TCP address The address on which Squid opens an HTTP socket that listens for client connections and connections from other caches. By default Squid does not bind to any particular address and will answer on any address that is active on the system. This option is not usually used, but can provide some additional level of security, if you wish to disallow any outside network users from proxying through your web cache. This option correlates to the tcp_incoming_address directive.
Outgoing TCP address
Defines the address on which Squid sends out packets via HTTP to clients and other caches. Again, this option is rarely used. It refers to the tcp_outgoing_address directive.
Incoming UDP address
Sets the address on which Squid will listen for ICP packets from other web caches. This option allows you to restrict which subnets will be allowed to connect to your cache on a multi-homed Squid host (one containing multiple subnets). This option correlates to the udp_incoming_address directive.
Outgoing UDP address The address on which Squid will send out ICP packets to other web caches. This option correlates to the udp_outgoing_address.
Multicast groups The multicast groups that Squid will join to receive multicast ICP requests. This option should be used with great care, as it is used to configure your Squid to listen for multicast ICP queries. Clearly if your server is not on the MBone, this option is useless. And even if it is, this may not be an ideal choice.
TCP receive buffer The size of the buffer used for TCP packets being received. By default Squid uses whatever the default buffer size for your operating system is. This should probably not be changed unless you know what you’re doing, and there is little to be gained by changing it in most cases. This correlates to the tcp_recv_bufsize directive.
6.5 Other Caches The Other Caches page provides an interface to one of Squid’s most interesting, but also widely misunderstood, features. Squid is the reference implementation of ICP, a simple but effective means for multiple caches to communicate with each other regarding the content that is available on each. This opens the door for many interesting possibilities when one is designing a caching infrastructure.
Internet Cache Protocol It is probably useful to discuss how ICP works and some common usages for ICP within Squid, in order to quickly make it clear what it is good for, and perhaps even more importantly, what it is not good for. The most popular uses for ICP are discussed, and more good ideas will probably arise in the future as the Internet becomes even more global in scope and the web-caching infrastructure must grow with it.
Parent and Sibling Relationships
The ICP protocol specifies that a web cache can act as either a parent or a sibling. A parent cache is simply an ICP-capable cache that will answer both hits and misses for child caches, while a sibling will only answer hits for other siblings. This subtle distinction means that a parent cache can proxy for caches that have no direct route to the Internet. A sibling cache, on the other hand, cannot be relied upon to answer all requests, and your cache must have another method to retrieve requests that cannot come from the sibling. This usually means that in sibling relationships, your cache will also have a direct connection to the Internet or a parent proxy that can retrieve misses from the origin servers. ICP is a somewhat chatty protocol, in that an ICP request is sent to every neighbor cache each time a cache miss occurs. By default, whichever cache replies with an ICP hit first will be the cache used to request the object.
When to Use ICP?
ICP is often used in situations where one has multiple Internet connections, or several types of paths to Internet content. It is also possible, though usually not recommended, to implement a rudimentary form of load balancing through the use of multiple parents and multiple child web caches.

One of the common uses of ICP is cache meshes. A cache mesh is, in short, a number of web caches at remote sites interconnected using ICP. The web caches could be in different cities, or in different buildings of the same university, or on different floors of the same office building. This type of hierarchy allows a large number of caches to benefit from a larger client population than is directly available to any one of them.
All other things being equal, a cache that is not overloaded will perform better (with regard to hit ratio) with a larger number of clients. Simply put, a larger client population leads to a higher quality of cache content, which in turn leads to higher hit ratios and improved bandwidth savings. So, whenever it is possible to increase the client population without overloading the cache, such as in the case of a cache mesh, it may be worth considering. Again, this type of hierarchy can be improved upon by the use of Cache Digests, but ICP is usually simpler to implement and is a widely supported standard, even on non-Squid caches. Finally, ICP is also sometimes used for load balancing multiple caches at the same site. ICP, or even Cache Digests for that matter, is almost never the best way to implement load balancing. Using ICP for load balancing can be achieved in a few ways:
• By having several local siblings, each of which can provide hits to the others' clients, while the client load is evenly divided across the caches.
• By using a fast but low-capacity web cache in front of two or more lower-cost, but higher-capacity, parent web caches. The parents will then serve the requests in roughly equal amounts.
6.6 Other Proxy Cache Servers This section of the Other Caches page provides a list of currently configured sibling and parent caches, and also allows one to add more neighbor caches. Clicking the name of a neighbor cache will allow you to edit it. This section also provides the vital information about the neighbor caches, such as the type (parent, sibling, multicast), the proxy or HTTP port, and the ICP or UDP port of the caches. Note that
Proxy port is the port where the neighbor cache normally listens for client traffic, which defaults to 3128.
Edit Cache Host Clicking a cache peer name or clicking Add another cache on the primary Other Caches page brings you to this page, which allows you to edit most of the relevant details about neighbor caches (Figure 6-2)
Figure 6-2: Create cache Host page
Hostname
The name or IP address of the neighbor cache you want your cache to communicate with. Note that this will be one-way traffic. Access Control Lists, or ACLs, are used to allow ICP requests from other caches; ACLs are covered later. This option, plus most of the rest of the options on this page, corresponds to cache_peer lines in squid.conf.
Type The type of relationship you want your cache to have with the neighbor cache. If the cache is upstream, and you have no control over it, you will need to consult with the administrator to find out what kind of relationship you should set up. If it is configured wrong, cache misses will likely result in errors for your users. The options here are sibling, parent, and multicast.
Proxy port The port on which the neighbor cache is listening for standard HTTP requests. Even though the caches transmit availability data via ICP, actual web objects are still transmitted via HTTP on the port usually used for standard client traffic. If your neighbor cache is a Squid-based cache, then it is likely to be listening on the default port of 3128. Other common ports used by cache servers include 8000, 8888, 8080, and even 80 in some circumstances.
ICP port The port on which the neighbor cache is configured to listen for ICP traffic. If your neighbor cache is a Squid-based proxy, this value can be found by checking the icp_port directive in the squid.conf file on the neighbor cache. Generally, however, the neighbor cache will listen on the default port 3130.
Proxy only? A simple yes or no question to tell whether objects fetched from the neighbor cache should be cached locally. This can be used when all caches are operating well below their client capacity, but disk space is at a premium or hit ratio is of prime importance.
Send ICP queries?
Tells your cache whether or not to send ICP queries to a neighbor. The default is Yes, and it should probably stay that way. ICP queries are the means by which Squid knows which caches are responding and which caches are closest or best able to quickly answer a request.
Default cache This is switched to Yes if this neighbor cache is to be the last-resort parent cache to be used in the event that no other neighbor cache is present as determined by ICP queries. Note that this does not prevent it from being used normally while other caches are responding as expected. Also, if this neighbor is the sole parent proxy, and no other route to the Internet exists, this should be enabled.
Round-robin cache? Choose whether to use round-robin scheduling between multiple parent caches in the absence of ICP queries. This should be set on all parents that you would like to schedule in this way.
ICP time-to-live
Defines the multicast TTL for ICP packets. When using multicast ICP, it is usually wise, for security and bandwidth reasons, to use the minimum TTL suitable for your network.
Cache weighting Sets the weight for a parent cache. When using this option it is possible to set higher numbers for preferred caches. The default value is 1, and if left unset for all parent caches, whichever cache responds positively first to an ICP query will be sent a request to fetch that object.
Closest only
Allows you to specify that your cache wants only CLOSEST_PARENT_MISS replies from parent caches. This allows your cache to then request the object from the parent cache closest to the origin server.
No digest?
Chooses whether this neighbor cache should send cache digests.

No NetDB exchange
When using ICP, it is possible for Squid to keep a database of network information about the neighbor caches, including availability and RTT (Round Trip Time) information. This usually allows Squid to choose more wisely which caches to make requests to when multiple caches have the requested object.
No delay? Prevents accesses to this neighbor cache from affecting delay pools. Delay pools, discussed in more detail later, are a means by which Squid can regulate bandwidth usage. If a neighbor cache is on the local network, and bandwidth usage between the caches does not need to be restricted, then this option can be used.
Login to proxy Select this if you need to send authentication information when challenged by the neighbor cache. On local networks, this type of security is unlikely to be necessary.
Multicast responder Allows Squid to know where to accept multicast ICP replies. Because multicast is fed on a single IP to many caches, Squid must have some way of determining which caches to listen to and what options apply to that particular cache. Selecting Yes here configures Squid to listen for multicast replies from the IP of this neighbor cache.
Query host for domains, Don't query for domains These two options are the only options on this page that configure a directive other than cache_peer; in this case they set the cache_peer_domain directive. This allows you to configure which domains can be queried via ICP through this neighbor and which cannot. It is often used to configure caches not to query other caches for content within the local domain. Another common usage, such as in the national web hierarchies discussed above, is to define which web cache is used for requests destined for different TLDs. For example, if one has a low-cost satellite link to the U.S. backbone from another country that is preferred for web traffic over a much more expensive land line, one can configure the satellite-connected cache as the cache to query for all .com, .edu, .org, .net, .us, and .gov addresses.
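The satellite-link scenario just described could be sketched with cache_peer_domain lines such as these (the cache names and local domain are hypothetical):

```
# Query the satellite-connected parent only for U.S.-backbone TLDs
cache_peer_domain satellite-cache.example.com .com .edu .org .net .us .gov
# Never query the other parent for content in the local domain
cache_peer_domain other-cache.example.com !.example.my
```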
Cache Selection Options This section provides options for general ICP configuration (Figure 6-3). These options affect all of the other neighbor caches that you define.
Figure 6-3: Global ICP options
Directly fetch URLs containing Allows you to configure a match list of items to always fetch directly rather than query a neighbor cache. The default here is "cgi-bin ?" and should continue to be included unless you know what you're doing. This helps prevent wasting intercache bandwidth on requests that are usually never considered cacheable, and so will never return hits from your neighbor caches. This option sets the hierarchy_stoplist directive.
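In squid.conf this corresponds to a line such as the following, which is Squid's usual default:

```
# Fetch dynamic content directly instead of asking neighbor caches
hierarchy_stoplist cgi-bin ?
```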
ICP query timeout The time in milliseconds that Squid will wait before timing out ICP requests. The default allows Squid to calculate an optimum value based on the average RTT of the neighbor caches. Usually, it is wise to leave this unchanged. However, for reference, the default value in the distant past was 2000, or 2 seconds. This option edits the icp_query_timeout directive.
Multicast ICP timeout Timeout in milliseconds for multicast probes, which are sent out to discover the number of active multicast peers listening on a given multicast address. This configures the mcast_icp_query_timeout directive and defaults to 2000 ms, or 2 seconds.
Dead peer timeout Controls how long Squid waits to declare a peer cache dead. If there are no ICP replies received in this amount of time, Squid will declare the peer dead and will not expect to receive any further ICP replies. However, it continues to send ICP queries for the peer and will mark it active again on receipt of a reply. This timeout also affects when Squid expects to receive ICP replies from peers. If more than this number of seconds has passed since the last ICP reply was received, Squid will not expect to receive an ICP reply on the next query. Thus, if your time between requests is greater than this timeout, your cache will send more requests DIRECT rather than through the neighbor caches.
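For reference, the three timeout directives discussed above might be set explicitly as follows. This is a sketch: the first two values are the defaults cited in the text, and the 10-second dead_peer_timeout is an assumption based on the stock squid.conf default, not taken from the original text:

```
icp_query_timeout 2000          # milliseconds; 0 lets Squid calculate it
mcast_icp_query_timeout 2000    # milliseconds
dead_peer_timeout 10 seconds
```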
Memory Usage This page provides access to most of the options available for configuring the way Squid uses memory and disks (Figure 6-4). Most values on this page can remain unchanged, except in very high load or low resource environments, where tuning can make a measurable difference in how well Squid performs.
Figure 6-4: Memory and disk usage
Memory usage limit The limit on how much memory Squid will use for some parts of its core data. Note that this does not restrict or limit Squid's total process size. What it does is set aside a portion of RAM for storing in-transit and hot objects, as well as negatively cached objects. Generally, the default value of 8MB is suitable for most situations, though it is safe to lower it to 4 or 2MB in extremely low load situations. It can also be raised significantly on high-memory systems to increase performance by a small margin. Keep in mind that large cache directories increase the memory usage of Squid by a large amount, and even a machine with a lot of memory can run out of memory and go into swap if cache memory and disk size are not appropriately balanced. This option edits the cache_mem directive. See the section on cache directories for a more complete discussion of balancing memory and storage.
Maximum cached object size The size of the largest object that Squid will attempt to cache. Objects larger than this will never be written to disk for later use. This refers to the maximum_object_size directive.

IP address cache size, IP cache high-water mark, IP cache low-water mark The size of the cache used for IP addresses and the high and low water marks for that cache, respectively. These options configure the ipcache_size, ipcache_high, and ipcache_low directives, which default to 1024 entries, 95%, and 90%.
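As a sketch, the memory options above correspond to squid.conf directives like these. The cache_mem and ipcache values are the defaults cited in the text; the 4096 KB maximum_object_size is an assumption based on the stock squid.conf default:

```
cache_mem 8 MB
maximum_object_size 4096 KB
ipcache_size 1024
ipcache_high 95
ipcache_low 90
```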
6.7 Logging Squid provides a number of logs that can be used when debugging problems, when measuring the cache's effectiveness, and when identifying users and the sites they visit (Figure 6-5). Because Squid can be used to "snoop" on users' browsing habits, one should carefully consider privacy laws in your region and, more importantly, be considerate to your users. That said, logs can be very valuable tools in ensuring that your users get the best service possible from your cache.
Figure 6-5: Logging configuration
Cache metadata file Filename used in each store directory to store the Web cache metadata, which is a sort of index for the Web cache object store. This is not a human readable log, and it is strongly recommended that you leave it in its default location on each store directory, unless you really know what you're doing. This option correlates to the cache_swap_log directive.
Use HTTPD log format Allows you to specify that Squid should write its access.log in the HTTPD common log file format, such as that used by Apache and many other Web servers. This allows you to parse the log and generate reports using a wider array of tools. However, this format does not provide several types of information specific to caches, and is generally less useful when tracking cache usage and solving problems. Because there are several effective tools for parsing and generating reports from Squid's standard access logs, it is usually preferable to leave this at its default of off. This option configures the emulate_httpd_log directive. The Calamaris cache access log analyzer does not work if this option is enabled.
Log full hostnames Configures whether Squid will attempt to resolve the host name so that the fully qualified domain name can be logged. This can, in some cases, increase the latency of requests. This option correlates to the log_fqdn directive.
Logging netmask Defines what portion of the requesting client IP is logged in the access.log. For privacy reasons it is often preferred to only log the network or subnet IP of the client. For example, a netmask of 255.255.255.0 will log the first three octets of the IP, and fill the last octet with a zero. This option configures the client_netmask directive.
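The logging options just described correspond to directives such as the following. This is a sketch: the first two lines are the defaults discussed above, and client_netmask is shown masking the last octet as in the example:

```
emulate_httpd_log off
log_fqdn off
client_netmask 255.255.255.0
```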
6.8 Cache Options The Cache Options page provides access to some important parts of the Squid configuration file. This is where the cache directories are configured as well as several timeouts and object size options (Figure 6-6).
Figure 6-6: Configuring Squid's Cache Directories
The directive is cache_dir while the options are the type of filesystem, the path to the cache directory, the size allotted to Squid, the number of top level directories, and finally the number of second level directories. In the example, I've chosen the filesystem type ufs, which is a name for all standard UNIX filesystems. This type includes the standard Linux ext2 filesystem as well. Other possibilities for this option include aufs and diskd.
The next field is simply the amount of disk space, in megabytes, that you want to allow Squid to use. Finally, the two directory fields define the number of first-level and second-level directories Squid will create under the cache directory.
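A cache_dir line matching the description above might look like this. The path and the 1000 MB size are illustrative, though 16 and 256 are Squid's usual default directory counts:

```
# type  path              size-MB  level-1-dirs  level-2-dirs
cache_dir ufs /var/spool/squid 1000 16 256
```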
6.9 Access Control There are three types of option for configuring access control. These three types of definition are separated in the Webmin panel into three sections. The first is labeled Access control lists, which lists existing ACLs and provides a simple interface for generating and editing lists of match criteria (Figure 6-7). The second is labeled Proxy restrictions and lists the current restrictions in place and the ACLs they affect. Finally, the ICP restrictions section lists the existing access rules regarding ICP messages from other Web caches.
Figure 6-7: Access Control Lists
Access Control Lists The first field in the table is the name of the ACL, an assigned name that can be just about anything the user chooses. The second field is the type of the ACL, which can be one of a number of choices indicating to Squid what part of a request should be matched against for this ACL. The possible types include the requesting client's address, the Web server address or host name, a regular expression matching the URL, and many more. The final field is the actual string to match. Depending on the ACL type, this may be an IP address, a series of IP addresses, a URL, a host name, etc.
Edit an ACL To edit an existing ACL, simply click on the highlighted name. You will then be presented with a screen containing all relevant information about the ACL. Depending on the type of the ACL, you will be shown different data entry fields. The operation of each type is very similar, so for this example, we'll step through editing the localhost ACL. Clicking the localhost button presents the page shown in Figure 6-8.
Figure 6-8: Edit an ACL
The title of the table is Client Address ACL, which means the ACL is of the Client Address type and tells Squid to compare the incoming IP address with the IP address in the ACL. It is possible to select an IP based on the originating IP or the destination IP. The netmask can also be used to indicate whether the ACL matches a whole network of addresses or only a single IP. It is possible to include a number of addresses, or ranges of addresses, in these fields. Finally, the Failure URL is the address to send clients to if they have been denied access due to matching this particular ACL. Note that the ACL by itself does nothing; there must also be a proxy restriction or ICP restriction rule that uses the ACL before Squid will act on it.
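A minimal sketch of the point that an ACL does nothing until a restriction rule uses it, in squid.conf form (the 192.168.1.0 network is hypothetical):

```
acl localhost src 127.0.0.1/255.255.255.255
acl lan src 192.168.1.0/255.255.255.0
http_access allow localhost
http_access allow lan
http_access deny all
```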
Creating new ACL Creating a new ACL is equally simple (Figure 6-9). From the ACL page, in the Access control lists section, select the type of ACL you'd like to create. Then click Create new ACL. From there, as shown, you can enter any number of ACLs for the list.
Figure 6-9: Creating an ACL
Available ACL Types

Browser Regexp A regular expression that matches the client's browser type based on the user agent header. This allows for ACLs operating based on the browser type in use. For example, using this ACL type, one could create an ACL for Netscape users and another for Internet Explorer users, and then redirect Netscape users to a Navigator-enhanced page and IE users to an Explorer-enhanced page. Probably not the wisest use of an administrator's time, but it does indicate the unmatched flexibility of Squid. This ACL type correlates to the browser ACL type.

Client IP Address The IP address of the requesting client. This option refers to the src ACL in the Squid configuration file. An IP address and netmask are expected. Address ranges are also accepted.

Client Hostname Matches against the client domain name. This option correlates to the srcdomain ACL, and can be either a single domain name, a list of domain names, or the path to a file that contains a list of domain names. If a path to a file, it must be surrounded by parentheses. This ACL type can increase latency and decrease throughput significantly on a loaded cache, as it must perform an address-to-name lookup for each request, so it is usually preferable to use the Client IP Address type.
Client Hostname Regexp Matches the client domain name against a regular expression. This option correlates to the srcdom_regex ACL, and can be either a single pattern, a list of patterns, or the path to a file that contains a list of patterns. If a path to a file, it must be surrounded by parentheses.

Date and Time This type is just what it sounds like, providing a means to create ACLs that are active during certain times of the day or certain days of the week. This feature is often used to block some types of content or some sections of the Internet during business or class hours. Many companies block pornography, entertainment, sports, and other clearly non-work-related sites during business hours, but then unblock them after hours. This might improve workplace efficiency in some situations (or it might just offend the employees). This ACL type allows you to enter days of the week and a time range, or select all hours of the selected days. This ACL type is the same as the time ACL type directive.

Ethernet Address The ethernet or MAC address of the requesting client. This option only works for clients on the same local subnet, and only on certain platforms; Linux, Solaris, and some BSD variants are the supported operating systems for this type of ACL. This ACL can provide a somewhat secure method of access control, because MAC addresses are usually harder to spoof than IP addresses, and you can guarantee that your clients are on the local network (otherwise no ARP resolution can take place).
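For example, the Date and Time ACL type could be combined with a destination-domain ACL to implement the business-hours blocking described above. A sketch with hypothetical names and domains:

```
# S M T W H F A are Squid's day codes; MTWHF = Monday through Friday
acl work_hours time MTWHF 09:00-17:00
acl fun_sites dstdomain .entertainment.example.com .sports.example.com
http_access deny fun_sites work_hours
```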
External Auth This ACL type calls an external authenticator process to decide whether the request will be allowed. Note that authentication cannot work on a transparent proxy or HTTP accelerator. The HTTP protocol does not provide for two authentication stages (one local and one on remote Web sites). So in order to use an authenticator, your proxy must operate as a traditional proxy, where a client will respond appropriately to a proxy authentication request as well as external Web server authentication requests. This correlates to the proxy_auth directive.
External Auth Regex As above, this ACL calls an external authenticator process, but allows regex pattern or case insensitive matches. This option correlates to the proxy_auth_regex directive.
Proxy IP Address The local IP address on which the client connection exists. This allows ACLs to be constructed that only match one physical network, if multiple interfaces are present on the proxy, among other things. This option configures the myip ACL type.
Request Method This ACL type matches on the HTTP method in the request headers. This includes the methods GET, PUT, etc. This corresponds to the method ACL type directive.
URL Path Regex This ACL matches on the URL path minus any protocol, port, and host name information. It does not include, for example, the "http://www.swelltech.com" portion of a request, leaving only the actual path to the object. This option correlates to the urlpath_regex directive.

URL Port This ACL matches on the destination port for the request, and configures the port ACL directive.

URL Protocol This ACL matches on the protocol of the request, such as FTP, HTTP, ICP, etc.

URL Regexp Matches using a regular expression on the complete URL. This ACL can be used to provide access control based on parts of the URL, a case-insensitive match of the URL, and much more. This option is equivalent to the url_regex ACL type directive.
Web Server Address This ACL matches based on the destination Web server's IP address. Squid accepts a single IP, a network IP with netmask, or a range of addresses in the form "192.168.1.1-192.168.1.25". This option correlates to the dst ACL type directive.
Web Server Hostname This ACL matches on the host name of the destination Web server.
Web Server Regexp Matches using a regular expression on the host name of the destination Web server.
6.10 Administrative Options This page provides access to several of the behind-the-scenes options of Squid, allowing you to configure a diverse set of options, including the user ID and group ID of the Squid process, cache hierarchy announce settings, and the authentication realm (Figure 6-10).
Figure 6-10: Administrative Options
Run as Unix user and group The user name and group name Squid will operate as. Squid is designed to start as root but very soon after drops to the user/group specified here. This allows you to restrict, for security reasons, the permissions that Squid will have when operating. By default, Squid will operate as the nobody user and the nogroup group, or, in the case of some Squids installed from RPM, as the squid user and group. These options correlate to the cache_effective_user and cache_effective_group directives.
Proxy authentication realm The realm that will be reported to clients when performing authentication. This option usually defaults to "Squid proxy-caching web server", and correlates to the proxy_auth_realm directive. This name will likely appear in the browser pop-up window when the client is asked for authentication information.

Cache manager email address The email address of the administrator of this cache. This option corresponds to the cache_mgr directive and defaults to either webmaster or root on RPM-based systems. This address will be added to any error pages that are displayed to clients.

Visible hostname The host name that Squid will advertise itself on. This affects the host name that Squid uses when serving error messages. This option may need to be configured in cache clusters if you receive IP-Forwarding errors. This option configures the visible_hostname directive.
Unique hostname Configures the unique_hostname directive, and sets a unique host name for Squid to report in cache clusters in order to allow detection of forwarding loops. Use this if you have multiple machines in a cluster with the same Visible Hostname.

Cache announce host, port and file The host address and port that Squid will use to announce its availability to participate in a cache hierarchy. The cache announce file is simply a file containing a message to be sent with announcements.
These options correspond to the announce_host, announce_port, and announce_file directives.
Announcement period Configures the announce_period directive, and refers to the frequency at which Squid will send announcement messages to the announce host.
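Putting the administrative options together, the corresponding squid.conf fragment might read as follows. The host names and addresses are hypothetical, and the 1-day announcement period is an assumption based on the stock squid.conf default:

```
cache_effective_user squid
cache_effective_group squid
visible_hostname proxy.example.com
unique_hostname proxy1.example.com
cache_mgr webmaster@example.com
announce_period 1 day
```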
Most of the content in Chapter 6 is taken from Unix System Administration with Webmin by Joe Cooper (2002) available online at http://www.swelltech.com/support/webminguide/
Chapter 7: Analyzer

7.1 Structure of log file In Fedora, the Squid log files are stored in the /var/log/squid directory by default. Squid creates three log files:
Access log
Cache log
Store log
Throughout this section, each log will be discussed, including its content as well as how these logs might help an administrator debug potential problems.
Access log Location : /var/log/squid/access.log
Description
It contains an entry each time the cache is hit or missed when a client requests HTTP content.
The identity of the host making the request (IP address) and the content they are requesting.
It also shows when content is served from the cache and when the remote server must be accessed to obtain the content.
It records the HTTP transactions made by the users.
Format

Option 1: This format is used when the emulate HTTP daemon log option is off. Native format (emulate_httpd_log off):
Timestamp Elapsed Client Action/Code Size Method URI Ident Hierarchy/From Content

Option 2: This format is used when the emulate HTTP daemon log option is on. Common format (emulate_httpd_log on):
Client Ident - [Timestamp1] "Method URI" Type Size

With:
Timestamp The time when the request is completed (socket closed). The format is "Unix time" (seconds since Jan 1, 1970) with millisecond resolution.
Timestamp1 When the request is completed (Day/Month/Year:Hour:Minute:Second GMT-Offset).
Elapsed The elapsed time of the request, in milliseconds. This is the time between the accept() and close() of the client socket.
Client The IP address of the connecting client, or the FQDN if the 'log_fqdn' option is enabled in the config file. Action The Action describes how the request was treated locally (hit, miss, etc). Code The HTTP reply code taken from the first line of the HTTP reply header. For ICP requests this is always "000." If the reply code was not given, it will be logged as "555." Size For TCP requests, the amount of data written to the client. For UDP requests, the size of the request. (in bytes) Method The HTTP request method (GET, POST, etc), or ICP_QUERY for ICP requests. URI The requested URI. Ident The result of the RFC931/ident lookup of the client username. If RFC931/ident lookup is disabled (default: `ident_lookup off'), it is logged as - . Hierarchy A description of how and where the requested object was fetched.
From The hostname of the machine from which the object was fetched.
Content The Content-Type of the object (from the HTTP reply header).

An example of the access.log file:
Figure 7-1 Access.log
From Figure 7-1, we can see that the native format is in use. To understand each format field, take the first line of the access.log file; its fields map to the values shown in Table 7-1.
Format      Value
Timestamp   1173680297.727
Elapsed     450
Client      10.0.5.10
Action      TCP_MISS
Code        302
Size        786
Method      GET
URI         http://www.google.com/search?
Ident       -
Hierarchy   DIRECT
From        64.233.189.104
Content     text/html
Table 7-1 The format and its value
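The field-by-field breakdown in Table 7-1 can be reproduced with a short script. This is a sketch, not part of the original text; the sample line is reconstructed from the values in the table:

```python
# Split one native-format access.log line into its named fields.
line = ("1173680297.727    450 10.0.5.10 TCP_MISS/302 786 GET "
        "http://www.google.com/search? - DIRECT/64.233.189.104 text/html")

parts = line.split()
fields = {
    "timestamp": float(parts[0]),
    "elapsed":   int(parts[1]),        # milliseconds
    "client":    parts[2],
    "action":    parts[3].split("/")[0],
    "code":      parts[3].split("/")[1],
    "size":      int(parts[4]),        # bytes
    "method":    parts[5],
    "uri":       parts[6],
    "ident":     parts[7],
    "hierarchy": parts[8].split("/")[0],
    "from":      parts[8].split("/")[1],
    "content":   parts[9],
}
print(fields["action"], fields["code"], fields["from"])
```

The same dictionary-building approach works for any native-format line, since the field order is fixed.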
There are some elaborations on:

Timestamp The timestamp is in UNIX time with millisecond resolution. It can be converted into a more readable form using this short Perl script:

#!/usr/bin/perl -p
s/^\d+\.\d+/localtime $&/e;
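The same conversion can be sketched in Python; gmtime() is used here so the result does not depend on the machine's time zone (the Perl one-liner above uses localtime). The sample timestamp is the one from the access.log example:

```python
import time

ts = 1173680297.727
# Convert Unix time with millisecond resolution to a readable form.
readable = time.strftime("%Y/%m/%d %H:%M:%S", time.gmtime(ts))
print(readable)  # 2007/03/12 06:18:17 (UTC)
```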
Action The TCP_ codes (Table 7-2) refer to requests on the HTTP port (usually 3128), while the UDP_ codes refer to requests on the ICP port (usually 3130).
Codes                     Explanation
TCP_HIT                   A valid copy of the requested object was in the cache.
TCP_MISS                  The requested object was not in the cache.
TCP_REFRESH_HIT           The requested object was cached but STALE. The IMS query for the object resulted in "304 Not Modified".
TCP_REF_FAIL_HIT          The requested object was cached but STALE. The IMS query failed and the stale object was delivered.
TCP_REFRESH_MISS          The requested object was cached but STALE. The IMS query returned the new content.
TCP_CLIENT_REFRESH_MISS   The client issued a "no-cache" pragma, or some analogous cache control command, along with the request; thus, the cache has to re-fetch the object.
TCP_IMS_HIT               The client issued an IMS request for an object which was in the cache and fresh.
TCP_SWAPFAIL_MISS         The object was believed to be in the cache, but could not be accessed.
TCP_NEGATIVE_HIT          Request for a negatively cached object, e.g. "404 not found", which the cache believes it knows to be inaccessible. See also the explanations for negative_ttl in your squid.conf file.
TCP_MEM_HIT               A valid copy of the requested object was in the cache, and it was in memory, thus avoiding disk accesses.
TCP_DENIED                Access was denied for this request.
TCP_OFFLINE_HIT           The requested object was retrieved from the cache during offline mode. Offline mode never validates any object; see offline_mode in the squid.conf file.
UDP_HIT                   A valid copy of the requested object was in the cache.
UDP_MISS                  The requested object is not in this cache.
UDP_DENIED                Access was denied for this request.
UDP_INVALID               An invalid request was received.
UDP_MISS_NOFETCH          During "-Y" startup, or during frequent failures, a cache in hit-only mode will return either UDP_HIT or this code. Neighbours will thus only fetch hits.
NONE                      Seen with errors and cachemgr requests.
Table 7-2 TCP codes and Explanation
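The action codes in Table 7-2 make it easy to compute a rough hit ratio from an access.log. A minimal sketch over hypothetical in-memory log lines (reading the real file with open() would work the same way):

```python
from collections import Counter

# Hypothetical access.log lines in native format.
log_lines = [
    "1173680297.727 450 10.0.5.10 TCP_MISS/302 786 GET http://a.example/ - DIRECT/1.2.3.4 text/html",
    "1173680299.101 12 10.0.5.11 TCP_HIT/200 5120 GET http://b.example/ - NONE/- text/html",
    "1173680300.500 15 10.0.5.12 TCP_MEM_HIT/200 900 GET http://c.example/ - NONE/- image/gif",
    "1173680301.200 30 10.0.5.13 TCP_DENIED/403 1100 GET http://d.example/ - NONE/- text/html",
]

# Field 4 is Action/Code; keep only the Action part and tally it.
actions = Counter(line.split()[3].split("/")[0] for line in log_lines)
hits = sum(n for action, n in actions.items() if "HIT" in action)
hit_ratio = hits / len(log_lines)
print(actions, hit_ratio)
```

Note that counting every action containing "HIT" as a hit is a simplification; depending on what you are measuring, you may want to exclude refresh or negative hits.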
Code These codes are taken from RFC 2616 and verified for Squid. Squid-2 uses almost all codes except 307 (Temporary Redirect), 416 (Request Range Not Satisfiable) and 417 (Expectation Failed).

Code  Explanation
000   Used mostly with UDP traffic
100   Continue
101   Switching Protocols
102   Processing
200   OK
201   Created
202   Accepted
203   Non-Authoritative Information
204   No Content
205   Reset Content
206   Partial Content
207   Multi Status
300   Multiple Choices
301   Moved Permanently
302   Moved Temporarily
303   See Other
304   Not Modified
305   Use Proxy
[307  Temporary Redirect]
400   Bad Request
401   Unauthorized
402   Payment Required
403   Forbidden
404   Not Found
405   Method Not Allowed
406   Not Acceptable
407   Proxy Authentication Required
408   Request Timeout
409   Conflict
410   Gone
411   Length Required
412   Precondition Failed
413   Request Entity Too Large
414   Request URI Too Large
415   Unsupported Media Type
[416  Request Range Not Satisfiable]
[417  Expectation Failed]
*422  Unprocessable Entity
*423  Locked
*424  Failed Dependency
500   Internal Server Error
501   Not Implemented
502   Bad Gateway
503   Service Unavailable
504   Gateway Timeout
505   HTTP Version Not Supported
*507  Insufficient Storage
600   Squid header parsing error
Method Squid recognizes several request methods as defined in RFC 2616. Newer versions of Squid (2.2.STABLE5 and above) also recognize the RFC 2518 "HTTP Extensions for Distributed Authoring -- WEBDAV" extensions (Table 7-3).

method     defined     cachability  meaning
GET        HTTP/0.9    possibly     object retrieval and simple searches
HEAD       HTTP/1.0    possibly     metadata retrieval
POST       HTTP/1.0    CC or Exp.   submit data (to a program)
PUT        HTTP/1.1    never        upload data (e.g. to a file)
DELETE     HTTP/1.1    never        remove resource (e.g. file)
TRACE      HTTP/1.1    never        appl. layer trace of request route
OPTIONS    HTTP/1.1    never        request available comm. options
CONNECT    HTTP/1.1r3  never        tunnel SSL connection
ICP_QUERY  Squid       never        used for ICP based exchanges
PURGE      Squid       never        remove object from cache
PROPFIND   rfc2518     ?            retrieve properties of an object
PROPPATCH  rfc2518     ?            change properties of an object
MKCOL      rfc2518     never        create a new collection
COPY       rfc2518     never        create a duplicate of src in dst
MOVE       rfc2518     never        atomically move src to dst
LOCK       rfc2518     never        lock an object against modifications
UNLOCK     rfc2518     never        unlock an object
Table 7-3 List of Methods
Hierarchy The following hierarchy codes are used in Squid-2 (Table 7-4):

Codes                   Explanation
NONE                    For TCP HITs, TCP failures, cachemgr requests and all UDP requests, there is no hierarchy information.
DIRECT                  The object was fetched from the origin server.
SIBLING_HIT             The object was fetched from a sibling cache which replied with UDP_HIT.
PARENT_HIT              The object was requested from a parent cache which replied with UDP_HIT.
DEFAULT_PARENT          No ICP queries were sent. This parent was chosen because it was marked "default" in the config file.
SINGLE_PARENT           The object was requested from the only parent appropriate for the given URL.
FIRST_UP_PARENT         The object was fetched from the first parent in the list of parents.
NO_PARENT_DIRECT        The object was fetched from the origin server, because no parents existed for the given URL.
FIRST_PARENT_MISS       The object was fetched from the parent with the fastest (possibly weighted) round trip time.
CLOSEST_PARENT_MISS     This parent was chosen because it included the lowest RTT measurement to the origin server. See also the closest-only peer configuration option.
CLOSEST_PARENT          The parent selection was based on our own RTT measurements.
CLOSEST_DIRECT          Our own RTT measurements returned a shorter time than any parent.
NO_DIRECT_FAIL          The object could not be requested because of a firewall configuration (see also never_direct and related material) and no parents were available.
SOURCE_FASTEST          The origin site was chosen, because the source ping arrived fastest.
ROUNDROBIN_PARENT       No ICP replies were received from any parent. The parent was chosen because it was marked for round robin in the config file and had the lowest usage count.
CACHE_DIGEST_HIT        The peer was chosen because the cache digest predicted a hit. This option was later replaced in order to distinguish between parents and siblings.
CD_PARENT_HIT           The parent was chosen because the cache digest predicted a hit.
CD_SIBLING_HIT          The sibling was chosen because the cache digest predicted a hit.
NO_CACHE_DIGEST_DIRECT  This output seems to be unused.
CARP                    The peer was selected by CARP.
ANY_PARENT              Part of src/peer_select.c:hier_strings[].
INVALID CODE            Part of src/peer_select.c:hier_strings[].
Table 7-4 Hierarchy Codes in Squid-2
Cache log Location : /var/log/squid/cache.log
Description
It contains various messages such as information about Squid configuration, warnings about possible performance problems and serious errors.
Error and debugging messages of particular squid modules
Format
[Timestamp1]| Message

With:
Timestamp1 When the event occurred (Year/Month/Day Hour:Minute:Second)
Message A description of the event. Common error messages include (Table 7-5):

Errors                  Description of the event
ERR_READ_TIMEOUT        The remote site or network is unreachable; it may be down.
ERR_LIFETIME_EXP        The remote site or network may be too slow or down.
ERR_NO_CLIENTS_BIG_OBJ  All clients went away before the transmission completed and the object is too big to cache.
ERR_READ_ERROR          The remote site or network may be down.
ERR_CLIENT_ABORT        The client dropped the connection before the transmission completed. Squid fetches the object according to its settings for quick_abort.
ERR_CONNECT_FAIL        The remote site or server may be down.
ERR_INVALID_REQ         Invalid HTTP request.
ERR_UNSUP_REQ           Unsupported request.
ERR_INVALID_URL         Invalid URL syntax.
ERR_NO_FDS              Out of file descriptors.
ERR_DNS_FAIL            DNS name lookup failure.
ERR_NOT_IMPLEMENTED     Protocol not supported.
ERR_CANNOT_FETCH        The requested URL cannot currently be retrieved.
ERR_NO_RELAY            There is no WAIS relay host defined for this cache.
ERR_DISK_IO             The system disk is out of space or failing.
ERR_ZERO_SIZE_OBJECT    The remote server closed the connection before sending any data.
ERR_FTP_DISABLED        This cache is configured to NOT retrieve FTP objects.
ERR_PROXY_DENIED        Access denied. The user must authenticate before accessing this cache.
Table 7-5 List of Error Messages
An example of the cache.log file is shown in Figure 7-2.
Figure 7-2 Cache.log
Store log Location : /var/log/squid/store.log
Description
It contains information on, and the status of, objects that were (or were not) stored.
Format
Timestamp Tag Code Date LM Expire Content Expect/Length Method Key

With:
Timestamp The time the entry was logged (Unix time, seconds since 00:00:00 UTC, January 1, 1970, with millisecond resolution).
Tag SWAPIN (swapped into memory from disk), SWAPOUT (saved to disk) or RELEASE (removed from cache).
Code The HTTP reply code, when available. For ICP requests this is always "0". If the reply code was not given, it will be logged as "555".
The following three fields are timestamps parsed from the HTTP reply headers. All are expressed in Unix time (i.e. seconds since 00:00:00 UTC, January 1, 1970). A missing header is represented with -2 and an unparsable header is represented as -1.
Date The time taken from the HTTP Date reply header. If the Date header is missing or invalid, the time of the request is used instead.
LM The value of the HTTP Last-Modified: reply header.
Expires The value of the HTTP Expires: reply header.
Content The HTTP Content-Type reply header.
Expect The value of the HTTP Content-Length reply header. Zero is logged if the Content-Length was missing.
/Length The number of bytes of content actually read. If Expect is nonzero and not equal to Length, the object will be released from the cache.
Method The request method (GET, POST, etc).
Key      The cache key. Often this is simply the URL. Cache objects which never become public have cache keys that include a unique integer sequence number, the request method, and then the URL ( /[post|put|head|connect]/URI ).
An example of the store.log file is shown in Figure 7-3.
Figure 7-3 Store.log
Based on Figure 7-3, we can work through the format fields using the contents of the store.log file. Taking the second line, we find that (Table 7-6):
Format      Value
Timestamp   1173680297.727
Tag         RELEASE
Code        302
Date        1173680306
LM          -1
Expires     -1
Content     text/html
Expect      -1
/Length     278
Method      GET
Key         http://www.google.com/search?
Table 7-6 Format in Store.log

Note that the raw line also carries three fields between the Tag and the Code: the swap directory number (-1), the file number (FFFFFFFF) and the 32-character MD5 cache key (7832CBDDD1604B89D0F75A2437F37AD7). These swap-related fields are not part of the format described above.
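The field positions can be checked with a quick awk one-liner. The sketch below uses a hypothetical store.log line; the field numbering assumes the raw line carries the directory number, file number and cache key between the Tag and the Code, as in the example above.

```shell
# Hypothetical store.log line:
# time tag dir file key code date lm expires type expect/length method url
line='1173680297.727 RELEASE -1 FFFFFFFF 7832CBDDD1604B89D0F75A2437F37AD7 302 1173680306 -1 -1 text/html -1/278 GET http://www.google.com/search?'

# Print Tag, Code, Content type and Method (fields 2, 6, 10 and 12)
echo "$line" | awk '{ print $2, $6, $10, $12 }'
# → RELEASE 302 text/html GET
```

The same awk expression can be pointed at the whole file (`awk '{...}' /var/log/squid/store.log`) to extract those columns for every entry.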
7.2 Methods
Log Analysis Using the Grep Command
The log files can also be analysed using Linux or UNIX commands such as grep, which filters the required information out of any log file. In a terminal, run the following command to start analysing the related log file. For example: # cat /var/log/squid/access.log | grep www.google.com Referring to Figure 7-4, the output shows the result of the grep command on the access.log file. The same technique can be applied to the cache.log and store.log files.
Figure 7-4 Analysing the Access.log using the Grep command
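grep can also count matches or isolate particular result codes. The following is a minimal sketch using a hypothetical two-line sample of access.log; the sample path and entries are made up for illustration so the commands can be tried without a running Squid.

```shell
# Create a small hypothetical access.log sample (native log format)
cat > /tmp/access.log.sample <<'EOF'
1173680297.727 120 192.168.0.5 TCP_MISS/200 1024 GET http://www.google.com/search? - DIRECT/64.233.187.99 text/html
1173680298.101 15 192.168.0.6 TCP_DENIED/403 290 GET http://www.example.com/ - NONE/- text/html
EOF

# Count how many requests matched a given site instead of printing them
grep -c 'www.google.com' /tmp/access.log.sample

# Show only denied requests
grep 'TCP_DENIED' /tmp/access.log.sample
```

Replace /tmp/access.log.sample with /var/log/squid/access.log to run the same filters against the real log.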
Log Analysis Using Sarg-2.2.3.1
Basically, the preferred log file for analysis is the access.log file in the native format. We choose to use the Squid Analysis Report Generator (Sarg) as a tool. It analyses users' Internet surfing patterns and generates HTML reports with many fields, such as users, IP addresses, bytes, sites and times. This tool can be downloaded from: http://linux.softpedia.com/get/Internet/Log-Analyzers/sarg-102.shtml
7.3 Setup Sarg-2.2.3.1
Steps: Download the software named sarg-2.2.3.1.tar.gz for the Linux and Unix environment. Make a new directory called installer located in the root path.
# mkdir /installer
Copy the downloaded file into the installer directory:
# cp sarg-2.2.3.1.tar.gz /installer
Then, go into the directory and extract sarg-2.2.3.1.tar.gz using the following command:
# tar -zxvf sarg-2.2.3.1.tar.gz
After it has been successfully extracted, go into the sarg-2.2.3.1 directory and configure it. Follow these commands:
# cd /installer/sarg-2.2.3.1
# ./configure
# make
# make install
NOTE: Make sure Squid has already been started before running the following script.
Go into the sarg-2.2.3.1 directory and run the sarg script. # ./sarg The generated result is kept at /var/www/html/squid-reports. It is recommended to view it in a GUI environment.
7.4 Report Management Using Webmin For managing the report, we choose to use Webmin, a web-based system administration interface for Unix. In our case, it helps the admin set information such as the location of the log source and the report destination, the format of the generated report, the size of the report and the schedule for automatic report generation. Step: 1. Make sure Webmin is already set up on the server. Then, open the browser and type http://127.0.0.1:10000/ to reach Webmin. After that, log in to Webmin.
Figure 7-5 Login
2. Choose the Servers tab, and then click on Squid Analysis Report Generator. There are four (4) modules offered: Log Source and Report Destination, Report Option, Report Style and Scheduled Report Generation.
Figure 7-6 Sarg Main Modules in Webmin
3. Click on the Log Source and Report Destination icon. In this module, the admin can set the source of the log file and define the destination of the generated report. For report maintenance, it also allows the admin to set the number of reports to keep in a certain location, and an acknowledgement can be sent to the admin's e-mail. Note: Please check the sarg.conf file, located at /usr/local/sarg/sarg.conf, to ensure the correct path for locating the source log files.
Figure 7-7 Setting on Source and Destination Report
After setting the changes, click the Save button. 4. Click on the Report Option icon. In this module, the admin can manage the pattern of the generated report, including data ordering, size of data displayed, data format and log file rotation. Several types of report can be generated, depending on the access control lists (ACLs) that have been set before. Log file rotation is important to ensure there is enough disk space for log storage, especially for long-term evaluations. This is covered further in Scheduled Report Generation.
Figure 7-8 Setting on Report Content and Generation Option
5. Click on the Report Style icon. Here, the admin can make the generated report look more attractive in terms of language, title and other common style settings.
Figure 7-9 Setting on HTML Report Style and Colour Option
6. Click on the Scheduled Report Generation icon. In this module, the admin can define how frequently reports are generated by enabling the selected or default schedule. Given Squid's log rotation feature, it is recommended to apply a simple schedule: during an idle period, the log files are safely transferred to the report destination in one burst. Before transport, the log files can be compressed during off-peak time. At the destination, the log files are concatenated into one file, so the yield is one file for the selected hour. However, it depends on the company's requirements for how reports should be generated.
Figure 7-10 Setting on Scheduled Reporting Options
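Behind Webmin's scheduling module, report generation can also be driven directly by cron. The sketch below is a hypothetical /etc/cron.d entry; the paths, times and the rotated log name access.log.0 are assumptions to adjust to the local setup. `squid -k rotate` tells a running Squid to close and reopen its log files (keeping old copies according to the logfile_rotate directive in squid.conf), and sarg's -l and -o options select the input log and the output report directory.

```shell
# Hypothetical /etc/cron.d/sarg entry -- adjust paths to your installation.
# 04:00 -- rotate Squid's logs (old copies kept per logfile_rotate)
0 4 * * * root /usr/sbin/squid -k rotate
# 04:30 -- generate the Sarg report from the freshly rotated log
30 4 * * * root /usr/bin/sarg -l /var/log/squid/access.log.0 -o /var/www/html/squid-reports
```

Running sarg from cron this way produces the same dated report folders in /var/www/html/squid-reports as running ./sarg by hand.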
7. After setting the information in the Scheduled Report Generation module, the following statement will be displayed on the main page.
Figure 7-11 Generate Report Setting
There are some considerations to be taken:
1. Never delete access.log, store.log or cache.log while Squid is running; there is no recovery file.
2. In the squid.conf file, the following statements can be applied if the admin wants to disable a certain log file. For example: To disable access.log: cache_access_log /dev/null To disable store.log: cache_store_log none To disable cache.log: cache_log /dev/null However, it is not advisable to disable cache.log, because it contains many important status and debugging messages.
7.5 Log Analysis and Statistics After running the Sarg analyser, reports will be generated for access.log. They can be found in /var/www/html/squid-reports.
Figure 7-12 Collection of Squid Report for Access.log
From Figure 7-12, we see that three (3) reports have been generated in this example. Basically, the latest version has no number at the end of the filename. Each time the access log file is analysed, the existing reports are renamed and an incremental number is appended automatically. For example, 2007Mar22-2007Mar22.2 was the first report generated, while 2007Mar22-2007Mar22 is the latest version.
As shown in Figure 7-13, the index.html file lists the reports that have been generated by Sarg. To get more detailed information for a specific report, click on the selected file name.
Figure 7-13 Summary of Squid reports
For example, the folder named 2007Mar22-2007Mar22 has been selected and opened. As shown in Figure 7-14, there are several standard files found in all Squid reports. Briefly, five (5) HTML reports show statistical information: index, denied, download, siteuser and topsites. Besides these, the folder also holds a collection of reports for specific users, identified by their IP addresses.
Figure 7-14 Contents of 2007Mar22-2007Mar22 as example
The following figures show the HTML reports: 1. Index html
Figure 7-15 Index html
2. Denied html
Figure 7-16 Denied html
3. Download html
Figure 7-17 Download html
4. Sites and Users
Figure 7-18 Siteuser html
5. Top 100 Sites
Figure 7-19 Topsites html
If we click on a specific IP address, we will see all the information for that user, as in Figure 7-20.
Figure 7-20 Reports generated for specific user (IP Address)