Squid


version 2.6

by Marliza Ramly Zurina Saaya Wahidah Md Shah Mohammad Radzi Motsidi Haniza Nahar Faculty of Information and Communication Technology Universiti Teknikal Malaysia Melaka (UTeM) May 2007

Copyright © 2007 Fakulti Teknologi Maklumat dan Komunikasi, UTeM

TABLE OF CONTENT

1. PROXY SERVERS
   1.2 Key Features of Proxy Servers
       Proxy Servers and Caching

2. INTERNET CACHING
   2.1 Hierarchical Caching
   2.2 Terminology for Hierarchical Caching
   2.3 Internet Cache Protocol
   2.4 Basic Neighbour Selection Process

3. INTRODUCTION TO SQUID
   3.1 Hardware and Software Requirement
   3.2 Directory Structure
   3.3 Getting and Installing Squid (Custom Configuration for Network)
   3.4 Install Squid
   3.5 Basic Squid Configuration (Configure Squid, Basic Configuration, Starting Squid Daemon)
   3.6 Basic Client Software Configuration (Configuring Internet Browser, Using proxy.pac File)

4. ACL CONFIGURATION
   4.1 Access Controls (list of ACL types: src, srcdomain, dst, dstdomain, srcdom_regex, dstdom_regex, time, url_regex, urlpath_regex, port, proto, method, browser, proxy_auth, maxconn; creating a custom error page)
   4.2 Exercises

5. CACHING
   5.1 Concepts
   5.2 Configuring a Cache for Proxy Server

6. SQUID AND WEBMIN
   6.1 About Webmin
   6.2 Obtaining and Installing Webmin
   6.3 Using Squid in Webmin
   6.4 Ports and Networking
   6.5 Other Caches
   6.6 Other Proxy Cache Servers
   6.7 Logging
   6.8 Cache Options
   6.9 Access Control
   6.10 Administrative Options

7. ANALYZER
   7.1 Structure of Log File (Access log, Cache log, Store log)
   7.2 Methods (Log Analysis Using Grep, Log Analysis Using Sarg-2.2.3.1)
   7.3 Setup Sarg-2.2.3.1
   7.4 Report Management Using Webmin
   7.5 Log Analysis and Statistic

ABBREVIATIONS

ACL     Access Control List
CARP    Cache Array Routing Protocol
CD      Compact Disk
DNS     Domain Name Service
FTP     File Transfer Protocol
GB      Gigabyte
HTCP    Hyper Text Caching Protocol
HTTP    Hypertext Transfer Protocol
I/O     Input/Output
ICP     Internet Cache Protocol
IP      Internet Protocol
LAN     Local Area Network
MAC     Media Access Control
MB      Megabyte
RAM     Random Access Memory
RPM     Red Hat Package Manager
RTT     Round Trip Time
SNMP    Simple Network Management Protocol
SSL     Secure Socket Layer
UDP     User Datagram Protocol
URL     Uniform Resource Locator
UTeM    Universiti Teknikal Malaysia Melaka
WCCP    Web Cache Coordination Protocol


1. Proxy Servers

A proxy server is an intermediary server between the Internet browser and the remote server. It acts as a "middleman" between the two ends of a client/server network connection, and it works with browsers, servers and other applications by supporting underlying network protocols such as HTTP. Furthermore, it stores downloaded documents in its local cache, so that downloading time is reduced because the documents are served from a local server. For example, imagine a user who wants to download a document through the Internet browser by specifying a URL such as http://www.yahoo.com; the document is then transferred to the workstation (e.g. from the UTeM proxy to the local workstation). In that situation, the Internet browser communicates directly with the UTeM proxy server to get the document. In addition, a cache is combined with the proxy server, which makes transfers quicker and more reliable. The Internet browser no longer contacts the remote server directly; instead, it requests the document from the proxy server.


1.2 Key features of proxy servers

The four main functions provided are:

• Firewalling and filtering (security)
• Connection sharing
• Administrative control
• Caching service

Proxy Servers and Caching

Combining a proxy server with caching of Web pages can noticeably improve the quality of service of a network, as shown in Figure 1-1. The improvement comes in three ways:

• Caching conserves bandwidth on the network and improves scalability.
• Response time is reduced (e.g. an HTTP proxy cache can load Web pages into the browser more quickly).
• Caching increases availability: Web pages and other files in the cache remain accessible even if the original source or an intermediate network link goes offline.


Figure 1-1: Generic Diagram for Proxy Server (clients connect through the proxy server to the Internet)


2. Internet Caching

2.1 Hierarchical Caching

Cache hierarchies are a logical extension of the caching concept. Sharing can benefit a group of Web caches and a group of Web clients; Figure 2-1 shows how the basic caching process works. However, there are some disadvantages as well, and whether the advantages outweigh the disadvantages depends on the specific situation discussed below.

Figure 2-1: Proxy Server Caching Process

(1) The client browser initiates a request to the proxy server for a URL. (2) The proxy server checks whether the requested page is in its cache; if it is, the proxy returns the requested page to the client. If not, (3) the proxy server requests the page from the web server, (4) the web server returns the requested URL to the proxy server, and (5) the proxy server caches the returned page and returns it to the client.


The major advantages are:

• Additional cache hits. In general, some of the requests that miss in the local cache can be expected to hit in a neighbour cache.
• Request routing. HTTP traffic can be directed along a certain path by routing requests to specific caches (e.g. if the Internet can be reached over two paths, one cheap and one expensive, request routing lets the user send HTTP traffic over the cheaper link).

The disadvantages of the concept are:

• Configuration hassles. Configuring neighbour caches requires coordination between both parties, which adds administrative overhead whenever the membership of the hierarchy changes.
• Additional delay for cache misses. Many factors contribute to this delay, for example the delay between peers, link congestion, and whether or not ICP is used.

2.2 Terminology for Hierarchical Caching

Cache
Refers to an HTTP proxy that stores some of the requests it serves.

Objects
A generic term for any document, image, or other type of data available on the Web. Nowadays Uniform Resource Locators (URLs) identify objects (such as images, audio, video and binary files) rather than only documents or pages, drawn from data available on HTTP, FTP, Gopher and other types of servers.


Hit and miss
A cache hit occurs when the requested object exists in the cache as a valid copy. If the object does not exist, or is no longer valid, it is a cache miss. In that situation the cache must forward the request toward the origin server.

Origin Server
The authoritative source for an object; for example, the origin server is the host named in the URL.

Hierarchy vs. Mesh
Caches are arranged hierarchically when the topology looks like a tree, and in a mesh when the structure is flat. In either case these terms simply refer to the fact that caches can be "connected" to each other. In Squid this can be seen in the cache directory after it has been created.

Neighbours, Peers, Parents, Siblings
In general, the terms neighbour and peer mean the same thing for caches in a hierarchy or mesh, while parent and sibling refer to the relationship between a pair of caches.

Fresh, Stale, Refresh
The status of a cached object can be:

• Fresh: the object can be returned as a cache hit.
• Stale: the object must be refreshed; Squid refreshes it by including an IMS (If-Modified-Since) request header and forwarding the request on toward the origin server.


2.3 Internet Cache Protocol

The Internet Cache Protocol (ICP) provides a quick and efficient method of inter-cache communication, offering a mechanism for establishing complex cache hierarchies. Its advantages are:

• ICP can be used by Squid as an indication of network conditions.
• ICP messages are transmitted as UDP packets, which makes the protocol easy to implement because each cache needs to maintain only a single UDP socket.

ICP also has some disadvantages. One failure mode is that when links are highly congested, ICP becomes useless exactly where caching is needed most. Furthermore, the transmission time of the UDP packets adds extra delay to request processing, so in some situations ICP is not appropriate because of this delay.

2.4 Basic Neighbour Selection Process

Before describing Squid's features for hierarchical caching, let us briefly explain the neighbour selection process. When Squid is unable to satisfy a request from its own cache, it must decide where to forward the request. There are basically three choices:

• a parent cache
• a sibling cache
• the origin server


How does ICP help Squid make this decision?

• For parent and sibling caches, Squid sends an ICP query message containing the requested URL to its neighbours, usually as UDP packets, and remembers how many queries it sent for a given request.
• Each neighbour that receives the ICP query searches for the URL in its own cache. If a valid copy of the URL exists, the cache sends an ICP_HIT reply; otherwise it sends ICP_MISS.
• The querying cache then collects the ICP replies from its peers.
• If the cache receives an ICP_HIT reply from a peer, it immediately forwards the HTTP request to that peer.
• If the cache does not receive an ICP_HIT reply, then all replies will be ICP_MISS.
• Squid waits until it receives all replies, up to two seconds.
• If one of the ICP_MISS replies comes from a parent, Squid forwards the request to the parent whose reply was the first to arrive. We call this reply the FIRST_PARENT_MISS. If there is no ICP_MISS from a parent cache, Squid forwards the request to the origin server.

We have described the basic algorithm, to which Squid offers numerous possible modifications (a configuration sketch follows this list), including mechanisms to:

• Send ICP queries to some neighbours and not to others.
• Include the origin server in the ICP "pinging", so that if the origin server's reply arrives before any ICP_HITs, the request is forwarded there directly.
• Disallow or require the use of some peers for certain requests.
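As a hedged illustration of these mechanisms, the sketch below shows how such policies are typically expressed in squid.conf; the peer host names and the local-domain ACL are assumptions, not taken from the original text:

    # Two neighbours: query the parent with ICP, but never send ICP queries to the sibling
    cache_peer parent.example.net  parent  3128 3130 weight=2
    cache_peer sibling.example.net sibling 3128 3130 no-query

    # Never use the parent for requests to local web servers; go direct instead
    acl local_servers dstdomain .example.net
    cache_peer_access parent.example.net deny local_servers
    always_direct allow local_servers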


3. Introduction to Squid

Squid is a high-performance proxy caching server for Web clients, supporting FTP, Gopher, and HTTP data objects. It has two basic purposes:

• to provide proxy service for machines that must pass Internet traffic through some form of masquerading firewall
• caching

Unlike traditional caching software, Squid handles all requests in a single, non-blocking, I/O-driven process. Squid keeps metadata and especially hot objects cached in RAM, caches DNS lookups, supports non-blocking DNS lookups, and implements negative caching of failed requests. Squid consists of a main server program, a Domain Name System lookup program (dnsserver), a program for retrieving FTP data (ftpget), and some management and client tools. In other words, Squid is:

1. a full-featured Web proxy cache
2. free, open-source software
3. the result of many contributions by unpaid (and paid) volunteers


Squid supports:

• proxying and caching of Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), and other Uniform Resource Locators (URLs)
• proxying for Secure Socket Layer (SSL)
• cache hierarchies
• Internet Cache Protocol (ICP), Hyper Text Caching Protocol (HTCP), Cache Array Routing Protocol (CARP), and Cache Digests
• transparent caching
• Web Cache Coordination Protocol (WCCP) (Squid v2.3 and above)
• extensive access controls
• HTTP server acceleration
• Simple Network Management Protocol (SNMP)
• caching of DNS lookups

3.1 Hardware and Software Requirement

• RAM: a minimum of 128 MB is recommended (scale up with the user count and the size of the disk cache).
• Disk: 512 MB to 1 GB for a small user count; 16 GB to 24 GB for a large user count.
• Operating system: most versions of UNIX. Squid also works on AIX, Digital UNIX, FreeBSD, HP-UX, IRIX, Linux, NetBSD, NeXTStep, SCO, Solaris and SunOS.


3.2 Directory Structure

Squid normally creates a few directories, shown in Table 3-1.

Directory       Explanation
/var/cache      Stores the actual cached data
/etc/squid      Contains squid.conf, which is the only Squid configuration file
/var/log        Logs each connection (this directory can grow large)

Table 3-1: Squid Directories

3.3 Getting and Installing Squid

Custom Configuration for Network

There are three ways to configure a proxy server in a network, and the configuration file will follow the requirements of your network: transparent proxy, reverse proxy and web cache proxy.


Configuring Squid for transparency

Figure 3-1: Transparent Proxy (clients on the LANs reach the Internet through a transparent proxy server at 10.1.1.1, which intercepts traffic to port 80)

A transparent proxy (Figure 3-1) is configured when you want to grab a certain type of traffic at your gateway or router and send it through a proxy without the knowledge of the user or client. In other words, the router forwards all traffic destined for port 80 to the proxy machine using a routing policy. Using Squid as a transparent proxy involves two parts, as sketched below:

1. Squid must be configured to accept non-proxy requests.
2. Web traffic must be redirected to the Squid port.
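The following is a minimal sketch of those two parts for Squid 2.6, assuming the proxy machine itself does the interception and that its LAN interface is eth0 (both are assumptions, not taken from the original text):

    # 1. In squid.conf: accept intercepted (non-proxy) requests on the usual port
    http_port 3128 transparent

    # 2. On the same machine: redirect incoming web traffic to the Squid port
    iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-ports 3128

If the interception is done on a separate router instead, the redirection rule lives there (for example via a route policy or WCCP) rather than on the proxy host.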


This type of transparent proxy is suitable for:

• intercepting network traffic transparently to the browser
• simplified administration: the browser does not need to be configured to talk to a cache
• central control: the user cannot change the browser settings to bypass the cache

The disadvantages of using this type of proxy are:

• browser dependency: transparent proxying does not work very well with certain web browsers
• loss of user control: transparent caching takes control away from users, and some users will change ISPs to avoid it

Configuring Squid for reverse proxy

Figure 3-2: Reverse Proxy (clients on the Internet reach a web server cluster through the reverse proxy server)


A reverse proxy, also known as web server acceleration (Figure 3-2), is a method of reducing the load on a busy web server by placing a web cache between the server and the Internet. In this case, when a client browser makes a request, DNS routes the request to the reverse proxy (not to the actual web server). The reverse proxy then checks its cache to find out whether the requested content is available to fulfil the client request. If not, it contacts the real web server and downloads the requested content to its disk cache. The benefits that can be gained are:

1. improved security;
2. improved scalability without increasing the complexity of maintenance too much;
3. a lighter burden on a web server that provides both static and dynamic content, since the static content can be cached on the reverse proxy while the web server is freed up to handle the dynamic content.

To run Squid as an accelerator, you will probably want it to listen on port 80, and you have to define the machine you are accelerating for (not covered in this chapter).
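Although accelerator setup is not covered in this chapter, a minimal Squid 2.6 sketch looks roughly like the following; the site name and the backend server address are assumptions used only for illustration:

    # Listen on port 80 in accelerator (reverse proxy) mode for one site
    http_port 80 accel defaultsite=www.example.com

    # Forward cache misses to the real (origin) web server
    cache_peer 192.168.10.20 parent 80 0 no-query originserver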

14

Introduction to Squid

Configuring squid for Web Cache proxy Internet

Router

Web Cache Proxy Server

Router

client

client

client

client

client

Figure 3-3 Web Cache Proxy

By default, Squid is configured as a direct proxy (Figure 3-3). In order to cache web traffic with Squid, the browser must be configured to use the Squid proxy, which requires the following information:

• the proxy server's IP address
• the port number on which the proxy server accepts connections


3.4 Install Squid

The Squid proxy caching server software package comes with Fedora Core 6, so normally it does not have to be installed; you only need to manage the configuration file to make it work. If Squid is not installed on your server, you can install it from the Squid RPM file. To do so, download the RPM file from the Internet or copy it from the installation CD, then run this command:

# rpm -i squid-2.6.STABLE4-1.fc6.i386.rpm

NOTE: The RPM file name may differ depending on the version of Squid you have downloaded.

Alternatively, you can install Squid from the source distribution, which can be downloaded from the official Squid proxy server web site, http://www.squid-cache.org. To do so, copy the installation folder onto your local drive and run the following commands:

# ./configure
# make
# make install

NOTE: Make sure all the dependency packages are already installed on your machine before starting to install Squid.


3.5 Basic Squid Configuration

Configure Squid

All Squid configuration files are kept in the directory /etc/squid.

The following paragraphs of this chapter work through the options that may need to be changed to get Squid running. Most people will not need to change all of these settings, but at least one part of the configuration file usually has to change: the default rules in squid.conf deny access from the browser, and if you don't change this, Squid will not be very useful.

Basic Configuration

All of Squid's configuration goes in one file, squid.conf. This section details the configuration of Squid as a caching proxy only, not as an HTTP accelerator. Some basic configuration needs to be implemented. First, uncomment and edit the following lines in the configuration file found at /etc/squid/squid.conf. To configure the Squid server, do the following tasks:

1. Log in as root on the machine.
2. Type the following command:

# vi /etc/squid/squid.conf

The above command opens the Squid configuration file for editing.


Then set the port on which Squid listens. Normally, Squid listens on port 3128. While it may be convenient to listen on this port, network administrators often configure the proxy to listen on port 8080 as well. Both are non-well-known ports (ports below 1024 are well-known ports and are restricted from being used by ordinary user processes), and therefore do not conflict with services on ports such as 80, 443, 22, 23, and so on. Squid need not be restricted to one port; it can easily listen on two or more ports.

http_port

In squid.conf, find the following directive and change it, or leave it at its default if port 3128 is acceptable:

http_port 3128      (the default)

To listen on multiple ports, add further http_port lines, for example:

http_port 8080


Additionally, if you have multiple network cards in your proxy server and would like the proxy to listen on port 8080 on the first network card and port 3128 on the second network card, you can add the following lines:

http_port 10.1.5.49:8080
http_port 10.0.5.50:3128

http_access

By default, http_access is denied. The Access Control List (ACL) rules should be modified to allow access only to trusted clients. This is important because it prevents other people from stealing your network resources. ACLs are discussed in Chapter 4.

cache_dir

This directive specifies the cache directory, its storage format and its size, as given below:

cache_dir ufs /var/spool/squid 100 16 256

The value 100 denotes a 100 MB cache size, which can be adjusted to the required size. (Caching is discussed later in Chapter 5.)

cache_effective_user
cache_effective_group

These directives set the unprivileged user and group that the Squid daemon runs as after start-up.
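Putting the directives above together, a minimal caching-proxy squid.conf sketch might look like the following; the subnet, hostname and user/group names are assumptions used only for illustration:

    http_port 3128
    cache_dir ufs /var/spool/squid 100 16 256
    cache_effective_user squid
    cache_effective_group squid
    visible_hostname proxy.localdomain

    # Allow only the trusted local network, then deny everything else (see Chapter 4)
    acl localnet src 10.0.5.0/24
    http_access allow localnet
    http_access deny all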

NOTE: You can edit the squid.conf file by using gedit instead of command line


Starting Squid Daemon

In this section we will learn how to start Squid. Make sure you have finished editing the configuration file; then you can start Squid for the first time. First, check the configuration file for errors by typing this command at your terminal:

# squid -k parse

If an error is detected, you will see output such as:

# squid -k parse
FATAL: Could not determine fully qualified hostname. Please set 'visible_hostname'
Squid Cache (Version 2.6.STABLE4): Terminated abnormally.
CPU Usage: 0.0004 seconds = 0.0004 user + 0.000 sys
Maximum Resident Size: 0 KB
Page faults with physical i/o: 0
Aborted.

Solution: add the following line to the squid.conf file:

visible_hostname localhost

If no error is detected, continue with the following command to start Squid (this starts Squid temporarily, for the current session only):

# service squid start

If everything is working fine, your console displays:

Starting squid: .                                          [OK]

If you want to stop the service:

# service squid stop

Your console will then display:

Stopping squid: .                                          [OK]

You must be a privileged user to start or stop Squid.

To make Squid start automatically at boot (the permanent step), try these commands:

# chkconfig --list
# chkconfig --level 5 squid on

You can restart the Squid service by typing:

# /etc/init.d/squid restart

While the daemon is running, there are several ways to run the squid command to change how the daemon works, using these options:

# squid -k reconfigure    (causes Squid to re-read its configuration file)
# squid -k shutdown       (causes Squid to exit after waiting briefly for current connections to finish)
# squid -k interrupt      (shuts down Squid immediately, without waiting for connections to close)
# squid -k kill           (kills Squid immediately, without closing connections or log files; use this option only if the other methods don't work)


3.6 Basic Client Software Configuration

Basic Configuration

To configure any browser, you need at least two pieces of information:

• the proxy server's IP address
• the port number on which the proxy server accepts requests

Configuring Internet Browser

The following section explains the steps to configure the proxy server in Internet Explorer, Mozilla Firefox and Opera.

Internet Explorer 7.0
1. Select the Tools menu option.
2. Select Internet Options.
3. Click on the Connections tab.
4. Select LAN settings.
5. Check the box to use a proxy server for the LAN.
6. Type the proxy IP address in the Address field and the port number in the Port field. Example:

   Address: 10.0.5.10   Port: 3128

Mozilla Firefox
1. Click Tools -> Options -> Advanced.
2. Click Network -> go to Connection -> Settings.
3. Under "Configure Proxies to Access the Internet":
4. Choose "Manual proxy configuration".
5. Set HTTP Proxy: 10.0.5.10, Port: 3128.
6. Check the box to use the proxy server for all protocols.
7. Then click OK.
8. Now the client can access the Internet.

Opera 9.1
1. Click Tools -> Preferences -> Advanced.
2. Choose Network.
3. Click Proxy Servers and check:

   HTTP    : 10.0.5.10   Port: 3128
   HTTPS   : 10.0.5.10   Port: 3128
   FTP     : 10.0.5.10   Port: 3128
   Gopher  : 10.0.5.10   Port: 3128

4. Then click OK.

Using proxy.pac File

This setting is for clients who want their browsers to pick up the proxy settings automatically. The browser can be configured with a simple proxy.pac file, as shown in the example below:

function FindProxyForURL(url, host) {
    if (isInNet(myIpAddress(), "10.0.5.0", "255.255.255.0"))
        return "PROXY 10.0.5.10:3128";
    else
        return "DIRECT";
}


The proxy.pac file needs to be published on a web server such as Apache, and the client can then configure the proxy using the automatic configuration script. This script is useful when there is a possibility that the proxy server will change its IP address. To access the script, the client adds the URL of proxy.pac as its automatic proxy configuration script (Figure 3-4).

Figure 3-4: Using automatic configuration script
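If Apache is used to publish the file, it helps to serve it with the MIME type browsers expect for proxy auto-configuration. A minimal sketch, assuming the file is placed in the default document root, is:

    # In the Apache configuration (e.g. httpd.conf)
    AddType application/x-ns-proxy-autoconfig .pac

Clients would then point their automatic configuration setting at a URL such as http://10.0.5.10/proxy.pac (the host and path here are assumptions for illustration).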



4. ACL Configuration

4.1 Access Controls

Access control lists (ACLs) are the most important part of configuring Squid. Their main use is to implement access control, restricting other people from using the cache infrastructure without permission. Rules can be written for almost any type of requirement, from very complex configurations for large organisations to simple configurations for home users. ACLs are written in the squid.conf file using the following format:

acl name type (string|"filename") [string2] ["filename2"]

name is a descriptive variable defined by the user, while type is defined accordingly and will be described in the next section.


There are two elements in access control: classes and operators. Classes are defined by acl lines, while the names of the operators vary; the most common operators are http_access and icp_access. The actions for these operators are allow and deny: allow enables the ACL, while deny restricts it.

The general format for an operator is:

http_access allow|deny [!]aclname [[!]aclname2 ...]
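Because http_access rules are evaluated from top to bottom and the first matching rule wins, the order of the lines matters. A minimal hedged example (the subnet is an assumption) that allows a trusted network and blocks everything else:

    acl localnet src 10.0.5.0/24
    http_access allow localnet
    http_access deny all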

List of ACL Types

ACL Type        Details
src             client IP address
srcdomain       client domain name
dst             destination's IP address
dstdomain       destination's domain name
srcdom_regex    regular expression describing the client domain name
dstdom_regex    regular expression describing the destination domain name
time            specify the time
url_regex       regular expression describing the whole URL of the destination (web server)
urlpath_regex   regular expression describing the path of the destination URL (not including its domain name)
port            specify the port number
proto           specify the protocol
method          specify the request method
browser         specify the browser
proxy_auth      user authentication via external processes
maxconn         specify the number of connections


src

Description
This ACL lets the server recognise a client (the computer that will use the server as a proxy to access the Internet) by its IP address. The IP addresses can be listed as a single IP address, a range of IPs, or a list of addresses in an external file.

Syntax
acl aclname src ip-address/netmask        (client's IP address)
acl aclname src addr1-addr2/netmask       (range of addresses)
acl aclname src "filename"                (client IP addresses in an external file)

Example 1
acl fullaccess src "/etc/squid/fullaccess.txt"
http_access allow fullaccess

This ACL uses an external file named fullaccess.txt, which consists of a list of client IP addresses. Example of fullaccess.txt:

198.123.56.12
198.123.56.13
198.123.56.34

Example 2
acl office.net src 192.123.56.0/255.255.255.0
http_access allow office.net

This ACL sets the source addresses for office.net to the range 192.123.56.x and allows them to access the Internet using the http_access allow operator.


srcdomain

Description
This ACL lets the server recognise a client by the client's computer (domain) name. To do so, Squid must perform a reverse DNS lookup (from the client IP address to the client domain name) before the ACL is interpreted, which can cause processing delays.

Syntax
acl aclname srcdomain domain-name        (reverse lookup of the client IP)

Example 1
acl staff.net srcdomain staff20 staff21
http_access allow staff.net

This ACL is for clients with the computer names staff20 and staff21. The http_access operator allows the ACL named staff.net to access the Internet. This option is not very efficient, since the server must do a reverse name lookup to determine the source name.

NOTE: Please ensure the DNS server is running in order to use the DNS lookup service.


dst

Description
This is the same as src, except that it refers to the server's (destination's) IP address. Squid first performs a DNS lookup of the IP address for the domain name in the request header, and then interprets the ACL.

Syntax
acl aclname dst ip-address/netmask        (IP address of the URL host, i.e. the site)

Example 1
acl tunnel dst 209.8.233.0/24
http_access deny tunnel

This ACL denies any destination with an IP of 209.8.233.x.

Example 2
acl allow_ip dst 209.8.233.0-209.8.233.100/255.255.0.0
http_access allow allow_ip

This ACL allows destinations with IP addresses in the range 209.8.233.0 to 209.8.233.100.

dstdomain

Description
This ACL recognises the destination by its domain. It is the most effective method for controlling access to a specific domain.

Syntax
acl aclname dstdomain domain.com        (domain name from the site's URL)


Example 1
acl banned_domain dstdomain www.terrorist.com
http_access deny banned_domain

This ACL denies destinations with the domain www.terrorist.com.

srcdom_regex

Description
This ACL is similar to srcdomain in that the server must do a reverse DNS lookup (from the client IP address to the client domain name) before the ACL is interpreted. The difference is that this ACL allows a regular expression to be used when defining the client's domain.

Syntax
acl aclname srcdom_regex -i source_domain_regex

Example 1
acl staff.net srcdom_regex -i staff
http_access allow staff.net

This ACL allows every node whose domain contains the word staff to access the Internet. The -i option makes the expression case-insensitive.

dstdom_regex

Description
This ACL lets the server recognise the destination by a regular expression over its domain name.

Syntax
acl aclname dstdom_regex -i dst_domain_regex

Example 1
acl banned_domain dstdom_regex -i terror porn
http_access deny banned_domain

This ACL denies clients access to destinations whose domain names contain the words terror or porn. For example, access to the domains www.terrorist.com and www.pornography.net will be denied by the proxy server.

time

Description
This ACL lets the server control the service using a time function. Access to the network can be granted according to the schedule defined in the ACL.

Syntax
acl aclname time day-abbrevs h1:m1-h2:m2

where h1:m1 must be less than h2:m2, and the day is represented using the abbreviations in Table 4-1.

Abbreviation    Day
S               Sunday
M               Monday
T               Tuesday
W               Wednesday
H               Thursday
F               Friday
A               Saturday

Table 4-1: Abbreviations for Days


Example 1
acl SABTU time A 9:00-17:00

The ACL SABTU refers to Saturdays from 9:00 to 17:00.

Example 2
acl pagi time 9:00-11:00
acl office.net src 10.2.3.0/24
http_access deny pagi office.net

pagi refers to the time from 9:00 to 11:00, while office.net refers to the clients' IP addresses. This combination of ACLs denies access for office.net between 9.00 am and 11.00 am.

url_regex

Description
url_regex searches the entire URL for the regular expression you specify. Note that these regular expressions are case-sensitive; to make them case-insensitive, use the -i option.

Syntax
acl aclname url_regex -i url_regex ...

Example 1
acl banned_url url_regex -i terror porn
http_access deny banned_url

This ACL denies URLs that contain the word terror or porn. For example, the following destinations would be denied by the proxy server:

http://www.google.com/pornography
http://www.news.com/terrorist.html
http://www.terror.com/


urlpath_regex

Description
urlpath_regex performs regular expression pattern matching against the URL, excluding the protocol and hostname. If the URL is http://www.free.com/latest/games/tetris.exe, this ACL type only looks at the part after http://www.free.com/, leaving out the http protocol and the www.free.com hostname.

Syntax
acl aclname urlpath_regex pattern

Example 1
acl blocked_free urlpath_regex free
http_access deny blocked_free

This ACL blocks any URL path containing "free" but not "Free", without looking at the protocol or hostname. These regular expressions are case-sensitive; to make them case-insensitive, add the -i option.

Example 2
acl blocked_games urlpath_regex -i games
http_access deny blocked_games

blocked_games matches URL paths containing the word "games", whether spelt in upper or lower case.

Example 3
To block several URLs:

acl block_site urlpath_regex -i "/etc/squid/acl/block_site"
http_access deny block_site


When blocking several URLs, it is recommended to put the patterns in one file. As in Example 3, the block_site list is kept in the file /etc/squid/acl/block_site, which might contain, for example:

\.exe$
\.mp3$

port

Description
Access can be controlled by the destination (server) port number.

Syntax
acl aclname port port-number

Example 1
Deny requests to unknown ports:

acl Safe_ports port 80        # http
acl Safe_ports port 21        # ftp
acl Safe_ports port 443 563   # https, snews

http_access deny !Safe_ports

Example 2
Deny requests to several untrusted ports listed in a file:

acl safeport port "/etc/squid/acl/safeport"
http_access deny safeport


proto

Description
This specifies the transfer protocol.

Syntax
acl aclname proto protocol

Example 1
acl protocol proto HTTP FTP

This refers to the protocols HTTP and FTP.

Example 2
acl manager proto cache_object
http_access allow manager localhost
http_access deny manager

This only allows cachemgr access from localhost.

Example 3
acl ftp proto FTP
http_access deny ftp
http_access allow all

This configuration blocks every FTP request.


method

Description
This specifies the HTTP method of the request.

Syntax
acl aclname method method-type

Example 1
acl connect method CONNECT
http_access allow localhost
http_access allow allowed_clients
http_access deny connect

Denying the CONNECT method prevents outside people from trying to tunnel connections through the proxy server.

browser

Description
This performs regular expression pattern matching on the request's User-Agent header. To log the User-Agent header information, add this line to squid.conf:

useragent_log /var/log/squid/useragent.log

Then run the Mozilla browser; its User-Agent header should look like the one matched in the example.

Syntax
acl aclname browser pattern

Example 1
acl mozilla browser ^Mozilla/5\.0
http_access deny mozilla

This configuration denies Mozilla browsers, or any other browser whose User-Agent starts with the same string.


proxy_auth

Description
User authentication via external processes. proxy_auth requires an external authentication program to check username/password combinations. In this configuration we use the NCSA authentication method, because it is the easiest method to implement.

Syntax
acl aclname proxy_auth username ...

Example 1
To validate a list of users, do the following steps.

Creating the passwd file:

# touch /etc/squid/passwd
# chown root.squid /etc/squid/passwd
# chmod 640 /etc/squid/passwd

Adding users:

# htpasswd /etc/squid/passwd shah

You will be prompted to enter a password for that user; in this example it is the password for the user shah.

Setting the rules:

auth_param basic program /usr/lib/squid/ncsa_auth /etc/squid/passwd
auth_param basic children 5
auth_param basic realm Squid proxy-caching web-server
auth_param basic credentialsttl 2 hours

These lines are already in the configuration file but need to be adjusted to suit your environment.


Authentication configuration:

acl LOGIN proxy_auth REQUIRED
http_access allow LOGIN

This configuration only allows users who have been authenticated to access the network connection.

CAUTION!! proxy_auth can't be used in a transparent proxy.

maxconn

Description
A limit on the maximum number of connections from a single client IP address. The ACL is true if the client has more than maxconn connections open.

Syntax
acl aclname maxconn number_of_connections

Example 1
acl someuser src 10.0.5.0/24
acl 5conn maxconn 5
http_access deny someuser 5conn

This configuration restricts clients in the 10.0.5.0/24 subnet to a maximum of five (5) connections at once; if the limit is exceeded, an error page appears. Clients outside that subnet are not affected by this rule.

CAUTION!! The maxconn ACL requires the client_db feature. If client_db is disabled (for example with client_db off), then maxconn ACLs will not work.


Create custom error page

# vi /etc/squid/error/ERROR_MESSAGE

Append the following:

ERROR: ACCESS DENIED FROM PROXY SERVER
The site is blocked due to IT policy.
Please contact the helpdesk for more information:
Phone: 06-2333333 (ext 33)
Email: [email protected]

CAUTION!! Do not include HTML closing tags.

Displaying the custom error message:

acl blocked_port port 80
deny_info ERROR_MESSAGE blocked_port
http_access deny blocked_port


4.2 Exercises

1. Why can the users still download files with the following configuration?

acl download urlpath_regex -i \.exe$
acl office_hours time 09:00-17:00
acl GET method GET
acl it_user1 src 192.168.1.88
acl it_user2 src 192.168.1.89
acl nodownload1 src 192.168.1.10
acl nodownload2 src 192.168.1.11

http_access allow it_user1
http_access allow it_user2
http_access allow nodownload1
http_access allow nodownload2

http_access deny GET office_hours nodownload1 nodownload2
http_access deny all

Answer: the configuration is meant to deny nodownload1 and nodownload2, but the earlier allow lines match those clients first, so the later deny line is never reached; the allow lines for nodownload1 and nodownload2 should be deleted. Note also that all ACLs on a single http_access line are ANDed together, so a line naming both nodownload1 and nodownload2 can never match a single request; the two hosts should be combined into one src ACL, or denied on separate lines.
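As a hedged illustration only (reusing the ACL definitions from the exercise, and adding a combined nodownload src ACL that is not part of the original), one possible corrected ordering is:

    acl nodownload src 192.168.1.10 192.168.1.11

    http_access allow it_user1
    http_access allow it_user2
    http_access deny download office_hours nodownload
    http_access allow nodownload
    http_access deny all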


2. Why does this configuration still allow access to game.free.com?

acl ban dstdomain free.com http_access deny ban

3. The following access control configuration will never work. Why?

acl ME src 10.0.0.1 acl YOU src 10.0.0.2 http_access allow ME YOU


5. Caching

5.1 Concepts

• Caching by a proxy server is the process of storing data on an intermediate system between the Web server and the client.
• The proxy server can then serve content requested by the client directly from its copy in the cache.
• The assumption is that later requests for the same data can be serviced more quickly by not having to go all the way back to the original server.
• Caching can also reduce demands on network resources and on the information servers.

5.2 Configuring a cache for proxy server There are a lot of parameters related to caching in Squid and these parameters can be divided into three main groups as below: A. Cache size B. Cache directories and log file path name C. Peer cache servers and Squid hierarchy


However, in the following subsections only the first two groups are covered.

A. Cache Size

The following are the common cache-size parameters.

i. cache_mem

Syntax
cache_mem size(MB)

This parameter specifies the amount of memory (RAM) used to store in-transit objects (ones that are currently being used), hot objects (ones that are used often) and negative-cached objects (recent failed requests). The default value is 8 MB. Example:

cache_mem 16 MB

ii. maximum_object_size

Syntax
maximum_object_size size(MB)

This parameter is used when you do not want to cache files larger than or equal to the size set. The default value is 4 MB. Example:

maximum_object_size 8 MB


iii. ipcache_size

Syntax
ipcache_size size(MB)

This parameter sets how many IP address resolution results Squid stores. The default value is 1 MB. Example:

ipcache_size 32MB

iv. ipcache_high

Syntax
ipcache_high percentage

This parameter specifies the high-water percentage at which Squid starts clearing out the least-used IP address resolutions. The default value is usually kept. Example:

ipcache_high 95

v. ipcache_low

Syntax
ipcache_low percentage

This parameter specifies the low-water percentage at which Squid stops clearing out the least-used IP address resolutions. The default value is usually kept.


Example:
ipcache_low 90

B. Cache Directories

i. cache_dir

Syntax
cache_dir type dir size(MB) L1 L2

This parameter specifies the directory (or directories) in which cache swap files are stored. The default dir is the /var/spool/squid directory. You can specify how much disk space to use for the cache in megabytes (100 is the default); the default numbers of first-level directories (L1) and second-level directories (L2) are 16 and 256 respectively. Example:

cache_dir aufs /var/cache01 7000 16 256

NOTE: /var/cache01 is a partition that was created during the Linux Fedora installation.

Formula to calculate the number of first-level directories (L1):

Given:
  x = size of the cache dir in KB (e.g. 6 GB = 6,000,000 KB)
  y = average object size (e.g. 13 KB)
  z = objects per L2 directory (assume 256)

Calculate L1 (number of first-level directories) and L2 (number of second-level directories) such that:

  L1 x L2 = x / y / z


Example:

  x = 6 GB = 6 * 1024 * 1024 = 6291456 KB

so:

  x / y / z = 6291456 / 13 / 256 = 1890

and:

  L1 * L2 = x / y / z
  L1 * 256 = 1890
  L1 = 7

ii. access_log

Syntax
access_log path

This parameter specifies the location where the HTTP and ICP accesses are logged. The default, /var/log/squid/access.log, is usually used. Example:

access_log /var/log/squid/access.log



6. SQUID and Webmin

6.1 About Webmin

Webmin is a graphical user interface for system administration of Unix. It is web-based and can be installed on most Unix systems. Webmin is free software, and the installation package can be downloaded from the Internet. Webmin is largely written in Perl, and it runs as its own process and web server. It usually uses TCP port 10000 for communication, and can be configured to use SSL if OpenSSL is installed.

6.2 Obtaining and Installing Webmin

The Webmin installation package is available at the official Webmin site, http://www.webmin.com/download.html. Download the latest package and place it on the local machine.


Installation of Webmin differs slightly depending on which type of package you choose to install. Note that Webmin requires a relatively recent Perl for any of these installation methods to work. Nearly all, if not all, modern UNIX and UNIX-like OS variants now include Perl as a standard component of the OS, so this should not be an issue.

Installing from a tar.gz

First you must untar and unzip the archive in the directory where you would like Webmin to be installed. The most common location for installation from tarballs is /usr/local; some sites prefer /opt. If you're using GNU tar, you can do this all on one command line:

# tar zxvf webmin-1.340.tar.gz

If you have a less capable version of tar, you must unzip the file first and then untar it:

# gunzip webmin-1.340.tar.gz
# tar xvf webmin-1.340.tar.gz

Next, change to the directory that was created when you untarred the archive and execute the setup.sh script, as shown below. The script will ask several questions about your system and your preferences for the installation; generally, accepting the default values will work. The command for installation is:

# ./setup.sh

Installing from an RPM Installing from an RPM is even easier. You only need to run one command: # rpm -Uvh webmin-1.340-1.noarch.rpm


This will copy all of the Webmin files to the appropriate locations and run the install script with appropriate default values. For example, the Webmin perl files will be installed in /usr/libexec/webmin while the configuration files will end up in /etc/webmin. Webmin will then be started on port 10000. You may log in using root as the login name and your system root password as the password. It's unlikely you will need to change any of these items from the command line, because they can all be modified using Webmin. If you do need to make any changes, you can do so in miniserv.conf in /etc/webmin.

After Installation

After installation, your Webmin installation will behave nearly identically regardless of operating system vendor or version, location of installation, or method of installation. The only apparent differences between systems will be that some have more or fewer modules, because some modules are specific to one OS, and others will feature slightly different versions of modules to take into account differences in the underlying system. For example, the package manager module may behave differently, or be missing from the available options entirely, depending on your OS.

6.3 Using Squid in Webmin

To launch Webmin, open a web browser, such as Netscape or Mozilla Firefox, on any machine that has network access to the server you wish to log in to. Browse to port 10000 on the IP address or host name of the server, using http://computername:10000/. Go to the Squid Proxy Server menu (in the Servers submenu) to open the main panel (Figure 6-1).


Figure 6-1: Squid Proxy Main Page

6.4 Ports and Networking The Ports and Networking page provides you with the ability to configure most of the network level options of Squid. Squid has a number of options to define what ports Squid operates on, what IP addresses it uses for client traffic and intercache traffic, and multicast options. Usually, on dedicated caching systems these options will not be useful. But in some cases you may need to adjust these to prevent the Squid daemon from interfering with other services on the system or on your network.


Proxy port Sets the network port on which Squid operates. This option is usually 3128 by default and can almost always be left at this port, except when multiple Squids are running on the same system, which is usually ill-advised. This option corresponds to the http_port option in squid.conf.

ICP port This is the port on which Squid listens for Internet Cache Protocol, or ICP, messages. ICP is a protocol used by web caches to communicate and share data. Using ICP it is possible for multiple web caches to share cached entries so that if any one local cache has an object, the distant origin server will not have to be queried for the object. Further, cache hierarchies can be constructed of multiple caches at multiple privately interconnected sites to provide improved hit rates and higher-quality web response for all sites. More on this in later sections. This option correlates to the icp_port directive.

Incoming TCP address The address on which Squid opens an HTTP socket that listens for client connections and connections from other caches. By default Squid does not bind to any particular address and will answer on any address that is active on the system. This option is not usually used, but can provide some additional level of security, if you wish to disallow any outside network users from proxying through your web cache. This option correlates to the tcp_incoming_address directive.


Outgoing TCP address Defines the address on which Squid sends out packets via HTTP to clients and other caches. Again, this option is rarely used. It refers to the tcp_outgoing_address directive.

Incoming UDP address Sets the address on which Squid will listen for ICP packets from other web caches. This option allows you to restrict which subnets will be allowed to connect to your cache on a multi-homed (containing multiple subnets) Squid host. This option correlates to the udp_incoming_address directive.

Outgoing UDP address The address on which Squid will send out ICP packets to other web caches. This option correlates to the udp_outgoing_address.

Multicast groups The multicast groups that Squid will join to receive multicast ICP requests. This option should be used with great care, as it is used to configure your Squid to listen for multicast ICP queries. Clearly if your server is not on the MBone, this option is useless. And even if it is, this may not be an ideal choice.


TCP receive buffer The size of the buffer used for TCP packets being received. By default Squid uses whatever the default buffer size for your operating system is. This should probably not be changed unless you know what you’re doing, and there is little to be gained by changing it in most cases. This correlates to the tcp_recv_bufsize directive.
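For reference, the Webmin fields described in this section map onto squid.conf directives roughly as in the sketch below; all of the values shown are placeholder assumptions, not recommendations:

    http_port 3128                     # Proxy port
    icp_port 3130                      # ICP port
    tcp_outgoing_address 192.168.1.1   # Outgoing TCP address
    udp_incoming_address 192.168.1.1   # Incoming UDP address
    udp_outgoing_address 192.168.1.1   # Outgoing UDP address
    tcp_recv_bufsize 0                 # TCP receive buffer (0 = use the OS default)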

6.5 Other Caches The Other Caches page provides an interface to one of Squid’s most interesting, but also widely misunderstood, features. Squid is the reference implementation of ICP, a simple but effective means for multiple caches to communicate with each other regarding the content that is available on each. This opens the door for many interesting possibilities when one is designing a caching infrastructure.

Internet Cache Protocol It is probably useful to discuss how ICP works and some common usages for ICP within Squid, in order to quickly make it clear what it is good for, and perhaps even more importantly, what it is not good for. The most popular uses for ICP are discussed, and more good ideas will probably arise in the future as the Internet becomes even more global in scope and the web-caching infrastructure must grow with it.


Parent and Sibling Relationships The ICP protocol specifies that a web cache can act as either a parent or a sibling. A parent cache is simply an ICP-capable cache that will answer both hits and misses for child caches, while a sibling will only answer hits for other siblings. This subtle distinction means simply that a parent cache can proxy for caches that have no direct route to the Internet. A sibling cache, on the other hand, cannot be relied upon to answer all requests, and your cache must have another method to retrieve requests that cannot come from the sibling. This usually means that in sibling relationships, your cache will also have a direct connection to the Internet or a parent proxy that can retrieve misses from the origin servers. ICP is a somewhat chatty protocol, in that an ICP request is sent to every neighbor cache each time a cache miss occurs. By default, whichever cache replies with an ICP hit first will be the cache used to request the object.
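To make the distinction concrete, a sketch of the corresponding cache_peer lines in squid.conf might look like this (the host names are placeholders and the ports are the usual defaults):

cache_peer parent.example.net parent 3128 3130 default
cache_peer sibling.example.net sibling 3128 3130 proxy-only

The default option marks the parent as a cache of last resort, and proxy-only prevents objects fetched from the sibling from being stored locally, mirroring the options described later in this chapter.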

When to Use ICP? ICP is often used in situations wherein one has multiple Internet connections, or several types of paths to Internet content. Finally, it is possible, though usually not recommended, to implement a rudimentary form of load balancing through the use of multiple parents and multiple child web caches. One of the common uses of ICP is cache meshes. A cache mesh is, in short, a number of web caches at remote sites interconnected using ICP. The web caches could be in different cities, or they could be in different buildings of the same university or different floors in the same office building. This type of hierarchy allows a large number of caches to benefit from a larger client population than is directly available to any one of them.


All other things being equal, a cache that is not overloaded will perform better (with regard to hit ratio) with a larger number of clients. Simply put, a larger client population leads to a higher quality of cache content, which in turn leads to higher hit ratios and improved bandwidth savings. So, whenever it is possible to increase the client population without overloading the cache, such as in the case of a cache mesh, it may be worth considering. Again, this type of hierarchy can be improved upon by the use of Cache Digests, but ICP is usually simpler to implement and is a widely supported standard, even on non-Squid caches. Finally, ICP is also sometimes used for load balancing multiple caches at the same site. ICP, or even Cache Digests for that matter, are almost never the best way to implement load balancing. Using ICP for load balancing can be achieved in a few ways:

• Through having several local siblings, which can each provide hits to the others' clients, while the client load is evenly divided across the caches.

• Using a fast but low-capacity web cache in front of two or more lower-cost, but higher-capacity, parent web caches. The parents will then serve requests in roughly equal shares.

6.6 Other Proxy Cache Servers

This section of the Other Caches page provides a list of currently configured sibling and parent caches, and also allows one to add more neighbor caches. Clicking the name of a neighbor cache will allow you to edit it. This section also provides the vital information about the neighbor caches, such as the type (parent, sibling, multicast), the proxy or HTTP port, and the ICP or UDP port of the caches. Note that Proxy port is the port where the neighbor cache normally listens for client traffic, which defaults to 3128.

Edit Cache Host Clicking a cache peer name, or clicking Add another cache on the primary Other Caches page, brings you to this page, which allows you to edit most of the relevant details about neighbor caches (Figure 6-2).

Figure 6-2: Create cache Host page

Hostname The name or IP address of the neighbor cache you want your cache to communicate with. Note that this will be one-way traffic. Access Control Lists, or ACLs, are used to allow ICP requests from other caches; ACLs are covered later. This option, plus most of the rest of the options on this page, corresponds to cache_peer lines in squid.conf.


Type The type of relationship you want your cache to have with the neighbor cache. If the cache is upstream, and you have no control over it, you will need to consult with the administrator to find out what kind of relationship you should set up. If it is configured wrong, cache misses will likely result in errors for your users. The options here are sibling, parent, and multicast.

Proxy port The port on which the neighbor cache is listening for standard HTTP requests. Even though the caches transmit availability data via ICP, actual web objects are still transmitted via HTTP on the port usually used for standard client traffic. If your neighbor cache is a Squid-based cache, then it is likely to be listening on the default port of 3128. Other common ports used by cache servers include 8000, 8888, 8080, and even 80 in some circumstances.

ICP port The port on which the neighbor cache is configured to listen for ICP traffic. If your neighbor cache is a Squid-based proxy, this value can be found by checking the icp_port directive in the squid.conf file on the neighbor cache. Generally, however, the neighbor cache will listen on the default port 3130.


Proxy only? A simple yes or no question to tell whether objects fetched from the neighbor cache should be cached locally. This can be used when all caches are operating well below their client capacity, but disk space is at a premium or hit ratio is of prime importance.

Send ICP queries? Tells your cache whether or not to send ICP queries to a neighbor. The default is Yes, and it should probably stay that way. ICP queries are the means by which Squid knows which caches are responding and which caches are closest or best able to quickly answer a request.

Default cache This is switched to Yes if this neighbor cache is to be the last-resort parent cache, used in the event that no other neighbor cache responds to ICP queries. Note that this does not prevent it from being used normally while other caches are responding as expected. Also, if this neighbor is the sole parent proxy, and no other route to the Internet exists, this should be enabled.

Round-robin cache? Choose whether to use round-robin scheduling between multiple parent caches in the absence of ICP queries. This should be set on all parents that you would like to schedule in this way.


ICP time-to-live Defines the multicast TTL for ICP packets. When using multicast ICP, it is usually wise for security and bandwidth reasons to use the minimum TTL suitable for your network.

Cache weighting Sets the weight for a parent cache. When using this option it is possible to set higher numbers for preferred caches. The default value is 1, and if left unset for all parent caches, whichever cache responds positively first to an ICP query will be sent a request to fetch that object.

Closest only Allows you to specify that your cache wants only CLOSEST_PARENT_MISS replies from parent caches. This allows your cache to then request the object from the parent cache closest to the origin server.

No digest? Chooses whether this neighbor cache should send cache digests.

No NetDB exchange When using ICP, it is possible for Squid to keep a database of network information about the neighbor caches, including availability and RTT (Round Trip Time) information. This usually allows Squid to choose more wisely which caches to make requests to when multiple caches have the requested object.


No delay? Prevents accesses to this neighbor cache from affecting delay pools. Delay pools, discussed in more detail later, are a means by which Squid can regulate bandwidth usage. If a neighbor cache is on the local network, and bandwidth usage between the caches does not need to be restricted, then this option can be used.

Login to proxy Select this if you need to send authentication information when challenged by the neighbor cache. On local networks, this type of security is unlikely to be necessary.

Multicast responder Allows Squid to know where to accept multicast ICP replies. Because multicast is fed on a single IP to many caches, Squid must have some way of determining which caches to listen to and what options apply to that particular cache. Selecting Yes here configures Squid to listen for multicast replies from the IP of this neighbor cache.

Query host for domains, Don't query for domains These two options are the only options on this page that configure a directive other than cache_peer; in this case they set the cache_peer_domain option. This allows you to configure which requests for certain domains can be queried via ICP and which should not. It is often used to configure caches not to query other caches for content within the local domain. Another common usage, such as in the national web hierarchies discussed above, is to define which web cache is used for requests destined for different TLDs. So, for example, if one has a low-cost satellite link to the U.S. backbone from another country that is preferred for web traffic over the much more expensive land line, one can configure the satellite-connected cache as the cache to query for all .com, .edu, .org, .net, .us, and .gov addresses.
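A hedged sketch of that last example in raw squid.conf form (the cache host name is a placeholder, not a host used elsewhere in this manual):

cache_peer_domain satellite-cache.example.net .com .edu .org .net .us .gov

Requests for any other top-level domain would then be routed according to the remaining cache_peer and cache_peer_domain rules.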

Cache Selection Options This section provides configuration options for general ICP configuration (Figure 6-3). These options affect all of the other neighbor caches that you define.

Figure 6-3: Global ICP options

Directly fetch URLs containing Allows you to configure a match list of items to always fetch directly rather than query a neighbor cache. The default here is cgi-bin ? and should continue to be included unless you know what you're doing. This helps prevent wasting intercache bandwidth on lots of requests that are usually never considered cacheable, and so will never return hits from your neighbor caches. This option sets the hierarchy_stoplist directive.
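The corresponding squid.conf line, using the default match list mentioned above, is simply:

hierarchy_stoplist cgi-bin ?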


ICP query timeout The time in milliseconds that Squid will wait before timing out ICP requests. The default allows Squid to calculate an optimum value based on the average RTT of the neighbor caches. Usually, it is wise to leave this unchanged. However, for reference, the default value in the distant past was 2000, or 2 seconds. This option edits the icp_query_timeout directive.

Multicast ICP timeout Timeout in milliseconds for multicast probes, which are sent out to discover the number of active multicast peers listening on a given multicast address. This configures the mcast_icp_query_timeout directive and defaults to 2000 ms, or 2 seconds.

Dead peer timeout Controls how long Squid waits to declare a peer cache dead. If no ICP replies are received in this amount of time, Squid will declare the peer dead and will not expect to receive any further ICP replies. However, it continues to send ICP queries to the peer and will mark it active again on receipt of a reply. This timeout also affects when Squid expects to receive ICP replies from peers: if more than this number of seconds has passed since the last ICP reply was received, Squid will not expect to receive an ICP reply on the next query. Thus, if your time between requests is greater than this timeout, your cache will send more requests DIRECT rather than through the neighbor caches.
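For reference, a sketch of these directives with illustrative values (2000 ms is the historical icp_query_timeout default quoted above; the dead_peer_timeout value is only an example):

icp_query_timeout 2000
mcast_icp_query_timeout 2000
dead_peer_timeout 10 seconds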


Memory Usage

This page provides access to most of the options available for configuring the way Squid uses memory and disks (Figure 6-4). Most values on this page can remain unchanged, except in very high load or low resource environments, where tuning can make a measurable difference in how well Squid performs.

Figure 6-4: Memory and disk usage

Memory usage limit The limit on how much memory Squid will use for some parts of its core data. Note that this does not restrict or limit Squid's total process size. What it does is set aside a portion of RAM for use in storing in-transit and hot objects, as well as negatively cached objects. Generally, the default value of 8 MB is suitable for most situations, though it is safe to lower it to 4 or 2 MB in extremely low load situations. It can also be raised significantly on high-memory systems to increase performance by a small margin. Keep in mind that large cache directories increase the memory usage of Squid by a large amount, and even a machine with a lot of memory can run out of memory and go into swap if cache memory and disk size are not appropriately balanced. This option edits the cache_mem directive. See the section on cache directories for a more complete discussion of balancing memory and storage.

Maximum cached object size The size of the largest object that Squid will attempt to cache. Objects larger than this will never be written to disk for later use. Refers to the maximum_object_size directive.

IP address cache size, IP cache high-water mark, IP cache low-water mark The size of the cache used for IP addresses and the high and low water marks for that cache, respectively. These options configure the ipcache_size, ipcache_high, and ipcache_low directives, which default to 1024 entries, 95%, and 90%.
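Collected together, a minimal squid.conf sketch of these memory-related directives, using the defaults quoted in the text (the maximum_object_size value is only an example):

cache_mem 8 MB
maximum_object_size 4096 KB
ipcache_size 1024
ipcache_high 95
ipcache_low 90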

6.7 Logging

Squid provides a number of logs that can be used when debugging problems, when measuring the effectiveness of the cache, and when identifying users and the sites they visit (Figure 6-5). Because Squid can be used to "snoop" on users' browsing habits, one should carefully consider privacy laws in your region and, more importantly, be considerate to your users. That being said, logs can be very valuable tools in ensuring that your users get the best service possible from your cache.


Figure 6-5: Logging configuration

Cache metadata file Filename used in each store directory to store the Web cache metadata, which is a sort of index for the Web cache object store. This is not a human readable log, and it is strongly recommended that you leave it in its default location on each store directory, unless you really know what you're doing. This option correlates to the cache_swap_log directive.

Use HTTPD log format Allows you to specify that Squid should write its access.log in the HTTPD common log file format, such as that used by Apache and many other web servers. This allows you to parse the log and generate reports using a wider array of tools. However, this format does not provide several types of information specific to caches, and is generally less useful when tracking cache usage and solving problems. Because there are several effective tools for parsing and generating reports from the Squid standard access logs, it is usually preferable to leave this at its default of being off. This option configures the emulate_httpd_log directive. The Calamaris cache access log analyzer does not work if this option is enabled.

Log full hostnames Configures whether Squid will attempt to resolve the host name, so that the fully qualified domain name can be logged. This can, in some cases, increase the latency of requests. This option correlates to the log_fqdn directive.

Logging netmask Defines what portion of the requesting client IP is logged in the access.log. For privacy reasons it is often preferred to only log the network or subnet IP of the client. For example, a netmask of 255.255.255.0 will log the first three octets of the IP, and fill the last octet with a zero. This option configures the client_netmask directive.
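As a sketch, these logging options appear in squid.conf as follows (the netmask matches the example above; the other values are the defaults described in the text):

emulate_httpd_log off
log_fqdn off
client_netmask 255.255.255.0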


6.8 Cache Options

The Cache Options page provides access to some important parts of the Squid configuration file. This is where the cache directories are configured, as well as several timeouts and object size options (Figure 6-6).

Figure 6-6: Configuring Squid's Cache Directories

The directive is cache_dir while the options are the type of filesystem, the path to the cache directory, the size allotted to Squid, the number of top level directories, and finally the number of second level directories. In the example, I've chosen the filesystem type ufs, which is a name for all standard UNIX filesystems. This type includes the standard Linux ext2 filesystem as well. Other possibilities for this option include aufs and diskd.

The next field is simply the amount of disk space, in megabytes, that you want to allow Squid to use. Finally, the directory fields define the number of first-level and second-level directories for Squid to use.
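A typical cache_dir line following this layout (the path and the numbers are common defaults, shown only as an illustration):

cache_dir ufs /var/spool/squid 100 16 256

Here ufs is the filesystem type, /var/spool/squid the cache directory, 100 the size in megabytes, 16 the number of first-level directories, and 256 the number of second-level directories.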


6.9 Access Control

There are three types of option for configuring ICP access control, and these three types of definition are separated in the Webmin panel into three sections. The first is labeled Access control lists, which lists existing ACLs and provides a simple interface for generating and editing lists of match criteria (Figure 6-7). The second is labeled Proxy restrictions and lists the current restrictions in place and the ACLs they affect. Finally, the ICP restrictions section lists the existing access rules regarding ICP messages from other web caches.

Figure 6-7: Access Control Lists


Access Control Lists The first field in the table is the name of the ACL, which is simply an assigned name that can be just about anything the user chooses. The second field is the type of the ACL, which can be one of a number of choices that indicates to Squid what part of a request should be matched against for this ACL. The possible types include the requesting client's address, the web server address or host name, a regular expression matching the URL, and many more. The final field is the actual string to match. Depending on the ACL type, this may be an IP address, a series of IP addresses, a URL, a host name, etc.

Edit an ACL To edit an existing ACL, simply click on the highlighted name. You will then be presented with a screen containing all relevant information about the ACL. Depending on the type of the ACL, you will be shown different data entry fields. The operation of each type is very similar, so for this example, you'll step through editing of the localhost ACL. Clicking the localhost button presents the page that's shown in Figure 6-8.

Figure 6-8: Edit an ACL


The title of the table is Client Address ACL, which means the ACL is of the Client Address type and tells Squid to compare the incoming IP address with the IP address in the ACL. It is possible to select an IP based on the originating IP or the destination IP. The netmask can also be used to indicate whether the ACL matches a whole network of addresses, or only a single IP. It is possible to include a number of addresses, or ranges of addresses, in these fields. Finally, the Failure URL is the address to send clients to if they have been denied access due to matching this particular ACL. Note that the ACL by itself does nothing; there must also be a proxy restriction or ICP restriction rule that uses the ACL before Squid will act on it.
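To illustrate that last point, a minimal sketch of an ACL together with restriction rules that actually use it (the network address and ACL name are placeholders):

acl all src 0.0.0.0/0.0.0.0
acl localnet src 192.168.1.0/255.255.255.0
http_access allow localnet
icp_access allow localnet
http_access deny all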

Creating new ACL Creating a new ACL is equally simple (Figure 6-9). From the ACL page, in the Access control lists section, select the type of ACL you'd like to create. Then click Create new ACL. From there, as shown, you can enter any number of ACLs for the list.

Figure 6-9: Creating an ACL


Available ACL Types

Browser Regexp A regular expression that matches the client's browser type based on the User-Agent header. This allows for ACLs operating based on the browser type in use; for example, using this ACL type, one could create an ACL for Netscape users and another for Internet Explorer users. This could then be used to redirect Netscape users to a Navigator-enhanced page, and IE users to an Explorer-enhanced page. Probably not the wisest use of an administrator's time, but it does indicate the unmatched flexibility of Squid. This ACL type correlates to the browser ACL type.

Client IP Address The IP address of the requesting client. This option refers to the src ACL in the Squid configuration file. An IP address and netmask are expected; address ranges are also accepted.

Client Hostname Matches against the client domain name. This option correlates to the srcdomain ACL, and can be either a single domain name, a list of domain names, or the path to a file that contains a list of domain names (a path to a file must be surrounded by parentheses). This ACL type can increase latency and decrease throughput significantly on a loaded cache, as it must perform an address-to-name lookup for each request, so it is usually preferable to use the Client IP Address type.


Client Hostname Regexp Matches against the client domain name using a regular expression. This option correlates to the srcdom_regex ACL, and can be either a single domain name, a list of domain names, or a path to a file that contains a list of domain names (a path to a file must be surrounded by parentheses).

Date and Time This type is just what it sounds like, providing a means to create ACLs that are active during certain times of the day or certain days of the week. This feature is often used to block some types of content or some sections of the Internet during business or class hours. Many companies block pornography, entertainment, sports, and other clearly non-work-related sites during business hours, but then unblock them after hours. This might improve workplace efficiency in some situations (or it might just offend the employees). This ACL type allows you to enter days of the week and a time range, or select all hours of the selected days. This ACL type is the same as the time ACL type directive; a raw squid.conf sketch follows the next entry.

Ethernet Address The ethernet or MAC address of the requesting client. This option only works for clients on the same local subnet, and only on certain platforms; Linux, Solaris, and some BSD variants are the supported operating systems for this type of ACL. This ACL can provide a somewhat secure method of access control, because MAC addresses are usually harder to spoof than IP addresses, and you can guarantee that your clients are on the local network (otherwise no ARP resolution can take place).
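A hedged example of the Date and Time type in raw squid.conf form (the ACL names, day letters and hours are arbitrary examples):

acl office_hours time MTWHF 09:00-17:00
acl sports_sites dstdom_regex -i sport
http_access deny sports_sites office_hours

Combining the two ACLs on one http_access line means the rule matches only when both conditions hold, which is how a "block during business hours" policy is normally expressed.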


External Auth This ACL type calls an external authenticator process to decide whether the request will be allowed. Note that authentication cannot work on a transparent proxy or HTTP accelerator. The HTTP protocol does not provide for two authentication stages (one local and one on remote Web sites). So in order to use an authenticator, your proxy must operate as a traditional proxy, where a client will respond appropriately to a proxy authentication request as well as external Web server authentication requests. This correlates to the proxy_auth directive.
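A sketch of how this is often wired up in Squid 2.6 with the NCSA basic authentication helper (the helper path and password file location are assumptions that vary between distributions):

auth_param basic program /usr/lib/squid/ncsa_auth /etc/squid/passwd
auth_param basic realm Squid proxy-caching web server
acl authenticated proxy_auth REQUIRED
http_access allow authenticated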

External Auth Regex As above, this ACL calls an external authenticator process, but allows regex pattern or case insensitive matches. This option correlates to the proxy_auth_regex directive.

Proxy IP Address The local IP address on which the client connection exists. This allows ACLs to be constructed that only match one physical network, if multiple interfaces are present on the proxy, among other things. This option configures the myip directive.

Request Method This ACL type matches on the HTTP method in the request headers. This includes the methods GET, PUT, etc. This corresponds to the method ACL type directive.


URL Path Regex This ACL matches on the URL path minus any protocol, port, and host name information. It does not include, for example, the "http://www.swelltech.com" portion of a request, leaving only the actual path to the object. This option correlates to the urlpath_regex directive.

URL Port This ACL matches on the destination port for the request, and configures the port ACL directive.

URL Protocol This ACL matches on the protocol of the request, such as FTP, HTTP, ICP, etc.

URL Regexp Matches using a regular expression on the complete URL. This ACL can be used to provide access control based on parts of the URL, a case-insensitive match of the URL, and much more. This option is equivalent to the url_regex ACL type directive.

Web Server Address This ACL matches based on the destination web server's IP address. Squid accepts a single IP, a network IP with netmask, as well as a range of addresses in the form "192.168.1.1-192.168.1.25". This option correlates to the dst ACL type directive.

Web Server Hostname This ACL matches on the host name of the destination Web server.


Web Server Regexp Matches using a regular expression on the host name of the destination Web server.

6.10 Administrative Options

Administrative Options provides access to several of the behind-the-scenes options of Squid. This page allows you to configure a diverse set of options, including the user ID and group ID of the Squid process, cache hierarchy announce settings, and the authentication realm (Figure 6-10).

Figure 6-10: Administrative Options

Run as Unix user and group The user name and group name Squid will operate as. Squid is designed to start as root but very soon after drops to the user and group specified here. This allows you to restrict, for security reasons, the permissions that Squid will have when operating. By default, Squid will operate as the nobody user and the nogroup group, or, in the case of some Squids installed from RPM, as the squid user and group. These options correlate to the cache_effective_user and cache_effective_group directives.

Proxy authentication realm The realm that will be reported to clients when performing authentication. This option usually defaults to "Squid proxy-caching web server", and correlates to the proxy_auth_realm directive. This name will likely appear in the browser pop-up window when the client is asked for authentication information.

Cache manager email address The email address of the administrator of this cache. This option corresponds to the cache_mgr directive and defaults to either webmaster or root on RPM-based systems. This address will be added to any error pages that are displayed to clients.

Visible hostname The host name that Squid will advertise itself on. This affects the host name that Squid uses when serving error messages. This option may need to be configured in cache clusters if you receive IP-Forwarding errors. This option configures the visible_hostname directive.

Unique hostname Configures the unique_hostname directive, and sets a unique host name for Squid to report in cache clusters in order to allow detection of forwarding loops. Use this if you have multiple machines in a cluster with the same Visible Hostname.

Cache announce host, port and file The host address and port that Squid will use to announce its availability to participate in a cache hierarchy. The cache announce file is simply a file containing a message to be sent with announcements.


These options correspond to the announce_host, announce_port, and announce_file directives.

Announcement period Configures the announce_period directive, and refers to the frequency at which Squid will send announcement messages to the announce host.
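Collected in squid.conf form, a sketch of these administrative directives (the user, group, e-mail address and host names are placeholders):

cache_effective_user squid
cache_effective_group squid
cache_mgr webmaster@example.edu
visible_hostname proxy.example.edu
unique_hostname proxy1.example.edu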

Most of the content in Chapter 6 is taken from Unix System Administration with Webmin by Joe Cooper (2002) available online at http://www.swelltech.com/support/webminguide/

7. Analyzer

7.1 Structure of Log Files

In Fedora, the Squid log files are stored in the /var/log/squid directory by default. Squid maintains three log files:

• Access log
• Cache log
• Store log

Throughout this section, each log will be discussed, including its content as well as how these logs can help the admin debug potential problems.

Access log
Location: /var/log/squid/access.log

Description
• It contains an entry for each time the cache is hit or missed when a client requests HTTP content.
• It records the identity of the host making the request (IP address) and the content requested.
• It also indicates when content is served from the cache and when the remote server must be accessed to obtain the content.
• It contains the HTTP transactions made by the users.

Format
Option 1: Used if the emulate HTTP daemon log is off (native format, emulate_httpd_log off):
Timestamp Elapsed Client Action/Code Size Method URI Ident Hierarchy/From Content

Option 2: Used if the emulate HTTP daemon log is on (common format, emulate_httpd_log on):
Client Ident - [Timestamp1] "Method URI" Type Size

With:
Timestamp: The time when the request is completed (socket closed). The format is "Unix time" (seconds since Jan 1, 1970) with millisecond resolution.
Timestamp1: When the request is completed (Day/Month/Year:Hour:Minute:Second GMT-offset).
Elapsed: The elapsed time of the request, in milliseconds. This is the time between the accept() and close() of the client socket.
Client: The IP address of the connecting client, or the FQDN if the log_fqdn option is enabled in the config file.
Action: Describes how the request was treated locally (hit, miss, etc.).
Code: The HTTP reply code taken from the first line of the HTTP reply header. For ICP requests this is always "000". If the reply code was not given, it is logged as "555".
Size: For TCP requests, the amount of data written to the client; for UDP requests, the size of the request (in bytes).
Method: The HTTP request method (GET, POST, etc.), or ICP_QUERY for ICP requests.
URI: The requested URI.
Ident: The result of the RFC931/ident lookup of the client username. If RFC931/ident lookup is disabled (default: ident_lookup off), it is logged as "-".
Hierarchy: A description of how and where the requested object was fetched.
From: Hostname of the machine from which we got the object.
Content: Content-Type of the object (from the HTTP reply header).

An example of the access.log file is shown in Figure 7-1.

Figure 7-1 Access.log

From Figure 7-1, we can see that the native format has been used. Here, we interpret each format field using the contents of the access.log file. Taking the first line, we obtain the result shown in Table 7-1.

Format       Value
Timestamp    1173680297.727
Elapsed      450
Client       10.0.5.10
Action       TCP_MISS
Code         302
Size         786
Method       GET
URI          http://www.google.com/search?
Ident        -
Hierarchy    DIRECT
From         64.233.189.104
Content      text/html

Table 7-1 The format and its value


There are some elaborations on:

Timestamp
• The timestamp is represented in UNIX time with millisecond resolution. However, it can be converted into a more readable form using this short Perl script:

#!/usr/bin/perl -p
s/^\d+\.\d+/localtime $&/e;

Action
• The TCP_ codes (Table 7-2) refer to requests on the HTTP port (usually 3128), while the UDP_ codes refer to requests on the ICP port (usually 3130).
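For example, assuming the script above is saved as localtime.pl and made executable (an illustrative filename, not one used elsewhere in this manual), the converted log can be inspected with:

# cat /var/log/squid/access.log | ./localtime.pl | tail -5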

Codes                     Explanation
TCP_HIT                   A valid copy of the requested object was in the cache.
TCP_MISS                  The requested object was not in the cache.
TCP_REFRESH_HIT           The requested object was cached but STALE. The IMS query for the object resulted in "304 not modified".
TCP_REF_FAIL_HIT          The requested object was cached but STALE. The IMS query failed and the stale object was delivered.
TCP_REFRESH_MISS          The requested object was cached but STALE. The IMS query returned the new content.
TCP_CLIENT_REFRESH_MISS   The client issued a "no-cache" pragma, or some analogous cache-control command, along with the request. Thus, the cache has to re-fetch the object.
TCP_IMS_HIT               The client issued an IMS request for an object which was in the cache and fresh.
TCP_SWAPFAIL_MISS         The object was believed to be in the cache, but could not be accessed.
TCP_NEGATIVE_HIT          Request for a negatively cached object, e.g. "404 not found", which the cache believes to be inaccessible. Also refer to the explanation of negative_ttl in your squid.conf file.
TCP_MEM_HIT               A valid copy of the requested object was in the cache and it was in memory, thus avoiding disk accesses.
TCP_DENIED                Access was denied for this request.
TCP_OFFLINE_HIT           The requested object was retrieved from the cache during offline mode. Offline mode never validates any object; see offline_mode in the squid.conf file.
UDP_HIT                   A valid copy of the requested object was in the cache.
UDP_MISS                  The requested object is not in this cache.
UDP_DENIED                Access was denied for this request.
UDP_INVALID               An invalid request was received.
UDP_MISS_NOFETCH          During "-Y" startup, or during frequent failures, a cache in hit-only mode will return either UDP_HIT or this code. Neighbours will thus only fetch hits.
NONE                      Seen with errors and cachemgr requests.

Table 7-2 TCP codes and Explanation

Code
• These codes are taken from RFC 2616 and verified for Squid. Squid-2 uses almost all codes except 307 (Temporary Redirect), 416 (Request Range Not Satisfiable) and 417 (Expectation Failed).

Code    Explanation
000     Used mostly with UDP traffic
100     Continue
101     Switching Protocols
102     Processing
200     OK
201     Created
202     Accepted
203     Non-Authoritative Information
204     No Content
205     Reset Content
206     Partial Content
207     Multi Status
300     Multiple Choices
301     Moved Permanently
302     Moved Temporarily
303     See Other
304     Not Modified
305     Use Proxy
[307    Temporary Redirect]
400     Bad Request
401     Unauthorized
402     Payment Required
403     Forbidden
404     Not Found
405     Method Not Allowed
406     Not Acceptable
407     Proxy Authentication Required
408     Request Timeout
409     Conflict
410     Gone
411     Length Required
412     Precondition Failed
413     Request Entity Too Large
414     Request URI Too Large
415     Unsupported Media Type
[416    Request Range Not Satisfiable]
[417    Expectation Failed]
*424    Locked
*424    Failed Dependency
*433    Unprocessable Entity
500     Internal Server Error
501     Not Implemented
502     Bad Gateway
503     Service Unavailable
504     Gateway Timeout
505     HTTP Version Not Supported
*507    Insufficient Storage
600     Squid header parsing error

Method
• Squid recognizes several request methods as defined in RFC 2616. Newer versions of Squid (2.2.STABLE5 and above) also recognize RFC 2518 "HTTP Extensions for Distributed Authoring -- WEBDAV" extensions (Table 7-3).

method      defined      cachability   meaning
GET         HTTP/0.9     possibly      object retrieval and simple searches
HEAD        HTTP/1.0     possibly      metadata retrieval
POST        HTTP/1.0     CC or Exp.    submit data (to a program)
PUT         HTTP/1.1     never         upload data (e.g. to a file)
DELETE      HTTP/1.1     never         remove resource (e.g. file)
TRACE       HTTP/1.1     never         appl. layer trace of request route
OPTIONS     HTTP/1.1     never         request available comm. options
CONNECT     HTTP/1.1r3   never         tunnel SSL connection
ICP_QUERY   Squid        never         used for ICP based exchanges
PURGE       Squid        never         remove object from cache
PROPFIND    rfc2518      ?             retrieve properties of an object
PROPPATCH   rfc2518      ?             change properties of an object
MKCOL       rfc2518      never         create a new collection
COPY        rfc2518      never         create a duplicate of src in dst
MOVE        rfc2518      never         atomically move src to dst
LOCK        rfc2518      never         lock an object against modifications
UNLOCK      rfc2518      never         unlock an object

Table 7-3 List of Methods


Hierarchy
The following hierarchy codes are used in Squid-2 (Table 7-4):

Codes                   Explanation
NONE                    For TCP HIT, TCP failures, cachemgr requests and all UDP requests, there is no hierarchy information.
DIRECT                  The object was fetched from the origin server.
SIBLING_HIT             The object was fetched from a sibling cache which replied with UDP_HIT.
PARENT_HIT              The object was requested from a parent cache which replied with UDP_HIT.
DEFAULT_PARENT          No ICP queries were sent. This parent was chosen because it was marked "default" in the config file.
SINGLE_PARENT           The object was requested from the only parent appropriate for the given URL.
FIRST_UP_PARENT         The object was fetched from the first parent in the list of parents.
NO_PARENT_DIRECT        The object was fetched from the origin server, because no parents existed for the given URL.
FIRST_PARENT_MISS       The object was fetched from the parent with the fastest (possibly weighted) round trip time.
CLOSEST_PARENT_MISS     This parent was chosen because it included the lowest RTT measurement to the origin server. See also the closest-only peer configuration option.
CLOSEST_PARENT          The parent selection was based on our own RTT measurements.
CLOSEST_DIRECT          Our own RTT measurements returned a shorter time than any parent.
NO_DIRECT_FAIL          The object could not be requested because of a firewall configuration (see also never_direct and related material) and no parents were available.
SOURCE_FASTEST          The origin site was chosen because the source ping arrived fastest.
ROUNDROBIN_PARENT       No ICP replies were received from any parent. The parent was chosen because it was marked for round robin in the config file and had the lowest usage count.
CACHE_DIGEST_HIT        The peer was chosen because the cache digest predicted a hit. This option was later replaced in order to distinguish between parents and siblings.
CD_PARENT_HIT           The parent was chosen because the cache digest predicted a hit.
CD_SIBLING_HIT          The sibling was chosen because the cache digest predicted a hit.
NO_CACHE_DIGEST_DIRECT  This output seems to be unused.
CARP                    The peer was selected by CARP.
ANY_PARENT              Part of src/peer_select.c:hier_strings[].
INVALID CODE            Part of src/peer_select.c:hier_strings[].

Table 7-4 Hierarchy Codes in Squid-2

Cache log
Location: /var/log/squid/cache.log

Description
• It contains various messages such as information about the Squid configuration, warnings about possible performance problems, and serious errors.
• It contains error and debugging messages of particular Squid modules.

Format
[Timestamp1] | Message

With:
Timestamp1: When the event occurred (Year/Month/Day Hour:Minute:Second).
Message: A description of the event. The error messages that may appear are listed in Table 7-5.

Errors                    Description
ERR_READ_TIMEOUT          The remote site or network is unreachable; it may be down.
ERR_LIFETIME_EXP          The remote site or network may be too slow or down.
ERR_NO_CLIENTS_BIG_OBJ    All clients went away before the transmission completed and the object is too big to cache.
ERR_READ_ERROR            The remote site or network may be down.
ERR_CLIENT_ABORT          The client dropped the connection before the transmission completed. Squid fetches the object according to its settings for quick_abort.
ERR_CONNECT_FAIL          The remote site or server may be down.
ERR_INVALID_REQ           Invalid HTTP request.
ERR_UNSUP_REQ             Unsupported request.
ERR_INVALID_URL           Invalid URL syntax.
ERR_NO_FDS                Out of file descriptors.
ERR_DNS_FAIL              DNS name lookup failure.
ERR_NOT_IMPLEMENTED       Protocol not supported.
ERR_CANNOT_FETCH          The requested URL cannot currently be retrieved.
ERR_NO_RELAY              There is no WAIS relay host defined for this cache.
ERR_DISK_IO               The system disk is out of space or failing.
ERR_ZERO_SIZE_OBJECT      The remote server closed the connection before sending any data.
ERR_FTP_DISABLED          This cache is configured NOT to retrieve FTP objects.
ERR_PROXY_DENIED          Access denied. The user must authenticate before accessing this cache.

Table 7-5 List of Error Messages


An example of the cache.log file is shown in Figure 7-2.

Figure 7-2 Cache.log

Store log
Location: /var/log/squid/store.log

Description
• It contains the information and status of objects that were (or were not) stored in the cache.

Format
Timestamp Tag Code Date LM Expires Content Expect/Length Method Key

With:
Timestamp: The time the entry was logged (millisecond resolution, counted from 00:00:00 UTC, January 1, 1970).
Tag: SWAPIN (swapped into memory from disk), SWAPOUT (saved to disk) or RELEASE (removed from cache).
Code: The HTTP reply code when available. For ICP requests this is always "0". If the reply code was not given, it is logged as "555".


The following three fields are timestamps parsed from the HTTP reply headers. All are expressed in Unix time (i.e. seconds since 00:00:00 UTC, January 1, 1970). A missing header is represented with -2 and an unparsable header is represented as -1.

Date: The time taken from the HTTP Date reply header. If the Date header is missing or invalid, the time of the request is used instead.
LM: The value of the HTTP Last-Modified: reply header.
Expires: The value of the HTTP Expires: reply header.
Content: The HTTP Content-Type reply header.
Expect: The value of the HTTP Content-Length reply header. Zero is returned if the Content-Length was missing.
/Length: The number of bytes of content actually read. If Expect is nonzero and not equal to Length, the object will be released from the cache.
Method: The request method (GET, POST, etc.).


Key: The cache key. Often this is simply the URL. Cache objects which never become public will have cache keys that include a unique integer sequence number, the request method, and then the URL ( /[post|put|head|connect]/URI ).

An example of the store.log file is shown in Figure 7-3.

Figure 7-3 Store.log

Based on Figure 7-3, we interpret each format field using the contents of the store.log file. Taking the second line, we obtain the values shown in Table 7-6:

Format       Value
Timestamp    1173680297.727
Tag          RELEASE
Code         -1
Date         FFFFFFFF
LM           7832CBDDD1604B89D0F75A2437F37AD7
Expires      302
Content      1173680306 -1 -1 text/html
Expect       -1
/Length      /278
Method       GET
Key          http://www.google.com/search?

Table 7-6 Format in Store.log


7.2 Methods

Log Analysis Using Grep Command
The log files can also be analysed using Linux or UNIX commands such as grep, which filters the required information out of any log file. From a terminal, run a command such as the following to start analysing the relevant log file:

# cat /var/log/squid/access.log | grep www.google.com

Figure 7-4 shows the result of this grep command on the access.log file. The same technique can be applied to the cache.log and store.log files.

Figure 7-4 Analysing the access.log using the grep command
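As a further quick illustration (a sketch that assumes the native log format described earlier, where field 4 is Action/Code), the hit and miss actions can be summarised with standard tools:

# awk '{split($4,a,"/"); count[a[1]]++} END {for (c in count) print count[c], c}' /var/log/squid/access.log | sort -rn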

Log Analysis Using Sarg-2.2.3.1
The preferred log file for analysis is the access.log file in the native format. We choose the Squid Analysis Report Generator (Sarg) as the analysis tool. It is used to analyse users' Internet surfing patterns, and it generates HTML reports covering many fields such as users, IP addresses, bytes, sites and times. The tool can be downloaded from: http://linux.softpedia.com/get/Internet/Log-Analyzers/sarg-102.shtml


7.3 Setup Sarg-2.2.3.1

Steps:
Download the software named sarg-2.2.3.1.tar.gz for the Linux and UNIX environment.

Make a new directory called installer located in the root path:
# mkdir /installer

Copy the downloaded file into the installer directory:
# cp sarg-2.2.3.1.tar.gz /installer

Then go into the directory and extract sarg-2.2.3.1.tar.gz using the following command:
# tar -zxvf sarg-2.2.3.1.tar.gz

After it has been successfully extracted, go into the sarg-2.2.3.1 directory and configure, build and install it:
# cd /installer/sarg-2.2.3.1
# ./configure
# make
# make install

NOTE: Make sure Squid is already started before running the following script.

Go into the sarg-2.2.3.1 directory and run the sarg script:
# ./sarg

The generated report will be kept at /var/www/html/squid-reports. It is recommended to view it using a GUI environment.


7.4 Report Management Using Webmin

For managing the reports, we use Webmin, a web-based interface for system administration of Unix systems. In our case, it helps the admin set information such as the location of the log source and the report destination, the format of the generated report, the size of the report, and the schedule for automatic report generation.

Steps:
1. Make sure Webmin is already set up on the server. Then open the browser and type http://127.0.0.1:10000/ to reach Webmin, and log in.

Figure 7-5 Login

2. Choose the Servers tab, and then click on Squid Analysis Report Generator. Four (4) modules are offered: Log Source and Report Destination, Report Option, Report Style and Scheduled Report Generation.


Figure 7-6 Sarg Main Modules in Webmin

3. Click on the Log Source and Report Destination icon. In this module, the admin can set the source of the log file and define the destination of the generated report. For report maintenance, it also allows the admin to set the number of reports to keep in a certain location, and an acknowledgement can be sent to the admin's e-mail. Note: Please check the sarg.conf file, located at /usr/local/sarg/sarg.conf, to ensure the correct path for the source of the log files; a short excerpt follows Figure 7-7.

Figure 7-7 Setting on Source and Destination Report
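For reference, the two sarg.conf directives behind this Webmin page look roughly like this (the paths shown are the defaults assumed in this chapter):

access_log /var/log/squid/access.log
output_dir /var/www/html/squid-reports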


After setting the changes, click on the Save button.

4. Click on the Report Option icon. In this module, the admin can manage the layout of the generated report, including data ordering, the amount of data displayed, data format and log file rotation. Several types of report can be generated, depending on the access control lists (ACLs) that have been set up beforehand. For log file rotation, it is important to ensure enough disk space for log file storage, especially for long-term evaluations. This is covered further under Scheduled Report Generation.


Figure 7-8 Setting on Report Content and Generation Option


5. Click on the Report Style icon. This allows the admin to make the generated report look more appealing in terms of language, title and other common style settings.

Figure 7-9 Setting on HTML Report Style and Colour Option

6. Click on the Scheduled Report Generation icon. In this module, the admin can define how often reports are generated by enabling the selected or default schedule. Regarding the rotate feature in Squid, it is recommended to apply a simple schedule: during a period of idleness, the log files are safely transferred to the report destination in one burst. Before transport, the log files can be compressed during off-peak time. At the destination, the log files are concatenated into one file, so the yield is one file for the selected hour. However, how reports are generated ultimately depends on the company's requirements.


Figure 7-10 Setting on Scheduled Reporting Options

7. After setting some information in Scheduled Report Generation, the following statement will be displayed on the main page.

Figure 7-11 Generate Report Setting


There are some considerations to be taken into account:

1. Never delete access.log, store.log or cache.log while Squid is running. There is no recovery file. (A rotation sketch follows this list.)

2. In the squid.conf file, the following statements can be applied if the admin wants to disable a certain log file. For example:
To disable access.log: cache_access_log /dev/null
To disable store.log: cache_store_log none
To disable cache.log: cache_log /dev/null
However, the cache.log should not be disabled, because this file contains many important status and debugging messages.
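Rather than deleting the logs by hand, the usual approach is to let Squid rotate them. A minimal sketch (the number of rotated copies kept is only an example) is to set, in squid.conf:

logfile_rotate 10

and then, for example from a cron job, ask the running Squid to rotate its logs:

# squid -k rotate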

7.5 Log Analysis and Statistics

After running the Sarg analyser, reports are generated for access.log. They can be found in /var/www/html/squid-reports.

Figure 7-12 Collection of Squid Report for Access.log

From Figure 7-12, we can see that three (3) reports have been generated in this example. The latest version has no number at the end of the filename. Each time the access log file is analysed, the existing report is renamed and an incremental number is appended automatically to the end of the filename. For example, 2007Mar22-2007Mar22.2 was the first report generated, while 2007Mar22-2007Mar22 is the latest version.


Based on Figure 7-13, the index.html file shows the list of reports that have been generated by Sarg. To get more detailed information for a specific report, click on the selected file name.

Figure 7-13 Summary of Squid reports

For example, a folder named 2007Mar22-2007Mar22 has been selected and opened. From Figure 7-14, there are several standard files which can be found in all Squid reports. Briefly, five (5) html reports show statistical information regarding the index, denied, download, siteuser and topsites pages. Besides these, the folder also contains a collection of reports for specific users, identified by their IP addresses.


Figure 7-14 Contents of 2007Mar22-2007Mar22 as example

The following figures show the five html reports:

1. Index html

Figure 7-15 Index html


2. Denied html

Figure 7-16 Denied html

3. Download html

Figure 7-17 Download html


4. Sites and Users

Figure 7-18 Siteuser html


5. Top 100 Sites

Figure 7-19 Topsites html

If we click on a specific IP address, we can view all the information for that user, as in Figure 7-20.

Figure 7-20 Reports generated for specific user (IP Address)
