I run a LAMP stack that hosts about 100 virtual hosts. Each virtual host logs to its own CustomLog file using the "combined" format.
For one of these log files (just the one so far), our AWStats parser is choking, citing bad data. It says that not all of the lines are in the expected "combined log format".
Create/Update database for config "/etc/awstats/awstats.example.com.conf" by AWStats version 7.0 (build 1.971)
From data in log file "/wwwlogs/example.com.log"...
Phase 1 : First bypass old records, searching new record...
Direct access after last parsed record (after line 39179)
AWStats did not find any valid log lines that match your LogFormat parameter, in the 50th first non commented lines read of your log.
Your log file /wwwlogs/example.com.log must have a bad format or LogFormat parameter setup does not match this format.
Your AWStats LogFormat parameter is:
1
This means each line in your web server log file need to have "combined log format" like this:
188.8.131.52 - - [10/Jan/2001:02:14:14 +0200] "GET / HTTP/1.1" 200 1234 "http://www.fromserver.com/from.htm" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
And this is an example of records AWStats found in your log file (the record number 50 in your log):
5%BE%C3%82%C2%A2s-strategic-plan?page=45 HTTP/1.1" 200 12980 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
Setup ('/etc/awstats/awstats.example.com.conf' file, web server or permissions) may be wrong.
Check config file, permissions and AWStats documentation (in 'docs' directory).
I looked in the log file and found this is an intermittent problem. MOST of the lines look just fine. But every once in a while, there's a line that looks like it starts somewhere in the middle of the line.
184.108.40.206 - - [19/Jul/2016:13:11:19 -0400] "HEAD /blog/compensation-plans-commercial-lenders HTTP/1.1" 200 - "-" "Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/47.0 (Chrome)"
ozilla/5.0 (compatible; Baiduspider/2.0;+http://www.baidu.com/search/spider.html)"
That's the whole line. See how it starts in the middle of the word "Mozilla"? It doesn't look like a continuation of the previous line either, as it would if a stray line break had been inserted: the previous line is complete on its own. It's as though a whole new entry was written without the first 100 or so characters of the line.
Here's a different example:
1%C3%82%C2%AC%C3%83%C2%A2%C3%A2%E2%82%AC%C5%BE%C3%82%C2%A2s-strategic-plan?page=60 HTTP/1.1" 200
Here it looks like it's starting in the middle of the requested resource URI.
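To get a sense of how often this happens and on which line numbers, a small script along these lines could flag every entry that doesn't fit the combined format. This is only a rough sketch: the regular expression is my own approximation of the combined format (not the pattern AWStats uses), and the names find_malformed and scan_file are made up for illustration.

```python
import re

# Approximation of Apache's "combined" format (an assumption, not AWStats' own pattern):
# host ident user [time] "request" status bytes "referer" "user-agent"
QUOTED = r'"(?:[^"\\]|\\.)*"'  # quoted field, allowing backslash-escaped characters
COMBINED_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] ' + QUOTED + r' \d{3} (?:\d+|-) ' + QUOTED + ' ' + QUOTED
)


def find_malformed(lines):
    """Return (line_number, text) pairs for non-empty lines that do not match."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if text and not COMBINED_RE.fullmatch(text):
            bad.append((number, text))
    return bad


def scan_file(path):
    """Run find_malformed over a log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return find_malformed(handle)


# Two lines taken from the log excerpt above: one complete, one truncated.
SAMPLE_LINES = [
    '184.108.40.206 - - [19/Jul/2016:13:11:19 -0400] "HEAD /blog/compensation-plans-commercial-lenders HTTP/1.1" 200 - "-" "Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/47.0 (Chrome)"',
    'ozilla/5.0 (compatible; Baiduspider/2.0;+http://www.baidu.com/search/spider.html)"',
]

for number, text in find_malformed(SAMPLE_LINES):
    print(f"line {number}: {text[:100]}")
```

Calling scan_file("/wwwlogs/example.com.log") would run the same check over the real log. Comparing the timestamps of the lines just before each flagged entry might show whether the truncated lines cluster around particular times.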
Does anyone know what might be causing this? Our site is up and running as far as anybody on the front end can tell, but AWStats is completely unable to parse the logs.
Apache version: 2.2.15
Operating system: CentOS 6
PHP Version: 5.5.36