rsyslog supprime les messages excessivement

Nous avons une machine CentOS agissant comme notre server syslog. J'ai récemment remarqué que la machine avait parfois perdu un grand nombre de messages syslog. Nous avons plus de 250 routeurs et commutateurs, ainsi que des pare-feu qui enregistrent toutes les connections ouvertes / fermées.

Je peux voir les messages arriver à la carte réseau Ethernet … par exemple, supposons que je passe en mode config sur l'un de nos routeurs et que nous sortions … cela génère un message syslog, et je renifle pour démontrer qu'il arrive au syslog server…

[mpenning@uglylogger net]$ sudo tshark -V udp and port 514 and host 214.110.12.246 Running as user "root" and group "root". This could be dangerous. Capturing on eth0 Frame 1 (151 bytes on wire, 151 bytes captured) Arrival Time: Apr 26, 2013 06:05:33.490721000 [Time delta from previous captured frame: 0.000000000 seconds] [Time delta from previous displayed frame: 0.000000000 seconds] [Time since reference or first frame: 0.000000000 seconds] Frame Number: 1 Frame Length: 151 bytes Capture Length: 151 bytes [Frame is marked: False] [Protocols in frame: eth:ip:udp:syslog] Ethernet II, Src: 50:57:a8:9f:68:3c (50:57:a8:9f:68:3c), Dst: Supermic_30:4a:07 (00:30:48:30:4a:07) Destination: Supermic_30:4a:07 (00:30:48:30:4a:07) Address: Supermic_30:4a:07 (00:30:48:30:4a:07) .... ...0 .... .... .... .... = IG bit: Individual address (unicast) .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default) Source: 50:57:a8:9f:68:3c (50:57:a8:9f:68:3c) Address: 50:57:a8:9f:68:3c (50:57:a8:9f:68:3c) .... ...0 .... .... .... .... = IG bit: Individual address (unicast) .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default) Type: IP (0x0800) Internet Protocol, Src: 214.110.12.246 (214.110.12.246), Dst: 214.110.16.4 (214.110.16.4) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..0. = ECN-Capable Transport (ECT): 0 .... ...0 = ECN-CE: 0 Total Length: 137 Identification: 0x0096 (150) Flags: 0x00 0.. = Reserved bit: Not Set .0. = Don't fragment: Not Set ..0 = More fragments: Not Set Fragment offset: 0 Time to live: 250 Protocol: UDP (0x11) Header checksum: 0xcf52 [correct] [Good: True] [Bad : False] Source: 214.110.12.246 (214.110.12.246) Destination: 214.110.16.4 (214.110.16.4) User Datagram Protocol, Src Port: 63873 (63873), Dst Port: syslog (514) Source port: 63873 (63873) Destination port: syslog (514) Length: 117 Checksum: 0x1ed8 [validation disabled] [Good Checksum: False] [Bad Checksum: False] Syslog message: LOCAL7.NOTICE: 187: .Apr 26 06:05:32.476 CDT: %SYS-5-CONFIG_I: Configured from console by admin on vty0 (214.110.26.29) 1011 1... = Facility: LOCAL7 - reserved for local use (23) .... .101 = Level: NOTICE - normal but significant condition (5) Message: 187: .Apr 26 06:05:32.476 CDT: %SYS-5-CONFIG_I: Configured from console by admin on vty0 (214.110.26.29) ^C1 packet captured [mpenning@uglylogger net]$ 

Cependant, ce message n'apparaît pas dans le syslog (même si un autre message est envoyé à 05:37:05) … Un seul de sept ou huit de ces messages apparaît dans le file journal …

 [mpenning@uglylogger net]$ pwd /var/log/net [mpenning@uglylogger net]$ [mpenning@uglylogger net]$ grep CONFIG network.log Apr 26 05:08:59 somedevice.company.local 779476: Apr 26 05:08:58.604 CDT: %SYS-5-CONFIG_I: Configured from console by cbutler on vty0 (214.110.31.65) Apr 26 05:37:05 w9-idf-lab 182: .Apr 26 05:37:04.452 CDT: %SYS-5-CONFIG_I: Configured from console by admin on vty0 (214.110.26.29) [mpenning@uglylogger net]$ 

J'ai essayé beaucoup de choses pour que ce server se connecte correctement …

  • Suppression de tous les files étrangers de la partition
  • Vérifié le NIC pour les rejets / erreurs
  • Reloaded rsyslogd
  • Reloaded le système
  • yum update
  • Démarré un server de noms de caching
  • Vérifié iostat (la charge est tout à fait raisonnable, voir ci-dessous)
  • Checked vmstat (charge à nouveau, est tout à fait raisonnable)

Je suis à court d'idées … comment puis-je empêcher ce server de saigner syslogs?


Sortie MISC

Liste du directory

 [mpenning@uglylogger net]$ ls -al total 3292356 drwxr-xr-x 3 root root 20480 Apr 26 05:17 . drwxr-xr-x. 13 root root 4096 Apr 21 03:39 .. drwxr-xr-x 2 root root 4096 Apr 19 04:22 daily -rw-r--r-- 1 root root 185081020 Apr 26 06:21 fw1.log -rw-r--r-- 1 root root 1936049300 Apr 26 04:15 fw1.log-20130426 -rw-r--r-- 1 root root 7629110 Apr 26 06:21 fw2.log -rw-r--r-- 1 root root 160792210 Apr 26 04:17 fw2.log-20130426 -rw-r--r-- 1 root root 118327 Apr 26 06:20 network.log -rw-r--r-- 1 root root 1572210 Apr 26 04:17 network.log-20130426 -rw-r--r-- 1 root root 73108 Apr 26 06:21 wireless.log -rw-r--r-- 1 root root 686104 Apr 26 04:17 wireless.log-20130426 [mpenning@uglylogger net]$ 

rsyslog.conf

 $ModLoad imudp $UDPServerRun 514 $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat $IncludeConfig /etc/rsyslog.d/*.conf if $msg contains 'APF-3-RCV_UNSUPP_MSG' then /dev/null &~ if $msg contains 'FWSM-2-106007' then /dev/null &~ if $fromhost-ip=='214.110.225.201' then /var/log/net/fw1.log &~ if $fromhost-ip=='214.110.225.202' then /var/log/net/fw1.log &~ if $fromhost-ip=='214.110.19.246' then /var/log/net/fw2.log &~ if $fromhost-ip=='214.110.19.253' then /var/log/net/fw2.log &~ if $fromhost-ip=='214.110.5.10' then /var/log/net/fw_other.log &~ if $fromhost-ip=='214.110.6.254' then /var/log/net/fw_other.log &~ if $fromhost-ip=='214.110.6.253' then /var/log/net/fw_other.log &~ if $fromhost-ip=='214.110.46.254' then /var/log/net/fw_other.log &~ if $fromhost-ip=='214.110.46.253' then /var/log/net/fw_other.log &~ if $fromhost-ip=='214.110.19.42' then /var/log/net/fw_other.log &~ if $fromhost-ip=='214.110.3.203' then /var/log/net/fw_other.log &~ if $fromhost-ip=='214.110.84.2' then @214.110.16.6 if $fromhost-ip=='214.110.84.2' then /var/log/net/wireless.log &~ if $fromhost-ip=='214.110.84.4' then @214.110.16.6 if $fromhost-ip=='214.110.84.4' then /var/log/net/wireless.log &~ if $fromhost-ip=='214.110.70.8' then @214.110.16.6 if $fromhost-ip=='214.110.70.8' then /var/log/net/wireless.log &~ if $fromhost-ip=='214.110.70.12' then @214.110.16.6 if $fromhost-ip=='214.110.70.12' then /var/log/net/wireless.log &~ if $fromhost-ip=='214.110.70.16' then @214.110.16.6 if $fromhost-ip=='214.110.70.16' then /var/log/net/wireless.log &~ if $fromhost-ip=='214.110.16.6' then /var/log/net/wireless.log &~ if $fromhost-ip startswith '214.110' then /var/log/net/network.log &~ if $fromhost-ip startswith '214.111' then /var/log/net/network.log &~ if $fromhost-ip startswith '214.112' then /var/log/net/network.log &~ *.info;mail.none;authpriv.none;cron.none /var/log/messages authpriv.* /var/log/secure mail.* -/var/log/maillog cron.* /var/log/cron *.emerg * uucp,news.crit /var/log/spooler local7.* /var/log/boot.log $template SpiceTmpl,"%TIMESTAMP%.%TIMESTAMP:::date-subseconds% %syslogtag% %syslogseverity-text%:%msg:::sp-if-no-1st-sp%%msg:::drop-last-lf%\n" :programname, startswith, "spice-vdagent" /var/log/spice-vdagent.log;SpiceTmpl 

iostat 5 … ces numbers sont typiques … la charge ne pique pas tant …

 [mpenning@uglylogger net]$ iostat 5 Linux 2.6.32-358.2.1.el6.i686 (uglylogger.local) 04/26/2013 _i686_ (4 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 2.25 0.05 0.80 0.37 0.00 96.52 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 10.67 364.09 536.25 995468936 1466172104 dm-0 33.82 15.35 268.50 41981146 734110616 dm-1 0.00 0.00 0.00 3056 0 dm-2 2.21 0.05 17.60 132282 48114080 dm-3 33.07 348.68 250.14 953334658 683895008 avg-cpu: %user %nice %system %iowait %steal %idle 0.05 0.00 0.00 0.15 0.00 99.80 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 1.00 0.00 8.00 0 40 dm-0 0.80 0.00 6.40 0 32 dm-1 0.00 0.00 0.00 0 0 dm-2 0.00 0.00 0.00 0 0 dm-3 0.20 0.00 1.60 0 8 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.05 0.30 0.00 99.65 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 1.00 0.00 33.60 0 168 dm-0 0.00 0.00 0.00 0 0 dm-1 0.00 0.00 0.00 0 0 dm-2 0.00 0.00 0.00 0 0 dm-3 4.20 0.00 33.60 0 168 avg-cpu: %user %nice %system %iowait %steal %idle 1.05 0.00 0.30 0.20 0.00 98.44 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 1.20 0.00 8.00 0 40 dm-0 0.80 0.00 6.40 0 32 dm-1 0.00 0.00 0.00 0 0 dm-2 0.00 0.00 0.00 0 0 dm-3 0.20 0.00 1.60 0 8 avg-cpu: %user %nice %system %iowait %steal %idle 1.40 0.00 0.30 0.15 0.00 98.14 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 0.80 0.00 6.40 0 32 dm-0 0.80 0.00 6.40 0 32 dm-1 0.00 0.00 0.00 0 0 dm-2 0.00 0.00 0.00 0 0 dm-3 0.00 0.00 0.00 0 0 avg-cpu: %user %nice %system %iowait %steal %idle 0.05 0.00 0.05 0.25 0.00 99.65 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 5.40 0.00 46.40 0 232 dm-0 0.00 0.00 0.00 0 0 dm-1 0.00 0.00 0.00 0 0 dm-2 0.00 0.00 0.00 0 0 dm-3 5.80 0.00 46.40 0 232 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.05 0.15 0.00 99.80 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 0.80 0.00 4.80 0 24 dm-0 0.60 0.00 4.80 0 24 dm-1 0.00 0.00 0.00 0 0 dm-2 0.00 0.00 0.00 0 0 dm-3 0.00 0.00 0.00 0 0 avg-cpu: %user %nice %system %iowait %steal %idle 3.71 0.00 1.05 0.10 0.00 95.14 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 1.20 0.00 19.20 0 96 dm-0 1.40 0.00 11.20 0 56 dm-1 0.00 0.00 0.00 0 0 dm-2 0.00 0.00 0.00 0 0 dm-3 1.00 0.00 8.00 0 40 avg-cpu: %user %nice %system %iowait %steal %idle 0.05 0.00 0.00 0.35 0.00 99.60 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 3.40 0.00 102.40 0 512 dm-0 1.40 0.00 11.20 0 56 dm-1 0.00 0.00 0.00 0 0 dm-2 0.00 0.00 0.00 0 0 dm-3 11.40 0.00 91.20 0 456 ^C [mpenning@uglylogger net]$ 

vmstat 5

 [mpenning@uglylogger net]$ vmstat 5 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu----- rb swpd free buff cache si so bi bo in cs us sy id wa st 0 1 0 240408 50068 3506120 0 0 46 67 2 2 2 1 97 0 0 0 0 0 240400 50068 3506152 0 0 0 19 1312 37 0 0 100 0 0 1 0 0 239160 50084 3508648 0 0 0 18 3943 2320 20 6 74 0 0 1 0 0 242632 50100 3510084 0 0 0 212 3561 2718 14 5 81 1 0 0 0 0 238168 50116 3510336 0 0 0 58 1612 2537 1 1 98 0 0 0 0 0 235440 50124 3510340 0 0 0 510 1340 1333 0 0 100 0 0 0 0 0 235564 50132 3510340 0 0 0 19 1470 37 0 0 100 0 0 0 0 0 235564 50140 3510344 0 0 0 28 1348 40 0 0 100 0 0 0 0 0 235440 50156 3510340 0 0 0 17 1497 37 0 0 100 0 0 0 0 0 235564 50156 3510348 0 0 0 70 1522 38 0 0 100 0 0 

df -h

 [mpenning@sasmars net]$ df -h Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg_uglylogger-LogVol01 73G 17G 56G 23% / tmpfs 2.0G 0 2.0G 0% /dev/shm /dev/sda1 985M 126M 809M 14% /boot /dev/mapper/vg_uglylogger-LogVol00 20G 655M 18G 4% /home /dev/mapper/vg_uglylogger-LogVol03 592G 4.9G 557G 1% /var [mpenning@uglylogger net]$ 

netstat -s

 [mpenning@uglylogger net]$ netstat -s | head Ip: 452407434 total packets received 0 forwarded 0 incoming packets discarded 449256489 incoming packets delivered 156718754 requests sent out 150 reassemblies required 75 packets reassembled ok Icmp: 2423436 ICMP messages received [mpenning@uglylogger net]$ 

ifconfig eth0

 [mpenning@uglylogger net]$ ifconfig eth0 | grep -E "dropped|collisions" RX packets:466065257 errors:0 dropped:0 overruns:0 frame:0 TX packets:252751047 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 [mpenning@uglylogger net]$ 

ethtool -S eth0|grep -vw 0

 [mpenning@uglylogger net]$ sudo ethtool -S eth0|grep -vw 0 NIC statistics: rx_packets: 5206242266 tx_packets: 255859804 rx_bytes: 1205016281115 tx_bytes: 180592917566 rx_broadcast: 15001736 tx_broadcast: 2648 rx_multicast: 7518499 tx_multicast: 50236 multicast: 7518499 rx_no_buffer_count: 103 tx_restart_queue: 1 tx_tcp_seg_good: 10107428 rx_long_byte_count: 1205016281115 rx_csum_offload_good: 5189768707 rx_csum_offload_errors: 5 [mpenning@uglylogger net]$ 

lspci

 [mpenning@uglylogger net]$ sudo lspci 00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub (rev 01) 00:00.1 Unassigned class [ff00]: Intel Corporation E7500/E7501 Host RASUM Controller (rev 01) 00:02.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface B PCI-to-PCI Bridge (rev 01) 00:03.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface C PCI-to-PCI Bridge (rev 01) 00:1d.0 USB controller: Intel Corporation 82801CA/CAM USB Controller #1 (rev 02) 00:1d.1 USB controller: Intel Corporation 82801CA/CAM USB Controller #2 (rev 02) 00:1d.2 USB controller: Intel Corporation 82801CA/CAM USB Controller #3 (rev 02) 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42) 00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface Controller (rev 02) 00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage Controller (rev 02) 00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 02) 01:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04) 01:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04) 01:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04) 01:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04) 02:01.0 RAID bus controller: 3ware Inc 7xxx/8xxx-series PATA/SATA-RAID (rev 01) 03:02.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01) 03:02.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01) 04:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04) 04:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04) 04:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04) 04:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04) 05:02.0 Serial controller: 3Com Corp, Modem Division 56K FaxModem Model 5610 (rev 01) 07:01.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Rage XL (rev 27) [mpenning@uglylogger net]$ 

uname -a

 [mpenning@uglylogger net]$ uname -a Linux uglylogger.local 2.6.32-358.2.1.el6.i686 #1 SMP Tue Mar 12 21:42:46 UTC 2013 i686 i686 i386 GNU/Linux [mpenning@uglylogger net]$ 

Mettre à jour vers la dernière version de rsyslog. Nous avons eu ce problème précis au travail, et c'est la seule chose qui l'a résolu. La version précédente (s) avait des problèmes avec la résolution de nom, et même l'éteindre ne l'a pas résolu. La twig 7.x résout le problème. Je vais voir si je peux find le lien spécifique.

Je suppose que RedHat a rétrogradé tous les problèmes liés à la performance à la version rsyslog contenue dans le operating system.

rx_no_buffer_count: 103 semble être le problème de base ici.

Cela indique qu'il y a eu 103 packages TCP qui ont été supprimés AVANT qu'ils puissent être transmis au operating system.

La baisse a été enregistrée dans le NIC (et pas passé au operating system, donc netstat n'a pas montré cela). Pour résoudre le problème, vous devrez probablement augmenter la taille de la memory tampon de réception dans vos parameters NIC.

Faites un ethtool -g eth0 pour voir vos parameters actuels et possibles pour RX.

D'après mon expérience, le RX-ring-buffer à 2048 ou 3172 est plutôt bon.

Cela donnera au NIC le time nécessaire pour tamponner les packages TCP entrants jusqu'au matériel (cela nécessite un interrup- teur PCI) et le operating system a le time de traiter les packages.

Voici une explantion de RedHat , ce qui se passe ici.

Pour rendre ce changement persistant, changez ETHTOOL_OPS comme décrit ici .