RELEASE NOTES FOR COLLECTL
INSTALLATION
Installing the rpm
rpm -ihv collectl-x.y.z.noarch.rpm
Installing from source
unpack the tarball, which you've obviously done
follow the instructions in the README, which basically say to run INSTALL
Configure to start on boot
On RedHat based installations, collectl will not be configured to start
on boot, but can easily be set to do so with the command:
chkconfig collectl on
KNOWN PROBLEMS/RESTRICTIONS
- if system time is changed by more than the log rolling frequency after
collectl starts, multiple log files will be created during the next polling
cycle(s)
- never include aliased networks in the network summary calculations
COLLECTL CHANGES
4.3.10 Jun 12 2024
- Fixed an issue reported by Martin where nvme devices were not being aggregated in
the summary display for disks
- Tags will be properly updated now, I had missed that when I took this over
4.3.9 Jan 2 2024
- Added md devices to disk parse
4.3.8 Feb 7 2023
- Added Ceph rbd devices
- Added missing network interface names (Reported by Martin A)
4.3.6 November 29 2022
- Removed the error check in sub vmstatInit
It was erroneous and that option is seldom used.
Reported-by Bill Torpey
4.3.5 October 28 2022
- Add default collection for Ceph rbd devices to the conf file
4.3.4 August 24, 2022
- Add patch for dskopts=F to allow double digit precision on playback
4.3.3 Apr 1, 2022
- Minor fix for HCA stats with Mellanox HCAs
4.3.2 Oct 7, 2021
- very minor bug. the last command in perfquery is not in a loop and prevents
the collectl daemon from starting via the service. [Thanks Edgar]
4.3.1 Sept 13, 2018
- very minor bug. If playing back a file with -P in its name, collectl
incorrectly interprets it as the plot format flag! [thanks laurence]
- incorrectly dividing $dskWaitR and $dskWaitW by $intSecs [thanks Jan]
- added 'm' switch to vmsum to only report VMs whose instance id is
>= this minimum value
4.3.0 Oct 3, 2017
- disable -sL, should have been done at same time -sl was
4.2.0 Jun 12, 2017
- Updated Plotfile docs to explain why you shouldn't leave off the -f
when using -P [thanks Bayard]
- added support for InfiniBand OPA V4 to read stats from /sys instead of
having to rely on perfquery for 64 bit counters. [thanks frederic]
- removed previous bug introduced in V4.1.2 that was not properly calculating
disk summaries. If you do have any raw files collected with this version
you WILL be able to play them back properly or create and plot files with
this version
- although I'm still leaving the lustre code in place because there is so much
of it, I did remove cciss disk types from non-lustre code
- finally removed col2tlviz from kit [thanks tom]
4.1.3 Apr 10, 2017
- throws 'uninit var' building distro on openSUSE
4.1.2 Feb 27, 2017
- incorrectly requiring a + with --rawdskfilt to be at beginning
- when added support for 64bit IB counters it looks like I was only
saving 3 of the 4 values (loop only went to 3 instead of 4)
around line 4403. [thanks seb]
4.1.1 Nov 2, 2016
- added packet loss and fast retransmissions to TCP Extended verbose
output and renamed AkNoPy and PreAck to PurAck and HPAcks to be
consistent with earlier versions [thanks Sophie]
- add support for nvme disks [thanks fred]
- it turns out some people re-enable lustre support for the sake of
monitoring clients and to support that I had to add a check for
the lustre-client module which is now in a different location than
others [thanks fred]
4.1.0 Oct 7, 2016
- allow lexpr to pass formatting information for strings and numbers
[thanks Guy]
- modify the way misc.ph reports uptime to thousandths of a day [thanks, seb]
- added OPA interface support for -sx reporting and cleaned up some very
old code, like quadrics support! [thanks fred]
4.0.5 Apr 26, 2016
- rawdskfilt has been enhanced to allow a preceding + which will
cause the following string to be appended to the default filter
- needed to initialize anonH for numa stats [thanks andy]
- added 'hed' to known ethernet devices, used by HP Helion
4.0.4 Jan 29, 2016
- if you try to playback a file with --stats and it has recorded
processes or slabs, ignore them by removing from $subsys [thanks ghassen]
- playback of process data with -P was not skipping first interval and so
stats for first entry were not rates but rather raw numbers [thanks philippe]
- change 'yikes' message to something more meaningful [thanks rob and laurence]
- fixed problem with -sZ -P printing all 0s for thread count [thanks philippe]
- added /usr/lib/systemd/system/collectl.service, per sourceforge help
discussion on 2015-12-28 [thanks george]
- added disk read/write wait timing for disk detail in terminal, plot
and lexpr format [thanks bud]
- new switch dskremap allows one to change disk names on the fly because
in some cases such as etherd disks, the names are messy for use with
other tools like ganglia [thanks gabriel]
- removed access to disk name remapping file
4.0.3 July 2, 2015
- add AnonHuge memory to memory stats, both verbose and detailed as
well as lexpr [thanks, fred]
- if lexpr called with --import, throw an error
- tighten divide-by-zero test for -sM because it looks like in some cases when misses >0
we're getting occasional errors. could hits be somehow negative? [thanks Robert]
4.0.2 May 27, 2015
- add /bin/bash to list of 'known shells' excluded from output with
--procopt k
- generalize ethernet network device name to include ALL names
matching type 'p\dp' so we pick up p2p, p3p, p4p... [thanks Matt]
- collect nr_shmem so we can track shared memory, apparently something
I thought of but never acted on [thanks Christian]
- do not include guest cpu metrics in totals since already accounted
for in user time [thanks Philippe]
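A minimal sketch of the 'p\dp' device-name match described in the 4.0.2
notes above. It is written in Python for illustration only (collectl itself
is Perl); the helper name and the sample device names are hypothetical, and
the actual pattern used in collectl may be anchored or combined differently.

```python
import re

# Assumed form of the pattern from the note: 'p', one digit, 'p',
# matched at the start of the device name.
PNP_NAME = re.compile(r"^p\dp")


def is_pnp_ethernet(name: str) -> bool:
    """Return True if the device name starts with p<digit>p."""
    return PNP_NAME.match(name) is not None


# Hypothetical device names for demonstration.
names = ["p2p1", "p3p2", "p4p1", "lo", "ppp0"]
matched = [n for n in names if is_pnp_ethernet(n)]
print(matched)
```

Names such as ppp0 are not picked up because the second character must be
a digit.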
4.0.1
- change /usr/sbin to /usr/bin in init.d/collectl [thanks Ladislav]
- pattern match to exclude partitions from disk summary is WRONG and
we need to make sure name doesn't match cciss disks like c0d0!
[thanks, Laurent]
- changed help text for -retaddr to NOT use 'use' preceding -deb because
rpmbuild gets confused and tries to include '-deb' as a dependency
[thanks dan]
- include 'en' network devices in summary data [thanks homerl]
- change buddyinfo to deal with fewer fields in /proc/buddyinfo as apparently
there are not always 11 of them [thanks greg]
- remove lustre from --showsubsys
- removed 'known problem' with older versions of Time::HiRes in these
release notes as that was quite a long time ago
4.0.0 Mar 9, 2015
- rare, but if selecting processes by parent pid or command name, it's
possible when a new pid is seen that it's already exited by the time
we try to read /proc/pid/stat, and it will return an undef value
- finally cleaned up code to read speeds from /sys to use internal
cat() to avoid misc 'Invalid Arg' errors. also fixed cat() to return
null when nothing read.
- added mlx5 as a new type of IB device name [thanks fred]
- get lustre version a different way because format changed [thanks Jeff]
also note that native lustre support in collectl is going away in
summer of 2015!
- lexpr was incorrectly reporting sys/user cpu details in the wrong
place and as a result showed up before the timestamp in some cases
- colmux has now been moved to the collectl package, release notes
to be continued here going forward
COLMUX CHANGES
5.0.0
- getHeader routine, removing -c/-i: need to look for leading spaces
when stripping switches in case there is a UUID in the command string, which
can contain -c followed by a hex string
4.9.2
- if ping fails, it still tries to ssh and fails, generating
meaningless uninitialized variable errors
- include 'ssh' in the error messages when check() fails (thanks KM5)
4.9.1 Mar 29, 2016
- assume collectl in same directory as colmux so you can install both
on network share BUT if colmux ends in 'pl', it's probably me doing
development/testing, so use collectl in /usr/bin. [thanks Paul]
4.9.0 Jan 06, 2016
- header name printing in single line mode not quite right for all
combinations of switches
- not trapping 'collectl not installed' errors and just returning
the node isn't reachable
- new switch -timerange will report warnings for any nodes found to
differ from others by more than this number of seconds
- added COMMUNICATIONS PROBLEMS section to man page and dropped
section describing what changed in Version 3
4.8.3 Mar 9, 2015
- -oT -test wasn't including time column in help output whereas -od and -oD
did [thanks, robbin]
- new switch: -retaddr tells collectl to connect back to this address rather
than the one colmux chooses by default which is default interface's addr
- change in way return address is determined because RHEL 7 changed the
format of the ifconfig output, changing Bcast to broadcast and dropping
addr: [thanks hank]
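The ifconfig format change noted for 4.8.3 above can be sketched as follows.
This is an illustration in Python, not colmux's actual code: the function
name is hypothetical, and the two sample lines are assumed examples of the
older "inet addr:... Bcast:..." layout and the newer RHEL 7 style
"inet ... broadcast ..." layout.

```python
import re

# Accept both 'inet addr:1.2.3.4' (older) and 'inet 1.2.3.4' (newer).
INET_RE = re.compile(r"inet (?:addr:)?(\d+\.\d+\.\d+\.\d+)")


def return_address(ifconfig_line: str):
    """Extract the IPv4 address from one ifconfig output line, or None."""
    m = INET_RE.search(ifconfig_line)
    return m.group(1) if m else None


# Assumed sample lines for the two output styles.
OLD_STYLE = "inet addr:192.168.1.10  Bcast:192.168.1.255  Mask:255.255.255.0"
NEW_STYLE = "inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255"

print(return_address(OLD_STYLE))
print(return_address(NEW_STYLE))
```

Making the "addr:" prefix optional lets one expression handle both layouts
rather than branching on the distribution.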
PRE-4.0 COLLECTL CHANGES
3.7.4-1 Sep 10, 2014
- typo in $netFilt (should have been $netFiltIgnore) preventing any
network from being included in totals when --netfilt specified, but
also made me rethink the way summaries are calculated (see next item)
- 2 more network types were discovered to be causing double counting
in summaries, specifically vibr and vnets. since the exceptions occur
at a far greater rate it was decided that rather than have a default list
of those network types to exclude from the summaries, it makes far more
sense to have a list with those that SHOULD be included as well as a
mechanism for handling new summary types. This led to a reinterpretation
of --netfilt. see the man page and Network.html for more details
- removed references to XC, which is no longer supported
- use abs to generate path to exe, simpler and cleaner [thanks Jeff]
- extended the way formatit is loaded and changed the order that collectl.conf
is discovered, noting it should only affect people actually modifying
code or moving things to non-standard locations. it IS now documented
in Startup and Initialization. [thanks again, Jeff]
- set max lines to read for diskstats to 20000 for those with real large
disk counts where 10000 wasn't enough [thanks jean-marc]
- very rare, but if doing timing and no hires present, $microInterval gets set
to zero and the division by the interval blows up
- finally remembered to remove -G and --group which were replaced by --tworaw
- clarified description of -s defaults in manpage as well as adding a
pointer to the online documentation on file naming [thanks rob]
- added additional error message for when files match selection string
but none contain -date-time.raw [thanks rob]
- add support for newer kernel CPU stats: guest, guest_nice
- now that 2.4 kernels no longer supported, make sure CPU stats contain
at least softirq field
- change headers with % to PCT and remove space, also remove whitespace in
interrupt detail output for type and devices columns [thanks rob]
- new switch --ALL, selects summary and detail data for all subsystems
[thanks rob]
- new switch --full, selects --verbose, always includes RECORD separator and
includes which subsystem data is being reported with each interval in
the RECORD header to make parsing easier for rob [thanks rob]
- if you DON'T collect tcp data but want to play it back, variables weren't
initialized to 0 and you get uninit variable warnings
- if disk name ends with a digit (can only happen when manually changing
disk filtering in either collectl.conf or with --rawdskfilt, don't
include in disk summary stats [thanks guy]
- discovered a place where some numa counters go backwards! This MUST be a
kernel bug but inserted code to mitigate and warn if it happens [thanks rob]
- removed a line of code incorrectly initializing $HCAPosts[] because that is
now a doubly indexed array [thanks Jeff]
- discovered tap devices don't set default network speeds correctly and can
cause 'bogus' messages so use default max
- make 'Intrpt' header mixed case for CPU details, not all upper
- new 3rd option for --top, allows one to display the top-n processes sorted
by any column vertically, similar to playback mode, which in some cases
can be very handy
- if only 1 tcp subtype selected with --tcpfilt, was printing column
header of ERR and I've no idea why. Changed it to TCP.
- I didn't like --tcpfilt I by itself forcing --verbose so changed it to just
being in the --tcpfilt string will force it and updated man page as well
since --tcpfilt wasn't even documented in it
- As warned, I'm in the process of removing direct support for lustre and you should
contact Peter Piela at TeraScala to get a copy of his lustre plugin.
Therefore -sl is being removed as a default. To get collectl's native
lustre support in daemon mode, you must add it to -s. Native support will
be completely removed around the summer of 2015.
3.7.3-1 Apr 1, 2014
- had to change 'defined(@array)' to remove the 'defined()', which is
deprecated on RHEL7
3.7.2-2 Mar 31, 2014
- deal with process names in /proc/pid/stat that have embedded spaces
in them (ugh!!!) [thanks, guy]
- if HCA supports extended InfiniBand counters, read them from /sys if
present, otherwise read them with perfquery [thanks fred and roy]
- NOTE: error counters are not present when looking at extended
counters and so will be reported as 0
- removed IbDupCheck from collectl.conf since perfquery monitoring
always checks for dups
- since extended counters do not need to be cleared, you can now run
multiple copies of collectl when used
- fixed bug in -sX because it was generating wrong stats and more amazingly
nobody ever noticed
- removed quadrics and myrinet code, indicating end of an era for proprietary
interconnects, but without them we may not have gotten to 10Gb or IB as quickly
- new switch: --cpufilt allows filtering on CPU number in the same
way as dskfilt and netfilt, primarily for use with high cpu counts.
also honored when reporting interrupt stats
- fixed typo for sorting on 'syst' [thanks stig]
3.7.2-1 Mar 5, 2014
- added optional groups & titles to ganglia export module [thanks peter]
- removed extra '%s' in gexpr/senddata call for ipmi
- an error trying to run dmidecode when it wasn't there was fixed some time
after v3.6.0 but never made it into the release notes. [thanks seb]
- added additional stats for disk details to graphite.ph [thanks bob]
- changed format for AccumTim reporting for process data in prc file to
be a single format. [thanks andy]
- fixed a problem with --procanalyze when processing multiple raw files, it
was not clearing the right data structures
3.7.1-1 Jan 7, 2014
- removed nvidia and sexpr from kit as warned over a year ago
- lookup of uid:gid via grep needs trailing ':' in search or it will
incorrectly match first entry with longer name string
- changed deprecated use of defined(@$impiRemap) to defined($ipmiRemap)
re: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=728760
- when rearranged logging code for E/F messages for syslog, ending up
using a variable that wasn't yet defined
- during playback of multiple files with different host names, disk/network
indexing structures need to be reinitialized
- when filtering network details, the Num in the output should start at 0
as opposed to its value when not filtered which left holes in numbering
- was reporting swapped data as bytes when in fact it is reported by the
kernel in pages. it now reports swap sizes correctly by multiplying by
the correct page size. [thanks philippe]
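The uid:gid lookup fix in 3.7.1-1 above (searching with a trailing ':') can
be illustrated with a small sketch. It is Python rather than the grep call
collectl uses, and the passwd-style entries and function names are
hypothetical; it only shows why a prefix search without the delimiter can
return the wrong entry.

```python
# Hypothetical passwd-style entries; the longer name comes first.
PASSWD = [
    "rootkit:x:1001:1001::/home/rootkit:/bin/sh",
    "root:x:0:0:root:/root:/bin/bash",
]


def lookup_loose(user, lines):
    """Prefix search WITHOUT the trailing ':' (the buggy behavior)."""
    for line in lines:
        if line.startswith(user):
            return line.split(":")[2:4]
    return None


def lookup_strict(user, lines):
    """Prefix search WITH the trailing ':' so only the exact name matches."""
    for line in lines:
        if line.startswith(user + ":"):
            return line.split(":")[2:4]
    return None


print(lookup_loose("root", PASSWD))   # picks up the 'rootkit' entry
print(lookup_strict("root", PASSWD))  # picks up the 'root' entry
```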
3.6.9-1 Oct 18, 2013
- typo in network plot header loop resulted in infinite loop [thanks andy]
- remove $int/secs from numa hit rate calc AND add more precision to its output [thanks stig]
- need to deal with a new process showing up with an existing pid, though rare
it can happen with a high rate of process creations [thanks guy]
3.6.8-1 Jul 20, 2013
- new flag $exportComm must be set in gexpr/ganglia so that they won't
generate an error if run without -f or -A [thanks tom]
- new switch: --intfilt allows filtering of interrupts
- always log messages of type F/E to syslog in daemon mode even if
-m is not set [thanks again, tom]
- wasn't dealing correctly with missing whitespace after network name in
/proc/net/dev in initRecord() [thanks andy]
- updated init.d script for suse per the maintainer's instructions [thanks tom]
- extra spaces were being printed in plot mode for tcp stats
- added entry to envrules.std to deal with intel Phi Co-Processor
- debian init.d script now does 'exit 1' if status reports 'not running'
- rawnetignore switch wasn't working correctly
- found/fixed some subtle problems with --procanalyze as well as some cleanup
- need to ignore first sample after initializing summary arrays
- need to init summary hashes for thrutime and accumT because get uninit var
in print routine if only a single process entry
- found a typo in procAnalyze() to a $usecs which wasn't being used!
- added error check to make sure --procanalyze with -P requires -s
- added a little more debugging output for -d128
- discovered dynamic disk/network detail names for interactive mode were not
being reported correctly. sounds a lot worse than it is because this is
typically not done very often nor are disks/networks very dynamic except in
large, virtualized environments such as clouds
- add to list of devices to exclude from network summary data: tap, dp and nl,
which are associated with openstack cinder. remember you can always add
more to that list with --netfilt
- $lastHour was never referenced and dayInit() called every time a log was
created so fix logic to update $lastHour correctly AND call initDay() one
time and do it before newLog() called.
- closed a couple of file handles that were left open and reportedly
causing some defunct processes with -sx. [thanks brian]
- fixed bug in lustre stats recording [thanks roland]
- clarified --showsubopts text about disk and network filters in that they
apply to both summary and detail data output
- fixed problem with --import and --stats
- --statsopt a didn't work because when changed some internal logic missed
changing a test of $timestampFlag to $timestampCounter[$rawPFlag] and so
now $timestampCount can be removed entirely
- clear $firstpass after 1st pass during playback
- make sure filename initialized before calling loadConfig so if there is
an error logsys() doesn't get an undefined var warning
- to be safe, remove any quotes on net/dsk filters in case included by
mistake in DaemonCommands string
- tightened up tests to see if daemonized collectl already running
- if no Time::HiRes, fudge the value of $microInterval based on -i [thanks Domi]
- new --procOpt k, removes known shells from process listing with -sZ,
currently set to /bin/sh, /usr/bin/perl, /usr/bin/python and python
- fixed varname in lexpr: $debug should have been $lexDebug
3.6.7-1 Mar 8, 2013
- set network speed for vnets to '??' so they'll use $DefNetSpeed for
bogus checks since the kernel hardcodes them to 10 which makes no sense
[thanks rick]
- code to print brief totals for -st wasn't included in a conditional
so you'd always get extra columns of output when -st was NOT included
- needed to initialize numaMem->{lock} for cases where user selects -sM
and no data collected [thanks laurence]
- added randomize [thanks robert] and align switches to graphite module
and align switch only to gexpr.ph since gexpr uses current times in messages
- added escape switch to graphite to allow one to change the dots in hostname
- change to suse startup script to look in /usr/sbin instead of /usr/bin
- added debug mask of 16 to lexpr to help test x= switch
- can now use commas OR colons with lexpr,x= though commas preferred and
colons may go away
- added disk qlen, wait, svctime and util to lexpr
- it was pointed out that in getExec() I'm initializing $oneline instead of $oneLine
[thanks joe]
- for debian init script, reverse logic for running start-stop-daemon with
-test so it will work with busybox too [thanks chris with help from troy]
- new switch: --cpuopts z (the only option) which suppresses lines of idle
activity from detailed stats
3.6.6-2 Dec 7, 2012
- when purging imported detail plot data, only do so if file had changed
- when playing back multiple files, do NOT try to process a new file that
has not yet seen the end of the current interval ($timestampCound==1)
- fix SuSE init.d script, [thanks tom]
3.6.6-1 Nov 25, 2012
- last version broke lexpr and it wasn't correctly handling intervals
other than 1
- do not set $dskChangeFlag to 4 when maj/min numbers change as it does
not mean the stats changed
- removed checks for major/minor disk numbers changing
3.6.5-2 Sept 27, 2012
- was not updating new major/minor numbers for a disk when they changed so
got stuck in a loop which kept disk maj/min changed every interval
- new -r option to purge older .log files, def=12 months
- fixed DaemonCommands to preserve order so you can override anything by
adding on the right side of it
- new 'align' switch added to lexpr so default is NOT to align to whole min
- for -sE do not convert negative temperatures [thanks kevin]
- add error handling to 'print' in logmsg
- vmstat needs to set $sameColsFlag to make header pagination work with -p
- new graphite switch f, use fqdn for host [thanks Bryant]
3.6.5-1 Sept 10, 2012
- when lexpr called with x= it needs to set summary data flag in case
nothing else is being reported, otherwise timestamps print after the
data instead of before
- lexpr typos: $tcpError, $udpError and $icmpError should not be singular
- timestamp wasn't being updated for -sD because it was specified in $dskdetFormat
- explicitly close logs before opening new ones in the hope that the occasionally
corrupted file problems with gunzip will go away
- tcp 'last' variables weren't correctly initialized and so was printing bad data
on first line of output
3.6.4-2 August 28, 2012
- modified lexpr, gexpr and graphite such that when i= is used, to align
sending on whole minute boundaries which is particularly useful with rrd
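One way to picture the whole-minute alignment described in 3.6.4-2 above is
the calculation below. It is a sketch in Python under the assumption that
alignment simply means waiting until the next multiple of 60 seconds; the
function name is hypothetical and collectl's own logic may differ.

```python
import time


def seconds_until_next_minute(now: float) -> float:
    """Seconds to wait from 'now' (epoch seconds) to the next whole minute.

    Returns 0 when 'now' is already exactly on a minute boundary.
    """
    return (60 - now % 60) % 60


# Usage with the current time.
delay = seconds_until_next_minute(time.time())
print(f"first send in {delay:.2f} seconds")
```

Sending on fixed boundaries keeps samples lined up with the fixed-step
buckets that tools such as rrd expect.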
3.6.4-1 June 25, 2012
- merged snmp and tcp stats under -st and changed export routines to
show summary error counts for -st. removed snmp.ph from kit.
summaries (based on --tcpfilt) as does brief format
- correctly deal with dynamic disks/networks
- instead of pulling names from header, get them from raw file when discovered
- simplify code that deals with changed disks, now that more cleanly handled
- replace runtime calls to 'die' with calls to syslog
- readS was still left in INSTALL! [thanks gavin]
- added system boot time to header
- new values for procopts s/S to show process start times
- graphite.ph now prints loadavgs to 2 decimal places [thanks brandon]
- extended lexpr,x= functionality to also call an init routine
- initFormat now returns entire header!
- if nothing returned from an import module on a printVerbose or printPlot call
for detail data do not call printText() since it will screw up colmux and
plot detail file with empty lines
- new --rawdskignore AND --rawnetignore because sometimes easier to specify
a pattern of things to ignore
- removed restriction for running as root to get network speeds via ethtool
by looking in /sys/devices now
- slight change to way the disk queue depth is being calculated to provide
better accuracy [thanks ken]
- new --dskopts f reports disk details with some fractional values
- always calculate disk details even when only doing -sd since a plugin
might want to get at them
- new graphite switch b, will cause output to be prefaced by a specified string
[thanks justin]
- slight change to s= functionality for lexpr, gexpr and graphite: no arguments will
disable all but imported data, allowing you to log -s data to files sending over socket
- need to give other routines (specifically --import) access to the lexpr
interval by declaring it with 'our'
- had to change the way lexpr/gexpr/graphite do min/max/avg since they were
using a positional index to track intermediate values when clearly a hash
is required for cases where not all intervals contain same elements
- -P and --plotflag had different effects on $headerRepeat because prior to calling getopts
I was peeking ahead for an ARG of -P and not including --plo [thanks devilized]
- gexpr module has wrong units for network packets and with 'g' modes had to multiply
kb counts by 1024 to convert to bytes, which is the units for these that ganglia uses
[thanks, trevor]
- clean up handling of missing ipmitool and root access [thanks trevor]
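The min/max/avg change in 3.6.4-1 above (a hash instead of a positional
index) can be sketched as follows. This is an illustrative Python version,
not the lexpr/gexpr/graphite code; the function name and the sample metric
names are hypothetical. Keying by name keeps each metric's running values
together even when some intervals lack some metrics.

```python
def summarize(intervals):
    """Track min/max/avg per metric name across a list of interval dicts."""
    stats = {}
    for sample in intervals:
        for name, value in sample.items():
            s = stats.get(name)
            if s is None:
                # First time this metric is seen: start its running values.
                stats[name] = {"min": value, "max": value, "sum": value, "count": 1}
            else:
                s["min"] = min(s["min"], value)
                s["max"] = max(s["max"], value)
                s["sum"] += value
                s["count"] += 1
    return {
        name: {"min": s["min"], "max": s["max"], "avg": s["sum"] / s["count"]}
        for name, s in stats.items()
    }


# The second interval has no 'net' value; a positional index would shift here.
intervals = [{"cpu": 10, "net": 4}, {"cpu": 30}, {"cpu": 20, "net": 8}]
result = summarize(intervals)
print(result)
```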
3.6.3-2 May 01, 2012
- finally remembered to remove readS from the kit [thanks joseba]
- when filtering a process by the full path with 'f', never include collectl itself
- documented utime in manpage
- if -i0 set $DefNetSpeed to 0 so we don't throw any 'bogus' network speed messages
- new switches, --rawdiskfilt and --rawnetfilt, allow one to filter disks/nets at time
of data collection so they never appear in raw file
- added call to IntervalEnd() (if it exists) for --import
- add option timeout to --address when connecting back to explicit address
- moved code that deals with fractional intervals and !HiRes closer to other interval processing
- added 'strict' to snmp module as well as 'help' option: snmp,h
- fixed problems with --import
- if --import is used to generate detail data with -f and -P not specified, collectl throws an error
trying to close the detail log which clearly hasn't been created
- when using interval other than the default AND -s-all, blank lines are printed for standard intervals
which don't have imported data. this applied to brief, verbose AND detail data
- added some more systems to envrules: Proliant SL230/SL250 Gen 8 and SE1170s
3.6.3-1 Mar 03, 2012
- fixed serious bug introduced a number of versions ago, which during playback of multiple files
and specifying date/time caused collectl to continue reading first timestamp in each file
and generating 'uninit variable' errors. not harmful, but inefficient and ugly!
- added exit codes of 0/1 to all the exit points
- moved help text for --stats from basic to extended
- found $file=~/rawp/ near line 1440 clearing $1, $2 and $3 and so $prefix, $fileDate and $fileTime
were not getting set correctly
- clarified 'No files processed' message to be a little more explicit
- broaden where collectl looks for lustre modules and also fixed a typo
of $lustops to $lustOpts [thanks brian]
- procAnalize incorrectly totaling fault totals instead of interval values [thanks andy]
- limit sizes of -procfilt for username/command to 19 and 15 respectively
- change order of ps command in loadPids() so they return max length fields for user/command
- remove () from command field from /proc/pid/stat in pidNew()
- optimize new pid processing with --procfilt
- add new pids to pidSkip{} as appropriate
- undef pidSkip{} whenever pids wrap
- added hello.ph and graphite.ph to INSTALL
- was incorrectly setting DiskFilterFlag to 1 all the time, even when not overridden in
collectl.conf. while not a bug, it does cause a slight increase in overhead
3.6.2-1 Feb 28, 2012
- changed behavior of --runas to no longer require a change to /etc/init.d/collectl
as it now uses /var/run to write collectl.pid into. this means to interact with a
non-root daemon, you still need to be root, which makes sense.
3.6.1-4 Feb 20, 2012
- removed --ssh switch, making detecting the parent going away the default behavior
- added switch --nohup which allows collectl to continue running if parent exits,
which is more consistent with how --nohup itself works
- in logmsg ONLY write to STDERR when attached to a terminal
- serious problem when using --tworaw and a flush interval < that for the process data
occurs because newer versions of zlib will fail if you try to flush to a file that
has not been updated. since I don't know which version of zlib this started happening
in and feel this is a relatively rare case, we're just rejecting this combination
regardless of zlib version. I do have an email out to the zlib author and if I ever
get to the bottom of this will be able to relax this restriction.
- use gettimeofday() for timestamps in logmsg()
- enhanced timing parameters when -i0 used. if specified, use 2nd/3rd parameters as ratios
to the first, making it possible to measure loads with ratios other than 1:6:30.
- discovered --import was missing from man pages and so added it
- when playing back a file, set $verboseFlag if user specified --verbose but NEVER clear it
- experimental import: snmp, see http://collectl.sourceforge.net/Snmp.html for details
- printf in record() blows up if formatting chars in command string! [thanks mike]
- added accumulated time as a --top sort option
- changed formatting of accumulated time in process output to simply be hh:mm:ss or
mm:ss.ss when less than an hour to be more in line with top
- new switches, --stats and --sumstats report stats in brief mode, the latter only summary
data
- during playback need to check $numProcessed before reporting none were processed
- stats reporting logic wasn't processing 1st file, checking for $numProcessed>1
- removed -oA and replaced/extended functionality with --stats/--statopts
- wasn't allowing --procopts playing back process data unless -sZ which was silly
- subtle problem found: illegal 'last' in pidNew() because file disappeared between initial -e
and trying to open it a few usecs later! can't exit a sub via last so changed to return(0)
- our friends at OFED slightly changed the output of perfquery again [thanks frederic]
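Note on the pidNew() fix in this release: a /proc entry vanishing between the existence test and the open is a check-then-use race. The following is a minimal Python sketch of the general pattern, not collectl's Perl code. The name read_stat and its None return convention are assumptions made for illustration only.

```python
def read_stat(path):
    """Return the first line of `path`, or None if the file is gone.

    Instead of testing for existence and then opening (which leaves a
    window in which the process can exit), just attempt the open and
    treat failure as "this pid no longer exists".
    """
    try:
        with open(path) as f:
            return f.readline().rstrip("\n")
    except (FileNotFoundError, ProcessLookupError):
        # The file disappeared between discovery and read; the caller
        # is expected to skip this pid (hypothetical convention).
        return None
```

A caller would treat None the same way the release note treats return(0): skip the pid and move on.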
3.6.1-3 Jan 13, 2012
- added 'Reason: $!' to socket open failures
- was not reporting interrupts in playback mode correctly
- added $memAnon to lexpr
- need to initialize $thisConfig when --lustopts set [thanks joe]
- do not allow -f with gexpr and not one or both of -P/--rawtoo [thanks again, joe]
- modify misc.ph to honor --showcolheader
- modify lexpr, sexpr, gexpr to reject --showcolheader
- if --showcolheader and --export (only works with vmstat for now), exit after first
print call
- remove restriction of not letting someone use --home with proc/slab data since they
may want to apply filters and therefore not need more than a terminal full
- new switch, --comment, allows a user to add a comment to the header
- only read /proc/slabinfo IF slab monitoring requested AND if slab monitoring requested
make sure /proc/slabinfo is readable (some admins only allow root access)
- added code for slow proc read speed test on all systems with >= 32 CPUs except for RHEL6.2
and SLES 11 SP1
- if /sys/devices/system/node doesn't exist, set CpuNodes to 1 and disable -sM if set
- fixed a lot of typos in a lot of docs
- only set a socket failure handler when a socket is explicitly being opened
- added 'h' option to gexpr, lexpr and sexpr
- changed the way vmstat.ph decides to print its header
- added new process option: x, which adds extended data to standard display
- added Mlocked to verbose memory output as well as numa stats [for fred]
- changed root name for cpu detail data in gexpr from cputotals to cpuinfo [thanks evan]
- new export: graphite
- normalization for CPU load reports jiffies instead of a percentage [thanks guy]
- removed restriction against using -D as non-root user
- as per https://bugzilla.redhat.com/show_bug.cgi?id=716825, non-root access to /proc/pid/io
is now considered a security hole and so may not have read access! therefore we need to
check to see if the io structure is readable before trying. if it isn't, zeros will be
reported for non-readable structures
- new procopts option I, disables collection of IOSTATS and reading of /proc/pid/io,
a performance optimization at the expense of less process information
- new switch, --runas will cause collectl to run as a non-root daemon. this WILL require
changes to the init.d script to work! be sure and read the man page
- changed location where $doneFlag was getting cleared because stopping the daemon before
initialization was completed was causing the flag to be reset to 0 and not left at 1
- change sort limit for process counters from 6-9s to 9-9s [thanks stig]
- added SUSE SP info to header
- added debian and ubuntu release/distro info to header
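Note on the /proc/pid/io change in this release: the behavior described (report zeros when the io structure is not readable) can be sketched in Python as below. This is an illustration, not collectl's implementation. The function name read_proc_io and the proc_root parameter are hypothetical, and the sketch attempts the open and handles the failure rather than testing readability first.

```python
import os

# Counter names as they appear in /proc/<pid>/io on Linux.
IO_FIELDS = ("rchar", "wchar", "syscr", "syscw",
             "read_bytes", "write_bytes", "cancelled_write_bytes")

def read_proc_io(pid, proc_root="/proc"):
    """Return the io counters for `pid`, or all zeros if unreadable."""
    counters = dict.fromkeys(IO_FIELDS, 0)
    path = os.path.join(proc_root, str(pid), "io")
    try:
        with open(path) as f:
            for line in f:
                name, _, value = line.partition(":")
                if name in counters:
                    counters[name] = int(value)
    except OSError:
        # Permission denied, or the process exited: keep the zeros.
        pass
    return counters
```

Returning a full set of zeroed counters keeps downstream formatting code unchanged whether or not the file could be read.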
3.6.0-3 Oct 17, 2011
- added dirty memory to lexpr
3.6.0-2
- support for numa
- split anon pages into separate field in verbose mode as well as plot format
- changed the memory header for -sm to SUMMARY rather than STATISTICS as the
latter is currently used to indicate detail data, something that didn't exist
for memory prior to numa support
- added --xopts i to be consistent with --dskopts and --netopts. did NOT add
such a switch for lustre
- expanded error checking with perfquery to catch 'Failed to open' errors
during initialization
- discovered and removed reading of /proc/stat during -sm, which was there to support
2.4 kernel fields that have since been moved
- changed collectl-debian start script to use /bin/sh instead of bash
- removed ".B collectl" at start of collectl man page for debian/lintian compliance
- made width of number of dentries in -si --verbose 7 instead of 6 digits wide
3.6.0-1
- do NOT call derived() when playing back rawp files or you'll get an uninit var
warning for $memUsedLast.
- need to include non-numeric type interrupt counts in interrupt totals
- fixed a few problems with environmental data and interpretation of --envopts
- was not allowed to use with -P and only 'M' should have been restricted
- was only honoring C/F when temp name started with Temp rather than anywhere in string
- was not correctly overriding default ipmi devices with user-defined options
- fixed formatting/calculations for interactive memory subtotals generations when
RETURN is typed in conjunction with --memopts R in brief mode
- added new section to FAQ called 'gottchas' as a place to describe the perils of
round-off error and normalization
- when printing verbose data in import modules, need to clear $$lineref or the last
line that mainline collectl reports (if any) will be repeated. this was fixed
in hello.ph and atigpu.ph
- new switch: --dskopts z, which when specified filters out disk details lines of all 0s
- added switch examples to start scripts for clarification of use
- added support for 'vd' disks [thanks gavin]
- since kernel 2.6 is compatible with 3.0 and 2.4 is sooo old, 2.4 support officially dropped!
[thanks for the push, tony]
- dropped support for collectl data generated by versions of collectl older than 2.0
- need to set $cpusEnabled to 0 when playing back interrupts in plot format w/o -sC, since the
code that normally does that has already been executed and 'C' not yet added to $subsys. subtle...
- filled in some missing ; in nvidia.ph in PrintPlot routine
- fixed problem writing plot files with --import
- added 'i' to both dskopts and netopts which will cause i/o sizes to be displayed in
brief mode like --iosize except in this case independent of each other
- do not include virtual networks in network summary [thanks hank]
- in newLog() need to use gettimeofday for current time when hires::time is used otherwise you'll
occasionally get a time 1 second earlier and new files names are wrong! [thanks hank]
- exclude vlan from network totals to avoid duplicate counts [thanks andrey]
- added 2 new fields to verbose cpu Summary Stats - Run Total and Blocked Total
- added VmSwap to process/memory display
3.5.1-1 May 23, 2011
- change expression used to find CPU count in /sys since -P isn't necessarily
built into all greps
- instead of only getting the platform name when -sE, always try to get it
- forgot to include 'T' as valid --envopts
- check for failure of 'ipmitool sdr dump' command
- need to ignore interval checks with --showcolhead and -sE
- fix bug in checkSubSys() because while it could find newer subsys it couldn't find
dropped ones
- needed to clear nethostflag outside conditional that looks at prefix changed, which
was incorrectly preventing consecutive files on the same day from being identified
- added new routine pushmsg() that allowed one to stack up messages generated BEFORE
'beginning execution' message and then play them afterward, making log easier to read
- changed several calls from logmsg() to pushmsg()
- added support for files that cross midnight and ability to play them back in full
see updated Playback.html
- remove duplicate message in sexpr
- have found an instance where the number of networks in the header didn't match the ones
listed (some were dropped!) and so added a check to take care of this
- renamed $active, $inactive and $dirty to $memAct, $memInact and $memDirty for better
consistency with other memory variable names. Didn't bother with older V2.4 mem variables
- new switch --memopts R: display memory info as changes/interval, similar to sar's -R switch
- logic to clear '$sameColsFlag' in verbose mode and --import was wrong
- --showcolheaders and -sE requires root
- added support for nvidia driver V270.41.19 which has different output format. highly
probable other versions will behave differently as well
3.5.0-3 Feb 12, 2011
- expanded interrupt details to include non-numeric interrupts
- new import module added for GPUs: nvidia.ph
- added getExec type 0 to support new import
- updated version of gexpr, with new switches to control using default ganglia
variable names
- bug fix: wasn't sending E and F type messages to syslog
- wasn't initializing enough 'last' vars for latest nfs V4
- only allow -sT with -P or -f
- added new switch --tworaw as a synonym for --group which makes more sense
- if an imported module returned -1 in its init routine, disable it. return
1 for success
- new --procopt: R causes real-time priorities to be displayed rather than RT,
at the cost of 2 extra columns in the display [thanks lee]
- added optional callback GetHeader to --import API, if not defined not called
- change error handling when playing back files with no selected subsystems to be
non-fatal, skip the file and continue processing
- added dl585-g7 to envrules.txt
- allow -s-all to remove ALL L subsystems when you only wanted --import data played
back. I actually forgot to add this to release notes until V3.5.1
3.5.0-2 Jan 09, 2011
- turned utime into a mask, so we can control the granularity of micro-logging
to include /proc time with/without process accesses
3.5.0-1 Jan 09, 2011
- renamed --showplotheaders to --showcolheader since it now applies to ALL headers
for single header line output (will only show cpu for -scd --verbose)
- fixed ALL verbose and detail output formats to include date/time headers
- newer kernels added additional files to /sys/devices/system/cpu/ which messed
up the way total CPUs were being calculated
- added 2 new variables to lexpr: cputotals.num and cputotals.total [thanks chris]
- removed unused switch --pidfile from collectl -x
- file processing push/pop code wasn't handling data change correctly
- added new flag to show host changed since THAT was what was needed in 'consecutive'
file identification processing
- found problem with playing back multiple files with --thru for different hosts!
needed to 'undef $newSeconds[$rawPFlag]' whenever hostname changed
- new netopts values
e - show errors in brief mode and explicit types everywhere else
E - only print lines that have non-zero network errors in them
- new diagnostic switch --utime, causes periodic micro-timestamps to be written into
raw file at different points in time for finer grained measurements of operation times
3.4.4-3 Dec 9, 2010
- if -s during playback, at least ONE requested subsys must be in recorded file.
if c recorded, C would cause error message because pattern match didn't have 'i'
- add requirement for STDOUT to be connected to a terminal as a condition to call resize
- change to collectl.conf - roll logs at exactly midnight, not 1 minute past
- new --envopts value of T to truncate values to integers
- ignore 'Fan Redundant' in env data for dl160g6
- if ipmi data field is blank, ignore it
- fixed filtering of ipmi data AND renamed 'c' option to 'p', for power
- include THRD in -P format for processes
- only turn off echo when in brief mode AND not playing back a file
- if data collected w/o HIRES and display requests msec, set default to '000' instead of 0
- discovered only --ssh in help so removed -S
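Note on the -s playback check in this release: the bug was a pattern match missing the 'i' (case-insensitive) modifier, so a request for C was rejected even though c had been recorded. A small Python sketch of a case-insensitive check follows. The function name any_recorded and the idea of passing subsystems as plain strings are assumptions for illustration, not collectl's code.

```python
import re

def any_recorded(requested, recorded):
    """True if at least one requested subsystem letter was recorded,
    ignoring case (so 'C' matches a recorded 'c')."""
    if not requested:
        return False
    # Build a character class from the requested letters and search the
    # recorded string with the equivalent of Perl's /i modifier.
    pattern = "[" + re.escape(requested) + "]"
    return re.search(pattern, recorded, re.IGNORECASE) is not None
```

With this check, any_recorded("C", "cdn") is true while any_recorded("Z", "cdn") is false.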
3.4.4-2 Nov 10, 2010
- base36() needs to do an int() on values <10 so their fraction not
included in output string
- reduced printing of headers for -sf --verbose to one call to printText()
per line. otherwise one hostname prepended to each line of socket call.
- fixed a problem with --procfilt C: it was trying to match whole process
name rather than just the beginning of it [thanks gary]
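Note on the base36() fix in this release: these notes do not say what collectl's base36() encodes, so the following is only a generic Python sketch of why truncating to an integer first matters. The digit alphabet and the lowercase output are assumptions.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def base36(value):
    """Encode a non-negative number in base 36, dropping any fraction."""
    value = int(value)  # truncate first so 7.9 encodes as "7", not "7.9"
    if value < 0:
        raise ValueError("negative values are not supported in this sketch")
    if value == 0:
        return "0"
    out = []
    while value:
        value, rem = divmod(value, 36)
        out.append(DIGITS[rem])
    return "".join(reversed(out))
```

Without the int() step, a value below 10 that is passed through unchanged would carry its fractional part into the output string.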
3.4.4-1 Nov 09, 2010
- vmstat not handling date/time correctly, needed $dateTime[0]
- need to call export module's init routine in playback mode
- lustre 1.8.4 module location moved, check expanded [thanks Frederik]
- new top sort options, pid and cpu, which don't make a lot of sense
unless used with filters
- do NOT include hostname in RECORD printing routine with -A
- CPU verbose output should not right shift 1st header line with -oT
- removed printing of extra '$line' at end of NFS DETAIL header
- incorrectly setting recSubsys to [YZ] if user specifies --top even
if -s specified too! They should be merged [thanks mats]
- don't write to a socket if shutting down in which case $doneFlag set
- don't report socket errors if not in server mode
- added 'ProLiant DL160se G6' to envrules.std
- disableSubsys should ONLY remove subsystems from export option 's='
was also clearing KFlag rather than LFlag [thanks chris]
- new process sort option 'thread', sorts by thread count
- changed start/stop in initd scripts from "$network +openibd" to "$all"
so collectl will start after everything else
3.4.3-3 August 19, 2010
- added --netfilt
- very rare: if playing back CPU data but none collected, be sure to
set $cpusEnabled to number of CPUs or else you'll get warning that
one or more disabled
- pattern match wrong for 'emcpower' disks [thanks lewis]
- changed disk details to use 'cvt()' for reporting number of I/Os since DM numbers
can be more than 4 digits
- change --umask behavior. default is to do nothing unless explicitly set
AND user is 'root'
- 2 new process sort fields: pid and cpu
3.4.3-2 August 16, 2010
- only look at $cpuDisabledFlag when processing CPU data
- perfquery in OFED 1.5 can report warnings in its output stream which need to be ignored
- if you try to playback a file and specify -s with no existing subsystems you'll
get an error
3.4.3-1 August 02, 2010
- perfquery checks problems
- version finding code not working correctly for ofed 1.5
- disabling -sl by mistake when perfquery not found
- when errors detected during initialization not skipping subsequent checks
3.4.2-5 July 21, 2010
- changed INSTALL to only execute commands like chkconfig OR update-rc
when $DESTDIR is / [thanks mike]
3.4.2-4 July 09, 2010
- added --dskfilt
- added check for client-side OST uuid status 'DEACTIVATED', which seems to
have showed up somewhere in the 1.6 timeframe but not sure when, thanks Heiko
3.4.2-3 June 25, 2010
- new memory field 'SUnreclaim' ONLY available in plot format and lexpr,
just not enough room in terminal based output [thanks seb/fred]
- misc now considers uptime, mhz and mounts as 'lightweight' counters and will
sample every standard interval. Only logins, which is heavy-weight, will be
sampled based on "i=" or the default of 60 seconds. Further, all lightweight
samples will be returned every interval by lexpr whereas the heavy-weight ones
will only be returned when sampled. In order to keep sexpr/gexpr formats constant
(primarily because I don't know the effect of not doing so), they will report
all counters every interval.
- support for CPUs dynamically changing stats and going off/on-line
- NOTE -- can't detect this during interrupt processing unless also
monitoring CPU data, which people typically do anyways
3.4.2-2 June 15, 2010
- not correctly handling discovery of new disks during playback
- new feature: select process by UID range [thanks mark]
- fixed bug in --procfilt u/U processing while testing
- added systot and usertot to lexpr to report totals for all system and user
counters
- changed error message processing when trying to play back a file for process
or slab data, etc. when there isn't any. Rather than only show the message
when -m, which could result in only a 'no files processed' message, they will
be unconditionally displayed as they should
3.4.2-1 May 21, 2010
- change default umask to 133 so that colplot can read files since webserver
doesn't have privs
- now that raw files are always compressed, the message about disabling it
with -oz when no compression no longer makes sense so the message has been
clarified to use --quiet with raw files and -oz with plot files
- added README-WINDOWS to src tarball
- cleaned up code that still expected [com] in $lustOpts instead of $lustreSvcs
- more cleanup and bug fixes to INSTALL for debian support. thanks bernd
- change to /bin/sh
- do not use ANY explicit paths
- minor changes to man pages, also for debian restrictions
- wasn't reading NfsFilter correctly from header on playback
- save perfquery version and use it to drive the skipping of 'field 13' rather than
OFED versions which isn't always available
- do not issue 'stty' if !PC, running on terminal and !background. missed a couple...
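Note on the umask change in this release: the effect of a umask on file permissions is simple bit arithmetic, sketched below in Python. The helper name mode_after_umask is hypothetical, and the requested mode of 0666 is an assumption (it is the usual default when a program creates a regular file), not something stated in these notes.

```python
def mode_after_umask(umask, requested=0o666):
    """Permission bits a new file ends up with: requested & ~umask."""
    return requested & ~umask

# umask 133 clears execute for the owner and write/execute for
# group and other, leaving rw-r--r--.
print(oct(mode_after_umask(0o133)))  # 0o644
# umask 0137 additionally clears all access for other, leaving rw-r-----.
print(oct(mode_after_umask(0o137)))  # 0o640
```

Under that assumption, 133 leaves files world-readable, which is what a web server process without extra privileges needs in order to read them.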
3.4.1-5 Mar 30, 2010
- new env options F/T converts temps to C or F
3.4.1-4 Mar 29, 2010
- new switch --whatsnew prints a summary of changes, a mini-release notes
3.4.1-3 Mar 23, 2010
- added Fusion-IO card to list of valid disks: fio
- gexpr, lexpr and misc weren't honoring internal interval counter.
- if a secondary/tertiary interval specified gexpr/lexpr didn't process
it correctly
- new switch: --envfilt allows you to specify filters
- if you specify a " in DaemonCommands it gets passed along in the variable itself
(not a problem for ') so we have to remove them
- added new section 'Filters' to header. Added EnvFilt and moved NfsFilt to it
- added new switch --envremap, which allows for renaming one or more output field names
- added new feature switch to lexpr. if x=file is specified, that file will be loaded
via require and a corresponding function name called after every print cycle, allowing
one to do modified, custom output
- new switch, --umask to control output file protections, see man umask. default is 0137
- new environmental option - if you include a device number with --envopts use THAT as a
device number with -d when running ipmitool. for some systems the default device is
the slower one and this will have an impact on how fast ipmitool will run, possibly
slowing down collectl
- added 'use 5.008000', which should have probably been there years ago
3.4.1-2 Mar 16, 2010
- do not allow -oA in verbose mode
- consolidated all code to disable -s subsystems when there is a conflict into
disableSubsys which ALSO disables them in s= clause of --export
- removed code to disable s= in all the ph export modules since now redundant
- support for DESTDIR env variable in INSTALL/UNINSTALL [thanks Bernd]
- Voltaire changes output of ofed_info so we have to process IB version
slightly differently
- change lustre message about needing -L to --lustsvc
- changes to lexpr to include processes in run queue and to change prefix
for proc creates/runs to 'proc'
- changes to misc.ph to ALWAYS report latest values in --export as well if 'a'
parameter, noting the default is to only report them when sampled. collection
still defaults to 1 minute, overridable via 'i='.
- since loading formatit.ph moved in a recent release, any calls to error()
before it's loaded fail since it needs a routine internal to formatit. so now
only call printText() from error() if formatit loaded.
3.4.1-1 Feb 22, 2010
- when printing plot data to files, wasn't putting headers on subsequent days' files
3.4.1-0 Jan 10, 2010
- make sure all major release settings in RELEASE-collectl have dates
- remove blank line in all collectl start scripts right before 'END INIT INFO'
since debian doesn't like it and we should be consistent
3.4.0-4 Jan 04, 2010
- updated envrules to include additional parsing rules for dl185 [thanks evan]
- changed envrules header for dl585 G1 to G5
- if running an ofed >= 1.5, ignore 'CounterSelect2' field, which is right in the middle
- send errors in getExec() to /dev/null because perfquery for > ofed 1.4 is braindead
- was incorrectly using 256 to print IB debugging info instead of 2
3.4.0-3 Dec 14, 2009
- was not clearing right variable for CPU Detail Totals in sexpr.ph
- fixed typo on QLogic HCA name from qlib to qib
3.4.0-2 Dec 13, 2009
- fixed typo of HugePages from HughPages [thanks Frederic]
- fixed typo of 'openib' in start script LSB headers to 'openibd'
- clarified help and man page for --all to indicate ONLY summary data will
be reported, meaning NO process or detail data either
3.4.0-1
- restructure installation directories to be more standard
- pid was not properly set for suse flush command
3.3.7-1 Nov 26, 2009
- added support for psv [polyserve] disks
- added support for QLogic IB HCA
- changes to INSTALL/UNINSTALL to handle gentoo and to restructure 'generic'
distro processing for more flexibility in the future
- 3 'standard' tools turned out not to be standard on gentoo and so:
- limit checking for ethtool to writing to log file OR --showhead
- if can't find lspci during -sx processing (and -sx IS a daemon default),
disable -sx rather than throw a hard error.
- only use dmidecode if -sE and if not found, set product name to 'Unknown'
- creating /var/log/collectl in INSTALL so when installed this way the
daemon writes logs into that directory instead of /var/log. this now
matches what an RPM install does
- if required include files can't be found in same directory as collectl, look
in ReqDir which is initially set to /usr/share/collectl. This can be
changed in collectl.conf
- when exiting due to a fatal error, be sure to exit(1) and not just exit.
- some process I/O counters found to be missing on CentOS 4.8 and so had to
initialize to 0 in case not found
- wasn't catching 'ioall' as invalid --top option
3.3.6-2 Sep 16, 2009
- if printing interrupts in brief mode, Cpu headers have to be changed as the number
of cpus increase to 2 or 3 digits. [thanks Aron]
3.3.6-1 Aug 19, 2009
- changed error message about missing ethtool or lspci to just ethtool since
missing lspci was already caught and reported
- change location of collectl to /usr/bin in collectl-debian
- make -P honor --hr which it currently does not [thanks giles]
3.3.5-4 Jul 20, 2009
- performance optimizations in dataAnalyze()
- check process/slabs first whenever type is proc/slab. then in a separate clause
look at subsys, thereby preventing parsing of type in other checks
- always include test of subsys and do it first. found to be completely missing
in lustre tests
3.3.5-3 Jul 17, 2009
- expanded meaning of -G to include slabs in 'rawp' files and to add 'g' to the Flags
in the header, which also uncovered a number of bugs in the way batches of files for
different hosts/dates were selected/handled even before slabs were added
- drop support for -sy in brief mode since it really doesn't make much sense and if you
do specify -sy it now forces verbose mode. see Slab documentation for more on playing
back files generated with -G
- if can't find an ofed utility AND rpm isn't on system, don't use it [thanks seb]
- fixed some problems with -oA processing
- removed a couple of error checks for switches that don't apply to a particular option
since they are silently ignored already, making it easier to recall a command and add
switches rather than having to remove those that don't apply
- flush STDIN at startup in case someone typed extra CRs
- added col2tlviz to kit
- changes to --export processing broke --vmstat so moved call to setFlags() from right
before playback code (which sets them itself) to right after call to $expName init routine
- changed start scripts so that you can specify "start/restart {[extension] switches]"
making easier to use/document. the old syntax which put the switches 1st meant you had to
use "" if you didn't want to change them AND it didn't work with redhat's 'service' command
3.3.5-2 June 30, 2009
- added client.pl to examples/ and moved readS to /examples
- added new switch --procstate, which allows you to limit process displays to
only show those processes in one or more explicit states
- incorrectly looking for 'LustreVersion' in header instead of 'CfsVersion'
- when dropped SubOpts from header it broke pattern matching for subsys in
header during playback
- only calculate disk detail stats using CPU time when hires not available
- when reporting a lustre server that is both an MDS and OST in brief mode,
the 2nd line column headers are reversed for the types of server
- removed obsolete switches (and warnings) -b, -e, -oP, -Y, -Z, -O, --subopts and -sLL
- changed buddyinfo headers in verbose, plot and detail files being sure to include
name/zone after : in details [thanks bayard]
- use mergeSubsys() everywhere $userSubsys is used to reset value of $subsys
- changed some instances of local variable $file to begin sorting out of local variables
with the same name as the global one
- if newlog starts and NOT an interval 2 interval, we don't record correct slab data so
only clear $newRawSlabFlag (also renamed for clarification) during interval 2