interesting benefits of solaris

Well, this is slightly surprising, but in a very good way, and it does lead to some interesting suggestions on how best to improve matters. Look at the following graph of FAST ESP query latency:

Notice that the average latency drops as we use the server more . . . but WHY?
Well, that’s just because we’re running the FAST indexes on a ZFS-based file system and the ARC (ZFS’s in-memory cache) is making its presence felt:


# arcstat.pl
    Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz  cur
11:25:52   13G  263M      1  158M    1  104M   15   44M   11     2G   2G
11:25:53   29K   103      0    97    0     6    2     2   15     2G   2G
11:25:54   10K   161      1   156    1     5   13     1    9     2G   2G
11:25:55   10K   197      1   174    1    23   18     3   50     2G   2G

Of course, I’d really like to try a few enterprise-grade SSDs as an L2ARC to supplement the in-memory ARC. That should let us serve most of the “hot” data from SSD without going back to the spinning rust (admittedly the full index data set is only 80GB).

*patiently waits for Sun to get their fingers out*

Update
Using Ben Rockwood’s arc_summary.pl tool we get the following view into the ARC cache:

ARC Size:
         Current Size:             4659 MB (arcsize)
         Target Size (Adaptive):   4659 MB (c)
         Min Size (Hard Limit):    1023 MB (zfs_arc_min)
         Max Size (Hard Limit):    31735 MB (zfs_arc_max)

ARC Size Breakdown:
         Most Recently Used Cache Size:          29%    1393 MB (p)
         Most Frequently Used Cache Size:        70%    3266 MB (c-p)

ARC Efficency:
         Cache Access Total:             1464939081
         Cache Hit Ratio:      91%       1342983472     [Defined State for buffer]
         Cache Miss Ratio:      8%       121955609      [Undefined State for Buffer]
         REAL Hit Ratio:       78%       1146170142     [MRU/MFU Hits Only]

         Data Demand   Efficiency:    91%
         Data Prefetch Efficiency:    86%

        CACHE HITS BY CACHE LIST:
          Anon:                       10%        142482495              [ New Customer, First Cache Hit ]
          Most Recently Used:          5%        74249410 (mru)         [ Return Customer ]
          Most Frequently Used:       79%        1071920732 (mfu)       [ Frequent Customer ]
          Most Recently Used Ghost:    1%        19996413 (mru_ghost)   [ Return Customer Evicted, Now Back ]
          Most Frequently Used Ghost:  2%        34334422 (mfu_ghost)   [ Frequent Customer Evicted, Now Back ]
        CACHE HITS BY DATA TYPE:
          Demand Data:                53%        712758575
          Prefetch Data:              17%        241164086
          Demand Metadata:            20%        280805976
          Prefetch Metadata:           8%        108254835
        CACHE MISSES BY DATA TYPE:
          Demand Data:                52%        64233340
          Prefetch Data:              30%        36667246
          Demand Metadata:            15%        19272211
          Prefetch Metadata:           1%        1782812
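
As a quick sanity check on how those headline ratios relate to the raw counters, here is a minimal sketch that recomputes them from the figures in the output above. The truncation to whole percentages is an assumption inferred from the numbers (91.67% is shown as 91%), not something taken from the tool’s source, and `pct` is a helper invented for this example.

```python
# Counters copied from the arc_summary.pl output above.
hits = 1342983472
misses = 121955609
mru_hits = 74249410
mfu_hits = 1071920732


def pct(part, whole):
    """Whole-number percentage, truncated (assumed to match the tool's display)."""
    return 100 * part // whole


total = hits + misses                              # "Cache Access Total"
hit_ratio = pct(hits, total)                       # "Cache Hit Ratio"
miss_ratio = pct(misses, total)                    # "Cache Miss Ratio"
real_hit_ratio = pct(mru_hits + mfu_hits, total)   # "REAL Hit Ratio" (MRU/MFU hits only)

print(total, hit_ratio, miss_ratio, real_hit_ratio)
```

Note that the “REAL” ratio only counts hits on the MRU and MFU lists, which is why it sits well below the overall hit ratio.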

playing with tcptrace and xplot



# tcpdump -ni en0 port 80 -w output.trace
# tcptrace -G output.trace
# xplot *tput.xpl

From the online manpage, the colours in the throughput graph are:

  • Yellow: instantaneous throughput of each packet
  • Red: Throughput for the last few packets
  • Blue: Throughput since the start of the stream/connection
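
To make the three colours concrete, here is a rough sketch of how such series could be derived from a list of packet timestamps and sizes. It is not tcptrace’s actual code: the function name, the input format and the window of 10 samples are all assumptions made for the illustration.

```python
from collections import deque


def throughput_series(packets, window=10):
    """Compute throughput curves from (timestamp_seconds, payload_bytes) pairs.

    Returns a list of (time, instantaneous, recent_average, overall_average),
    all in bytes per second. The first packet only sets the start time.
    """
    series = []
    recent = deque(maxlen=window)      # last few instantaneous samples ("red")
    start = prev = packets[0][0]
    total_bytes = 0
    for t, size in packets[1:]:
        total_bytes += size            # always counted towards the overall average
        gap = t - prev
        if gap <= 0:
            continue                   # same timestamp: no instantaneous sample
        instantaneous = size / gap     # "yellow"
        recent.append(instantaneous)
        recent_avg = sum(recent) / len(recent)       # "red"
        overall_avg = total_bytes / (t - start)      # "blue"
        series.append((t, instantaneous, recent_avg, overall_avg))
        prev = t
    return series


sample = [(0.0, 0), (1.0, 1000), (2.0, 3000)]
for row in throughput_series(sample):
    print(row)
```

The point of the three curves is that the instantaneous one is noisy, the windowed one shows recent behaviour, and the overall one shows what the connection achieved since it started.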

Other useful graphs:

  • _owin.xpl – outstanding data/congestion
  • _rtt.xpl – round trip time/time
  • _ssize.xpl – segment size/time
  • _tput.xpl – throughput/time
  • _tsg.xpl – time sequence graph
  • _tline.xpl – Timeline graph – W Richard Stevens style

Just some notes here so I don’t forget the basics; the manual is over here.

argh, SNMP can really chafe!

Using version 1:

% snmpget -c COMMSTRING -M /usr/local/share/snmp/mibs -v 1 \
    -m USAGE-MIB:PROXY-MIB:REDLINE-STATS-MIB:REDLINE-STATS-MIB:REDLINE-CONFIG-MIB \
    hostname REDLINE-STATS-MIB::sessActive.0
Error in packet
Reason: (noSuchName) There is no such variable name in this MIB.
Failed object: REDLINE-STATS-MIB::sessActive.0

Using version 2 (2c):

% snmpget -c COMMSTRING -M /usr/local/share/snmp/mibs -v 2c \
    -m USAGE-MIB:PROXY-MIB:REDLINE-STATS-MIB:REDLINE-STATS-MIB:REDLINE-CONFIG-MIB \
    hostname REDLINE-STATS-MIB::sessActive.0
REDLINE-STATS-MIB::sessActive.0 = Counter64: 12247

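Slightly annoying, that, but it makes a certain sense: sessActive is a Counter64, and SNMPv1 has no 64-bit counter type, so the agent can only hand it back over v2c.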
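
If a poller has to cope with objects like this, one option is to try v2c first and only fall back to v1. The sketch below wraps snmpget with just the `-v` and `-c` flags used above; the function names are made up for the example, and it assumes snmpget exits non-zero (or prints noSuchName) when the request fails, which is worth confirming against your net-snmp build.

```python
import subprocess


def snmpget_argv(version, community, host, oid, extra=()):
    """Build an snmpget command line; `extra` can carry -M/-m options."""
    return ["snmpget", "-v", version, "-c", community, *extra, host, oid]


def get_first_answer(community, host, oid, versions=("2c", "1"),
                     run=subprocess.run):
    """Try each SNMP version in turn.

    Returns (version, output) for the first success, or (None, last_error).
    `run` is injectable so the logic can be exercised without a real agent.
    """
    last_error = ""
    for version in versions:
        result = run(snmpget_argv(version, community, host, oid),
                     capture_output=True, text=True)
        output = (result.stdout + result.stderr).strip()
        # Assumption: a failed GET shows up as a non-zero exit or noSuchName.
        if result.returncode == 0 and "noSuchName" not in output:
            return version, result.stdout.strip()
        last_error = output
    return None, last_error
```

Calling `get_first_answer("COMMSTRING", "hostname", "REDLINE-STATS-MIB::sessActive.0")` would then shell out to snmpget for real; the MIB options from the commands above can be passed through `snmpget_argv`’s `extra` argument if needed.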

solaris zone utilisation via SNMP

It’s been a bugbear of mine for a long time that the CPU metrics you get when querying a Solaris 10 host are global and not zone specific (which of course makes sense, it just makes it harder to track zone utilisation).

So I finally wrote a basic Perl script that provides that information via an SNMP MIB. The output looks like the following:


> snmpwalk -v 1 -c public localhost .1.3.6.1.4.1.2021.255.7
UCD-SNMP-MIB::ucdavis.255.7.0 = STRING: "Zone name"
UCD-SNMP-MIB::ucdavis.255.7.1 = STRING: "global"
UCD-SNMP-MIB::ucdavis.255.7.2 = STRING: "gallery"
UCD-SNMP-MIB::ucdavis.255.7.3 = STRING: "nakos"
UCD-SNMP-MIB::ucdavis.255.7.4 = STRING: "mcdougallfamily"
UCD-SNMP-MIB::ucdavis.255.7.5 = STRING: "shared"
UCD-SNMP-MIB::ucdavis.255.7.6 = STRING: "packer"
UCD-SNMP-MIB::ucdavis.255.7.7 = STRING: "si"

The script is available here.

Current bugs/issues
  • snmpwalk will not step through all the sub-trees
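
For anyone wanting to build something similar, here is a small sketch of the GET/GETNEXT handling such a script needs. It assumes the script is hooked into net-snmp with a `pass` directive in snmpd.conf (the post does not say how the Perl script is wired in), where the agent runs the program with `-g OID` or `-n OID` and expects three lines back: OID, type, value. The zone list is hard-coded and hypothetical; a real script would gather it from the host.

```python
import sys

BASE = ".1.3.6.1.4.1.2021.255.7"
# Hypothetical static data standing in for whatever the host reports.
ZONES = ["global", "gallery", "nakos"]


def build_table(zones):
    """Map OID -> (type, value), mirroring the walk output above."""
    table = {f"{BASE}.0": ("string", "Zone name")}
    for index, name in enumerate(zones, start=1):
        table[f"{BASE}.{index}"] = ("string", name)
    return table


def oid_key(oid):
    """Turn '.1.3.6' into (1, 3, 6) so OIDs sort numerically, not as text."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def lookup(table, flag, oid):
    """Answer -g (GET) or -n (GETNEXT); None means nothing to return."""
    wanted = oid_key(oid)
    if flag == "-g":
        matches = [o for o in table if oid_key(o) == wanted]
    elif flag == "-n":
        # GETNEXT must return the first OID strictly after the one asked for.
        matches = [o for o in table if oid_key(o) > wanted]
    else:
        matches = []
    if not matches:
        return None
    target = min(matches, key=oid_key)
    kind, value = table[target]
    return [target, kind, value]


if __name__ == "__main__" and len(sys.argv) == 3:
    reply = lookup(build_table(ZONES), sys.argv[1], sys.argv[2])
    if reply:
        print("\n".join(reply))
```

Getting GETNEXT ordering right is what lets snmpwalk step from one entry to the next, so a walk that stops early is often worth checking there first (comparing OIDs as strings, for instance, puts .10 before .2).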

scaling web apps

A little video, thin on detail of course, but it hints at some home truths about building/designing scalable applications (and I’d go so far as to say that they apply to ALL applications, not just webapps).

http://joyent.vo.llnwd.net/o25/videos/LinkedIn-Bumpersticker-LED-Scaling-Rails.m4v

Of course, I know Ben Rockwood likes his Solaris and F5s, but that’s not going to surprise many (and I just LOVE F5s).

And here’s a blog entry with the details on /how/ that’s done:-

http://www.joyeur.com/2008/04/18/the-wonders-of-fbref-and-irules-serving-pages-from-facebooks-cache

. . . basically as simple as:-

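when HTTP_REQUEST {
  # Popular list pages: answer straight from the load balancer with an
  # fb:ref tag, so the request never reaches the application servers.
  if { [HTTP::uri] contains "/popular_something/list"} {
    HTTP::respond 200 content "<fb:ref handle='[HTTP::uri]'/>"
  } else {
    # Everything else goes to the application pool as usual.
    pool facebook.application_server_pool
  }
}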

recent applications window

This little snippet adds a new stack to the Dock listing the applications you’ve launched recently. I seem to have picked up a habit of closing applications down when I stop working with them for an hour or two (Pages etc.), so it’s handy for getting back to them.


defaults write com.apple.dock persistent-others -array-add \
'{ "tile-data" = { "list-type" = 1; }; "tile-type" = "recents-tile"; }'
killall Dock