Most Pressed Keys and Programming Syntaxes

mahdiyusuf.com

152 points by deniszgonjanin 14 years ago · 98 comments

quellhorst 14 years ago

I photoshopped this really quickly to compare Ruby on Dvorak and Qwerty. https://img.skitch.com/20110908-q24qths9k4u6438wpd989qreci.j...

  • jholman 14 years ago

    Nice. This is the most interesting question about Dvorak, to me.

    Everyone I know who recommends Dvorak is a programmer. And I'm more than content with my comfort and speed typing English; slowing down to type brackets all the time when programming is more of a pain point for me.

    So with this in mind, it's interesting to note that Dvorak has moved the quote key to a slightly less favourable location, and basically banished paren/bracket. Not a solution for my pain point!

    • IvarTJ 14 years ago

      I personally use a variant of Programmer’s Dvorak. Brackets and various symbols are placed where you usually see numbers, while you have to press shift to access numbers. I think that on a proper Programmer’s Dvorak Caps Lock should cause shift to be in effect on the numbers/symbols buttons.

  • waffle_ss 14 years ago

    Don'tcha know Colemak is the latest craze?

andylei 14 years ago

it'd be interesting to see these heatmaps in some sort of normalized way. for example, 'e' is the most common letter in english, so it's the most commonly used letter in these programming languages. it'd be very interesting to see, for example, this heatmap with the intensities divided by each letter's frequency of use in the English language, or across a large set of data including a lot of different programming languages

  • alextgordon 14 years ago

    Just did it for 28000 C files. Here are the results:

        a  0.772163
        b  1.2679
        c  1.78209
        d  1.1195
        e  0.881398
        f  1.47252
        g  0.924242
        h  0.358954
        i  1.06756
        j  0.835313
        k  1.41458
        l  0.981729
        m  1.08955
        n  0.9156
        o  0.73849
        p  1.74468
        q  4.2497
        r  1.21577
        s  1.05023
        t  1.03627
        u  1.2967
        v  1.77662
        w  0.396003
        x  13.7292
        y  0.47566
        z  3.78748
    
    The numbers are (relative frequency in C) / (relative frequency in English). So "b" is slightly more common in C than English, but "w" is a lot more common in English than C.
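
A minimal sketch of how a ratio like this could be computed, assuming Python and two plain-text inputs. The function names and both sample strings are hypothetical stand-ins, not the 28,000-file corpus or the English reference behind the numbers above.

```python
from collections import Counter
import string


def letter_freq(text):
    """Relative frequency of each letter a-z in text, ignoring case."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters) or 1  # avoid dividing by zero on letter-free input
    return {ch: counts[ch] / total for ch in string.ascii_lowercase}


def relative_ratio(code, prose):
    """(relative frequency in code) / (relative frequency in prose), per letter."""
    code_f = letter_freq(code)
    prose_f = letter_freq(prose)
    # Letters that never occur in the prose sample have no defined ratio.
    return {ch: code_f[ch] / prose_f[ch]
            for ch in string.ascii_lowercase if prose_f[ch] > 0}


# Hypothetical toy inputs, far too small to be meaningful.
C_SAMPLE = "int x = max(x, y); int idx = 0;"
ENGLISH_SAMPLE = "the quick brown fox jumps over the lazy dog"

ratios = relative_ratio(C_SAMPLE, ENGLISH_SAMPLE)
for ch in sorted(ratios, key=ratios.get, reverse=True)[:3]:
    print(ch, round(ratios[ch], 2))
```

A ratio above 1 means the letter is more common in the code sample than in the prose sample; a ratio below 1 means the opposite.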

    The raw counts for symbol characters:

        _  22890057
        ,  10895692
        )  10749798
        (  10745839
        *  9211904
        ;  8187969
        -  6628768
        =  5878296
        >  4428291
        /  3468260
        .  3011078
        {  2212412
        }  2211783
        "  2120264
        &  1647188
        :  1032587
        +  962554
        #  909859
        [  889538
        ]  888722
        <  839910
        |  643903
        %  583092
        !  561462
        \  540456
        '  454201
        @  131199
        ?  112488
        ~  84629
        ^  19064
        $  17922
        `  7272
        [space] 74199965
    • dredmorbius 14 years ago

      What would be interesting here would be a difference analysis or regression giving the preference for any given key in a given language. E.g.: '|' is highly predictive of shell, '$' of perl, '()' for lisp. Might be fun to do in R.
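
One simple way to approach this, sketched in Python rather than R: normalize each language's character frequencies, then ask what share of a character's total frequency each language accounts for. With equal priors per language this is P(language | character). The three one-line samples and all function names are hypothetical; a real analysis would need large corpora.

```python
from collections import Counter

# Hypothetical one-line samples standing in for real per-language corpora.
SAMPLES = {
    "shell": "cat access.log | grep 404 | sort | uniq -c | sort -rn | head",
    "perl": "my $total = 0; foreach my $n (@nums) { $total += $n; } print $total;",
    "lisp": "(defun add (x y) (+ x y)) (mapcar (lambda (n) (* n n)) nums)",
}


def relative_freq(text):
    """Relative frequency of every non-whitespace character in text."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def language_given_char(samples):
    """For each character, the share of its frequency owed to each language."""
    freqs = {lang: relative_freq(src) for lang, src in samples.items()}
    chars = set().union(*freqs.values())
    table = {}
    for ch in chars:
        total = sum(f.get(ch, 0.0) for f in freqs.values())
        table[ch] = {lang: f.get(ch, 0.0) / total for lang, f in freqs.items()}
    return table


def best_guess(table, ch):
    """Language with the largest share for the given character."""
    return max(table[ch], key=table[ch].get)


table = language_given_char(SAMPLES)
for ch in "|$(":
    print(ch, best_guess(table, ch))
```

On these toy samples '|' points to shell, '$' to Perl, and '(' to Lisp, which matches the intuition in the comment, though the samples were picked to make that easy.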

      • pwnguin 14 years ago

        I really need to do reading and research on this, but I'm pretty sure that's what Hidden Markov Models are for. You could watch a webpage go from HTML to javascript and back!

    • zokier 14 years ago

      Could I ask for one more data point: the total number of characters and maybe lines? That way symbol/alpha/line ratios could be compared to other languages.

      • alextgordon 14 years ago

        Yeah, when I get a chance I'll gather together some stats on all the languages I have data on (about 40).

        Finder reports 630,942,867 bytes for the whole directory. Assuming most files will be plain ASCII, that should give a good approximation for the total number of characters.

        • zokier 14 years ago

          Based on those numbers I gathered some stats about keyboard layouts:

          * 18% of all characters are symbols, 12% are spaces and 70% are alphabetic

          * 20% of all non-space characters are symbols and 80% are alphabetic.

          * US kb layout users need to use shift for 64% of symbols

          * Finnish/Swedish kb layout users need to use shift for 73% of symbols and AltGr for 7% of symbols.

          * Fi/Swe layout users thus need to use 25% more modifier keys for symbols.

          Conclusion: Fi/Swe layout sucks.

          edit: https://gist.github.com/1205728 python script used to get these numbers (percentages calculated with OOo Calc).
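
The US-layout figures can be rechecked from the numbers posted in this thread. The sketch below is not the linked gist; it is a Python reconstruction that assumes a standard US ANSI layout for deciding which symbols need Shift, and it takes the raw symbol counts and the 630,942,867-byte total as given above.

```python
# Raw symbol counts reported earlier in this thread for the C corpus.
SYMBOL_COUNTS = {
    "_": 22890057, ",": 10895692, ")": 10749798, "(": 10745839,
    "*": 9211904, ";": 8187969, "-": 6628768, "=": 5878296,
    ">": 4428291, "/": 3468260, ".": 3011078, "{": 2212412,
    "}": 2211783, '"': 2120264, "&": 1647188, ":": 1032587,
    "+": 962554, "#": 909859, "[": 889538, "]": 888722,
    "<": 839910, "|": 643903, "%": 583092, "!": 561462,
    "\\": 540456, "'": 454201, "@": 131199, "?": 112488,
    "~": 84629, "^": 19064, "$": 17922, "`": 7272,
}
SPACES = 74199965
TOTAL_BYTES = 630942867  # directory size, used as an approximate character count

# Symbols that need Shift on a US ANSI layout (assumption).
US_SHIFTED = set('~!@#$%^&*()_+{}|:"<>?')

symbols = sum(SYMBOL_COUNTS.values())
shifted = sum(n for ch, n in SYMBOL_COUNTS.items() if ch in US_SHIFTED)

symbol_share = symbols / TOTAL_BYTES
space_share = SPACES / TOTAL_BYTES
symbol_share_nonspace = symbols / (TOTAL_BYTES - SPACES)
us_shift_share = shifted / symbols

print(f"symbols: {symbol_share:.0%}, spaces: {space_share:.0%}")
print(f"symbols among non-space characters: {symbol_share_nonspace:.0%}")
print(f"US symbols needing Shift: {us_shift_share:.0%}")
```

The 25% figure then follows from the stated percentages: (73 + 7) / 64 = 1.25. Note that the remaining 70% of characters is everything that is neither a symbol nor a space, so it also includes digits and newlines.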

    • troxy 14 years ago

      Do you know why you have slightly different numbers of (){}[] characters? In C or C++ shouldn't those all be paired up to match?

      • zeteo 14 years ago

        It can probably be accounted for by comments (e.g. people sometimes comment out half a block). Although the comments should be left out, so as not to mix the C and the English.

        • alextgordon 14 years ago

          Also string and character literals. It's common to write something like

              if (c == '{')
          
          and not need to test for the matching one.
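
To see how much comments and literals skew the bracket counts, one could strip them before counting. Below is a rough Python sketch with a hypothetical helper name; it handles `//` and `/* */` comments and quoted literals with backslash escapes, but ignores preprocessor tricks and line continuations.

```python
def strip_comments_and_literals(src):
    """Drop C comments and string/character literals, keeping all other text."""
    out = []
    i, n = 0, len(src)
    while i < n:
        ch, two = src[i], src[i:i + 2]
        if two == "//":
            # Line comment: skip to (but keep) the newline.
            end = src.find("\n", i)
            i = n if end == -1 else end
        elif two == "/*":
            # Block comment: skip past the closing marker.
            end = src.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif ch in ('"', "'"):
            # Literal: skip to the matching quote, honoring backslash escapes.
            i += 1
            while i < n and src[i] != ch:
                i += 2 if src[i] == "\\" else 1
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Hypothetical one-line sample with a brace inside a character literal.
SAMPLE = "if (c == '{') { depth++; }  /* scan past the brace */\n"
stripped = strip_comments_and_literals(SAMPLE)
print("raw:     ", SAMPLE.count("{"), SAMPLE.count("}"))
print("stripped:", stripped.count("{"), stripped.count("}"))
```

In the sample, the raw text has two opening braces and one closing brace; after stripping, the counts match.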
    • zokier 14 years ago

      Heatmap for symbols only: http://i.imgur.com/yc6fe.png

  • wccrawford 14 years ago

    For this reason, I think it would have been more interesting to ignore the alphabet keys and just heatmap the rest.

    • delinka 14 years ago

      But then you lose the impact of the entire set of reserved words. At the point of ignoring the entire alphabet, you're looking at developer preferences for spacing and operators. Might be a nice sidebar to the existing heat maps.

      • pyre 14 years ago

          > you're looking at developer preferences for spacing
          > and operators
        
        Not really. For example, in languages that use $ (e.g. Perl, PHP) to denote a variable, that's not developer preference. I'm actually surprised that there aren't more operators being used in Ruby. Though I'm not a Ruby programmer, it does not look as devoid of punctuation characters as the heatmap suggests.
      • cpeterso 14 years ago

        What would a programming language look like if it was optimized so its reserved keywords used mostly home row letters (and, in Unix tradition, preferably alternated left/right hands) and operators without shifting? This would be tough, since the home row only has one vowel: a.
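
As a quick check on how far existing keywords are from that ideal, here is a small Python sketch that filters Python's own reserved words (via the standard `keyword` module) down to those typed entirely on the home row. The helper name is hypothetical, and the home-row letter sets assume standard QWERTY and Dvorak layouts.

```python
import keyword

# Home-row letters only (the QWERTY ';' key is left out since it is not a letter).
QWERTY_HOME = set("asdfghjkl")
DVORAK_HOME = set("aoeuidhtns")


def home_row_words(words, home_row):
    """Words whose letters all sit on the given home row (case-insensitive)."""
    return [w for w in words if set(w.lower()) <= home_row]


qwerty = home_row_words(keyword.kwlist, QWERTY_HOME)
dvorak = home_row_words(keyword.kwlist, DVORAK_HOME)
print("QWERTY home row:", qwerty)
print("Dvorak home row:", dvorak)
```

With only one vowel on the QWERTY home row, very few keywords qualify there, while Dvorak's home row covers words such as `and`, `in`, `is`, and `not`.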

        • aangjie 14 years ago

          I was wondering the same thing and noticed that the vowels almost always feature in the top 10, and Dvorak has them all on one hand. Yay. In fact, from a very casual look across languages, I think 'r' is the only frequently used letter that falls outside the Dvorak home row. Guess this makes me a Dvorak evangelist. :-) And to complete that image I will add this DVzine link: http://www.dvzine.org/

    • zokier 14 years ago
  • kevindication 14 years ago

    Except for Lisp, of course, where ( and ) are more common than e.

    • swannodette 14 years ago

      This is not true. See my other comment on this thread. Dominance of ( ) only reflects a particular coder's naming convention.

swannodette 14 years ago

It's interesting to note that a big reason ( ) dominate in Lisp here is that pg adopts the FP habit of short var names. If anything this is probably just a measure of the tendency to use long vs short names; mainstream OO practice encourages the former. It would be interesting to rerun the heatmap for Lisp with a typical CLOS program. I think you'll find that ( ) no longer dominate.

EDIT: And in fact here's a heatmap of core.logic (1K LOC) which is fairly OO-ish in its design - http://twitpic.com/6hwj88. ( ) are strong but do not dominate everything.

UPDATE: And here's a 1.4K LOC Clojure program, core.match http://twitpic.com/6hwo8w/full. ( ) again do not dominate.

saintfiends 14 years ago

This just graphically displays what I whine about most of the time. Why does my pinky have to do most of the work? My pinky is pretty short and all the pinky movements are awkward. It considerably slows down my code typing speed.

I wonder if there is another keyboard layout made specially for programmers. If you look at the heatmaps, you'll see that most of them have a similar pattern.

  • tomjen3 14 years ago

    It is funny that you should mention it, because I have spent a lot of time finding keyboard layouts more suitable for programming.

    In the end I settled on the programmer Dvorak layout. It is basically a standard Dvorak keyboard but with the special keys you need to program moved to easier locations (and the Dvorak keyboard itself uses the home row much more efficiently, so there is much less strain on your fingers and your wrists).

    • saintfiends 14 years ago

      I have thought about switching to Dvorak, but it's hard to find where I live. I'd love to know what model or brand you settled for.

      • ZeroGravitas 14 years ago

        If you mean buying a hardware keyboard with dvorak letters, then don't. If you learn to touch type (and there's no point switching to Dvorak if you don't intend to) then you don't need to look at your keyboard, indeed being able to will only teach you bad habits.

        Print a paper copy of the key layout to pin next to your monitor when starting out and use a typing tutor like gtypist to train yourself instead.

        • saintfiends 14 years ago

          I have no problem typing without looking. The problem, as I mentioned above, is that it is difficult to type with my pinky. It is kind of short (not oddly short, I just have shorter fingers than average) and it's awkward. That's what slows me down.

          So I'm looking for a layout which makes my other fingers work more.

          • ZeroGravitas 14 years ago

            It sounds like a smaller and/or curved keyboard, e.g. the expensive Kinesis contoured devices, might be a physical alternative to learning a new layout.

            http://www.kinesis-ergo.com/contoured.htm

            • cytzol 14 years ago

              Funny you mention the Kinesis, which can switch between qwerty and dvorak with a keypress, without you having to consult the OS about it. (great if you can't configure the computer you're using)

      • tomjen3 14 years ago

        I didn't buy a physical keyboard as such; any operating system allows you to change the layout and I just did that.

        You can print the layout out on a piece of paper until you have learned them but if you plan on looking at the keyboard when you type you are not going to benefit from a dvorak keyboard.

        If you do decide to learn it, just know that it is totally worth it - your hands will thank you.

      • sjs 14 years ago

        I wrote the Dvorak letters on my keyboard with a marker and by the time it wore off each key I knew what that key was. It took a couple of weeks.

  • hernan7 14 years ago

    >> Why does my pinky have to do most of the work?

    Same thing happens with the saxophone keywork. It seems that pinkies get to handle all the stuff that was not included in the original spec.

    Oh, you want to add a shift key? Let the pinky handle it. A C# key? No problem, we can fit it in the pinky cluster.

tomh- 14 years ago

Lisp: http://dl.dropbox.com/u/2196687/lisp-keystrokes.png

  • snprbob86 14 years ago

    My experience with Lisp is minimal, and I'm a Vim guy, so I may be totally wrong about this, but...

    Don't nearly all serious Lisp developers use Emacs? And doesn't Emacs have piles of shortcuts for wrapping/unwrapping/manipulating s-expressions? I'd imagine that the resulting number of parens is wildly different than the number originally typed.

    Can any experienced Lispers comment on this? Where can I find a cheat sheet of such shortcuts?

  • michaelcampbell 14 years ago

    The "-" key isn't even lukewarm which seems odd to me given what little I know about lisp.

  • ori_b 14 years ago

    What corpus did you use to generate that?

robert_nsu 14 years ago

I looked at the heatmaps, then looked at my keyboard. The keys with rubbed out labels nearly match his findings 100%. My 'N' isn't (only because the key is slightly larger than my other keys). Other than that, he is spot on.

  • jholman 14 years ago

    Interesting thought about the labels. Mine mostly do not match.

    In fact, on my keyboard, N is the most-obliterated key label, followed by MAD<. Also damaged: GHSECV. I think I can blame ASD on games, and E should be obvious. So why are CVNM< so very heavily worn? My best guess is that the labels are on the upper section of the keys, which is presumably the laziest part of the key for my fingers to reach?

  • jevinskie 14 years ago

    I'm curious, why do you have a large N key?

zyb09 14 years ago

Clearly ObjC programmers are the only ones who comment their code responsibly.

  • jcizzle 14 years ago

    I imagine this comes from the fact that: 1. ObjC is only used on Apple's platforms (and the less popular GNUStep). 2. You'll have to use Xcode to write for these platforms (less popular alternatives don't add up here). 3. Xcode automatically inserts 7 lines of comments with // at the top of every file. Two files per class, 28 /'s per class.

  • fullmoon 14 years ago

    (or have to ;) )

    • mmariani 14 years ago

      If you code responsibly in ObjC there's no need to write comments. Therefore, if someone "has to" comment ObjC code there's something wrong with their practices.

      The same holds for almost any language, but in ObjC that's just natural.

quellhorst 14 years ago

Would like to see a Dvorak version of this.

5hoom 14 years ago

Interesting to note the difference between C and C++ with regards to the '*' and '&' keys.

I know there is a lot of raw pointer and address usage in C, but I'm surprised at how little these keys show up in C++.

It's good to see that people are taking advantage of smart pointers ;)

(It's subtle though, so I could be reading too much into it).

  • frou_dh 14 years ago

    Seems odd that < and > are said to be used more in C than in C++ (templates).

KirinDave 14 years ago

4 of my haskell files put into heatmap. One of them is an applicative-functor-style use of attoparsec, which tends to have more punctuation than normal haskell code. Even with the frequent use of :'s, $'s and ()'s, the alphanumeric keystrokes dominate the input.

http://fayr.am/9xkE

You can compare this to the Lorem Ipsum text map and see it's only slightly different: http://fayr.am/9yk6

I dunno what that means or what sort of value judgements it drives, but it's pretty different from the other heatmaps.

  • jerf 14 years ago

    And on Dvorak, all of the yellow letters except the "R" are on the home row.... (R is on the O.)

jemfinch 14 years ago

This really needs to take into account modifier keys (in particular, shift).

duck 14 years ago

Whitespace hasn’t been taken into consideration (tabs and spaces) which would have been a cool thing to see.

I think if that was included this would be a lot more useful. Is there a reason it wasn't?

  • tadfisher 14 years ago

    Different editors take different amounts of effort to insert whitespace.

    • aangjie 14 years ago

      Still, without whitespace, I have to look at the Python heatmap a little differently.

Newky 14 years ago

The JavaScript image shows little to no usage of the $ key. That doesn't say a lot for jQuery usage.

  • s00pcan 14 years ago

    In the codebase here we use 'jQuery' instead of '$'.

    • falcolas 14 years ago

      Ditto. If you pull in multiple plugins or additional frameworks (for some reason, we mix YUI with jQuery), the $() notation can become broken, and you have to explicitly call jQuery().

      • udp 14 years ago

        Unless you put all your code in a big closure to make a correct $ in function scope:

          (function ($)
          {
            // Within this closure, $ refers to jQuery, whatever the global $ is.
          }) (jQuery);
  • iks 14 years ago

    The article mentions that "the heat map does miss out on things like shift and caps. ex. in perl with the dollar sign. ($)".

    • jrockway 14 years ago

      Yeah, but the $ is shown as frequently-pressed in the Perl heatmap, so that must not be what he means.

cwp 14 years ago

Here's one for Smalltalk. It's based on my .changes file - about 200K LOC, with all the lines containing '----' and $! removed. What's left is, I think, stuff that actually got typed into a browser.

You can definitely see $:, but otherwise it looks pretty much like English.

bryze 14 years ago

Has anyone done this for programmer Dvorak yet? I guess I'm just looking for validation..

pa7 14 years ago

If anyone cares: I just added the DVORAK keyboard layout to the keyboard heatmap and open sourced the code. Here is the repo URL: https://github.com/pa7/Keyboard-Heatmap

danobeavis 14 years ago

apropos of the brainfuck reference earlier today, here is a brainfuck interpreter, written in brainfuck, visualized through the keyboard heatmap.

http://i.imgur.com/lSDYJ.jpg

swannodette 14 years ago

Just to drive the point home, Clojure's core.clj is 6,500+ lines of Lisp and, funnily enough, parens do not dominate - http://twitpic.com/6hwt28/full.

  • daniel_solano 14 years ago

    Well, there is a very good reason for this: Clojure tries to reduce the number of parentheses by substituting other characters, namely square brackets []. For example:

        ; Common Lisp
        (defun add (x y) (+ x y))
        ; Scheme
        (define (add x y) (+ x y))
        ; Clojure
        (defn add [x y] (+ x y))
    
    Also, Clojure eliminates some parentheses that are used in other Lisps:

        ; Common Lisp (Scheme is the same, with else in place of t)
        (cond ((> x 0) 1)
              ((= x 0) 0)
              (t -1))
        ; Clojure
        (cond (> x 0) 1
              (= x 0) 0
              :else -1)
MrVitaliy 14 years ago

The title is misleading. They just extracted character frequencies from source files, which fails to capture the 'Delete', 'Shift', 'Ctrl', 'Alt', etc. keys.

It even has Paul Graham's name at the end, as if to say 'Look, this is totally legit!'

  • bostonpete 14 years ago

    Well, Ctrl and Alt aren't related to the programming language (more the editor or OS) and neither is Delete.

    Shift is, but that could have (should have IMO) been extracted from the character frequencies in the source files...
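
A rough Python sketch of that extraction, assuming a US ANSI layout: each character is mapped back to its physical key, plus one Shift press when needed. The function name is hypothetical, and counting one Shift press per shifted character is an upper bound, since Shift can be held across several characters.

```python
from collections import Counter

# Shifted character -> unshifted key on a US ANSI layout (assumption).
SHIFTED = dict(zip('~!@#$%^&*()_+{}|:"<>?', "`1234567890-=[]\\;',./"))


def key_presses(text):
    """Count presses per physical key, with Shift as its own key."""
    presses = Counter()
    for ch in text:
        if ch in SHIFTED:
            presses["shift"] += 1
            presses[SHIFTED[ch]] += 1
        elif ch.isalpha() and ch.isupper():
            presses["shift"] += 1
            presses[ch.lower()] += 1
        else:
            presses[ch] += 1
    return presses


print(key_presses("my $total_count = 0;").most_common(5))
```

Because '$' and '4', or '_' and '-', land on the same physical key, keeping Shift as a separate count is what lets a heatmap distinguish them.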

bfung 14 years ago

Of course there are also the missing shortcut keys. For example, Java projects using an IDE would probably have Ctrl-Space as the most frequently pressed keys (autocomplete).

landhar 14 years ago

The problem with this is that you can't tell the difference between numerals and symbols or, even worse, between two symbols on the same key (such as '_' and '-').

ori_b 14 years ago

Interesting. It seems strange that Javascript and Ruby seem to use 'r' significantly less than other languages. I have no idea why that would be.

yvdriess 14 years ago

It appears that it just scans source files, not actual key presses. I barely touch the parenthesis keys when programming Lisp for example.

hernan7 14 years ago

Perl programmers do seem to comment a lot... the "#" looks almost as heavily used as the "$", which is mandatory for variables.

saintfiends 14 years ago

In reality though enclosing glyphs will not be very well balanced. Opening brackets will be typed more than closing brackets.

dodo53 14 years ago

vim and emacs would be fun too :oP

4ad 14 years ago

based on my visual observation, apart from lisp, python seems to skew furthest away from average. Its heatmap is much cooler, with fewer extremes. I wonder why.

doki_pen 14 years ago

I'd love to see a keyboard layout based on data like this.

  • egiva 14 years ago

    It already exists, in the form of a more logical layout for faster typing of many Latin-based languages. It's an alternative to the QWERTY keyboard called the Dvorak layout, and it's really common in certain circles and places. I believe that in OS X you can switch to this alternative layout pretty easily if you're interested in learning a faster way of typing. More info here: http://atmac.org/dvorak-keyboard-layout-switching

    • ZeroGravitas 14 years ago

      There are also some programmer versions of Dvorak that make the punctuation more accessible too (by demoting numerals to needing the shift key to be held):

      http://en.wikipedia.org/wiki/Dvorak_Simplified_Keyboard#Prog...

      Just as Dvorak was based on letter usage, this is based on examining large bodies of code just like the heatmaps in the OP.

      • tomjen3 14 years ago

        It is a bit more than just switching the keys and numbers around - it uses the information from the code it examined to intelligently put the special chars that are used most often where they are easiest to access.

        And yes if you wonder, it is awesome to use.

killion 14 years ago

I bet the backspace key is used the most.

francescolaffi 14 years ago

time for a programmer keyboard layout? "ASERTNIOL" in the middle line would be good for several langs

MicahWedemeyer 14 years ago

The most pressed keys are ⌘ (or CTRL), C, and V.

Kwpolska 14 years ago

Doesn't PHP use a colon at the end of every line? WTF?

  • shabble 14 years ago

    No.

    It uses a semicolon to delimit statements within blocks, but (afaik) placing each statement inside its own <?php ... ?> tag would be valid.

    There's also the alternative syntax[1], which is mostly used for templating these days, which looks like

        <?php if ($foo): ?>
         ... html ...
        <?php endif; ?>
    
    and is probably the only place other than a ternary operator that might conceivably end a line with a colon.

    [1] http://php.net/manual/en/control-structures.alternative-synt...

    • Kwpolska 14 years ago

      but the <?php ?> don't seem to be used too much on this heatmap.

      • Androsynth 14 years ago

        As PHP code gets more structured, it's moving towards MVC frameworks where PHP tags are used sparingly in the views. (They are a fast way to make spaghetti code.)
