tag:blogger.com,1999:blog-7094652.post4476802273259237059..comments2024-03-23T14:36:09.980+00:00Comments on Neil Mitchell's Blog (Haskell etc): 47% faster than GHC*Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.comBlogger15125tag:blogger.com,1999:blog-7094652.post-76351999959635862952007-05-29T11:54:00.000+01:002007-05-29T11:54:00.000+01:00Alex: The code is not very good, and doesn't work ...Alex: The code is not very good, and doesn't work much... I'll blog a fix shortly.<BR/><BR/>Anon: Thanks for the further benchmarks, very useful!Neil Mitchellhttps://www.blogger.com/profile/13084722756124486154noreply@blogger.comtag:blogger.com,1999:blog-7094652.post-39726840846105786642007-05-20T18:25:00.000+01:002007-05-20T18:25:00.000+01:00The optimization is as good as it gets on this exa...The optimization is as good as it gets on this example, since the code hangs on the standard C library's getchar() function, and spends almost all of its time in that function. Any improvement on this benchmark will require rewriting the C code or the Haskell runtime to use other functions. I rewrote the C code using a buffer and using memory-mapped I/O (mmap). Rewriting the Haskell runtime requires a bit more time than I'm willing to spend here though. I have a Core2 duo, and tested on a 1GB file (32 MiB is too small to get a realistic test)<BR/><BR/>The results are as follows:<BR/>"getchar()": 14.3s<BR/>buffered: 5.5s<BR/>mmapped: 2.1s<BR/><BR/>The standard C library is in other words quite slow, and this is quite typical in many OSes, and most nontrivial C applications would have used buffered or mmapped I/O. 
It is still quite impressive, since you have basically managed to get the program IO-bound, in the sense that it hangs on the runtime.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-7094652.post-72665759644952376702007-05-20T03:07:00.000+01:002007-05-20T03:07:00.000+01:00Hi Neil, isn't the output of your code (modified t...Hi Neil, isn't the output of your code (modified to fix the other mistakes mentioned) 0 for the first sample input you provide, "text", with no newline? It seems that you want to start your count at 1. (But then you have to check for a completely empty file, too.)Alexhttps://www.blogger.com/profile/14845444560806521663noreply@blogger.comtag:blogger.com,1999:blog-7094652.post-9595484854216225062007-05-20T02:03:00.000+01:002007-05-20T02:03:00.000+01:00Ludoa: eek, I've got that wrong - yes, that if wil...Ludoa: eek, I've got that wrong - yes, that if will always be false. It doesn't affect the benchmark I was running, but I'll fix it up.Neil Mitchellhttps://www.blogger.com/profile/13084722756124486154noreply@blogger.comtag:blogger.com,1999:blog-7094652.post-65433904493325746822007-05-20T00:59:00.000+01:002007-05-20T00:59:00.000+01:00Gwenhwyfaer: apparently the mistake was the missin...Gwenhwyfaer: apparently the mistake was the missing parentheses.<BR/>So I'm still wondering what the last 'if' does, just like anon in comment #7. Neil, if you could clear that up for us, that'd be great :-)LudoAhttps://www.blogger.com/profile/08204938378565880774noreply@blogger.comtag:blogger.com,1999:blog-7094652.post-20457714715345407672007-05-19T20:59:00.000+01:002007-05-19T20:59:00.000+01:00fridim: You could use a buffer for either, and it ...fridim: You could use a buffer for either, and it would speed both up by similar amounts. 
By using no buffering you put the test on a more consistent basis.<BR/><BR/>-- NeilAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-7094652.post-80582329703264943232007-05-19T20:30:00.000+01:002007-05-19T20:30:00.000+01:00That is not good enough for C. You take one char a...That is not good enough for C. You take one char after another. You should use a buffer (of 2048 for example). It would be much faster, and faster than GHC's version.fridimhttps://www.blogger.com/profile/04430115851774902265noreply@blogger.comtag:blogger.com,1999:blog-7094652.post-57849383180624721922007-05-19T17:53:00.000+01:002007-05-19T17:53:00.000+01:00Ah, it seems on my system (GCC 4.1.2/Glibc 2.5), g...Ah, it seems on my system (GCC 4.1.2/Glibc 2.5), getchar() is a regular function call. You could insert "#undef getchar" after the #includes to make sure.shaurzhttps://www.blogger.com/profile/03588038254545671774noreply@blogger.comtag:blogger.com,1999:blog-7094652.post-12638775080628840822007-05-19T16:13:00.000+01:002007-05-19T16:13:00.000+01:00I'm also curious about under what circumstances th...I'm also curious about under what circumstances the last if will be true.<BR/><BR/>When I compile the first version with -Wall I do get a warning showing what's wrong.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-7094652.post-43645463417285479652007-05-19T15:51:00.000+01:002007-05-19T15:51:00.000+01:00It is not entirely fair on Haskell since the C hea...It is not entirely fair on Haskell since the C header files normally define getchar as a macro which is expanded inline, but also export it as a regular function which GHC is calling.shaurzhttps://www.blogger.com/profile/03588038254545671774noreply@blogger.comtag:blogger.com,1999:blog-7094652.post-22798380021245566032007-05-19T15:48:00.000+01:002007-05-19T15:48:00.000+01:00Anon1: It tends to reduce code size a bit, in the ...Anon1: It tends to reduce code size a bit, in the examples I've tried. 
Code size can only properly be measured on large programs though, so I'll wait a while before giving any results on that.<BR/><BR/>Anon2: Correct, that is the mistake - I missed some brackets. Purely because it's been a while since I used C full-time, and had forgotten the priority rules for operators. The lack of decent types in C means I missed it, but I caught it on the very first test run.Neil Mitchellhttps://www.blogger.com/profile/13084722756124486154noreply@blogger.comtag:blogger.com,1999:blog-7094652.post-78184591945923393632007-05-19T15:44:00.000+01:002007-05-19T15:44:00.000+01:00Ludo, I do believe that's the "accidental mistake"...Ludo, I do believe that's the "accidental mistake" Neil mentioned.gwenhwyfaerhttps://www.blogger.com/profile/03775254923855147509noreply@blogger.comtag:blogger.com,1999:blog-7094652.post-1045386142218925512007-05-19T14:23:00.000+01:002007-05-19T14:23:00.000+01:00I'm far from a C expert, but I don't totally get t...I'm far from a C expert, but I don't totally get this part:<BR/><BR/> while (last = getchar() != EOF) {<BR/> if (last == '\n')<BR/> i++;<BR/> }<BR/> if (last == '\n')<BR/> i--;<BR/><BR/>How can that last 'if' ever be true, since it'll only get to that part of the code when the while condition is false, and the while condition is only false if 'last' is EOF. 
Or is it somehow possible for 'last' to be both \n AND EOF at the same time?<BR/>I'm sure it's a stupid question, but I'm curious about it.<BR/><BR/>Thanks!<BR/>LudoLudoAhttps://www.blogger.com/profile/08204938378565880774noreply@blogger.comtag:blogger.com,1999:blog-7094652.post-8055049672390795402007-05-19T10:07:00.000+01:002007-05-19T10:07:00.000+01:00#include "stdio.h"int main(){ int i = 0; int last ...#include "stdio.h"<BR/><BR/>int main()<BR/>{<BR/> int i = 0;<BR/> int last = 0;<BR/> while ((last = getchar()) != EOF) {<BR/> if (last == '\n')<BR/> i++;<BR/> }<BR/> if (last == '\n')<BR/> i--;<BR/> printf("%i\n", i);<BR/> return 0;<BR/>}Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-7094652.post-11409784877170710882007-05-18T20:15:00.000+01:002007-05-18T20:15:00.000+01:00Very nice!How do your optimizations affect code si...Very nice!<BR/><BR/>How do your optimizations affect code size?Anonymousnoreply@blogger.com