<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Optimising strlen()</title>
	<atom:link href="http://www.stdlib.net/~colmmacc/2009/03/01/optimising-strlen/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.stdlib.net/~colmmacc/2009/03/01/optimising-strlen/</link>
	<description>An Irishman's Fiery</description>
	<lastBuildDate>Tue, 17 May 2011 19:12:51 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
	<item>
		<title>By: inmarket</title>
		<link>http://www.stdlib.net/~colmmacc/2009/03/01/optimising-strlen/comment-page-1/#comment-298194</link>
		<dc:creator>inmarket</dc:creator>
		<pubDate>Thu, 18 Jun 2009 01:12:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.stdlib.net/~colmmacc/?p=270#comment-298194</guid>
		<description>Actually - on some architectures method 1 may be the most efficient. The C compiler could automatically loop unroll (like method 3). The compiler may produce better code from method 1 than method 2 if it supports a double base indirect memory test instruction. The memory cache may pre-read a long word even on byte access and may support multiple byte instructions simultaneously (a common feature on modern processors) thus giving you all the optimisations of the later methods without the code overheads. Remember longer code also means less efficient instruction cache use.
The point of all this is that any optimisation is COMPLETELY processor dependent, and even supposedly obvious optimisations may in fact slow the routine. The only way to tell is to profile, based on a thorough understanding of the processor and memory architecture.</description>
		<content:encoded><![CDATA[<p>Actually &#8211; on some architectures method 1 may be the most efficient. The C compiler could automatically loop unroll (like method 3). The compiler may produce better code from method 1 than method 2 if it supports a double base indirect memory test instruction. The memory cache may pre-read a long word even on byte access and may support multiple byte instructions simultaneously (a common feature on modern processors) thus giving you all the optimisations of the later methods without the code overheads. Remember longer code also means less efficient instruction cache use.<br />
The point of all this is that any optimisation is COMPLETELY processor dependent, and even supposedly obvious optimisations may in fact slow the routine. The only way to tell is to profile, based on a thorough understanding of the processor and memory architecture.</p>
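<p>A minimal sketch of such a measurement, assuming a hosted C environment. The names strlen_bytewise and time_strlen are made up for illustration, clock() is coarse, and an optimising compiler may still hoist or inline the call, so treat the numbers as a rough guide only.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical candidate: the plain byte-at-a-time loop. */
size_t strlen_bytewise(const char *s)
{
    const char *p = s;

    assert(s != NULL);
    while (*p)
        p++;
    return (size_t)(p - s);
}

/*
 * Hypothetical helper: call fn on s reps times and return the elapsed
 * CPU time in seconds. The last computed length is stored in *len_out.
 * Writing to a volatile variable discourages (but does not forbid) the
 * compiler from dropping the calls.
 */
double time_strlen(size_t (*fn)(const char *), const char *s,
                   int reps, size_t *len_out)
{
    volatile size_t sink = 0;
    clock_t start, end;
    int i;

    start = clock();
    for (i = 0; i < reps; i++)
        sink = fn(s);
    end = clock();

    *len_out = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

<p>Pass each candidate implementation to time_strlen with the same string and repetition count, and compare the results on the actual target processor rather than on the development machine.</p>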
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul Jakma</title>
		<link>http://www.stdlib.net/~colmmacc/2009/03/01/optimising-strlen/comment-page-1/#comment-289382</link>
		<dc:creator>Paul Jakma</dc:creator>
		<pubDate>Mon, 18 May 2009 09:41:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.stdlib.net/~colmmacc/?p=270#comment-289382</guid>
		<description>colmmac/wolf: Yes, I know about pages and mmap(). They (or analogues) might be very common, but they&#039;re not part of the C spec, and there definitely are machines without. Anyway, it doesn&#039;t matter because that&#039;s not what the code is generally relying on. It&#039;s relying on the longword access to never cause a fault.

So is that assumption safe? Obviously it is if memory access faults only occur on longword-aligned page boundaries. But do machines exist with sub-longword fault or memory-mapping granularity? (E.g. could you have RAM mapped at the bottom of a longword and, say, an I/O port at the top?)

Basically, my question is: I get the impression this code relies on common machine properties for correctness - not on any guarantees made by C. Am I correct in thinking that?</description>
		<content:encoded><![CDATA[<p>colmmac/wolf: Yes, I know about pages and mmap(). They (or analogues) might be very common, but they&#8217;re not part of the C spec, and there definitely are machines without. Anyway, it doesn&#8217;t matter because that&#8217;s not what the code is generally relying on. It&#8217;s relying on the longword access to never cause a fault.</p>
<p>So is that assumption safe? Obviously it is if memory access faults only occur on longword-aligned page boundaries. But do machines exist with sub-longword fault or memory-mapping granularity? (E.g. could you have RAM mapped at the bottom of a longword and, say, an I/O port at the top?)</p>
<p>Basically, my question is: I get the impression this code relies on common machine properties for correctness &#8211; not on any guarantees made by C. Am I correct in thinking that?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Leonid Volnitsky</title>
		<link>http://www.stdlib.net/~colmmacc/2009/03/01/optimising-strlen/comment-page-1/#comment-262769</link>
		<dc:creator>Leonid Volnitsky</dc:creator>
		<pubDate>Thu, 05 Mar 2009 08:30:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.stdlib.net/~colmmacc/?p=270#comment-262769</guid>
		<description>My somewhat similar travails in optimizing a statically sized array:
http://volnitsky.com/project/lvvlib/array.html</description>
		<content:encoded><![CDATA[<p>My somewhat similar travails in optimizing a statically sized array:<br />
<a href="http://volnitsky.com/project/lvvlib/array.html" rel="nofollow">http://volnitsky.com/project/lvvlib/array.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: wolf550e</title>
		<link>http://www.stdlib.net/~colmmacc/2009/03/01/optimising-strlen/comment-page-1/#comment-262739</link>
		<dc:creator>wolf550e</dc:creator>
		<pubDate>Thu, 05 Mar 2009 05:56:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.stdlib.net/~colmmacc/?p=270#comment-262739</guid>
		<description>@paulj: the error you get from reading past the end of a string (or any array) is a page fault. Reading past the array inside the page is completely &quot;kosher&quot; - the processor and the OS can&#039;t help you if you read (or overwrite) a different variable than the one you intended. Only when you try to touch a page that&#039;s not yours will you trap (you won&#039;t get an error if the next page is also yours). Anyway, since the error can only happen on a page boundary (that&#039;s how the MMU works), and pages are usually no smaller than 4096 bytes (sometimes much more!), reading a properly null-terminated string in aligned word-size chunks will be completely OK. Barring off-by-one bugs, of course.</description>
		<content:encoded><![CDATA[<p>@paulj: the error you get from reading past the end of a string (or any array) is a page fault. Reading past the array inside the page is completely &#8220;kosher&#8221; &#8211; the processor and the OS can&#8217;t help you if you read (or overwrite) a different variable than the one you intended. Only when you try to touch a page that&#8217;s not yours will you trap (you won&#8217;t get an error if the next page is also yours). Anyway, since the error can only happen on a page boundary (that&#8217;s how the MMU works), and pages are usually no smaller than 4096 bytes (sometimes much more!), reading a properly null-terminated string in aligned word-size chunks will be completely OK. Barring off-by-one bugs, of course.</p>
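<p>A rough sketch of that word-at-a-time idea, assuming 4-byte words, a page size that is a multiple of 4, and that an aligned load never straddles a protection boundary. The name strlen_words is made up for illustration. The load may touch up to three bytes beyond the terminator, which the C standard does not permit, so this leans on machine properties rather than language guarantees.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical word-at-a-time strlen; see the assumptions in the text. */
size_t strlen_words(const char *s)
{
    const char *p = s;
    uint32_t w;
    size_t i;

    assert(s != NULL);

    /* Step byte by byte until p sits on a 4-byte boundary. */
    while ((uintptr_t)p % sizeof w != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    for (;;) {
        /* Aligned 4-byte load; stays inside one page under the
         * stated assumptions, but may read past the terminator. */
        memcpy(&w, p, sizeof w);

        /* Nonzero exactly when at least one byte of w is zero. */
        if ((w - 0x01010101u) & ~w & 0x80808080u) {
            /* Find which byte it was; works for either endianness. */
            for (i = 0; i < sizeof w; i++)
                if (p[i] == '\0')
                    return (size_t)(p - s) + i;
        }
        p += sizeof w;
    }
}
```

<p>The byte-wise prologue is what keeps every later load aligned, and alignment is what keeps each load inside a single page.</p>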
]]></content:encoded>
	</item>
	<item>
		<title>By: j</title>
		<link>http://www.stdlib.net/~colmmacc/2009/03/01/optimising-strlen/comment-page-1/#comment-262723</link>
		<dc:creator>j</dc:creator>
		<pubDate>Thu, 05 Mar 2009 04:47:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.stdlib.net/~colmmacc/?p=270#comment-262723</guid>
		<description>&gt; You can’t drag and drop that kind of improvement in an IDE.

But you can get it for free if you use modern libraries or a modern language!</description>
		<content:encoded><![CDATA[<p>&gt; You can’t drag and drop that kind of improvement in an IDE.</p>
<p>But you can get it for free if you use modern libraries or a modern language!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: colmmacc</title>
		<link>http://www.stdlib.net/~colmmacc/2009/03/01/optimising-strlen/comment-page-1/#comment-262651</link>
		<dc:creator>colmmacc</dc:creator>
		<pubDate>Wed, 04 Mar 2009 23:25:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.stdlib.net/~colmmacc/?p=270#comment-262651</guid>
		<description>@paulj - both pages from mmap and memory on the stack should be aligned along word boundaries, so it seems safe. If the zero is completely absent then we might get a protection fault, but that&#039;s fair enough. I can&#039;t see any other magic :/

@baryluk true.</description>
		<content:encoded><![CDATA[<p>@paulj &#8211; both pages from mmap and memory on the stack should be aligned along word boundaries, so it seems safe. If the zero is completely absent then we might get a protection fault, but that&#8217;s fair enough. I can&#8217;t see any other magic :/</p>
<p>@baryluk true.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Justin</title>
		<link>http://www.stdlib.net/~colmmacc/2009/03/01/optimising-strlen/comment-page-1/#comment-262649</link>
		<dc:creator>Justin</dc:creator>
		<pubDate>Wed, 04 Mar 2009 23:13:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.stdlib.net/~colmmacc/?p=270#comment-262649</guid>
		<description>Method 3 reminds me an awful lot of Duff&#039;s device, which is even more fun:
http://en.wikipedia.org/wiki/Duff%27s_device</description>
		<content:encoded><![CDATA[<p>Method 3 reminds me an awful lot of Duff&#8217;s device, which is even more fun:<br />
<a href="http://en.wikipedia.org/wiki/Duff%27s_device" rel="nofollow">http://en.wikipedia.org/wiki/Duff%27s_device</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: baryluk</title>
		<link>http://www.stdlib.net/~colmmacc/2009/03/01/optimising-strlen/comment-page-1/#comment-262638</link>
		<dc:creator>baryluk</dc:creator>
		<pubDate>Wed, 04 Mar 2009 22:11:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.stdlib.net/~colmmacc/?p=270#comment-262638</guid>
		<description>@colmmacc, Actually it&#039;s not tail recursive.

Proper version:

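#include &lt;stddef.h&gt;

/* Accumulator helper: the recursive call is the last thing it does. */
static size_t strlen_acc(const char *str, size_t a)
{
    if (!*str)
        return a;
    return strlen_acc(str + 1, a + 1);
}

size_t strlen(const char *str)
{
    return strlen_acc(str, 0);
}</description>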
		<content:encoded><![CDATA[<p>@colmmacc, Actually it&#8217;s not tail recursive.</p>
<p>Proper version:</p>
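<p>#include &lt;stddef.h&gt;</p>
<p>/* Accumulator helper: the recursive call is the last thing it does. */<br />
static size_t strlen_acc(const char *str, size_t a)<br />
{<br />
    if (!*str)<br />
        return a;<br />
    return strlen_acc(str + 1, a + 1);<br />
}</p>
<p>size_t strlen(const char *str)<br />
{<br />
    return strlen_acc(str, 0);<br />
}</p>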
]]></content:encoded>
	</item>
	<item>
		<title>By: josh</title>
		<link>http://www.stdlib.net/~colmmacc/2009/03/01/optimising-strlen/comment-page-1/#comment-262629</link>
		<dc:creator>josh</dc:creator>
		<pubDate>Wed, 04 Mar 2009 20:45:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.stdlib.net/~colmmacc/?p=270#comment-262629</guid>
		<description>Eh, my preferred optimization for strlen is to stop calling it so much.  Other library functions tend to return a pointer to the beginning of the string, which you already have, rather than the end of the string, even though they typically already have that internally.  If you don&#039;t throw that data away, you don&#039;t have to recalculate it.</description>
		<content:encoded><![CDATA[<p>Eh, my preferred optimization for strlen is to stop calling it so much.  Other library functions tend to return a pointer to the beginning of the string, which you already have, rather than the end of the string, even though they typically already have that internally.  If you don&#8217;t throw that data away, you don&#8217;t have to recalculate it.</p>
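<p>To illustrate, here is a small sketch of a copy routine that hands back the end of the string instead of the beginning. The name copy_to_end is made up for this example, and the caller is assumed to supply a large enough buffer. POSIX stpcpy() behaves along the same lines.</p>

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy src, including its terminator, into dst and
 * return a pointer to the terminator just written. The caller must make
 * sure dst has room for the whole string.
 */
char *copy_to_end(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    while ((*dst = *src) != '\0') {
        dst++;
        src++;
    }
    return dst;
}
```

<p>Chaining calls, each starting where the previous one finished, builds a concatenation without rescanning anything, and the final length is just the returned pointer minus the start of the buffer, with no strlen() call at all.</p>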
]]></content:encoded>
	</item>
	<item>
		<title>By: Keith Gaughan</title>
		<link>http://www.stdlib.net/~colmmacc/2009/03/01/optimising-strlen/comment-page-1/#comment-262627</link>
		<dc:creator>Keith Gaughan</dc:creator>
		<pubDate>Wed, 04 Mar 2009 20:38:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.stdlib.net/~colmmacc/?p=270#comment-262627</guid>
		<description>&lt;blockquote&gt;
More tests will interfere with pipelines, but there will be less jumping.
&lt;/blockquote&gt;

That&#039;s pretty much dependent on the machine architecture, mind. For instance, processors, such as the ARM, with predicated instructions don&#039;t have this problem, so it&#039;s hard to squeeze more performance out of a strlen() implementation than this variant of method 3:


        ; Entry:
        ; r0 - String pointer
        ; Exit:
        ; r0 - Length of the string.
        ; r1, r2 corrupted.
        MOV   r1, r0
.loop   LDR   r2, [r0]
        TST   r2, #&amp;000000FF
        ADDNE r0, r0, #1
        TSTNE r2, #&amp;0000FF00
        ADDNE r0, r0, #1
        TSTNE r2, #&amp;00FF0000
        ADDNE r0, r0, #1
        TSTNE r2, #&amp;FF000000
        ADDNE r0, r0, #1
        BNE   loop
        RSB   r0, r1, r0
        MOV   pc, lr


Mind though, I haven&#039;t written any ARM assembly in some nine years or so, so caveats apply, and that obviously won&#039;t work on a big-endian ARM, and it assumes the string is word-aligned.</description>
		<content:encoded><![CDATA[<blockquote><p>
More tests will interfere with pipelines, but there will be less jumping.
</p></blockquote>
<p>That&#8217;s pretty much dependent on the machine architecture, mind. For instance, processors, such as the ARM, with predicated instructions don&#8217;t have this problem, so it&#8217;s hard to squeeze more performance out of a strlen() implementation than this variant of method 3:</p>
<p>        ; Entry:<br />
        ; r0 &#8211; String pointer<br />
        ; Exit:<br />
        ; r0 &#8211; Length of the string.<br />
        ; r1, r2 corrupted.<br />
        MOV   r1, r0<br />
.loop   LDR   r2, [r0]<br />
        TST   r2, #&amp;000000FF<br />
        ADDNE r0, r0, #1<br />
        TSTNE r2, #&amp;0000FF00<br />
        ADDNE r0, r0, #1<br />
        TSTNE r2, #&amp;00FF0000<br />
        ADDNE r0, r0, #1<br />
        TSTNE r2, #&amp;FF000000<br />
        ADDNE r0, r0, #1<br />
        BNE   loop<br />
        RSB   r0, r1, r0<br />
        MOV   pc, lr</p>
<p>Mind though, I haven&#8217;t written any ARM assembly in some nine years or so, so caveats apply, and that obviously won&#8217;t work on a big-endian ARM, and it assumes the string is word-aligned.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

