Monday, September 27, 2010

org/owasp/esapi/codecs/HTMLEntityCodec.java (is it really correct ?)

After the reply from jwilliams, I almost apologized for my arrogance...
but... I guess jwilliams is the same person who coded this:
org/owasp/esapi/codecs/HTMLEntityCodec.java
@author Jeff Williams

I have spent some more time tuning my implementation
and figuring out what OWASP's HTMLEntityCodec does,
since jwilliams's comment did not match what I saw in his code.

I have tested against ESAPI-2.0-rc6
with this simple call:

return ESAPI.encoder().encodeForHTML(in);


This is what gets encoded (all green are "encoded somehow")


and please see HTML source code
for what is produced in markup !

So basically, what you have posted as a comment
is SOMETHING ELSE than what YOUR code does:
Your post:
1) Encoding characters > 255 isn't useful, barring games with the character set.
2) There is no security problem with rendering named entities, although ESAPI uses hex entities to help performance.
3) Nobody is immune to charset switching
4) It's dangerous to remove characters entirely, you should replace with u+FFFD

What I think:
1) But you ARE encoding characters > 255, and incorrectly.
You are ALSO using NAMED entities, not hex (and for huge ranges),
you print standalone surrogates as hex, which is an error and violates the SGML definition of HTML,
and correct surrogate pairs are encoded as two hex codes instead of one numeric reference for the whole code point.
2) ESAPI uses NAMED entities as well, for a wide range of chars!
3) No comment yet, waiting for an explanation.
4) You are NOT using U+FFFD but whitespace " ".

So please, if I'm wrong "again", correct me, but I do not want to waste ANY more time
fixing an external API.
I will stick to my fixed code,
and I warn others to make their own versions as well (or use better code than OWASP's RI).

Please update the DOCS to make things clear. If this is the intended encoding and
OWASP considers it safe, I will skip the libs just by reading the docs, and
save some time on communication and testing.

Final Note ?

I hope this can open eyes a bit:


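public static void main(String[] args) {
    EscapeUtils2 eu = new EscapeUtils2();
    String s1 = "<>abc123+-";
    // one unpaired low surrogate (0xdc00) followed by three SMP code points
    String s2 = new String(new int[]{0xdc00, 65823, 65839, 65855}, 0, 4);
    // "out" is System.out (a static import is assumed here)
    out.println(eu.escapeHtmlFull(s1 + s2));
    out.println(ESAPI.encoder().encodeForHTML(s1 + s2));
}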

a.in.the.k:
&#60;&#62;abc123+-&#xFFFD;&#65823;&#65839;&#65855;

with ESAPI you get:
&lt;&gt;abc123&#x2b;-&#xdc00;&#xd800;&#xdd1f;&#xd800;&#xdd2f;&#xd800;&#xdd3f;

an extra encoded + sign,
the &#xdc00; entity in the output (it should not appear in HTML, by specification),
and 3 more &#xd800; entities (they should not appear in HTML either, by specification),
which happens because SMP characters should be encoded as encode(codePoint), not encode(char)+encode(char).
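
The difference between encode(codePoint) and encode(char)+encode(char) can be sketched in a few lines of JavaScript. This is only an illustration, not the EscapeUtils2 or ESAPI code: the function name encodeByCodePoint is made up here, and the set of escaped ASCII characters (<, >, & and ") is an assumption. It joins a valid surrogate pair into one decimal reference and replaces an unpaired surrogate with U+FFFD.

```javascript
// Hypothetical helper for illustration only (not ESAPI, not EscapeUtils2).
// Assumption: only <, >, &, " and everything above 0x7E get escaped.
function encodeByCodePoint(input) {
    var out = "", i, unit, next, codePoint;
    for (i = 0; i < input.length; i++) {
        unit = input.charCodeAt(i);
        // A high surrogate followed by a low surrogate is ONE code point.
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < input.length) {
            next = input.charCodeAt(i + 1);
            if (next >= 0xDC00 && next <= 0xDFFF) {
                codePoint = ((unit - 0xD800) << 10) + (next - 0xDC00) + 0x10000;
                out += "&#" + codePoint + ";";
                i++; // skip the low surrogate, it is already consumed
                continue;
            }
        }
        if (unit >= 0xD800 && unit <= 0xDFFF) {
            // Unpaired surrogate: never emit it, use the replacement character.
            out += "&#xFFFD;";
        } else if (unit > 0x7E || unit === 0x3C || unit === 0x3E ||
                   unit === 0x26 || unit === 0x22) {
            out += "&#" + unit + ";";
        } else {
            out += input.charAt(i);
        }
    }
    return out;
}

console.log(encodeByCodePoint("<>abc123+-\uDC00\uD800\uDD1F"));
```

Control characters and other ranges that should be dropped or replaced are deliberately not handled in this sketch.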


Plus, with my code all strange data situations get logged:
27.9.2010 13:35:43 EscapeUtils2 log
WARNING: UNUSED DESCSET: codePoint=56320 at index:10
27.9.2010 13:35:43 EscapeUtils2 log
WARNING: SMP: codePoint=65823 at index:11
27.9.2010 13:35:43 EscapeUtils2 log
WARNING: SMP: codePoint=65839 at index:13
27.9.2010 13:35:43 EscapeUtils2 log
WARNING: SMP: codePoint=65855 at index:15



I do not expect that my data will contain SMPs or unpaired surrogates. But at least if they do, I produce correct markup.

Thursday, September 23, 2010

OWASP has deleted How_to_perform_HTML_entity_encoding_in_Java

http://www.owasp.org/index.php?title=How_to_perform_HTML_entity_encoding_in_Java

I fixed this "naive article" back in spring 2009,
and it contained my proposal for "HTML encoding".

A week ago I discovered a mistake in my code:
2 chars which I should have excluded from the output
were not excluded and were output as encoded.

I wanted to update the algorithm on the web, and surprise:
HTML Entity Encoding is not enough to stop XSS in web applications. Please see
XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet for more information.

So let's see what the OWASP update is.
The article is named: XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet.

Why Can't I Just HTML Entity Encode Untrusted Data?
HTML entity encoding is okay for untrusted data that you put in the body of the HTML document, such as inside a div tag. It
even sort of works for untrusted data that goes into attributes, particularly if you're religious about using quotes around
your attributes. But HTML entity encoding doesn't work if you're putting untrusted data inside a script tag anywhere, or an
event handler attribute like onmouseover, or inside CSS, or in a URL. So even if you use an HTML entity encoding method
everywhere, you are still most likely vulnerable to XSS. You MUST use the escape syntax for the part of the HTML document
you're putting untrusted data into. That's what the rules below are all about.
OK, it covers more in one place, excellent...
it introduces "terms" like "HTML Escape" or "Attribute Escape"...
and, no surprise, it is strong promotion of
ESAPI and the ESAPI reference implementation.

BEWARE

Check code here:
http://code.google.com/p/owasp-esapi-java/source/browse/trunk/src/main/java/org/owasp/esapi/codecs/HTMLEntityCodec.java

and
the latest version of my "pseudo-code",
still kept inside the OWASP wiki history.

Compare and decide...

Mine works for the "Supplementary Multilingual Plane" and
uses only
numeric character references, not character entity references.

And it's immune to client charset switching.

We will probably hear more about ESAPI, since they "amuse and scare me" more and more every day...
-------------
BUG FIX: The two extra chars to remove are 0b and 0c (switch the ifs or add an extra else-if line). Sorry...

string.replace with function benchmarks


function escapeRegExp(s) {
    return s.replace(/([-.*+?^${}()|[\]\/\\])/g, '\\$1');
}
function escapeRegExp_asFunction(s) {
    return s.replace(/([-.*+?^${}()|[\]\/\\])/g, function(ch) {
        return "\\" + ch;
    });
}

Test case:

var loops = 10000,
s1 = "abcdefgjklmnoprstuvxyz",
s2 = "-.*+?^${}()|[]/\\",
s3 = "a-a.a*a+a?a^a$a{a}a(a)a|a[a]a/a\\a",
testStrings = [s1, s2, s3];

MSIE 7.0 results

escapeRegExp
78:abcdefgjklmnoprstuvxyz
110:\-\.\*\+\?\^\$\{\}\(\)\|\[\]\/\\
125:a\-a\.a\*a\+a\?a\^a\$a\{a\}a\(a\)a\|a\[a\]a\/a\\a

escapeRegExp_asFunction
110:abcdefgjklmnoprstuvxyz
797:\-\.\*\+\?\^\$\{\}\(\)\|\[\]\/\\
797:a\-a\.a\*a\+a\?a\^a\$a\{a\}a\(a\)a\|a\[a\]a\/a\\a


Using a function as the second parameter of replace is at least:

797/125 ~= 6 times slower than the string version on MSIE
173/48 ~= 3 times slower on FF
77/61 ~= about the same speed on Safari !!!

on a test string with half of the chars matched.
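
A timing loop of the kind used for such numbers can be sketched like this. It is not the original benchmark page: the names timeIt, escapeWithString and escapeWithFunction are made up for this sketch, and Date.now() is assumed to be available (older browsers would need new Date().getTime()). Absolute times will differ by browser and machine.

```javascript
// Sketch only: both variants use the same regular expression as above.
var SPECIALS = /([-.*+?^${}()|[\]\/\\])/g;

function escapeWithString(s) {
    return s.replace(SPECIALS, "\\$1");
}

function escapeWithFunction(s) {
    return s.replace(SPECIALS, function (ch) {
        return "\\" + ch;
    });
}

// Runs fn(input) "loops" times and reports elapsed milliseconds
// together with the last result, like the "ms:result" lines above.
function timeIt(fn, input, loops) {
    var start = Date.now(), result, i;
    for (i = 0; i < loops; i++) {
        result = fn(input);
    }
    return { ms: Date.now() - start, result: result };
}

var sample = "a-a.a*a+a?a^a$a{a}a(a)a|a[a]a/a\\a";
var r1 = timeIt(escapeWithString, sample, 10000);
var r2 = timeIt(escapeWithFunction, sample, 10000);
console.log("string:   " + r1.ms + ":" + r1.result);
console.log("function: " + r2.ms + ":" + r2.result);
```

Both variants must return the same escaped string; only the elapsed time should differ.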

Tuesday, September 14, 2010

isArray, optimized ?

isArray optimized ?

The world has almost agreed that this is
the correct check for an Array in JavaScript.

var toString = Object.prototype.toString,
    isArray = function(obj) {
        return toString.call(obj) === "[object Array]";
    };

Similar checks are used for Dates or Numbers.
For those who may not know why, read this excellent explanation by kangax:

instanceof considered harmful (or how to write a robust isArray)


I have been curious about performance.

Times in milliseconds for 100,000 loops (!!!), comparing the naive native checks with the correct one:

MSIE 7.0
a instanceof Array 63:true
a.constructor === Array 62:true
toString.call(a) == [object Array] 235:true

Firefox/3.5.12
a instanceof Array 12:true
a.constructor === Array 41:true
toString.call(a) == [object Array] 51:true

Safari/533.18.5
a instanceof Array 2:true
a.constructor === Array 5:true
toString.call(a) == [object Array] 41:true

MSIE 7.0 is the slowest, and the correct check costs roughly 4 times more than instanceof (or worse) in all browsers.

Again, this is of course nothing in absolute numbers,
since we are talking about 100000 loops here!

Speedup for false checks

The penalty and measured times are "the same"
even if you pass in null or undefined.

Since I use this method
often for attribute normalization
at the beginning of my functions,
and many times the checked attribute is optional
(null, undefined or even ""),

I propose a small speed-up with this code:

var toString = Object.prototype.toString,
    isArray = function(obj) {
        return (obj != null && toString.call(obj) === "[object Array]");
    };

Yes, the "evil twin" is intentional.
Or even with:

var toString = Object.prototype.toString,
    isArray = function(obj) {
        return (!!obj && toString.call(obj) === "[object Array]");
    };

Of course you pay some extra penalty for this (20ms/100000 loops on MSIE) in positive checks,
but it drops false checks down to almost no cost (31ms/100000 loops).
BTW, do we need === to compare strings?
Benchmarks:

arrLit toString.call(a) == [object Array] 219:true:[object Array]
arrLit optimized toString.call(a) 234:true:[object Array]

null toString.call(a) == [object Array] 203:false:[object Object]
null optimized toString.call(a) 31:false:[object Object]

You can see the extra price on a positive call (219 vs 234) and the speed-up (203 vs 31)
on null input.

Of course you can still write the check outside of isArray
wherever it makes sense,
but I like it inside - optimized.

I see no sense in letting null be converted to
[object Window] (on FF) and compared as a string with [object Array].
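
The guarded check can be tried out with a short sketch. The names objToString and isArrayGuarded are used here only to keep the example self-contained and are not from any library.

```javascript
// Sketch of the null-guarded check proposed above.
var objToString = Object.prototype.toString;

function isArrayGuarded(obj) {
    // obj != null is false for both null and undefined,
    // so toString.call is skipped for missing optional arguments.
    return obj != null && objToString.call(obj) === "[object Array]";
}

var samples = [[], [1, 2], null, undefined, "", 0, {}, { length: 0 }];
for (var i = 0; i < samples.length; i++) {
    console.log(String(samples[i]) + " -> " + isArrayGuarded(samples[i]));
}
```

Only the two array samples should report true; the array-like object with a length property is still rejected by the toString comparison.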

Tuesday, September 7, 2010

VS 2008, ASP.NET Development Server, .xslt vs .xsl filename extension

MS VS 2008, .xslt vs .xsl filename extension

The Add New Item "Wizard" generates the .xslt extension by default. The file is then served by the magic Cassini (ASP.NET Development Server) when testing locally.

All works fine until you try to load the .xslt file with XMLHttpRequest.
Cassini sends an incorrect Content-Type:

HTTP/1.1 200 OK
Server: ASP.NET Development Server/9.0.0.0
Date: Tue, 07 Sep 2010 13:02:40 GMT
X-AspNet-Version: 2.0.50727
Cache-Control: private
Content-Type: application/octet-stream
Content-Length: 347
Connection: Close

and of course XHR (correctly) fails to provide the xhr.responseXML property.
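
Why application/octet-stream fails can be sketched with a small check. It follows the rule in the current XMLHttpRequest standard, where responseXML is only populated for XML MIME types (text/xml, application/xml, or any subtype ending in +xml). Whether the 2010 browsers, MSIE 7.0 in particular, follow exactly this rule is an assumption, and the function name isXmlContentType is made up for the sketch.

```javascript
// Sketch: decide whether a Content-Type header value counts as XML.
// Assumption: the rule of the current XMLHttpRequest standard applies.
function isXmlContentType(headerValue) {
    // Drop parameters such as "; charset=utf-8" and normalize case.
    var essence = String(headerValue).split(";")[0]
        .replace(/^\s+|\s+$/g, "")
        .toLowerCase();
    return essence === "text/xml" ||
        essence === "application/xml" ||
        /\+xml$/.test(essence);
}

console.log(isXmlContentType("application/octet-stream")); // false
console.log(isXmlContentType("text/xml"));                 // true
console.log(isXmlContentType("application/xslt+xml"));     // true
```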

Questions:
How can you configure the ASP.NET Development Server or a .NET web application to serve the "correct Content-Type"?
What is the "correct Content-Type" anyway?
Will XHR be able to read this "correct Content-Type" (cross-browser, of course)?

Thanx for one nice default
inconsistent with another one.

Because I was using
a twisted version of XHR
and MSIE 7.0, it was hard to spot.

Solution ?

Use .xsl instead of .xslt.
ASP.NET Development Server serves them with the text/xml Content-Type, which seems to work ;-)

Thursday, September 2, 2010

JSLints and evil twins == (again)

This time I have used JSLint to recheck and possibly
polish some of my older source code.

As expected, JSLint gave me
several messages:

Problem at line 3 character 5: Expected '===' and instead saw '=='.

I decided to turn it off with /*jslint eqeqeq: false*/
and surprise... only SOME of the messages disappeared.

Try this:

1./*jslint eqeqeq: true*/
2.if(a==undefined){}
3.if(a==null){}
4. if(a==b){}


result (expected):

Error:
Problem at line 2 character 5: Expected '===' and instead saw '=='.
if(a==undefined){}
Problem at line 3 character 5: Expected '===' and instead saw '=='.
if(a==null){}
Problem at line 4 character 5: Expected '===' and instead saw '=='.
if(a==b){}
Implied global: a 2,3,4, b 4

All 3 lines are reported as errors.

Now try to turn it off:

/*jslint eqeqeq: false*/
if(a==undefined){}
if(a==null){}
if(a==b){}

Only the last line is no longer reported; the first two are still considered harmful:

Error:
Problem at line 2 character 5: Use '===' to compare with 'undefined'.
if(a==undefined){}
Problem at line 3 character 5: Use '===' to compare with 'null'.
if(a==null){}
Implied global: a 2,3,4, b 4


Why do I care ?

From habit (maybe a very wrong one)
I use this construction:

function(arg1,arg2,arg3)
if(arg3 == null)
or
if(arg3 != null)

because IMHO this is valid:

null==null
true
undefined==null
true
null!=null
false
undefined!=null
false

This simplifies checking for both undefined and null values.
Otherwise I would have to write:

if(arg3 === null || typeof arg3 === "undefined")
or
if(arg3 === null || arg3 === undefined)

With the second option I use a known trick,
"elimination of the evil global undefined",
BTW also used by jQuery:
(function( window, undefined ) {

})(window);
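
That == null matches exactly null and undefined, and not the valid falsy values, can be checked with a short sketch. The helper names isNil and isNilStrict are made up for illustration.

```javascript
// Sketch: the "evil twin" form and the explicit form side by side.
function isNil(value) {
    return value == null; // intentional ==, true for null and undefined only
}

function isNilStrict(value) {
    return value === null || value === undefined;
}

var samples = [null, undefined, 0, false, "", NaN, [], {}];
var agree = true;
for (var i = 0; i < samples.length; i++) {
    if (isNil(samples[i]) !== isNilStrict(samples[i])) {
        agree = false;
    }
}
console.log("both forms agree on all samples: " + agree);
```

For these sample values the two forms give the same answer, so the short form is only a matter of style and of what JSLint is willing to accept.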

Trying this

/*jslint eqeqeq: false*/
(function(undefined) {
if(a==undefined){}
}());

with JSLint you get:

Error:
Problem at line 2 character 11: Expected an identifier and instead saw 'undefined' (a reserved word).
(function(undefined) {
Problem at line 3 character 9: Use '===' to compare with 'undefined'.
if(a==undefined){}
Implied global: a 3

Solution ?

Can anyone tell me how to turn off the JSLint == check
in a way that ignores all constructions, not only some?

Can anyone tell me how to write an effective and simple if
which is true only for "null and undefined" and false for all other values?
The suggestion by Mr.D on his page,
If you only care that a value is truthy or falsy, then use the short form. Instead of
(foo != 0)
just say
(foo)
is not an option, because we are talking about specified or valid (0, false) versus unspecified or invalid (null, undefined) values here...

Or shall I rewrite all my ifs from if(a!=null) into the strange-looking:

/*jslint eqeqeq: false*/
(function(undef) {
if(a!=undef){}
}());



Thanx in advance....

Try also my favorite blogger at:
http://webreflection.blogspot.com/search?q=JSLint

Update: 2011/06/07

It seems that the current version (Edition 2011-07-01)
http://www.jslint.com/
works fine for all 3 cases and works correctly for null and undefined as well:

1./*jslint eqeqeq: true*/
2.if(a==undefined){}
3.if(a==null){}
4. if(a==b){}