← Index
NYTProf Performance Profile   « block view • line view • sub view »
For ddd2.pl
  Run on Tue May 25 16:52:24 2010
Reported on Tue May 25 16:56:56 2010

File /project/perl/lib/URI/Escape.pm
Statements Executed 7652
Statement Execution Time 143ms
Subroutines — ordered by exclusive time
Calls P F Exclusive
Time
Inclusive
Time
Subroutine
4611189.3ms120msURI::Escape::::uri_escapeURI::Escape::uri_escape
36881224.7ms24.7msURI::Escape::::CORE:substcontURI::Escape::CORE:substcont (opcode)
4611117.0ms19.3msURI::Escape::::uri_unescapeURI::Escape::uri_unescape
922228.43ms8.43msURI::Escape::::CORE:substURI::Escape::CORE:subst (opcode)
11219µs19µsURI::Escape::::CORE:matchURI::Escape::CORE:match (opcode)
0000s0sURI::Escape::::BEGINURI::Escape::BEGIN
0000s0sURI::Escape::::_fail_hiURI::Escape::_fail_hi
0000s0sURI::Escape::::uri_escape_utf8URI::Escape::uri_escape_utf8
Call graph for these subroutines as a Graphviz dot language file.
Line State
ments
Time
on line
Calls Time
in subs
Code
1#
2# $Id: Escape.pm,v 3.28 2004/11/05 13:58:31 gisle Exp $
3#
4
5package URI::Escape;
63171µs128µsuse strict;
# spent 28µs making 1 call to strict::import
7
8=head1 NAME
9
10URI::Escape - Escape and unescape unsafe characters
11
12=head1 SYNOPSIS
13
14 use URI::Escape;
15 $safe = uri_escape("10% is enough\n");
16 $verysafe = uri_escape("foo", "\0-\377");
17 $str = uri_unescape($safe);
18
19=head1 DESCRIPTION
20
21This module provides functions to escape and unescape URI strings as
22defined by RFC 2396 (and updated by RFC 2732).
23A URI consists of a restricted set of characters,
24denoted as C<uric> in RFC 2396. The restricted set of characters
25consists of digits, letters, and a few graphic symbols chosen from
26those common to most of the character encodings and input facilities
27available to Internet users:
28
29 "A" .. "Z", "a" .. "z", "0" .. "9",
30 ";", "/", "?", ":", "@", "&", "=", "+", "$", ",", "[", "]", # reserved
31 "-", "_", ".", "!", "~", "*", "'", "(", ")"
32
33In addition, any byte (octet) can be represented in a URI by an escape
34sequence: a triplet consisting of the character "%" followed by two
35hexadecimal digits. A byte can also be represented directly by a
36character, using the US-ASCII character for that octet (iff the
37character is part of C<uric>).
38
39Some of the C<uric> characters are I<reserved> for use as delimiters
40or as part of certain URI components. These must be escaped if they are
41to be treated as ordinary data. Read RFC 2396 for further details.
42
43The functions provided (and exported by default) from this module are:
44
45=over 4
46
47=item uri_escape( $string )
48
49=item uri_escape( $string, $unsafe )
50
51Replaces each unsafe character in the $string with the corresponding
52escape sequence and returns the result. The $string argument should
53be a string of bytes. The uri_escape() function will croak if given a
54characters with code above 255. Use uri_escape_utf8() if you know you
55have such chars or/and want chars in the 128 .. 255 range treated as
56UTF-8.
57
58The uri_escape() function takes an optional second argument that
59overrides the set of characters that are to be escaped. The set is
60specified as a string that can be used in a regular expression
61character class (between [ ]). E.g.:
62
63 "\x00-\x1f\x7f-\xff" # all control and hi-bit characters
64 "a-z" # all lower case characters
65 "^A-Za-z" # everything not a letter
66
67The default set of characters to be escaped is all those which are
68I<not> part of the C<uric> character class shown above as well as the
69reserved characters. I.e. the default is:
70
71 "^A-Za-z0-9\-_.!~*'()"
72
73=item uri_escape_utf8( $string )
74
75=item uri_escape_utf8( $string, $unsafe )
76
77Works like uri_escape(), but will encode chars as UTF-8 before
78escaping them. This makes this function able do deal with characters
79with code above 255 in $string. Note that chars in the 128 .. 255
80range will be escaped differently by this function compared to what
81uri_escape() would. For chars in the 0 .. 127 range there is no
82difference.
83
84The call:
85
86 $uri = uri_escape_utf8($string);
87
88will be the same as:
89
90 use Encode qw(encode);
91 $uri = uri_escape(encode("UTF-8", $string));
92
93but will even work for perl-5.6 for chars in the 128 .. 255 range.
94
95Note: Javascript has a function called escape() that produce the
96sequence "%uXXXX" for chars in the 256 .. 65535 range. This function
97has really nothing to do with URI escaping but some folks got confused
98since it "does the right thing" in the 0 .. 255 range. Because of
99this you sometimes see "URIs" with these kind of escapes. The
100JavaScript encodeURI() function is similar to uri_escape_utf8().
101
102=item uri_unescape($string,...)
103
104Returns a string with each %XX sequence replaced with the actual byte
105(octet).
106
107This does the same as:
108
109 $string =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
110
111but does not modify the string in-place as this RE would. Using the
112uri_unescape() function instead of the RE might make the code look
113cleaner and is a few characters less to type.
114
115In a simple benchmark test I did,
116calling the function (instead of the inline RE above) if a few chars
117were unescaped was something like 40% slower, and something like 700% slower if none were. If
118you are going to unescape a lot of times it might be a good idea to
119inline the RE.
120
121If the uri_unescape() function is passed multiple strings, then each
122one is returned unescaped.
123
124=back
125
126The module can also export the C<%escapes> hash, which contains the
127mapping from all 256 bytes to the corresponding escape codes. Lookup
128in this hash is faster than evaluating C<sprintf("%%%02X", ord($byte))>
129each time.
130
131=head1 SEE ALSO
132
133L<URI>
134
135
136=head1 COPYRIGHT
137
138Copyright 1995-2004 Gisle Aas.
139
140This program is free software; you can redistribute it and/or modify
141it under the same terms as Perl itself.
142
143=cut
144
145399µs1385µsuse vars qw(@ISA @EXPORT @EXPORT_OK $VERSION);
# spent 385µs making 1 call to vars::import
1463199µs1131µsuse vars qw(%escapes);
# spent 131µs making 1 call to vars::import
147
14816µsrequire Exporter;
149112µs@ISA = qw(Exporter);
15016µs@EXPORT = qw(uri_escape uri_unescape);
15115µs@EXPORT_OK = qw(%escapes uri_escape_utf8);
152158µs119µs$VERSION = sprintf("%d.%02d", q$Revision: 3.28 $ =~ /(\d+)\.(\d+)/);
# spent 19µs making 1 call to URI::Escape::CORE:match
153
1543912µsuse Carp ();
155
156# Build a char->hex map
157113µsfor (0..255) {
1582562.64ms $escapes{chr($_)} = sprintf("%%%02X", $_);
159}
160
16115µsmy %subst; # compiled patternes
162
163sub uri_escape
164
# spent 120ms (89.3+30.8) within URI::Escape::uri_escape which was called 461 times, avg 261µs/call: # 461 times (89.3ms+30.8ms) by WWW::Google::PageRank::get at line 33 of WWW/Google/PageRank.pm, avg 261µs/call
{
165184417.3ms my($text, $patn) = @_;
166 return undef unless defined $text;
16746179.0ms if (defined $patn){
168 unless (exists $subst{$patn}) {
169 # Because we can't compile the regex we fake it with a cached sub
170 (my $tmp = $patn) =~ s,/,\\/,g;
171 eval "\$subst{\$patn} = sub {\$_[0] =~ s/([$tmp])/\$escapes{\$1} || _fail_hi(\$1)/ge; }";
172 Carp::croak("uri_escape: $@") if $@;
173 }
174 &{$subst{$patn}}($text);
175 } else {
176 # Default unsafe characters. RFC 2732 ^(uric - reserved)
177322723.7ms414930.8ms $text =~ s/([^A-Za-z0-9\-_.!~*'()])/$escapes{$1} || _fail_hi($1)/ge;
# spent 24.7ms making 3688 calls to URI::Escape::CORE:substcont, avg 7µs/call # spent 6.08ms making 461 calls to URI::Escape::CORE:subst, avg 13µs/call
178 }
179 $text;
180}
181
182sub _fail_hi {
183 my $chr = shift;
184 Carp::croak(sprintf "Can't escape \\x{%04X}, try uri_escape_utf8() instead", ord($chr));
185}
186
187sub uri_escape_utf8
188{
189 my $text = shift;
190 if ($] < 5.008) {
191 $text =~ s/([^\0-\x7F])/do {my $o = ord($1); sprintf("%c%c", 0xc0 | ($o >> 6), 0x80 | ($o & 0x3f)) }/ge;
192 }
193 else {
194 utf8::encode($text);
195 }
196
197 return uri_escape($text, @_);
198}
199
200sub uri_unescape
201
# spent 19.3ms (17.0+2.35) within URI::Escape::uri_unescape which was called 461 times, avg 42µs/call: # 461 times (17.0ms+2.35ms) by URI::_server::host at line 48 of URI/_server.pm, avg 42µs/call
{
202 # Note from RFC1630: "Sequences which start with a percent sign
203 # but are not followed by two hexadecimal characters are reserved
204 # for future extension"
205184419.0ms my $str = shift;
206 if (@_ && wantarray) {
207 # not executed for the common case of a single argument
208 my @str = ($str, @_); # need to copy
209 foreach (@str) {
210 s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
211 }
212 return @str;
213 }
214 $str =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg if defined $str;
# spent 2.35ms making 461 calls to URI::Escape::CORE:subst, avg 5µs/call
215 $str;
216}
217
218131µs1;
# spent 19µs within URI::Escape::CORE:match which was called # once (19µs+0s) by URI::BEGIN at line 152 of URI/Escape.pm
sub URI::Escape::CORE:match; # xsub
# spent 8.43ms within URI::Escape::CORE:subst which was called 922 times, avg 9µs/call: # 461 times (6.08ms+0s) by URI::Escape::uri_escape at line 177 of URI/Escape.pm, avg 13µs/call # 461 times (2.35ms+0s) by URI::Escape::uri_unescape at line 214 of URI/Escape.pm, avg 5µs/call
sub URI::Escape::CORE:subst; # xsub
# spent 24.7ms within URI::Escape::CORE:substcont which was called 3688 times, avg 7µs/call: # 3688 times (24.7ms+0s) by URI::Escape::uri_escape at line 177 of URI/Escape.pm, avg 7µs/call
sub URI::Escape::CORE:substcont; # xsub