A Perl-based Transparent TCP Proxy (TPROXY and POE)

by charlesboyo (Beadle)
on Aug 13, 2011 at 01:22 UTC ( #920112=perlmeditation )

Following up on my earlier meditation, How-to: whip up your own transparent TCP proxy using Perl, Brother davido suggested I leverage POE to address the multi-threading and other TODOs.

Wow! It has been a very interesting two weeks since then, during which I have learnt how POE works - the Wheels and Components and all such. And now I have code that pretty much does what I want. Well, I ain't processing the intercepted protocol yet, so what follows is a transparent TCP proxy that works for just about anything.

I have borrowed from more places than I can remember :) including the POE Cookbook, CPAN and the innards of POE itself. Thanks to Brother rcaputo and all the others who have been involved in POE.

This listens transparently on port 1800 and opens a clear-channel connection to the desired endpoint, streaming all traffic both ways using paired POE::Wheel::ReadWrite instances in a dedicated POE::Session.

```perl
#!/usr/bin/perl

use warnings;
use strict;

use POE qw(Wheel::TProxySocketFactory Wheel::ReadWrite Filter::Stream);
use Socket qw(unpack_sockaddr_in inet_ntoa);

# This machine is the man-in-the-middle (called mitm).
# Intercepts connections from unsuspecting hosts (clients) to equally
# unsuspecting TCP servers (servers).

# Starter session, creates the listening TProxySocketFactory
POE::Session->create(
    inline_states => {
        _start           => \&mitm_start,
        _stop            => \&mitm_stop,
        _child           => \&mitm_child,
        mitm_error       => \&mitm_error,
        client_connected => \&client_connected,
    },
    args => [ 1800 ],
);

sub mitm_start {
    my ($heap, $session, $port) = @_[HEAP, SESSION, ARG0];
    $heap->{sessions} = 0;
    $heap->{mitm} = POE::Wheel::TProxySocketFactory->new(
        BindPort     => $port,
        Reuse        => 'yes',
        SuccessEvent => 'client_connected',
        FailureEvent => 'mitm_error',
    );
    logevent("mitm started on port $port", $session);
}

sub mitm_stop {
    my $session = $_[SESSION];
    logevent("mitm is stopping", $session);
}

sub mitm_child {
    my ($heap, $session, $reason) = @_[HEAP, SESSION, ARG0];
    if ($reason eq 'create') {
        $heap->{sessions}++;
    }
    elsif ($reason eq 'lose') {
        $heap->{sessions}--;
        logevent("mitm connection closed, " . $heap->{sessions}
            . " active sessions", $session);
    }
}

sub mitm_error {
    my ($heap, $session, $op, $errnum, $errstr)
        = @_[HEAP, SESSION, ARG0, ARG1, ARG2];
    logevent("mitm $op error $errnum: $errstr", $session);
    delete $heap->{mitm};
}

sub client_connected {
    my ($heap, $session, $client, $client_addr, $client_port)
        = @_[HEAP, SESSION, ARG0, ARG1, ARG2];

    # client information is about the source; get the IP address of the source
    my $client_host = inet_ntoa($client_addr);

    # get the address and port of the intended server at the other end
    # of this connection (TPROXY preserves the original destination)
    my ($server_port, $server_addr) = unpack_sockaddr_in(getsockname($client));
    my $server_host = inet_ntoa($server_addr);

    logevent("mitm got connection from $client_host:$client_port to $server_host, "
        . $heap->{sessions} . " active sessions", $session);

    # create a new Session to manage this client
    POE::Session->create(
        inline_states => {
            _start              => \&setup_client,
            _stop               => \&shutdown_client,
            server_connect_fail => \&server_connect_fail,
            server_connected    => \&server_connected,
            client_input        => \&client_input,
            server_input        => \&server_input,
            client_conn_error   => \&client_conn_error,
            server_conn_error   => \&server_conn_error,
            shutdown_rwwheels   => \&shutdown_rwwheels,
        },
        args => [ $client, $client_port, $client_host, $server_port, $server_host ],
    );
}

sub setup_client {
    my ($heap, $session, $client, $client_port, $client_host,
        $server_port, $server_host) = @_[HEAP, SESSION, ARG0..ARG4];
    $heap->{client}          = $client;
    $heap->{client_endpoint} = "$client_host:$client_port";
    $heap->{server_endpoint} = "$server_host:$server_port";
    # the connecting factory is replaced by a ReadWrite wheel in
    # server_connected once the outbound connection succeeds
    $heap->{server_wheel} = POE::Wheel::TProxySocketFactory->new(
        RemoteAddress => $server_host,
        RemotePort    => $server_port,
        SuccessEvent  => 'server_connected',
        FailureEvent  => 'server_connect_fail',
    );
}

sub shutdown_client {
    my ($heap, $session) = @_[HEAP, SESSION];
    logevent("connection to " . $heap->{server_endpoint} . " terminated", $session);
}

sub server_connected {
    my ($heap, $session, $server) = @_[HEAP, SESSION, ARG0];
    my $client = delete $heap->{client};
    $heap->{client_wheel} = POE::Wheel::ReadWrite->new(
        Handle     => $client,
        Filter     => POE::Filter::Stream->new,
        InputEvent => 'client_input',
        ErrorEvent => 'client_conn_error',
    );
    $heap->{server_wheel} = POE::Wheel::ReadWrite->new(
        Handle     => $server,
        Filter     => POE::Filter::Stream->new,
        InputEvent => 'server_input',
        ErrorEvent => 'server_conn_error',
    );
    logevent("connected to " . $heap->{server_endpoint}, $session);
}

sub server_connect_fail {
    my ($heap, $session, $op, $errnum, $errstr)
        = @_[HEAP, SESSION, ARG0, ARG1, ARG2];
    logevent("connection to " . $heap->{server_endpoint}
        . " failed. $op error $errnum: $errstr", $session);
    delete $heap->{client};
    delete $heap->{server_wheel};
}

sub client_input {
    my ($heap, $input) = @_[HEAP, ARG0];
    $heap->{server_wheel}->put($input) if $heap->{server_wheel};
}

sub server_input {
    my ($heap, $input) = @_[HEAP, ARG0];
    $heap->{client_wheel}->put($input) if $heap->{client_wheel};
}

sub client_conn_error {
    my ($kernel, $heap, $session, $op, $errnum, $errstr)
        = @_[KERNEL, HEAP, SESSION, ARG0, ARG1, ARG2];
    if ($op eq 'read' and $errnum == 0) {
        logevent('client disconnected', $session);
    }
    else {
        logevent("connection from " . $heap->{client_endpoint}
            . " failed. $op error $errnum: $errstr", $session);
    }
    $kernel->yield("shutdown_rwwheels");
}

sub server_conn_error {
    my ($kernel, $heap, $session, $op, $errnum, $errstr)
        = @_[KERNEL, HEAP, SESSION, ARG0, ARG1, ARG2];
    if ($op eq 'read' and $errnum == 0) {
        logevent('server disconnected', $session);
    }
    else {
        logevent("connection to " . $heap->{server_endpoint}
            . " failed. $op error $errnum: $errstr", $session);
    }
    $kernel->yield("shutdown_rwwheels");
}

sub shutdown_rwwheels {
    my $heap = $_[HEAP];
    # both sides may queue this event; do nothing if the wheels are
    # already gone
    my $client = $heap->{client_wheel} or return;
    my $server = $heap->{server_wheel} or return;
    $client->shutdown_input();
    $server->shutdown_input();
    $server->flush() if $server->get_driver_out_octets();
    delete $heap->{server_wheel};
    $client->flush() if $client->get_driver_out_octets();
    delete $heap->{client_wheel};
}

sub logmsg { print "[$$] @_ at ", scalar localtime, "\n" }

sub logevent {
    my ($state, $session, $arg) = @_;
    my $id = $session->ID();
    print scalar localtime;
    print " session $id $state ";
    print ": $arg" if (defined $arg);
    print "\n";
}

$poe_kernel->run();
exit 0;
```

I found out the hard way that Perl's inheritance shares only methods. A simple TProxySocketFactory that isa SocketFactory with a drop-in replacement constructor failed woefully, because the constructor depends on internals private to SocketFactory.pm.
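A stripped-down illustration of the pitfall, using hypothetical Parent/Child packages rather than SocketFactory itself: a subclass inherits the parent's methods, but not the file-scoped lexicals the parent's constructor relies on, so a replacement constructor cannot reproduce the parent's setup without copying its source.

```perl
#!/usr/bin/perl
# Hypothetical Parent/Child packages illustrating the pitfall.
use strict;
use warnings;

package Parent;
my $secret = 42;    # file-scoped lexical - invisible outside this file
sub new   { my $class = shift; bless { value => $secret }, $class }
sub value { $_[0]->{value} }

package Child;
our @ISA = ('Parent');
# Inheriting new() works fine, because methods are shared. But a
# "drop-in replacement" constructor written here cannot see $secret
# (or any private helpers Parent::new uses), so it cannot faithfully
# reproduce the parent's behaviour - hence patching a copy of the source.

package main;
my $obj = Child->new;     # dispatches to the inherited Parent::new
print $obj->value, "\n";  # 42
```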

So I had to modify a copy of SocketFactory.pm. Here is the diff of my TProxySocketFactory as compared with SocketFactory 1.312:

```diff
1c1
< package POE::Wheel::TProxySocketFactory;
---
> package POE::Wheel::SocketFactory;
6c6
< $VERSION = '0.901'; # NOTE - Should be #.### (three decimal places)
---
> $VERSION = '1.312'; # NOTE - Should be #.### (three decimal places)
46,50d45
< # Define a couple of constants not in Socket.pm
< # Note that this may be invalid outside Linux
< sub SOL_IP () { 0 }
< sub IP_TRANSPARENT () { 19 }
<
680,689d674
<   # charlesboyo: Make the socket transparent by setting the IP_TRANSPARENT socket option
<   unless (setsockopt($socket_handle, SOL_IP, IP_TRANSPARENT, 1)) {
<     $poe_kernel->yield(
<       $event_failure,
<       'set_ip_transparent', $!+0, $!, $self->[MY_UNIQUE_ID]
<     );
<     return $self;
<   }
<   DEBUG && warn "set_ip_transparent";
<
```
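The effect of that patch in isolation can be sketched as a standalone script (my illustration, not part of the wheel): SOL_IP (0) and IP_TRANSPARENT (19) are Linux-specific values that the Socket.pm of the day did not export, and setting the option requires CAP_NET_ADMIN (in practice, root).

```perl
#!/usr/bin/perl
# Standalone sketch of the IP_TRANSPARENT setsockopt from the patch.
# The constants are Linux-only; run as root (CAP_NET_ADMIN).
use strict;
use warnings;
use Socket qw(AF_INET SOCK_STREAM);

sub SOL_IP         () { 0 }
sub IP_TRANSPARENT () { 19 }

socket(my $sock, AF_INET, SOCK_STREAM, 0) or die "socket: $!";
setsockopt($sock, SOL_IP, IP_TRANSPARENT, 1)
    or die "setsockopt(IP_TRANSPARENT): $! (needs CAP_NET_ADMIN)";
print "socket is now transparent\n";
```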

Transparent proxying requires a supported kernel - I'm using Linux kernel 2.6.30 with TPROXY support and iptables v1.4.3. The set-up shell script requires that the incoming and outgoing interfaces are enslaved to a Linux bridge interface. It diverts tcp/80 traffic crossing the bridge to the listening TProxy:

```bash
#!/bin/bash

INT="eth2"
EXT="eth1"

ip rule add fwmark 1 lookup 100
ip -f inet route add local 0.0.0.0/0 dev lo table 100

iptables -F
iptables -t mangle -F
iptables -t mangle -N DIVERT
iptables -t mangle -A DIVERT -j MARK --set-mark 1
iptables -t mangle -A DIVERT -j ACCEPT
iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
iptables -t mangle -A PREROUTING -p tcp -m tcp --dport 80 \
    -j TPROXY --on-port 1800 --on-ip 0.0.0.0 --tproxy-mark 0x1/0x1

ebtables -t broute -F
ebtables -t broute -A BROUTING -p IPv4 -i $INT --ip-proto tcp --ip-dport 80 \
    -j redirect --redirect-target DROP
ebtables -t broute -A BROUTING -p IPv4 -i $EXT --ip-proto tcp --ip-sport 80 \
    -j redirect --redirect-target DROP
```

With this, I have all but completed the task I set out to do.

Thanks to everyone for all the code and knowledge out there. And thanks for reading this thread. Enjoy.

Re: A Perl-based Transparent TCP Proxy (TPROXY and POE)
by armstd (Friar) on Aug 13, 2011 at 02:45 UTC

    Nice work.

    Even better for posting it and following up from before.

    --Dave

Re: A Perl-based Transparent TCP Proxy (TPROXY and POE)
by Tanktalus (Canon) on Aug 13, 2011 at 14:21 UTC

    All those events whereby control is passed from one function to another indirectly ... I see why BrowserUk is always advocating real threads, and why tye advocates Coro.

    Each of your "threads" of execution (represented, it seems, by a POE::Session) seem completely independent, which means you could theoretically get away with forking off each proxy instead of using event-based programming. With proper COW support in the kernel (which I suspect has been there for over a decade), this should be relatively cheap. The only downside is if in later stages of processing you decide you need additional modules loaded - you'll be loading them every time you need them unless you use them prior to the fork.
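    The fork-per-connection alternative described above can be sketched like this (illustrative only, not code from the thread; the per-child byte-shuttling is elided):

```perl
#!/usr/bin/perl
# Minimal fork-per-connection accept loop: one blocking child per
# proxied client, with the kernel's copy-on-write keeping forks cheap.
use strict;
use warnings;
use IO::Socket::INET;
use POSIX ':sys_wait_h';

my $listener = IO::Socket::INET->new(
    LocalPort => 1800,
    Listen    => 5,
    ReuseAddr => 1,
) or die "listen: $!";

$SIG{CHLD} = sub { 1 while waitpid(-1, WNOHANG) > 0 };  # reap children

while (my $client = $listener->accept) {
    defined(my $pid = fork) or die "fork: $!";
    if ($pid == 0) {
        # child: connect to the real server and shuttle bytes with
        # plain blocking I/O, then exit (elided here)
        exit 0;
    }
    close $client;    # parent: hand off and keep accepting
}
```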

    However, if you use threads (whether that's threads or Coro), you can load modules the same as in your initial foray - whenever you want. So now the downsides go to threads. Like forking, any logging will need to lock your logfile, just to keep lines from being interleaved; that's minor, because I believe Log::Log4perl can already do that. The second point is that if you create too many OS threads, you can overwhelm the system. By that I don't mean that the system will crash - I expect both Linux and Windows to handle thousands of threads. What I mean is that your proxy may start taking up so much of the CPU, by virtue of having threads with equal weighting, that other processes get starved. However, there are two mitigating factors to this in my mind. The first is that your threads should be fairly inactive - your CPU can probably handle all the proxying of your maxed-out uplink while leaving plenty of room for other activities (though compiling kde may be impacted :-> ). The second is that current kernels (2.6.38+, I believe) have a new scheduling method which switches not just between threads but between sessions, which will result in non-proxy threads being scheduled between proxy threads more often, reducing that starvation (but that could theoretically impact your proxy, if it weren't for the first mitigating factor :-) ).

    Coro is just a single kernel thread, much like your POE solution, but ever so slightly less taxing on the kernel (fewer states need to be saved off) than pure threads, and, IME, much less taxing on the programmer.

    With my CB proxy (née fetcher), I use Coro to schedule everything, and AnyEvent::Socket::tcp_server to start the servers (one socket-based, the other tcp-based). When a connection comes in, I spawn a Coro thread (well, actually, three - one for input on the socket, one for listening to the CB fetcher that is in the same process, and one for serialising the output back across the socket). Each thread otherwise looks like "normal" non-event-based perl programming. For example, the code to listen for input on the socket looks like this: while (defined $fh and defined (my $cmd = $fh->readline())) { ... } - the same as you would write normally. I don't need to use IO::Select or set up multiple functions with events, and I don't have to store intermediate data on a heap - my lexical stack already works fine for that, so perl (Coro) takes care of it.
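    The Coro pattern described above can be sketched roughly as follows (an assumption-laden illustration, not the CB proxy's actual code: it presumes Coro::Handle's unblock() and AnyEvent::Socket::tcp_server, and omits the fetcher and output threads):

```perl
#!/usr/bin/perl
# One Coro thread per connection; reads look like ordinary blocking
# Perl but only suspend the current thread, not the whole process.
use strict;
use warnings;
use Coro;
use Coro::Handle;                      # exports unblock()
use AnyEvent;
use AnyEvent::Socket qw(tcp_server);

tcp_server undef, 1800, sub {
    my ($raw_fh, $host, $port) = @_;
    async {
        my $fh = unblock $raw_fh;      # Coro-aware filehandle wrapper
        while (defined(my $cmd = $fh->readline)) {
            # process $cmd; lexicals persist naturally across the
            # "blocking" reads - no heap, no event callbacks
        }
    };
};

AnyEvent->condvar->recv;               # run the event loop forever
```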

    What I don't know with Coro is how many threads it will handle without going nuts, i.e., slower than other solutions. I recall reading somewhere that it can be much much faster than using threads, even on a multi-core system, and I think this is one of those areas where it does better, so I expect that more threads will work fine here.

Re: A Perl-based Transparent TCP Proxy (TPROXY and POE)
by Anonymous Monk on Aug 15, 2011 at 11:41 UTC
    One argument in favor of POE is that packets actually come in rather infrequently, probably one at a time on just one "pipe," and although there might be not-so-many packets, there might be thousands of simultaneous conversations to keep track of. Hence the argument against having a commensurate number of almost-always-idle threads. There is no religion here; just a counter-argument.
Re: A Perl-based Transparent TCP Proxy (TPROXY and POE)
by sundialsvc4 (Monsignor) on Aug 15, 2011 at 19:34 UTC

    POE is basically a Finite-State Machine engine. It works well in this case because it describes each conversation using a relatively small storage record (a “session”). The maximum number of incoming packets is determined by the number of ports, and perhaps really by the number of network interfaces, that are being listened to. The actual CPU-driven response to each packet is so trivial as to be almost non-existent. A system like POE, unlike processor threads, puts all of that status information on one great big electronic “tote board,” namely the current state of all those sessions. Everyone can see everything. Certainly POE itself can. Adjustments can be made easily. There are no duplicated resources and no complex timing-holes. At any instant in time, every conversation except the one for the most recently received packet has nothing for the CPU to actually do on its behalf.

    In a very large dining room full of many hundreds of people, how many employees are there? Perhaps only a few dozen. When someone comes in and sits down at a table, a new order-ticket will eventually be created, but not an entire new set of employees. What works well when you go out to eat also works in a TCP proxy. Imagine the chaos that would occur in a restaurant if there was a box by the door full of little boxes, each labeled “instant employee, just add water,” and that is how you got your dinner. (The employee pops into existence, takes your order, wades into the kitchen with the hundreds of others, dukes it out fighting for access to the grill and so on, eventually comes back with your food, then falls dead.) If you instantly (and instinctively) know such a thing could never work in a restaurant (even if it were possible), you also know it would not work in other settings for which it is, technically, “possible.”

Node Type: perlmeditation [id://920112]
Approved by GrandFather
Front-paged by Tanktalus