Help with debugging custom modules with Habitat's httpd

We run an inbound proxy service using mod_proxy, coupled with an in-house traffic regulator built around a custom httpd module. This is one of the first services we’re targeting to run with Habitat.

This raises some interesting questions that I think will be relevant to the wider community, but I’d also appreciate any input from people with relevant or parallel experience.

We’re using a fork of core/httpd that is built with the exception hook enabled and without the developer modules (i.e. the all module set rather than reallyall).

So far we have a package that builds the module with apxs, and a ‘wrapper’ package called pxy whose dependencies are our own httpd and the built module.

To start httpd, pxy uses the stock config from core/httpd with two additions: it loads the custom module and enables the exception hook.

For reference, some gotchas we’ve already resolved:

  • when wrapping another service’s package (here httpd), it’s necessary to copy its hooks over into the ‘wrapping’ plan
  • when referring to paths of dependency packages in the httpd config, it’s necessary to use the pkgPathFor helper

The remaining problem is that the service segfaults when I start it, and I get no information at all about how or why.
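A supervisor log line may show only a bare exit status, so a useful first step is to confirm how the process actually died. Below is a minimal Python sketch. The helper names are hypothetical and are not part of Habitat. Python's subprocess module reports a child killed by a signal as the negated signal number, so on Linux a segfault appears as -11 (SIGSEGV). A shell reports the same death as exit status 139 (128 + 11).

```python
import signal
import subprocess


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a human-readable description.

    subprocess reports a child killed by a signal as the negated signal
    number (e.g. -11 for SIGSEGV); otherwise it is the plain exit code.
    """
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Not a signal number this platform knows about.
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with code {returncode}"


def run_and_describe(cmd: list[str]) -> str:
    """Run cmd in the foreground and describe how it terminated."""
    result = subprocess.run(cmd)
    return describe_returncode(result.returncode)
```

As a usage example, run_and_describe(["/path/to/httpd", "-X", "-f", "/hab/svc/pxy/config/httpd.conf"]) uses a placeholder binary path. It runs httpd in its single-process debug mode (-X), so a crash in the module kills the foreground process rather than a forked child.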

Additionally, when I try to start httpd using the hab user and group (as is done in core/httpd), I get permission errors:

pxy.default(SV): Starting service as user=hab, group=hab
pxy.default(O): (13)Permission denied: AH00072: make_sock: could not bind to address [::]:80
pxy.default(O): (13)Permission denied: AH00072: make_sock: could not bind to address
pxy.default(O): no listening sockets available, shutting down
pxy.default(O): AH00015: Unable to open logs

I tried setting these to root, to see whether httpd would drop privileges itself, but now I get:

pxy.default(SV): Starting service as user=hab, group=hab
pxy.default(O): AH00526: Syntax error on line 127 of /hab/svc/pxy/config/httpd.conf:
pxy.default(O): Error:\tApache has not been designed to serve pages while\n\trunning as root.  There are known race conditions that\n\twill allow any local user to read any file on the system.\n\tIf you still desire to serve pages as root then\n\tadd -DBIG_SECURITY_HOLE to the CFLAGS env variable\n\tand then rebuild the server.\n\tIt is strongly suggested that you instead modify the User\n\tdirective in your httpd.conf file to list a non-root\n\tuser.\n
hab-launch(SV): Child for service 'pxy.default' with PID 16733 exited with code exit code: 1

So, a couple of issues:

  1. Does anyone have experience or suggestions for debugging a segfaulting httpd or custom module in general? In particular, how can I get some diagnostics out of a service running under Habitat? Under normal circumstances I would try gdb.

  2. What is the best approach to resolving the user/group issue?

@sns WRT the permission issue: I bet that’s because you’re starting the service on a privileged port (80). IIRC you can start Apache as root so it can bind to port 80, then set the User and Group in your Apache config to {{pkg.svc_user}} and {{pkg.svc_group}} respectively.
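The pattern described here is to bind the privileged port while still root, then switch to an unprivileged account before serving. Below is a minimal Python sketch of that idea, for illustration only; bind_then_drop is a hypothetical helper. On Linux, "privileged" means ports below 1024 by default, and the threshold is configurable via the net.ipv4.ip_unprivileged_port_start sysctl.

```python
import grp
import os
import pwd
import socket


def bind_then_drop(port: int, user: str, group: str) -> socket.socket:
    """Bind a listening TCP socket, then give up root privileges.

    Hypothetical helper illustrating the bind-as-root, serve-as-user pattern.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        # As a non-root user on a privileged port this raises
        # PermissionError (errno 13, EACCES), the same error httpd
        # reports as "(13)Permission denied: AH00072".
        sock.bind(("0.0.0.0", port))
    except OSError:
        sock.close()
        raise
    sock.listen()

    if os.geteuid() == 0:
        # Resolve names first, then drop supplementary groups and the GID
        # while still root; after setuid() we could no longer change them.
        uid = pwd.getpwnam(user).pw_uid
        gid = grp.getgrnam(group).gr_gid
        os.setgroups([])
        os.setgid(gid)
        os.setuid(uid)
    return sock
```

Started as root with user and group set to hab, this mirrors what httpd does. Started as hab, the bind itself fails, which matches the AH00072 log in the question. Note that httpd refuses to run with User set to root (the BIG_SECURITY_HOLE error). The process therefore has to start as root, while the User and Group directives name an unprivileged account.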

I’m pretty sure that’s already handled in the core/httpd plan. If I simply start core/httpd, httpd runs, listens on port 80, and drops privs as it should.

If I start our own httpd with the same config as core/httpd, apart from enabling the exception hook, httpd also runs, listens on port 80, and drops privs as it should.

It’s only when I start the pxy service, which in turn runs httpd, that things go wrong…

After debugging with @sns for a bit, we discovered that the wrapper plan had its pkg_svc_user set to “hab”, which caused the port-binding issue. When we left off, only the segfault remained.

So to summarise:

There were two separate issues. Firstly, the ‘wrapper’ plan had set pkg_svc_user and pkg_svc_group to hab; fixing that meant httpd started properly without the module loaded.

Secondly, there was the segfault. To debug it I needed to:

  • enable core dumps in the httpd config
  • rebuild httpd and my module with a custom do_strip() so the debug symbols were kept
  • start the service and capture a core dump
  • analyse the core dump with gdb
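On the first step, the Apache documentation for the CoreDumpDirectory directive has a relevant Linux note. When a process starts as root and switches user, the kernel disables core dumps for it, and httpd only re-enables them if CoreDumpDirectory is explicitly configured. Separately, the core file size limit (RLIMIT_CORE, which ulimit -c sets in a shell) must be non-zero in the environment httpd starts from. Below is a minimal Python sketch of a hypothetical launcher that raises that soft limit as far as the hard limit allows and then execs httpd. The function names are illustrative only.

```python
import os
import resource


def raise_core_limit() -> tuple[int, int]:
    """Raise the core-file size soft limit up to the hard limit.

    Equivalent in spirit to `ulimit -c unlimited` when the hard limit is
    unlimited; an unprivileged process may raise its soft limit up to,
    but not beyond, the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def exec_with_core_dumps(argv: list[str]) -> None:
    """Replace this process with argv, which inherits the raised limit."""
    raise_core_limit()
    os.execv(argv[0], argv)
```

For example, exec_with_core_dumps(["/path/to/httpd", "-DFOREGROUND", "-f", "/hab/svc/pxy/config/httpd.conf"]) uses a placeholder binary path. Once a crash has produced a core file, gdb /path/to/httpd /path/to/core followed by bt prints the backtrace. Where the core file is written also depends on the kernel.core_pattern setting, which on some systems hands cores to a collector such as systemd-coredump.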

The stack trace showed a failed assignment, which on further investigation turned out to be caused by mod_rewrite not being loaded before the custom module.
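In this case the crash came down to the order of LoadModule directives. A cheap guard is therefore to check the rendered config before starting the service. Below is a minimal Python sketch that assumes a module identifier of pxy_module for the custom module. That name is hypothetical; substitute the real one. The sketch only scans the given text, so it does not follow Include directives.

```python
import re

# Matches e.g. "LoadModule rewrite_module modules/mod_rewrite.so" with
# optional leading spaces/tabs; commented-out lines ("#LoadModule") do not match.
LOAD_MODULE = re.compile(
    r"^[ \t]*LoadModule[ \t]+(\S+)[ \t]+(\S+)", re.IGNORECASE | re.MULTILINE
)


def loaded_modules(config_text: str) -> list[str]:
    """Return module identifiers in the order LoadModule lists them."""
    return [m.group(1) for m in LOAD_MODULE.finditer(config_text)]


def check_load_order(config_text: str, before: str, after: str) -> None:
    """Raise ValueError unless `before` is loaded earlier than `after`."""
    modules = loaded_modules(config_text)
    for name in (before, after):
        if name not in modules:
            raise ValueError(f"{name} is not loaded")
    if modules.index(before) > modules.index(after):
        raise ValueError(f"{before} must be loaded before {after}")
```

Running check_load_order(text, "rewrite_module", "pxy_module") against the contents of /hab/svc/pxy/config/httpd.conf would flag both a missing and a misordered mod_rewrite.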

With mod_rewrite loaded first, the custom module now seems to be working.
