Convert rendered menu lists to array

I had a problem where I wanted to read body content (HTML) that sits in well-formed bulleted lists of links. I needed to convert this into an array for further processing, almost site-map style. Below is a sample structure.

```html
<ul>
  <li>level 1</li>
  <li><a href="stuff.md">things</a>
    <ul>
      <li>level 2</li>
      <li>
        level 2 tricky
        <ul>
          <li>level 3</li>
          <li>things</li>
          <li>neato</li>
          <li>stuff and things</li>
        </ul>
      </li>
      <li>neato</li>
      <li>stuff and things</li>
    </ul>
  </li>
  <li>neato</li>
  <li>stuff and things</li>
</ul>
```
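To make "further processing, almost site-map style" concrete, here is a sketch that assumes the array shape the conversion below aims for: `nodes` entries with a `title`, a `link` (the string "null" when there is none) and optional `children`. The array is a hand-written, trimmed-down stand-in rather than real output, and `flatten_menu()` is a hypothetical helper that turns it into indented site-map rows. It also guards against a SimpleXML/JSON quirk: a list holding a single `<nodes>` element comes out as one associative array instead of a one-element list.

```php
<?php
// Hand-written, trimmed-down stand-in for the converted array (assumed shape).
$ary = [
    'nodes' => [
        ['title' => 'level 1', 'link' => 'null'],
        [
            'title' => 'things',
            'link' => 'stuff.md',
            'children' => [
                'nodes' => [
                    ['title' => 'level 2', 'link' => 'null'],
                    [
                        'title' => 'level 2 tricky',
                        'link' => 'null',
                        // a lone child: JSON-encoded SimpleXML gives an assoc array, not a list
                        'children' => ['nodes' => ['title' => 'level 3', 'link' => 'null']],
                    ],
                ],
            ],
        ],
        ['title' => 'neato', 'link' => 'null'],
    ],
];

// Hypothetical helper: flatten the nested array into depth/title/link rows.
function flatten_menu(array $tree, int $depth = 0): array
{
    $rows = [];
    $nodes = $tree['nodes'] ?? [];
    // normalise the single-child case into a one-element list
    if (isset($nodes['title'])) {
        $nodes = [$nodes];
    }
    foreach ($nodes as $node) {
        $rows[] = [
            'depth' => $depth,
            'title' => $node['title'],
            // turn the "null" placeholder string into a real null
            'link' => $node['link'] === 'null' ? null : $node['link'],
        ];
        if (isset($node['children'])) {
            $rows = array_merge($rows, flatten_menu($node['children'], $depth + 1));
        }
    }
    return $rows;
}

foreach (flatten_menu($ary) as $row) {
    echo str_repeat('  ', $row['depth']), $row['title'],
        $row['link'] !== null ? ' (' . $row['link'] . ')' : '', "\n";
}
// Prints:
// level 1
// things (stuff.md)
//   level 2
//   level 2 tricky
//     level 3
// neato
```

Keeping depth on each row, rather than nesting, tends to make the result easier to drop into a flat site-map template.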

Below is PHP code that walks the sample above and, with a few preg_replace calls, converts it into an array. It cheats a bit: it rewrites the HTML into valid XML, loads that with SimpleXML, encodes the SimpleXML object as JSON, and then decodes the JSON into an array. A mouthful, but it works and takes very few lines :)

```php
<?php
$data = '<ul>
  <li>level 1</li>
  <li><a href="stuff.md">things</a>
    <ul>
      <li>level 2</li>
      <li>
        level 2 tricky
        <ul>
          <li>level 3</li>
          <li>things</li>
          <li>neato</li>
          <li>stuff and things</li>
        </ul>
      </li>
      <li>neato</li>
      <li>stuff and things</li>
    </ul>
  </li>
  <li>neato</li>
  <li>stuff and things</li>
</ul>';
// strip white space and line endings around tags only, so spaces inside text
// ("level 2 tricky") and inside tags (<a href="...">) survive
$data = trim(preg_replace('/\s*(<[^>]+>)\s*/', '$1', $data));
// rename the list tags: <ul> becomes <children>, <li> becomes <nodes>
$data = str_replace(
    array('<ul>', '</ul>', '<li>', '</li>'),
    array('<children>', '</children>', '<nodes>', '</nodes>'),
    $data
);
// li's with plain text followed by a nested list: wrap the text so it is valid XML
$data = preg_replace('/<nodes>([^<>]*)<children>/', '<nodes><title>${1}</title><link>null</link><children>', $data);
// convert links into title/link pairs that are easier to walk as XML
$data = preg_replace('/<a href="(.*?)">(.*?)<\/a>/', '<title>${2}</title><link>${1}</link>', $data);
// give leaf items without links or children the same shape (missing links become the string "null")
$data = preg_replace('/<nodes>([^<>]*)<\/nodes>/', '<nodes><title>${1}</title><link>null</link></nodes>', $data);
// cheat #1: the string is now well-formed XML, so SimpleXML can load it
$xml = simplexml_load_string($data);
// cheat #2: round-trip the SimpleXML object through JSON to get a nested array
$ary = json_decode(json_encode($xml), true);
print_r($ary);
```
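If the regex cheats ever get in the way (a bare `&` or an HTML entity such as `&nbsp;` in a title will stop the string from being valid XML), a different technique is to let PHP's DOM extension parse the HTML and walk it recursively. This is a sketch of that alternative, not the method above: `list_to_array()` is a hypothetical helper, it assumes ext-dom is enabled, and it returns a slightly different shape, where `children` is a plain list with no `nodes` wrapper and a missing link is a real `null`.

```php
<?php
// Hypothetical helper: recursively turn a <ul> element into title/link/children arrays.
function list_to_array(DOMElement $ul): array
{
    $items = [];
    foreach ($ul->childNodes as $li) {
        // skip whitespace text nodes and anything that is not an <li>
        if (!($li instanceof DOMElement) || $li->nodeName !== 'li') {
            continue;
        }
        $item = ['title' => '', 'link' => null];
        foreach ($li->childNodes as $child) {
            if ($child instanceof DOMText) {
                // bare text such as "level 2 tricky"
                $item['title'] .= trim($child->textContent);
            } elseif ($child instanceof DOMElement && $child->nodeName === 'a') {
                $item['title'] = trim($child->textContent);
                $item['link'] = $child->getAttribute('href');
            } elseif ($child instanceof DOMElement && $child->nodeName === 'ul') {
                // recurse into the nested list
                $item['children'] = list_to_array($child);
            }
        }
        $items[] = $item;
    }
    return $items;
}

$html = '<ul>
  <li>level 1</li>
  <li><a href="stuff.md">things</a>
    <ul>
      <li>level 2</li>
      <li>
        level 2 tricky
        <ul>
          <li>level 3</li>
          <li>things</li>
          <li>neato</li>
          <li>stuff and things</li>
        </ul>
      </li>
      <li>neato</li>
      <li>stuff and things</li>
    </ul>
  </li>
  <li>neato</li>
  <li>stuff and things</li>
</ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
// the outer <ul> is the first one in document order
$menu = list_to_array($doc->getElementsByTagName('ul')->item(0));
print_r($menu);
```

This trades the "very few lines" of the regex version for not having to keep the markup XML-safe.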