\$\begingroup\$

I worked on a hybrid mobile application for a newspaper company. They had a WordPress website with news content (in a wp_posts table) which they wanted displayed in the application. I built the hybrid application with HTML, CSS, JavaScript and jQuery Mobile, and packaged it with PhoneGap Build.

I used PHP to connect to their WordPress database and retrieve the news data (i.e. the post_content field of the wp_posts table) for display in the client-side application. I retrieved the data with Ajax on the client side and inserted it into the DOM with jQuery. However, the post_content data contained HTML tags, script tags, etc. It also contained WordPress-specific tags like [caption] and [soundcloud], so there were lots of display problems.

So I decided to filter the content (with PHP) before sending it back to the client side (and inserting it into the DOM) so it could be displayed properly.

I'd really like you to review my code (for filtering the [caption] tags) and tell me whether you think I filtered it in an efficient way (e.g. could I have achieved the same thing with regex?). First I'll give you more information on the process I went through.

First I had to analyse the content of the post_content field in the wp_posts table. On studying the data I found that the [caption] tags all followed the same pattern. More specifically, inside the [caption] tag, an <img> tag appears first, followed by the photo caption text. Here is an example of how it appears in the post_content field.

   [caption id="attachment_14014" align="alignleft" width="599"]
            <img class="size-full wp-image-14014" 
            src="https://www.example.com/wp-content/uploads/Joe-Soap.jpg"       
            alt="By Joe Soap" width="599" height="593" /> 
            Photo courtesy of dp98.
   [/caption]

I wanted to convert this (whenever it occurred within the post_content data) to the following HTML so I could target the elements and style them with CSS.

    <div class="img_wrapper">
        <img class="size-full wp-image-14014" src="https://www.example.com/wp-content/uploads/Joe-Soap.jpg"
        alt="By Joe Soap" width="599" height="593" />
        <span class="photo_caption_text">Photo courtesy of dp98.</span>
    </div>

So I wrote the following function. Do you think there was an easier/more efficient way to achieve the result I wanted, e.g. with regex?

PHP function:

 function filterPostContent($post_content){
    //This function takes in the post_content field (of a wordpress wp_posts table) as a parameter, 
    //filters the content and returns the filtered content.
    //Firstly we create an array from the post_content at every space character 
    //so we can then inspect each element of the array
    //to see if it is a [caption] tag 
    //split the content into an array at every space.
    $post_content_array = preg_split("/[\s]+/", $post_content);
    $post_content_filtered = array();

    //initialize booleans to false
    $caption_tag_element = false; 
    $img_tag_element = false;
    $caption_text = false;

    //iterate through the array 
    for($x = 0; $x < count($post_content_array); $x++){

        if(strpos($post_content_array[$x], '[caption') !== false){
            //check if the element in the array is a caption tag
            //we have found a caption tag so set the boolean to true
            $caption_tag_element = true;
            //do not push this element onto the array.

        }else if($caption_tag_element == true){
            //check if the caption_tag_element is true which means we have come across a [caption tag
            //so now look for <img string or /> string 
            if(strpos($post_content_array[$x], '<img') !== false){
                //if <img string is found then push the start of an image tag to the post content array therefore not including the characters before.
                array_push($post_content_filtered, ' <img ');
                $img_tag_element = true;

            }else if($img_tag_element == true){
                //therefore this element is still part of the img tag element
                //check if we have come to the end of the img tag element
                if(strpos($post_content_array[$x], '/>') !== false){
                    //we have come to the end of the img tag element so therefore
                    //we must be at the point where the photo caption text begins
                    //so we want to add in a class so that we can style this text with CSS on the client side.
                    //close the img tag with /> and add a span tag with a class of photo_caption_text
                    $new_element = '/><span class="photo_caption_text">';
                    array_push($post_content_filtered, $new_element);
                    //we have finished adding the img tag so set the boolean to false
                    $img_tag_element = false;
                    //set this to true so we know to look out for [/caption] in the next iterations of the loop
                    $caption_text = true;
                }else{

                    //we have not come to the end of the img tag element yet so just push it to the array normally 
                    array_push($post_content_filtered, $post_content_array[$x]);
                }

            }else if($caption_text == true){
                //we have come across the caption text itself so now look for [/caption] tag within the element
                //if we do find [/caption] we must replace it with a </span> tag in order to close the <span> tag 
                //we have just created for styling the caption text

                if(strpos($post_content_array[$x], '[/caption]') !== false){
                    //found the end caption tag so replace it with a span tag
                    $element = $post_content_array[$x];
                    $new_element = str_replace("[/caption]", "</span>", $element);

                    array_push($post_content_filtered, $new_element);
                    //now we know we have come to the end of the photo caption text so set both booleans to false
                    $caption_text = false;
                    $caption_tag_element = false;

                }else{
                    //[/caption] has not been found yet meaning the current element of the $post_content_array 
                    //array is a part of the photo caption text itself.
                    array_push($post_content_filtered, $post_content_array[$x]);
                }

            }

        }else{
            //the current element of post_content_array does not contain the string: [caption 
            //also caption_tag_element is false meaning we havent come across a [caption] tag within the post_content
            //so just push the element to the new array of filtered content
            array_push($post_content_filtered, $post_content_array[$x]);
        }   

    }   //end for loop

    //return the filtered content.
    return $post_content_filtered;
}

CSS styling

On the client side, I was then able to target the photo_caption_text class as follows:

.photo_caption_text{
font-size: 1.0rem;
display: block;
padding: 3px 0px 4px 0px;
text-align: center;
color: #606060;
font-style: italic;
}

Before and after the content is filtered.


\$\endgroup\$
  • \$\begingroup\$ it's horrible nesting of if else statements \$\endgroup\$ Commented Aug 21, 2017 at 11:09
  • \$\begingroup\$ Don't parse HTML with REGEX \$\endgroup\$ Commented Aug 21, 2017 at 19:17
  • \$\begingroup\$ Also, WordPress has an app for just this purpose. You are completely re-inventing the wheel. Why? \$\endgroup\$ Commented Aug 21, 2017 at 19:18
  • \$\begingroup\$ @Iwrestledabearonce. I understand your point of view. However there were other cases where I needed to detect for instance youtube iframes within post_content and add ?enablejsapi=1 onto the src attribute of the iframe so that I could then target youtube players on the client side with JavaScript and stop them programmatically.... Does the wordpress app parse iframes like that? \$\endgroup\$ Commented Sep 4, 2017 at 9:46

1 Answer

\$\begingroup\$

This seems an odd design choice considering you are working in a domain-specific tagging language.

Why not have WordPress transform the post to HTML for you, as it already does for your regular site? One of the unfortunate parts of the WordPress architecture is that it does not cleanly decouple your data from your display concerns (as you are finding out). Since WordPress "owns" translating the posts in the DB into HTML, why wouldn't it also own this for your mobile view (perhaps via a different template, even one that just contains HTML fragments for asynchronous delivery to the mobile app)? This will likely lead to less breakage as changes are made to your WordPress site, since you would rely on the same view-generation logic.

If you are truly trying to decouple your mobile app into a service, then perhaps you can work with the cached HTML that WordPress generates rather than the raw database post entry. At least that way you could use DOM manipulation tools to transform the HTML document (which is really your main problem here, as your "caption" is not formatted as HTML).
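As a rough illustration of the DOM route, here is a minimal sketch using PHP's bundled DOMDocument and DOMXPath classes. It assumes the rendered HTML marks caption text with a wp-caption-text class; that is an assumption about your theme's output, so check your own markup first. The function name tagCaptionText and the fragment-root wrapper id are made up for this example.

```php
<?php
// Hypothetical helper: adds the photo_caption_text class to caption elements
// in an already-rendered HTML fragment.
function tagCaptionText(string $renderedHtml): string
{
    $doc = new DOMDocument();

    // Rendered fragments are rarely valid documents, so keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration hints UTF-8 to the parser; the wrapper div gives the
    // fragment a single root that can be unwrapped again afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="fragment-root">' . $renderedHtml . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // ASSUMPTION: caption text carries the class "wp-caption-text".
    $captions = $xpath->query(
        '//*[contains(concat(" ", normalize-space(@class), " "), " wp-caption-text ")]'
    );
    foreach ($captions as $caption) {
        $caption->setAttribute('class', $caption->getAttribute('class') . ' photo_caption_text');
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    $html = '';
    foreach ($root->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }

    return $html;
}

echo tagCaptionText(
    '<div class="wp-caption alignleft"><img src="a.jpg" alt="x">'
    . '<p class="wp-caption-text">Photo courtesy of dp98.</p></div>'
), "\n";
```

Because the parser understands element boundaries, this keeps working if attributes are reordered or the markup is reformatted, which is where string splitting tends to break.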

Even if you decide to forgo these alternative approaches, I would say that you could go with a more optimized regex approach, assuming that:

  • The [caption ...] and [/caption] items in the posts are truly regularly formed and reliable for use with regex.
  • Your caption text is located before the end tag predictably.

In such a case I would go directly to a replacement using a regular expression. Perhaps something like:

// The "s" modifier lets "." match newlines, since a caption can span several lines;
// "U" makes every quantifier ungreedy.
$regex = '#\[caption.*\].*(<img.*/>)(.*)\[/caption\]#sU';
$replace_function = function($matches) {
    // Nowdoc template: the closing EOT must be unquoted.
    $template = <<<'EOT'
<div class="img_wrapper">
    %s
    <span class="photo_caption_text">%s</span>
</div>
EOT;
    return sprintf($template, trim($matches[1]), trim($matches[2]));
};
$replaced_content = preg_replace_callback($regex, $replace_function, $post_content);

This should simplify things greatly compared with your approach of using regex to break the string apart and then iterating over and evaluating every single piece.
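To check the idea against the sample from the question, here is a self-contained sketch. It is a variation, not the only way to write it: it swaps the ungreedy flag for negated character classes, so each part of the pattern stays inside its own tag. The function name convertCaptions is made up for this example, and the code assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: rewrites every [caption]...[/caption] block into the
// img_wrapper markup from the question.
function convertCaptions(string $postContent): string
{
    // [^\]]* keeps the attribute match inside the opening tag, [^>]* keeps the
    // image match inside the <img> tag, and the "s" modifier lets the lazy
    // (.*?) run across line breaks up to the nearest [/caption].
    $regex = '#\[caption[^\]]*\]\s*(<img[^>]*/>)(.*?)\[/caption\]#s';

    $result = preg_replace_callback($regex, function (array $matches): string {
        return sprintf(
            "<div class=\"img_wrapper\">\n    %s\n    <span class=\"photo_caption_text\">%s</span>\n</div>",
            trim($matches[1]),
            trim($matches[2])
        );
    }, $postContent);

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $postContent;
}

$sample = <<<'EOT'
[caption id="attachment_14014" align="alignleft" width="599"]
<img class="size-full wp-image-14014" src="https://www.example.com/wp-content/uploads/Joe-Soap.jpg" alt="By Joe Soap" width="599" height="593" />
Photo courtesy of dp98.
[/caption]
EOT;

echo convertCaptions($sample), "\n";
```

Content without a caption block passes through unchanged, and a caption that does not start with an <img> tag is left as-is rather than half-converted.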


Now, with regards to your code as is:

  • Consider foreach($post_content_array as $word) instead of for(...) for this main loop. I don't see where you are doing any operations here which require you to refer to the specific index value for the string being evaluated.
  • It seems odd to me that you would pass in a string to this function, yet return an array. I would think this function should return the reassembled string such that the caller doesn't have to understand the inner workings of the function.
  • Consider inverting your conditionals and/or adding continue to de-nest large sections of your code.

For example:

if(strpos($post_content_array[$x], '[caption') !== false){
    $caption_tag_element = true;
    continue;
}
if($caption_tag_element === false){
    array_push($post_content_filtered, $post_content_array[$x]);
    continue;
}
// rest of code, now without nesting 

These sorts of "quick exits" can make your code much easier to read and less prone to bugs due to fewer code paths. There are rarely reasons to have an else condition (such as when one of two side effects needs to occur), so try to design them away when possible.

  • You should default to using strict comparisons as opposed to their loose counterparts. Very seldom in code do you need the flexibility to evaluate a condition in a type-insensitive manner. Try to use exact comparisons by default to make your code less error prone against unexpected truthy/falsey behaviors.
  • You have WAY too many comments. Let your code speak for itself.
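Putting the points above together, one possible shape for the loop is sketched below. It uses foreach, early continue exits, strict comparisons, a single state variable in place of the three booleans, and it returns a string. Treat it as an illustration rather than a drop-in replacement: the state names are made up, it assumes PHP 7 or later, and it also emits the img_wrapper div from the question's target markup, which the original loop left out.

```php
<?php
function filterPostContent(string $postContent): string
{
    $words = preg_split('/\s+/', $postContent, -1, PREG_SPLIT_NO_EMPTY);
    $filtered = [];
    $state = 'text'; // text -> shortcode -> img -> caption -> text

    foreach ($words as $word) {
        if ($state === 'text' && strpos($word, '[caption') !== false) {
            $state = 'shortcode';
            continue;
        }
        if ($state === 'shortcode') {
            // Drop the shortcode attributes until the image tag starts.
            if (strpos($word, '<img') !== false) {
                $filtered[] = '<div class="img_wrapper"><img';
                $state = 'img';
            }
            continue;
        }
        if ($state === 'img' && strpos($word, '/>') !== false) {
            $filtered[] = '/><span class="photo_caption_text">';
            $state = 'caption';
            continue;
        }
        if ($state === 'caption' && strpos($word, '[/caption]') !== false) {
            $filtered[] = str_replace('[/caption]', '</span></div>', $word);
            $state = 'text';
            continue;
        }
        $filtered[] = $word;
    }

    return implode(' ', $filtered);
}

echo filterPostContent(
    'Intro [caption id="a" align="alignleft" width="599"] '
    . '<img class="size-full" src="x.jpg" alt="By Joe Soap" /> '
    . 'Photo courtesy of dp98. [/caption] Outro'
), "\n";
```

Like the original, this still depends on whitespace around the tags: if the closing "/>" is glued to the last attribute, that whole token is replaced and the attribute is lost.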
\$\endgroup\$
  • \$\begingroup\$ Thanks for the advice. Well, I wasn't the original developer of the WordPress site and I hadn't learnt WordPress before. This was a college project and I didn't have a chance to research the WordPress app side of things (as this was only 1/4 of the whole app I built). The rest wasn't built on WordPress. Good idea about grabbing the WordPress HTML instead of reading from the database, thanks. However, I was given very specific designs to re-create from designers, which were completely different to how the stories appeared on their current website. Thanks for your advice on my code as it is. Very helpful. \$\endgroup\$ Commented Aug 22, 2017 at 10:50
