PHP Parser for RSS Feed From WordPress.com

I’ve been working on my own site a bit since getting out of school.  I’ve been pulling an RSS feed from this blog onto my homepage for a while now, using something based off of this script.  I try to keep the markup valid.  The feed has given me a bit of troubles for valid markup, mainly with escaping ampersands.  One problem was that I was using this function:

function pagFixRSSEncoding($argString){
    return mb_convert_encoding($argString,'HTML-ENTITIES', 'UTF-8');
}

which I had gotten from somewhere. I’m not sure what it does, but it seems to do nothing for me. I removed it, though I’ll reinstate it or look for something else if characters other than ampersands give me trouble.

I was also using this function:

function pagFixLinks($argString){
    return str_replace("&", "&", $argString);
}

to fix the ampersands in the “link”. This simply replaces all ampersands with the proper HTML entity in the passed string, works just fine.

It fails for the “description”, ie the content, because all HTML entities already are escaped in that element. But WordPress.com places an image at the end of the “description” element for stats purposes. It contains ampersands that are not escaped. It also contains the illegal border attribute. After a good bit of time working on my regex and then figuring out how to run a function on the result, I made this function to fix the stats image element so that it can validly be inserted into XHTML:

function pagFixWordpressStats($argString){
    $fncString = $argString;
    $fncString = preg_replace_callback('/(stats.wordpress.com.*?")/i', "pagFixLinksForPregReplaceCallback", $fncString);
    $fncString = str_replace(" border="0"", "", $fncString);
    return $fncString;
}

It fixes the ampersands only for that image link with the “preg_replace_callback” function. The callback is to this wrapper function for the above “pagFixLinks” function:

function pagFixLinksForPregReplaceCallback($argString){
    return pagFixLinks($argString[1]);
}

The “str_replace” line simply removes the invalid border attribute.

I didn’t even notice that callback version of “preg_replace” for a while.  I was missing the “/” at the beginning and end of the pattern parameter as well: not used to the PERL version of regular expressions, I guess.

I hope this helps somebody.  I guess I might as well post the whole parser to be sure:

function pagFixLinks($argString){
    return str_replace("&", "&", $argString);
}

function pagFixLinksForPregReplaceCallback($argString){
    return pagFixLinks($argString[1]);
}

function pagFixWordpressStats($argString){
    $fncString = $argString;
    $fncString = preg_replace_callback('/(stats.wordpress.com.*?")/i', "pagFixLinksForPregReplaceCallback", $fncString);
    $fncString = str_replace(" border="0"", "", $fncString);
    return $fncString;
}

function fncGetRSSFeedAndFormat($argURL){
    // script based from: http://matthom.com/archive/2007/09/18/use-php-to-display-any-rss-feed-on-your-site
    # INSTANTIATE CURL.
    $curl = curl_init();

    # CURL SETTINGS.
    curl_setopt($curl, CURLOPT_URL, $argURL);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 0);

    # GRAB THE XML FILE.
    $xmlFeed = curl_exec($curl);

    curl_close($curl);

    # SET UP XML OBJECT.
    $xmlObjFeed = simplexml_load_string( $xmlFeed );

    $tempCounter = 0;

    $temp = "";
    $max_title_length = 30;

    foreach ( $xmlObjFeed->channel->item as $item )
    {
        $temp .= "n<div class="post">";
        // title
        $temp .= "n<h3><a href="".pagFixLinks($item->link)."">".pagFixLinks($item->title)."</a></h3>";
        $temp .= "n<div class="date">"
            .date('d M Y', strtotime((string) ($item->pubDate)))
            ." <span class="time">@".date('h:i A', strtotime((string) ($item->pubDate)))
            ."</span></div>";
        // post
        $temp .= "n<div class="content">".pagFixWordpressStats($item->description)."</div>";
        $temp .= "n<div class="more"><a href="".pagFixLinks($item->link)."">Read Full Post</a></div>";
        $temp .= "n</div><!-- post -->
";
        $output .= $temp; // append to output
        $temp = "";
    }

    return $output;
}

It is just a function calling a couple other functions. I may objectify it later, but it currently contains all output markup, so it would have to be modified to allow for custom surrounding markup. Hopefully Worpress didn’t muss it up too much.

[Update 10-1/15] Fixed line breaking WordPress wonkiness [/update]

[Update 10-1/16] Fixed ampersand issue [/update]

Published by

Toby

I am a quiet person from Northeast Ohio. I work as a web developer. I like computers, music, and many other things.

Leave a Reply

Your email address will not be published. Required fields are marked *