Read a Rss Feed with a URL in Golang

Edit

Published on November 11, 2022

Author: Meet Rajesh Gor

#go

Reding Rss Feed

We can use golang's encoding/xml package to read a Rss feed. Though we have to be speicific of what type of structure the Rss feed has, so it is not dynamic but it works really well with structs. I have covered a few nuances of reading XML file in the config file reading post of the 100 days of golang series.

Get request to Rss feed

We first need to send a GET request to the Rss feed, we can use the http package to grab the response.


go
package main

import (
	"log"
	"net/http"
)

func main() {

	url := "https://meetgor.com/rss.xml"
	response, err := http.Get(url)

	if err != nil {
		log.Fatal(err)
	}

    log.Println(response.Body)
	defer response.Body.Close()

}

So, in the above example, we have used the net/http package to send a GET request with the Get funciton. The function takes in a string as a URL and returns either the object as response or an error. If there arose any error, we simply exit out of the program and log the error. If the error is nil, we return the response in the response variable. This builds up a good foundation for the next step to read the response body and fetching the actual bytes from the response.

Fetch the content from the Link

Since we have the response object, we can use the io.ReadAll function to read the bytes in the response body. The function takes in the Reader object in this case it is ReadCloser object as a http object. The function then returns the slice of bytes/int8. The slice then can be interpreted as string or other form data that can be used for parsing the xml from the response.


go
package main

import (
	"log"
	"net/http"
    "io"
)

func main() {

	url := "https://meetgor.com/rss.xml"
	response, err := http.Get(url)

	if err != nil {
		log.Fatal(err)
	}

	data, err := io.ReadAll(response.Body)

	if err != nil {
		log.Fatal(err)
	}

    log.Println(string(data))
    log.Printf("Type -> %T", data)
}

<rss>
    <channel>
        <item>
        ...
        ...
        ...
        </item>
    </channel>
</rss>


Type -> []uint8 

So, we can see that the parsed content is indeed xml, it is type casted to string from the slice of bytes. This can be further used for the parsing the text as Rss structure and fetch the required details.

Parsing Rss with a struct

We can now move into creating a struct for individual tags required in the parsing.


go
package main

import (
    "encoding/xml"
	"io"
	"log"
	"net/http"
)

type Rss struct {
	XMLName xml.Name `xml:"rss"`
	Channel Channel  `xml:"channel"`
}

type Channel struct {
	XMLName     xml.Name `xml:"channel"`
	Title       string   `xml:"title"`
	Description string   `xml:"description"`
	Item        []Item   `xml:"item"`
}

type Item struct {
	XMLName xml.Name `xml:"item"`
	Title   string   `xml:"title"`
	Link    string   `xml:"link"`
}

func main() {

	url := "https://meetgor.com/rss.xml"
	response, err := http.Get(url)

	if err != nil {
		log.Fatal(err)
	}

	data, err := io.ReadAll(response.Body)

	if err != nil {
		log.Fatal(err)
	}

    log.Println(string(data))
}

If you would look at the rss feed, you can see it has a structure of tags and elements. The rss tag is the root tag, followed by channel and other types of nested tags speicific for the type of information to be stored like title for the title in the feed, link for the link to the feed, etc.

So, we create those as structure, the root structure is the Rss which we will create with a few attributes like Channel and the name of the current tag. In the Rss case the name of the tag/element is rss, so it is given the xml.Name as xml:'rss' in backticks indicating the type hint for the field. The next field is the Channel which is another type(custom type struct). We have defined Channel as a struct just after it that will hold information like the title, description of the website. We also have the xml.Name as xml:"channel" which indicates the current struct is representation of channel tag in the rss feed. Finally, we also have a custom type struct as Item. The Item struct has a few attributes like Title, Link and you can now start to see the pattern, you can customize it as per your requirements and speicifications.


go
package main

import (
    "encoding/xml"
	"io"
	"log"
	"net/http"
)

type Rss struct {
	XMLName xml.Name `xml:"rss"`
	Channel Channel  `xml:"channel"`
}

type Channel struct {
	XMLName     xml.Name `xml:"channel"`
	Title       string   `xml:"title"`
	Description string   `xml:"description"`
	Item        []Item   `xml:"item"`
}

type Item struct {
	XMLName xml.Name `xml:"item"`
	Title   string   `xml:"title"`
	Link    string   `xml:"link"`
}

func main() {

	url := "https://meetgor.com/rss.xml"
	response, err := http.Get(url)

	if err != nil {
		log.Fatal(err)
	}

	data, err := io.ReadAll(response.Body)

	if err != nil {
		log.Fatal(err)
	}

    // New Code

	d := Rss{}
	err = xml.Unmarshal(data, &d)

	if err != nil {
		log.Fatal(err)
	}

	for _, item := range d.Channel.Item {
		log.Println(item.Title)
	}
}

$ go run main.go

Why and How to make and use Vim as a text editor and customizable IDE
Setting up Vim for Python
Setting up Vim for BASH Scripting
Vim: Keymapping Guide
...
...
...
Django + HTMX CRUD application
PGCLI: Postgres from the terminal
Golang: Closures
Golang: Interfaces
Golang: Error Handling
Golang: Paths
Golang: File Reading
Golang: JSON YAML TOML (config) File Reading.

So, here we have initialized the Rss struct as empty and then used the Unmarshal method in the xml package. The Unmarshal method will parse the data as per the type of either int, float, bool or string, any other type of data will be discarded as interface or struct. We can usually parse any valid type of data into Unmarshal method and it generally gives a proper expected outcome.

The Unmarshal method takes in the slice of byte and the second paramter as pointer to a struct or any variable that will store the parsed xml content from the slice of byte. The function just returns the error type, either nil in case of no errors, and returns the actual error obejct if there arise any type of error.

So we parse the data which is a slice of byte to the funciton and the reference to the d object which is a empty Rss object. This will get us the data in the d object. We can then iterate over the object as per the struct and use the perform operations like type casting or converting types, etc to get your required data back.

In the above example, we simply iterate over the d.Channel.Item which is a list of elements of tag item in the rss feed. Inside the for loop, we can access the object and simply print or perform any sort of operations. I have simply printed the list of articles with titles.

Links for the code available on the 100 days of golang GitHub repository.

So, that's how we parse an XML feed in golang. Just plug and play if you have a similar type of structure of the Rss XML feed. Happy Coding :)

<a class='prev' href='/sqlite-inline-custom-separator'>

    <svg width="50px" height="50px" viewbox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
        <path d="M13.5 8.25L9.75 12L13.5 15.75" stroke="var(--prevnext-color-angle)" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"> </path>
    </svg>
    <div class='prevnext-text'>
        <p class='prevnext-subtitle'>prev</p>
        <p class='prevnext-title'>SQLite importing CSV with custom separator</p>
    </div>
</a>

<a class='next' href='/python-search-replace-file'>

    <div class='prevnext-text'>
        <p class='prevnext-subtitle'>next</p>
        <p class='prevnext-title'>Python: Search and Replace in File</p>
    </div>
    <svg width="50px" height="50px" viewbox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
        <path d="M10.5 15.75L14.25 12L10.5 8.25" stroke="var(--prevnext-color-angle)" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"></path>
    </svg>
</a>