Commit a3e41148 authored by Aurelien

Merge branch '181-new-job-web-broken-link-checker' into 'latest'

Resolve "[New Job] - Web - Broken link checker"

Closes #181

See merge request r2devops/hub!92
parents db3ad154 3e5e0893
+98 −0
# 🔗 Links Checker

## Description

This job detects most broken links in your **Markdown** or **HTML** files (see [Types of link verified](#types-of-link-verified) for what is covered).

It uses [`Liche`](https://github.com/raviqqe/liche){:target="_blank"}, a tool written in [Go](https://golang.org/){:target="_blank"},
to find and test the links in your documents.
By default, this job scans your whole project for eligible files.

!!! warning
    This job may report many broken local links if your documents use **absolute paths** or **URL rewriting**.
    See [Absolute paths and rewriting URLs](#absolute-paths-and-rewriting-urls).

## How to use it

1. Have `.md`, `.html` or `.htm` files in your project
2. Add the corresponding URL to your `.gitlab-ci.yml` file (see [Getting
   started](/getting-started)). Example:

    ```yaml
    include:
      - remote: 'https://jobs.r2devops.io/links_checker.yml'
    ```

3. If you need to customize the job (stage, variables, ...) 👉 check the [jobs
   customization](/use-the-hub/#jobs-customization)
4. Well done, your job is ready to work! 😀

## Job details

* Job name: `links_checker`
* Docker image:
[`peterevans/liche:1.1.1`](https://hub.docker.com/r/peterevans/liche){:target="_blank"}
* Default stage: `static_tests`
* When: `always`

### Variables

| Name | Description | Default |
| ---- | ----------- | ------- |
| `LICHE_DIRECTORY` <img width=450/> | Path to the directory to be scanned | ` ` |
| `LICHE_FILES` | A list of files (separated with spaces) to scan. It can be used with `LICHE_DIRECTORY` | ` ` |
| `LICHE_EXCLUDE` | A [regular expression](https://en.wikipedia.org/wiki/Regular_expression){:target="_blank"} to exclude a pattern of link | ` ` |
| `LICHE_PRINT_OK` | Also include working links in the report, in addition to broken ones (see [artifacts](#artifacts)) | `false` |
| `LICHE_RECURSIVE` | Search `LICHE_DIRECTORY` recursively for files | `true` |
| `FAIL_ON_BROKEN` | Make your pipeline fail when a broken link is found | `false` |
| `ROOT_DIRECTORY` | Used for absolute paths, it defines the root of HTML projects | ` ` |
| `LICHE_OPTIONS` | Additional options (see [options](https://github.com/raviqqe/liche){:target="_blank"}) | ` ` |
| `REPORT_OUTPUT` | Report file's name | `junit-report.xml` |

### Types of link verified

`Liche` only checks links in specific contexts, so some link formats in your project may not be checked.
Here is a non-exhaustive list of what `Liche` can and can't identify:

**In HTML files (`.html`, `.htm`):**
```HTML
Can identify:

* <a href="https://www.google.com"></a>
* <a href="portfolio.html"></a>
* <a href="mailto:contact@google.com"></a>
* <img src="../images/logo.png"/>
* <img src="logo.png"/>

Can't identify:

* <div onClick="redirect('https://www.google.com')"></div>
* <script type="text/javascript">
      window.location.href = "https://www.google.com" 
  </script>
...
```

**In Markdown files (`.md`):**
```md

Can identify:

* [Gitlab](https://gitlab.com)
* [R2DevOps](https://r2devops.io){:target="_blank"}
* # New post [posts](https://pastebin.com)
* # My title link : https://www.google.com
* **See here a search engine: https://www.google.com**
```

### Absolute paths and rewriting URLs

If you use absolute paths in your HTML documents, set the `ROOT_DIRECTORY` variable. If you leave it empty, it defaults to the value of `LICHE_DIRECTORY`.
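
As an illustration, here is a sketch that assumes a site generated in a hypothetical `public` directory, where only the `public/blog` pages should be scanned but absolute paths such as `/images/logo.png` must resolve from `public`. Both values are relative to the project root:

```yaml
links_checker:
  variables:
    # Hypothetical paths, replace them with your own
    LICHE_DIRECTORY: "public/blog"
    # Absolute links are resolved from this directory
    ROOT_DIRECTORY: "public"
```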

If your static website uses URL rewriting, this job will report most internal links as broken. To avoid that, you can restrict the check to
external links with `LICHE_EXCLUDE: "^[^http]"` (see [regex](https://en.wikipedia.org/wiki/Regular_expression){:target="_blank"}). Note that `[^http]` is a character class, so this excludes links whose first character is not `h`, `t`, or `p`; a relative link such as `portfolio.html` is still checked.
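
To exclude everything that does not start with the literal prefix `http`, the alternatives can be spelled out. This is a sketch, assuming `Liche` compiles the pattern with Go's standard `regexp` package, which does not support negative lookahead:

```yaml
links_checker:
  variables:
    # Excludes links that do not begin with "http".
    # Assumption: Go regexp syntax, so no (?!...) lookahead is available.
    # Links that are a strict prefix of "http" ("h", "ht", "htt") are not matched.
    LICHE_EXCLUDE: '^([^h]|h[^t]|ht[^t]|htt[^p])'
```

Single quotes keep YAML from interpreting any character of the pattern.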

### Artifacts

We use a [JUnit](https://junit.org/junit5/){:target="_blank"} XML report to display errors
directly in the pipeline `Test` tab and in the merge request widget.
+6 −0
name: links_checker
description: Helping you find broken links it will do.
default_stage: static_tests
icon: 🔗
maintainer: Protocole
license: MIT
+84 −0
# Job from R2Devops hub --> r2devops.io

stages:
    - static_tests
  
links_checker:
    image: 
        name: peterevans/liche:1.1.1
        entrypoint: [""]
    stage: static_tests
    variables:
        # Variables relative to LICHE tool
        ## Defines in which directory LICHE is looking for files
        LICHE_DIRECTORY: ""
        ## Defines which files it should check
        LICHE_FILES: ""
        ## Exclude links based on regex pattern
        LICHE_EXCLUDE: ""
        ## Also list the working links in the report
        LICHE_PRINT_OK: "false"
        ## Custom options
        LICHE_OPTIONS: ""
        ## Search the given directory and its subfolders for files
        LICHE_RECURSIVE: "true"
        ## Fails the pipeline if LICHE finds a broken link
        FAIL_ON_BROKEN: "false"
        ## Root directory used to resolve absolute paths (defaults to LICHE_DIRECTORY)
        ROOT_DIRECTORY: ""

        # Defines the name of the report
        REPORT_OUTPUT: "junit-report.xml"
    script:
        - mkdir /liche && cd /liche
        - apk add --update nodejs npm curl && npm install junit-report-builder

        - add_option() { export LICHE_OPTIONS="${LICHE_OPTIONS} ${1}"; }
        - |
            generate_report() {
                cat ${CI_PROJECT_DIR}/linkchecker_logs
                echo "EOF" >> ${CI_PROJECT_DIR}/linkchecker_logs
                curl -s -o /liche/main.cjs https://gitlab.com/r2devops/hub/-/snippets/2044617/raw/master/main.cjs
                node main.cjs "${CI_PROJECT_DIR}" "${CI_PROJECT_DIR}/linkchecker_logs" "${REPORT_OUTPUT}"
                mv ${REPORT_OUTPUT} ${CI_PROJECT_DIR}/${REPORT_OUTPUT}
            }

        - | 
            if [ ! -d "${CI_PROJECT_DIR}/${LICHE_DIRECTORY}" ]; then
                echo "Directory specified ${CI_PROJECT_DIR}/${LICHE_DIRECTORY} does not exist, exit"
                exit 1
            fi
            if [ -z "${ROOT_DIRECTORY}" ]; then
                export ROOT_DIRECTORY="${LICHE_DIRECTORY}"
            fi
            add_option "${CI_PROJECT_DIR}/${LICHE_DIRECTORY} -d ${CI_PROJECT_DIR}/${ROOT_DIRECTORY}"
        - |
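            if [ -n "${LICHE_FILES}" ]; then
                for i in ${LICHE_FILES}; do
                    # The working directory is /liche, so test the path from the project root
                    if [ ! -f "${CI_PROJECT_DIR}/${i}" ]; then
                        echo "File ${CI_PROJECT_DIR}/${i} does not exist, exit"
                        exit 1
                    fi

                    add_option "${CI_PROJECT_DIR}/${i}"
                done
            fi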
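        # No embedded quotes around the pattern: after word splitting they would reach liche as literal characters
        - if [ -n "${LICHE_EXCLUDE}" ]; then add_option "-x ${LICHE_EXCLUDE}"; fi
        - if [ "${LICHE_PRINT_OK}" = "true" ]; then add_option "-v"; fi
        - if [ "${LICHE_RECURSIVE}" = "true" ]; then add_option "-r"; fi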
        
        - |
            if liche ${LICHE_OPTIONS} > "${CI_PROJECT_DIR}/linkchecker_logs" 2>&1; then
                generate_report
                echo "No errors so far in the checked files"
            else
                generate_report
                echo "Errors found in checked files"
                if [ "${FAIL_ON_BROKEN}" = "true" ]; then exit 1; fi
            fi
    artifacts:
        when: always
        paths:
            - ${CI_PROJECT_DIR}/${REPORT_OUTPUT}
        reports:
            junit: ${CI_PROJECT_DIR}/${REPORT_OUTPUT}
+1 −0
* Initial version