Distributed Software Dependency Management
using Blockchain
Gavin D’mello, Horacio González-Vélez
Cloud Competency Centre, National College of Ireland
http://www.ncirl.ie/cloud
dmellogavin5000@gmail.com, horacio@ncirl.ie
Abstract—Contemporary software deployments rely on cloud-
based package managers for installation, where existing packages
are installed on demand from remote code repositories. Usually
frameworks or common utilities, packages increase the code
reusability within the ecosystem, whilst keeping the code base
small. However, disruptions in the package management services
can potentially affect development and deployment workflows.
Furthermore, cloud package managers have arguably an ambigu-
ous ownership model and offer limited visibility of packages to
the users. This work describes the development of a blockchain-
based package control system which is decentralised, reliable, and
transparent. Blockchain nodes are installed within the distributed
infrastructure to provide immutability, and then a dependency
graph is constructed with the help of smart contracts to trace
the software provenance. Our system has been successfully tested
with 4338 packages from NPM, of which 950 are among the top
depended-upon packages.
Index Terms—Software Reuse; Blockchain; Cloud Computing;
Software packaging; Smart Contracts
I. INTRODUCTION
Long considered a key practice in the industry, software re-
use entails the creation of new software systems using existing
software packages [1]. Typically composed of complementary
software modules, most software packages are made available
as common utility tools or frameworks which are used by
millions of users and can in turn be used by other packages.
With the advent of microservices and cloud architectures,
package reuse has increased, as each service has its own lifetime
and state to manage independently of other services. Each
language and community tends to have a different
package manager, and the proliferation of online version control
tools such as GitHub and Bitbucket has led to the creation
of a wider range of interdependent software components [2].
Package managers provide a platform for code sharing. Re-
liable application package managers are of prime importance
to software developers. Most packages need to be installed
immediately before the deployment phase.
Software packages tend to have direct and transitive de-
pendencies on other packages, which make them vulnerable
and/or prone to failure if any dependency is unpublished
or compromised. Dependencies are not necessarily straight-
forward and can have multiple nesting levels. For example,
the package libcurl, which is used for sending HTTP
requests, depends on other packages like zlib which is
used to compress data. Any failure in getting the package
metadata or binaries could lead to build failures and hinder
the development processes. An example of such a scenario
is the left-pad problem discussed by [2], which led to many
installation and build failures on NPM; two percent of the
transitive package installations failed in this event.
While different software package managers offer mirrors
and streaming, most package managers are heavily centralised
in their architectures, which can present a single point of
failure and, more relevant to this work, a source of incon-
sistencies when package components or versions change. It is
therefore important to check if existing package managers can
be decentralised to improve the reliability of the ecosystem.
Widely considered immutable time-stamped data structures,
blockchains implement peer-to-peer networks where parti-
cipants can concurrently verify interactions using decentralised
consensus protocols. Blockchain smart contracts allow us to
store data and execute functions on them in such a decentralised
setup. Once a smart contract is deployed, transactions can be
sent to the contract; in our case, transactions are the versions
or new packages submitted by developers. The changes made
by the transactions to our data structure are “mined” (verified)
and broadcast to the entire network. A change, once mined,
cannot be reverted.
This paper is organised as follows. Section II discusses
package managers and existing blockchain systems. Section III
outlines our proposed method to manage software packages
with blockchain, including the proposed algorithms for smart
contracts. Section IV describes the proposed system. Section V
covers the implementation of versions on the smart contracts:
Solidity was used to write the smart contracts, and peer-to-peer
storage was used to upload the packages. The version tree,
which shows how one version differs from another, is described,
and the pattern used to store the contract also lets us seamlessly
decouple the processing logic from the storage. Section VI
presents our evaluation using 4338 packages along with some
of their versions; these include the most depended-upon
packages and key utilities. The section also outlines the bandwidth
requirements of the blockchain node and the latency involved
in pulling packages from the system, and shows part of a
network graph modelled directly from the data coming from
the blockchain node. Finally, Section VII presents some
concluding remarks.
Table I
LIST OF LANGUAGES AND THEIR PACKAGE MANAGERS
Language  Package manager
Java      Maven
Node.js   NPM
Python    PyPI
C#        NuGet
PHP       Packagist
Ruby      RubyGems
II. LITERATURE REVIEW
Traditionally, software has been created using a waterfall
model where every change goes through some pre-requisite
number of stages. With the advent of open-source and dy-
namic collaborative environments, rapid application devel-
opment has increasingly become the norm for applicative
environments [3]. Agile, Scrum, Extreme Programming, and
other rapid application development methodologies have be-
come more popular, since they allow changes to be added
dynamically, leading to continuous implementation using a
backlog. Developers pick software features from the backlog
and releases are made at shorter durations compared to the tra-
ditional model, leading to adaptive software development [4].
While software modularisation has been previously studied
in the literature using different approaches [5], [6], the grass-
roots distributed administration of software packages remains
an open problem. Code bases are constantly evolving over time
and version control tools like Git and Subversion are widely
used to manage versions and control changes.
Managing dynamic versions and software dependencies
is complicated, as the program dependency graph for large
programs is long known to be difficult to handle [7].
Microservice-based cloud architecture—where package de-
pendencies can be linked to different packages—then increases
the challenge at hand [8], since the search space is significantly
large to completely understand conflicts between dependen-
cies.
The term ’DLL hell’ has been coined to describe the coexistence
of many different versions of the same library [9]. Programs using
different versions of the same library tend to break when
there are major changes in the package, so package managers
are expected to be ‘intelligent enough’ to handle different
versions of the same package. A CUDF (Common Upgrade
Description Format) document was proposed to keep track
of the package definition and its dependencies [10], similar to
pip's requirements.txt file or NPM’s package.json.
However, modern-day application package managers such as
NPM can hold multiple versions of a package [11]: common
version requirements are kept in a shared directory and alternate
versions are kept local to the package, which helps eliminate
collisions between versions. Some package managers use
semantic versioning, or ’semver’, principles. These principles
should be clearly understood by both authors and users: authors
must always release breaking changes as major versions, and
users must carefully review the changes in packages to avoid
build errors.
Figure 1. Transitive nature of packages.
Failure to understand these principles puts build systems at risk.
Versions are divided into three parts: major, minor, and patch
[12]. Major version bumps are for breaking changes in the API
or when large parts of the package are rewritten. Minor versions
are for new feature additions to the existing set without breaking
changes. A patch is a bug fix which is backward compatible.
A security-oriented management framework, CHAINIAC,
has been used to verify integrity and authenticity for software-
release processes based on decentralised nodes [13]. However,
CHAINIAC does not appear to address the immutability issue,
as changes in versions may eventually break compilation and
software components.
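These semantic-versioning rules can be checked mechanically. The following sketch (illustrative JavaScript; the function names are ours, not part of the paper's system) classifies the change between two versions:

```javascript
// Parse a 'major.minor.patch' string into its numeric parts.
function parseSemver(v) {
  const [major, minor, patch] = v.split('.').map(Number);
  return { major, minor, patch };
}

// Classify an upgrade as 'major' (breaking), 'minor' (backward-compatible
// feature), or 'patch' (backward-compatible bug fix).
function changeKind(from, to) {
  const a = parseSemver(from);
  const b = parseSemver(to);
  if (a.major !== b.major) return 'major';
  if (a.minor !== b.minor) return 'minor';
  return 'patch';
}
```

A client could use such a check to warn users before installing a major bump of a dependency.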
Table I has a list of selected languages along with their
package managers.
Having openly released their architecture for dependency
management, NPM is the package manager for JavaScript and
is also widely considered among the largest code repositories
in the world [14].
This work then focuses on the application of smart contracts
to maintain an immutable decentralised change control system
for packages and versions. It can assure the provenance of
a given set of packages to marshal the correct development
and deployment for a given set of new packages. We have
evaluated our work using 4338 packages from NPM.
III. METHOD
Every package manager has to deal with direct and transitive
dependencies. The transitivity can be seen in Figure 1, where
the nodes are packages and the edges represent dependence.
Package F depends on C1, C2, and C3, and packages C1, C2,
and C3 in turn depend directly on other packages.
Here, each C clause has three versions, each of which
depends on different packages; for example, C1:0 depends on
x1:0 and C1:1 depends on x2:0. A dependency chart of this
kind was used for the 3SAT reduction shown by [15]. If the x
packages were to be removed, it would affect F and all C
packages; the removal of x would lead to a situation similar
to the left-pad incident explained by [2].
A. Data structure
The data schema can best be seen as the tree shown in Figure 2.
The root node holds the package name, the inner nodes are
version numbers, and the leaf node is the actual package
information. Figure 3 summarises the data stored for each
version. Every package would have its own package
Figure 2. Placement of new versions on the version tree.
{
  owner: 'example_owner_id',
  dependencies: {
    'Package1': '1.1.2',
    'Package2': '1.2.1'
  },
  link: 'example_link',
  checksum: 'example_checksum'
}
Figure 3. Crypto assets stored for each version.
tree, which would evolve over time. The data structure is
flexible enough to support both the major-minor versioning
technique and semantic versioning. An oversimplification of
this data structure would be two nested maps pointing to a
resource; using a hashmap keeps the time complexity at O(1).
The package information contains what is needed to install
the package: dependencies, link, and dependents. The package
dependencies are the packages which will be installed alongside
the package; the version of each dependency matters here, so
that exactly the dependency the package needs is installed.
The data can be fed to the client either in a single go or
separately for every package. The ownerId helps us identify
the owner of the package. The package ecosystem can be
viewed as a giant graph where each package can depend on
other packages or be depended upon by others. The link is the
pointer to the package binary, which will be stored using the
InterPlanetary File System (IPFS).
Developers can host the Ethereum nodes in their environ-
ment or use a centralised service which gives them access to a
node. Some users would like to keep a copy of the blockchain
in the event the network goes down. The usage of the node
or the network will be configurable via the package manager
client.
B. Storage solution
We need to keep the bare minimum information on the
ledger so that the intermediation of metadata on the peer to
peer network is fast. A compressed version of the package
would be kept on the IPFS. IPFS is a peer to peer storage
solution. While uploading the package file to IPFS we receive
an immutable hash. This hash can be used to retrieve the file in
the future. IPFS uses a Distributed hash table to store the hash
and the data is stored locally in the node where it is published.
Any other node which requests a file has to download the file
from the nearest node and keep it locally.
We use a Coral distributed hash table to store the data,
which prioritises locality [16]. A replication factor can be
added to replicate blocks. The returned hash is stored on the
smart contract. IPFS internally uses a BitTorrent-like protocol,
opening connections to many different peers and downloading
pieces of the file simultaneously. By using IPFS, we ensure
that no part of the architecture is centralised. IPFS is also used
for storing web archival records, where the payloads are stored
in IPFS and the indexes are stored in a format called CDXJ
[17]. CDXJ is an extension of CDX with JSON support. CDXJ
plays a similar role to the blockchain node in our system,
except that it is mutable.
C. Algorithms
The algorithm outlined is used for storing a package which
the developer wants to make available. The client side would
have to provide the owner name, package name, version, and
dependencies. The package version, name, and dependencies
are extracted from the dependency file on the client. The
package to be published is stored in the data structure men-
tioned above which will contain trees for all packages. The
complexity of this algorithm is O(1). The package Info is the
same as the mentioned above. It will contain the dependencies,
dependency versions and link to the current package.
Algorithm 2 is used for downloading a package to be installed.
The asymptotic complexity of installing the entire package
depends on the number of dependencies in the package;
requesting a single package with its version from the storage
layer has a time complexity of O(1). Web3 does not yet support
passing structures downstream. As and when support is added
for returning dynamic structures, the model contract can hold
the processing logic to collect the package with all its
dependencies. This would decouple storage from processing,
and the model contract could be changed over time.
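Algorithms 1 and 2 below can be transliterated into plain JavaScript as a sketch of the contract logic (the deployed version is Solidity; this sketch also creates the nested maps on first use, which the pseudocode leaves implicit):

```javascript
// In the contract, `packages` is Solidity storage; here it is a nested
// object: package name -> major -> minor -> patch -> packageInfo.
const packages = {};

// Algorithm 1: publish a package version. Fails if the version already
// exists, since published versions are immutable.
function publishPackage(pn, packageInfo, major, minor, patch) {
  packages[pn] = packages[pn] || {};
  packages[pn][major] = packages[pn][major] || {};
  packages[pn][major][minor] = packages[pn][major][minor] || {};
  if (packages[pn][major][minor][patch] != null) return false;
  packages[pn][major][minor][patch] = packageInfo;
  return true;
}

// Algorithm 2: three constant-time map lookups, hence O(1) per package.
function getPackage(pn, major, minor, patch) {
  const packageInfo = packages[pn]?.[major]?.[minor]?.[patch];
  if (packageInfo == null) return { success: false, result: [] };
  return { success: true, result: [packageInfo, packageInfo.dependencies] };
}
```

Collecting the transitive closure of dependencies would repeat the getPackage lookup for each entry returned, which is what the model contract could eventually do on-chain.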
D. Diagrams
Class diagram
Contracts are immutable in nature. In a centralised setup, we
would simply update the code and deploy it to all the servers;
contracts, however, cannot be updated, so new contracts need
to be deployed. We need to make sure we do not lose references
to our storage layer, which is kept separate so that a reference
is always maintained. All the package metadata would be stored within
Figure 4. Class Diagram for the Contract Deployment Pattern.
Algorithm 1 Publish algorithm
 1: procedure PUBLISHPACKAGE
    *Smart contract storage
 2:   packages ← package map
    *inputs
 3:   pn ← name of the package
 4:   packageInfo ← packageInfo
 5:   v ← version
 6:   major ← version major
 7:   minor ← version minor
 8:   patch ← version patch
 9:   if packages[pn] then
10:     if packages[pn][major][minor][patch] == null then
11:       packages[pn][major][minor][patch] ← packageInfo
12:       success ← true
13:     else
14:       success ← false
15:     end if
16:   end if
    return success
17: end procedure
the storage contract. The relationships with other packages
are stored in the contract as well. Whenever new contracts are
deployed, the client needs to be notified of the address of
the new contract. This becomes an issue because a client update
would have to be released on every contract change.
To overcome this issue, we propose a register-contract which
contains a reference to the main contract. The client first
executes the register-contract to get the address
of the main contract. Once the address is received, it will be
Algorithm 2 Install package algorithm
 1: procedure GETPACKAGE
    *Smart contract storage
 2:   packages ← map of packages
    *inputs
 3:   major ← version major
 4:   minor ← version minor
 5:   patch ← version patch
 6:   result ← []
 7:   if packages[pn][major][minor][patch] != null then
 8:     packageInfo ← packages[pn][major][minor][patch]
 9:     dependencies ← packageInfo[’dependencies’]
10:     result ← packageInfo, dependencies
11:     success ← true
12:   else
13:     success ← false
14:   end if
    return success, result
15: end procedure
able to get the latest code of the contract.
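This indirection can be sketched as follows (illustrative JavaScript simulating the two contracts in memory; all names and addresses are hypothetical, and the real system implements this pattern in Solidity):

```javascript
// The register-contract stores only the address of the current main
// contract. Clients hard-code the register's address and resolve the main
// contract at run time, so redeployments need no client update.
const registerContract = { mainAddress: '0xMainV1' };

const deployed = {
  '0xMainV1': { version: 1, getPackage: (pn) => `v1 lookup of ${pn}` },
};

// Deploying a new main contract updates the single pointer in the register.
function upgrade(newAddress, contract) {
  deployed[newAddress] = contract;
  registerContract.mainAddress = newAddress;
}

// Client call: resolve the main contract via the register, then invoke it.
function clientCall(pn) {
  const main = deployed[registerContract.mainAddress];
  return main.getPackage(pn);
}
```

The trade-off is one extra call per client operation in exchange for never losing the storage reference across contract upgrades.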
Sequence diagrams
Figure 5 shows the interaction between the client,
the node, and the IPFS server during an installation. In HDFS,
blocks do not flow through the name-node; the name-node
merely keeps a log of all the blocks on the data nodes [18].
Similarly, in our system the file data does not go through the
blockchain nodes but is fetched from the IPFS servers. This
increases throughput and also avoids the blockchain
intermediation issue.
Figure 5. Package installation.
Figure 6. Publishing a package.
Figure 6 shows the interaction between the client, IPFS and
the Blockchain node. The package to be published is converted
to a tar file and uploaded to IPFS. IPFS returns a hash link
which helps us uniquely identify the file when needed. We
take this hash link and send it to the blockchain node which
stores the package information.
IV. PROPOSED SYSTEM
The proposed system employs blockchain-based smart
contracts to decentralise package management. Blockchain-based
decentralised systems have been shown to avoid a single
point of failure and to improve resource utilisation [19]. In
our case, we have employed the blockchain network Ethereum,
which uses Solidity for the creation of smart contracts.
The smart contract storage is used for the metadata of the
packages. The contract is immutable: once a package is
published, it is set in cryptographic stone. IPFS gives us
peer-to-peer storage, which is used for the binaries. The IPFS
cluster module can be used over the IPFS server to ensure
there are replicas of the published binaries. Table II compares
existing systems with the proposed method.
Table II
COMPARISON OF EXISTING SYSTEMS AND THE PROPOSED METHOD
Variable          Maven   NuGet   NPM    Proposed Method
Decentralised     No      No      No     Yes
Write throughput  High    High    High   Low
Read throughput   High    High    High   High
Immutable         No      No      No     Yes
A blockchain can be construed as a network rather than a
technology or system. The smart contract storage is essentially
a public database which is immutable and decentralized. It has
primarily been used in financial systems and only came into
prominence when Nakamoto released the Bitcoin paper [20].
The system has no central authority for verifying transactions;
instead, a group of miners verifies them. The miners are paid
for the electricity and CPU time spent verifying transactions.
Ethereum, itself also a cryptocurrency, is considered a natural
extension of Bitcoin. It is a Turing-complete platform where
users can program and deploy bytecode to the Ethereum
nodes [21]. This opened interesting opportunities for users to
harness the full strength of the network. The concept of smart
contracts allows users to add custom logic to the Ethereum
nodes, and smart contract functions can be called from thin
clients such as browsers, which we call decentralised
applications. Transactions sent to the blockchain are mined by
the Ethereum community miners, and there is a gas fee which
miners receive for the blocks they add to the system. Ethereum
uses a distributed ledger for transactions and provides a
decentralised architecture for smart contract execution. The use
of a distributed ledger ensures reliability, as changes are
replicated across all the nodes; the downtime of some nodes
does not affect the others.
V. IMPLEMENTATION
Table III shows the software requirements for the imple-
mentation. The Geth binary was used to run the Ethereum
node. The web3 package was used for the client-side
implementation, and Solidity was used to write the smart contract.
The web3 client provides a convenient interface for interacting
with Ethereum-based smart contracts. A CentOS instance was
created on OpenStack to run the Geth binary and the IPFS server.
The setup can be used as a decentralised system or in
collaboration with another package manager as shown in
Figure 7. The decentralised system requires users to identify
themselves to the network and to hold sufficient funds to
submit transactions; every user would need an Ethereum
address and would publish packages with their own address.
Collaboration with other package managers would allow anyone
to submit transactions without an Ethereum address: publishing
would be done by a single user address shared by the services,
so the collaboration mode does not require an Ethereum address
from the user.
Table III
SOFTWARE REQUIREMENTS.
Software Version
Geth v1.8.13
Solidity v0.4.24
Node.js v6.9.1
IPFS v0.4.17
web3 v0.18.4
In a decentralised system, every user would own their
packages, meaning that only the owner could add new versions
to them. In the collaborative system, a single user would own
many packages. The collaborative system shown in Figure 7
can potentially be used where users need high reliability but
do not want to create wallets to maintain packages.
The NPM listener pushes metadata to interested services.
The metadata consists of the name, version, license etc. A
daemon was created to get the metadata from NPM, and a
queue was placed between the listener and the workers to
improve flow control. This also improved the package-publishing
throughput of the system.
Another CentOS instance was used to run the workers and
the service. The messages are given to the workers in a round
robin fashion. The workers are the processes which upload the
binaries to the peer to peer storage and push the metadata on
the Ethereum network. The workers can be horizontally scaled
out to improve job throughput. The workers are decoupled
from the listener. The listener is a single point of failure in
this architecture.
The smart contract models the dependency graph of each
package: a graph network can be visualised, with the contract
storing all the dependencies and dependents of each package.
The public nature of Ethereum makes it well suited for
open-source packages. The storage contract holds the data
operations and is primarily the data layer of the system. A
model contract, as shown in Figure 4, can be used to improve
the algorithm or add other features on top of the storage layer.
This pattern also ensures that we do not lose the reference
to our storage data, since contracts are immutable and structures
cannot be changed once a contract is deployed.
Web3 v0.20 was chosen over the v1.0 beta because it has been
around longer. Some of the experimental Solidity features for
string arrays were also tried; the bytes array was found to be
more stable than the string array in Solidity. This, however,
incurs the additional cost of converting the byte elements to
strings after retrieving them from the contract.
VI. RESULTS
Our evaluation has been performed using 4338 packages,
including the top 950 packages which are used directly and
transitively by other packages as documented in [22].
It has been observed that if the Ethereum node is bombarded
with low-price transactions, these get stuck on the node as
pending transactions. These transactions are removed from the
Table IV
LATENCY TEST FOR GETTING A PACKAGE FROM THE SYSTEM.
Number of packages   Mean (ms)   Median (ms)   Standard deviation (ms)
250                  148.3       146           8.73
500                  149.578     147           9.23
1000                 148.991     147           9.05
Table V
LATENCY TEST FOR GETTING A PACKAGE FROM THE SYSTEM USING AN
ASYNCHRONOUS APPROACH.
Number of packages   Mean (ms)
250                  0.748
500                  0.67
1000                 0.559
pending transactions buffer once the transactions have been
verified by the network.
Consequently, we have to set the gas price appropriately for
the transactions to be mined quickly: if the price offered is too
low, the transactions will not be mined immediately because
miners will have found more lucrative transactions. Our initial
gas price was 1 gwei per transaction; using a simple
demand-supply iterative function, it was eventually increased
to 5 gwei.
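Such a demand-supply adjustment can be sketched as a simple feedback rule. The sketch below is hypothetical: the 1 gwei start and 5 gwei cap reflect the values reported above, but the threshold, step, and exact function used in the experiments are illustrative assumptions:

```javascript
// Bump the gas price (in gwei) while transactions keep piling up as
// pending, capped at 5 gwei. All parameters are illustrative.
function nextGasPrice(currentGwei, pendingCount, { threshold = 10, step = 1, cap = 5 } = {}) {
  if (pendingCount > threshold) {
    return Math.min(currentGwei + step, cap); // demand outstrips supply: pay more
  }
  return currentGwei; // transactions are being mined promptly: hold the price
}
```

Iterating this rule from 1 gwei under sustained congestion converges to the 5 gwei cap.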
Our tests ran for a week. It was initially noticed that the
workers would crash due to insufficient funds. As the funds
come from the faucet, which happens to be rate limited, we
decided to slowly submit transactions and keep adaptively
funding the address with more ether.
Bandwidth monitoring was performed on the instance where
the blockchain node was installed. The results are plotted in
Figure 8. The evaluation involved requesting a list of packages
along with their versions present on the node. The metadata
for these packages was then requested individually.
Nload was used to track the bandwidth of the instance.
There is a sharp increase in outgoing bandwidth, up to
650.48 kbit/s, which occurs when all the events are requested
from the node. Requesting the events for the packages caused
the bandwidth to spike; the outgoing bandwidth then gradually
decreased to 396.32 kbit/s as the remaining package metadata
was requested.
A latency test was conducted on the blockchain node; the
results are given in Tables IV and V.
It was noticed that the mean latency to get the package
metadata from the Ethereum node is approximately 148 ms.
This test includes file I/O time and was performed with the
synchronous API. This approach is not scalable, as the client
libraries run on the user's operating system. A test conducted
with the asynchronous API reduced the mean latency to under
a millisecond. The cost to get the overall package would be
higher, as there is no handling for parsing dynamic structures
on the web3 client; once dynamic structures are stably
supported, metadata for many packages can be requested at
once.
Figure 7. System architecture for publishing new packages
Figure 8. Empirical evaluation of bandwidth usage (bandwidth in kbit/s over roughly 100 seconds).
The block synchronisation of Ethereum nodes depends on
other peers. The Ethereum node should be fully synchronized
in order to serve data correctly. The full synchronization
mode requires a lot of time as the chain is very large. The
light mode, on the other hand, is the fastest option but does
not have much support from peers and is experimental. The
fast synchronization mode just downloads the block headers
instead of the entire block and it also has better support from
the community compared to the light mode.
The testing was done with a single IPFS node. The IPFS
cluster feature can be used to make sure replicas are
distributed across cluster peers in the network. An IPFS node
which does not participate in the cluster can also request a
binary from the IPFS network on demand, in which case it
holds only a selected number of package binaries. A replication
factor can be chosen to replicate the packages to a minimum
number of cluster peers.
Figure 9 presents a directed graph with all first level
dependencies of the NPM repository. It is noted that a number
of packages in this Javascript ecosystem can have multiple
levels of dependencies. The graph is directly derived from
the data received from the Ethereum node using first level
dependencies only. The highest number of direct dependents
observed in the tested subset is 361. Some packages,
such as lodash, commander and body-parser, show
an increasing number of first-level dependents, and this number
is likely to grow, as previously shown [22]. These packages
are critical to the community as they form the basis of most
development.
Some packages recorded in the ledger have cyclic depend-
encies. The impact of the cyclic nature of the dependencies has
not been fully studied, but we argue that it may be detrimental
to the overall traceability and version control for a repository.
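Such cycles can be detected with a standard depth-first search over the first-level dependency graph recorded in the ledger (illustrative JavaScript, not part of the deployed system):

```javascript
// Detect a cycle in a dependency graph given as { package: [dependencies] }
// using DFS with three node states: unvisited, on the current path, done.
function hasCycle(graph) {
  const state = new Map(); // 'grey' = on the current DFS path, 'black' = finished
  function visit(node) {
    if (state.get(node) === 'grey') return true;  // back edge: cycle found
    if (state.get(node) === 'black') return false;
    state.set(node, 'grey');
    for (const dep of graph[node] || []) {
      if (visit(dep)) return true;
    }
    state.set(node, 'black');
    return false;
  }
  return Object.keys(graph).some((n) => visit(n));
}
```

Running such a check at publish time would let the contract, or a model contract layered on it, reject versions that introduce cyclic dependencies.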
VII. CONCLUSIONS
The proposed system is arguably a reliable decentralised
architecture for package management. The system relies on
Ethereum to keep systematic track of software dependencies
and preserve provenance. Specifically, smart contracts have
been systematically used to integrate our logic onto the
Ethereum network, while IPFS has been proposed for storing
the actual binaries. We strongly believe this work should
eventually have some impact on the open-source ecosystem,
as it can be coupled to existing repositories to maintain a
cohesive chain of versions and dependencies in software
packages.
The Ethereum network can have any number of nodes in the
network at a time. The users can install the Ethereum node on
their infrastructure or use it as a service in order to keep track
of the dependencies. The underlying architecture can also be
used for any kind of dependency management with tweaks to
the client and version changes in the contract; the native client
will, however, change according to the platform. The Ethereum
blockchain network has a good read throughput compared to
its write throughput, which could be ideal for dependency
management, since the current state of the art receives billions
of installation requests.
Further work should study dependencies at multiple levels,
as our current work has only taken single-level dependencies
into account.
Ideally, it may be useful to study in detail cliques which can
uncover not only functional software properties, but also non-
functional characteristics such as developer relationships, code
(a) Step 1 (b) Step 2
Figure 9. 2-step sequence of the directed graph of packages generated by our system. The centrality of a number of packages (e.g. lodash, commander)
identifies their importance within a specific repository, in this case NPM.
styles, and other useful traits. One distinct possibility is to
build upon our previous work on software migration (pattern-
based [23] and document to graph NoSQL databases [24]) to
enable multi-level, multi-dependency component migration.
REFERENCES
[1] C. W. Krueger, “Software reuse,” ACM Computing Surveys, vol. 24,
no. 2, pp. 131–183, Jun. 1992.
[2] A. Decan, T. Mens, and M. Claes, “On the topology of package
dependency networks: A comparison of three programming language
ecosystems,” in ECSA ’16. Copenhagen: ACM, 2016, pp. 21:1–21:4.
[3] N. B. Ruparelia, “Software development lifecycle models,” SIGSOFT
Softw. Eng. Notes, vol. 35, no. 3, pp. 8–13, May 2010.
[4] J. Highsmith and A. Cockburn, “Agile software development: the
business of innovation,” Computer, vol. 34, no. 9, pp. 120–127, Sep
2001.
[5] D. Qiu, Q. Zhang, and S. Fang, “Reconstructing software high-level
architecture by clustering weighted directed class graph,” International
Journal of Software Engineering and Knowledge Engineering, vol. 25,
no. 04, pp. 701–726, 2015.
[6] R. Naseem, O. Maqbool, and S. Muhammad, “Cooperative clustering
for software modularization,” Journal of Systems and Software, vol. 86,
no. 8, pp. 2045–2062, 2013.
[7] K. J. Ottenstein and L. M. Ottenstein, “The program dependence graph
in a software development environment,” SIGPLAN Not., vol. 19, no. 5,
pp. 177–184, Apr. 1984.
[8] G. Toffetti, S. Brunner, M. Blöchlinger, J. Spillner, and T. M. Bohnert, "Self-managing cloud-native applications: Design, implementation, and experience," Future Generation Computer Systems, vol. 72, pp. 165–179, 2017.
[9] C. Tucker, D. Shuffelton, R. Jhala, and S. Lerner, “OPIUM: Optimal
package install/uninstall manager,” in ICSE ’07. Minneapolis: IEEE,
2007, pp. 178–188.
[10] P. Abate, R. D. Cosmo, R. Treinen, and S. Zacchiroli, “Dependency
solving: A separate concern in component evolution management,”
Journal of Systems and Software, vol. 85, no. 10, pp. 2228–2240, 2012.
[11] I. Schlueter. (2010) The node package manager and registry. (Last
accessed:1/Dec/2018). [Online]. Available: https://www.npmjs.org
[12] Semantic Versioning user guide, 2013, (Last accessed:1/Dec/2018).
[Online]. Available: https://semver.org/
[13] K. Nikitin, E. Kokoris-Kogias, P. Jovanovic, N. Gailly, L. Gasser,
I. Khoffi, J. Cappos, and B. Ford, “CHAINIAC: proactive software-
update transparency via collectively signed skipchains and verified
builds,” in USENIX Security 2017. Vancouver: USENIX Association,
Aug. 2017, pp. 1271–1287.
[14] E. Wittern, P. Suter, and S. Rajagopalan, “A look at the dynamics of the
JavaScript package ecosystem,” in MSR’16. Austin: ACM/IEEE, May
2016, pp. 351–361.
[15] R. Cox, “Version SAT,” 2016, (Last accessed:1/Dec/2018). [Online].
Available: https://research.swtch.com/version-sat
[16] J. Benet. (2014) IPFS-content addressed, versioned, P2P file system.
(Last accessed:1/Dec/2018). [Online]. Available: https://arxiv.org/pdf/
1407.3561.pdf
[17] S. Alam, M. Kelly, and M. L. Nelson, “Interplanetary wayback: The
permanent web archive,” in JCDL ’16. Newark: ACM, 2016, pp. 273–
274.
[18] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The hadoop
distributed file system,” in MSST’10. Lake Tahoe: IEEE, May 2010,
pp. 1–10.
[19] F. Hawlitschek, B. Notheisen, and T. Teubner, “The limits of trust-
free systems: A literature review on blockchain technology and trust in
the sharing economy,” Electronic Commerce Research and Applications,
vol. 29, pp. 50–63, 2018.
[20] S. Nakamoto. (2008) Bitcoin: A peer-to-peer electronic cash system.
(Last accessed:1/Dec/2018). [Online]. Available: https://bitcoin.org/
bitcoin.pdf
[21] V. Buterin et al., “A next-generation smart contract and decentralized
application platform,” Tech. Rep., 2014, (Last accessed:1/Dec/2018).
[Online]. Available: https://github.com/ethereum/wiki/wiki/White-Paper
[22] A. Kashcha. (2018) Npm rank. (Last accessed:1/Dec/2018). [Online].
Available: https://gist.github.com/anvaka/8e8fa57c7ee1350e3491
[23] S. Boob, H. González-Vélez, and A. M. Popescu, "Automated instantiation of heterogeneous fast flow CPU/GPU parallel pattern applications in clouds," in PDP 2014. Torino: IEEE, Feb. 2014, pp. 162–169.
[24] A. Bansel, H. González-Vélez, and A. E. Chis, "Cloud-based NoSQL data migration," in PDP 2016. Heraklion: IEEE, Feb. 2016, pp. 224–231.