Distributed Software Dependency Management
using Blockchain
Gavin D’mello, Horacio González-Vélez
Cloud Competency Centre, National College of Ireland
http://www.ncirl.ie/cloud
dmellogavin5000@gmail.com, horacio@ncirl.ie
Abstract—Contemporary software deployments rely on cloud-
based package managers for installation, where existing packages
are installed on demand from remote code repositories. Usually
frameworks or common utilities, packages increase the code
reusability within the ecosystem, whilst keeping the code base
small. However, disruptions in the package management services
can potentially affect development and deployment workflows.
Furthermore, cloud package managers have arguably an ambigu-
ous ownership model and offer limited visibility of packages to
the users. This work describes the development of a blockchain-
based package control system which is decentralised, reliable, and
transparent. Blockchain nodes are installed within the distributed
infrastructure to provide immutability, and then a dependency
graph is constructed with the help of smart contracts to trace
the software provenance. Our system has been successfully tested
with 4338 packages from NPM, of which 950 are among the top
depended-upon packages.
Index Terms—Software Reuse; Blockchain; Cloud Computing;
Software packaging; Smart Contracts
I. INTRODUCTION
Long considered a key practice in the industry, software re-
use entails the creation of new software systems using existing
software packages [1]. Typically composed of complementary
software modules, most software packages are made available
as common utility tools or frameworks which are used by
millions of users and can in turn be used by other packages.
With the advent of microservices and cloud architectures,
package reuse has increased, as each service has its own lifetime
and state to manage independently of other services. Each
language and community tends to have a different
package manager, and the proliferation of online version control
tools such as GitHub and Bitbucket has led to the creation
of a wider range of interdependent software components [2].
Package managers provide a platform for code sharing. Re-
liable application package managers are of prime importance
to software developers. Most packages need to be installed
immediately before the deployment phase.
Software packages tend to have direct and transitive de-
pendencies on other packages, which make them vulnerable
and/or prone to failure if any dependency is unpublished
or compromised. Dependencies are not necessarily straight-
forward and can have multiple nesting levels. For example,
the package libcurl, which is used for sending HTTP
requests, depends on other packages like zlib which is
used to compress data. Any failure in getting the package
metadata or binaries could lead to build failures and hinder
the development processes. An example of such a scenario
is the left-pad problem discussed by [2], which led to many
installation and build failures on NPM; two percent of the
transitive package installations failed in this event.
While different software package managers offer mirrors
and streaming, most package managers are heavily centralised
in their architectures, which can present a single point of
failure and, more relevant to this work, a source of incon-
sistencies when package components or versions change. It is
therefore important to check if existing package managers can
be decentralised to improve the reliability of the ecosystem.
Widely considered immutable time-stamped data structures,
blockchains implement peer-to-peer networks where parti-
cipants can concurrently verify interactions using decentralised
consensus protocols. Blockchain smart contracts allow us to
store data and execute functions on them in such a decentralised
setup. Once a smart contract is deployed, transactions can be
sent to the contract; in our case, transactions are the versions
or new packages submitted by developers. The changes made
by the transactions to our data structure are “mined” (verified)
and broadcast to the entire network. A change, once mined,
cannot be reverted.
This paper is organised as follows. Section II discusses
package managers and existing blockchain systems. Section III
outlines our proposed method to manage software packages
with blockchain, including the proposed algorithms for smart
contracts. Section IV describes the proposed system. Section V
covers the implementation of versions on the smart contracts:
Solidity was used to write the smart contracts, and peer-to-peer
storage was used to upload the packages. The version tree,
which shows how one version differs from another, is described,
and the pattern used to store the contract also lets us seamlessly
decouple the processing logic from the storage. Section VI
presents our evaluation using 4338 packages along with some
of their versions; these include the most depended-upon
packages and key utilities. The section also outlines the bandwidth
requirements of the blockchain node and the latency involved
in pulling packages from the system, and shows part of a
network graph modelled directly from the data coming from
the blockchain node. Finally, Section VII presents some
concluding remarks.
Table I
LIST OF LANGUAGES AND THEIR PACKAGE MANAGERS
Language  Package manager
Java      Maven
Node.js   NPM
Python    PyPI
C#        NuGet
PHP       Packagist
Ruby      RubyGems
II. LITERATURE REVIEW
Traditionally, software has been created using a waterfall
model where every change goes through some pre-requisite
number of stages. With the advent of open-source and dy-
namic collaborative environments, rapid application devel-
opment has increasingly become the norm for applicative
environments [3]. Agile, Scrum, Extreme Programming, and
other rapid application development methodologies have be-
come more popular, since they allow changes to be added
dynamically, leading to continuous implementation using a
backlog. Developers pick software features from the backlog
and releases are made at shorter durations compared to the tra-
ditional model, leading to adaptive software development [4].
While software modularisation has been previously studied
in the literature using different approaches [5], [6], the grass-
roots distributed administration of software packages remains
an open problem. Code bases are constantly evolving over time
and version control tools like Git and Subversion are widely
used to manage versions and control changes.
Managing dynamic versions and software dependencies
is complicated, as the program dependency graph for large
programs is long known to be difficult to handle [7].
Microservice-based cloud architecture—where package de-
pendencies can be linked to different packages—then increases
the challenge at hand [8], since the search space is significantly
large to completely understand conflicts between dependen-
cies.
The term ’DLL hell’ has been coined to describe the coexistence
of many different versions of the same library [9]. Programs using
different versions of the same library tend to break when
there are major changes in the package, so package managers
are expected to be ‘intelligent enough’ to handle different
versions of the same package. A CUDF (Common Upgrade
Description Format) document was proposed to keep track
of the package definition and its dependencies [10], similar to
pip's requirements.txt file or NPM’s package.json.
However, modern-day application package managers such as
NPM can hold multiple versions of a package [11]: common
version requirements are kept in a shared directory and alternate
versions are kept local to the package, which helps eliminate
collisions between versions. Some package managers use
semantic versioning, or ’semver’, principles. These principles
should be clearly understood by both authors and users: authors
must always release breaking changes as major versions, and
users must carefully review the changes in packages to avoid
build errors.
Figure 1. Transitive nature of packages.
Failure to understand these principles puts build systems at risk.
Versions are divided into three parts: major, minor, and patch
[12]. Major version bumps are for breaking changes in the API
or when large parts of the package are rewritten. Minor versions
are for new feature additions to the existing set without breaking
changes. A patch is a bug fix which is backward compatible.
A security-oriented management framework, CHAINIAC,
has been used to verify integrity and authenticity for software-
release processes based on decentralised nodes [13]. However,
CHAINIAC does not appear to address the immutability issue,
as changes in versions may eventually break compilation and
software components.
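These semantic-versioning rules can be checked mechanically. The following sketch (illustrative JavaScript; the function names are ours, not part of the paper's system) classifies the change between two versions:

```javascript
// Parse a 'major.minor.patch' string into its numeric parts.
function parseSemver(v) {
  const [major, minor, patch] = v.split('.').map(Number);
  return { major, minor, patch };
}

// Classify an upgrade as 'major' (breaking), 'minor' (backward-compatible
// feature), or 'patch' (backward-compatible bug fix).
function changeKind(from, to) {
  const a = parseSemver(from);
  const b = parseSemver(to);
  if (a.major !== b.major) return 'major';
  if (a.minor !== b.minor) return 'minor';
  return 'patch';
}
```

A client could use such a check to warn users before installing a major bump of a dependency.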
Table I has a list of selected languages along with their
package managers.
Having openly released their architecture for dependency
management, NPM is the package manager for JavaScript and
is also widely considered among the largest code repositories
in the world [14].
This work then focuses on the application of smart contracts
to maintain an immutable decentralised change control system
for packages and versions. It can assure the provenance of
a given set of packages to marshal the correct development
and deployment for a given set of new packages. We have
evaluated our work using 4338 packages from NPM.
III. METHOD
Every package manager has to deal with direct and transitive
dependencies. The transitivity can be seen in Figure 1, where
the nodes are packages and the edges represent dependence.
Package F depends on C1, C2, and C3, and packages C1, C2,
and C3 in turn depend directly on other packages.
Here, each C clause has three versions, each of which
depends on different packages; for example, C1:0 depends on
x1:0 and C1:1 depends on x2:0. A dependency chart of this
kind was used for the 3SAT reduction shown by [15]. If the x
packages were to be removed, it would affect F and all C
packages; the removal of x would lead to a situation similar
to the left-pad incident explained by [2].
A. Data structure
The data schema can best be seen as the tree shown in Figure 2.
The root node holds the package name, the inner nodes are
version numbers, and the leaf node is the actual package
information. Figure 3 summarises the data stored for each
version. Every package would have its own package
Figure 2. Placement of new versions on the version tree.
{
  owner: 'example_owner_id',
  dependencies: {
    'Package1': '1.1.2',
    'Package2': '1.2.1'
  },
  link: 'example_link',
  checksum: 'example_checksum'
}
Figure 3. Crypto assets stored for each version.
tree, which would evolve over time. The data structure is
flexible enough to support both the major-minor versioning
technique and semantic versioning. An oversimplification of
this data structure would be two nested maps pointing to a
resource; using a hashmap keeps the time complexity at O(1).
The package information contains what is needed to install
the package: dependencies, link, and dependents. The package
dependencies are the packages which will be installed alongside
the package; the version of each dependency matters here, so
that exactly the dependency the package needs is installed.
The data can be fed to the client either in a single go or
separately for every package. The ownerId helps us identify
the owner of the package. The package ecosystem can be
viewed as a giant graph where each package can depend on
other packages or be depended upon by others. The link is the
pointer to the package binary, which will be stored using the
InterPlanetary File System (IPFS).
Developers can host the Ethereum nodes in their environ-
ment or use a centralised service which gives them access to a
node. Some users would like to keep a copy of the blockchain
in the event the network goes down. The usage of the node
or the network will be configurable via the package manager
client.
B. Storage solution
We need to keep the bare minimum information on the
ledger so that the intermediation of metadata on the peer to
peer network is fast. A compressed version of the package
would be kept on the IPFS. IPFS is a peer to peer storage
solution. While uploading the package file to IPFS we receive
an immutable hash. This hash can be used to retrieve the file in
the future. IPFS uses a Distributed hash table to store the hash
and the data is stored locally in the node where it is published.
Any other node which requests a file has to download the file
from the nearest node and keep it locally.
We use a Coral distributed hash table to store the data,
which prioritises locality [16]. A replication factor can be
added to replicate blocks. The returned hash is stored on the
smart contract. IPFS internally uses a BitTorrent-like protocol,
opening connections to many different peers and downloading
pieces of the file simultaneously. By using IPFS, we ensure
that no part of the architecture is centralised. IPFS is also used
for storing web archival records, where the payloads are stored
in IPFS and the indexes are stored in a format called CDXJ
[17]. CDXJ is an extension of CDX with JSON support. CDXJ
plays a similar role to the blockchain node in our system,
except that it is mutable.
C. Algorithms
The algorithm outlined is used for storing a package which
the developer wants to make available. The client side would
have to provide the owner name, package name, version, and
dependencies. The package version, name, and dependencies
are extracted from the dependency file on the client. The
package to be published is stored in the data structure men-
tioned above which will contain trees for all packages. The
complexity of this algorithm is O(1). The package Info is the
same as the mentioned above. It will contain the dependencies,
dependency versions and link to the current package.
Algorithm 2 is used for downloading a package to be installed.
The asymptotic complexity of installing the entire package
depends on the number of dependencies in the package;
requesting a single package with its version from the storage
layer has a time complexity of O(1). Web3 does not yet support
passing structures downstream. As and when support is added
for returning dynamic structures, the model contract can hold
the processing logic to collect the package with all its
dependencies. This would decouple storage from processing,
and the model contract could be changed over time.
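Algorithms 1 and 2 below can be transliterated into plain JavaScript as a sketch of the contract logic (the deployed version is Solidity; this sketch also creates the nested maps on first use, which the pseudocode leaves implicit):

```javascript
// In the contract, `packages` is Solidity storage; here it is a nested
// object: package name -> major -> minor -> patch -> packageInfo.
const packages = {};

// Algorithm 1: publish a package version. Fails if the version already
// exists, since published versions are immutable.
function publishPackage(pn, packageInfo, major, minor, patch) {
  packages[pn] = packages[pn] || {};
  packages[pn][major] = packages[pn][major] || {};
  packages[pn][major][minor] = packages[pn][major][minor] || {};
  if (packages[pn][major][minor][patch] != null) return false;
  packages[pn][major][minor][patch] = packageInfo;
  return true;
}

// Algorithm 2: three constant-time map lookups, hence O(1) per package.
function getPackage(pn, major, minor, patch) {
  const packageInfo = packages[pn]?.[major]?.[minor]?.[patch];
  if (packageInfo == null) return { success: false, result: [] };
  return { success: true, result: [packageInfo, packageInfo.dependencies] };
}
```

Collecting the transitive closure of dependencies would repeat the getPackage lookup for each entry returned, which is what the model contract could eventually do on-chain.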
D. Diagrams
Class diagram
Contracts are immutable in nature. In a centralised setup, we
would simply update the code and deploy it to all the servers;
contracts, however, cannot be updated, so new contracts need
to be deployed. We need to make sure we do not lose references
to our storage layer, which is kept separate so that a reference
is always maintained. All the package metadata would be stored within
Figure 4. Class Diagram for the Contract Deployment Pattern.
Algorithm 1 Publish algorithm
 1: procedure PUBLISHPACKAGE
    *Smart contract storage
 2:   packages ← package map
    *inputs
 3:   pn ← name of the package
 4:   packageInfo ← packageInfo
 5:   v ← version
 6:   major ← version major
 7:   minor ← version minor
 8:   patch ← version patch
 9:   if packages[pn] then
10:     if packages[pn][major][minor][patch] == null then
11:       packages[pn][major][minor][patch] ← packageInfo
12:       success ← true
13:     else
14:       success ← false
15:     end if
16:   end if
    return success
17: end procedure
the storage contract. The relationships with other packages
are stored in the contract as well. Whenever new contracts are
deployed, the client needs to be notified of the address of
the new contract. This becomes an issue because a client update
would have to be released on every contract change.
To overcome this issue, we propose a register-contract which
contains a reference to the main contract. The client first
executes the register-contract to get the address
of the main contract. Once the address is received, it will be
Algorithm 2 Install package algorithm
 1: procedure GETPACKAGE
    *Smart contract storage
 2:   packages ← map of packages
    *inputs
 3:   major ← version major
 4:   minor ← version minor
 5:   patch ← version patch
 6:   result ← []
 7:   if packages[pn][major][minor][patch] != null then
 8:     packageInfo ← packages[pn][major][minor][patch]
 9:     dependencies ← packageInfo[’dependencies’]
10:     result ← packageInfo, dependencies
11:     success ← true
12:   else
13:     success ← false
14:   end if
    return success, result
15: end procedure
able to get the latest code of the contract.
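This indirection can be sketched as follows (illustrative JavaScript simulating the two contracts in memory; all names and addresses are hypothetical, and the real system implements this pattern in Solidity):

```javascript
// The register-contract stores only the address of the current main
// contract. Clients hard-code the register's address and resolve the main
// contract at run time, so redeployments need no client update.
const registerContract = { mainAddress: '0xMainV1' };

const deployed = {
  '0xMainV1': { version: 1, getPackage: (pn) => `v1 lookup of ${pn}` },
};

// Deploying a new main contract updates the single pointer in the register.
function upgrade(newAddress, contract) {
  deployed[newAddress] = contract;
  registerContract.mainAddress = newAddress;
}

// Client call: resolve the main contract via the register, then invoke it.
function clientCall(pn) {
  const main = deployed[registerContract.mainAddress];
  return main.getPackage(pn);
}
```

The trade-off is one extra call per client operation in exchange for never losing the storage reference across contract upgrades.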
Sequence diagrams
Figure 5 shows the interaction between the client,
the node, and the IPFS server during an installation. In HDFS,
blocks do not flow through the name-node; the name-node
merely keeps a log of all the blocks on the data nodes [18].
Similarly, in our system the file data does not go through the
blockchain nodes but is fetched from the IPFS servers. This
increases throughput and also avoids the blockchain
intermediation issue.
Figure 5. Package installation.
Figure 6. Publishing a package.
Figure 6 shows the interaction between the client, IPFS and
the Blockchain node. The package to be published is converted
to a tar file and uploaded to IPFS. IPFS returns a hash link
which helps us uniquely identify the file when needed. We
take this hash link and send it to the blockchain node which
stores the package information.
IV. PROPOSED SYSTEM
The proposed system employs blockchain-based smart
contracts to decentralise package management. Blockchain-based
decentralised systems have been shown to avoid a single
point of failure and to improve resource utilisation [19]. In
our case, we have employed the blockchain network Ethereum,
which uses Solidity for the creation of smart contracts.
The smart contract storage is used for the metadata of the
packages. The contract is immutable: once a package is
published, it is set in cryptographic stone. IPFS gives us
peer-to-peer storage, which is used for the binaries. The IPFS
cluster module can be used over the IPFS server to ensure
there are replicas of the published binaries. Table II compares
existing systems with the proposed method.
Table II
COMPARISON OF EXISTING SYSTEMS AND THE PROPOSED METHOD
Variable          Maven   NuGet   NPM    Proposed Method
Decentralised     No      No      No     Yes
Write throughput  High    High    High   Low
Read throughput   High    High    High   High
Immutable         No      No      No     Yes
A blockchain can be construed as a network rather than a
technology or system. The smart contract storage is essentially
a public database which is immutable and decentralized. It has
primarily been used in financial systems and only came into
prominence when Nakamoto released the Bitcoin paper [20].
The system has no central authority for verifying transactions;
instead, a group of miners verifies them. The miners are paid
for the electricity and CPU time spent verifying transactions.
Ethereum, itself also a cryptocurrency, is considered a natural
extension of Bitcoin. It is a Turing-complete platform where
users can program and deploy bytecode to the Ethereum
nodes [21]. This opened interesting opportunities for users to
harness the full strength of the network. The concept of smart
contracts allows users to add custom logic to the Ethereum
nodes, and smart contract functions can be called from thin
clients such as browsers, which we call decentralised
applications. Transactions sent to the blockchain are mined by
the Ethereum community miners, and there is a gas fee which
miners receive for the blocks they add to the system. Ethereum
uses a distributed ledger for transactions and provides a
decentralised architecture for smart contract execution. The use
of a distributed ledger ensures reliability, as changes are
replicated across all the nodes; the downtime of some nodes
does not affect the others.
V. IMPLEMENTATION
Table III shows the software requirements for the imple-
mentation. The Geth binary was used to run the Ethereum
node. The web3 package was used for the client-side
implementation, and Solidity was used to write the smart contract.
The web3 client provides a convenient interface for interacting
with Ethereum-based smart contracts. A CentOS instance was
created on OpenStack to run the Geth binary and the IPFS server.
The setup can be used as a decentralised system or in
collaboration with another package manager as shown in
Figure 7. The decentralised system requires users to identify
themselves to the network and to hold sufficient funds to
submit transactions; every user would need an Ethereum
address and would publish packages with their own address.
Collaboration with other package managers would allow anyone
to submit transactions without an Ethereum address: publishing
would be done by a single user address shared by the services,
so the collaboration mode does not require an Ethereum address
from the user.
Table III
SOFTWARE REQUIREMENTS.
Software Version
Geth v1.8.13
Solidity v0.4.24
Node.js v6.9.1
IPFS v0.4.17
web3 v0.18.4
In a decentralised system, every user would own their
packages, meaning that only the owner could add new versions
to them. In the collaborative system, a single user would own
many packages. The collaborative system shown in Figure 7
can potentially be used where users need high reliability but
do not want to create wallets to maintain packages.
The NPM listener pushes metadata to interested services.
The metadata consists of the name, version, license etc. A
daemon was created to get the metadata from NPM, and a
queue was placed between the listener and the workers to
improve flow control. This also improved the package-publishing
throughput of the system.
Another CentOS instance was used to run the workers and
the service. The messages are given to the workers in a round
robin fashion. The workers are the processes which upload the
binaries to the peer to peer storage and push the metadata on
the Ethereum network. The workers can be horizontally scaled
out to improve job throughput. The workers are decoupled
from the listener. The listener is a single point of failure in
this architecture.
The smart contract models the dependency graph of each
package: a graph network can be visualised, with the contract
storing all the dependencies and dependents of each package.
The public nature of Ethereum makes it well suited for
open-source packages. The storage contract holds the data
operations and is primarily the data layer of the system. A
model contract, as shown in Figure 4, can be used to improve
the algorithm or add other features on top of the storage layer.
This pattern also ensures that we do not lose the reference
to our storage data, since contracts are immutable and structures
cannot be changed once a contract is deployed.
Web3 v0.20 was chosen over the v1.0 beta because it has been
around longer. Some of the experimental Solidity features for
string arrays were also tried; the bytes array was found to be
more stable than the string array in Solidity. This, however,
incurs the additional cost of converting the byte elements to
strings after retrieving them from the contract.
VI. RESULTS
Our evaluation has been performed using 4338 packages,
including the top 950 packages which are used directly and
transitively by other packages as documented in [22].
It has been observed that if the Ethereum node is bombarded
with low-price transactions, these get stuck on the node as
pending transactions. These transactions are removed from the
Table IV
LATENCY TEST FOR GETTING A PACKAGE FROM THE SYSTEM.
Number of packages   Mean (ms)   Median (ms)   Standard deviation (ms)
250                  148.3       146           8.73
500                  149.578     147           9.23
1000                 148.991     147           9.05
Table V
LATENCY TEST FOR GETTING A PACKAGE FROM THE SYSTEM USING AN
ASYNCHRONOUS APPROACH.
Number of packages   Mean (ms)
250                  0.748
500                  0.67
1000                 0.559
pending transactions buffer once the transactions have been
verified by the network.
Consequently, we have to set the gas price appropriately for
the transactions to be mined quickly: if the price offered is too
low, the transactions will not be mined immediately because
miners will have found more lucrative transactions. Our initial
gas price was 1 gwei per transaction; using a simple
demand-supply iterative function, it was eventually increased
to 5 gwei.
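Such a demand-supply adjustment can be sketched as a simple feedback rule. The sketch below is hypothetical: the 1 gwei start and 5 gwei cap reflect the values reported above, but the threshold, step, and exact function used in the experiments are illustrative assumptions:

```javascript
// Bump the gas price (in gwei) while transactions keep piling up as
// pending, capped at 5 gwei. All parameters are illustrative.
function nextGasPrice(currentGwei, pendingCount, { threshold = 10, step = 1, cap = 5 } = {}) {
  if (pendingCount > threshold) {
    return Math.min(currentGwei + step, cap); // demand outstrips supply: pay more
  }
  return currentGwei; // transactions are being mined promptly: hold the price
}
```

Iterating this rule from 1 gwei under sustained congestion converges to the 5 gwei cap.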
Our tests ran for a week. It was initially noticed that the
workers would crash due to insufficient funds. As the funds
come from the faucet, which happens to be rate limited, we
decided to slowly submit transactions and keep adaptively
funding the address with more ether.
Bandwidth monitoring was performed on the instance where
the blockchain node was installed. The results are plotted in
Figure 8. The evaluation involved requesting a list of packages
along with their versions present on the node. The metadata
for these packages was then requested individually.
Nload was used to track the bandwidth of the instance.
There is a sharp increase in outgoing bandwidth, up to
650.48 kbit/s, which occurs when all the events are requested
from the node. Requesting the events for the packages caused
the bandwidth to spike; the outgoing bandwidth then gradually
decreased to 396.32 kbit/s as the remaining package metadata
was requested.
A latency test was conducted on the blockchain node; the
results are given in Tables IV and V.
It was noticed that the mean latency to get the package
metadata from the Ethereum node is approximately 148 ms.
This test includes file I/O time and was performed with the
synchronous API. This approach is not scalable, as the client
libraries run on the user's operating system. A test conducted
with the asynchronous API reduced the mean latency to under
a millisecond. The cost to get the overall package would be
higher, as there is no handling for parsing dynamic structures
on the web3 client; once dynamic structures are stably
supported, metadata for many packages can be requested at
once.
Figure 7. System architecture for publishing new packages
Figure 8. Empirical evaluation of bandwidth usage (bandwidth in kbit/s over roughly 100 seconds).
The block synchronisation of Ethereum nodes depends on
other peers. The Ethereum node should be fully synchronized
in order to serve data correctly. The full synchronization
mode requires a lot of time as the chain is very large. The
light mode, on the other hand, is the fastest option but does
not have much support from peers and is experimental. The
fast synchronization mode just downloads the block headers
instead of the entire block and it also has better support from
the community compared to the light mode.
The testing was done with a single IPFS node. The IPFS
cluster feature can be used to make sure replicas are
distributed across cluster peers in the network. An IPFS node
which does not participate in the cluster can also request a
binary from the IPFS network on demand, in which case it
holds only a selected number of package binaries. A replication
factor can be chosen to replicate the packages to a minimum
number of cluster peers.
Figure 9 presents a directed graph with all first level
dependencies of the NPM repository. It is noted that a number
of packages in this Javascript ecosystem can have multiple
levels of dependencies. The graph is directly derived from
the data received from the Ethereum node using first level
dependencies only. The highest number of direct dependents
observed in the tested subset is 361. Some packages,
such as lodash, commander and body-parser, show
an increasing number of first-level dependents, and this number
is likely to grow, as previously shown [22]. These packages
are critical to the community as they form the basis of most
development.
Some packages recorded in the ledger have cyclic depend-
encies. The impact of the cyclic nature of the dependencies has
not been fully studied, but we argue that it may be detrimental
to the overall traceability and version control for a repository.
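Such cycles can be detected with a standard depth-first search over the first-level dependency graph recorded in the ledger (illustrative JavaScript, not part of the deployed system):

```javascript
// Detect a cycle in a dependency graph given as { package: [dependencies] }
// using DFS with three node states: unvisited, on the current path, done.
function hasCycle(graph) {
  const state = new Map(); // 'grey' = on the current DFS path, 'black' = finished
  function visit(node) {
    if (state.get(node) === 'grey') return true;  // back edge: cycle found
    if (state.get(node) === 'black') return false;
    state.set(node, 'grey');
    for (const dep of graph[node] || []) {
      if (visit(dep)) return true;
    }
    state.set(node, 'black');
    return false;
  }
  return Object.keys(graph).some((n) => visit(n));
}
```

Running such a check at publish time would let the contract, or a model contract layered on it, reject versions that introduce cyclic dependencies.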
VII. CONCLUSIONS
The proposed system is arguably a reliable decentralised
architecture for package management. The system relies on
Ethereum to keep systematic track of software dependencies
and preserve provenance. Specifically, smart contracts have
been systematically used to integrate our logic onto the
Ethereum network, while IPFS has been proposed for storing
the actual binaries. We strongly believe this work should
eventually have some impact on the open-source ecosystem,
as it can be coupled to existing repositories to maintain a
cohesive chain of versions and dependencies in software
packages.
The Ethereum network can have any number of nodes in the
network at a time. The users can install the Ethereum node on
their infrastructure or use it as a service in order to keep track
of the dependencies. The underlying architecture can also be
used for any kind of dependency management with tweaks to
the client and version changes in the contract; the native client
will, however, change according to the platform. The Ethereum
blockchain network has a good read throughput compared to
its write throughput, which could be ideal for dependency
management, since the current state of the art receives billions
of installation requests.
Further work should study dependencies at multiple levels,
as our current work has only taken single-level dependencies
into account.
Ideally, it may be useful to study in detail cliques which can
uncover not only functional software properties, but also non-
functional characteristics such as developer relationships, code
(a) Step 1 (b) Step 2
Figure 9. 2-step sequence of the directed graph of packages generated by our system. The centrality of a number of packages (e.g. lodash, commander)
identifies their importance within a specific repository, in this case NPM.
styles, and other useful traits. One distinct possibility is to
build upon our previous work on software migration (pattern-
based [23] and document to graph NoSQL databases [24]) to
enable multi-level, multi-dependency component migration.
REFERENCES
[1] C. W. Krueger, “Software reuse,” ACM Computing Surveys, vol. 24,
no. 2, pp. 131–183, Jun. 1992.
[2] A. Decan, T. Mens, and M. Claes, “On the topology of package
dependency networks: A comparison of three programming language
ecosystems,” in ECSA ’16. Copenhagen: ACM, 2016, pp. 21:1–21:4.
[3] N. B. Ruparelia, “Software development lifecycle models,” SIGSOFT
Softw. Eng. Notes, vol. 35, no. 3, pp. 8–13, May 2010.
[4] J. Highsmith and A. Cockburn, “Agile software development: the
business of innovation,” Computer, vol. 34, no. 9, pp. 120–127, Sep
2001.
[5] D. Qiu, Q. Zhang, and S. Fang, “Reconstructing software high-level
architecture by clustering weighted directed class graph,” International
Journal of Software Engineering and Knowledge Engineering, vol. 25,
no. 04, pp. 701–726, 2015.
[6] R. Naseem, O. Maqbool, and S. Muhammad, “Cooperative clustering
for software modularization,” Journal of Systems and Software, vol. 86,
no. 8, pp. 2045–2062, 2013.
[7] K. J. Ottenstein and L. M. Ottenstein, “The program dependence graph
in a software development environment,” SIGPLAN Not., vol. 19, no. 5,
pp. 177–184, Apr. 1984.
[8] G. Toffetti, S. Brunner, M. Blöchlinger, J. Spillner, and T. M. Bohnert, "Self-managing cloud-native applications: Design, implementation, and experience," Future Generation Computer Systems, vol. 72, pp. 165–179, 2017.
[9] C. Tucker, D. Shuffelton, R. Jhala, and S. Lerner, “OPIUM: Optimal
package install/uninstall manager,” in ICSE ’07. Minneapolis: IEEE,
2007, pp. 178–188.
[10] P. Abate, R. D. Cosmo, R. Treinen, and S. Zacchiroli, “Dependency
solving: A separate concern in component evolution management,”
Journal of Systems and Software, vol. 85, no. 10, pp. 2228–2240, 2012.
[11] I. Schlueter. (2010) The node package manager and registry. (Last
accessed:1/Dec/2018). [Online]. Available: https://www.npmjs.org
[12] Semantic Versioning user guide, 2013, (Last accessed:1/Dec/2018).
[Online]. Available: https://semver.org/
[13] K. Nikitin, E. Kokoris-Kogias, P. Jovanovic, N. Gailly, L. Gasser,
I. Khoffi, J. Cappos, and B. Ford, “CHAINIAC: proactive software-
update transparency via collectively signed skipchains and verified
builds,” in USENIX Security 2017. Vancouver: USENIX Association,
Aug. 2017, pp. 1271–1287.
[14] E. Wittern, P. Suter, and S. Rajagopalan, “A look at the dynamics of the
JavaScript package ecosystem,” in MSR’16. Austin: ACM/IEEE, May
2016, pp. 351–361.
[15] R. Cox, “Version SAT,” 2016, (Last accessed:1/Dec/2018). [Online].
Available: https://research.swtch.com/version-sat
[16] J. Benet. (2014) IPFS-content addressed, versioned, P2P file system.
(Last accessed:1/Dec/2018). [Online]. Available: https://arxiv.org/pdf/
1407.3561.pdf
[17] S. Alam, M. Kelly, and M. L. Nelson, “Interplanetary wayback: The
permanent web archive,” in JCDL ’16. Newark: ACM, 2016, pp. 273–
274.
[18] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The hadoop
distributed file system,” in MSST’10. Lake Tahoe: IEEE, May 2010,
pp. 1–10.
[19] F. Hawlitschek, B. Notheisen, and T. Teubner, “The limits of trust-
free systems: A literature review on blockchain technology and trust in
the sharing economy,” Electronic Commerce Research and Applications,
vol. 29, pp. 50–63, 2018.
[20] S. Nakamoto. (2008) Bitcoin: A peer-to-peer electronic cash system.
(Last accessed:1/Dec/2018). [Online]. Available: https://bitcoin.org/
bitcoin.pdf
[21] V. Buterin et al., “A next-generation smart contract and decentralized
application platform,” Tech. Rep., 2014, (Last accessed:1/Dec/2018).
[Online]. Available: https://github.com/ethereum/wiki/wiki/White-Paper
[22] A. Kashcha. (2018) Npm rank. (Last accessed:1/Dec/2018). [Online].
Available: https://gist.github.com/anvaka/8e8fa57c7ee1350e3491
[23] S. Boob, H. González-Vélez, and A. M. Popescu, "Automated instantiation of heterogeneous fast flow CPU/GPU parallel pattern applications in clouds," in PDP 2014. Torino: IEEE, Feb. 2014, pp. 162–169.
[24] A. Bansel, H. González-Vélez, and A. E. Chis, "Cloud-based NoSQL data migration," in PDP 2016. Heraklion: IEEE, Feb. 2016, pp. 224–231.